Statistical Physics of High Dimensional Inference
NASA Astrophysics Data System (ADS)
Advani, Madhu; Ganguli, Surya
To model modern large-scale datasets, we need efficient algorithms to infer a set of P unknown model parameters from N noisy measurements. What are fundamental limits on the accuracy of parameter inference, given limited measurements, signal-to-noise ratios, prior information, and computational tractability requirements? How can we combine prior information with measurements to achieve these limits? Classical statistics gives incisive answers to these questions as the measurement density α = N/P → ∞. However, modern high-dimensional inference problems, in fields ranging from bio-informatics to economics, occur at finite α. We formulate and analyze high-dimensional inference analytically by applying the replica and cavity methods of statistical physics, where data serves as quenched disorder and inferred parameters play the role of thermal degrees of freedom. Our analysis reveals that widely cherished Bayesian inference algorithms such as maximum likelihood and maximum a posteriori are suboptimal in the modern setting, and yields new tractable, optimal algorithms to replace them as well as novel bounds on the achievable accuracy of a large class of high-dimensional inference algorithms. Thanks to Stanford Graduate Fellowship and Mind Brain Computation IGERT grant for support.
High-dimensional statistical inference: From vector to matrix
NASA Astrophysics Data System (ADS)
Zhang, Anru
Statistical inference for sparse signals or low-rank matrices in high-dimensional settings is of significant interest in a range of contemporary applications. It has attracted significant recent attention in many fields including statistics, applied mathematics and electrical engineering. In this thesis, we consider several problems, including sparse signal recovery (compressed sensing under restricted isometry) and low-rank matrix recovery (matrix recovery via rank-one projections and structured matrix completion). The first part of the thesis discusses compressed sensing and affine rank minimization in both noiseless and noisy cases and establishes sharp restricted isometry conditions for sparse signal and low-rank matrix recovery. The analysis relies on a key technical tool which represents points in a polytope by convex combinations of sparse vectors. The technique is elementary yet leads to sharp results. It is shown that, in compressed sensing, δ_k^A < 1/3, δ_k^A + θ_{k,k}^A < 1, or δ_{tk}^A < √((t − 1)/t) for any given constant t ≥ 4/3 guarantee the exact recovery of all k-sparse signals in the noiseless case through constrained ℓ1 minimization, and similarly in affine rank minimization δ_r^M < 1/3, δ_r^M + θ_{r,r}^M < 1, or δ_{tr}^M < √((t − 1)/t) ensure the exact reconstruction of all matrices with rank at most r in the noiseless case via constrained nuclear norm minimization. Moreover, for any ε > 0, δ_k^A < 1/3 + ε, δ_k^A + θ_{k,k}^A < 1 + ε, or δ_{tk}^A < √((t − 1)/t) + ε are not sufficient to guarantee the exact recovery of all k-sparse signals for large k. A similar result also holds for matrix recovery. In addition, the conditions δ_k^A < 1/3, δ_k^A + θ_{k,k}^A < 1, δ_{tk}^A < √((t − 1)/t) and δ_r^M < 1/3, δ_r^M + θ_{r,r}^M < 1, δ_{tr}^M < √((t − 1)/t) are also shown to be sufficient respectively for stable recovery of approximately sparse signals and low-rank matrices in the noisy case
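As a rough illustration of the constrained ℓ1 minimization mentioned in this abstract, the noiseless problem min ‖x‖₁ subject to Ax = y can be posed as a linear program. The sketch below is not taken from the thesis: the Gaussian measurement matrix, the problem sizes, and the helper name `basis_pursuit` are assumptions chosen only for demonstration, and no restricted isometry constant is computed or checked.

```python
import numpy as np
from scipy.optimize import linprog


def basis_pursuit(A, y):
    """Solve min ||x||_1 subject to A x = y as a linear program."""
    n_dim = A.shape[1]
    # Split x = u - v with u, v >= 0, so that ||x||_1 = sum(u) + sum(v)
    # at the optimum.
    cost = np.ones(2 * n_dim)
    A_eq = np.hstack([A, -A])
    result = linprog(cost, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if not result.success:
        raise RuntimeError(result.message)
    u, v = result.x[:n_dim], result.x[n_dim:]
    return u - v


# Hypothetical test problem: a 3-sparse signal in 60 dimensions,
# observed through 30 random Gaussian measurements.
rng = np.random.default_rng(0)
n_dim, n_meas, k = 60, 30, 3
A = rng.standard_normal((n_meas, n_dim)) / np.sqrt(n_meas)
x_true = np.zeros(n_dim)
support = rng.choice(n_dim, size=k, replace=False)
x_true[support] = rng.standard_normal(k)
y = A @ x_true

x_hat = basis_pursuit(A, y)
recovery_error = np.max(np.abs(x_hat - x_true))
print("largest coordinate error:", recovery_error)
```

Because the true signal is itself feasible, the returned solution never has a larger ℓ1 norm than the true signal; whether it coincides with the true signal depends on the measurement matrix, which is the question the restricted isometry conditions address.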
NASA Astrophysics Data System (ADS)
Khan, Shahjahan
Often scientific information on various data generating processes is presented in the form of numerical and categorical data. Except for some very rare occasions, generally such data represent a small part of the population, or selected outcomes of any data generating process. Although valuable and useful information is lurking in the array of scientific data, generally it is unavailable to the users. Appropriate statistical methods are essential to reveal the hidden "jewels" in the mess of the raw data. Exploratory data analysis methods are used to uncover such valuable characteristics of the observed data. Statistical inference provides techniques to make valid conclusions about the unknown characteristics or parameters of the population from which scientifically drawn sample data are selected. Usually, statistical inference includes estimation of population parameters as well as performing tests of hypotheses on the parameters. However, prediction of future responses and determining the prediction distributions are also part of statistical inference. Both Classical (or Frequentist) and Bayesian approaches are used in statistical inference. The commonly used Classical approach is based on the sample data alone. In contrast, the increasingly popular Bayesian approach uses a prior distribution on the parameters along with the sample data to make inferences. The non-parametric and robust methods are also being used in situations where commonly used model assumptions are unsupported. In this chapter, we cover the philosophical and methodological aspects of both the Classical and Bayesian approaches. Moreover, some aspects of predictive inference are also included. In the absence of any evidence to support assumptions regarding the distribution of the underlying population, or if the variable is measured only in ordinal scale, non-parametric methods are used. Robust methods are employed to avoid any significant changes in the results due to deviations from the model
Statistical inference and string theory
NASA Astrophysics Data System (ADS)
Heckman, Jonathan J.
2015-09-01
In this paper, we expose some surprising connections between string theory and statistical inference. We consider a large collective of agents sweeping out a family of nearby statistical models for an M-dimensional manifold of statistical fitting parameters. When the agents making nearby inferences align along a d-dimensional grid, we find that the pooled probability that the collective reaches a correct inference is the partition function of a nonlinear sigma model in d dimensions. Stability under perturbations to the original inference scheme requires the agents of the collective to distribute along two dimensions. Conformal invariance of the sigma model corresponds to the condition of a stable inference scheme, directly leading to the Einstein field equations for classical gravity. By summing over all possible arrangements of the agents in the collective, we reach a string theory. We also use this perspective to quantify how much an observer can hope to learn about the internal geometry of a superstring compactification. Finally, we present some brief speculative remarks on applications to the AdS/CFT correspondence and Lorentzian signature space-times.
Statistical Inference: The Big Picture.
Kass, Robert E
2011-02-01
Statistics has moved beyond the frequentist-Bayesian controversies of the past. Where does this leave our ability to interpret results? I suggest that a philosophy compatible with statistical practice, labelled here statistical pragmatism, serves as a foundation for inference. Statistical pragmatism is inclusive and emphasizes the assumptions that connect statistical models with observed data. I argue that introductory courses often mis-characterize the process of statistical inference and I propose an alternative "big picture" depiction.
NASA Astrophysics Data System (ADS)
Sjöstrand, Karl; Cardenas, Valerie A.; Larsen, Rasmus; Studholme, Colin
2008-03-01
Whole-brain morphometry denotes a group of methods with the aim of relating clinical and cognitive measurements to regions of the brain. Typically, such methods require the statistical analysis of a data set with many variables (voxels and exogenous variables) paired with few observations (subjects). A common approach to this ill-posed problem is to analyze each spatial variable separately, dividing the analysis into manageable subproblems. A disadvantage of this method is that the correlation structure of the spatial variables is not taken into account. This paper investigates the use of ridge regression to address this issue, allowing for a gradual introduction of correlation information into the model. We make the connections between ridge regression and voxel-wise procedures explicit and discuss relations to other statistical methods. Results are given on an in-vivo data set of deformation based morphometry from a study of cognitive decline in an elderly population.
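One way to see how ridge regression can interpolate between a joint model and separate per-variable analyses is sketched below. This is an illustration under stated assumptions, not the paper's formulation: the data are synthetic, the function names are hypothetical, and the "voxel-wise" quantity is taken to be the simple cross-product Xᵀy of each column with the response. With far more variables than subjects, the dual form of the ridge solution is cheaper to compute, and as the penalty grows the rescaled ridge coefficients approach the per-variable cross-products.

```python
import numpy as np


def ridge_primal(X, y, lam):
    """Ridge solution via the p x p system (X^T X + lam I) beta = X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def ridge_dual(X, y, lam):
    """Same solution via the n x n system, cheaper when p is much larger than n."""
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)


# Synthetic stand-in for "many voxels, few subjects" (assumed sizes).
rng = np.random.default_rng(1)
n_subjects, n_voxels = 20, 200
X = rng.standard_normal((n_subjects, n_voxels))
y = rng.standard_normal(n_subjects)

beta_primal = ridge_primal(X, y, 5.0)
beta_dual = ridge_dual(X, y, 5.0)

# For a very large penalty, lam * beta approaches X^T y, which treats
# every column separately and ignores correlations between columns.
lam_big = 1e8
voxelwise = X.T @ y
scaled = lam_big * ridge_dual(X, y, lam_big)
print("max gap to per-column statistic:", np.max(np.abs(scaled - voxelwise)))
```

Lowering the penalty from this limit lets the correlation structure in XᵀX enter the solution progressively.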
Statistical inference and Aristotle's Rhetoric.
Macdonald, Ranald R
2004-11-01
Formal logic operates in a closed system where all the information relevant to any conclusion is present, whereas this is not the case when one reasons about events and states of the world. Pollard and Richardson drew attention to the fact that the reasoning behind statistical tests does not lead to logically justifiable conclusions. In this paper statistical inferences are defended not by logic but by the standards of everyday reasoning. Aristotle invented formal logic, but argued that people mostly get at the truth with the aid of enthymemes--incomplete syllogisms which include arguing from examples, analogies and signs. It is proposed that statistical tests work in the same way--in that they are based on examples, invoke the analogy of a model and use the size of the effect under test as a sign that the chance hypothesis is unlikely. Of existing theories of statistical inference only a weak version of Fisher's takes this into account. Aristotle anticipated Fisher by producing an argument of the form that there were too many cases in which an outcome went in a particular direction for that direction to be plausibly attributed to chance. We can therefore conclude that Aristotle would have approved of statistical inference and there is a good reason for calling this form of statistical inference classical.
Statistical learning and selective inference
Taylor, Jonathan; Tibshirani, Robert J.
2015-01-01
We describe the problem of “selective inference.” This addresses the following challenge: Having mined a set of data to find potential associations, how do we properly assess the strength of these associations? The fact that we have “cherry-picked”—searched for the strongest associations—means that we must set a higher bar for declaring significant the associations that we see. This challenge becomes more important in the era of big data and complex statistical modeling. The cherry tree (dataset) can be very large and the tools for cherry picking (statistical learning methods) are now very sophisticated. We describe some recent new developments in selective inference and illustrate their use in forward stepwise regression, the lasso, and principal components analysis. PMID:26100887
Statistical learning and selective inference.
Taylor, Jonathan; Tibshirani, Robert J
2015-06-23
We describe the problem of "selective inference." This addresses the following challenge: Having mined a set of data to find potential associations, how do we properly assess the strength of these associations? The fact that we have "cherry-picked"--searched for the strongest associations--means that we must set a higher bar for declaring significant the associations that we see. This challenge becomes more important in the era of big data and complex statistical modeling. The cherry tree (dataset) can be very large and the tools for cherry picking (statistical learning methods) are now very sophisticated. We describe some recent new developments in selective inference and illustrate their use in forward stepwise regression, the lasso, and principal components analysis.
Statistical inference for inverse problems
NASA Astrophysics Data System (ADS)
Bissantz, Nicolai; Holzmann, Hajo
2008-06-01
In this paper we study statistical inference for certain inverse problems. We go beyond mere estimation purposes and review and develop the construction of confidence intervals and confidence bands in some inverse problems, including deconvolution and the backward heat equation. Further, we discuss the construction of certain hypothesis tests, in particular concerning the number of local maxima of the unknown function. The methods are illustrated in a case study, where we analyze the distribution of heliocentric escape velocities of galaxies in the Centaurus galaxy cluster, and provide statistical evidence for its bimodality.
Statistical Inference at Work: Statistical Process Control as an Example
ERIC Educational Resources Information Center
Bakker, Arthur; Kent, Phillip; Derry, Jan; Noss, Richard; Hoyles, Celia
2008-01-01
To characterise statistical inference in the workplace this paper compares a prototypical type of statistical inference at work, statistical process control (SPC), with a type of statistical inference that is better known in educational settings, hypothesis testing. Although there are some similarities between the reasoning structure involved in…
Redshift data and statistical inference
NASA Technical Reports Server (NTRS)
Newman, William I.; Haynes, Martha P.; Terzian, Yervant
1994-01-01
Frequency histograms and the 'power spectrum analysis' (PSA) method, the latter developed by Yu & Peebles (1969), have been widely employed as techniques for establishing the existence of periodicities. We provide a formal analysis of these two classes of methods, including controlled numerical experiments, to better understand their proper use and application. In particular, we note that typical published applications of frequency histograms commonly employ far greater numbers of class intervals or bins than is advisable by statistical theory, sometimes giving rise to the appearance of spurious patterns. The PSA method generates a sequence of random numbers from observational data which, it is claimed, is exponentially distributed with unit mean and variance, essentially independent of the distribution of the original data. We show that the derived random process is nonstationary and produces a small but systematic bias in the usual estimate of the mean and variance. Although the derived variable may be reasonably described by an exponential distribution, the tail of the distribution is far removed from that of an exponential, thereby rendering statistical inference and confidence testing based on the tail of the distribution completely unreliable. Finally, we examine a number of astronomical examples wherein these methods have been used, giving rise to widespread acceptance of statistically unconfirmed conclusions.
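The claim of an exponential distribution with unit mean can be illustrated with a periodogram-style statistic. The sketch below assumes a common normalization, power = |Σ exp(2πi x_j / P)|² / N, which may differ in detail from the one used by Yu & Peebles or in this paper, and it uses an idealized null in which the phases are exactly uniform. The abstract's point is precisely that real data depart from this ideal, so the snippet shows only the baseline being criticized.

```python
import numpy as np


def psa_power(values, period):
    """Periodogram-style power of a set of values at one trial period."""
    phases = 2.0 * np.pi * np.asarray(values, dtype=float) / period
    n = phases.size
    return (np.cos(phases).sum() ** 2 + np.sin(phases).sum() ** 2) / n


rng = np.random.default_rng(3)
n_points, n_trials, period = 50, 2000, 1.0

# Idealized null: values uniform over one full period, so phases are uniform.
null_powers = np.array(
    [psa_power(rng.uniform(0.0, period, n_points), period) for _ in range(n_trials)]
)
mean_null_power = null_powers.mean()

# Perfectly periodic values: all phases coincide and the power equals N.
periodic_power = psa_power(np.arange(n_points) * period, period)

print("mean power under the idealized null:", mean_null_power)
print("power for perfectly periodic data:", periodic_power)
```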
The Reasoning behind Informal Statistical Inference
ERIC Educational Resources Information Center
Makar, Katie; Bakker, Arthur; Ben-Zvi, Dani
2011-01-01
Informal statistical inference (ISI) has been a frequent focus of recent research in statistics education. Considering the role that context plays in developing ISI calls into question the need to be more explicit about the reasoning that underpins ISI. This paper uses educational literature on informal statistical inference and philosophical…
Predict! Teaching Statistics Using Informal Statistical Inference
ERIC Educational Resources Information Center
Makar, Katie
2013-01-01
Statistics is one of the most widely used topics for everyday life in the school mathematics curriculum. Unfortunately, the statistics taught in schools focuses on calculations and procedures before students have a chance to see it as a useful and powerful tool. Researchers have found that a dominant view of statistics is as an assortment of tools…
Statistical Inference in Graphical Models
2008-06-17
Probabilistic Network Library (PNL). While not fully mature, PNL does provide the most commonly-used algorithms for inference and learning with the efficiency...of C++, and also offers interfaces for calling the library from MATLAB and R 1361. Notably, both BNT and PNL provide learning and inference algorithms...mature and has been used for research purposes for several years, it is written in MATLAB and thus is not suitable to be used in real-time settings. PNL
Local and Global Thinking in Statistical Inference
ERIC Educational Resources Information Center
Pratt, Dave; Johnston-Wilder, Peter; Ainley, Janet; Mason, John
2008-01-01
In this reflective paper, we explore students' local and global thinking about informal statistical inference through our observations of 10- to 11-year-olds, challenged to infer the unknown configuration of a virtual die, but able to use the die to generate as much data as they felt necessary. We report how they tended to focus on local changes…
Ranald Macdonald and statistical inference.
Smith, Philip T
2009-05-01
Ranald Roderick Macdonald (1945-2007) was an important contributor to mathematical psychology in the UK, as a referee and action editor for British Journal of Mathematical and Statistical Psychology and as a participant and organizer at the British Psychological Society's Mathematics, statistics and computing section meetings. This appreciation argues that his most important contribution was to the foundations of significance testing, where his concern about what information was relevant in interpreting the results of significance tests led him to be a persuasive advocate for the 'Weak Fisherian' form of hypothesis testing.
Making statistical inferences about software reliability
NASA Technical Reports Server (NTRS)
Miller, Douglas R.
1988-01-01
Failure times of software undergoing random debugging can be modelled as order statistics of independent but nonidentically distributed exponential random variables. Using this model inferences can be made about current reliability and, if debugging continues, future reliability. This model also shows the difficulty inherent in statistical verification of very highly reliable software such as that used by digital avionics in commercial aircraft.
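The order-statistics model described in this abstract can be simulated in a few lines. The sketch below rests on assumptions that are not stated in the abstract: each fault has its own constant failure rate, a fault is removed perfectly the first time it causes a failure, and the specific rates are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical per-fault failure rates (failures per hour); not from the report.
fault_rates = np.array([1.0, 0.5, 0.1, 0.01, 0.001])

# Each fault first manifests at an independent exponential time with its own rate.
fault_times = rng.exponential(scale=1.0 / fault_rates)

# The observed failure times are the order statistics of those times.
failure_times = np.sort(fault_times)


def residual_failure_rate(t):
    """Program failure rate at time t: the summed rates of faults not yet found."""
    return float(fault_rates[fault_times > t].sum())


rate_at_start = residual_failure_rate(0.0)
rate_after_all = residual_failure_rate(failure_times[-1])

for t in failure_times:
    print(f"failure at t = {t:10.2f}, remaining rate = {residual_failure_rate(t):.4f}")
```

The faults with the smallest rates tend to surface last, and waiting long enough to observe them is what makes statistical verification of very high reliability so demanding.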
Making statistical inferences about software reliability
NASA Technical Reports Server (NTRS)
Miller, Douglas R.
1986-01-01
Failure times of software undergoing random debugging can be modeled as order statistics of independent but nonidentically distributed exponential random variables. Using this model inferences can be made about current reliability and, if debugging continues, future reliability. This model also shows the difficulty inherent in statistical verification of very highly reliable software such as that used by digital avionics in commercial aircraft.
Investigating Mathematics Teachers' Thoughts of Statistical Inference
ERIC Educational Resources Information Center
Yang, Kai-Lin
2012-01-01
Research on statistical cognition and application suggests that statistical inference concepts are commonly misunderstood by students and even misinterpreted by researchers. Although some research has been done on students' misunderstanding or misconceptions of confidence intervals (CIs), few studies explore either students' or mathematics…
Inference and the introductory statistics course
NASA Astrophysics Data System (ADS)
Pfannkuch, Maxine; Regan, Matt; Wild, Chris; Budgett, Stephanie; Forbes, Sharleen; Harraway, John; Parsonage, Ross
2011-10-01
This article sets out some of the rationale and arguments for making major changes to the teaching and learning of statistical inference in introductory courses at our universities by changing from a norm-based, mathematical approach to more conceptually accessible computer-based approaches. The core problem of the inferential argument with its hypothetical probabilistic reasoning process is examined in some depth. We argue that the revolution in the teaching of inference must begin. We also discuss some perplexing issues, problematic areas and some new insights into language conundrums associated with introducing the logic of inference through randomization methods.
Bayesian Cosmological inference beyond statistical isotropy
NASA Astrophysics Data System (ADS)
Souradeep, Tarun; Das, Santanu; Wandelt, Benjamin
2016-10-01
With the advent of rich data sets, the computational challenge of inference in cosmology has been met with stochastic sampling methods. First, I review the widely used MCMC approach to inferring cosmological parameters and present an adaptive, improved implementation, SCoPE, developed by our group. Next, I present a general method for Bayesian inference of the underlying covariance structure of random fields on a sphere. We employ the Bipolar Spherical Harmonic (BipoSH) representation of general covariance structure on the sphere. We illustrate the efficacy of the method with a principled approach to assess violation of statistical isotropy (SI) in the sky maps of Cosmic Microwave Background (CMB) fluctuations. The general, principled approach to Bayesian inference of the covariance structure in a random field on a sphere presented here has huge potential for application to many other aspects of cosmology and astronomy, as well as more distant areas of research like geosciences and climate modelling.
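For readers unfamiliar with the MCMC approach mentioned in this abstract, a minimal random-walk Metropolis sampler is sketched below. It is a generic textbook scheme, not the SCoPE implementation, and the one-parameter Gaussian "posterior" with mean 1.0 and standard deviation 0.5 is a made-up stand-in for a real cosmological likelihood.

```python
import numpy as np


def metropolis(log_post, theta0, n_steps, step_size, rng):
    """Random-walk Metropolis sampler with a symmetric Gaussian proposal."""
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    current = log_post(theta)
    chain = np.empty((n_steps, theta.size))
    for i in range(n_steps):
        proposal = theta + step_size * rng.standard_normal(theta.size)
        candidate = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.random()) < candidate - current:
            theta, current = proposal, candidate
        chain[i] = theta
    return chain


def toy_log_posterior(theta):
    """Hypothetical log-posterior: Gaussian with mean 1.0 and std 0.5."""
    return float(-0.5 * np.sum(((theta - 1.0) / 0.5) ** 2))


rng = np.random.default_rng(5)
chain = metropolis(toy_log_posterior, theta0=0.0, n_steps=20000, step_size=0.5, rng=rng)
samples = chain[2000:, 0]  # discard an initial burn-in segment
posterior_mean = samples.mean()
posterior_std = samples.std()
print("posterior mean and std:", posterior_mean, posterior_std)
```

Adaptive schemes tune the proposal during the run; here the step size is fixed by hand.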
Thermodynamics of statistical inference by cells.
Lang, Alex H; Fisher, Charles K; Mora, Thierry; Mehta, Pankaj
2014-10-03
The deep connection between thermodynamics, computation, and information is now well established both theoretically and experimentally. Here, we extend these ideas to show that thermodynamics also places fundamental constraints on statistical estimation and learning. To do so, we investigate the constraints placed by (nonequilibrium) thermodynamics on the ability of biochemical signaling networks to estimate the concentration of an external signal. We show that accuracy is limited by energy consumption, suggesting that there are fundamental thermodynamic constraints on statistical inference.
Inference and the Introductory Statistics Course
ERIC Educational Resources Information Center
Pfannkuch, Maxine; Regan, Matt; Wild, Chris; Budgett, Stephanie; Forbes, Sharleen; Harraway, John; Parsonage, Ross
2011-01-01
This article sets out some of the rationale and arguments for making major changes to the teaching and learning of statistical inference in introductory courses at our universities by changing from a norm-based, mathematical approach to more conceptually accessible computer-based approaches. The core problem of the inferential argument with its…
Statistical Mechanics of Optimal Convex Inference in High Dimensions
NASA Astrophysics Data System (ADS)
Advani, Madhu; Ganguli, Surya
2016-07-01
A fundamental problem in modern high-dimensional data analysis involves efficiently inferring a set of P unknown model parameters governing the relationship between the inputs and outputs of N noisy measurements. Various methods have been proposed to regress the outputs against the inputs to recover the P parameters. What are fundamental limits on the accuracy of regression, given finite signal-to-noise ratios, limited measurements, prior information, and computational tractability requirements? How can we optimally combine prior information with measurements to achieve these limits? Classical statistics gives incisive answers to these questions as the measurement density α = N/P → ∞. However, these classical results are not relevant to modern high-dimensional inference problems, which instead occur at finite α. We employ replica theory to answer these questions for a class of inference algorithms, known in the statistics literature as M-estimators. These algorithms attempt to recover the P model parameters by solving an optimization problem involving minimizing the sum of a loss function that penalizes deviations between the data and model predictions, and a regularizer that leverages prior information about model parameters. Widely cherished algorithms like maximum likelihood (ML) and maximum a posteriori (MAP) inference arise as special cases of M-estimators. Our analysis uncovers fundamental limits on the inference accuracy of a subclass of M-estimators corresponding to computationally tractable convex optimization problems. These limits generalize classical statistical theorems like the Cramér-Rao bound to the high-dimensional setting with prior information. We further discover the optimal M-estimator for log-concave signal and noise distributions; we demonstrate that it can achieve our high-dimensional limits on inference accuracy, while ML and MAP cannot. Intriguingly, in high dimensions, these optimal algorithms become computationally simpler than
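The verbal description of an M-estimator in this abstract can be written out as follows. The notation is an assumption made here for illustration (a linear measurement model with inputs x_μ, outputs y_μ, loss ρ and regularizer σ) and need not match the symbols used in the paper.

```latex
% Assumed measurement model: y_\mu = x_\mu \cdot s^0 + \epsilon_\mu, \quad \mu = 1, \dots, N
\hat{s} \;=\; \arg\min_{s \in \mathbb{R}^P}
  \left[ \sum_{\mu=1}^{N} \rho\!\left( y_\mu - x_\mu \cdot s \right)
       + \sum_{i=1}^{P} \sigma\!\left( s_i \right) \right]

% Special cases, with P_\epsilon the noise density and P_s the prior density of each parameter:
%   ML:  \rho = -\log P_\epsilon, \quad \sigma = 0
%   MAP: \rho = -\log P_\epsilon, \quad \sigma = -\log P_s
```

The problem is convex whenever ρ and σ are both convex, which for ML and MAP corresponds to log-concave noise and signal densities.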
Pointwise probability reinforcements for robust statistical inference.
Frénay, Benoît; Verleysen, Michel
2014-02-01
Statistical inference using machine learning techniques may be difficult with small datasets because of abnormally frequent data (AFDs). AFDs are observations that are much more frequent in the training sample than they should be, with respect to their theoretical probability, and include e.g. outliers. Estimates of parameters tend to be biased towards models which support such data. This paper proposes to introduce pointwise probability reinforcements (PPRs): the probability of each observation is reinforced by a PPR and a regularisation allows controlling the amount of reinforcement which compensates for AFDs. The proposed solution is very generic, since it can be used to robustify any statistical inference method which can be formulated as a likelihood maximisation. Experiments show that PPRs can be easily used to tackle regression, classification and projection: models are freed from the influence of outliers. Moreover, outliers can be filtered manually since an abnormality degree is obtained for each observation.
Conditional statistical inference with multistage testing designs.
Zwitser, Robert J; Maris, Gunter
2015-03-01
In this paper it is demonstrated how statistical inference from multistage test designs can be made based on the conditional likelihood. Special attention is given to parameter estimation, as well as the evaluation of model fit. Two reasons are provided why the fit of simple measurement models is expected to be better in adaptive designs, compared to linear designs: more parameters are available for the same number of observations; and undesirable response behavior, like slipping and guessing, might be avoided owing to a better match between item difficulty and examinee proficiency. The results are illustrated with simulated data, as well as with real data.
Statistical Inference for Data Adaptive Target Parameters.
Hubbard, Alan E; Kherad-Pajouh, Sara; van der Laan, Mark J
2016-05-01
Suppose one observes n i.i.d. copies of a random variable with a probability distribution that is known to be an element of a particular statistical model. In order to define our statistical target we partition the sample into V equal-size subsamples, and use this partitioning to define V splits into an estimation sample (one of the V subsamples) and a corresponding complementary parameter-generating sample. For each of the V parameter-generating samples, we apply an algorithm that maps the sample to a statistical target parameter. We define our sample-split data adaptive statistical target parameter as the average of these V sample-specific target parameters. We present an estimator (and corresponding central limit theorem) of this type of data adaptive target parameter. This general methodology for generating data adaptive target parameters is demonstrated with a number of practical examples that highlight new opportunities for statistical learning from data. This new framework provides a rigorous statistical methodology for both exploratory and confirmatory analysis within the same data. Given that more research is becoming "data-driven", the theory developed within this paper provides a new impetus for a greater involvement of statistical inference in problems that are being increasingly addressed by clever, yet ad hoc pattern finding methods. To suggest such potential, and to verify the predictions of the theory, extensive simulation studies, along with a data analysis based on adaptively determined intervention rules, are shown and give insight into how to structure such an approach. The results show that the data adaptive target parameter approach provides a general framework and resulting methodology for data-driven science.
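The sample-splitting construction in this abstract can be sketched as below. Everything specific is an assumption made for illustration: the "algorithm" that defines the target simply picks the covariate with the largest absolute sample covariance with the outcome, the parameter then estimated is that covariance, and the data are synthetic. The paper's own examples and its estimator with a central limit theorem are not reproduced here.

```python
import numpy as np


def sample_split_estimate(X, y, n_splits, rng):
    """Average, over V splits, of a target chosen on one part and estimated on the other."""
    n = len(y)
    folds = np.array_split(rng.permutation(n), n_splits)
    estimates = []
    for fold in folds:
        in_generating = np.ones(n, dtype=bool)
        in_generating[fold] = False
        # Parameter-generating sample: decide which covariate defines the target.
        covs = [np.cov(X[in_generating, j], y[in_generating])[0, 1] for j in range(X.shape[1])]
        j_star = int(np.argmax(np.abs(covs)))
        # Estimation sample: estimate the chosen target on held-out data only.
        estimates.append(np.cov(X[fold, j_star], y[fold])[0, 1])
    return float(np.mean(estimates))


# Synthetic data (assumed): only the first covariate affects the outcome.
rng = np.random.default_rng(4)
n_obs, n_cov = 500, 5
X = rng.standard_normal((n_obs, n_cov))
y = 2.0 * X[:, 0] + 0.5 * rng.standard_normal(n_obs)

estimate = sample_split_estimate(X, y, n_splits=5, rng=rng)
print("sample-split estimate:", estimate)
```

Because the target is chosen on data that are not used to estimate it, the selection step does not contaminate the estimate within each split.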
Investigation of Statistical Inference Methodologies Through Scale Model Propagation Experiments
2015-09-30
Investigation of Statistical Inference Methodologies Through Scale Model Propagation Experiments Jason D. Sagers Applied Research Laboratories...statistical inference methodologies for ocean-acoustic problems by investigating and applying statistical methods to data collected from scale-model...experiments over a translationally invariant wedge, (2) to plan and conduct 3D propagation experiments over the Hudson Canyon scale-model bathymetry, and (3
Reasoning about Informal Statistical Inference: One Statistician's View
ERIC Educational Resources Information Center
Rossman, Allan J.
2008-01-01
This paper identifies key concepts and issues associated with the reasoning of informal statistical inference. I focus on key ideas of inference that I think all students should learn, including at secondary level as well as tertiary. I argue that a fundamental component of inference is to go beyond the data at hand, and I propose that statistical…
Bayesian Statistical Inference for Coefficient Alpha. ACT Research Report Series.
ERIC Educational Resources Information Center
Li, Jun Corser; Woodruff, David J.
Coefficient alpha is a simple and very useful index of test reliability that is widely used in educational and psychological measurement. Classical statistical inference for coefficient alpha is well developed. This paper presents two methods for Bayesian statistical inference for a single sample alpha coefficient. An approximate analytic method…
Verbal framing of statistical evidence drives children's preference inferences.
Garvin, Laura E; Woodward, Amanda L
2015-05-01
Although research has shown that statistical information can support children's inferences about specific psychological causes of others' behavior, previous work leaves open the question of how children interpret statistical information in more ambiguous situations. The current studies investigated the effect of specific verbal framing information on children's ability to infer mental states from statistical regularities in behavior. We found that preschool children inferred others' preferences from their statistically non-random choices only when they were provided with verbal information placing the person's behavior in a specifically preference-related context, not when the behavior was presented in a non-mentalistic action context or an intentional choice context. Furthermore, verbal framing information showed some evidence of supporting children's mental state inferences even from more ambiguous statistical data. These results highlight the role that specific, relevant framing information can play in supporting children's ability to derive novel insights from statistical information.
Statistical inference for serial dilution assay data.
Lee, M L; Whitmore, G A
1999-12-01
Serial dilution assays are widely employed for estimating substance concentrations and minimum inhibitory concentrations. The Poisson-Bernoulli model for such assays is appropriate for count data but not for continuous measurements that are encountered in applications involving substance concentrations. This paper presents practical inference methods based on a log-normal model and illustrates these methods using a case application involving bacterial toxins.
An argument for mechanism-based statistical inference in cancer.
Geman, Donald; Ochs, Michael; Price, Nathan D; Tomasetti, Cristian; Younes, Laurent
2015-05-01
Cancer is perhaps the prototypical systems disease, and as such has been the focus of extensive study in quantitative systems biology. However, translating these programs into personalized clinical care remains elusive and incomplete. In this perspective, we argue that realizing this agenda—in particular, predicting disease phenotypes, progression and treatment response for individuals—requires going well beyond standard computational and bioinformatics tools and algorithms. It entails designing global mathematical models over network-scale configurations of genomic states and molecular concentrations, and learning the model parameters from limited available samples of high-dimensional and integrative omics data. As such, any plausible design should accommodate: biological mechanism, necessary for both feasible learning and interpretable decision making; stochasticity, to deal with uncertainty and observed variation at many scales; and a capacity for statistical inference at the patient level. This program, which requires a close, sustained collaboration between mathematicians and biologists, is illustrated in several contexts, including learning biomarkers, metabolism, cell signaling, network inference and tumorigenesis.
An argument for mechanism-based statistical inference in cancer
Ochs, Michael; Price, Nathan D.; Tomasetti, Cristian; Younes, Laurent
2015-01-01
Cancer is perhaps the prototypical systems disease, and as such has been the focus of extensive study in quantitative systems biology. However, translating these programs into personalized clinical care remains elusive and incomplete. In this perspective, we argue that realizing this agenda—in particular, predicting disease phenotypes, progression and treatment response for individuals—requires going well beyond standard computational and bioinformatics tools and algorithms. It entails designing global mathematical models over network-scale configurations of genomic states and molecular concentrations, and learning the model parameters from limited available samples of high-dimensional and integrative omics data. As such, any plausible design should accommodate: biological mechanism, necessary for both feasible learning and interpretable decision making; stochasticity, to deal with uncertainty and observed variation at many scales; and a capacity for statistical inference at the patient level. This program, which requires a close, sustained collaboration between mathematicians and biologists, is illustrated in several contexts, including learning bio-markers, metabolism, cell signaling, network inference and tumorigenesis. PMID:25381197
Statistical Inference and Patterns of Inequality in the Global North
ERIC Educational Resources Information Center
Moran, Timothy Patrick
2006-01-01
Cross-national inequality trends have historically been a crucial field of inquiry across the social sciences, and new methodological techniques of statistical inference have recently improved the ability to analyze these trends over time. This paper applies Monte Carlo, bootstrap inference methods to the income surveys of the Luxembourg Income…
Combining statistical inference and decisions in ecology.
Williams, Perry J; Hooten, Mevin B
2016-09-01
Statistical decision theory (SDT) is a sub-field of decision theory that formally incorporates statistical investigation into a decision-theoretic framework to account for uncertainties in a decision problem. SDT provides a unifying analysis of three types of information: statistical results from a data set, knowledge of the consequences of potential choices (i.e., loss), and prior beliefs about a system. SDT links the theoretical development of a large body of statistical methods, including point estimation, hypothesis testing, and confidence interval estimation. The theory and application of SDT have mainly been developed and published in the fields of mathematics, statistics, operations research, and other decision sciences, but have had limited exposure in ecology. Thus, we provide an introduction to SDT for ecologists and describe its utility for linking the conventionally separate tasks of statistical investigation and decision making in a single framework. We describe the basic framework of both Bayesian and frequentist SDT, its traditional use in statistics, and discuss its application to decision problems that occur in ecology. We demonstrate SDT with two types of decisions: Bayesian point estimation and an applied management problem of selecting a prescribed fire rotation for managing a grassland bird species. Central to SDT, and decision theory in general, are loss functions. Thus, we also provide basic guidance and references for constructing loss functions for an SDT problem.
Combining statistical inference and decisions in ecology
Williams, Perry J.; Hooten, Mevin B.
2016-01-01
Statistical decision theory (SDT) is a sub-field of decision theory that formally incorporates statistical investigation into a decision-theoretic framework to account for uncertainties in a decision problem. SDT provides a unifying analysis of three types of information: statistical results from a data set, knowledge of the consequences of potential choices (i.e., loss), and prior beliefs about a system. SDT links the theoretical development of a large body of statistical methods including point estimation, hypothesis testing, and confidence interval estimation. The theory and application of SDT have mainly been developed and published in the fields of mathematics, statistics, operations research, and other decision sciences, but have had limited exposure in ecology. Thus, we provide an introduction to SDT for ecologists and describe its utility for linking the conventionally separate tasks of statistical investigation and decision making in a single framework. We describe the basic framework of both Bayesian and frequentist SDT, its traditional use in statistics, and discuss its application to decision problems that occur in ecology. We demonstrate SDT with two types of decisions: Bayesian point estimation, and an applied management problem of selecting a prescribed fire rotation for managing a grassland bird species. Central to SDT, and decision theory in general, are loss functions. Thus, we also provide basic guidance and references for constructing loss functions for an SDT problem.
Nuclear Forensic Inferences Using Iterative Multidimensional Statistics
Robel, M; Kristo, M J; Heller, M A
2009-06-09
Nuclear forensics involves the analysis of interdicted nuclear material for specific material characteristics (referred to as 'signatures') that imply specific geographical locations, production processes, culprit intentions, etc. Predictive signatures rely on expert knowledge of physics, chemistry, and engineering to develop inferences from these material characteristics. Comparative signatures, on the other hand, rely on comparison of the material characteristics of the interdicted sample (the 'questioned sample' in FBI parlance) with those of a set of known samples. In the ideal case, the set of known samples would be a comprehensive nuclear forensics database, a database which does not currently exist. In fact, our ability to analyze interdicted samples and produce an extensive list of precise materials characteristics far exceeds our ability to interpret the results. Therefore, as we seek to develop the extensive databases necessary for nuclear forensics, we must also develop the methods necessary to produce the necessary inferences from comparison of our analytical results with these large, multidimensional sets of data. In the work reported here, we used a large, multidimensional dataset of results from quality control analyses of uranium ore concentrate (UOC, sometimes called 'yellowcake'). We have found that traditional multidimensional techniques, such as principal components analysis (PCA), are especially useful for understanding such datasets and drawing relevant conclusions. In particular, we have developed an iterative partial least squares-discriminant analysis (PLS-DA) procedure that has proven especially adept at identifying the production location of unknown UOC samples. By removing classes which fell far outside the initial decision boundary, and then rebuilding the PLS-DA model, we have consistently produced better and more definitive attributions than with a single pass classification approach. Performance of the iterative PLS-DA method
Statistical inference for tumor growth inhibition T/C ratio.
Wu, Jianrong
2010-09-01
The tumor growth inhibition T/C ratio is commonly used to quantify treatment effects in drug screening tumor xenograft experiments. The T/C ratio is converted to an antitumor activity rating using an arbitrary cutoff point and often without any formal statistical inference. Here, we applied a nonparametric bootstrap method and a small sample likelihood ratio statistic to make a statistical inference of the T/C ratio, including both hypothesis testing and a confidence interval estimate. Furthermore, sample size and power are also discussed for statistical design of tumor xenograft experiments. Tumor xenograft data from an actual experiment were analyzed to illustrate the application.
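A minimal sketch of a nonparametric bootstrap confidence interval for a T/C ratio, assuming the ratio is taken as mean treated tumor volume over mean control tumor volume and that a percentile interval is acceptable. The volumes below are made-up illustrative numbers, and the function names are hypothetical; the paper's small-sample likelihood ratio statistic is not shown.

```python
import random
import statistics


def tc_ratio(treated, control):
    """T/C ratio, assumed here to be a ratio of mean tumor volumes."""
    return statistics.fmean(treated) / statistics.fmean(control)


def bootstrap_ci(treated, control, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval, resampling each group with replacement."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        t_star = rng.choices(treated, k=len(treated))
        c_star = rng.choices(control, k=len(control))
        ratios.append(tc_ratio(t_star, c_star))
    ratios.sort()
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return ratios[lo_idx], ratios[hi_idx]


# Illustrative (invented) tumor volumes in mm^3, not data from the paper.
treated = [310.0, 280.0, 350.0, 295.0, 330.0, 305.0]
control = [820.0, 760.0, 900.0, 845.0, 790.0, 880.0]
lower, upper = bootstrap_ci(treated, control)
```

An interval like this replaces the arbitrary activity cutoff with a statement of uncertainty: a cutoff can be compared against the whole interval rather than against the point estimate alone.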
Simultaneous statistical inference for epigenetic data.
Schildknecht, Konstantin; Olek, Sven; Dickhaus, Thorsten
2015-01-01
Epigenetic research leads to complex data structures. Since parametric model assumptions for the distribution of epigenetic data are hard to verify, we introduce in the present work a nonparametric statistical framework for two-group comparisons. Furthermore, epigenetic analyses are often performed at various genetic loci simultaneously. Hence, in order to be able to draw valid conclusions for specific loci, an appropriate multiple testing correction is necessary. Finally, with technologies available for the simultaneous assessment of many interrelated biological parameters (such as gene arrays), statistical approaches also need to deal with a possibly unknown dependency structure in the data. Our statistical approach to the nonparametric comparison of two samples with independent multivariate observables is based on recently developed multivariate multiple permutation tests. We adapt their theory in order to cope with families of hypotheses regarding relative effects. Our results indicate that the multivariate multiple permutation test keeps the pre-assigned type I error level for the global null hypothesis. In combination with the closure principle, the family-wise error rate for the simultaneous test of the corresponding locus/parameter-specific null hypotheses can be controlled. In applications we demonstrate that group differences in epigenetic data can be detected reliably with our methodology.
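For orientation, the basic idea of a two-group permutation test can be sketched in a few lines. This is a deliberately simplified stand-in: it is univariate and uses the difference in means, whereas the abstract describes multivariate multiple permutation tests for relative effects combined with the closure principle. The function name is hypothetical.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_perm=5000, seed=0):
    """Two-sided Monte Carlo permutation p-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        # Under the null, group labels are exchangeable, so reshuffle them.
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)
```

No distributional form is assumed for the measurements; the reference distribution comes entirely from relabelling the observed values.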
Introducing Statistical Inference to Biology Students through Bootstrapping and Randomization
ERIC Educational Resources Information Center
Lock, Robin H.; Lock, Patti Frazer
2008-01-01
Bootstrap methods and randomization tests are increasingly being used as alternatives to standard statistical procedures in biology. They also serve as an effective introduction to the key ideas of statistical inference in introductory courses for biology students. We discuss the use of such simulation based procedures in an integrated curriculum…
Computationally Efficient Composite Likelihood Statistics for Demographic Inference.
Coffman, Alec J; Hsieh, Ping Hsun; Gravel, Simon; Gutenkunst, Ryan N
2016-02-01
Many population genetics tools employ composite likelihoods, because fully modeling genomic linkage is challenging. But traditional approaches to estimating parameter uncertainties and performing model selection require full likelihoods, so these tools have relied on computationally expensive maximum-likelihood estimation (MLE) on bootstrapped data. Here, we demonstrate that statistical theory can be applied to adjust composite likelihoods and perform robust computationally efficient statistical inference in two demographic inference tools: ∂a∂i and TRACTS. On both simulated and real data, the adjustments perform comparably to MLE bootstrapping while using orders of magnitude less computational time.
LOWER LEVEL INFERENCE CONTROL IN STATISTICAL DATABASE SYSTEMS
Lipton, D.L.; Wong, H.K.T.
1984-02-01
An inference is the process of transforming unclassified data values into confidential data values. Most previous research in inference control has studied the use of statistical aggregates to deduce individual records. However, several other types of inference are also possible. Unknown functional dependencies may be apparent to users who have 'expert' knowledge about the characteristics of a population. Some correlations between attributes may be concluded from 'commonly-known' facts about the world. To counter these threats, security managers should use random sampling of databases of similar populations, as well as expert systems. 'Expert' users of the DATABASE SYSTEM may form inferences from the variable performance of the user interface. Users may observe on-line turn-around time, accounting statistics, the error message received, and the point at which an interactive protocol sequence fails. One may obtain information about the frequency distributions of attribute values, and the validity of data object names from this information. At the back-end of a database system, improved software engineering practices will reduce opportunities to bypass functional units of the database system. The term 'DATA OBJECT' should be expanded to incorporate these data object types which generate new classes of threats. The security of DATABASES and DATABASE SYSTEMS must be recognized as separate but related problems. Thus, by increased awareness of lower level inferences, system security managers may effectively nullify the threat posed by lower level inferences.
The Philosophical Foundations of Prescriptive Statements and Statistical Inference
ERIC Educational Resources Information Center
Sun, Shuyan; Pan, Wei
2011-01-01
From the perspectives of the philosophy of science and statistical inference, we discuss the challenges of making prescriptive statements in quantitative research articles. We first consider the prescriptive nature of educational research and argue that prescriptive statements are a necessity in educational research. The logic of deduction,…
A Framework for Thinking about Informal Statistical Inference
ERIC Educational Resources Information Center
Makar, Katie; Rubin, Andee
2009-01-01
Informal inferential reasoning has shown some promise in developing students' deeper understanding of statistical processes. This paper presents a framework to think about three key principles of informal inference--generalizations "beyond the data," probabilistic language, and data as evidence. The authors use primary school classroom…
Statistical inference in behavior analysis: Experimental control is better
Perone, Michael
1999-01-01
Statistical inference promises automatic, objective, reliable assessments of data, independent of the skills or biases of the investigator, whereas the single-subject methods favored by behavior analysts often are said to rely too much on the investigator's subjective impressions, particularly in the visual analysis of data. In fact, conventional statistical methods are difficult to apply correctly, even by experts, and the underlying logic of null-hypothesis testing has drawn criticism since its inception. By comparison, single-subject methods foster direct, continuous interaction between investigator and subject and development of strong forms of experimental control that obviate the need for statistical inference. Treatment effects are demonstrated in experimental designs that incorporate replication within and between subjects, and the visual analysis of data is adequate when integrated into such designs. Thus, single-subject methods are ideal for shaping—and maintaining—the kind of experimental practices that will ensure the continued success of behavior analysis. PMID:22478328
Targeted estimation of nuisance parameters to obtain valid statistical inference.
van der Laan, Mark J
2014-01-01
In order to obtain concrete results, we focus on estimation of the treatment specific mean, controlling for all measured baseline covariates, based on observing independent and identically distributed copies of a random variable consisting of baseline covariates, a subsequently assigned binary treatment, and a final outcome. The statistical model only assumes possible restrictions on the conditional distribution of treatment, given the covariates, the so-called propensity score. Estimators of the treatment specific mean involve estimation of the propensity score and/or estimation of the conditional mean of the outcome, given the treatment and covariates. In order to make these estimators asymptotically unbiased at any data distribution in the statistical model, it is essential to use data-adaptive estimators of these nuisance parameters such as ensemble learning, and specifically super-learning. Because such estimators involve optimal trade-off of bias and variance w.r.t. the infinite dimensional nuisance parameter itself, they result in a sub-optimal bias/variance trade-off for the resulting real-valued estimator of the estimand. We demonstrate that additional targeting of the estimators of these nuisance parameters guarantees that this bias for the estimand is second order and thereby allows us to prove theorems that establish asymptotic linearity of the estimator of the treatment specific mean under regularity conditions. These insights result in novel targeted minimum loss-based estimators (TMLEs) that use ensemble learning with additional targeted bias reduction to construct estimators of the nuisance parameters. In particular, we construct collaborative TMLEs (C-TMLEs) with known influence curve allowing for statistical inference, even though these C-TMLEs involve variable selection for the propensity score based on a criterion that measures how effective the resulting fit of the propensity score is in removing bias for the estimand. As a particular special
Statistical detection of EEG synchrony using empirical bayesian inference.
Singh, Archana K; Asoh, Hideki; Takeda, Yuji; Phillips, Steven
2015-01-01
There is growing interest in understanding how the brain utilizes synchronized oscillatory activity to integrate information across functionally connected regions. Computing phase-locking values (PLV) between EEG signals is a popular method for quantifying such synchronizations and elucidating their role in cognitive tasks. However, high-dimensionality in PLV data incurs a serious multiple testing problem. Standard multiple testing methods in neuroimaging research (e.g., false discovery rate, FDR) suffer severe loss of power, because they fail to exploit complex dependence structure between hypotheses that vary in spectral, temporal and spatial dimension. Previously, we showed that a hierarchical FDR and optimal discovery procedures could be effectively applied for PLV analysis to provide better power than FDR. In this article, we revisit the multiple comparison problem from a new Empirical Bayes perspective and propose the application of the local FDR method (locFDR; Efron, 2001) for PLV synchrony analysis to compute FDR as a posterior probability that an observed statistic belongs to a null hypothesis. We demonstrate the application of Efron's Empirical Bayes approach for PLV synchrony analysis for the first time. We use simulations to validate the specificity and sensitivity of locFDR and a real EEG dataset from a visual search study for experimental validation. We also compare locFDR with hierarchical FDR and optimal discovery procedures in both simulation and experimental analyses. Our simulation results showed that the locFDR can effectively control false positives without compromising on the power of PLV synchrony inference. Our results from the application locFDR on experiment data detected more significant discoveries than our previously proposed methods whereas the standard FDR method failed to detect any significant discoveries.
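As a reminder of the quantity being tested, a phase-locking value can be computed as the magnitude of the mean unit phasor of the phase differences. The sketch below assumes the instantaneous phases (in radians) have already been extracted from the two EEG signals, for example by a Hilbert or wavelet transform; that preprocessing and the locFDR step are not shown, and the function name is hypothetical.

```python
import cmath
import math


def phase_locking_value(phases_a, phases_b):
    """PLV = |mean of exp(i * (phase_a - phase_b))|, a value in [0, 1]."""
    n = len(phases_a)
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b))
    return abs(total) / n


# A constant phase lag gives perfect locking; evenly spread lags give none.
locked_a = [0.1 * k for k in range(100)]
locked_b = [p - 0.5 for p in locked_a]
spread_a = [2 * math.pi * k / 50 for k in range(50)]
spread_b = [0.0] * 50
plv_locked = phase_locking_value(locked_a, locked_b)
plv_spread = phase_locking_value(spread_a, spread_b)
```

One such value arises for every channel pair, frequency band and time window, which is where the multiple testing problem described in the abstract comes from.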
Quantitative evaluation of statistical inference in resting state functional MRI.
Yang, Xue; Kang, Hakmook; Newton, Allen; Landman, Bennett A
2012-01-01
Modern statistical inference techniques may be able to improve the sensitivity and specificity of resting state functional MRI (rs-fMRI) connectivity analysis through more realistic characterization of distributional assumptions. In simulation, the advantages of such modern methods are readily demonstrable. However, quantitative empirical validation remains elusive in vivo as the true connectivity patterns are unknown and noise/artifact distributions are challenging to characterize with high fidelity. Recent innovations in capturing finite sample behavior of asymptotically consistent estimators (i.e., SIMulation and EXtrapolation - SIMEX) have enabled direct estimation of bias given single datasets. Herein, we leverage the theoretical core of SIMEX to study the properties of inference methods in the face of diminishing data (in contrast to increasing noise). The stability of inference methods with respect to synthetic loss of empirical data (defined as resilience) is used to quantify the empirical performance of one inference method relative to another. We illustrate this new approach in a comparison of ordinary and robust inference methods with rs-fMRI.
The NIRS Analysis Package: noise reduction and statistical inference.
Fekete, Tomer; Rubin, Denis; Carlson, Joshua M; Mujica-Parodi, Lilianne R
2011-01-01
Near infrared spectroscopy (NIRS) is a non-invasive optical imaging technique that can be used to measure cortical hemodynamic responses to specific stimuli or tasks. While analyses of NIRS data are normally adapted from established fMRI techniques, there are nevertheless substantial differences between the two modalities. Here, we investigate the impact of NIRS-specific noise; e.g., systemic (physiological), motion-related artifacts, and serial autocorrelations, upon the validity of statistical inference within the framework of the general linear model. We present a comprehensive framework for noise reduction and statistical inference, which is custom-tailored to the noise characteristics of NIRS. These methods have been implemented in a public domain Matlab toolbox, the NIRS Analysis Package (NAP). Finally, we validate NAP using both simulated and actual data, showing marked improvement in the detection power and reliability of NIRS.
Statistical inference for extinction rates based on last sightings.
Nakamura, Miguel; Del Monte-Luna, Pablo; Lluch-Belda, Daniel; Lluch-Cota, Salvador E
2013-09-21
Rates of extinction can be estimated from sighting records and are assumed to be implicitly constant by many data analysis methods. However, historical sightings are scarce. Frequently, the only information available for inferring extinction is the date of the last sighting. In this study, we developed a probabilistic model and a corresponding statistical inference procedure based on last sightings. We applied this procedure to data on recent marine extirpations and extinctions, seeking to test the null hypothesis of a constant extinction rate. We found that over the past 500 years extirpations in the ocean have been increasing but at an uncertain rate, whereas a constant rate of global marine extinctions is statistically plausible. The small sample sizes of marine extinction records generate such high uncertainty that different combinations of model inputs can yield different outputs that fit the observed data equally well. Thus, current marine extinction trends may be idiosyncratic.
Statistical inference for noisy nonlinear ecological dynamic systems.
Wood, Simon N
2010-08-26
Chaotic ecological dynamic systems defy conventional statistical analysis. Systems with near-chaotic dynamics are little better. Such systems are almost invariably driven by endogenous dynamic processes plus demographic and environmental process noise, and are only observable with error. Their sensitivity to history means that minute changes in the driving noise realization, or the system parameters, will cause drastic changes in the system trajectory. This sensitivity is inherited and amplified by the joint probability density of the observable data and the process noise, rendering it useless as the basis for obtaining measures of statistical fit. Because the joint density is the basis for the fit measures used by all conventional statistical methods, this is a major theoretical shortcoming. The inability to make well-founded statistical inferences about biological dynamic models in the chaotic and near-chaotic regimes, other than on an ad hoc basis, leaves dynamic theory without the methods of quantitative validation that are essential tools in the rest of biological science. Here I show that this impasse can be resolved in a simple and general manner, using a method that requires only the ability to simulate the observed data on a system from the dynamic model about which inferences are required. The raw data series are reduced to phase-insensitive summary statistics, quantifying local dynamic structure and the distribution of observations. Simulation is used to obtain the mean and the covariance matrix of the statistics, given model parameters, allowing the construction of a 'synthetic likelihood' that assesses model fit. This likelihood can be explored using a straightforward Markov chain Monte Carlo sampler, but one further post-processing step returns pure likelihood-based inference. I apply the method to establish the dynamic nature of the fluctuations in Nicholson's classic blowfly experiments.
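The construction described in this abstract (simulate, summarize, fit a Gaussian to the summary statistics, evaluate the observed statistics) can be sketched as below. This is a toy illustration only: the simulator (independent Normal(theta, 1) draws) and the two summary statistics are hypothetical placeholders, not the blowfly model or the phase-insensitive statistics of the paper, and the additive constant of the Gaussian log-density is dropped.

```python
import math
import random
import statistics


def summaries(series):
    """Two placeholder summary statistics: sample mean and standard deviation."""
    return (statistics.fmean(series), statistics.stdev(series))


def simulate(theta, n, rng):
    """Hypothetical toy simulator: n independent Normal(theta, 1) draws."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def synthetic_loglik(theta, observed, n_sims=200, seed=0):
    """Gaussian log-density of observed summaries under simulated mean/covariance."""
    rng = random.Random(seed)
    s_obs = summaries(observed)
    sims = [summaries(simulate(theta, len(observed), rng)) for _ in range(n_sims)]
    mu = [statistics.fmean(s[k] for s in sims) for k in range(2)]

    def cov(i, j):
        return sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in sims) / (n_sims - 1)

    a, b, d = cov(0, 0), cov(0, 1), cov(1, 1)
    det = a * d - b * b
    r0, r1 = s_obs[0] - mu[0], s_obs[1] - mu[1]
    # Quadratic form r' S^{-1} r using the closed-form inverse of a 2x2 matrix.
    quad = (d * r0 * r0 - 2 * b * r0 * r1 + a * r1 * r1) / det
    return -0.5 * quad - 0.5 * math.log(det)
```

Evaluating `synthetic_loglik` over a grid of `theta` values, or inside an MCMC sampler as the abstract suggests, only ever requires the ability to simulate from the model.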
Two dimensional unstable scar statistics.
Warne, Larry Kevin; Jorgenson, Roy Eberhardt; Kotulski, Joseph Daniel; Lee, Kelvin S. H. (ITT Industries/AES Los Angeles, CA)
2006-12-01
This report examines the localization of time harmonic high frequency modal fields in two dimensional cavities along periodic paths between opposing sides of the cavity. The cases where these orbits lead to unstable localized modes are known as scars. This paper examines the enhancements for these unstable orbits when the opposing mirrors are both convex and concave. In the latter case the construction includes the treatment of interior foci.
Breakdown of statistical inference from some random experiments
NASA Astrophysics Data System (ADS)
Kupczynski, Marian; De Raedt, Hans
2016-03-01
Many experiments can be interpreted in terms of random processes operating according to some internal protocols. When experiments are costly or cannot be repeated, only one or a few finite samples are available. In this paper we study data generated by pseudo-random computer experiments operating according to particular internal protocols. We show that the standard statistical analysis performed on a sample, containing 10^5 data points or more, may sometimes be highly misleading and statistical errors largely underestimated. Our results confirm in a dramatic way the dangers of standard asymptotic statistical inference if a sample is not homogeneous. We demonstrate that analyzing various subdivisions of samples by multiple chi-square tests and chi-square frequency graphs is very effective in detecting sample inhomogeneity. Therefore, to assure correctness of the statistical inference, the above-mentioned chi-square tests and other non-parametric sample homogeneity tests should be incorporated in any statistical analysis of experimental data. If such tests are not performed, the reported conclusions and estimates of the errors cannot be trusted.
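A hedged sketch of the kind of check the abstract recommends: split a sample of categorical outcomes into consecutive blocks and compute the Pearson chi-square statistic for homogeneity across blocks. The function name is hypothetical, every category is assumed to occur at least once, and the statistic still has to be compared with a chi-square quantile for the returned degrees of freedom.

```python
def chi_square_homogeneity(sample, n_blocks, categories):
    """Pearson chi-square statistic and degrees of freedom across blocks."""
    block_size = len(sample) // n_blocks
    blocks = [sample[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]
    # Contingency table: one row per block, one column per category.
    counts = [[block.count(c) for c in categories] for block in blocks]
    col_totals = [sum(row[j] for row in counts) for j in range(len(categories))]
    total = sum(col_totals)
    stat = 0.0
    for row in counts:
        row_total = sum(row)
        for j, observed in enumerate(row):
            expected = row_total * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (n_blocks - 1) * (len(categories) - 1)
    return stat, dof


# A sample whose first and second halves follow different frequencies.
drifting = [0] * 40 + [1] * 10 + [0] * 10 + [1] * 40
stat, dof = chi_square_homogeneity(drifting, n_blocks=2, categories=[0, 1])
```

Pooled over the whole sample the two outcomes are exactly balanced, so an analysis of the pooled frequencies alone would not reveal the drift that the block-wise statistic exposes.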
Gene regulatory network inference using out of equilibrium statistical mechanics
Benecke, Arndt
2008-01-01
Spatiotemporal control of gene expression is fundamental to multicellular life. Despite prodigious efforts, the encoding of gene expression regulation in eukaryotes is not understood. Gene expression analyses nourish the hope to reverse engineer effector-target gene networks using inference techniques. Inference from noisy and circumstantial data relies on using robust models with few parameters for the underlying mechanisms. However, a systematic path to gene regulatory network reverse engineering from functional genomics data is still impeded by fundamental problems. Recently, Johannes Berg from the Theoretical Physics Institute of Cologne University has made two remarkable contributions that significantly advance the gene regulatory network inference problem. Berg, who uses gene expression data from yeast, has demonstrated a nonequilibrium regime for mRNA concentration dynamics and was able to map the gene regulatory process upon simple stochastic systems driven out of equilibrium. The impact of his demonstration is twofold, affecting both the understanding of the operational constraints under which transcription occurs and the capacity to extract relevant information from highly time-resolved expression data. Berg has used his observation to predict target genes of selected transcription factors, and thereby, in principle, demonstrated applicability of his out of equilibrium statistical mechanics approach to the gene network inference problem. PMID:19404429
Indirect Fourier transform in the context of statistical inference.
Muthig, Michael; Prévost, Sylvain; Orglmeister, Reinhold; Gradzielski, Michael
2016-09-01
Inferring structural information from the intensity of a small-angle scattering (SAS) experiment is an ill-posed inverse problem. Thus, the determination of a solution is in general non-trivial. In this work, the indirect Fourier transform (IFT), which determines the pair distance distribution function from the intensity and hence yields structural information, is discussed within two different statistical inference approaches, namely a frequentist one and a Bayesian one, in order to determine a solution objectively. From the frequentist approach the cross-validation method is obtained as a good practical objective function for selecting an IFT solution. Moreover, modern machine learning methods are employed to suppress oscillatory behaviour of the solution, hence extracting only meaningful features of the solution. By comparing the results yielded by the different methods presented here, the reliability of the outcome can be improved and thus the approach should enable more reliable information to be deduced from SAS experiments.
Statistical Inference for Big Data Problems in Molecular Biophysics
Ramanathan, Arvind; Savol, Andrej; Burger, Virginia; Quinn, Shannon; Agarwal, Pratul K; Chennubhotla, Chakra
2012-01-01
We highlight the role of statistical inference techniques in providing biological insights from analyzing long time-scale molecular simulation data. Technological and algorithmic improvements in computation have brought molecular simulations to the forefront of techniques applied to investigating the basis of living systems. While these longer simulations, increasingly complex and presently reaching petabyte scales, promise a detailed view into microscopic behavior, teasing out the important information has now become a true challenge on its own. Mining this data for important patterns is critical to automating therapeutic intervention discovery, improving protein design, and fundamentally understanding the mechanistic basis of cellular homeostasis.
Online Updating of Statistical Inference in the Big Data Setting.
Schifano, Elizabeth D; Wu, Jing; Wang, Chun; Yan, Jun; Chen, Ming-Hui
2016-01-01
We present statistical methods for big data arising from online analytical processing, where large amounts of data arrive in streams and require fast analysis without storage/access to the historical data. In particular, we develop iterative estimating algorithms and statistical inferences for linear models and estimating equations that update as new data arrive. These algorithms are computationally efficient, minimally storage-intensive, and allow for possible rank deficiencies in the subset design matrices due to rare-event covariates. Within the linear model setting, the proposed online-updating framework leads to predictive residual tests that can be used to assess the goodness-of-fit of the hypothesized model. We also propose a new online-updating estimator under the estimating equation setting. Theoretical properties of the goodness-of-fit tests and proposed estimators are examined in detail. In simulation studies and real data applications, our estimator compares favorably with competing approaches under the estimating equation setting.
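The online-updating idea for linear models can be illustrated with a minimal sketch. It assumes a one-covariate model y = a + b*x and keeps only running sums, so no historical data is stored; the class name `OnlineSimpleRegression` and its methods are hypothetical and are not the authors' estimator, which also covers rank-deficient designs and estimating equations.

```python
class OnlineSimpleRegression:
    """Least-squares fit of y = a + b*x from running sums (illustrative only)."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def update(self, xs, ys):
        # Fold a new batch into the sufficient statistics; the batch
        # itself can be discarded afterwards.
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y

    def coefficients(self):
        # Closed-form solution of the normal equations for intercept and slope.
        denom = self.n * self.sxx - self.sx ** 2
        if denom == 0:
            raise ValueError("slope is not identifiable from the data seen so far")
        b = (self.n * self.sxy - self.sx * self.sy) / denom
        a = (self.sy - b * self.sx) / self.n
        return a, b
```

Calling `update` once per arriving batch and `coefficients` at any time gives the same estimate as refitting on all data seen so far, which is the property that makes the approach storage-light.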
Statistical inference involving binomial and negative binomial parameters.
García-Pérez, Miguel A; Núñez-Antón, Vicente
2009-05-01
Statistical inference about two binomial parameters implies that they are both estimated by binomial sampling. There are occasions in which one aims at testing the equality of two binomial parameters before and after the occurrence of the first success along a sequence of Bernoulli trials. In these cases, the binomial parameter before the first success is estimated by negative binomial sampling whereas that after the first success is estimated by binomial sampling, and both estimates are related. This paper derives statistical tools to test two hypotheses, namely, that both binomial parameters equal some specified value and that both parameters are equal though unknown. Simulation studies are used to show that in small samples both tests are accurate in keeping the nominal Type-I error rates, and also to determine sample size requirements to detect large, medium, and small effects with adequate power. Additional simulations also show that the tests are sufficiently robust to certain violations of their assumptions.
Statistical inference to advance network models in epidemiology.
Welch, David; Bansal, Shweta; Hunter, David R
2011-03-01
Contact networks are playing an increasingly important role in the study of epidemiology. Most of the existing work in this area has focused on considering the effect of underlying network structure on epidemic dynamics by using tools from probability theory and computer simulation. This work has provided much insight on the role that heterogeneity in host contact patterns plays on infectious disease dynamics. Despite the important understanding afforded by the probability and simulation paradigm, this approach does not directly address important questions about the structure of contact networks such as what is the best network model for a particular mode of disease transmission, how parameter values of a given model should be estimated, or how precisely the data allow us to estimate these parameter values. We argue that these questions are best answered within a statistical framework and discuss the role of statistical inference in estimating contact networks from epidemiological data.
Statistical Models for Inferring Vegetation Composition from Fossil Pollen
NASA Astrophysics Data System (ADS)
Paciorek, C.; McLachlan, J. S.; Shang, Z.
2011-12-01
Fossil pollen provide information about vegetation composition that can be used to help understand how vegetation has changed over the past. However, these data have not traditionally been analyzed in a way that allows for statistical inference about spatio-temporal patterns and trends. We build a Bayesian hierarchical model called STEPPS (Spatio-Temporal Empirical Prediction from Pollen in Sediments) that predicts forest composition in southern New England, USA, over the last two millennia based on fossil pollen. The critical relationships between abundances of tree taxa in the pollen record and abundances in actual vegetation are estimated using modern (Forest Inventory Analysis) data and (witness tree) data from colonial records. This gives us two time points at which both pollen and direct vegetation data are available. Based on these relationships, and incorporating our uncertainty about them, we predict forest composition using fossil pollen. We estimate the spatial distribution and relative abundances of tree species and draw inference about how these patterns have changed over time. Finally, we describe ongoing work to extend the modeling to the upper Midwest of the U.S., including an approach to infer tree density and thereby estimate the prairie-forest boundary in Minnesota and Wisconsin. This work is part of the PalEON project, which brings together a team of ecosystem modelers, paleoecologists, and statisticians with the goal of reconstructing vegetation responses to climate during the last two millennia in the northeastern and midwestern United States. The estimates from the statistical modeling will be used to assess and calibrate ecosystem models that are used to project ecological changes in response to global change.
Bayesian inference on the sphere beyond statistical isotropy
Das, Santanu; Souradeep, Tarun; Wandelt, Benjamin D.
2015-10-01
We present a general method for Bayesian inference of the underlying covariance structure of random fields on a sphere. We employ the Bipolar Spherical Harmonic (BipoSH) representation of general covariance structure on the sphere. We illustrate the efficacy of the method as a principled approach to assess violation of statistical isotropy (SI) in the sky maps of Cosmic Microwave Background (CMB) fluctuations. SI violation in observed CMB maps arises due to known physical effects such as Doppler boost and weak lensing; yet-unknown theoretical possibilities like cosmic topology and subtle violations of the cosmological principle; as well as expected observational artefacts of scanning the sky with a non-circular beam, masking, foreground residuals, anisotropic noise, etc. We explicitly demonstrate the recovery of the input SI violation signals with their full statistics in simulated CMB maps. Our formalism easily adapts to exploring parametric physical models with non-SI covariance, as we illustrate for the inference of the parameters of a Doppler boosted sky map. Our approach promises to provide a robust quantitative evaluation of the evidence for SI violation related anomalies in the CMB sky by estimating the BipoSH spectra along with their complete posterior.
Statistics for nuclear engineers and scientists. Part 1. Basic statistical inference
Beggs, W.J.
1981-02-01
This report is intended for the use of engineers and scientists working in the nuclear industry, especially at the Bettis Atomic Power Laboratory. It serves as the basis for several Bettis in-house statistics courses. The objectives of the report are to introduce the reader to the language and concepts of statistics and to provide a basic set of techniques to apply to problems of the collection and analysis of data. Part 1 covers subjects of basic inference. The subjects include: descriptive statistics; probability; simple inference for normally distributed populations, and for non-normal populations as well; comparison of two populations; the analysis of variance; quality control procedures; and linear regression analysis.
Algebraic Statistical Model for Biochemical Network Dynamics Inference.
Linder, Daniel F; Rempala, Grzegorz A
2013-12-01
With modern molecular quantification methods, such as high throughput sequencing, biologists may perform multiple complex experiments and collect longitudinal data on RNA and DNA concentrations. Such data may then be used to infer cellular level interactions between the molecular entities of interest. One method which formalizes such inference is the stoichiometric algebraic statistical model (SASM) of [2], which allows analysis of the so-called conic (or single source) networks. Despite its intuitive appeal, up until now the SASM has been only heuristically studied on a few simple examples. The current paper provides a more formal mathematical treatment of the SASM, expanding the original model to a wider class of reaction systems decomposable into multiple conic subnetworks. In particular, it is proved here that on such networks the SASM enjoys the so-called sparsistency property, that is, it asymptotically (with the number of observed network trajectories) discards the false interactions by setting their reaction rates to zero. For illustration, we apply the extended SASM to in silico data from a generic decomposable network as well as to biological data from an experimental search for a possible transcription factor for the heat shock protein 70 (Hsp70) in the zebrafish retina.
Simple statistical inference algorithms for task-dependent wellness assessment.
Kailas, A; Chong, C-C; Watanabe, F
2012-07-01
Stress is a key indicator of wellness in human beings and a prime contributor to performance degradation and errors during various human tasks. The overriding purpose of this paper is to propose two algorithms (probabilistic and non-probabilistic) that iteratively track stress states to compute a wellness index in terms of the stress levels. This paper adopts the physiological viewpoint that high stress is accompanied by large deviations in biometrics such as body temperature, heart rate, etc., and the proposed algorithms iteratively track these fluctuations to compute a personalized wellness index that is correlated to the engagement levels of the tasks performed by the user. In essence, this paper presents a quantitative relationship between temperature, occupational stress, and wellness during different tasks. The simplicity of the statistical inference algorithms makes them favorable candidates for implementation on mobile platforms such as smart phones in the future, thereby providing users an inexpensive application for self-wellness monitoring for a healthier lifestyle.
Multiple Illuminant Colour Estimation via Statistical Inference on Factor Graphs.
Mutimbu, Lawrence; Robles-Kelly, Antonio
2016-08-31
This paper presents a method to recover a spatially varying illuminant colour estimate from scenes lit by multiple light sources. Starting with the image formation process, we formulate the illuminant recovery problem in a statistically data-driven setting. To do this, we use a factor graph defined across the scale space of the input image. In the graph, we utilise a set of illuminant prototypes computed using a data-driven approach. As a result, our method delivers a pixelwise illuminant colour estimate without requiring libraries or user input. The use of a factor graph also allows for the illuminant estimates to be recovered making use of a maximum a posteriori (MAP) inference process. Moreover, we compute the probability marginals by performing a Delaunay triangulation on our factor graph. We illustrate the utility of our method for pixelwise illuminant colour recovery on widely available datasets and compare against a number of alternatives. We also show sample colour correction results on real-world images.
Testing manifest monotonicity using order-constrained statistical inference.
Tijmstra, Jesper; Hessen, David J; van der Heijden, Peter G M; Sijtsma, Klaas
2013-01-01
Most dichotomous item response models share the assumption of latent monotonicity, which states that the probability of a positive response to an item is a nondecreasing function of a latent variable intended to be measured. Latent monotonicity cannot be evaluated directly, but it implies manifest monotonicity across a variety of observed scores, such as the restscore, a single item score, and in some cases the total score. In this study, we show that manifest monotonicity can be tested by means of the order-constrained statistical inference framework. We propose a procedure that uses this framework to determine whether manifest monotonicity should be rejected for specific items. This approach provides a likelihood ratio test for which the p-value can be approximated through simulation. A simulation study is presented that evaluates the Type I error rate and power of the test, and the procedure is applied to empirical data.
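To make the order constraint concrete, the following sketch computes the nondecreasing fit to a sequence of observed proportions using the pool-adjacent-violators algorithm, a standard way to obtain order-constrained estimates. It is only an illustration: the function name `pava` and the example numbers are hypothetical, and the paper's likelihood ratio test and its simulated p-value are not reproduced here.

```python
def pava(values, weights):
    """Weighted nondecreasing fit via pool-adjacent-violators (illustrative)."""
    blocks = []  # each block holds [weighted mean, total weight, number of points]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Merge backwards while the last two blocks violate the ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fitted = []
    for mean_value, _, count in blocks:
        fitted.extend([mean_value] * count)
    return fitted


# Hypothetical proportions of positive responses in four restscore groups,
# each group containing 10 respondents.
constrained = pava([0.2, 0.5, 0.4, 0.7], [10, 10, 10, 10])
```

Here the second and third groups violate monotonicity, so they are pooled to a common value; a test of manifest monotonicity would then compare the fit of the constrained and unconstrained proportions.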
CALUX measurements: statistical inferences for the dose-response curve.
Elskens, M; Baston, D S; Stumpf, C; Haedrich, J; Keupers, I; Croes, K; Denison, M S; Baeyens, W; Goeyens, L
2011-09-30
Chemical Activated LUciferase gene eXpression [CALUX] is a reporter gene mammalian cell bioassay used for detection and semi-quantitative analyses of dioxin-like compounds. CALUX dose-response curves for 2,3,7,8-tetrachlorodibenzo-p-dioxin [TCDD] are typically smooth and sigmoidal when the dose is portrayed on a logarithmic scale. Non-linear regression models are used to calibrate the CALUX response versus TCDD standards and to convert the sample response into Bioanalytical EQuivalents (BEQs). Several complications may arise in terms of statistical inference; the most important is the uncertainty assessment of the predicted BEQ. This paper presents the use of linear calibration functions based on Box-Cox transformations to overcome the issue of uncertainty assessment. Main issues being addressed are (i) confidence and prediction intervals for the CALUX response, (ii) confidence and prediction intervals for the predicted BEQ-value, and (iii) detection/estimation capabilities for the sigmoid and linearized models. Statistical comparisons between different calculation methods involving inverse prediction, effective concentration ratios (ECR(20-50-80)) and slope ratio were achieved with example datasets in order to provide guidance for optimizing BEQ determinations and expand assay performance with the recombinant mouse hepatoma CALUX cell line H1L6.1c3.
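For readers unfamiliar with the Box-Cox family, a minimal sketch of the transformation and its inverse follows. The function names and the exponent value used in the example are hypothetical; how the paper chooses the exponent and builds its calibration intervals is not shown.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a positive value y with exponent lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


def inverse_box_cox(z, lam):
    """Map a transformed value back to the original scale."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)


# Hypothetical response value and exponent, for illustration only.
z = box_cox(4.0, 0.5)
y_back = inverse_box_cox(z, 0.5)
```

The inverse is what an inverse-prediction step would need in order to report a value on the original scale after fitting a straight line on the transformed scale.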
NASA Astrophysics Data System (ADS)
Albert, Carlo; Ulzega, Simone; Stoop, Ruedi
2016-04-01
Measured time-series of both precipitation and runoff are known to exhibit highly non-trivial statistical properties. For making reliable probabilistic predictions in hydrology, it is therefore desirable to have stochastic models with output distributions that share these properties. When parameters of such models have to be inferred from data, we also need to quantify the associated parametric uncertainty. For non-trivial stochastic models, however, this latter step is typically very demanding, both conceptually and numerically, and almost never done in hydrology. Here, we demonstrate that methods developed in statistical physics make a large class of stochastic differential equation (SDE) models amenable to a full-fledged Bayesian parameter inference. For concreteness we demonstrate these methods by means of a simple yet non-trivial toy SDE model. We consider a natural catchment that can be described by a linear reservoir, at the scale of observation. All the neglected processes are assumed to happen at much shorter time-scales and are therefore modeled with a Gaussian white noise term, the standard deviation of which is assumed to scale linearly with the system state (water volume in the catchment). Even for constant input, the outputs of this simple non-linear SDE model show a wealth of desirable statistical properties, such as fat-tailed distributions and long-range correlations. Standard algorithms for Bayesian inference fail, for models of this kind, because their likelihood functions are extremely high-dimensional intractable integrals over all possible model realizations. The use of Kalman filters is illegitimate due to the non-linearity of the model. Particle filters could be used but become increasingly inefficient with growing number of data points. Hamiltonian Monte Carlo algorithms allow us to translate this inference problem to the problem of simulating the dynamics of a statistical mechanics system and give us access to most sophisticated methods
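A forward simulation of such a toy model can be sketched with the Euler-Maruyama scheme. The specific form dV = (r - V/k) dt + sigma * V dW is an assumption made here to match the verbal description (linear reservoir, noise amplitude proportional to the stored volume); the parameter names and values are hypothetical, and the Hamiltonian Monte Carlo inference step is not shown.

```python
import math
import random


def simulate_reservoir(v0, inflow, k, sigma, dt, n_steps, seed=0):
    """Euler-Maruyama path of dV = (inflow - V/k) dt + sigma * V dW (assumed form)."""
    rng = random.Random(seed)
    v = v0
    path = [v]
    for _ in range(n_steps):
        # Brownian increment over one time step has variance dt.
        dw = rng.gauss(0.0, math.sqrt(dt))
        v = v + (inflow - v / k) * dt + sigma * v * dw
        path.append(v)
    return path


# Hypothetical settings: constant inflow, retention time k, moderate noise.
volumes = simulate_reservoir(v0=1.0, inflow=1.0, k=2.0, sigma=0.3, dt=0.01, n_steps=5000)
outflow = [v / 2.0 for v in volumes]  # linear reservoir: runoff = V / k
```

With `sigma` set to zero the path relaxes to the deterministic steady state `inflow * k`, which is a quick sanity check on the drift term.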
NASA Astrophysics Data System (ADS)
Vali Ahmadi, Mohammad; Doostparast, Mahdi; Ahmadi, Jafar
2015-04-01
In manufacturing industries, the lifetime of an item is usually characterised by a random variable X and considered to be satisfactory if X exceeds a given lower lifetime limit L. The probability of a satisfactory item is then ηL := P(X ≥ L), called conforming rate. In industrial companies, however, the lifetime performance index, proposed by Montgomery and denoted by CL, is widely used as a process capability index instead of the conforming rate. Assuming a parametric model for the random variable X, we show that there is a connection between the conforming rate and the lifetime performance index. Consequently, the statistical inferences about ηL and CL are equivalent. Hence, we restrict ourselves to statistical inference for CL based on generalised order statistics, which contains several ordered data models such as usual order statistics, progressively Type-II censored data and records. Various point and interval estimators for the parameter CL are obtained and optimal critical regions for the hypothesis testing problems concerning CL are proposed. Finally, two real data-sets on the lifetimes of insulating fluid and ball bearings, due to Nelson (1982) and Caroni (2002), respectively, and a simulated sample are analysed.
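The connection between the two quantities can be illustrated with a worked case. Two assumptions are made here that the abstract does not state: that C_L is defined in the usual way as (μ − L)/σ, with μ and σ the mean and standard deviation of X, and that X is exponentially distributed with rate λ.

```latex
\begin{align*}
\mu = \sigma = \frac{1}{\lambda}
  \quad&\Longrightarrow\quad
  C_L = \frac{\mu - L}{\sigma} = 1 - \lambda L, \\
\eta_L = P(X \ge L) = e^{-\lambda L}
  \quad&\Longrightarrow\quad
  \eta_L = e^{\,C_L - 1}, \qquad -\infty < C_L < 1 .
\end{align*}
```

Under these assumptions η_L is a strictly increasing function of C_L, so a hypothesis or confidence bound stated for one translates directly into the other.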
Statistical inference of regulatory networks for circadian regulation.
Aderhold, Andrej; Husmeier, Dirk; Grzegorczyk, Marco
2014-06-01
We assess the accuracy of various state-of-the-art statistics and machine learning methods for reconstructing gene and protein regulatory networks in the context of circadian regulation. Our study draws on the increasing availability of gene expression and protein concentration time series for key circadian clock components in Arabidopsis thaliana. In addition, gene expression and protein concentration time series are simulated from a recently published regulatory network of the circadian clock in A. thaliana, in which protein and gene interactions are described by a Markov jump process based on Michaelis-Menten kinetics. We closely follow recent experimental protocols, including the entrainment of seedlings to different light-dark cycles and the knock-out of various key regulatory genes. Our study provides relative network reconstruction accuracy scores for a critical comparative performance evaluation, and sheds light on a series of highly relevant questions: it quantifies the influence of systematically missing values related to unknown protein concentrations and mRNA transcription rates, it investigates the dependence of the performance on the network topology and the degree of recurrency, it provides deeper insight into when and why non-linear methods fail to outperform linear ones, it offers improved guidelines on parameter settings in different inference procedures, and it suggests new hypotheses about the structure of the central circadian gene regulatory network in A. thaliana.
Multivariate Statistical Inference of Lightning Occurrence, and Using Lightning Observations
NASA Technical Reports Server (NTRS)
Boccippio, Dennis
2004-01-01
Two classes of multivariate statistical inference using TRMM Lightning Imaging Sensor, Precipitation Radar, and Microwave Imager observations are studied, using nonlinear classification neural networks as inferential tools. The very large and globally representative data sample provided by TRMM allows both training and validation (without overfitting) of neural networks with many degrees of freedom. In the first study, the flashing or non-flashing condition of storm complexes is diagnosed using radar, passive microwave and/or environmental observations as neural network inputs. The diagnostic skill of these simple lightning/no-lightning classifiers can be quite high over land (above 80% Probability of Detection; below 20% False Alarm Rate). In the second, passive microwave and lightning observations are used to diagnose radar reflectivity vertical structure. Because a priori diagnosis of hydrometeor vertical structure is highly important for improved rainfall retrieval from either orbital radars (e.g., the future Global Precipitation Mission "mothership") or radiometers (e.g., operational SSM/I and future Global Precipitation Mission passive microwave constellation platforms), we explore the incremental benefit to such diagnosis provided by lightning observations.
Physics of epigenetic landscapes and statistical inference by cells
NASA Astrophysics Data System (ADS)
Lang, Alex H.
Biology is currently in the midst of a revolution. Great technological advances have led to unprecedented quantitative data at the whole genome level. However, new techniques are needed to deal with this deluge of high-dimensional data. Therefore, statistical physics has the potential to help develop systems biology level models that can incorporate complex data. Additionally, physicists have made great strides in understanding non-equilibrium thermodynamics. However, the consequences of these advances have yet to be fully incorporated into biology. There are three specific problems that I address in my dissertation. First, a common metaphor for describing development is a rugged "epigenetic landscape" where cell fates are represented as attracting valleys resulting from a complex regulatory network. I introduce a framework for explicitly constructing epigenetic landscapes that combines genomic data with techniques from spin-glass physics. The model reproduces known reprogramming protocols and identifies candidate transcription factors for reprogramming to novel cell fates, suggesting epigenetic landscapes are a powerful paradigm for understanding cellular identity. Second, I examine the dynamics of cellular reprogramming. By reanalyzing all available time-series data, I show that gene expression dynamics during reprogramming follow a simple one-dimensional reaction coordinate that is independent of both the time and details of experimental protocol used. I show that such a reaction coordinate emerges naturally from epigenetic landscape models of cell identity where cellular reprogramming is viewed as a "barrier-crossing" between the starting and ending cell fates. Overall, the analysis and model suggest that gene expression dynamics during reprogramming follow a canonical trajectory consistent with the idea of an "optimal path" in gene expression space for reprogramming. Third, an important task of cells is to perform complex computations in response to
Development of Statistical Methods Using Predictive Inference and Entropy.
1986-03-01
Inference and Entropy. APPENDIX B: Achievable Accuracy in Parametric Estimation of Multivariate Spectra. List of Figures and Tables. Figure 1...1986e). "Achievable Accuracy in Parametric Estimation of Multivariate Spectra". Draft. Larimore, W.E. (1983a). "Predictive inference, sufficiency... PARAMETRIC ESTIMATION OF MULTIVARIATE SPECTRA By Wallace E. Larimore, Scientific Systems Inc., Cambridge, Massachusetts, U.S.A. Research Sponsored by the
Wilkinson, Michael
2014-03-01
Decisions about support for predictions of theories in light of data are made using statistical inference. The dominant approach in sport and exercise science is the Neyman-Pearson (N-P) significance-testing approach. When applied correctly it provides a reliable procedure for making dichotomous decisions for accepting or rejecting zero-effect null hypotheses with known and controlled long-run error rates. Type I and type II error rates must be specified in advance and the latter controlled by conducting an a priori sample size calculation. The N-P approach does not provide the probability of hypotheses or indicate the strength of support for hypotheses in light of data, yet many scientists believe it does. Outcomes of analyses allow conclusions only about the existence of non-zero effects, and provide no information about the likely size of true effects or their practical/clinical value. Bayesian inference can show how much support data provide for different hypotheses, and how personal convictions should be altered in light of data, but the approach is complicated by formulating probability distributions about prior subjective estimates of population effects. A pragmatic solution is magnitude-based inference, which allows scientists to estimate the true magnitude of population effects and how likely they are to exceed an effect magnitude of practical/clinical importance, thereby integrating elements of subjective Bayesian-style thinking. While this approach is gaining acceptance, progress might be hastened if scientists appreciate the shortcomings of traditional N-P null hypothesis significance testing.
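The a priori sample size calculation mentioned above can be sketched for the simplest case. This assumes a two-sided comparison of two independent group means with a standardized effect size and uses the normal approximation; the function name `n_per_group` and the default error rates are illustrative choices, not values from the article.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided two-sample comparison.

    effect_size is the standardized mean difference; alpha is the type I
    error rate and (1 - power) the type II error rate, both fixed in advance.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    z_beta = z.inv_cdf(power)
    return math.ceil(2.0 * ((z_alpha + z_beta) / effect_size) ** 2)


participants = n_per_group(0.5)  # medium standardized effect
```

Because it relies on the normal approximation, this sketch gives slightly smaller numbers than an exact calculation based on the t distribution.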
Bayesian Statistical Inference in Psychology: Comment on Trafimow (2003)
ERIC Educational Resources Information Center
Lee, Michael D.; Wagenmakers, Eric-Jan
2005-01-01
D. Trafimow presented an analysis of null hypothesis significance testing (NHST) using Bayes's theorem. Among other points, he concluded that NHST is logically invalid, but that logically valid Bayesian analyses are often not possible. The latter conclusion reflects a fundamental misunderstanding of the nature of Bayesian inference. This view…
Building Intuitions about Statistical Inference Based on Resampling
ERIC Educational Resources Information Center
Watson, Jane; Chance, Beth
2012-01-01
Formal inference, which makes theoretical assumptions about distributions and applies hypothesis testing procedures with null and alternative hypotheses, is notoriously difficult for tertiary students to master. The debate about whether this content should appear in Years 11 and 12 of the "Australian Curriculum: Mathematics" has gone on…
Some challenges with statistical inference in adaptive designs.
Hung, H M James; Wang, Sue-Jane; Yang, Peiling
2014-01-01
Adaptive designs have attracted a great deal of attention in clinical trial communities. The literature contains many statistical methods to deal with added statistical uncertainties concerning the adaptations. Increasingly encountered in regulatory applications are adaptive statistical information designs that allow modification of sample size or related statistical information and adaptive selection designs that allow selection of doses or patient populations during the course of a clinical trial. For adaptive statistical information designs, a few statistical testing methods are mathematically equivalent, as a number of articles have stipulated, but arguably there are large differences in their practical ramifications. We pinpoint some undesirable features of these methods in this work. For adaptive selection designs, the selection based on biomarker data for testing the correlated clinical endpoints may increase statistical uncertainty in terms of type I error probability, and most importantly the increased statistical uncertainty may be impossible to assess.
Statistical mechanics of complex neural systems and high dimensional data
NASA Astrophysics Data System (ADS)
Advani, Madhu; Lahiri, Subhaneil; Ganguli, Surya
2013-03-01
Recent experimental advances in neuroscience have opened new vistas into the immense complexity of neuronal networks. This proliferation of data challenges us on two parallel fronts. First, how can we form adequate theoretical frameworks for understanding how dynamical network processes cooperate across widely disparate spatiotemporal scales to solve important computational problems? Second, how can we extract meaningful models of neuronal systems from high dimensional datasets? To aid in these challenges, we give a pedagogical review of a collection of ideas and theoretical methods arising at the intersection of statistical physics, computer science and neurobiology. We introduce the interrelated replica and cavity methods, which originated in statistical physics as powerful ways to quantitatively analyze large highly heterogeneous systems of many interacting degrees of freedom. We also introduce the closely related notion of message passing in graphical models, which originated in computer science as a distributed algorithm capable of solving large inference and optimization problems involving many coupled variables. We then show how both the statistical physics and computer science perspectives can be applied in a wide diversity of contexts to problems arising in theoretical neuroscience and data analysis. Along the way we discuss spin glasses, learning theory, illusions of structure in noise, random matrices, dimensionality reduction and compressed sensing, all within the unified formalism of the replica method. Moreover, we review recent conceptual connections between message passing in graphical models, and neural computation and learning. Overall, these ideas illustrate how statistical physics and computer science might provide a lens through which we can uncover emergent computational functions buried deep within the dynamical complexities of neuronal networks.
Statistical inference for stochastic simulation models--theory and application.
Hartig, Florian; Calabrese, Justin M; Reineking, Björn; Wiegand, Thorsten; Huth, Andreas
2011-08-01
Statistical models are the traditional choice to test scientific theories when observations, processes or boundary conditions are subject to stochasticity. Many important systems in ecology and biology, however, are difficult to capture with statistical models. Stochastic simulation models offer an alternative, but they were hitherto associated with a major disadvantage: their likelihood functions can usually not be calculated explicitly, and thus it is difficult to couple them to well-established statistical theory such as maximum likelihood and Bayesian statistics. A number of new methods, among them Approximate Bayesian Computing and Pattern-Oriented Modelling, bypass this limitation. These methods share three main principles: aggregation of simulated and observed data via summary statistics, likelihood approximation based on the summary statistics, and efficient sampling. We discuss principles as well as advantages and caveats of these methods, and demonstrate their potential for integrating stochastic simulation models into a unified framework for statistical modelling.
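The three principles listed above can be seen in the simplest member of this family, rejection-based Approximate Bayesian Computation. The sketch below is a generic illustration with hypothetical function and parameter names; it is not the authors' implementation, and plain rejection is the least efficient of the sampling schemes such methods use.

```python
import random


def abc_rejection(observed, prior_draw, simulate, summary, tol, n_draws, seed=0):
    """Keep parameter draws whose simulated summary lies within tol of the observed one."""
    rng = random.Random(seed)
    s_obs = summary(observed)            # aggregate the observed data
    accepted = []
    for _ in range(n_draws):
        theta = prior_draw(rng)          # propose a parameter from the prior
        s_sim = summary(simulate(theta, rng))  # run the stochastic simulator
        if abs(s_sim - s_obs) <= tol:    # distance check replaces the likelihood
            accepted.append(theta)
    return accepted
```

A caller supplies the simulator, a prior sampler and a scalar summary statistic; the accepted draws then approximate the posterior, with the approximation tightening as `tol` shrinks at the cost of fewer acceptances.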
Statistical inference and sensitivity to sampling in 11-month-old infants.
Xu, Fei; Denison, Stephanie
2009-07-01
Research on initial conceptual knowledge and research on early statistical learning mechanisms have been, for the most part, two separate enterprises. We report a study with 11-month-old infants investigating whether they are sensitive to sampling conditions and whether they can integrate intentional information in a statistical inference task. Previous studies found that infants were able to make inferences from samples to populations, and vice versa [Xu, F., & Garcia, V. (2008). Intuitive statistics by 8-month-old infants. Proceedings of the National Academy of Sciences of the United States of America, 105, 5012-5015]. We found that when employing this statistical inference mechanism, infants are sensitive to whether a sample was randomly drawn from a population or not, and they take into account intentional information (e.g., explicitly expressed preference, visual access) when computing the relationship between samples and populations. Our results suggest that domain-specific knowledge is integrated with statistical inference mechanisms early in development.
PROBLEMS OF STATISTICAL INFERENCE FOR BIRTH AND DEATH QUEUEING MODELS
A large sample theory is presented for birth and death queueing processes which are ergodic and metrically transitive. The theory is applied to make...inferences about how arrival and service rates vary with the number in the system. Likelihood ratio tests and maximum likelihood estimators are...derived for simple models which describe this variation. Composite hypotheses such as that the arrival rate does not vary with the number in the system are
For and Against Methodologies: Some Perspectives on Recent Causal and Statistical Inference Debates.
Greenland, Sander
2017-01-01
I present an overview of two methods controversies that are central to analysis and inference: That surrounding causal modeling as reflected in the "causal inference" movement, and that surrounding null bias in statistical methods as applied to causal questions. Human factors have expanded what might otherwise have been narrow technical discussions into broad philosophical debates. There seem to be misconceptions about the requirements and capabilities of formal methods, especially in notions that certain assumptions or models (such as potential-outcome models) are necessary or sufficient for valid inference. I argue that, once these misconceptions are removed, most elements of the opposing views can be reconciled. The chief problem of causal inference then becomes one of how to teach sound use of formal methods (such as causal modeling, statistical inference, and sensitivity analysis), and how to apply them without generating the overconfidence and misinterpretations that have ruined so many statistical practices.
The Role of the Sampling Distribution in Understanding Statistical Inference
ERIC Educational Resources Information Center
Lipson, Kay
2003-01-01
Many statistics educators believe that few students develop the level of conceptual understanding essential for them to apply correctly the statistical techniques at their disposal and to interpret their outcomes appropriately. It is also commonly believed that the sampling distribution plays an important role in developing this understanding.…
ERIC Educational Resources Information Center
Larwin, Karen H.; Larwin, David A.
2011-01-01
Bootstrapping methods and random distribution methods are increasingly recommended as better approaches for teaching students about statistical inference in introductory-level statistics courses. The authors examined the effect of teaching undergraduate business statistics students using random distribution and bootstrapping simulations. It is the…
Inference Based on Simple Step Statistics for the Location Model.
1981-07-01
function. Let T_{N,k}(θ) = Σ a_k(i) V_i(θ). Then T_{N,k} is called the k-step statistic. Noether (1973) studied the 1-step statistic with particular emphasis on...opposed to the sign statistic. These latter two comparisons were first discussed by Noether (1973) in a somewhat different setting. Notice that the...obtained by Noether (1973). If k = 3, we seek the (C + 1)st and (2N - b1 - b2 - C)th ordered Walsh averages in D. The algorithm of Section 3 modified to
Statistical Inferences from Formaldehyde Dna-Protein Cross-Link Data
Physiologically-based pharmacokinetic (PBPK) modeling has reached considerable sophistication in its application in the pharmacological and environmental health areas. Yet, mature methodologies for making statistical inferences have not been routinely incorporated in these applic...
Statistical Inference Models for Image Datasets with Systematic Variations.
Kim, Won Hwa; Bendlin, Barbara B; Chung, Moo K; Johnson, Sterling C; Singh, Vikas
2015-06-01
Statistical analysis of longitudinal or cross sectional brain imaging data to identify effects of neurodegenerative diseases is a fundamental task in various studies in neuroscience. However, when there are systematic variations in the images due to parameter changes such as changes in the scanner protocol, hardware changes, or when combining data from multi-site studies, the statistical analysis becomes problematic. Motivated by this scenario, the goal of this paper is to develop a unified statistical solution to the problem of systematic variations in statistical image analysis. Based in part on recent literature in harmonic analysis on diffusion maps, we propose an algorithm which compares operators that are resilient to the systematic variations. These operators are derived from the empirical measurements of the image data and provide an efficient surrogate to capturing the actual changes across images. We also establish a connection between our method and the design of wavelets in non-Euclidean space. To evaluate the proposed ideas, we present various experimental results on detecting changes in simulations as well as show how the method offers improved statistical power in the analysis of real longitudinal PIB-PET imaging data acquired from participants at risk for Alzheimer's disease (AD).
Statistical Inference Models for Image Datasets with Systematic Variations
Kim, Won Hwa; Bendlin, Barbara B.; Chung, Moo K.; Johnson, Sterling C.; Singh, Vikas
2016-01-01
Statistical analysis of longitudinal or cross sectional brain imaging data to identify effects of neurodegenerative diseases is a fundamental task in various studies in neuroscience. However, when there are systematic variations in the images due to parameter changes such as changes in the scanner protocol, hardware changes, or when combining data from multi-site studies, the statistical analysis becomes problematic. Motivated by this scenario, the goal of this paper is to develop a unified statistical solution to the problem of systematic variations in statistical image analysis. Based in part on recent literature in harmonic analysis on diffusion maps, we propose an algorithm which compares operators that are resilient to the systematic variations. These operators are derived from the empirical measurements of the image data and provide an efficient surrogate to capturing the actual changes across images. We also establish a connection between our method and the design of wavelets in non-Euclidean space. To evaluate the proposed ideas, we present various experimental results on detecting changes in simulations as well as show how the method offers improved statistical power in the analysis of real longitudinal PIB-PET imaging data acquired from participants at risk for Alzheimer’s disease (AD). PMID:26989336
Statistical inference in behavior analysis: Friend or foe?
Baron, Alan
1999-01-01
Behavior analysts are undecided about the proper role to be played by inferential statistics in behavioral research. The traditional view, as expressed in Sidman's Tactics of Scientific Research (1960), was that inferential statistics has no place within a science that focuses on the steady-state behavior of individual organisms. Despite this admonition, there have been steady inroads of statistical techniques into behavior analysis since then, as evidenced by publications in the Journal of the Experimental Analysis of Behavior. The issues raised by these developments were considered at a panel held at the 24th annual convention of the Association for Behavior Analysis, Orlando, Florida (May, 1998). The proceedings are reported in this and the following articles. PMID:22478323
Contrasting Diversity Values: Statistical Inferences Based on Overlapping Confidence Intervals
MacGregor-Fors, Ian; Payton, Mark E.
2013-01-01
Ecologists often contrast diversity (species richness and abundances) using tests for comparing means or indices. However, many popular software applications do not support performing standard inferential statistics for estimates of species richness and/or density. In this study we simulated the behavior of asymmetric log-normal confidence intervals and determined an interval level that mimics statistical tests with P(α) = 0.05 when confidence intervals from two distributions do not overlap. Our results show that 84% confidence intervals robustly mimic 0.05 statistical tests for asymmetric confidence intervals, as has been demonstrated for symmetric ones in the past. Finally, we provide detailed user-guides for calculating 84% confidence intervals in two of the most robust and highly-used freeware related to diversity measurements for wildlife (i.e., EstimateS, Distance). PMID:23437239
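The 84% figure can be checked with a back-of-the-envelope calculation in the simpler symmetric case that the abstract refers to as previously demonstrated. The sketch below assumes two independent, normally distributed estimates with equal standard errors; it does not reproduce the asymmetric log-normal simulations of the study, and the variable names are chosen for this example only.

```python
import math


def normal_cdf(z):
    # Standard normal cumulative distribution function via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


z_test = 1.959964  # two-sided 5% critical value of the standard normal

# A z-test on the difference d of two estimates with common standard error se
# rejects when |d| > z_test * sqrt(2) * se.
# Two intervals of half-width c * se fail to overlap when |d| > 2 * c * se.
# Equating the two thresholds gives the multiplier c.
c = z_test * math.sqrt(2.0) / 2.0

# Confidence level of an interval with half-width c * se.
level = 2.0 * normal_cdf(c) - 1.0
print(round(c, 3), round(level, 3))
```

Under these assumptions the matching interval level comes out at roughly 83 to 84%, in line with the value reported in the abstract; unequal standard errors would shift it.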
A Test by Any Other Name: P Values, Bayes Factors, and Statistical Inference.
Stern, Hal S
2016-01-01
Procedures used for statistical inference are receiving increased scrutiny as the scientific community studies the factors associated with ensuring reproducible research. This note addresses recent negative attention directed at p values, the relationship of confidence intervals and tests, and the role of Bayesian inference and Bayes factors, with an eye toward better understanding these different strategies for statistical inference. We argue that researchers and data analysts too often resort to binary decisions (e.g., whether to reject or accept the null hypothesis) in settings where this may not be required.
Wang, Xiaoxiao; Wang, Huan; Huang, Jinfeng; Zhou, Yifeng; Tzvetanov, Tzvetomir
2017-01-01
The contrast sensitivity function that spans the two dimensions of contrast and spatial frequency is crucial in predicting functional vision both in research and clinical applications. In this study, the use of Bayesian inference was proposed to determine the parameters of the two-dimensional contrast sensitivity function. Two-dimensional Bayesian inference was extensively simulated in comparison to classical one-dimensional measures. Its performance on two-dimensional data gathered with different sampling algorithms was also investigated. The results showed that the two-dimensional Bayesian inference method significantly improved the accuracy and precision of the contrast sensitivity function, as compared to the more common one-dimensional estimates. In addition, applying two-dimensional Bayesian estimation to the final data set showed similar levels of reliability and efficiency across widely disparate and established sampling methods (from classical one-dimensional sampling, such as Ψ or staircase, to more novel multi-dimensional sampling methods, such as quick contrast sensitivity function and Fisher information gain). Furthermore, the improvements observed following the application of Bayesian inference were maintained even when the prior poorly matched the subject's contrast sensitivity function. Simulation results were confirmed in a psychophysical experiment. The results indicated that two-dimensional Bayesian inference of contrast sensitivity function data provides similar estimates across a wide range of sampling methods. The present study likely has implications for the measurement of contrast sensitivity function in various settings (including research and clinical settings) and would facilitate the comparison of existing data from previous studies. PMID:28119563
Wang, Xiaoxiao; Wang, Huan; Huang, Jinfeng; Zhou, Yifeng; Tzvetanov, Tzvetomir
2016-01-01
The contrast sensitivity function that spans the two dimensions of contrast and spatial frequency is crucial in predicting functional vision both in research and clinical applications. In this study, the use of Bayesian inference was proposed to determine the parameters of the two-dimensional contrast sensitivity function. Two-dimensional Bayesian inference was extensively simulated in comparison to classical one-dimensional measures. Its performance on two-dimensional data gathered with different sampling algorithms was also investigated. The results showed that the two-dimensional Bayesian inference method significantly improved the accuracy and precision of the contrast sensitivity function, as compared to the more common one-dimensional estimates. In addition, applying two-dimensional Bayesian estimation to the final data set showed similar levels of reliability and efficiency across widely disparate and established sampling methods (from classical one-dimensional sampling, such as Ψ or staircase, to more novel multi-dimensional sampling methods, such as quick contrast sensitivity function and Fisher information gain). Furthermore, the improvements observed following the application of Bayesian inference were maintained even when the prior poorly matched the subject's contrast sensitivity function. Simulation results were confirmed in a psychophysical experiment. The results indicated that two-dimensional Bayesian inference of contrast sensitivity function data provides similar estimates across a wide range of sampling methods. The present study likely has implications for the measurement of contrast sensitivity function in various settings (including research and clinical settings) and would facilitate the comparison of existing data from previous studies.
Statistical inference and forensic evidence: evaluating a bullet lead match.
Kaasa, Suzanne O; Peterson, Tiamoyo; Morris, Erin K; Thompson, William C
2007-10-01
This experiment tested the ability of undergraduate mock jurors (N=295) to draw appropriate conclusions from statistical data on the diagnostic value of forensic evidence. Jurors read a summary of a homicide trial in which the key evidence was a bullet lead "match" that was either highly diagnostic, non-diagnostic, or of unknown diagnostic value. There was also a control condition in which the forensic "match" was not presented. The results indicate that jurors as a group used the statistics appropriately to distinguish diagnostic from non-diagnostic forensic evidence, giving considerable weight to the former and little or no weight to the latter. However, this effect was attributable to responses of a subset of jurors who expressed confidence in their ability to use statistical data. Jurors who lacked confidence in their statistical ability failed to distinguish highly diagnostic from non-diagnostic forensic evidence; they gave no weight to the forensic evidence regardless of its diagnostic value. Confident jurors also gave more weight to evidence of unknown diagnostic value. Theoretical and legal implications are discussed.
Technology Focus: Using Technology to Explore Statistical Inference
ERIC Educational Resources Information Center
Garofalo, Joe; Juersivich, Nicole
2007-01-01
There is much research that documents what many teachers know, that students struggle with many concepts in probability and statistics. This article presents two sample activities the authors use to help preservice teachers develop ideas about how they can use technology to promote their students' ability to understand mathematics and connect…
Statistical Inference and Simulation with StatKey
ERIC Educational Resources Information Center
Quinn, Anne
2016-01-01
While looking for an inexpensive technology package to help students in statistics classes, the author found StatKey, a free Web-based app. Not only is StatKey useful for students' year-end projects, but it is also valuable for helping students learn fundamental content such as the central limit theorem. Using StatKey, students can engage in…
Trans-dimensional Bayesian inference for large sequential data sets
NASA Astrophysics Data System (ADS)
Mandolesi, E.; Dettmer, J.; Dosso, S. E.; Holland, C. W.
2015-12-01
This work develops a sequential Monte Carlo method to infer seismic parameters of layered seabeds from large sequential reflection-coefficient data sets. The approach provides parameter estimates and uncertainties along survey tracks with the goal to aid in the detection of unexploded ordnance in shallow water. The sequential data are acquired by a moving platform with source and receiver array towed close to the seabed. This geometry requires consideration of spherical reflection coefficients, computed efficiently by massively parallel implementation of the Sommerfeld integral via Levin integration on a graphics processing unit. The seabed is parametrized with a trans-dimensional model to account for changes in the environment (i.e. changes in layering) along the track. The method combines advanced Markov chain Monte Carlo methods (annealing) with particle filtering (resampling). Since data from closely-spaced source transmissions (pings) often sample similar environments, the solution from one ping can be utilized to efficiently estimate the posterior for data from subsequent pings. Since reflection-coefficient data are highly informative, the likelihood function can be extremely peaked, resulting in little overlap between posteriors of adjacent pings. This is addressed by adding bridging distributions (via annealed importance sampling) between pings for more efficient transitions. The approach assumes the environment to be changing slowly enough to justify the local 1D parametrization. However, bridging allows rapid changes between pings to be addressed and we demonstrate the method to be stable in such situations. Results are in terms of trans-D parameter estimates and uncertainties along the track. The algorithm is examined for realistic simulated data along a track and applied to a dataset collected by an autonomous underwater vehicle on the Malta Plateau, Mediterranean Sea. [Work supported by the SERDP, DoD.]
Inferring Master Painters' Esthetic Biases from the Statistics of Portraits.
Aleem, Hassan; Correa-Herran, Ivan; Grzywacz, Norberto M
2017-01-01
The Processing Fluency Theory posits that the ease of sensory information processing in the brain facilitates esthetic pleasure. Accordingly, the theory would predict that master painters should display biases toward visual properties such as symmetry, balance, and moderate complexity. Have these biases been occurring and if so, have painters been optimizing these properties (fluency variables)? Here, we address these questions with statistics of portrait paintings from the Early Renaissance period. To do this, we first developed different computational measures for each of the aforementioned fluency variables. Then, we measured their statistics in 153 portraits from 26 master painters, in 27 photographs of people in three controlled poses, and in 38 quickly snapped photographs of individual persons. A statistical comparison between Early Renaissance portraits and quickly snapped photographs revealed that painters showed a bias toward balance, symmetry, and moderate complexity. However, a comparison between portraits and controlled-pose photographs showed that painters did not optimize each of these properties. Instead, different painters presented biases toward different, narrow ranges of fluency variables. Further analysis suggested that the painters' individuality stemmed in part from having to resolve the tension between complexity vs. symmetry and balance. We additionally found that constraints on the use of different painting materials by distinct painters modulated these fluency variables systematically. In conclusion, the Processing Fluency Theory of Esthetic Pleasure would need expansion if we were to apply it to the history of visual art since it cannot explain the lack of optimization of each fluency variable. To expand the theory, we propose the existence of a Neuroesthetic Space, which encompasses the possible values that each of the fluency variables can reach in any given art period. We discuss the neural mechanisms of this Space and propose that it
ERIC Educational Resources Information Center
Denbleyker, John Nickolas
2012-01-01
The shortcomings of the proportion above cut (PAC) statistic used so prominently in the educational landscape renders it a very problematic measure for making correct inferences with student test data. The limitations of PAC-based statistics are more pronounced with cross-test comparisons due to their dependency on cut-score locations. A better…
Statistical Inference and Sensitivity to Sampling in 11-Month-Old Infants
ERIC Educational Resources Information Center
Xu, Fei; Denison, Stephanie
2009-01-01
Research on initial conceptual knowledge and research on early statistical learning mechanisms have been, for the most part, two separate enterprises. We report a study with 11-month-old infants investigating whether they are sensitive to sampling conditions and whether they can integrate intentional information in a statistical inference task.…
Image analysis and statistical inference in neuroimaging with R.
Tabelow, K; Clayden, J D; de Micheaux, P Lafaye; Polzehl, J; Schmid, V J; Whitcher, B
2011-04-15
R is a language and environment for statistical computing and graphics. It can be considered an alternative implementation of the S language developed in the 1970s and 1980s for data analysis and graphics (Becker and Chambers, 1984; Becker et al., 1988). The R language is part of the GNU project and offers versions that compile and run on almost every major operating system currently available. We highlight several R packages built specifically for the analysis of neuroimaging data in the context of functional MRI, diffusion tensor imaging, and dynamic contrast-enhanced MRI. We review their methodology and give an overview of their capabilities for neuroimaging. In addition we summarize some of the current activities in the area of neuroimaging software development in R.
A Test By Any Other Name: P-values, Bayes Factors and Statistical Inference
Stern, Hal S.
2016-01-01
The exchange between Hoijtink, van Kooten and Hulsker (in press) (HKH) and Morey, Wagenmakers, and Rouder (in press) (MWR) in this issue is focused on the use of Bayes factors for statistical inference but raises a number of more general questions about Bayesian and frequentist approaches to inference. This note addresses recent negative attention directed at p-values, the relationship of confidence intervals and tests, and the role of Bayesian inference and Bayes factors, with an eye towards better understanding these different strategies for statistical inference. We argue that researchers and data analysts too often resort to binary decisions (e.g., whether to reject or accept the null hypothesis) in settings where this may not be required. PMID:26881954
Difference to Inference: teaching logical and statistical reasoning through on-line interactivity.
Malloy, T E
2001-05-01
Difference to Inference is an on-line JAVA program that simulates theory testing and falsification through research design and data collection in a game format. The program, based on cognitive and epistemological principles, is designed to support learning of the thinking skills underlying deductive and inductive logic and statistical reasoning. Difference to Inference has database connectivity so that game scores can be counted as part of course grades.
Statistical inference from capture data on closed animal populations
Otis, David L.; Burnham, Kenneth P.; White, Gary C.; Anderson, David R.
1978-01-01
The estimation of animal abundance is an important problem in both the theoretical and applied biological sciences. Serious work to develop estimation methods began during the 1950s, with a few attempts before that time. The literature on estimation methods has increased tremendously during the past 25 years (Cormack 1968, Seber 1973). However, in large part, the problem remains unsolved. Past efforts toward comprehensive and systematic estimation of density (D) or population size (N) have been inadequate, in general. While more than 200 papers have been published on the subject, one is generally left without a unified approach to the estimation of abundance of an animal population. This situation is unfortunate because a number of pressing research problems require such information. In addition, a wide array of environmental assessment studies and biological inventory programs require the estimation of animal abundance. These needs have been further emphasized by the requirement for the preparation of Environmental Impact Statements imposed by the National Environmental Policy Act in 1970. This publication treats inference procedures for certain types of capture data on closed animal populations. This includes multiple capture-recapture studies (variously called capture-mark-recapture, mark-recapture, or tag-recapture studies) involving livetrapping techniques and removal studies involving kill traps or at least temporary removal of captured individuals during the study. Animals do not necessarily need to be physically trapped; visual sightings of marked animals and electrofishing studies also produce data suitable for the methods described in this monograph. To provide a frame of reference for what follows, we give an example of a capture-recapture experiment to estimate population size of small animals using live traps. The general field experiment is similar for all capture-recapture studies (a removal study is, of course, slightly different). A typical
Statistical inference for nanopore sequencing with a biased random walk model.
Emmett, Kevin J; Rosenstein, Jacob K; van de Meent, Jan-Willem; Shepard, Ken L; Wiggins, Chris H
2015-04-21
Nanopore sequencing promises long read-lengths and single-molecule resolution, but the stochastic motion of the DNA molecule inside the pore is, as of this writing, a barrier to high accuracy reads. We develop a method of statistical inference that explicitly accounts for this error, and demonstrate that high accuracy (>99%) sequence inference is feasible even under highly diffusive motion by using a hidden Markov model to jointly analyze multiple stochastic reads. Using this model, we place bounds on achievable inference accuracy under a range of experimental parameters.
Social Inferences from Faces: Ambient Images Generate a Three-Dimensional Model
ERIC Educational Resources Information Center
Sutherland, Clare A. M.; Oldmeadow, Julian A.; Santos, Isabel M.; Towler, John; Burt, D. Michael; Young, Andrew W.
2013-01-01
Three experiments are presented that investigate the two-dimensional valence/trustworthiness by dominance model of social inferences from faces (Oosterhof & Todorov, 2008). Experiment 1 used image averaging and morphing techniques to demonstrate that consistent facial cues subserve a range of social inferences, even in a highly variable sample of…
Schumacher, Johannes; Wunderle, Thomas; Fries, Pascal; Jäkel, Frank; Pipa, Gordon
2015-08-01
In neuroscience, data are typically generated from neural network activity. The resulting time series represent measurements from spatially distributed subsystems with complex interactions, weakly coupled to a high-dimensional global system. We present a statistical framework to estimate the direction of information flow and its delay in measurements from systems of this type. Informed by differential topology, gaussian process regression is employed to reconstruct measurements of putative driving systems from measurements of the driven systems. These reconstructions serve to estimate the delay of the interaction by means of an analytical criterion developed for this purpose. The model accounts for a range of possible sources of uncertainty, including temporally evolving intrinsic noise, while assuming complex nonlinear dependencies. Furthermore, we show that if information flow is delayed, this approach also allows for inference in strong coupling scenarios of systems exhibiting synchronization phenomena. The validity of the method is demonstrated with a variety of delay-coupled chaotic oscillators. In addition, we show that these results seamlessly transfer to local field potentials in cat visual cortex.
Ganju, Jitendra; Yu, Xinxin; Ma, Guoguang Julie
2013-01-01
Formal inference in randomized clinical trials is based on controlling the type I error rate associated with a single pre-specified statistic. The deficiency of using just one method of analysis is that it depends on assumptions that may not be met. For robust inference, we propose pre-specifying multiple test statistics and relying on the minimum p-value for testing the null hypothesis of no treatment effect. The null hypothesis associated with the various test statistics is that the treatment groups are indistinguishable. The critical value for hypothesis testing comes from permutation distributions. Rejection of the null hypothesis when the smallest p-value is less than the critical value controls the type I error rate at its designated value. Even if one of the candidate test statistics has low power, the adverse effect on the power of the minimum p-value statistic is small. Its use is illustrated with examples. We conclude that it is better to rely on the minimum p-value rather than a single statistic, particularly when that single statistic is the logrank test, because of the cost and complexity of many survival trials.
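The procedure described in the abstract above can be sketched as a small permutation test. This is an illustrative sketch only: the two candidate statistics (difference in means and difference in medians), the simulated data, and every function name are assumptions for the example, not the authors' choices. Instead of a critical value, the sketch reports an adjusted p-value, which is an equivalent way of comparing the observed minimum p-value against its permutation distribution.

```python
import bisect
import random
import statistics


def mean_diff(a, b):
    return abs(statistics.fmean(a) - statistics.fmean(b))


def median_diff(a, b):
    return abs(statistics.median(a) - statistics.median(b))


def min_p_test(x, y, stats, n_perm, rng):
    pooled = list(x) + list(y)
    n_x = len(x)
    # Row 0 holds the observed statistics; later rows hold permuted ones.
    rows = [[s(x, y) for s in stats]]
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        rows.append([s(pooled[:n_x], pooled[n_x:]) for s in stats])
    total = len(rows)
    # Convert every statistic in every row to a permutation p-value.
    p_rows = [[0.0] * len(stats) for _ in rows]
    for j in range(len(stats)):
        ordered = sorted(row[j] for row in rows)
        for i, row in enumerate(rows):
            n_at_least = total - bisect.bisect_left(ordered, row[j])
            p_rows[i][j] = n_at_least / total
    min_ps = [min(p) for p in p_rows]
    # Adjusted p-value: share of rows whose minimum p-value is as small
    # as the observed minimum p-value.
    adjusted = sum(m <= min_ps[0] for m in min_ps) / total
    return min_ps[0], adjusted


rng = random.Random(1)
control = [rng.gauss(0.0, 1.0) for _ in range(30)]
treated = [rng.gauss(1.5, 1.0) for _ in range(30)]
observed_min_p, adjusted_p = min_p_test(
    control, treated, [mean_diff, median_diff], 999, rng
)
print(observed_min_p, adjusted_p)
```

Because every permuted data set is scored against the same permutation distribution, the adjusted p-value is never smaller than the observed minimum p-value, which is how the multiplicity of statistics is accounted for in this sketch.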
Assessing colour-dependent occupation statistics inferred from galaxy group catalogues
NASA Astrophysics Data System (ADS)
Campbell, Duncan; van den Bosch, Frank C.; Hearin, Andrew; Padmanabhan, Nikhil; Berlind, Andreas; Mo, H. J.; Tinker, Jeremy; Yang, Xiaohu
2015-09-01
We investigate the ability of current implementations of galaxy group finders to recover colour-dependent halo occupation statistics. To test the fidelity of group catalogue inferred statistics, we run three different group finders used in the literature over a mock that includes galaxy colours in a realistic manner. Overall, the resulting mock group catalogues are remarkably similar, and most colour-dependent statistics are recovered with reasonable accuracy. However, it is also clear that certain systematic errors arise as a consequence of correlated errors in group membership determination, central/satellite designation, and halo mass assignment. We introduce a new statistic, the halo transition probability (HTP), which captures the combined impact of all these errors. As a rule of thumb, errors tend to equalize the properties of distinct galaxy populations (i.e. red versus blue galaxies or centrals versus satellites), and to result in inferred occupation statistics that are more accurate for red galaxies than for blue galaxies. A statistic that is particularly poorly recovered from the group catalogues is the red fraction of central galaxies as a function of halo mass. Group finders do a good job in recovering galactic conformity, but also have a tendency to introduce weak conformity when none is present. We conclude that proper inference of colour-dependent statistics from group catalogues is best achieved using forward modelling (i.e. running group finders over mock data) or by implementing a correction scheme based on the HTP, as long as the latter is not too strongly model dependent.
Protein and gene model inference based on statistical modeling in k-partite graphs.
Gerster, Sarah; Qeli, Ermir; Ahrens, Christian H; Bühlmann, Peter
2010-07-06
One of the major goals of proteomics is the comprehensive and accurate description of a proteome. Shotgun proteomics, the method of choice for the analysis of complex protein mixtures, requires that experimentally observed peptides are mapped back to the proteins they were derived from. This process is also known as protein inference. We present Markovian Inference of Proteins and Gene Models (MIPGEM), a statistical model based on clearly stated assumptions to address the problem of protein and gene model inference for shotgun proteomics data. In particular, we are dealing with dependencies among peptides and proteins using a Markovian assumption on k-partite graphs. We are also addressing the problems of shared peptides and ambiguous proteins by scoring the encoding gene models. Empirical results on two control datasets with synthetic mixtures of proteins and on complex protein samples of Saccharomyces cerevisiae, Drosophila melanogaster, and Arabidopsis thaliana suggest that the results with MIPGEM are competitive with existing tools for protein inference.
Bayesian Inference of High-Dimensional Dynamical Ocean Models
NASA Astrophysics Data System (ADS)
Lin, J.; Lermusiaux, P. F. J.; Lolla, S. V. T.; Gupta, A.; Haley, P. J., Jr.
2015-12-01
This presentation addresses a holistic set of challenges in high-dimensional ocean Bayesian nonlinear estimation: i) predict the probability distribution functions (pdfs) of large nonlinear dynamical systems using stochastic partial differential equations (PDEs); ii) assimilate data using Bayes' law with these pdfs; iii) predict the future data that optimally reduce uncertainties; and iv) rank the known and learn the new model formulations themselves. Overall, we allow the joint inference of the state, equations, geometry, boundary conditions and initial conditions of dynamical models. Examples are provided for time-dependent fluid and ocean flows, including cavity, double-gyre and Strait flows with jets and eddies. The Bayesian model inference, based on limited observations, is illustrated first by the estimation of obstacle shapes and positions in fluid flows. Next, the Bayesian inference of biogeochemical reaction equations and of their states and parameters is presented, illustrating how PDE-based machine learning can rigorously guide the selection and discovery of complex ecosystem models. Finally, the inference of multiscale bottom gravity current dynamics is illustrated, motivated in part by classic overflows and dense water formation sites and their relevance to climate monitoring and dynamics. This is joint work with our MSEAS group at MIT.
Young Children's Use of Statistical Sampling Evidence to Infer the Subjectivity of Preferences
ERIC Educational Resources Information Center
Ma, Lili; Xu, Fei
2011-01-01
A crucial task in social interaction involves understanding subjective mental states. Here we report two experiments with toddlers exploring whether they can use statistical evidence to infer the subjective nature of preferences. We found that 2-year-olds were likely to interpret another person's nonrandom sampling behavior as a cue for a…
NASA Astrophysics Data System (ADS)
Bakker, Arthur; Ben-Zvi, Dani; Makar, Katie
2017-01-01
To understand how statistical and other types of reasoning are coordinated with actions to reduce uncertainty, we conducted a case study in vocational education that involved statistical hypothesis testing. We analyzed an intern's research project in a hospital laboratory in which reducing uncertainties was crucial to make a valid statistical inference. In his project, the intern, Sam, investigated whether patients' blood could be sent through pneumatic post without influencing the measurement of particular blood components. We asked, in the process of making a statistical inference, how are reasons and actions coordinated to reduce uncertainty? For the analysis, we used the semantic theory of inferentialism, specifically, the concept of webs of reasons and actions—complexes of interconnected reasons for facts and actions; these reasons include premises and conclusions, inferential relations, implications, motives for action, and utility of tools for specific purposes in a particular context. Analysis of interviews with Sam, his supervisor and teacher as well as video data of Sam in the classroom showed that many of Sam's actions aimed to reduce variability, rule out errors, and thus reduce uncertainties so as to arrive at a valid inference. Interestingly, the decisive factor was not the outcome of a t test but of the reference change value, a clinical chemical measure of analytic and biological variability. With insights from this case study, we expect that students can be better supported in connecting statistics with context and in dealing with uncertainty.
Elucidating the foundations of statistical inference with 2 x 2 tables.
Choi, Leena; Blume, Jeffrey D; Dupont, William D
2015-01-01
To many, the foundations of statistical inference are cryptic and irrelevant to routine statistical practice. The analysis of 2 x 2 contingency tables, omnipresent in the scientific literature, is a case in point. Fisher's exact test is routinely used even though it has been fraught with controversy for over 70 years. The problem, not widely acknowledged, is that several different p-values can be associated with a single table, making scientific inference inconsistent. The root cause of this controversy lies in the table's origins and the manner in which nuisance parameters are eliminated. However, fundamental statistical principles (e.g., sufficiency, ancillarity, conditionality, and likelihood) can shed light on the controversy and guide our approach in using this test. In this paper, we use these fundamental principles to show how much information is lost when the table's origins are ignored and when various approaches are used to eliminate unknown nuisance parameters. We present novel likelihood contours to aid in the visualization of information loss and show that the information loss is often virtually non-existent. We find that problems arising from the discreteness of the sample space are exacerbated by p-value-based inference. Accordingly, methods that are less sensitive to this discreteness - likelihood ratios, posterior probabilities and mid-p-values - lead to more consistent inferences.
Elucidating the Foundations of Statistical Inference with 2 x 2 Tables
Choi, Leena; Blume, Jeffrey D.; Dupont, William D.
2015-01-01
To many, the foundations of statistical inference are cryptic and irrelevant to routine statistical practice. The analysis of 2 x 2 contingency tables, omnipresent in the scientific literature, is a case in point. Fisher's exact test is routinely used even though it has been fraught with controversy for over 70 years. The problem, not widely acknowledged, is that several different p-values can be associated with a single table, making scientific inference inconsistent. The root cause of this controversy lies in the table's origins and the manner in which nuisance parameters are eliminated. However, fundamental statistical principles (e.g., sufficiency, ancillarity, conditionality, and likelihood) can shed light on the controversy and guide our approach in using this test. In this paper, we use these fundamental principles to show how much information is lost when the table's origins are ignored and when various approaches are used to eliminate unknown nuisance parameters. We present novel likelihood contours to aid in the visualization of information loss and show that the information loss is often virtually non-existent. We find that problems arising from the discreteness of the sample space are exacerbated by p-value-based inference. Accordingly, methods that are less sensitive to this discreteness - likelihood ratios, posterior probabilities and mid-p-values - lead to more consistent inferences. PMID:25849515
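The claim that several different p-values can be attached to a single 2 x 2 table can be made concrete with a small sketch. The function `fisher_p_values` and the example table are illustrative assumptions, not taken from the paper. Conditioning on both margins, the top-left cell follows a hypergeometric distribution, and the code reports four common summaries of it: the one-sided p-value, the doubled one-sided p-value, the two-sided p-value that sums all tables no more probable than the observed one, and the one-sided mid-p-value. Exact fractions are used so that no rounding affects the comparison.

```python
from fractions import Fraction
from math import comb


def fisher_p_values(a, b, c, d):
    """Several conditional p-values for the 2 x 2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lowest = max(0, row1 + col1 - n)
    highest = min(row1, col1)
    total = comb(n, row1)

    # Hypergeometric probability of each possible top-left cell count.
    pmf = {
        k: Fraction(comb(col1, k) * comb(n - col1, row1 - k), total)
        for k in range(lowest, highest + 1)
    }
    p_observed = pmf[a]

    upper_tail = sum(p for k, p in pmf.items() if k >= a)
    lower_tail = sum(p for k, p in pmf.items() if k <= a)
    one_sided = min(upper_tail, lower_tail)

    return {
        "one_sided": one_sided,
        "two_sided_doubled": min(Fraction(1), 2 * one_sided),
        "two_sided_probability": sum(p for p in pmf.values() if p <= p_observed),
        # Mid-p counts only half of the observed table's probability.
        "mid_p_one_sided": one_sided - p_observed / 2,
    }


# Hypothetical table with unequal margins, so the two-sided rules disagree.
for name, value in fisher_p_values(3, 0, 1, 6).items():
    print(f"{name}: {value} ({float(value):.4f})")
```

With unequal margins the conditional distribution is asymmetric, which is why the doubling rule and the probability-ordering rule can give different two-sided answers for the same data.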
Inferring the connectivity of coupled oscillators from time-series statistical similarity analysis
Tirabassi, Giulio; Sevilla-Escoboza, Ricardo; Buldú, Javier M.; Masoller, Cristina
2015-01-01
A system composed of interacting dynamical elements can be represented by a network, where the nodes represent the elements that constitute the system, and the links account for their interactions, which arise due to a variety of mechanisms, and which are often unknown. A popular method for inferring the system connectivity (i.e., the set of links among pairs of nodes) is by performing a statistical similarity analysis of the time-series collected from the dynamics of the nodes. Here, by considering two systems of coupled oscillators (Kuramoto phase oscillators and Rössler chaotic electronic oscillators) with known and controllable coupling conditions, we aim at testing the performance of this inference method, by using linear and nonlinear statistical similarity measures. We find that, under adequate conditions, the network links can be perfectly inferred, i.e., no mistakes are made regarding the presence or absence of links. These conditions for perfect inference require: i) an appropriate choice of the observed variable to be analysed, ii) an appropriate interaction strength, and iii) an adequate thresholding of the similarity matrix. For the dynamical units considered here we find that the linear statistical similarity measure performs, in general, better than the nonlinear ones. PMID:26042395
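The similarity-thresholding step described in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions: the absolute Pearson correlation stands in for the linear similarity measure, the function name `infer_links` and the threshold value are hypothetical, and the toy signals are simple noisy copies rather than Kuramoto or Rössler oscillators.

```python
import numpy as np


def infer_links(series, threshold):
    """Infer undirected links by thresholding a similarity matrix.

    series: array of shape (n_nodes, n_samples), one time series per node.
    Returns a boolean adjacency matrix with a False diagonal.
    """
    similarity = np.abs(np.corrcoef(series))
    np.fill_diagonal(similarity, 0.0)  # a node is never linked to itself
    return similarity > threshold


# Toy data (assumption): node 1 follows node 0 closely, node 2 is independent.
rng = np.random.default_rng(0)
node0 = rng.standard_normal(2000)
node1 = node0 + 0.1 * rng.standard_normal(2000)
node2 = rng.standard_normal(2000)
links = infer_links(np.vstack([node0, node1, node2]), threshold=0.5)
print(links.astype(int))
```

As the abstract stresses, the result depends on the threshold and on which variable is observed; a threshold that is too low would start linking nodes that are only spuriously or indirectly correlated.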
Emmert-Streib, Frank; Dehmer, Matthias; Haibe-Kains, Benjamin
2014-01-01
In this paper, we shed light on approaches that are currently used to infer networks from gene expression data with respect to their biological meaning. As we will show, the biological interpretation of these networks depends on the chosen theoretical perspective. For this reason, we distinguish a statistical perspective from a mathematical modeling perspective and elaborate their differences and implications. Our results indicate the imperative need for a genomic network ontology in order to avoid increasing confusion about the biological interpretation of inferred networks, which can be further compounded by approaches that integrate multiple data sets or data types.
High-Dimensional Statistical Learning: Roots, Justifications, and Potential Machineries
Zollanvari, Amin
2015-01-01
High-dimensional data generally refer to data in which the number of variables is larger than the sample size. Analyzing such datasets poses great challenges for classical statistical learning because the finite-sample performance of methods developed within classical statistical learning does not live up to classical asymptotic premises in which the sample size unboundedly grows for a fixed dimensionality of observations. Much work has been done in developing mathematical–statistical techniques for analyzing high-dimensional data. Despite remarkable progress in this field, many practitioners still utilize classical methods for analyzing such datasets. This state of affairs can be attributed, in part, to a lack of knowledge and, in part, to the ready-to-use computational and statistical software packages that are well developed for classical techniques. Moreover, many scientists working in a specific field of high-dimensional statistical learning are either not aware of other existing machineries in the field or are not willing to try them out. The primary goal in this work is to bring together various machineries of high-dimensional analysis, give an overview of the important results, and present the operating conditions upon which they are grounded. When appropriate, readers are referred to relevant review articles for more information on a specific subject. PMID:27081307
ERIC Educational Resources Information Center
Sotos, Ana Elisa Castro; Vanhoof, Stijn; Van den Noortgate, Wim; Onghena, Patrick
2007-01-01
A solid understanding of "inferential statistics" is of major importance for designing and interpreting empirical results in any scientific discipline. However, students are prone to many misconceptions regarding this topic. This article structurally summarizes and describes these misconceptions by presenting a systematic review of publications…
Statistical inference of the generation probability of T-cell receptors from sequence repertoires.
Murugan, Anand; Mora, Thierry; Walczak, Aleksandra M; Callan, Curtis G
2012-10-02
Stochastic rearrangement of germline V-, D-, and J-genes to create variable coding sequence for certain cell surface receptors is at the origin of immune system diversity. This process, known as "VDJ recombination", is implemented via a series of stochastic molecular events involving gene choices and random nucleotide insertions between, and deletions from, genes. We use large sequence repertoires of the variable CDR3 region of human CD4+ T-cell receptor beta chains to infer the statistical properties of these basic biochemical events. Because any given CDR3 sequence can be produced in multiple ways, the probability distribution of hidden recombination events cannot be inferred directly from the observed sequences; we therefore develop a maximum likelihood inference method to achieve this end. To separate the properties of the molecular rearrangement mechanism from the effects of selection, we focus on nonproductive CDR3 sequences in T-cell DNA. We infer the joint distribution of the various generative events that occur when a new T-cell receptor gene is created. We find a rich picture of correlation (and absence thereof), providing insight into the molecular mechanisms involved. The generative event statistics are consistent between individuals, suggesting a universal biochemical process. Our probabilistic model predicts the generation probability of any specific CDR3 sequence by the primitive recombination process, allowing us to quantify the potential diversity of the T-cell repertoire and to understand why some sequences are shared between individuals. We argue that the use of formal statistical inference methods, of the kind presented in this paper, will be essential for quantitative understanding of the generation and evolution of diversity in the adaptive immune system.
Evaluation of statistical inference on empirical resting state fMRI.
Yang, Xue; Kang, Hakmook; Newton, Allen T; Landman, Bennett A
2014-04-01
Modern statistical inference techniques may be able to improve the sensitivity and specificity of resting state functional magnetic resonance imaging (rs-fMRI) connectivity analysis through more realistic assumptions. In simulation, the advantages of such methods are readily demonstrable. However, quantitative empirical validation remains elusive in vivo as the true connectivity patterns are unknown and noise distributions are challenging to characterize, especially in ultra-high field (e.g., 7T fMRI). Though the physiological characteristics of the fMRI signal are difficult to replicate in controlled phantom studies, it is critical that the performance of statistical techniques be evaluated. The SIMulation EXtrapolation (SIMEX) method has enabled estimation of bias with asymptotically consistent estimators on empirical finite sample data by adding simulated noise. To avoid the requirement of accurate estimation of noise structure, the proposed quantitative evaluation approach leverages the theoretical core of SIMEX to study the properties of inference methods in the face of diminishing data (in contrast to increasing noise). The performance of ordinary and robust inference methods in simulation and empirical rs-fMRI are compared using the proposed quantitative evaluation approach. This study provides a simple, but powerful method for comparing a proxy for inference accuracy using empirical data.
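For readers unfamiliar with SIMEX, the classical noise-adding version mentioned in the abstract above can be sketched as follows. This illustrates the standard technique only, not the diminishing-data variant the authors propose. The function name `simex_slope`, the straight-line regression setting, the assumption that the measurement-error standard deviation `sigma_u` is known, and the quadratic extrapolant are all choices made for the example. Extra noise is added to an error-prone covariate at increasing levels, the estimate is tracked as a function of the added noise, and the trend is extrapolated back to the hypothetical noise-free case.

```python
import numpy as np


def simex_slope(w, y, sigma_u, lambdas=(0.0, 0.5, 1.0, 1.5, 2.0), n_rep=50, seed=0):
    """SIMEX estimate of a regression slope when the covariate w has
    additive measurement error with known standard deviation sigma_u.

    Returns (simex_estimate, naive_estimate).
    """
    rng = np.random.default_rng(seed)
    lambdas = np.asarray(lambdas, dtype=float)

    mean_slopes = []
    for lam in lambdas:
        slopes = []
        for _ in range(n_rep):
            # Simulation step: inflate the error variance by a factor (1 + lam).
            w_noisy = w + np.sqrt(lam) * sigma_u * rng.standard_normal(w.shape)
            slopes.append(np.polyfit(w_noisy, y, 1)[0])
        mean_slopes.append(np.mean(slopes))

    # Extrapolation step: fit a quadratic in lam and evaluate it at lam = -1,
    # which corresponds to zero measurement error.
    coefficients = np.polyfit(lambdas, mean_slopes, 2)
    return float(np.polyval(coefficients, -1.0)), float(mean_slopes[0])


# Simulated data (assumption): true slope 2, covariate observed with error.
rng = np.random.default_rng(1)
true_x = rng.standard_normal(5000)
y = 2.0 * true_x + 0.5 * rng.standard_normal(5000)
w = true_x + 0.5 * rng.standard_normal(5000)
simex_estimate, naive_estimate = simex_slope(w, y, sigma_u=0.5)
print("naive slope:", naive_estimate, "SIMEX slope:", simex_estimate)
```

The quadratic extrapolant is an approximation, so the SIMEX estimate typically reduces the attenuation bias of the naive fit without removing it completely.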
Research participant compensation: A matter of statistical inference as well as ethics.
Swanson, David M; Betensky, Rebecca A
2015-11-01
The ethics of compensation of research subjects for participation in clinical trials has been debated for years. One ethical issue of concern is variation among subjects in the level of compensation for identical treatments. Surprisingly, the impact of variation on the statistical inferences made from trial results has not been examined. We seek to identify how variation in compensation may influence any existing dependent censoring in clinical trials, thereby also influencing inference about the survival curve, hazard ratio, or other measures of treatment efficacy. In simulation studies, we consider a model for how compensation structure may influence the censoring model. Under existing dependent censoring, we estimate survival curves under different compensation structures and observe how these structures induce variability in the estimates. We show through this model that if the compensation structure affects the censoring model and dependent censoring is present, then variation in that structure induces variation in the estimates and affects the accuracy of estimation and inference on treatment efficacy. From the perspectives of both ethics and statistical inference, standardization and transparency in the compensation of participants in clinical trials are warranted.
Zhu, Hongjian
2016-12-12
Seamless phase II/III clinical trials have attracted increasing attention recently. They mainly use Bayesian response adaptive randomization (RAR) designs. There has been little research into seamless clinical trials using frequentist RAR designs because of the difficulty in performing valid statistical inference following this procedure. The well-designed frequentist RAR designs can target theoretically optimal allocation proportions, and they have explicit asymptotic results. In this paper, we study the asymptotic properties of frequentist RAR designs with adjusted target allocation proportions, and investigate statistical inference for this procedure. The properties of the proposed design provide an important theoretical foundation for advanced seamless clinical trials. Our numerical studies demonstrate that the design is ethical and efficient.
Emura, Takeshi; Konno, Yoshihiko; Michimae, Hirofumi
2015-07-01
Doubly truncated data consist of samples whose observed values fall between the right- and left-truncation limits. With such samples, the distribution function of interest is estimated using the nonparametric maximum likelihood estimator (NPMLE) that is obtained through a self-consistency algorithm. Owing to the complicated asymptotic distribution of the NPMLE, the bootstrap method has been suggested for statistical inference. This paper proposes a closed-form estimator for the asymptotic covariance function of the NPMLE, which is a computationally attractive alternative to bootstrapping. Furthermore, we develop various statistical inference procedures, such as confidence intervals, goodness-of-fit tests, and confidence bands to demonstrate the usefulness of the proposed covariance estimator. Simulations are performed to compare the proposed method with both the bootstrap and jackknife methods. The methods are illustrated using the childhood cancer dataset.
PyClone: statistical inference of clonal population structure in cancer.
Roth, Andrew; Khattra, Jaswinder; Yap, Damian; Wan, Adrian; Laks, Emma; Biele, Justina; Ha, Gavin; Aparicio, Samuel; Bouchard-Côté, Alexandre; Shah, Sohrab P
2014-04-01
We introduce PyClone, a statistical model for inference of clonal population structures in cancers. PyClone is a Bayesian clustering method for grouping sets of deeply sequenced somatic mutations into putative clonal clusters while estimating their cellular prevalences and accounting for allelic imbalances introduced by segmental copy-number changes and normal-cell contamination. Single-cell sequencing validation demonstrates PyClone's accuracy.
NASA Astrophysics Data System (ADS)
Cocco, Simona; Monasson, Rémi; Weigt, Martin
2013-12-01
We consider the Hopfield-Potts model for the covariation between residues in protein families recently introduced in Cocco, Monasson, Weigt (2013). The patterns of the model are inferred from the data within a new gauge, more symmetric in the residues. We compute the statistical error bars on the pattern components. Results are illustrated on real data for a response regulator receiver domain (Pfam ID PF00072) family.
A variance components model for statistical inference on functional connectivity networks.
Fiecas, Mark; Cribben, Ivor; Bahktiari, Reyhaneh; Cummine, Jacqueline
2017-01-24
We propose a variance components linear modeling framework to conduct statistical inference on functional connectivity networks that directly accounts for the temporal autocorrelation inherent in functional magnetic resonance imaging (fMRI) time series data and for the heterogeneity across subjects in the study. The novel method estimates the autocorrelation structure in a nonparametric and subject-specific manner, and estimates the variance due to the heterogeneity using iterative least squares. We apply the new model to a resting-state fMRI study to compare the functional connectivity networks in both typical and reading impaired young adults in order to characterize the resting state networks that are related to reading processes. We also compare the performance of our model to other methods of statistical inference on functional connectivity networks that do not account for the temporal autocorrelation or heterogeneity across the subjects using simulated data, and show that accounting for these sources of variation and covariation results in more powerful tests for statistical inference.
Yang, Lan-Yan; Chi, Yunchan; Chow, Shein-Chung
2011-05-01
In clinical research, it is not uncommon to modify a trial procedure and/or statistical methods of ongoing clinical trials through protocol amendments. A major modification to the study protocol could result in a shift in target patient population. In addition, frequent and significant modifications could lead to a totally different study that is unable to address the medical questions that the original study intended to answer. In this article, we propose a logistic regression model for statistical inference based on a binary study endpoint for trials with protocol amendments. Under the proposed method, sample size adjustment is also derived.
Trans-dimensional Bayesian inference for gravitational lens substructures
NASA Astrophysics Data System (ADS)
Brewer, Brendon J.; Huijser, David; Lewis, Geraint F.
2016-01-01
We introduce a Bayesian solution to the problem of inferring the density profile of strong gravitational lenses when the lens galaxy may contain multiple dark or faint substructures. The source and lens models are based on a superposition of an unknown number of non-negative basis functions (or 'blobs') whose form was chosen with speed as a primary criterion. The prior distribution for the blobs' properties is specified hierarchically, so the mass function of substructures is a natural output of the method. We use reversible jump Markov Chain Monte Carlo within Diffusive Nested Sampling to sample the posterior distribution and evaluate the marginal likelihood of the model, including the summation over the unknown number of blobs in the source and the lens. We demonstrate the method on two simulated data sets: one with a single substructure, and the other with 10. We also apply the method to the g-band image of the 'Cosmic Horseshoe' system, and find evidence for more than zero substructures. However, these have large spatial extent and probably only point to misspecifications in the model (such as the shape of the smooth lens component or the point-spread function), which are difficult to guard against in full generality.
Inference in infinite-dimensional inverse problems - Discretization and duality
NASA Technical Reports Server (NTRS)
Stark, Philip B.
1992-01-01
Many techniques for solving inverse problems involve approximating the unknown model, a function, by a finite-dimensional 'discretization' or parametric representation. The uncertainty in the computed solution is sometimes taken to be the uncertainty within the parametrization; this can result in unwarranted confidence. The theory of conjugate duality can overcome the limitations of discretization within the 'strict bounds' formalism, a technique for constructing confidence intervals for functionals of the unknown model incorporating certain types of prior information. The usual computational approach to strict bounds approximates the 'primal' problem in a way that the resulting confidence intervals are at most long enough to have the nominal coverage probability. There is another approach based on 'dual' optimization problems that gives confidence intervals with at least the nominal coverage probability. The pair of intervals derived by the two approaches bracket a correct confidence interval. The theory is illustrated with gravimetric, seismic, geomagnetic, and helioseismic problems and a numerical example in seismology.
Graffelman, Jan; Sánchez, Milagros; Cook, Samantha; Moreno, Victor
2013-01-01
In genetic association studies, tests for Hardy-Weinberg proportions are often employed as a quality control checking procedure. Missing genotypes are typically discarded prior to testing. In this paper we show that inference for Hardy-Weinberg proportions can be biased when missing values are discarded. We propose to use multiple imputation of missing values in order to improve inference for Hardy-Weinberg proportions. For imputation we employ a multinomial logit model that uses information from allele intensities and/or neighbouring markers. Analysis of an empirical data set of single nucleotide polymorphisms possibly related to colon cancer reveals that missing genotypes are not missing completely at random. Deviation from Hardy-Weinberg proportions is mostly due to a lack of heterozygotes. Inbreeding coefficients estimated by multiple imputation of the missings are typically lowered with respect to inbreeding coefficients estimated by discarding the missings. Accounting for missings by multiple imputation qualitatively changed the results of 10 to 17% of the statistical tests performed. Estimates of inbreeding coefficients obtained by multiple imputation showed high correlation with estimates obtained by single imputation using an external reference panel. Our conclusion is that imputation of missing data leads to improved statistical inference for Hardy-Weinberg proportions.
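As background for the abstract above, the following sketch shows the kind of complete-case test for Hardy-Weinberg proportions that is commonly used for quality control, together with the usual moment estimate of the inbreeding coefficient. It does not implement the multiple-imputation procedure the authors propose; the function name `hwe_chi_square` and the genotype counts are hypothetical, and a biallelic marker with both alleles present is assumed.

```python
from math import erfc, sqrt


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test for Hardy-Weinberg proportions at a biallelic marker.

    Returns (chi2, p_value, inbreeding_coefficient). Assumes both alleles
    are observed, so the expected counts are all positive.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p

    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # Survival function of a chi-square variable with one degree of freedom.
    p_value = erfc(sqrt(chi2 / 2.0))

    # Positive values indicate a lack of heterozygotes.
    inbreeding = 1.0 - n_ab / (2 * n * p * q)
    return chi2, p_value, inbreeding


# Hypothetical genotype counts with a clear heterozygote deficit.
chi2, p_value, inbreeding = hwe_chi_square(40, 20, 40)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2e}, f = {inbreeding:.2f}")
```

Running this test only on the observed genotypes is exactly the practice the abstract warns about: if heterozygotes are more likely to be missing, the counts passed in are already distorted before the test is computed.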
Inference for High-dimensional Differential Correlation Matrices
Cai, T. Tony; Zhang, Anru
2015-01-01
Motivated by differential co-expression analysis in genomics, we consider in this paper estimation and testing of high-dimensional differential correlation matrices. An adaptive thresholding procedure is introduced and theoretical guarantees are given. Minimax rate of convergence is established and the proposed estimator is shown to be adaptively rate-optimal over collections of paired correlation matrices with approximately sparse differences. Simulation results show that the procedure significantly outperforms two other natural methods that are based on separate estimation of the individual correlation matrices. The procedure is also illustrated through an analysis of a breast cancer dataset, which provides evidence at the gene co-expression level that several genes, of which a subset has been previously verified, are associated with breast cancer. Hypothesis testing on the differential correlation matrices is also considered. A test, which is particularly well suited for testing against sparse alternatives, is introduced. In addition, other related problems, including estimation of a single sparse correlation matrix, estimation of the differential covariance matrices, and estimation of the differential cross-correlation matrices, are also discussed. PMID:26500380
Inference for High-dimensional Differential Correlation Matrices.
Cai, T Tony; Zhang, Anru
2016-01-01
Motivated by differential co-expression analysis in genomics, we consider in this paper estimation and testing of high-dimensional differential correlation matrices. An adaptive thresholding procedure is introduced and theoretical guarantees are given. Minimax rate of convergence is established and the proposed estimator is shown to be adaptively rate-optimal over collections of paired correlation matrices with approximately sparse differences. Simulation results show that the procedure significantly outperforms two other natural methods that are based on separate estimation of the individual correlation matrices. The procedure is also illustrated through an analysis of a breast cancer dataset, which provides evidence at the gene co-expression level that several genes, of which a subset has been previously verified, are associated with breast cancer. Hypothesis testing on the differential correlation matrices is also considered. A test, which is particularly well suited for testing against sparse alternatives, is introduced. In addition, other related problems, including estimation of a single sparse correlation matrix, estimation of the differential covariance matrices, and estimation of the differential cross-correlation matrices, are also discussed.
Statistical entropy of charged two-dimensional black holes
NASA Astrophysics Data System (ADS)
Teo, Edward
1998-06-01
The statistical entropy of a five-dimensional black hole in Type II string theory was recently derived by showing that it is U-dual to the three-dimensional Bañados-Teitelboim-Zanelli black hole, and using Carlip's method to count the microstates of the latter. This is valid even for the non-extremal case, unlike the derivation which relies on D-brane techniques. In this letter, I shall exploit the U-duality that exists between the five-dimensional black hole and the two-dimensional charged black hole of McGuigan, Nappi and Yost, to microscopically compute the entropy of the latter. It is shown that this result agrees with previous calculations using thermodynamic arguments.
Hupé, Jean-Michel
2015-01-01
Published studies using functional and structural MRI include many errors in the way data are analyzed and conclusions reported. This was observed when working on a comprehensive review of the neural bases of synesthesia, but these errors are probably endemic to neuroimaging studies. All studies reviewed had based their conclusions on Null Hypothesis Significance Tests (NHST). NHST have nevertheless been criticized since their inception because they are more appropriate for taking decisions related to a Null hypothesis (as in manufacturing) than for making inferences about behavioral and neuronal processes. Here I focus on a few key problems of NHST related to brain imaging techniques, and explain why or when we should not rely on “significance” tests. I also observed that, often, the ill-posed logic of NHST was not even correctly applied, and describe what I identified as common mistakes or at least problematic practices in published papers, in light of what could be considered as the very basics of statistical inference. MRI statistics also involve much more complex issues than standard statistical inference. Analysis pipelines vary a lot between studies, even for those using the same software, and there is no consensus on which pipeline is best. I propose a synthetic view of the logic behind the possible methodological choices, and warn against the usage and interpretation of two statistical methods popular in brain imaging studies, the false discovery rate (FDR) procedure and permutation tests. I suggest that current models for the analysis of brain imaging data suffer from serious limitations and call for a revision taking into account the “new statistics” (confidence intervals) logic. PMID:25745383
Local dependence in random graph models: characterization, properties and statistical inference
Schweinberger, Michael; Handcock, Mark S.
2015-01-01
Dependent phenomena, such as relational, spatial and temporal phenomena, tend to be characterized by local dependence in the sense that units which are close in a well-defined sense are dependent. In contrast with spatial and temporal phenomena, though, relational phenomena tend to lack a natural neighbourhood structure in the sense that it is unknown which units are close and thus dependent. Owing to the challenge of characterizing local dependence and constructing random graph models with local dependence, many conventional exponential family random graph models induce strong dependence and are not amenable to statistical inference. We take first steps to characterize local dependence in random graph models, inspired by the notion of finite neighbourhoods in spatial statistics and M-dependence in time series, and we show that local dependence endows random graph models with desirable properties which make them amenable to statistical inference. We show that random graph models with local dependence satisfy a natural domain consistency condition which every model should satisfy, but conventional exponential family random graph models do not satisfy. In addition, we establish a central limit theorem for random graph models with local dependence, which suggests that random graph models with local dependence are amenable to statistical inference. We discuss how random graph models with local dependence can be constructed by exploiting either observed or unobserved neighbourhood structure. In the absence of observed neighbourhood structure, we take a Bayesian view and express the uncertainty about the neighbourhood structure by specifying a prior on a set of suitable neighbourhood structures. We present simulation results and applications to two real world networks with ‘ground truth’. PMID:26560142
A Comprehensive Statistical Model for Cell Signaling and Protein Activity Inference
Yörük, Erdem; Ochs, Michael F.; Geman, Donald; Younes, Laurent
2010-01-01
Protein signaling networks play a central role in transcriptional regulation and the etiology of many diseases. Statistical methods, particularly Bayesian networks, have been widely used to model cell signaling, mostly for model organisms and with focus on uncovering connectivity rather than inferring aberrations. Extensions to mammalian systems have not yielded compelling results, likely due to greatly increased complexity and limited proteomic measurements in vivo. In this study, we propose a comprehensive statistical model that is anchored to a predefined core topology, has a limited complexity due to parameter sharing and uses microarray data of mRNA transcripts as the only observable components of signaling. Specifically, we account for cell heterogeneity and a multi-level process, representing signaling as a Bayesian network at the cell level, modeling measurements as ensemble averages at the tissue level and incorporating patient-to-patient differences at the population level. Motivated by the goal of identifying individual protein abnormalities as potential therapeutic targets, we applied our method to the RAS-RAF network using a breast cancer study with 118 patients. We demonstrated rigorous statistical inference, established reproducibility through simulations and the ability to recover receptor status from available microarray data. PMID:20855924
Schmidt, Paul; Schmid, Volker J; Gaser, Christian; Buck, Dorothea; Bührlen, Susanne; Förschler, Annette; Mühlau, Mark
2013-01-01
Aiming at iron-related T2-hypointensity, which is related to normal aging and neurodegenerative processes, we here present two practicable approaches, based on Bayesian inference, for preprocessing and statistical analysis of a complex set of structural MRI data. In particular, Markov Chain Monte Carlo methods were used to simulate posterior distributions. First, we rendered a segmentation algorithm that uses outlier detection based on model checking techniques within a Bayesian mixture model. Second, we rendered an analytical tool comprising a Bayesian regression model with smoothness priors (in the form of Gaussian Markov random fields) mitigating the necessity to smooth data prior to statistical analysis. For validation, we used simulated data and MRI data of 27 healthy controls (age: [Formula: see text]; range, [Formula: see text]). We first observed robust segmentation of both simulated T2-hypointensities and gray-matter regions known to be T2-hypointense. Second, simulated data and images of segmented T2-hypointensity were analyzed. We found not only robust identification of simulated effects but also a biologically plausible age-related increase of T2-hypointensity primarily within the dentate nucleus but also within the globus pallidus, substantia nigra, and red nucleus. Our results indicate that fully Bayesian inference can successfully be applied for preprocessing and statistical analysis of structural MRI data.
Three enhancements to the inference of statistical protein-DNA potentials.
AlQuraishi, Mohammed; McAdams, Harley H
2013-03-01
The energetics of protein-DNA interactions are often modeled using so-called statistical potentials, that is, energy models derived from the atomic structures of protein-DNA complexes. Many statistical protein-DNA potentials based on differing theoretical assumptions have been investigated, but little attention has been paid to the types of data and the parameter estimation process used in deriving the statistical potentials. We describe three enhancements to statistical potential inference that significantly improve the accuracy of predicted protein-DNA interactions: (i) incorporation of binding energy data of protein-DNA complexes, in conjunction with their X-ray crystal structures, (ii) use of spatially-aware parameter fitting, and (iii) use of ensemble-based parameter fitting. We apply these enhancements to three widely-used statistical potentials and use the resulting enhanced potentials in a structure-based prediction of the DNA binding sites of proteins. These enhancements are directly applicable to all statistical potentials used in protein-DNA modeling, and we show that they can improve the accuracy of predicted DNA binding sites by up to 21%.
A statistical model for brain networks inferred from large-scale electrophysiological signals.
Obando, Catalina; De Vico Fallani, Fabrizio
2017-03-01
Network science has been extensively developed to characterize the structural properties of complex systems, including brain networks inferred from neuroimaging data. As a result of the inference process, networks estimated from experimentally obtained biological data represent one instance of a larger number of realizations with similar intrinsic topology. A modelling approach is therefore needed to support statistical inference on the bottom-up local connectivity mechanisms influencing the formation of the estimated brain networks. Here, we adopted a statistical model based on exponential random graph models (ERGMs) to reproduce brain networks, or connectomes, estimated by spectral coherence between high-density electroencephalographic (EEG) signals. ERGMs are made up by different local graph metrics, whereas the parameters weight the respective contribution in explaining the observed network. We validated this approach in a dataset of N = 108 healthy subjects during eyes-open (EO) and eyes-closed (EC) resting-state conditions. Results showed that the tendency to form triangles and stars, reflecting clustering and node centrality, better explained the global properties of the EEG connectomes than other combinations of graph metrics. In particular, the synthetic networks generated by this model configuration replicated the characteristic differences found in real brain networks, with EO eliciting significantly higher segregation in the alpha frequency band (8-13 Hz) than EC. Furthermore, the fitted ERGM parameter values provided complementary information showing that clustering connections are significantly more represented from EC to EO in the alpha range, but also in the beta band (14-29 Hz), which is known to play a crucial role in cortical processing of visual input and externally oriented attention. Taken together, these findings support the current view of the functional segregation and integration of the brain in terms of modules and hubs, and provide a
Chen, Zhe; Putrino, David F; Ghosh, Soumya; Barbieri, Riccardo; Brown, Emery N
2011-04-01
The ability to accurately infer functional connectivity between ensemble neurons using experimentally acquired spike train data is currently an important research objective in computational neuroscience. Point process generalized linear models and maximum likelihood estimation have been proposed as effective methods for the identification of spiking dependency between neurons. However, unfavorable experimental conditions occasionally result in insufficient data collection due to factors such as low neuronal firing rates or brief recording periods, and in these cases, the standard maximum likelihood estimate becomes unreliable. The present study compares the performance of different statistical inference procedures when applied to the estimation of functional connectivity in neuronal assemblies with sparse spiking data. Four inference methods were compared: maximum likelihood estimation, penalized maximum likelihood estimation using either ℓ2 or ℓ1 regularization, and hierarchical Bayesian estimation based on a variational Bayes algorithm. Algorithmic performances were compared using well-established goodness-of-fit measures in benchmark simulation studies. The hierarchical Bayesian approach performed favorably when compared with the other algorithms, and it was then successfully applied to real spiking data recorded from the cat motor cortex. The identification of spiking dependencies in physiologically acquired data was encouraging, since their sparse nature would have previously precluded them from successful analysis using traditional methods.
Jones, Graham; Sagitov, Serik; Oxelman, Bengt
2013-05-01
Polyploidy is an important speciation mechanism, particularly in land plants. Allopolyploid species are formed after hybridization between otherwise intersterile parental species. Recent theoretical progress has led to successful implementation of species tree models that take population genetic parameters into account. However, these models have not included allopolyploid hybridization and the special problems imposed when species trees of allopolyploids are inferred. Here, 2 new models for the statistical inference of the evolutionary history of allopolyploids are evaluated using simulations and demonstrated on 2 empirical data sets. It is assumed that there has been a single hybridization event between 2 diploid species resulting in a genomic allotetraploid. The evolutionary history can be represented as a species network or as a multilabeled species tree, in which some pairs of tips are labeled with the same species. In one of the models (AlloppMUL), the multilabeled species tree is inferred directly. This is the simplest model and the most widely applicable, since fewer assumptions are made. The second model (AlloppNET) incorporates the hybridization event explicitly which means that fewer parameters need to be estimated. Both models are implemented in the BEAST framework. Simulations show that both models are useful and that AlloppNET is more accurate if the assumptions it is based on are valid. The models are demonstrated on previously analyzed data from the genera Pachycladon (Brassicaceae) and Silene (Caryophyllaceae).
n-dimensional Statistical Inverse Graphical Hydraulic Test Simulator
2012-09-12
nSIGHTS (n-dimensional Statistical Inverse Graphical Hydraulic Test Simulator) is a comprehensive well test analysis software package. It provides a user-interface, a well test analysis model and many tools to analyze both field and simulated data. The well test analysis model simulates a single-phase, one-dimensional, radial/non-radial flow regime, with a borehole at the center of the modeled flow system. nSIGHTS solves the radially symmetric n-dimensional forward flow problem using a solver based on a graph-theoretic approach. The results of the forward simulation are pressure, and flow rate, given all the input parameters. The parameter estimation portion of nSIGHTS uses a perturbation-based approach to interpret the best-fit well and reservoir parameters, given an observed dataset of pressure and flow rate.
Approximation of epidemic models by diffusion processes and their statistical inference.
Guy, Romain; Larédo, Catherine; Vergu, Elisabeta
2015-02-01
Multidimensional continuous-time Markov jump processes [Formula: see text] on [Formula: see text] form a usual set-up for modeling [Formula: see text]-like epidemics. However, when facing incomplete epidemic data, inference based on [Formula: see text] is not easy to achieve. Here, we start building a new framework for the estimation of key parameters of epidemic models based on statistics of diffusion processes approximating [Formula: see text]. First, previous results on the approximation of density-dependent [Formula: see text]-like models by diffusion processes with small diffusion coefficient [Formula: see text], where [Formula: see text] is the population size, are generalized to non-autonomous systems. Second, our previous inference results on discretely observed diffusion processes with small diffusion coefficient are extended to time-dependent diffusions. Consistent and asymptotically Gaussian estimates are obtained for a fixed number [Formula: see text] of observations, which corresponds to the epidemic context, and for [Formula: see text]. A correction term, which yields better estimates non-asymptotically, is also included. Finally, performances and robustness of our estimators with respect to various parameters such as [Formula: see text] (the basic reproduction number), [Formula: see text], [Formula: see text] are investigated on simulations. Two models, [Formula: see text] and [Formula: see text], corresponding to single and recurrent outbreaks, respectively, are used to simulate data. The findings indicate that our estimators have good asymptotic properties and behave noticeably well for realistic numbers of observations and population sizes. This study lays the foundations of a generic inference method currently under extension to incompletely observed epidemic data. Indeed, contrary to the majority of current inference techniques for partially observed processes, which necessitate computer intensive simulations, our method being mostly an
Jagiella, Nick; Rickert, Dennis; Theis, Fabian J; Hasenauer, Jan
2017-02-22
Mechanistic understanding of multi-scale biological processes, such as cell proliferation in a changing biological tissue, is readily facilitated by computational models. While tools exist to construct and simulate multi-scale models, the statistical inference of the unknown model parameters remains an open problem. Here, we present and benchmark a parallel approximate Bayesian computation sequential Monte Carlo (pABC SMC) algorithm, tailored for high-performance computing clusters. pABC SMC is fully automated and returns reliable parameter estimates and confidence intervals. By running the pABC SMC algorithm for ∼10^6 hr, we parameterize multi-scale models that accurately describe quantitative growth curves and histological data obtained in vivo from individual tumor spheroid growth in media droplets. The models capture the hybrid deterministic-stochastic behaviors of 10^5-10^6 cells growing in a 3D dynamically changing nutrient environment. The pABC SMC algorithm reliably converges to a consistent set of parameters. Our study demonstrates a proof of principle for robust, data-driven modeling of multi-scale biological systems and the feasibility of multi-scale model parameterization through statistical inference.
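As a rough illustration of the idea underlying approximate Bayesian computation, the sketch below implements plain ABC rejection sampling, which is the simpler building block and not the parallel sequential Monte Carlo scheme described in the abstract. The toy growth model, the uniform prior, the distance measure and the tolerance are all assumptions chosen for illustration.

```python
import numpy as np

def simulate(rate, times, rng, noise_sd=0.05):
    """Hypothetical toy model: a noisy exponential growth curve."""
    return np.exp(rate * times) + rng.normal(0.0, noise_sd, size=times.shape)

def abc_rejection(observed, times, n_draws, tolerance, rng):
    """Keep prior draws whose simulated data fall within `tolerance` of the data."""
    accepted = []
    for _ in range(n_draws):
        rate = rng.uniform(0.0, 1.0)  # uniform prior on the rate (assumption)
        simulated = simulate(rate, times, rng)
        # Root-mean-square distance between simulated and observed curves
        distance = np.sqrt(np.mean((simulated - observed) ** 2))
        if distance < tolerance:
            accepted.append(rate)
    return np.array(accepted)

rng = np.random.default_rng(0)
times = np.linspace(0.0, 2.0, 20)
observed = simulate(0.5, times, rng)  # synthetic "data" with true rate 0.5
posterior = abc_rejection(observed, times, n_draws=5000, tolerance=0.15, rng=rng)
```

The accepted draws in `posterior` approximate the posterior of the rate. Sequential variants shrink the tolerance over successive populations instead of using one fixed threshold.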
A statistical method for lung tumor segmentation uncertainty in PET images based on user inference.
Zheng, Chaojie; Wang, Xiuying; Feng, Dagan
2015-01-01
PET has been widely accepted as an effective imaging modality for lung tumor diagnosis and treatment. However, standard criteria for delineating tumor boundaries from PET have yet to be developed, largely because of the relatively low quality of PET images, uncertain tumor boundary definition, and the variety of tumor characteristics. In this paper, we propose a statistical solution to segmentation uncertainty on the basis of user inference. We first define the uncertainty segmentation band on the basis of a segmentation probability map constructed by the Random Walks (RW) algorithm; then, based on the extracted features of the user inference, we use Principal Component Analysis (PCA) to formulate the statistical model for labeling the uncertainty band. We validated our method on 10 lung PET-CT phantom studies from the public RIDER collections [1] and 16 clinical PET studies where tumors were manually delineated by two experienced radiologists. The methods were validated using the Dice similarity coefficient (DSC) to measure the spatial volume overlap. Our method achieved an average DSC of 0.878 ± 0.078 on phantom studies and 0.835 ± 0.039 on clinical studies.
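For readers unfamiliar with the validation metric, the Dice similarity coefficient of two binary masks A and B is 2|A ∩ B| / (|A| + |B|). A minimal sketch follows; the toy masks and the convention for two empty masks are assumptions made for illustration and are not taken from the study.

```python
import numpy as np

def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two binary segmentation masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * np.logical_and(a, b).sum() / total

# Toy 4x4 masks: an automatic segmentation and a manual delineation
auto = np.zeros((4, 4), dtype=bool)
auto[1:3, 1:3] = True      # 4 voxels
manual = np.zeros((4, 4), dtype=bool)
manual[1:3, 1:4] = True    # 6 voxels, 4 of them shared with `auto`

dsc = dice_coefficient(auto, manual)  # 2 * 4 / (4 + 6) = 0.8
```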
Statistical inference of seabed sound-speed structure in the Gulf of Oman Basin.
Sagers, Jason D; Knobles, David P
2014-06-01
Addressed is the statistical inference of the sound-speed depth profile of a thick soft seabed from broadband sound propagation data recorded in the Gulf of Oman Basin in 1977. The acoustic data are in the form of time series signals recorded on a sparse vertical line array and generated by explosive sources deployed along a 280 km track. The acoustic data offer a unique opportunity to study a deep-water bottom-limited thickly sedimented environment because of the large number of time series measurements, very low seabed attenuation, and auxiliary measurements. A maximum entropy method is employed to obtain a conditional posterior probability distribution (PPD) for the sound-speed ratio and the near-surface sound-speed gradient. The multiple data samples allow for a determination of the average error constraint value required to uniquely specify the PPD for each data sample. Two complicating features of the statistical inference study are addressed: (1) the need to develop an error function that can both utilize the measured multipath arrival structure and mitigate the effects of data errors and (2) the effect of small bathymetric slopes on the structure of the bottom interacting arrivals.
Coulson, Melissa; Healey, Michelle; Fidler, Fiona; Cumming, Geoff
2010-01-01
A statistically significant result, and a non-significant result may differ little, although significance status may tempt an interpretation of difference. Two studies are reported that compared interpretation of such results presented using null hypothesis significance testing (NHST), or confidence intervals (CIs). Authors of articles published in psychology, behavioral neuroscience, and medical journals were asked, via email, to interpret two fictitious studies that found similar results, one statistically significant, and the other non-significant. Responses from 330 authors varied greatly, but interpretation was generally poor, whether results were presented as CIs or using NHST. However, when interpreting CIs respondents who mentioned NHST were 60% likely to conclude, unjustifiably, the two results conflicted, whereas those who interpreted CIs without reference to NHST were 95% likely to conclude, justifiably, the two results were consistent. Findings were generally similar for all three disciplines. An email survey of academic psychologists confirmed that CIs elicit better interpretations if NHST is not invoked. Improved statistical inference can result from encouragement of meta-analytic thinking and use of CIs but, for full benefit, such highly desirable statistical reform requires also that researchers interpret CIs without recourse to NHST. PMID:21607077
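The central point, that one significant and one non-significant result can be entirely consistent, can be checked with a few lines of arithmetic. The sketch below uses two made-up studies (the effect sizes and standard errors are hypothetical, not the fictitious studies shown to respondents) and a normal approximation for the 95% intervals.

```python
import math

Z95 = 1.96  # normal quantile for a two-sided 95% interval

def ci95(estimate, se):
    """Normal-approximation 95% confidence interval."""
    return (estimate - Z95 * se, estimate + Z95 * se)

# Hypothetical studies: (effect estimate, standard error)
study_a = (3.0, 1.4)  # z = 2.14, "significant" at the 5% level
study_b = (2.5, 1.5)  # z = 1.67, "non-significant"

ci_a = ci95(*study_a)  # lower bound above zero
ci_b = ci95(*study_b)  # interval includes zero

# Interval for the difference between the two effects
diff = study_a[0] - study_b[0]
diff_se = math.sqrt(study_a[1] ** 2 + study_b[1] ** 2)
ci_diff = ci95(diff, diff_se)  # comfortably includes zero
```

Although only one interval excludes zero, the interval for the difference is wide and centered near zero, so the two results do not conflict.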
Inference of reaction rate parameters based on summary statistics from experiments
Khalil, Mohammad; Chowdhary, Kamaljit Singh; Safta, Cosmin; Sargsyan, Khachik; Najm, Habib N.
2016-10-15
Here, we present the results of an application of Bayesian inference and maximum entropy methods for the estimation of the joint probability density for the Arrhenius rate parameters of the rate coefficient of the H2/O2-mechanism chain branching reaction H + O2 → OH + O. Available published data is in the form of summary statistics in terms of nominal values and error bars of the rate coefficient of this reaction at a number of temperature values obtained from shock-tube experiments. Our approach relies on generating data, in this case OH concentration profiles, consistent with the given summary statistics, using Approximate Bayesian Computation methods and a Markov Chain Monte Carlo procedure. The approach permits the forward propagation of parametric uncertainty through the computational model in a manner that is consistent with the published statistics. A consensus joint posterior on the parameters is obtained by pooling the posterior parameter densities given each consistent data set. To expedite this process, we construct efficient surrogates for the OH concentration using a combination of Padé and polynomial approximants. These surrogate models adequately represent forward model observables and their dependence on input parameters and are computationally efficient to allow their use in the Bayesian inference procedure. We also utilize Gauss-Hermite quadrature with Gaussian proposal probability density functions for moment computation resulting in orders of magnitude speedup in data likelihood evaluation. Despite the strong non-linearity in the model, the consistent data sets all result in nearly Gaussian conditional parameter probability density functions. The technique also accounts for nuisance parameters in the form of Arrhenius parameters of other rate coefficients with prescribed uncertainty. The resulting pooled parameter probability density function is propagated through stoichiometric hydrogen-air auto-ignition computations to illustrate
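To make the quadrature step concrete, the sketch below shows how Gauss-Hermite nodes and weights give the expectation of a function under a Gaussian density. It is a generic illustration rather than the authors' implementation: the helper name `gaussian_expectation`, the quadrature order and the log-normal test case are assumptions.

```python
import numpy as np

def gaussian_expectation(g, mean, std, order=20):
    """Approximate E[g(X)] for X ~ N(mean, std^2) by Gauss-Hermite quadrature."""
    # Nodes and weights for the weight function exp(-x^2)
    nodes, weights = np.polynomial.hermite.hermgauss(order)
    # Change of variables x = mean + sqrt(2) * std * node
    values = g(mean + np.sqrt(2.0) * std * nodes)
    return np.sum(weights * values) / np.sqrt(np.pi)

# Check against a closed form: if ln k ~ N(m, s^2), then E[k] = exp(m + s^2 / 2)
m, s = 1.0, 0.3
approx = gaussian_expectation(np.exp, m, s)
exact = np.exp(m + s ** 2 / 2.0)
```

A few dozen function evaluations replace a Monte Carlo average here, which is the kind of saving that makes quadrature attractive when the integrand is smooth and the density is close to Gaussian.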
On statistical inference in time series analysis of the evolution of road safety.
Commandeur, Jacques J F; Bijleveld, Frits D; Bergel-Hayat, Ruth; Antoniou, Constantinos; Yannis, George; Papadimitriou, Eleonora
2013-11-01
Data collected for building a road safety observatory usually include observations made sequentially through time. Examples of such data, called time series data, include annual (or monthly) number of road traffic accidents, traffic fatalities or vehicle kilometers driven in a country, as well as the corresponding values of safety performance indicators (e.g., data on speeding, seat belt use, alcohol use, etc.). Some commonly used statistical techniques imply assumptions that are often violated by the special properties of time series data, namely serial dependency among disturbances associated with the observations. The first objective of this paper is to demonstrate the impact of such violations on the applicability of standard methods of statistical inference, which leads to under- or overestimation of the standard error and consequently may produce erroneous inferences. Moreover, having established the adverse consequences of ignoring serial dependency issues, the paper aims to describe rigorous statistical techniques used to overcome them. In particular, appropriate time series analysis techniques of varying complexity are employed to describe the development over time, relating the accident-occurrences to explanatory factors such as exposure measures or safety performance indicators, and forecasting the development into the near future. Traditional regression models (whether they are linear, generalized linear or nonlinear) are shown not to naturally capture the inherent dependencies in time series data. Dedicated time series analysis techniques, such as the ARMA-type and DRAG approaches are discussed next, followed by structural time series models, which are a subclass of state space methods. The paper concludes with general recommendations and practice guidelines for the use of time series models in road safety research.
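The effect of serial dependency on the standard error can be seen in a small simulation. The sketch below is an illustration under assumed settings (an AR(1) process with coefficient 0.7, series of length 100, 2000 replications), not an analysis from the paper: it compares the usual independent-observations standard error of the mean with the actual spread of the sample mean across replications.

```python
import numpy as np

def simulate_ar1(n, phi, rng):
    """Simulate a stationary AR(1) series x[t] = phi * x[t-1] + e[t], e ~ N(0, 1)."""
    x = np.empty(n)
    x[0] = rng.normal(0.0, 1.0 / np.sqrt(1.0 - phi ** 2))  # stationary start
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

rng = np.random.default_rng(1)
n, phi, n_rep = 100, 0.7, 2000  # assumed settings for illustration

means = np.empty(n_rep)
naive_se = np.empty(n_rep)
for r in range(n_rep):
    x = simulate_ar1(n, phi, rng)
    means[r] = x.mean()
    # Standard error that (wrongly) treats the observations as independent
    naive_se[r] = x.std(ddof=1) / np.sqrt(n)

true_se = means.std(ddof=1)      # actual variability of the sample mean
avg_naive_se = naive_se.mean()   # what the naive formula reports on average
```

With positive autocorrelation the naive figure is much smaller than the true one, so tests and intervals built on it are overconfident.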
Marzouk, Youssef
2016-08-31
Predictive simulation of complex physical systems increasingly rests on the interplay of experimental observations with computational models. Key inputs, parameters, or structural aspects of models may be incomplete or unknown, and must be developed from indirect and limited observations. At the same time, quantified uncertainties are needed to qualify computational predictions in the support of design and decision-making. In this context, Bayesian statistics provides a foundation for inference from noisy and limited data, but at prohibitive computational expense. This project intends to make rigorous predictive modeling feasible in complex physical systems, via accelerated and scalable tools for uncertainty quantification, Bayesian inference, and experimental design. Specific objectives are as follows: 1. Develop adaptive posterior approximations and dimensionality reduction approaches for Bayesian inference in high-dimensional nonlinear systems. 2. Extend accelerated Bayesian methodologies to large-scale sequential data assimilation, fully treating nonlinear models and non-Gaussian state and parameter distributions. 3. Devise efficient surrogate-based methods for Bayesian model selection and the learning of model structure. 4. Develop scalable simulation/optimization approaches to nonlinear Bayesian experimental design, for both parameter inference and model selection. 5. Demonstrate these inferential tools on chemical kinetic models in reacting flow, constructing and refining thermochemical and electrochemical models from limited data. Demonstrate Bayesian filtering on canonical stochastic PDEs and in the dynamic estimation of inhomogeneous subsurface properties and flow fields.
Statistical Downscaling in Multi-dimensional Wave Climate Forecast
NASA Astrophysics Data System (ADS)
Camus, P.; Méndez, F. J.; Medina, R.; Losada, I. J.; Cofiño, A. S.; Gutiérrez, J. M.
2009-04-01
Wave climate at a particular site is defined by the statistical distribution of sea state parameters, such as significant wave height, mean wave period, mean wave direction, wind velocity, wind direction and storm surge. Nowadays, long-term time series of these parameters are available from reanalysis databases obtained by numerical models. The Self-Organizing Map (SOM) technique is applied to characterize multi-dimensional wave climate, obtaining the relevant "wave types" spanning the historical variability. This technique summarizes the multi-dimensional wave climate as a set of clusters projected onto a low-dimensional lattice with a spatial organization, providing Probability Density Functions (PDFs) on the lattice. On the other hand, wind and storm surge depend on instantaneous local large-scale sea level pressure (SLP) fields while waves depend on the recent history of these fields (say, 1 to 5 days). Thus, these variables are associated with large-scale atmospheric circulation patterns. In this work, a nearest-neighbors analog method is used to predict monthly multi-dimensional wave climate. This method establishes relationships between the large-scale atmospheric circulation patterns from numerical models (SLP fields as predictors) and local wave databases of observations (monthly wave climate SOM PDFs as predictand) to set up statistical models. A wave reanalysis database, developed by Puertos del Estado (Ministerio de Fomento), is considered as historical time series of local variables. The simultaneous SLP fields calculated by NCEP atmospheric reanalysis are used as predictors. Several applications with different sizes of sea level pressure grid and with different temporal resolutions are compared to obtain the optimal statistical model that best represents the monthly wave climate at a particular site. In this work we examine the potential skill of this downscaling approach considering perfect-model conditions, but we will also analyze the
ERIC Educational Resources Information Center
Henriques, Ana; Oliveira, Hélia
2016-01-01
This paper reports on the results of a study investigating the potential to embed Informal Statistical Inference in statistical investigations, using TinkerPlots, for assisting 8th grade students' informal inferential reasoning to emerge, particularly their articulations of uncertainty. Data collection included students' written work on a…
McDonald, L.L.; Erickson, W.P.; Strickland, M.D.
1995-12-31
The objective of the Coastal Habitat Injury Assessment study was to document and quantify injury to biota of the shallow subtidal, intertidal, and supratidal zones throughout the shoreline affected by oil or cleanup activity associated with the Exxon Valdez oil spill. The results of these studies were to be used to support the Trustee's Type B Natural Resource Damage Assessment under the Comprehensive Environmental Response, Compensation, and Liability Act of 1980 (CERCLA). A probability based stratified random sample of shoreline segments was selected with probability proportional to size from each of 15 strata (5 habitat types crossed with 3 levels of potential oil impact) based on those data available in July, 1989. Three study regions were used: Prince William Sound, Cook Inlet/Kenai Peninsula, and Kodiak/Alaska Peninsula. A Geographic Information System was utilized to combine oiling and habitat data and to select the probability sample of study sites. Quasi-experiments were conducted where randomly selected oiled sites were compared to matched reference sites. Two levels of statistical inferences, philosophical bases, and limitations are discussed and illustrated with example data from the resulting studies. 25 refs., 4 figs., 1 tab.
Johnson, Eric D; Tubau, Elisabet
2016-09-27
Presenting natural frequencies facilitates Bayesian inferences relative to using percentages. Nevertheless, many people, including highly educated and skilled reasoners, still fail to provide Bayesian responses to these computationally simple problems. We show that the complexity of relational reasoning (e.g., the structural mapping between the presented and requested relations) can help explain the remaining difficulties. With a non-Bayesian inference that required identical arithmetic but afforded a more direct structural mapping, performance was universally high. Furthermore, reducing the relational demands of the task through questions that directed reasoners to use the presented statistics, as compared with questions that prompted the representation of a second, similar sample, also significantly improved reasoning. Distinct error patterns were also observed between these presented- and similar-sample scenarios, which suggested differences in relational-reasoning strategies. On the other hand, while higher numeracy was associated with better Bayesian reasoning, higher-numerate reasoners were not immune to the relational complexity of the task. Together, these findings validate the relational-reasoning view of Bayesian problem solving and highlight the importance of considering not only the presented task structure, but also the complexity of the structural alignment between the presented and requested relations.
Palstra, Friso P; Heyer, Evelyne; Austerlitz, Frédéric
2015-06-01
The demographic history of modern humans constitutes a combination of expansions, colonizations, contractions, and remigrations. The advent of large scale genetic data combined with statistically refined methods facilitates inference of this complex history. Here we study the demographic history of two genetically admixed ethnic groups in Central Asia, an area characterized by high levels of genetic diversity and a history of recurrent immigration. Using Approximate Bayesian Computation, we infer that the timing of admixture markedly differs between the two groups. Admixture in the traditionally agricultural Tajiks could be dated back to the onset of the Neolithic transition in the region, whereas admixture in Kyrgyz is more recent, and may have involved the westward movement of Turkic peoples. These results are confirmed by a coalescent method that fits an isolation-with-migration model to the genetic data, with both Central Asian groups having received gene flow from the extremities of Eurasia. Interestingly, our analyses also uncover signatures of gene flow from Eastern to Western Eurasia during Paleolithic times. In conclusion, the high genetic diversity currently observed in these two Central Asian peoples most likely reflects the effects of recurrent immigration that likely started before historical times. Conversely, conquests during historical times may have had a relatively limited genetic impact. These results emphasize the need for a better understanding of the genetic consequences of transmission of culture and technological innovations, as well as those of invasions and conquests.
Specificity and timescales of cortical adaptation as inferences about natural movie statistics
Snow, Michoel; Coen-Cagli, Ruben; Schwartz, Odelia
2016-01-01
Adaptation is a phenomenological umbrella term under which a variety of temporal contextual effects are grouped. Previous models have shown that some aspects of visual adaptation reflect optimal processing of dynamic visual inputs, suggesting that adaptation should be tuned to the properties of natural visual inputs. However, the link between natural dynamic inputs and adaptation is poorly understood. Here, we extend a previously developed Bayesian modeling framework for spatial contextual effects to the temporal domain. The model learns temporal statistical regularities of natural movies and links these statistics to adaptation in primary visual cortex via divisive normalization, a ubiquitous neural computation. In particular, the model divisively normalizes the present visual input by the past visual inputs only to the degree that these are inferred to be statistically dependent. We show that this flexible form of normalization reproduces classical findings on how brief adaptation affects neuronal selectivity. Furthermore, prior knowledge acquired by the Bayesian model from natural movies can be modified by prolonged exposure to novel visual stimuli. We show that this updating can explain classical results on contrast adaptation. We also simulate the recent finding that adaptation maintains population homeostasis, namely, a balanced level of activity across a population of neurons with different orientation preferences. Consistent with previous disparate observations, our work further clarifies the influence of stimulus-specific and neuronal-specific normalization signals in adaptation. PMID:27699416
Demidenko, Eugene; Williams, Benjamin B; Flood, Ann Barry; Swartz, Harold M
2013-05-30
This paper develops a new metric, the standard error of inverse prediction (SEIP), for a dose-response relationship (calibration curve) when dose is estimated from response via inverse regression. SEIP can be viewed as a generalization of the coefficient of variation to the regression problem in which x is predicted from a y-value. We employ nonstandard statistical methods to treat the inverse prediction, which has an infinite mean and variance due to the presence of a normally distributed variable in the denominator. We develop confidence intervals and hypothesis testing for SEIP on the basis of the normal approximation and using the exact statistical inference based on the noncentral t-distribution. We derive the power functions for both approaches and test them via statistical simulations. The theoretical SEIP, as the ratio of the regression standard error to the slope, is viewed as the reciprocal of the signal-to-noise ratio, a popular measure in signal processing. The SEIP, as a figure of merit for inverse prediction, can be used for comparison of calibration curves with different dependent variables and slopes. We illustrate our theory with electron paramagnetic resonance tooth dosimetry for a rapid estimation of the radiation dose received in the event of nuclear terrorism.
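The definition of SEIP as the ratio of the regression standard error to the slope can be made concrete with a short calculation. The sketch below is an illustration only: the calibration data are hypothetical, the plug-in estimate (residual standard error divided by the absolute fitted slope) is an assumption about how one might estimate the quantity, and the paper's confidence intervals and noncentral-t inference are not reproduced.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (intercept, slope, residual standard error with n - 2 degrees of freedom).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma = math.sqrt(rss / (n - 2))
    return intercept, slope, sigma


def inverse_predict(y0, intercept, slope):
    """Estimate the dose x0 that would produce the observed response y0."""
    return (y0 - intercept) / slope


# Hypothetical calibration data: known doses and measured responses.
dose = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
response = [0.1, 2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope, sigma = fit_line(dose, response)
seip = sigma / abs(slope)  # plug-in estimate: regression standard error / slope

print(f"slope = {slope:.3f}, residual SE = {sigma:.3f}, SEIP estimate = {seip:.3f}")
print(f"dose estimated from response 5.0: {inverse_predict(5.0, intercept, slope):.3f}")
```

Because SEIP is expressed in dose units, two calibration curves with different response scales can be compared directly, which is the use the abstract proposes.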
One-dimensional statistical parametric mapping in Python.
Pataky, Todd C
2012-01-01
Statistical parametric mapping (SPM) is a topological methodology for detecting field changes in smooth n-dimensional continua. Many classes of biomechanical data are smooth and contained within discrete bounds and as such are well suited to SPM analyses. The current paper accompanies release of 'SPM1D', a free and open-source Python package for conducting SPM analyses on a set of registered 1D curves. Three example applications are presented: (i) kinematics, (ii) ground reaction forces and (iii) contact pressure distribution in probabilistic finite element modelling. In addition to offering a high-level interface to a variety of common statistical tests like t tests, regression and ANOVA, SPM1D also emphasises fundamental concepts of SPM theory through stand-alone example scripts. Source code and documentation are available at: www.tpataky.net/spm1d/.
Lagrangian statistics in forced two-dimensional turbulence
NASA Astrophysics Data System (ADS)
Kamps, Oliver; Friedrich, Rudolf
2007-11-01
In recent years the Lagrangian description of turbulent flows has attracted much interest from the experimental point of view and is also the focus of numerical and analytical investigations. We present detailed numerical investigations of Lagrangian tracer particles in the inverse energy cascade of two-dimensional turbulence. In the first part we focus on the shape and scaling properties of the probability distribution functions for the velocity increments and compare them to the Eulerian case and the increment statistics in three dimensions. Motivated by our observations we address the important question of translating increment statistics from one frame of reference to the other [1]. To reveal the underlying physical mechanism we determine numerically the involved transition probabilities. In this way we shed light on the source of Lagrangian intermittency. [1] R. Friedrich, R. Grauer, H. Hohmann, O. Kamps, A Corrsin type approximation for Lagrangian fluid Turbulence, arXiv:0705.3132
Inference of Functionally-Relevant N-acetyltransferase Residues Based on Statistical Correlations.
Neuwald, Andrew F; Altschul, Stephen F
2016-12-01
Over evolutionary time, members of a superfamily of homologous proteins sharing a common structural core diverge into subgroups filling various functional niches. At the sequence level, such divergence appears as correlations that arise from residue patterns distinct to each subgroup. Such a superfamily may be viewed as a population of sequences corresponding to a complex, high-dimensional probability distribution. Here we model this distribution as hierarchical interrelated hidden Markov models (hiHMMs), which describe these sequence correlations implicitly. By characterizing such correlations one may hope to obtain information regarding functionally-relevant properties that have thus far evaded detection. To do so, we infer a hiHMM distribution from sequence data using Bayes' theorem and Markov chain Monte Carlo (MCMC) sampling, which is widely recognized as the most effective approach for characterizing a complex, high dimensional distribution. Other routines then map correlated residue patterns to available structures with a view to hypothesis generation. When applied to N-acetyltransferases, this reveals sequence and structural features indicative of functionally important, yet generally unknown biochemical properties. Even for sets of proteins for which nothing is known beyond unannotated sequences and structures, this can lead to helpful insights. We describe, for example, a putative coenzyme-A-induced-fit substrate binding mechanism mediated by arginine residue switching between salt bridge and π-π stacking interactions. A suite of programs implementing this approach is available (psed.igs.umaryland.edu).
Inference of Functionally-Relevant N-acetyltransferase Residues Based on Statistical Correlations
Neuwald, Andrew F.
2016-01-01
Over evolutionary time, members of a superfamily of homologous proteins sharing a common structural core diverge into subgroups filling various functional niches. At the sequence level, such divergence appears as correlations that arise from residue patterns distinct to each subgroup. Such a superfamily may be viewed as a population of sequences corresponding to a complex, high-dimensional probability distribution. Here we model this distribution as hierarchical interrelated hidden Markov models (hiHMMs), which describe these sequence correlations implicitly. By characterizing such correlations one may hope to obtain information regarding functionally-relevant properties that have thus far evaded detection. To do so, we infer a hiHMM distribution from sequence data using Bayes’ theorem and Markov chain Monte Carlo (MCMC) sampling, which is widely recognized as the most effective approach for characterizing a complex, high dimensional distribution. Other routines then map correlated residue patterns to available structures with a view to hypothesis generation. When applied to N-acetyltransferases, this reveals sequence and structural features indicative of functionally important, yet generally unknown biochemical properties. Even for sets of proteins for which nothing is known beyond unannotated sequences and structures, this can lead to helpful insights. We describe, for example, a putative coenzyme-A-induced-fit substrate binding mechanism mediated by arginine residue switching between salt bridge and π-π stacking interactions. A suite of programs implementing this approach is available (psed.igs.umaryland.edu). PMID:28002465
Zhang, Kai; Traskin, Mikhail; Small, Dylan S
2012-03-01
For group-randomized trials, randomization inference based on rank statistics provides robust, exact inference against nonnormal distributions. However, in a matched-pair design, the currently available rank-based statistics lose significant power compared to normal linear mixed model (LMM) test statistics when the LMM is true. In this article, we investigate and develop an optimal test statistic over all statistics in the form of the weighted sum of signed Mann-Whitney-Wilcoxon statistics under certain assumptions. This test is almost as powerful as the LMM even when the LMM is true, but it is much more powerful for heavy tailed distributions. A simulation study is conducted to examine the power.
Validi, AbdoulAhad
2014-03-01
This study introduces a non-intrusive approach in the context of low-rank separated representation to construct a surrogate of high-dimensional stochastic functions, e.g., PDEs/ODEs, in order to decrease the computational cost of Markov Chain Monte Carlo simulations in Bayesian inference. The surrogate model is constructed via a regularized alternative least-square regression with Tikhonov regularization using a roughening matrix computing the gradient of the solution, in conjunction with a perturbation-based error indicator to detect optimal model complexities. The model approximates a vector of a continuous solution at discrete values of a physical variable. The required number of random realizations to achieve a successful approximation linearly depends on the function dimensionality. The computational cost of the model construction is quadratic in the number of random inputs, which potentially tackles the curse of dimensionality in high-dimensional stochastic functions. Furthermore, this vector-valued separated representation-based model, in comparison to the available scalar-valued case, leads to a significant reduction in the cost of approximation by an order of magnitude equal to the vector size. The performance of the method is studied through its application to three numerical examples including a 41-dimensional elliptic PDE and a 21-dimensional cavity flow.
Sojoudi, Alireza; Goodyear, Bradley G
2016-12-01
Spontaneous fluctuations of blood-oxygenation level-dependent functional magnetic resonance imaging (BOLD fMRI) signals are highly synchronous between brain regions that serve similar functions. This provides a means to investigate functional networks; however, most analysis techniques assume functional connections are constant over time. This may be problematic in the case of neurological disease, where functional connections may be highly variable. Recently, several methods have been proposed to determine moment-to-moment changes in the strength of functional connections over an imaging session (so called dynamic connectivity). Here a novel analysis framework based on a hierarchical observation modeling approach was proposed, to permit statistical inference of the presence of dynamic connectivity. A two-level linear model composed of overlapping sliding windows of fMRI signals, incorporating the fact that overlapping windows are not independent was described. To test this approach, datasets were synthesized whereby functional connectivity was either constant (significant or insignificant) or modulated by an external input. The method successfully determines the statistical significance of a functional connection in phase with the modulation, and it exhibits greater sensitivity and specificity in detecting regions with variable connectivity, when compared with sliding-window correlation analysis. For real data, this technique possesses greater reproducibility and provides a more discriminative estimate of dynamic connectivity than sliding-window correlation analysis. Hum Brain Mapp 37:4566-4580, 2016. © 2016 Wiley Periodicals, Inc.
Univariate description and bivariate statistical inference: the first step delving into data.
Zhang, Zhongheng
2016-03-01
In observational studies, the first step is usually to explore data distribution and the baseline differences between groups. Data description includes their central tendency (e.g., mean, median, and mode) and dispersion (e.g., standard deviation, range, interquartile range). There is a variety of bivariate statistical inference methods, such as Student's t-test, the Mann-Whitney U test and the Chi-square test, for normal, skewed and categorical data, respectively. The article shows how to perform these analyses with R code. Furthermore, I believe that the automation of the whole workflow is of paramount importance in that (I) it allows others to repeat your results; (II) you can easily find out how you performed the analysis during revision; (III) it spares data input by hand and is less error-prone; and (IV) when you correct your original dataset, the final result can be automatically corrected by executing the code. Therefore, the process of making a publication-quality table incorporating all abovementioned statistics and P values is provided, allowing readers to customize the code to their own needs.
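The descriptive measures listed in the abstract can be computed in a few lines. The article itself works in R; the sketch below is a Python analogue using only the standard library, with a hypothetical `ages` sample, and is not the article's code. Quartiles follow the default "exclusive" method of `statistics.quantiles`, which may differ slightly from R's default.

```python
import statistics


def describe(values):
    """Return central tendency and dispersion measures for a numeric sample."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "sd": statistics.stdev(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
    }


# Hypothetical baseline variable for one study group.
ages = [34, 45, 45, 52, 61, 67, 70, 45, 58, 49]
summary = describe(ages)

for name, value in summary.items():
    print(f"{name:>6}: {value}")
```

Keeping the summary in a function rather than typing values by hand reflects the abstract's point about automation: rerunning the script after a data correction regenerates every figure.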
Valid statistical inference methods for a case-control study with missing data.
Tian, Guo-Liang; Zhang, Chi; Jiang, Xuejun
2016-05-19
The main objective of this paper is to derive the valid sampling distribution of the observed counts in a case-control study with missing data under the assumption of missing at random by employing the conditional sampling method and the mechanism augmentation method. The proposed sampling distribution, called the case-control sampling distribution, can be used to calculate the standard errors of the maximum likelihood estimates of parameters via the Fisher information matrix and to generate independent samples for constructing small-sample bootstrap confidence intervals. Theoretical comparisons of the new case-control sampling distribution with two existing sampling distributions exhibit a large difference. Simulations are conducted to investigate the influence of the three different sampling distributions on statistical inferences. One finding is that the conclusion by the Wald test for testing independency under the two existing sampling distributions could be completely different (even contradictory) from the Wald test for testing the equality of the success probabilities in control/case groups under the proposed distribution. A real cervical cancer data set is used to illustrate the proposed statistical methods.
Sex, lies, and statistics: inferences from the child sexual abuse accommodation syndrome.
Weiss, Kenneth J; Curcio Alexander, Julia
2013-01-01
Victims of child sexual abuse often recant their complaints or do not report incidents, making prosecution of offenders difficult. The child sexual abuse accommodation syndrome (CSAAS) has been used to explain this phenomenon by identifying common behavioral responses. Unlike PTSD but like rape trauma syndrome, CSAAS is not an official diagnostic term and should not be used as evidence of a defendant's guilt or to imply probative value in prosecutions. Courts have grappled with the ideal use of CSAAS in the evaluation of child witness testimony. Expert testimony should be helpful to the jurors without prejudicing them. The New Jersey Supreme Court ruled recently that statistical evidence about CSAAS implying the probability that a child is truthful runs the risk of confusing jury members and biasing them against the defendant. We review the parameters of expert testimony and its admissibility in this area, concluding that statistics about CSAAS should not be used to draw inferences about the victim's credibility or the defendant's guilt.
Emmert-Streib, Frank; Glazko, Galina V; Altay, Gökmen; de Matos Simoes, Ricardo
2012-01-01
In this paper, we present a systematic and conceptual overview of methods for inferring gene regulatory networks from observational gene expression data. Further, we discuss two classic approaches to infer causal structures and compare them with contemporary methods by providing a conceptual categorization thereof. We complement the above by surveying global and local evaluation measures for assessing the performance of inference algorithms.
Lu, Tsui-Shan; Longnecker, Matthew P; Zhou, Haibo
2017-03-15
Outcome-dependent sampling (ODS) is a cost-effective sampling scheme in which one observes the exposure with a probability that depends on the outcome. Well-known examples of such designs are the case-control design for binary responses, the case-cohort design for failure time data, and the general ODS design for a continuous response. While substantial work has been carried out for the univariate response case, statistical inference and design for ODS with multivariate responses remain under-developed. Motivated by the need in biological studies to take advantage of the available responses for subjects in a cluster, we propose a multivariate outcome-dependent sampling (multivariate-ODS) design that is based on a general selection of the continuous responses within a cluster. The proposed inference procedure for the multivariate-ODS design is semiparametric, where all the underlying distributions of covariates are modeled nonparametrically using empirical likelihood methods. We show that the proposed estimator is consistent and establish its asymptotic normality. Simulation studies show that the proposed estimator is more efficient than the estimator obtained using only the simple-random-sample portion of the multivariate-ODS or the estimator from a simple random sample with the same sample size. The multivariate-ODS design together with the proposed estimator provides an approach to further improve study efficiency for a given fixed study budget. We illustrate the proposed design and estimator with an analysis of the association of polychlorinated biphenyl exposure with hearing loss in children born to the Collaborative Perinatal Study. Copyright © 2016 John Wiley & Sons, Ltd.
Duchesne, Thierry; Fortin, Daniel; Rivest, Louis-Paul
2015-01-01
Animal movement has a fundamental impact on population and community structure and dynamics. Biased correlated random walks (BCRW) and step selection functions (SSF) are commonly used to study movements. Because no studies have contrasted the parameters and the statistical properties of their estimators for models constructed under these two Lagrangian approaches, it remains unclear whether or not they allow for similar inference. First, we used the Weak Law of Large Numbers to demonstrate that the log-likelihood function for estimating the parameters of BCRW models can be approximated by the log-likelihood of SSFs. Second, we illustrated the link between the two approaches by fitting BCRW with maximum likelihood and with SSF to simulated movement data in virtual environments and to the trajectory of bison (Bison bison L.) trails in natural landscapes. Using simulated and empirical data, we found that the parameters of a BCRW estimated directly from maximum likelihood and by fitting an SSF were remarkably similar. Movement analysis is increasingly used as a tool for understanding the influence of landscape properties on animal distribution. In the rapidly developing field of movement ecology, management and conservation biologists must decide which method they should implement to accurately assess the determinants of animal movement. We showed that BCRW and SSF can provide similar insights into the environmental features influencing animal movements. Both techniques have advantages. BCRW has already been extended to allow for multi-state modeling. Unlike BCRW, however, SSF can be estimated using most statistical packages, it can simultaneously evaluate habitat selection and movement biases, and can easily integrate a large number of movement taxes at multiple scales. SSF thus offers a simple, yet effective, statistical technique to identify movement taxis. PMID:25898019
Inference and Decoding of Motor Cortex Low-Dimensional Dynamics via Latent State-Space Models
Aghagolzadeh, Mehdi; Truccolo, Wilson
2016-01-01
Motor cortex neuronal ensemble spiking activity exhibits strong low-dimensional collective dynamics (i.e., coordinated modes of activity) during behavior. Here, we demonstrate that these low-dimensional dynamics, revealed by unsupervised latent state-space models, can provide as accurate or better reconstruction of movement kinematics as direct decoding from the entire recorded ensemble. Ensembles of single neurons were recorded with triple microelectrode arrays (MEAs) implanted in ventral and dorsal premotor (PMv, PMd) and primary motor (M1) cortices while nonhuman primates performed 3-D reach-to-grasp actions. Low-dimensional dynamics were estimated via various types of latent state-space models including, for example, Poisson linear dynamic system (PLDS) models. Decoding from low-dimensional dynamics was implemented via point process and Kalman filters coupled in series. We also examined decoding based on a predictive subsampling of the recorded population. In this case, a supervised greedy procedure selected neuronal subsets that optimized decoding performance. When comparing decoding based on predictive subsampling and latent state-space models, the size of the neuronal subset was set to the same number of latent state dimensions. Overall, our findings suggest that information about naturalistic reach kinematics present in the recorded population is preserved in the inferred low-dimensional motor cortex dynamics. Furthermore, decoding based on unsupervised PLDS models may also outperform previous approaches based on direct decoding from the recorded population or on predictive subsampling. PMID:26336135
Velocity statistics in two-dimensional granular turbulence
NASA Astrophysics Data System (ADS)
Isobe, Masaharu
2003-10-01
We studied the macroscopic statistical properties of the freely evolving quasielastic hard disk (granular) system by systematically performing large-scale (up to a few million particles) event-driven molecular dynamics simulations and found it to be remarkably analogous to an enstrophy cascade process in decaying two-dimensional fluid turbulence. There are four typical stages in the freely evolving inelastic hard disk system, which are homogeneous, shearing (vortex), clustering, and final state. In the shearing stage, the self-organized macroscopic coherent vortices become dominant. In the clustering stage, the energy spectra are close to the expectation of Kraichnan-Batchelor theory and the squared two-particle separation strictly obeys the Richardson law.
NASA Astrophysics Data System (ADS)
Hu, Zixi; Yao, Zhewei; Li, Jinglai
2017-03-01
Many scientific and engineering problems require performing Bayesian inference for unknowns of infinite dimension. In such problems, many standard Markov Chain Monte Carlo (MCMC) algorithms become arbitrarily slow under mesh refinement, which is referred to as being dimension dependent. To address this, a family of dimension-independent MCMC algorithms, known as the preconditioned Crank-Nicolson (pCN) methods, was proposed to sample the infinite-dimensional parameters. In this work we develop an adaptive version of the pCN algorithm, where the covariance operator of the proposal distribution is adjusted based on sampling history to improve the simulation efficiency. We show that the proposed algorithm satisfies an important ergodicity condition under some mild assumptions. Finally, we provide numerical examples to demonstrate the performance of the proposed method.
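For orientation, the basic (non-adaptive) pCN step that the abstract builds on can be sketched as follows. This is a minimal illustration, not the authors' adaptive algorithm: it assumes a finite-dimensional discretization with a standard Gaussian prior N(0, I), and the names `pcn_sample`, `example_phi`, and the toy likelihood are hypothetical. The proposal is v = sqrt(1 - beta^2) u + beta xi with xi drawn from the prior, and the acceptance probability depends only on the change in the negative log-likelihood phi.

```python
import math
import random


def pcn_sample(phi, dim, beta, n_steps, seed=0):
    """Plain pCN sampler for a target proportional to exp(-phi(u)) * N(0, I).

    Assumption: the prior covariance is the identity; a real application
    would draw xi from the problem's own Gaussian prior.
    """
    rng = random.Random(seed)
    u = [0.0] * dim
    phi_u = phi(u)
    scale = math.sqrt(1.0 - beta * beta)
    samples = []
    accepted = 0
    for _ in range(n_steps):
        # Proposal that leaves the Gaussian prior invariant.
        xi = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        v = [scale * ui + beta * x for ui, x in zip(u, xi)]
        phi_v = phi(v)
        # Accept with probability min(1, exp(phi(u) - phi(v))).
        log_alpha = phi_u - phi_v
        if log_alpha >= 0.0 or rng.random() < math.exp(log_alpha):
            u, phi_u = v, phi_v
            accepted += 1
        samples.append(list(u))
    return samples, accepted / n_steps


def example_phi(u):
    # Hypothetical negative log-likelihood: one observation y = 1.0 of u[0]
    # with Gaussian noise of standard deviation 0.5.
    return 0.5 * ((1.0 - u[0]) / 0.5) ** 2


samples, rate = pcn_sample(example_phi, dim=3, beta=0.5, n_steps=20000, seed=1)
kept = samples[2000:]  # discard burn-in
print("acceptance rate:", rate)
print("posterior mean of u[0]:", sum(s[0] for s in kept) / len(kept))
```

Because the prior never enters the acceptance ratio, a constant phi is accepted at every step regardless of `dim`, which is the sense in which the method is dimension independent. An adaptive variant would additionally tune the proposal covariance from the chain history.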
Brannigan, V.M.; Bier, V.M.; Berg, C.
1992-09-01
Toxic torts are product liability cases dealing with alleged injuries due to chemical or biological hazards such as radiation, thalidomide, or Agent Orange. Toxic tort cases typically rely more heavily than other product liability cases on indirect or statistical proof of injury. However, there have been only a handful of actual legal decisions regarding the use of such statistical evidence, and most of those decisions have been inconclusive. Recently, a major case from the Fifth Circuit, involving allegations that Bendectin (a morning sickness drug) caused birth defects, was decided entirely on the basis of statistical inference. This paper examines both the conceptual basis of that decision and the relationships among statistical inference, scientific evidence, and the rules of product liability in general. 23 refs.
Conn, Paul B.; Johnson, Devin S.; Ver Hoef, Jay M.; Hooten, Mevin B.; London, Joshua M.; Boveng, Peter L.
2015-01-01
Ecologists often fit models to survey data to estimate and explain variation in animal abundance. Such models typically require that animal density remains constant across the landscape where sampling is being conducted, a potentially problematic assumption for animals inhabiting dynamic landscapes or otherwise exhibiting considerable spatiotemporal variation in density. We review several concepts from the burgeoning literature on spatiotemporal statistical models, including the nature of the temporal structure (i.e., descriptive or dynamical) and strategies for dimension reduction to promote computational tractability. We also review several features as they specifically relate to abundance estimation, including boundary conditions, population closure, choice of link function, and extrapolation of predicted relationships to unsampled areas. We then compare a suite of novel and existing spatiotemporal hierarchical models for animal count data that permit animal density to vary over space and time, including formulations motivated by resource selection and allowing for closed populations. We gauge the relative performance (bias, precision, computational demands) of alternative spatiotemporal models when confronted with simulated and real data sets from dynamic animal populations. For the latter, we analyze spotted seal (Phoca largha) counts from an aerial survey of the Bering Sea where the quantity and quality of suitable habitat (sea ice) changed dramatically while surveys were being conducted. Simulation analyses suggested that multiple types of spatiotemporal models provide reasonable inference (low positive bias, high precision) about animal abundance, but have potential for overestimating precision. Analysis of spotted seal data indicated that several model formulations, including those based on a log-Gaussian Cox process, had a tendency to overestimate abundance. By contrast, a model that included a population closure assumption and a scale prior on total
Toward 'smart' DNA microarrays: algorithms for improving data quality and statistical inference
NASA Astrophysics Data System (ADS)
Bakewell, David J. G.; Wit, Ernst
2007-12-01
DNA microarrays are a laboratory tool for understanding biological processes at the molecular scale and future applications of this technology include healthcare, agriculture, and environment. Despite their usefulness, however, the information microarrays make available to the end-user is not used optimally, and the data is often noisy and of variable quality. This paper describes the use of hierarchical Maximum Likelihood Estimation (MLE) for generating algorithms that improve the quality of microarray data and enhance statistical inference about gene behavior. The paper describes examples of recent work that improves microarray performance, demonstrated using data from both Monte Carlo simulations and published experiments. One example looks at the variable quality of cDNA spots on a typical microarray surface. It is shown how algorithms, derived using MLE, are used to "weight" these spots according to their morphological quality, and subsequently lead to improved detection of gene activity. Another example, briefly discussed, addresses the "noisy data about too many genes" issue confronting many analysts who are also interested in the collective action of a group of genes, often organized as a pathway or complex. Preliminary work is described where MLE is used to "share" variance information across a pre-assigned group of genes of interest, leading to improved detection of gene activity.
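The abstract does not spell out its hierarchical model, so the following is only a textbook illustration of how maximum likelihood leads to quality weights. It assumes each spot gives an independent Gaussian measurement of the same log-expression value with a known, spot-specific variance (a stand-in for morphological quality); under that assumption the MLE is the inverse-variance weighted mean. The function name `weighted_mle_mean` and the numbers are hypothetical.

```python
def weighted_mle_mean(values, variances):
    """MLE of a common mean from independent Gaussian measurements.

    Assumption: variances are known per measurement; poor-quality spots
    are represented by larger variances and so receive smaller weights.
    Returns the estimate and its variance.
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    estimate = sum(w * x for w, x in zip(weights, values)) / total
    return estimate, 1.0 / total


# Two replicate spots: a clean one and a noisier, poorly formed one.
estimate, variance = weighted_mle_mean([1.0, 3.0], [1.0, 3.0])
print(estimate, variance)
```

With equal variances the estimate reduces to the ordinary mean, so the weighting only matters when spot quality actually differs.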
NASA Astrophysics Data System (ADS)
Knobles, David; Stotts, Steven; Sagers, Jason
2012-03-01
Why can one obtain from similar measurements a greater amount of information about cosmological parameters than seabed parameters in ocean waveguides? The cosmological measurements are in the form of a power spectrum constructed from spatial correlations of temperature fluctuations within the microwave background radiation. The seabed acoustic measurements are in the form of spatial correlations along the length of a spatial aperture. This study explores the above question from the perspective of posterior probability distributions obtained from maximizing a relative entropy functional. An answer is in part that the seabed in shallow ocean environments generally has large temporal and spatial inhomogeneities, whereas the early universe was a nearly homogeneous cosmological soup with small but important fluctuations. Acoustic propagation models used in shallow water acoustics generally do not capture spatial and temporal variability sufficiently well, which leads to model error dominating the statistical inference problem. This is not the case in cosmology. Further, the physics of the acoustic modes in cosmology is that of a standing wave with simple initial conditions, whereas for underwater acoustics it is a traveling wave in a strongly inhomogeneous bounded medium.
Sassenhagen, Jona; Alday, Phillip M
2016-11-01
Experimental research on behavior and cognition frequently rests on stimulus or subject selection where not all characteristics can be fully controlled, even when attempting strict matching. For example, when contrasting patients to controls, variables such as intelligence or socioeconomic status are often correlated with patient status. Similarly, when presenting word stimuli, variables such as word frequency are often correlated with primary variables of interest. One procedure very commonly employed to control for such nuisance effects is conducting inferential tests on confounding stimulus or subject characteristics. For example, if word length is not significantly different for two stimulus sets, they are considered as matched for word length. Such a test has high error rates and is conceptually misguided. It reflects a common misunderstanding of statistical tests: interpreting significance not to refer to inference about a particular population parameter, but about (1) the sample in question and (2) the practical relevance of a sample difference (so that a nonsignificant test is taken to indicate evidence for the absence of relevant differences). We show inferential testing for assessing nuisance effects to be inappropriate both pragmatically and philosophically, present a survey showing its high prevalence, and briefly discuss an alternative in the form of regression including nuisance variables.
Schlichting, Margaret L; Guarino, Katharine F; Schapiro, Anna C; Turk-Browne, Nicholas B; Preston, Alison R
2017-01-01
Despite the importance of learning and remembering across the lifespan, little is known about how the episodic memory system develops to support the extraction of associative structure from the environment. Here, we relate individual differences in volumes along the hippocampal long axis to performance on statistical learning and associative inference tasks (both of which require encoding associations that span multiple episodes) in a developmental sample ranging from ages 6 to 30 years. Relating age to volume, we found dissociable patterns across the hippocampal long axis, with opposite nonlinear volume changes in the head and body. These structural differences were paralleled by performance gains across the age range on both tasks, suggesting improvements in the cross-episode binding ability from childhood to adulthood. Controlling for age, we also found that smaller hippocampal heads were associated with superior behavioral performance on both tasks, consistent with this region's hypothesized role in forming generalized codes spanning events. Collectively, these results highlight the importance of examining hippocampal development as a function of position along the hippocampal axis and suggest that the hippocampal head is particularly important in encoding associative structure across development.
Racing to learn: statistical inference and learning in a single spiking neuron with adaptive kernels
Afshar, Saeed; George, Libin; Tapson, Jonathan; van Schaik, André; Hamilton, Tara J.
2014-01-01
This paper describes the Synapto-dendritic Kernel Adapting Neuron (SKAN), a simple spiking neuron model that performs statistical inference and unsupervised learning of spatiotemporal spike patterns. SKAN is the first proposed neuron model to investigate the effects of dynamic synapto-dendritic kernels and demonstrate their computational power even at the single neuron scale. The rule-set defining the neuron is simple: there are no complex mathematical operations such as normalization, exponentiation or even multiplication. The functionalities of SKAN emerge from the real-time interaction of simple additive and binary processes. Like a biological neuron, SKAN is robust to signal and parameter noise, and can utilize both in its operations. At the network scale neurons are locked in a race with each other with the fastest neuron to spike effectively “hiding” its learnt pattern from its neighbors. The robustness to noise, high speed, and simple building blocks not only make SKAN an interesting neuron model in computational neuroscience, but also make it ideal for implementation in digital and analog neuromorphic systems which is demonstrated through an implementation in a Field Programmable Gate Array (FPGA). Matlab, Python, and Verilog implementations of SKAN are available at: http://www.uws.edu.au/bioelectronics_neuroscience/bens/reproducible_research. PMID:25505378
Statistical Inference of a RANS closure for a Jet-in-Crossflow simulation
NASA Astrophysics Data System (ADS)
Heyse, Jan; Edeling, Wouter; Iaccarino, Gianluca
2016-11-01
The jet-in-crossflow is found in several engineering applications, such as discrete film cooling for turbine blades, where a coolant injected through holes in the blade's surface protects the component from the hot gases leaving the combustion chamber. Experimental measurements using MRI techniques have been completed for a single hole injection into a turbulent crossflow, providing a full 3D averaged velocity field. For such flows of engineering interest, Reynolds-Averaged Navier-Stokes (RANS) turbulence closure models are often the only viable computational option. However, RANS models are known to provide poor predictions in the region close to the injection point. Since these models are calibrated on simple canonical flow problems, the obtained closure coefficient estimates are unlikely to extrapolate well to more complex flows. We will therefore calibrate the parameters of a RANS model using statistical inference techniques informed by the experimental jet-in-crossflow data. The obtained probabilistic parameter estimates can in turn be used to compute flow fields with quantified uncertainty. Stanford Graduate Fellowship in Science and Engineering.
Statistical inference of selection and divergence of the rice blast resistance gene Pi-ta.
Amei, Amei; Lee, Seonghee; Mysore, Kirankumar S; Jia, Yulin
2014-10-21
The resistance gene Pi-ta has been effectively used to control rice blast disease, but some populations of cultivated and wild rice have evolved resistance. Insights into the evolutionary processes that led to this resistance during crop domestication may be inferred from the population history of domesticated and wild rice strains. In this study, we applied a recently developed statistical method, time-dependent Poisson random field model, to examine the evolution of the Pi-ta gene in cultivated and weedy rice. Our study suggests that the Pi-ta gene may have more recently introgressed into cultivated rice, indica and japonica, and U.S. weedy rice from the wild species, O. rufipogon. In addition, the Pi-ta gene is under positive selection in japonica, tropical japonica, U.S. cultivars and U.S. weedy rice. We also found that sequences of two domains of the Pi-ta gene, the nucleotide binding site and leucine-rich repeat domain, are highly conserved among all rice accessions examined. Our results provide a valuable analytical tool for understanding the evolution of disease resistance genes in crop plants.
Ogunnaike, Babatunde A; Gelmi, Claudio A; Edwards, Jeremy S
2010-05-21
Gene expression studies generate large quantities of data with the defining characteristic that the number of genes (whose expression profiles are to be determined) exceeds the number of available replicates by several orders of magnitude. Standard spot-by-spot analysis still seeks to extract useful information for each gene on the basis of the number of available replicates, and thus plays to the weakness of microarrays. On the other hand, because of the data volume, treating the entire data set as an ensemble and developing theoretical distributions for these ensembles provides a framework that plays instead to the strength of microarrays. We present theoretical results showing that, under reasonable assumptions, the distribution of microarray intensities follows the Gamma model, with the biological interpretations of the model parameters emerging naturally. We subsequently establish that for each microarray data set, the fractional intensities can be represented as a mixture of Beta densities, and develop a procedure for using these results to draw statistical inference regarding differential gene expression. We illustrate the results with experimental data from gene expression studies on Deinococcus radiodurans following DNA damage using cDNA microarrays.
Maximum entropy approach to statistical inference for an ocean acoustic waveguide.
Knobles, D P; Sagers, J D; Koch, R A
2012-02-01
A conditional probability distribution suitable for estimating the statistical properties of ocean seabed parameter values inferred from acoustic measurements is derived from a maximum entropy principle. The specification of the expectation value for an error function constrains the maximization of an entropy functional. This constraint determines the sensitivity factor (β) to the error function of the resulting probability distribution, which is a canonical form that provides a conservative estimate of the uncertainty of the parameter values. From the conditional distribution, marginal distributions for individual parameters can be determined from integration over the other parameters. The approach is an alternative to obtaining the posterior probability distribution without an intermediary determination of the likelihood function followed by an application of Bayes' rule. In this paper the expectation value that specifies the constraint is determined from the values of the error function for the model solutions obtained from a sparse number of data samples. The method is applied to ocean acoustic measurements taken on the New Jersey continental shelf. The marginal probability distribution for the values of the sound speed ratio at the surface of the seabed and the source levels of a towed source are examined for different geoacoustic model representations.
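The role of the sensitivity factor β can be illustrated with a small numerical sketch. It assumes a finite set of candidate models with known error-function values and the canonical form p_i proportional to exp(-β E_i); β is then chosen so that the expected error matches a prescribed value. The function names, the bisection solver, and the toy numbers are assumptions made for illustration, not the procedure used in the paper.

```python
import math


def canonical_probs(errors, beta):
    """Canonical (maximum entropy) weights p_i proportional to exp(-beta * E_i)."""
    e_min = min(errors)  # shift for numerical stability
    weights = [math.exp(-beta * (e - e_min)) for e in errors]
    z = sum(weights)
    return [w / z for w in weights]


def expected_error(errors, beta):
    probs = canonical_probs(errors, beta)
    return sum(p * e for p, e in zip(probs, errors))


def solve_beta(errors, target, tol=1e-10):
    """Find beta >= 0 such that the expected error equals `target`.

    The expected error decreases monotonically in beta, from the plain
    average (beta = 0) towards min(errors), so bisection applies.
    """
    mean_error = sum(errors) / len(errors)
    if not (min(errors) < target <= mean_error):
        raise ValueError("target must lie in (min(errors), mean(errors)]")
    lo, hi = 0.0, 1.0
    while expected_error(errors, hi) > target:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if expected_error(errors, mid) > target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Toy case: two candidate models with errors 0 and 1, constraint <E> = 0.25.
beta = solve_beta([0.0, 1.0], 0.25)
print(beta, canonical_probs([0.0, 1.0], beta))
```

For this two-model case the constraint can be solved by hand, giving β = ln 3, which is a convenient check on the solver.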
NASA Astrophysics Data System (ADS)
Hasan, A.; Maloney, C. E.
2014-12-01
We compute the effective dispersion and vibrational density of states (DOS) of two-dimensional subregions of three-dimensional face-centered-cubic crystals using both a direct projection-inversion technique and a Monte Carlo simulation based on a common underlying Hamiltonian. We study both a (111) and (100) plane. We show that for any given direction of wave vector, both (111) and (100) show an anomalous ω² ∼ q regime at low q, where ω² is the energy associated with the given mode and q is its wave number. The ω² ∼ q scaling should be expected to give rise to an anomalous DOS, D_ω, at low ω: D_ω ∼ ω³ rather than the conventional Debye result, D_ω ∼ ω². The DOS for (100) looks to be consistent with D_ω ∼ ω³, while (111) shows something closer to the conventional Debye result at the smallest frequencies. In addition to the direct projection-inversion calculation, we perform Monte Carlo simulations to study the effects of finite sampling statistics. We show that finite sampling artifacts act as an effective disorder and bias D_ω, giving a behavior closer to D_ω ∼ ω² than D_ω ∼ ω³. These results should have an important impact on the interpretation of recent studies of colloidal solids where the two-point displacement correlations can be obtained directly in real space via microscopy.
Hasan, A; Maloney, C E
2014-12-01
We compute the effective dispersion and vibrational density of states (DOS) of two-dimensional subregions of three-dimensional face-centered-cubic crystals using both a direct projection-inversion technique and a Monte Carlo simulation based on a common underlying Hamiltonian. We study both a (111) and (100) plane. We show that for any given direction of wave vector, both (111) and (100) show an anomalous ω² ∼ q regime at low q, where ω² is the energy associated with the given mode and q is its wave number. The ω² ∼ q scaling should be expected to give rise to an anomalous DOS, D(ω), at low ω: D(ω) ∼ ω³ rather than the conventional Debye result, D(ω) ∼ ω². The DOS for (100) looks to be consistent with D(ω) ∼ ω³, while (111) shows something closer to the conventional Debye result at the smallest frequencies. In addition to the direct projection-inversion calculation, we perform Monte Carlo simulations to study the effects of finite sampling statistics. We show that finite sampling artifacts act as an effective disorder and bias D(ω), giving a behavior closer to D(ω) ∼ ω² than D(ω) ∼ ω³. These results should have an important impact on the interpretation of recent studies of colloidal solids where the two-point displacement correlations can be obtained directly in real space via microscopy.
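The step from the anomalous dispersion to the cubic density of states can be made explicit with a short mode-counting sketch. It assumes the modes are counted in the two-dimensional wave-vector plane of the subregion, so that the number of modes with wave number below q grows as q squared; the abstract does not state the authors' own derivation, and the constant c and the cumulative count N are symbols introduced here for illustration.

```latex
\begin{align}
  N(q) &\propto q^{2}
    && \text{(mode count in a 2D wave-vector plane)} \\
  \omega^{2} &= c\,q \;\;\Rightarrow\;\; q = \omega^{2}/c
    && \text{(anomalous low-$q$ dispersion)} \\
  N(\omega) &\propto \omega^{4} \\
  D(\omega) &= \frac{dN}{d\omega} \propto \omega^{3}
\end{align}
```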
NASA Astrophysics Data System (ADS)
Riccio, A.; Caporaso, L.; di Giuseppe, F.; Bonafè, G.; Gobbi, G. P.; Angelini, A.
2010-09-01
The current availability of low-cost commercial LIDARs/ceilometers provides the opportunity to employ these active instruments widely for continuous observation of the planetary boundary layer (PBL) evolution, which could serve both air-quality model initialization and numerical weather prediction system evaluation. Their range-corrected signal is in fact proportional to the aerosol backscatter cross section, and therefore, in clear conditions, it allows the PBL evolution to be tracked using aerosols as markers. The LIDAR signal is then processed to retrieve an estimate of the PBL mixing height. A standard approach uses the so-called wavelet covariance transform (WCT) method, which consists of the convolution of the vertical signal with a step function and is able to detect local discontinuities in the backscatter profile. There are, nevertheless, several drawbacks which have to be considered when the WCT method is employed. Since water droplets may have a very large extinction and backscattering cross section, the presence of rain, clouds or fog decreases the returning signal, causing interference and uncertainties in the mixing height retrievals. Moreover, if vertical mixing is scarce, aerosols remain suspended in a persistent residual layer which is detected even if it is not significantly connected to the actual mixing height. Finally, multiple layers are also a cause of uncertainty. In this work we present a novel methodology to infer the height of planetary boundary layers (PBLs) from LIDAR data which corrects the unrealistic fluctuations introduced by the WCT method. It implements the assimilation of WCT-PBL height estimates into a Bayesian statistical inference procedure which includes a physical model for the boundary layer (bulk model) as the first guess hypothesis. A hierarchical Bayesian Markov chain Monte Carlo (MCMC) approach is then used to explore the posterior state space and calculate the data likelihood of previously assigned
Stang, Andreas; Deckert, Markus; Poole, Charles; Rothman, Kenneth J
2017-01-01
Since its introduction in the twentieth century, null hypothesis significance testing (NHST), a hybrid of significance testing (ST) advocated by Fisher and null hypothesis testing (NHT) developed by Neyman and Pearson, has become widely adopted but has also been a source of debate. The principal alternative to such testing is estimation with point estimates and confidence intervals (CI). Our aim was to estimate time trends in NHST, ST, NHT and CI reporting in abstracts of major medical and epidemiological journals. We reviewed 89,533 abstracts in five major medical journals and seven major epidemiological journals, 1975-2014, and estimated time trends in the proportions of abstracts containing statistical inference. In those abstracts, we estimated time trends in the proportions relying on NHST and its major variants, ST and NHT, and in the proportions reporting CIs without explicit use of NHST (CI-only approach). The CI-only approach rose monotonically during the study period in the abstracts of all journals. In Epidemiology abstracts, as a result of the journal's editorial policy, the CI-only approach has always been the most common approach. In the other 11 journals, the NHST approach started out more common, but by 2014, this disparity had narrowed, disappeared or reversed in 9 of them. The exceptions were JAMA, New England Journal of Medicine, and Lancet abstracts, where the predominance of the NHST approach prevailed over time. In 2014, the CI-only approach is as popular as the NHST approach in the abstracts of 4 of the epidemiology journals: the American Journal of Epidemiology (48%), the Annals of Epidemiology (55%), Epidemiology (79%) and the International Journal of Epidemiology (52%). The reporting of CIs without explicitly interpreting them as statistical tests is becoming more common in abstracts, particularly in epidemiology journals. Although NHST is becoming less popular in abstracts of most epidemiology journals studied and some widely read medical
Gross, Kevin; Rosenheim, Jay A
2011-10-01
Secondary pest outbreaks occur when the use of a pesticide to reduce densities of an unwanted target pest species triggers subsequent outbreaks of other pest species. Although secondary pest outbreaks are thought to be familiar in agriculture, their rigorous documentation is made difficult by the challenges of performing randomized experiments at suitable scales. Here, we quantify the frequency and monetary cost of secondary pest outbreaks elicited by early-season applications of broad-spectrum insecticides to control the plant bug Lygus spp. (primarily L. hesperus) in cotton grown in the San Joaquin Valley, California, USA. We do so by analyzing pest-control management practices for 969 cotton fields spanning nine years and 11 private ranches. Our analysis uses statistical methods to draw formal causal inferences from nonexperimental data that have become popular in public health and economics, but that are not yet widely known in ecology or agriculture. We find that, in fields that received an early-season broad-spectrum insecticide treatment for Lygus, 20.2% +/- 4.4% (mean +/- SE) of late-season pesticide costs were attributable to secondary pest outbreaks elicited by the early-season insecticide application for Lygus. In 2010 U.S. dollars, this equates to an additional $6.00 +/- $1.30 (mean +/- SE) per acre in management costs. To the extent that secondary pest outbreaks may be driven by eliminating pests' natural enemies, these figures place a lower bound on the monetary value of ecosystem services provided by native communities of arthropod predators and parasitoids in this agricultural system.
Vincent, Martin; Mundbjerg, Kamilla; Skou Pedersen, Jakob; Liang, Gangning; Jones, Peter A; Ørntoft, Torben Falck; Dalsgaard Sørensen, Karina; Wiuf, Carsten
2017-02-21
The study of epigenetic heterogeneity at the level of individual cells and in whole populations is the key to understanding cellular differentiation, organismal development, and the evolution of cancer. We develop a statistical method, epiG, to infer and differentiate between different epi-allelic haplotypes, annotated with CpG methylation status and DNA polymorphisms, from whole-genome bisulfite sequencing data, and nucleosome occupancy from NOMe-seq data. We demonstrate the capabilities of the method by inferring allele-specific methylation and nucleosome occupancy in cell lines, and colon and tumor samples, and by benchmarking the method against independent experimental data.
Statistical properties of two-dimensional magnetohydrodynamic turbulence
NASA Astrophysics Data System (ADS)
Biskamp, D.; Welter, H.; Walter, M.
1990-12-01
The statistical properties of two-dimensional (2-D) magnetohydrodynamic (MHD) turbulence are studied by means of high-resolution numerical simulations. As a theoretical point of reference, the β model of intermittent turbulence is adapted to the MHD case. Comparison of simulation results for energy spectra with the β-model predictions shows intermittency corrections to be small, δ < 0.2, while fourth-order correlation functions exhibit a stronger effect, δ ≃ 0.35, consistent with the numerically observed Reynolds-number dependence of the flatness factor F ∝ R_λ^(1/2). An argument is given that this scaling, valid for R_λ ∼ 10², is, however, not characteristic of the asymptotic regime R_λ → ∞, where a constant value of F is to be expected. The probability distributions of the field differences δv(x,t), δB(x,t) are Gaussian for large separation x or t, approaching an approximately exponential distribution for x, t → 0. This behavior can be understood by a simple probabilistic argument. The probability distribution of the local energy dissipation rate ε is roughly consistent with a log-normal distribution at larger ε but shows a different behavior at small ε.
Sweeney, Elizabeth M; Shinohara, Russell T; Shiee, Navid; Mateen, Farrah J; Chudgar, Avni A; Cuzzocreo, Jennifer L; Calabresi, Peter A; Pham, Dzung L; Reich, Daniel S; Crainiceanu, Ciprian M
2013-01-01
Magnetic resonance imaging (MRI) can be used to detect lesions in the brains of multiple sclerosis (MS) patients and is essential for diagnosing the disease and monitoring its progression. In practice, lesion load is often quantified by either manual or semi-automated segmentation of MRI, which is time-consuming, costly, and associated with large inter- and intra-observer variability. We propose OASIS is Automated Statistical Inference for Segmentation (OASIS), an automated statistical method for segmenting MS lesions in MRI studies. We use logistic regression models incorporating multiple MRI modalities to estimate voxel-level probabilities of lesion presence. Intensity-normalized T1-weighted, T2-weighted, fluid-attenuated inversion recovery and proton density volumes from 131 MRI studies (98 MS subjects, 33 healthy subjects) with manual lesion segmentations were used to train and validate our model. Within this set, OASIS detected lesions with a partial area under the receiver operating characteristic curve for clinically relevant false positive rates of 1% and below of 0.59% (95% CI; [0.50%, 0.67%]) at the voxel level. An experienced MS neuroradiologist compared these segmentations to those produced by LesionTOADS, an image segmentation software that provides segmentation of both lesions and normal brain structures. For lesions, OASIS out-performed LesionTOADS in 74% (95% CI: [65%, 82%]) of cases for the 98 MS subjects. To further validate the method, we applied OASIS to 169 MRI studies acquired at a separate center. The neuroradiologist again compared the OASIS segmentations to those from LesionTOADS. For lesions, OASIS ranked higher than LesionTOADS in 77% (95% CI: [71%, 83%]) of cases. For a randomly selected subset of 50 of these studies, one additional radiologist and one neurologist also scored the images. Within this set, the neuroradiologist ranked OASIS higher than LesionTOADS in 76% (95% CI: [64%, 88%]) of cases, the neurologist 66% (95% CI: [52%, 78
Wallace, D L; Perlman, M D
1980-06-01
This report describes the research activities of the Department of Statistics, University of Chicago, during the period June 15, 1975 to July 30, 1979. Nine research projects are briefly described on the following subjects: statistical computing and approximation techniques in statistics; numerical computation of first passage distributions; probabilities of large deviations; combining independent tests of significance; small-sample efficiencies of tests and estimates; improved procedures for simultaneous estimation and testing of many correlations; statistical computing and improved regression methods; comparison of several populations; and unbiasedness in multivariate statistics. A description of the statistical consultation activities of the Department that are of interest to DOE, in particular, the scientific interactions between the Department and the scientists at Argonne National Laboratories, is given. A list of publications issued during the term of the contract is included.
Garcia-Retamero, Rocio; Hoffrage, Ulrich
2013-04-01
Doctors and patients have difficulty inferring the predictive value of a medical test from information about the prevalence of a disease and the sensitivity and false-positive rate of the test. Previous research has established that communicating such information in a format the human mind is adapted to (namely natural frequencies), as compared to probabilities, boosts accuracy of diagnostic inferences. In a study, we investigated to what extent these inferences can be improved, beyond the effect of natural frequencies, by providing visual aids. Participants were 81 doctors and 81 patients who made diagnostic inferences about three medical tests on the basis of information about prevalence of a disease, and the sensitivity and false-positive rate of the tests. Half of the participants received the information in natural frequencies, while the other half received the information in probabilities. Half of the participants only received numerical information, while the other half additionally received a visual aid representing the numerical information. In addition, participants completed a numeracy scale. Our study showed three important findings: (1) doctors and patients made more accurate inferences when information was communicated in natural frequencies as compared to probabilities; (2) visual aids boosted accuracy even when the information was provided in natural frequencies; and (3) doctors were more accurate in their diagnostic inferences than patients, though differences in accuracy disappeared when differences in numerical skills were controlled for. Our findings have important implications for medical practice as they suggest suitable ways to communicate quantitative medical data.
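As an illustration of the inference task described in this abstract, the sketch below computes the positive predictive value of a test in two ways: with Bayes' rule on probabilities, and by counting people in a hypothetical population (natural frequencies). The function names and the example numbers (1% prevalence, 80% sensitivity, 10% false-positive rate, 1000 people) are assumptions chosen for illustration and do not come from the study.

```python
def ppv_from_probabilities(prevalence, sensitivity, false_positive_rate):
    """Positive predictive value P(disease | positive test) via Bayes' rule."""
    true_positive = prevalence * sensitivity
    false_positive = (1.0 - prevalence) * false_positive_rate
    return true_positive / (true_positive + false_positive)


def ppv_from_natural_frequencies(population, prevalence, sensitivity,
                                 false_positive_rate):
    """Same quantity, expressed as counts of people in a reference population."""
    sick = population * prevalence
    healthy = population - sick
    sick_and_positive = sick * sensitivity                 # e.g. 8 of 10 sick people
    healthy_and_positive = healthy * false_positive_rate   # e.g. 99 of 990 healthy people
    return sick_and_positive / (sick_and_positive + healthy_and_positive)


# Hypothetical example values (not taken from the study).
ppv_prob = ppv_from_probabilities(0.01, 0.80, 0.10)
ppv_freq = ppv_from_natural_frequencies(1000, 0.01, 0.80, 0.10)
print(round(ppv_prob, 4), round(ppv_freq, 4))
```

Both routes give the same answer (8 true positives out of 107 positive tests); the natural-frequency version simply makes the two counts explicit.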
Dimensionality reduction and polynomial chaos acceleration of Bayesian inference in inverse problems
Marzouk, Youssef M.; Najm, Habib N.
2009-04-01
We consider a Bayesian approach to nonlinear inverse problems in which the unknown quantity is a spatial or temporal field, endowed with a hierarchical Gaussian process prior. Computational challenges in this construction arise from the need for repeated evaluations of the forward model (e.g., in the context of Markov chain Monte Carlo) and are compounded by high dimensionality of the posterior. We address these challenges by introducing truncated Karhunen-Loeve expansions, based on the prior distribution, to efficiently parameterize the unknown field and to specify a stochastic forward problem whose solution captures that of the deterministic forward model over the support of the prior. We seek a solution of this problem using Galerkin projection on a polynomial chaos basis, and use the solution to construct a reduced-dimensionality surrogate posterior density that is inexpensive to evaluate. We demonstrate the formulation on a transient diffusion equation with prescribed source terms, inferring the spatially-varying diffusivity of the medium from limited and noisy data.
Statistical Inference in the Wright-Fisher Model Using Allele Frequency Data.
Tataru, Paula; Simonsen, Maria; Bataillon, Thomas; Hobolth, Asger
2016-08-02
The Wright-Fisher model provides an elegant mathematical framework for understanding allele frequency data. In particular, the model can be used to infer the demographic history of species and identify loci under selection. A crucial quantity for inference under the Wright-Fisher model is the distribution of allele frequencies (DAF). Despite the apparent simplicity of the model, the calculation of the DAF is challenging. We review and discuss strategies for approximating the DAF, and how these are used in methods that perform inference from allele frequency data. Various evolutionary forces can be incorporated in the Wright-Fisher model, and we consider these in turn. We begin our review with the basic bi-allelic Wright-Fisher model where random genetic drift is the only evolutionary force. We then consider mutation, migration, and selection. In particular, we compare diffusion-based and moment-based methods in terms of accuracy, computational efficiency, and analytical tractability. We conclude with a brief overview of the multi-allelic process with a general mutation model. [Allele frequency, diffusion, inference, moments, selection, Wright-Fisher.].
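A minimal sketch of the basic bi-allelic case mentioned in this abstract, in which random genetic drift is the only evolutionary force: each generation, the allele count is resampled binomially from the current frequency. The function name, the population size, and the seed are assumptions for illustration; the sketch does not implement any of the diffusion-based or moment-based approximations reviewed in the paper.

```python
import random


def wright_fisher_drift(n_copies, initial_freq, generations, rng):
    """Simulate an allele-frequency trajectory under pure drift.

    Each generation the new allele count is Binomial(n_copies, current_freq),
    drawn here as a sum of independent Bernoulli trials.
    """
    freq = initial_freq
    trajectory = [freq]
    for _ in range(generations):
        count = sum(1 for _ in range(n_copies) if rng.random() < freq)
        freq = count / n_copies
        trajectory.append(freq)
    return trajectory


# Hypothetical parameters: 100 gene copies, starting frequency 0.5.
rng = random.Random(0)
trajectory = wright_fisher_drift(n_copies=100, initial_freq=0.5,
                                 generations=50, rng=rng)
print(trajectory[-1])
```

Repeating the simulation many times and histogramming the final frequencies gives a Monte Carlo estimate of the distribution of allele frequencies (DAF) for this simplest model.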
Moura, Lidia MVR; Westover, M Brandon; Kwasnik, David; Cole, Andrew J; Hsu, John
2017-01-01
The elderly population faces an increasing number of cases of chronic neurological conditions, such as epilepsy and Alzheimer’s disease. Because the elderly with epilepsy are commonly excluded from randomized controlled clinical trials, there are few rigorous studies to guide clinical practice. When the elderly are eligible for trials, they either rarely participate or frequently have poor adherence to therapy, thus limiting both generalizability and validity. In contrast, large observational data sets are increasingly available, but are susceptible to bias when using common analytic approaches. Recent developments in causal inference-analytic approaches also introduce the possibility of emulating randomized controlled trials to yield valid estimates. We provide a practical example of the application of the principles of causal inference to a large observational data set of patients with epilepsy. This review also provides a framework for comparative-effectiveness research in chronic neurological conditions. PMID:28115873
Bayesian Statistical Inference in Ion-Channel Models with Exact Missed Event Correction.
Epstein, Michael; Calderhead, Ben; Girolami, Mark A; Sivilotti, Lucia G
2016-07-26
The stochastic behavior of single ion channels is most often described as an aggregated continuous-time Markov process with discrete states. For ligand-gated channels each state can represent a different conformation of the channel protein or a different number of bound ligands. Single-channel recordings show only whether the channel is open or shut: states of equal conductance are aggregated, so transitions between them have to be inferred indirectly. The requirement to filter noise from the raw signal further complicates the modeling process, as it limits the time resolution of the data. The consequence of the reduced bandwidth is that openings or shuttings that are shorter than the resolution cannot be observed; these are known as missed events. Postulated models fitted using filtered data must therefore explicitly account for missed events to avoid bias in the estimation of rate parameters and therefore assess parameter identifiability accurately. In this article, we present the first, to our knowledge, Bayesian modeling of ion-channels with exact missed events correction. Bayesian analysis represents uncertain knowledge of the true value of model parameters by considering these parameters as random variables. This allows us to gain a full appreciation of parameter identifiability and uncertainty when estimating values for model parameters. However, Bayesian inference is particularly challenging in this context as the correction for missed events increases the computational complexity of the model likelihood. Nonetheless, we successfully implemented a two-step Markov chain Monte Carlo method that we called "BICME", which performs Bayesian inference in models of realistic complexity. The method is demonstrated on synthetic and real single-channel data from muscle nicotinic acetylcholine channels. We show that parameter uncertainty can be characterized more accurately than with maximum-likelihood methods. Our code for performing inference in these ion channel
2011-04-30
Commander, Naval Sea Systems Command • Army Contracting Command, U.S. Army Materiel Command • Program Manager, Airborne, Maritime and Fixed Station...are in the area of the Design and Acquisition of Military Assets. Specific domains of interests include the concept of value and its integration...inference may point to areas where the test may be modified or additional control measures may be introduced to increase the likelihood of obtaining
2012-10-24
time series similarity measures for classification and change detection of ecosystem dynamics. Remote...for estimating species-richness, and introduce a method based on statistical wavelet multiresolution texture analysis to quantitatively assess...entropy for estimating species-richness, and introduce a method based on statistical wavelet multiresolution texture analysis to quantitatively
From a Logical Point of View: An Illuminating Perspective in Teaching Statistical Inference
ERIC Educational Resources Information Center
Sowey, Eric R
2005-01-01
Offering perspectives in the teaching of statistics assists students, immersed in the study of detail, to see the leading principles of the subject more clearly. Especially helpful can be a perspective on the logic of statistical inductive reasoning. Such a perspective can bring to prominence a broad principle on which both interval estimation and…
Using Action Research to Develop a Course in Statistical Inference for Workplace-Based Adults
ERIC Educational Resources Information Center
Forbes, Sharleen
2014-01-01
Many adults who need an understanding of statistical concepts have limited mathematical skills. They need a teaching approach that includes as little mathematical context as possible. Iterative participatory qualitative research (action research) was used to develop a statistical literacy course for adult learners informed by teaching in…
Salehi, Sohrab; Steif, Adi; Roth, Andrew; Aparicio, Samuel; Bouchard-Côté, Alexandre; Shah, Sohrab P
2017-03-01
Next-generation sequencing (NGS) of bulk tumour tissue can identify constituent cell populations in cancers and measure their abundance. This requires computational deconvolution of allelic counts from somatic mutations, which may be incapable of fully resolving the underlying population structure. Single cell sequencing (SCS) is a more direct method, although its replacement of NGS is impeded by technical noise and sampling limitations. We propose ddClone, which analytically integrates NGS and SCS data, leveraging their complementary attributes through joint statistical inference. We show on real and simulated datasets that ddClone produces more accurate results than can be achieved by either method alone.
Nonequilibrium statistical mechanics in one-dimensional Bose gases
NASA Astrophysics Data System (ADS)
Baldovin, F.; Cappellaro, A.; Orlandini, E.; Salasnich, L.
2016-06-01
We study cold dilute gases made of bosonic atoms, showing that in the mean-field one-dimensional regime they support stable out-of-equilibrium states. Starting from the 3D Boltzmann-Vlasov equation with contact interaction, we derive an effective 1D Landau-Vlasov equation under the condition of a strong transverse harmonic confinement. We investigate the existence of out-of-equilibrium states, obtaining stability criteria similar to those of classical plasmas.
Multiple processes in two-dimensional visual statistical learning
Hoshino, Eiichi; Mogi, Ken
2017-01-01
Knowledge about the arrangement of visual elements is an important aspect of perception. This study investigates whether humans learn rules of two-dimensional abstract patterns (exemplars) generated from Reber's artificial grammar. The key question is whether the subjects can implicitly learn them without explicit instructions, and, if so, how they use the acquired knowledge to judge new patterns (probes) in relation to their finite experience of the exemplars. The analysis was conducted using dissimilarities among patterns, which are defined with n-gram probabilities and the Levenshtein distance. The results show that subjects are able to learn rules of two-dimensional visual patterns (exemplars) and make categorical judgment of probes based on knowledge of exemplar-based representation. Our analysis revealed that subjects' judgments of probes were related to the degree of dissimilarities between the probes and exemplars. The result suggests the coexistence of configural and element-based processing in exemplar-based representations. Exemplar-based representation was preferred to prototypical representation through tasks requiring discrimination, recognition and working memory. Relations of the studied judgment processes to the neural basis are discussed. We conclude that knowledge of a finite experience of two-dimensional visual patterns would be crystalized in different levels of relations among visual elements. PMID:28212388
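The abstract names the Levenshtein distance as one of the dissimilarity measures. A minimal sketch of that distance for one-dimensional sequences follows; how the study applied it to two-dimensional patterns is not stated here, so the string inputs below are an assumption for illustration only.

```python
def levenshtein(a, b):
    """Minimum number of single-element insertions, deletions, and
    substitutions needed to turn sequence a into sequence b."""
    # previous[j] holds the distance between the first i-1 items of a
    # and the first j items of b.
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            substitution_cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + substitution_cost,   # substitution or match
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))
```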
Reconstructing a B-cell clonal lineage. I. Statistical inference of unobserved ancestors.
Kepler, Thomas B
2013-01-01
One of the key phenomena in the adaptive immune response to infection and immunization is affinity maturation, during which antibody genes are mutated and selected, typically resulting in a substantial increase in binding affinity to the eliciting antigen. Advances in technology on several fronts have made it possible to clone large numbers of heavy-chain light-chain pairs from individual B cells and thereby identify whole sets of clonally related antibodies. These collections could provide the information necessary to reconstruct their own history - the sequence of changes introduced into the lineage during the development of the clone - and to study affinity maturation in detail. But the success of such a program depends entirely on accurately inferring the founding ancestor and the other unobserved intermediates. Given a set of clonally related immunoglobulin V-region genes, the method described here allows one to compute the posterior distribution over their possible ancestors, thereby giving a thorough accounting of the uncertainty inherent in the reconstruction. I demonstrate the application of this method on heavy-chain and light-chain clones, assess the reliability of the inference, and discuss the sources of uncertainty.
Sandoval-Castellanos, Edson; Palkopoulou, Eleftheria; Dalén, Love
2014-01-01
Inference of population demographic history has vastly improved in recent years due to a number of technological and theoretical advances including the use of ancient DNA. Approximate Bayesian computation (ABC) stands among the most promising methods due to its simple theoretical fundament and exceptional flexibility. However, limited availability of user-friendly programs that perform ABC analysis renders it difficult to implement, and hence programming skills are frequently required. In addition, there is limited availability of programs able to deal with heterochronous data. Here we present the software BaySICS: Bayesian Statistical Inference of Coalescent Simulations. BaySICS provides an integrated and user-friendly platform that performs ABC analyses by means of coalescent simulations from DNA sequence data. It estimates historical demographic population parameters and performs hypothesis testing by means of Bayes factors obtained from model comparisons. Although providing specific features that improve inference from datasets with heterochronous data, BaySICS also has several capabilities making it a suitable tool for analysing contemporary genetic datasets. Those capabilities include joint analysis of independent tables, a graphical interface and the implementation of Markov-chain Monte Carlo without likelihoods.
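The core idea of ABC, simulation in place of a likelihood, can be sketched with a rejection sampler. The example below is a toy under stated assumptions: it replaces the coalescent simulations used by BaySICS with a simple binomial simulator, uses a uniform prior, and uses hypothetical data (7 successes in 20 trials). None of the names correspond to the BaySICS interface.

```python
import random


def abc_rejection(observed, n_trials, n_draws, tolerance, rng):
    """Rejection ABC for a success probability p with a Uniform(0, 1) prior.

    A prior draw is kept when data simulated under it fall within
    `tolerance` of the observed summary statistic (the success count).
    """
    accepted = []
    for _ in range(n_draws):
        p = rng.random()  # draw a candidate parameter from the prior
        simulated = sum(1 for _ in range(n_trials) if rng.random() < p)
        if abs(simulated - observed) <= tolerance:
            accepted.append(p)
    return accepted


# Hypothetical data: 7 successes in 20 trials.
rng = random.Random(42)
posterior_sample = abc_rejection(observed=7, n_trials=20, n_draws=5000,
                                 tolerance=0, rng=rng)
posterior_mean = sum(posterior_sample) / len(posterior_sample)
print(len(posterior_sample), round(posterior_mean, 3))
```

With a tolerance of zero the accepted draws are samples from the exact posterior, here Beta(8, 14) with mean 8/22, so the printed mean should land near 0.36. Realistic applications need a nonzero tolerance and carefully chosen summary statistics.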
NASA Astrophysics Data System (ADS)
Al-Yousef, Ali Abdallah
Reservoir characterization is one of the most important factors in successful reservoir management. In water injection projects, a knowledge of reservoir heterogeneities and discontinuities is particularly important to maximize oil recovery. This research project presents a new technique to quantify communication between injection and production wells in a reservoir based on temporal fluctuations in rates. The technique combines a nonlinear signal processing model and multiple linear regression (MLR) to provide information about permeability trends and the presence of flow barriers. The method was tested in synthetic fields using rates generated by a numerical simulator and then applied to producing fields in Argentina, the North Sea, Texas, and Wyoming. Results indicate that the model coefficients (weights) between wells are consistent with the known geology and relative location between wells; they are independent of injection/production rates. The developed procedure provides parameters (time constants) that explicitly indicate the attenuation and time lag between injector and producer pairs. The new procedure allows for a better insight into the well-to-well connectivities for the fields than MLR. Complex geological conditions are often not easily identified using the weights and time constants values individually. However, combining both sets of parameters in certain representations enhances the inference about the geological features. The applications of the new representations to numerically simulated fields and then to real fields indicate that these representations are capable of identifying whether the connectivity of an injector-producer well pair is through fractures, a high-permeability layer, or through partially completed wells. The technique may produce negative weights for some well pairs. Because there is no physical explanation in waterfloods for negative weights, these are also investigated. The negative weights have at least three causes
A statistical formulation of one-dimensional electron fluid turbulence
NASA Technical Reports Server (NTRS)
Fyfe, D.; Montgomery, D.
1977-01-01
A one-dimensional electron fluid model is investigated using the mathematical methods of modern fluid turbulence theory. Non-dissipative equilibrium canonical distributions are determined in a phase space whose co-ordinates are the real and imaginary parts of the Fourier coefficients for the field variables. Spectral densities are calculated, yielding a wavenumber electric field energy spectrum proportional to k to the negative second power for large wavenumbers. The equations of motion are numerically integrated and the resulting spectra are found to compare well with the theoretical predictions.
Inferring the hosts of coronavirus using dual statistical models based on nucleotide composition.
Tang, Qin; Song, Yulong; Shi, Mijuan; Cheng, Yingyin; Zhang, Wanting; Xia, Xiao-Qin
2015-11-26
Many coronaviruses are capable of interspecies transmission. Some of them have caused worldwide panic as emerging human pathogens in recent years, e.g., severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory syndrome coronavirus (MERS-CoV). In order to assess their threat to humans, we sought to infer the potential hosts of coronaviruses using a dual-model approach based on nineteen parameters computed from spike genes of coronaviruses. Both the support vector machine (SVM) model and the Mahalanobis distance (MD) discriminant model achieved high accuracies in leave-one-out cross-validation of training data consisting of 730 representative coronaviruses (99.86% and 98.08% respectively). Predictions on 47 additional coronaviruses precisely conformed to conclusions or speculations by other researchers. Our approach is implemented as a web server that can be accessed at http://bioinfo.ihb.ac.cn/seq2hosts.
Statistical inference of selection and divergence of rice blast resistance gene Pi-ta
Technology Transfer Automated Retrieval System (TEKTRAN)
The resistance gene Pi-ta has been effectively used to control rice blast disease worldwide. A few recent studies have described the possible evolution of Pi-ta in cultivated and weedy rice. However, evolutionary statistics used for the studies are too limited to precisely understand selection and d...
Recent Developments in Statistical Inference: Quasi-Experiments and Perquimans County.
ERIC Educational Resources Information Center
Cox, Gary W.
1988-01-01
Critiques "The Statistical Analysis of Quasi-Experiments" by Achen and examines its relevance for historians. Discusses the problems that arise when quasi-experiments involving nonrandom assignment or nonrandom selection are analyzed as if they were true experiments. Concludes that Achen's book will help historians recognize these…
Jacobs, Kevin B; Yeager, Meredith; Wacholder, Sholom; Craig, David; Kraft, Peter; Hunter, David J; Paschal, Justin; Manolio, Teri A; Tucker, Margaret; Hoover, Robert N; Thomas, Gilles D; Chanock, Stephen J; Chatterjee, Nilanjan
2009-11-01
Aggregate results from genome-wide association studies (GWAS), such as genotype frequencies for cases and controls, were until recently often made available on public websites because they were thought to disclose negligible information concerning an individual's participation in a study. Homer et al. recently suggested that a method for forensic detection of an individual's contribution to an admixed DNA sample could be applied to aggregate GWAS data. Using a likelihood-based statistical framework, we developed an improved statistic that uses genotype frequencies and individual genotypes to infer whether a specific individual or any close relatives participated in the GWAS and, if so, what the participant's phenotype status is. Our statistic compares the logarithm of genotype frequencies, in contrast to that of Homer et al., which is based on differences in either SNP probe intensity or allele frequencies. We derive the theoretical power of our test statistics and explore the empirical performance in scenarios with varying numbers of randomly chosen or top-associated SNPs.
Fu, Ji-Meng; Winchester, J.W.
1994-03-01
Nitrogen in fresh waters of three rivers in northern Florida, the Apalachicola-Chattahoochee-Flint (ACF) River system, Ochlockonee (Och), and Sopchoppy (Sop), is inferred to be derived mostly from atmospheric deposition. Because the N:P mole ratios in the rivers are nearly three times higher than the Redfield ratio for aquatic photosynthesis, N is saturated in the ecosystems, not a limiting nutrient, although it may be chemically transformed. Absolute principal component analysis (APCA), a receptor model, was applied to many years of monitoring data for Apalachicola River water and rainfall over its basin in order to better understand aquatic chemistry of nitrogen in the watershed. The APCA model aged rain and groundwater. In the fresh rain component, the ratio of atmospheric nitrate to sulfate is close to that in rainwater, as if some samples had been collected following very recent rainfall. The aged rain component of the river water is distinguished by a low NO₃⁻
Statistical inference for the additive hazards model under outcome-dependent sampling
Yu, Jichang; Liu, Yanyan; Sandler, Dale P.; Zhou, Haibo
2015-01-01
Cost-effective study design and proper inference procedures for data from such designs are always of particular interest to study investigators. In this article, we propose a biased sampling scheme, an outcome-dependent sampling (ODS) design for survival data with right censoring under the additive hazards model. We develop a weighted pseudo-score estimator for the regression parameters for the proposed design and derive the asymptotic properties of the proposed estimator. We also provide some suggestions for using the proposed method by evaluating the relative efficiency of the proposed method against simple random sampling design and derive the optimal allocation of the subsamples for the proposed design. Simulation studies show that the proposed ODS design is more powerful than other existing designs and the proposed estimator is more efficient than other estimators. We apply our method to analyze a cancer study conducted at NIEHS, the Cancer Incidence and Mortality of Uranium Miners Study, to study the risk of radon exposure to cancer. PMID:26379363
NASA Technical Reports Server (NTRS)
Abbey, Craig K.; Eckstein, Miguel P.
2002-01-01
We consider estimation and statistical hypothesis testing on classification images obtained from the two-alternative forced-choice experimental paradigm. We begin with a probabilistic model of task performance for simple forced-choice detection and discrimination tasks. Particular attention is paid to general linear filter models because these models lead to a direct interpretation of the classification image as an estimate of the filter weights. We then describe an estimation procedure for obtaining classification images from observer data. A number of statistical tests are presented for testing various hypotheses from classification images based on some more compact set of features derived from them. As an example of how the methods we describe can be used, we present a case study investigating detection of a Gaussian bump profile.
Statistical inference for classification of RRIM clone series using near IR reflectance properties
NASA Astrophysics Data System (ADS)
Ismail, Faridatul Aima; Madzhi, Nina Korlina; Hashim, Hadzli; Abdullah, Noor Ezan; Khairuzzaman, Noor Aishah; Azmi, Azrie Faris Mohd; Sampian, Ahmad Faiz Mohd; Harun, Muhammad Hafiz
2015-08-01
RRIM clone is a rubber breeding series produced by RRIM (Rubber Research Institute of Malaysia) through its "rubber breeding program" to improve latex yield and produce clones attractive to farmers. The objective of this work is to analyse measurements from an optical sensing device on latex of selected clone series. The device transmits NIR light, and the reflectance is converted to a voltage. The reflectance index values obtained via voltage were analyzed using statistical techniques in order to find the discrimination among the clones. From the statistical results using error plots and a one-way ANOVA test, there is overwhelming evidence of discrimination among the RRIM 2002, RRIM 2007 and RRIM 3001 clone series, with p value = 0.000. RRIM 2008 cannot be discriminated from RRIM 2014; however, both of these groups are distinct from the other clones.
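For readers unfamiliar with the one-way ANOVA test mentioned in this abstract, the sketch below computes its F statistic from first principles. The three groups of readings are made-up numbers, not measurements from the study, and the function name is an assumption. Converting F to a p value requires the F distribution, which is outside this stdlib-only sketch.

```python
def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA: between-group mean square
    divided by within-group mean square."""
    all_values = [v for group in groups for v in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    ss_between = sum(len(group) * (mean - grand_mean) ** 2
                     for group, mean in zip(groups, group_means))
    ss_within = sum((v - mean) ** 2
                    for group, mean in zip(groups, group_means)
                    for v in group)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical voltage readings for three clone series.
readings = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
print(one_way_anova_f(readings))
```

A large F indicates that the variation between group means is large relative to the scatter within groups, which is the sense in which the clone series are said to be discriminated.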
Exploring the Connection Between Sampling Problems in Bayesian Inference and Statistical Mechanics
NASA Technical Reports Server (NTRS)
Pohorille, Andrew
2006-01-01
The Bayesian and statistical mechanical communities often share the same objective in their work - estimating and integrating probability distribution functions (pdfs) describing stochastic systems, models or processes. Frequently, these pdfs are complex functions of random variables exhibiting multiple, well separated local minima. Conventional strategies for sampling such pdfs are inefficient, sometimes leading to an apparent non-ergodic behavior. Several recently developed techniques for handling this problem have been successfully applied in statistical mechanics. In the multicanonical and Wang-Landau Monte Carlo (MC) methods, the correct pdfs are recovered from uniform sampling of the parameter space by iteratively establishing proper weighting factors connecting these distributions. Trivial generalizations allow for sampling from any chosen pdf. The closely related transition matrix method relies on estimating transition probabilities between different states. All these methods proved to generate estimates of pdfs with high statistical accuracy. In another MC technique, parallel tempering, several random walks, each corresponding to a different value of a parameter (e.g. "temperature"), are generated and occasionally exchanged using the Metropolis criterion. This method can be considered as a statistically correct version of simulated annealing. An alternative approach is to represent the set of independent variables as a Hamiltonian system. Considerable progress has been made in understanding how to ensure that the system obeys the equipartition theorem or, equivalently, that coupling between the variables is correctly described. Then a host of techniques developed for dynamical systems can be used. Among them, probably the most powerful is the Adaptive Biasing Force method, in which thermodynamic integration and biased sampling are combined to yield very efficient estimates of pdfs. The third class of methods deals with transitions between states described
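The parallel tempering scheme described in this abstract can be sketched as follows: several random walks at different inverse temperatures take local Metropolis steps and occasionally attempt to exchange states, with the exchange accepted by the Metropolis criterion. The double-well energy, the temperature ladder, the step size, and all names below are assumptions chosen for illustration, not details from the report.

```python
import math
import random


def energy(x):
    """Double-well energy with minima at x = -1 and x = +1 (toy target)."""
    return (x * x - 1.0) ** 2


def swap_probability(beta_i, beta_j, energy_i, energy_j):
    """Metropolis acceptance probability for exchanging two replicas."""
    exponent = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if exponent >= 0.0 else math.exp(exponent)


def parallel_tempering(betas, n_sweeps, step, rng):
    """Return the states visited by the replica at betas[0]."""
    states = [1.0 for _ in betas]  # every walker starts in the right-hand well
    first_replica_samples = []
    for _ in range(n_sweeps):
        # Local Metropolis move for each replica at its own temperature.
        for k, beta in enumerate(betas):
            proposal = states[k] + rng.uniform(-step, step)
            delta = energy(proposal) - energy(states[k])
            if delta <= 0.0 or rng.random() < math.exp(-beta * delta):
                states[k] = proposal
        # Attempt to exchange one randomly chosen pair of neighbours.
        k = rng.randrange(len(betas) - 1)
        p_swap = swap_probability(betas[k], betas[k + 1],
                                  energy(states[k]), energy(states[k + 1]))
        if rng.random() < p_swap:
            states[k], states[k + 1] = states[k + 1], states[k]
        first_replica_samples.append(states[0])
    return first_replica_samples


# Hypothetical ladder: betas[0] is the cold target, the last entry is hot.
samples = parallel_tempering(betas=[20.0, 5.0, 1.0, 0.2], n_sweeps=2000,
                             step=0.5, rng=random.Random(0))
print(len(samples))
```

The hot replicas cross the barrier at x = 0 easily, and exchanges pass those configurations down to the cold replica, which on its own would rarely leave its starting well.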
Statistical Inference on Memory Structure of Processes and Its Applications to Information Theory
2016-05-12
ES) U.S. Army Research Office P.O. Box 12211 Research Triangle Park, NC 27709-2211 mathematical statistics; time series; Markov chains; random...scholarships or fellowships for further studies in science, mathematics, engineering or technology fields: Student Metrics This section only applies to...science, mathematics, engineering, or technology fields: The number of undergraduates funded by your agreement who graduated during this period and
Inferring earthquake statistics from soft-glass dynamics below yield stress
NASA Astrophysics Data System (ADS)
Kumar, Pinaki; Toschi, Federico; Benzi, Roberto; Trampert, Jeannot
2016-11-01
The current practice to generate synthetic earthquake catalogs employs purely statistical models, mechanical methods based on ad-hoc constitutive friction laws or a combination of the above. We adopt a new numerical approach based on the multi-component Lattice Boltzmann method to simulate yield stress materials. Below yield stress, under shear forcing, we find that the highly intermittent in time, irreversible T1 topological changes in the soft-glass (termed plastic events) bear a statistical resemblance to seismic events, radiating elastic perturbations through the system. Statistical analysis reveals scaling laws for magnitude similar to the Gutenberg-Richter law for quakes, a recurrence time scale with similar slope, a well-defined clustering of events into causal-aftershock sequences and Poisson events leading to the Omori law. Additionally, space intermittency reveals a complex multi-fractal structure, like real quakes, and a characterization of the stick-slip behavior in terms of avalanche size and time distribution agrees with the de-pinning transition. The model system, once properly tuned using real earthquake data, may help highlight the origin of scaling in phenomenological seismic power laws. This research was partly funded by the Shell-NWO/FOM programme "Computational sciences for energy research" under Project Number 14CSER022.
Li, Changyang; Wang, Xiuying; Eberl, Stefan; Fulham, Michael; Yin, Yong; Dagan Feng, David
2015-01-01
Automated and general medical image segmentation can be challenging because the foreground and the background may have complicated and overlapping density distributions in medical imaging. Conventional region-based level set algorithms often assume piecewise constant or piecewise smooth for segments, which are implausible for general medical image segmentation. Furthermore, low contrast and noise make identification of the boundaries between foreground and background difficult for edge-based level set algorithms. Thus, to address these problems, we suggest a supervised variational level set segmentation model to harness the statistical region energy functional with a weighted probability approximation. Our approach models the region density distributions by using the mixture-of-mixtures Gaussian model to better approximate real intensity distributions and distinguish statistical intensity differences between foreground and background. The region-based statistical model in our algorithm can intuitively provide better performance on noisy images. We constructed a weighted probability map on graphs to incorporate spatial indications from user input with a contextual constraint based on the minimization of contextual graphs energy functional. We measured the performance of our approach on ten noisy synthetic images and 58 medical datasets with heterogeneous intensities and ill-defined boundaries and compared our technique to the Chan-Vese region-based level set model, the geodesic active contour model with distance regularization, and the random walker model. Our method consistently achieved the highest Dice similarity coefficient when compared to the other methods.
Fragmentation and exfoliation of 2-dimensional materials: a statistical approach
NASA Astrophysics Data System (ADS)
Kouroupis-Agalou, Konstantinos; Liscio, Andrea; Treossi, Emanuele; Ortolani, Luca; Morandi, Vittorio; Pugno, Nicola Maria; Palermo, Vincenzo
2014-05-01
The main advantage for applications of graphene and related 2D materials is that they can be produced on large scales by liquid phase exfoliation. The exfoliation process can be considered as a particular fragmentation process, where the 2D character of the exfoliated objects significantly influences the fragmentation dynamics as compared to standard materials. Here, we used automated image processing of Atomic Force Microscopy (AFM) data to measure, one by one, the exact shape and size of thousands of nanosheets obtained by exfoliation of an important 2D material, boron nitride, and used different statistical functions to model the asymmetric distribution of nanosheet sizes typically obtained. Because the resolution of AFM is much finer than the average sheet size, analysis could be performed directly at the nanoscale and at the single sheet level. We find that the size distribution of the sheets at a given time follows a log-normal distribution, indicating that the exfoliation process has a "typical" scale length that changes with time and that exfoliation proceeds through the formation of a distribution of random cracks that follow Poisson statistics. The validity of this model implies that the size distribution does not depend on the different preparation methods used, but is a common feature in the exfoliation of this material and thus probably for other 2D materials.
Soap film flows: Statistics of two-dimensional turbulence
Vorobieff, P.; Rivera, M.; Ecke, R.E.
1999-08-01
Soap film flows provide a very convenient laboratory model for studies of two-dimensional (2-D) hydrodynamics including turbulence. For a gravity-driven soap film channel with a grid of equally spaced cylinders inserted in the flow, we have measured the simultaneous velocity and thickness fields in the irregular flow downstream from the cylinders. The velocity field is determined by a modified digital particle image velocimetry method and the thickness from the light scattered by the particles in the film. From these measurements, we compute the decay of mean energy, enstrophy, and thickness fluctuations with downstream distance, and the structure functions of velocity, vorticity, thickness fluctuation, and vorticity flux. From these quantities we determine the microscale Reynolds number of the flow R_λ ≈ 100 and the integral and dissipation scales of 2D turbulence. We also obtain quantitative measures of the degree to which our flow can be considered incompressible and isotropic as a function of downstream distance. We find coarsening of characteristic spatial scales, qualitative correspondence of the decay of energy and enstrophy with the Batchelor model, scaling of energy in k space consistent with the k^(-3) spectrum of the Kraichnan–Batchelor enstrophy-scaling picture, and power-law scalings of the structure functions of velocity, vorticity, vorticity flux, and thickness. These results are compared with models of 2-D turbulence and with numerical simulations. © 1999 American Institute of Physics.
Statistical inference on censored data for targeted clinical trials under enrichment design.
Chen, Chen-Fang; Lin, Jr-Rung; Liu, Jen-Pei
2013-01-01
For the traditional clinical trials, inclusion and exclusion criteria are usually based on some clinical endpoints; the genetic or genomic variability of the trial participants are not totally utilized in the criteria. After completion of the human genome project, the disease targets at the molecular level can be identified and can be utilized for the treatment of diseases. However, the accuracy of diagnostic devices for identification of such molecular targets is usually not perfect. Some of the patients enrolled in targeted clinical trials with a positive result for the molecular target might not have the specific molecular targets. As a result, the treatment effect may be underestimated in the patient population truly with the molecular target. To resolve this issue, under the exponential distribution, we develop inferential procedures for the treatment effects of the targeted drug based on the censored endpoints in the patients truly with the molecular targets. Under an enrichment design, we propose using the expectation-maximization algorithm in conjunction with the bootstrap technique to incorporate the inaccuracy of the diagnostic device for detection of the molecular targets on the inference of the treatment effects. A simulation study was conducted to empirically investigate the performance of the proposed methods. Simulation results demonstrate that under the exponential distribution, the proposed estimator is nearly unbiased with adequate precision, and the confidence interval can provide adequate coverage probability. In addition, the proposed testing procedure can adequately control the size with sufficient power. On the other hand, when the proportional hazard assumption is violated, additional simulation studies show that the type I error rate is not controlled at the nominal level and is an increasing function of the positive predictive value. A numerical example illustrates the proposed procedures.
NASA Astrophysics Data System (ADS)
Calderon, Christopher P.; Weiss, Lucien E.; Moerner, W. E.
2014-05-01
Experimental advances have improved the two- (2D) and three-dimensional (3D) spatial resolution that can be extracted from in vivo single-molecule measurements. This enables researchers to quantitatively infer the magnitude and directionality of forces experienced by biomolecules in their native environment. Situations where such force information is relevant range from mitosis to directed transport of protein cargo along cytoskeletal structures. Models commonly applied to quantify single-molecule dynamics assume that effective forces and velocity in the x, y (or x, y, z) directions are statistically independent, but this assumption is physically unrealistic in many situations. We present a hypothesis testing approach capable of determining if there is evidence of statistical dependence between positional coordinates in experimentally measured trajectories; if the hypothesis of independence between spatial coordinates is rejected, then a new model accounting for 2D (3D) interactions can and should be considered. Our hypothesis testing technique is robust, meaning it can detect interactions, even if the noise statistics are not well captured by the model. The approach is demonstrated on control simulations and on experimental data (directed transport of intraflagellar transport protein 88 homolog in the primary cilium).
Anderson, Eric C
2012-11-08
Advances in genotyping that allow tens of thousands of individuals to be genotyped at a moderate number of single nucleotide polymorphisms (SNPs) permit parentage inference to be pursued on a very large scale. The intergenerational tagging this capacity allows is revolutionizing the management of cultured organisms (cows, salmon, etc.) and is poised to do the same for scientific studies of natural populations. Currently, however, there are no likelihood-based methods of parentage inference which are implemented in a manner that allows them to quickly handle a very large number of potential parents or parent pairs. Here we introduce an efficient likelihood-based method applicable to the specialized case of cultured organisms in which both parents can be reliably sampled. We develop a Markov chain representation for the cumulative number of Mendelian incompatibilities between an offspring and its putative parents and we exploit it to develop a fast algorithm for simulation-based estimates of statistical confidence in SNP-based assignments of offspring to pairs of parents. The method is implemented in the freely available software SNPPIT. We describe the method in detail, then assess its performance in a large simulation study using known allele frequencies at 96 SNPs from ten hatchery salmon populations. The simulations verify that the method is fast and accurate and that 96 well-chosen SNPs can provide sufficient power to identify the correct pair of parents from amongst millions of candidate pairs.
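The quantity at the heart of the abstract above, the number of Mendelian incompatibilities between an offspring and a putative parent pair, can be illustrated with a short sketch. This is not the SNPPIT algorithm: it only counts incompatible loci for error-free biallelic genotypes, and the 0/1/2 allele-count coding and all function names are assumptions made for the example.

```python
def transmittable(genotype):
    """Alleles a parent can pass on at a biallelic SNP.

    genotype is the number of copies (0, 1 or 2) of one designated allele;
    the result is the set of allele counts (0 or 1) in a gamete.
    """
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def is_incompatible(child, parent_a, parent_b):
    """True if the child's genotype cannot arise from this parent pair."""
    possible = {a + b
                for a in transmittable(parent_a)
                for b in transmittable(parent_b)}
    return child not in possible


def count_incompatibilities(child, parent_a, parent_b):
    """Number of loci at which the trio violates Mendelian inheritance."""
    return sum(is_incompatible(c, a, b)
               for c, a, b in zip(child, parent_a, parent_b))


# Toy genotypes at four SNPs (hypothetical data, not from the paper).
child = [0, 1, 2, 1]
parent_a = [0, 0, 2, 2]
parent_b = [1, 0, 1, 2]
n_bad = count_incompatibilities(child, parent_a, parent_b)  # loci 2 and 4 fail
```

With real data a true parent pair can still show a few incompatibilities because of genotyping error, which is why the abstract describes a probabilistic treatment rather than a hard exclusion rule.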
NASA Astrophysics Data System (ADS)
Doss, F. W.; Drake, R. P.; Kuranz, C. C.
2011-11-01
A laser-driven experiment produces images of dense shocked material by x-ray transmission. The post-shock material is sufficiently dense that no significant signal passes through the dense layer, and therefore the shock compression cannot be directly measured by comparing transmitted intensities. One could try to determine the shock compression ratio by observing the ratio of the total distance travelled by the shock to the dense post-shock layer width, but small deviations of the angle of the shock with respect to the angle of imaging create large asymmetric errors in observation. A statistical approach to recovering shock compression by appropriately combining data from several experiments is developed, using fits to a simple model for the shock and shock tube geometry.
Statistical Inference in Hidden Markov Models Using k-Segment Constraints.
Titsias, Michalis K; Holmes, Christopher C; Yau, Christopher
2016-01-02
Hidden Markov models (HMMs) are one of the most widely used statistical methods for analyzing sequence data. However, the reporting of output from HMMs has largely been restricted to the presentation of the most-probable (MAP) hidden state sequence, found via the Viterbi algorithm, or the sequence of most probable marginals using the forward-backward algorithm. In this article, we expand the amount of information we could obtain from the posterior distribution of an HMM by introducing linear-time dynamic programming recursions that, conditional on a user-specified constraint in the number of segments, allow us to (i) find MAP sequences, (ii) compute posterior probabilities, and (iii) simulate sample paths. We collectively call these recursions k-segment algorithms and illustrate their utility using simulated and real examples. We also highlight the prospective and retrospective use of k-segment constraints for fitting HMMs or exploring existing model fits. Supplementary materials for this article are available online.
Statistical Inference in Hidden Markov Models Using k-Segment Constraints
Titsias, Michalis K.; Holmes, Christopher C.; Yau, Christopher
2016-01-01
Hidden Markov models (HMMs) are one of the most widely used statistical methods for analyzing sequence data. However, the reporting of output from HMMs has largely been restricted to the presentation of the most-probable (MAP) hidden state sequence, found via the Viterbi algorithm, or the sequence of most probable marginals using the forward–backward algorithm. In this article, we expand the amount of information we could obtain from the posterior distribution of an HMM by introducing linear-time dynamic programming recursions that, conditional on a user-specified constraint in the number of segments, allow us to (i) find MAP sequences, (ii) compute posterior probabilities, and (iii) simulate sample paths. We collectively call these recursions k-segment algorithms and illustrate their utility using simulated and real examples. We also highlight the prospective and retrospective use of k-segment constraints for fitting HMMs or exploring existing model fits. Supplementary materials for this article are available online. PMID:27226674
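For orientation, the unconstrained baseline that the abstract refers to, the MAP hidden state sequence found via the Viterbi algorithm, can be sketched as below. This is the textbook recursion in log space, not the k-segment algorithms of the paper; the two-state model and all names are hypothetical and chosen only for illustration.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (MAP state path, its joint log-probability) for an HMM."""
    # best[t][s]: log-probability of the best path ending in state s at time t
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = []  # back[t][s]: best predecessor of state s at time t + 1
    for o in obs[1:]:
        prev = best[-1]
        scores, pointers = {}, {}
        for s in states:
            p = max(states, key=lambda r: prev[r] + log_trans[r][s])
            pointers[s] = p
            scores[s] = prev[p] + log_trans[p][s] + log_emit[s][o]
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


def _log_table(table):
    """Convert a nested dict of probabilities to log-probabilities."""
    return {k: {j: math.log(v) for j, v in row.items()} for k, row in table.items()}


# Hypothetical two-state model with "sticky" transitions.
states = ["A", "B"]
log_start = {"A": math.log(0.5), "B": math.log(0.5)}
log_trans = _log_table({"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.1, "B": 0.9}})
log_emit = _log_table({"A": {0: 0.9, 1: 0.1}, "B": {0: 0.1, 1: 0.9}})

path, logp = viterbi([0, 0, 1, 1, 1], states, log_start, log_trans, log_emit)
```

The returned path here has a single change point (two segments). A constraint on the number of segments, as described in the abstract, would restrict the search to paths with a user-specified number of such changes.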
Statistical inference methods for two crossing survival curves: a comparison of methods.
Li, Huimin; Han, Dong; Hou, Yawen; Chen, Huilin; Chen, Zheng
2015-01-01
A common problem that is encountered in medical applications is the overall homogeneity of survival distributions when two survival curves cross each other. A survey demonstrated that under this condition, which was an obvious violation of the assumption of proportional hazard rates, the log-rank test was still used in 70% of studies. Several statistical methods have been proposed to solve this problem. However, in many applications, it is difficult to specify the types of survival differences and choose an appropriate method prior to analysis. Thus, we conducted an extensive series of Monte Carlo simulations to investigate the power and type I error rate of these procedures under various patterns of crossing survival curves with different censoring rates and distribution parameters. Our objective was to evaluate the strengths and weaknesses of tests in different situations and for various censoring rates and to recommend an appropriate test that will not fail for a wide range of applications. Simulation studies demonstrated that adaptive Neyman's smooth tests and the two-stage procedure offer higher power and greater stability than other methods when the survival distributions cross at early, middle or late times. Even for proportional hazards, both methods maintain acceptable power compared with the log-rank test. In terms of the type I error rate, Renyi and Cramér-von Mises tests are relatively conservative, whereas the statistics of the Lin-Xu test exhibit apparent inflation as the censoring rate increases. Other tests produce results close to the nominal 0.05 level. In conclusion, adaptive Neyman's smooth tests and the two-stage procedure are found to be the most stable and feasible approaches for a variety of situations and censoring rates. Therefore, they are applicable to a wider spectrum of alternatives compared with other tests.
Kimura, S; Araki, D; Matsumura, K; Okada-Hatakeyama, M
2012-02-01
Voit and Almeida have proposed the decoupling approach as a method for inferring the S-system models of genetic networks. The decoupling approach defines the inference of a genetic network as a problem requiring the solutions of sets of algebraic equations. The computation can be accomplished in a very short time, as the approach estimates S-system parameters without solving any of the differential equations. Yet the defined algebraic equations are non-linear, which sometimes prevents us from finding reasonable S-system parameters. In this study, we propose a new technique to overcome this drawback of the decoupling approach. This technique transforms the problem of solving each set of algebraic equations into a one-dimensional function optimization problem. The computation can still be accomplished in a relatively short time, as the problem is transformed by solving a linear programming problem. We confirm the effectiveness of the proposed approach through numerical experiments.
NASA Astrophysics Data System (ADS)
Kononova, Olga; Jones, Lee; Barsegov, V.
2013-09-01
Cooperativity is a hallmark of proteins, many of which show a modular architecture comprising discrete structural domains. Detecting and describing dynamic couplings between structural regions is difficult in view of the many-body nature of protein-protein interactions. By utilizing the GPU-based computational acceleration, we carried out simulations of the protein forced unfolding for the dimer WW - WW of the all-β-sheet WW domains used as a model multidomain protein. We found that while the physically non-interacting identical protein domains (WW) show nearly symmetric mechanical properties at low tension, reflected, e.g., in the similarity of their distributions of unfolding times, these properties become distinctly different when tension is increased. Moreover, the uncorrelated unfolding transitions at a low pulling force become increasingly more correlated (dependent) at higher forces. Hence, the applied force not only breaks "the mechanical symmetry" but also couples the physically non-interacting protein domains forming a multi-domain protein. We call this effect "the topological coupling." We developed a new theory, inspired by order statistics, to characterize protein-protein interactions in multi-domain proteins. The method utilizes the squared-Gaussian model, but it can also be used in conjunction with other parametric models for the distribution of unfolding times. The formalism can be taken to the single-molecule experimental lab to probe mechanical cooperativity and domain communication in multi-domain proteins.
Menon, Ravishankar; Gerstoft, Peter; Hodgkiss, William S
2012-11-01
Cross-correlations of diffuse noise fields can be used to extract environmental information. The influence of directional sources (usually ships) often results in a bias of the travel time estimates obtained from the cross-correlations. Using an array of sensors, insights from random matrix theory on the behavior of the eigenvalues of the sample covariance matrix (SCM) in an isotropic noise field are used to isolate the diffuse noise component from the directional sources. A sequential hypothesis testing of the eigenvalues of the SCM reveals eigenvalues dominated by loud sources that are statistical outliers for the assumed diffuse noise model. Travel times obtained from cross-correlations using only the diffuse noise component (i.e., by discarding or attenuating the outliers) converge to the predicted travel times based on the known array sensor spacing and measured sound speed at the site and are stable temporally (i.e., unbiased estimates). Data from the Shallow Water 2006 experiment demonstrates the effectiveness of this approach and that the signal-to-noise ratio builds up as the square root of time, as predicted by theory.
Statistical inference from multiple iTRAQ experiments without using common reference standards.
Herbrich, Shelley M; Cole, Robert N; West, Keith P; Schulze, Kerry; Yager, James D; Groopman, John D; Christian, Parul; Wu, Lee; O'Meally, Robert N; May, Damon H; McIntosh, Martin W; Ruczinski, Ingo
2013-02-01
Isobaric tags for relative and absolute quantitation (iTRAQ) is a prominent mass spectrometry technology for protein identification and quantification that is capable of analyzing multiple samples in a single experiment. Frequently, iTRAQ experiments are carried out using an aliquot from a pool of all samples, or "masterpool", in one of the channels as a reference sample standard to estimate protein relative abundances in the biological samples and to combine abundance estimates from multiple experiments. In this manuscript, we show that using a masterpool is counterproductive. We obtain more precise estimates of protein relative abundance by using the available biological data instead of the masterpool and do not need to occupy a channel that could otherwise be used for another biological sample. In addition, we introduce a simple statistical method to associate proteomic data from multiple iTRAQ experiments with a numeric response and show that this approach is more powerful than the conventionally employed masterpool-based approach. We illustrate our methods using data from four replicate iTRAQ experiments on aliquots of the same pool of plasma samples and from a 406-sample project designed to identify plasma proteins that covary with nutrient concentrations in chronically undernourished children from South Asia.
Statistical inference methods for recurrent event processes with shape and size parameters
WANG, MEI-CHENG; HUANG, CHIUNG-YU
2015-01-01
Summary: This paper proposes a unified framework to characterize the rate function of a recurrent event process through shape and size parameters. In contrast to the intensity function, which is the event occurrence rate conditional on the event history, the rate function is the occurrence rate unconditional on the event history, and thus it can be interpreted as a population-averaged count of events in unit time. In this paper, shape and size parameters are introduced and used to characterize the association between the rate function λ(·) and a random variable X. Measures of association between X and λ(·) are defined via shape- and size-based coefficients. Rate-independence of X and λ(·) is studied through tests of shape-independence and size-independence, where the shape- and size-based test statistics can be used separately or in combination. These tests can be applied when X is a covariable possibly correlated with the recurrent event process through λ(·) or, in the one-sample setting, when X is the censoring time at which the observation of N(·) is terminated. The proposed tests are shape- and size-based, so when a null hypothesis is rejected, the test results can serve to distinguish the source of violation. PMID:26412863
Constrained statistical inference: sample-size tables for ANOVA and regression.
Vanbrabant, Leonard; Van De Schoot, Rens; Rosseel, Yves
2014-01-01
Researchers in the social and behavioral sciences often have clear expectations about the order/direction of the parameters in their statistical model. For example, a researcher might expect that regression coefficient β1 is larger than β2 and β3. The corresponding hypothesis is H: β1 > {β2, β3} and this is known as an (order) constrained hypothesis. A major advantage of testing such a hypothesis is that power can be gained and inherently a smaller sample size is needed. This article discusses this gain in sample size reduction when an increasing number of constraints is included in the hypothesis. The main goal is to present sample-size tables for constrained hypotheses. A sample-size table contains the necessary sample size at a pre-specified power (say, 0.80) for an increasing number of constraints. To obtain sample-size tables, two Monte Carlo simulations were performed, one for ANOVA and one for multiple regression. Three results are salient. First, in an ANOVA the needed sample size decreases by 30-50% when complete ordering of the parameters is taken into account. Second, small deviations from the imposed order have only a minor impact on the power. Third, at the maximum number of constraints, the linear regression results are comparable with the ANOVA results. However, in the case of fewer constraints, ordering the parameters (e.g., β1 > β2) results in a higher power than assigning a positive or a negative sign to the parameters (e.g., β1 > 0).
Can we infer the effect of river works on streamflow statistics?
NASA Astrophysics Data System (ADS)
Ganora, Daniele
2016-04-01
Most of our river network system is affected by anthropic pressure of different types. While climate and land use change are widely recognized as important factors, the effects of "in-line" water infrastructures on the global behavior of the river system are often overlooked. This is due to the difficulty in including local "physical" knowledge (e.g., the hydraulic behavior of a river reach with levees during a flood) into large-scale models that provide a statistical description of the streamflow, and which are the basis for the implementation of resources/risk management plans (e.g., regional models for prediction of the flood frequency curve). This work presents some preliminary applications regarding two widely used hydrological signatures, the flow duration curve and the flood frequency curve. We adopt a pragmatic (i.e., reliable and implementable at large scales) and parsimonious (i.e., requiring little data) framework of analysis, considering that we operate in a complex system (many river works already exist, and many others could be built in the future). In the first case, a method is proposed to correct observations of streamflow affected by the presence of upstream run-of-the-river power plants in order to provide the "natural" flow duration curve, using only simple information about the plant (i.e., the maximum intake flow). The second case regards the effects of flood-protection works on the downstream sections, to support the application of along-stream cost-benefit analysis in the flood risk management context. Current applications and possible future developments are discussed.
Shu-Jiang, Liu; Zhan-Ying, Chen; Yin-Zhong, Chang; Shi-Lian, Wang; Qi, Li; Yuan-Qing, Fan
2013-10-11
Multidimensional gas chromatography is widely applied to atmospheric xenon monitoring for the Comprehensive Nuclear-Test-Ban Treaty (CTBT). To improve the capability for xenon sampling from the atmosphere, sampling techniques have been investigated in detail. The sampling techniques are designed using xenon outflow curves, which are influenced by many factors, and the injection condition is one of the key factors that could influence the xenon outflow curves. In this paper, the xenon outflow curves of single-pulse injection in two-dimensional gas chromatography have been tested and fitted as an exponentially modified Gaussian distribution. An inference formula of the xenon outflow curve for six-pulse injection is derived, and the inference formula is also tested against the fitted formula of the xenon outflow curve. As a result, the curves of both the one-pulse and six-pulse injections obey the exponentially modified Gaussian distribution when the temperature of the activated carbon column is 26°C and the flow rate of the carrier gas is 35.6 mL min^-1. The retention time of the xenon peak for one-pulse injection is 215 min, and the peak width is 138 min. For the six-pulse injection, however, the retention time is delayed to 255 min, and the peak width broadens to 222 min. According to the inferred formula of the xenon outflow curve for the six-pulse injection, the inferred retention time is 243 min, the relative deviation of the retention time is 4.7%, and the inferred peak width is 225 min, with a relative deviation of 1.3%.
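The exponentially modified Gaussian (EMG) curve shape named in the abstract above can be written down in a few lines. The sketch below uses the standard EMG density (a Gaussian with mean mu and standard deviation sigma convolved with an exponential decay of rate lam); it is not the paper's fitted formula, and the parameter values are hypothetical, chosen only to give a peak on a scale of minutes.

```python
import math


def emg_pdf(t, mu, sigma, lam):
    """Density of an exponentially modified Gaussian at time t.

    mu, sigma: mean and standard deviation of the Gaussian component.
    lam: rate of the exponential component (its mean is 1 / lam).
    """
    arg = (mu + lam * sigma ** 2 - t) / (math.sqrt(2.0) * sigma)
    scale = math.exp(0.5 * lam * (2.0 * mu + lam * sigma ** 2 - 2.0 * t))
    return 0.5 * lam * scale * math.erfc(arg)


# Hypothetical parameters in minutes (not taken from the paper).
mu, sigma, lam = 200.0, 30.0, 1.0 / 40.0

# Simple Riemann sums over 0..1000 min to check normalization and the mean.
dt = 0.5
grid = [k * dt for k in range(2001)]
area = sum(emg_pdf(t, mu, sigma, lam) * dt for t in grid)
mean_time = sum(t * emg_pdf(t, mu, sigma, lam) * dt for t in grid)
```

For an EMG the mean equals mu + 1/lam, so the exponential tail delays the centre of mass of the peak relative to the Gaussian component; the numerical `mean_time` above should land close to 240 min for these illustrative values.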
Two-Dimensional Hermite Filters Simplify the Description of High-Order Statistics of Natural Images.
Hu, Qin; Victor, Jonathan D
2016-09-01
Natural image statistics play a crucial role in shaping biological visual systems, understanding their function and design principles, and designing effective computer-vision algorithms. High-order statistics are critical for conveying local features, but they are challenging to study - largely because their number and variety is large. Here, via the use of two-dimensional Hermite (TDH) functions, we identify a covert symmetry in high-order statistics of natural images that simplifies this task. This emerges from the structure of TDH functions, which are an orthogonal set of functions that are organized into a hierarchy of ranks. Specifically, we find that the shape (skewness and kurtosis) of the distribution of filter coefficients depends only on the projection of the function onto a 1-dimensional subspace specific to each rank. The characterization of natural image statistics provided by TDH filter coefficients reflects both their phase and amplitude structure, and we suggest an intuitive interpretation for the special subspace within each rank.
Grodwohl, Jean-Baptiste
2016-08-01
This paper gives a detailed narrative of a controversial empirical research in postwar population genetics, the analysis of the cytological polymorphisms of an Australian grasshopper, Moraba scurra. This research intertwined key technical developments in three research areas during the 1950s and 1960s: it involved Dobzhansky's empirical research program on cytological polymorphisms, the mathematical theory of natural selection in two-locus systems, and the building of reliable estimates of natural selection in the wild. In the mid-1950s the cytologist Michael White discovered an interesting case of epistasis in populations of Moraba scurra. These observations received a wide diffusion when theoretical population geneticist Richard Lewontin represented White's data on adaptive topographies. These topographies connected the information on the genetic structure of these grasshopper populations with the formal framework of theoretical population genetics. As such, they appeared at the time as the most successful application of two-locus models of natural selection to an empirical study system. However, this connection generated paradoxical results: in the landscapes, all grasshopper populations were located on a ridge (an unstable equilibrium) while they were expected to reach a peak. This puzzling result fueled years of research and triggered a controversy attracting contributors from Australia, the United States and the United Kingdom. While the original problem seemed, at first, purely empirical, the subsequent controversy affected the main mathematical tools used in the study of two-gene systems under natural selection. Adaptive topographies and their underlying mathematical structure, Wright's mean fitness equations, were submitted to close scrutiny. Suspicion eventually shifted to the statistical machinery used in data analysis, reflecting the crucial role of statistical inference in applied population genetics. In the 1950s and 1960s, population geneticists were
A three-dimensional statistical mechanical model of folding double-stranded chain molecules
NASA Astrophysics Data System (ADS)
Zhang, Wenbing; Chen, Shi-Jie
2001-05-01
Based on a graphical representation of intrachain contacts, we have developed a new three-dimensional model for the statistical mechanics of double-stranded chain molecules. The theory has been tested and validated for the cubic lattice chain conformations. The statistical mechanical model can be applied to the equilibrium folding thermodynamics of a large class of chain molecules, including protein β-hairpin conformations and RNA secondary structures. The application of a previously developed two-dimensional model to RNA secondary structure folding thermodynamics generally overestimates the breadth of the melting curves [S-J. Chen and K. A. Dill, Proc. Natl. Acad. Sci. U.S.A. 97, 646 (2000)], suggesting an underestimation of the sharpness of the conformational transitions. In this work, we show that the new three-dimensional model gives much sharper melting curves than the two-dimensional model. We believe that the new three-dimensional model may give much better predictions of the thermodynamic properties of RNA conformational changes than the previous two-dimensional model.
Schimek, Michael G; Budinská, Eva; Kugler, Karl G; Švendová, Vendula; Ding, Jie; Lin, Shili
2015-06-01
High-throughput sequencing techniques are increasingly affordable and produce massive amounts of data. Together with other high-throughput technologies, such as microarrays, they have made an enormous amount of resources available in databases. The collection of these valuable data has been routine for more than a decade. Despite different technologies, many experiments share the same goal. For instance, the aims of RNA-seq studies often coincide with those of differential gene expression experiments based on microarrays. As such, it would be logical to utilize all available data. However, there is a lack of biostatistical tools for the integration of results obtained from different technologies. Although diverse technological platforms produce different raw data, one commonality for experiments with the same goal is that all the outcomes can be transformed into a platform-independent data format - rankings - for the same set of items. Here we present the R package TopKLists, which allows for statistical inference on the lengths of informative (top-k) partial lists, for stochastic aggregation of full or partial lists, and for graphical exploration of the input and consolidated output. A graphical user interface has also been implemented for providing access to the underlying algorithms. To illustrate the applicability and usefulness of the package, we integrated microRNA data of non-small cell lung cancer across different measurement techniques and drew conclusions. The package can be obtained from CRAN under an LGPL-3 license.
Braiding statistics and classification of two-dimensional charge-2m superconductors
NASA Astrophysics Data System (ADS)
Wang, Chenjie
2016-08-01
We study braiding statistics between quasiparticles and vortices in two-dimensional charge-2m (in units of e) superconductors that are coupled to a $Z_{2m}$ dynamical gauge field, where m is any positive integer. We show that there exist 16m types of braiding statistics when m is odd, but only 4m types when m is even. Based on the braiding statistics, we obtain a classification of topological phases of charge-2m superconductors—or formally speaking, a classification of symmetry-protected topological phases, as well as invertible topological phases, of two-dimensional gapped fermions with $Z_{2m}^f$ symmetry. Interestingly, we find that there is no nontrivial fermionic symmetry-protected topological phase with $Z_4^f$ symmetry.
NASA Astrophysics Data System (ADS)
von Nessi, G. T.; Hole, M. J.; The MAST Team
2014-11-01
We present recent results and technical breakthroughs for the Bayesian inference of tokamak equilibria using force-balance as a prior constraint. Issues surrounding model parameter representation and posterior analysis are discussed and addressed. These points motivate the recent advancements embodied in the Bayesian Equilibrium Analysis and Simulation Tool (BEAST) software currently being used to study equilibria on the Mega-Ampere Spherical Tokamak (MAST) experiment in the UK (von Nessi et al 2012 J. Phys. A 46 185501). State-of-the-art results of using BEAST to study MAST equilibria are reviewed, with recent code advancements being systematically presented throughout the manuscript.
NASA Astrophysics Data System (ADS)
Pandarinath, Kailasa
2014-12-01
Several new multi-dimensional tectonomagmatic discrimination diagrams employing log-ratio variables of chemical elements and a probability-based procedure have been developed during the last 10 years for basic-ultrabasic, intermediate and acid igneous rocks. Numerous studies have extensively evaluated these newly developed diagrams and indicated their successful application in determining the original tectonic setting of younger and older as well as sea-water and hydrothermally altered volcanic rocks. In the present study, these diagrams were applied to Precambrian rocks of Mexico (southern and north-eastern) and Argentina. The study indicated the original tectonic setting of Precambrian rocks from the Oaxaca Complex of southern Mexico as follows: (1) dominant rift (within-plate) setting for rocks of 1117-988 Ma age; (2) dominant rift and less-dominant arc setting for rocks of 1157-1130 Ma age; and (3) a combined tectonic setting of collision and rift for Etla Granitoid Pluton (917 Ma age). The diagrams have indicated the original tectonic setting of the Precambrian rocks from north-eastern Mexico as: (1) a dominant arc tectonic setting for the rocks of 988 Ma age; and (2) an arc and collision setting for the rocks of 1200-1157 Ma age. Similarly, the diagrams have indicated the dominant original tectonic setting for the Precambrian rocks from Argentina as: (1) within-plate (continental rift-ocean island) and continental rift (CR) setting for the rocks of 800 Ma and 845 Ma age, respectively; and (2) an arc setting for the rocks of 1174-1169 Ma and of 1212-1188 Ma age. The inferred tectonic settings for these Precambrian rocks are, in general, in accordance with the tectonic settings reported in the literature, though some of the diagrams give inconsistent inferences of tectonic setting. The present study confirms the importance of these newly developed discriminant-function based diagrams in inferring the original tectonic setting of
Anomalous wave function statistics on a one-dimensional lattice with power-law disorder.
Titov, M; Schomerus, H
2003-10-24
Within a general framework, we discuss the wave function statistics in the Lloyd model of Anderson localization on a one-dimensional lattice with a Cauchy distribution for random on-site potential. We demonstrate that already in leading order in the disorder strength, there exists a hierarchy of anomalies in the probability distributions of the wave function, the conductance, and the local density of states, for every energy which corresponds to a rational ratio of wavelength to lattice constant. Power-law rather than log-normal tails dominate the short-distance wave-function statistics.
A statistical theory of sound radiation from a two-dimensional lined duct
NASA Technical Reports Server (NTRS)
Cho, Y. C.; Watson, W. R.
1979-01-01
A statistical theory coupled with a finite element theory is employed for investigation of sound radiation from a two-dimensional lined duct. The analysis does not utilize duct modes, and can be applied to a non-uniform duct with variable wall liner properties. Numerical results are presented for various shapes of the incident wave. The results are in good agreement with the Wiener-Hopf calculation for cases where the latter can be made.
Two-Dimensional Hermite Filters Simplify the Description of High-Order Statistics of Natural Images
Hu, Qin; Victor, Jonathan D.
2016-01-01
Natural image statistics play a crucial role in shaping biological visual systems, understanding their function and design principles, and designing effective computer-vision algorithms. High-order statistics are critical for conveying local features, but they are challenging to study – largely because their number and variety is large. Here, via the use of two-dimensional Hermite (TDH) functions, we identify a covert symmetry in high-order statistics of natural images that simplifies this task. This emerges from the structure of TDH functions, which are an orthogonal set of functions that are organized into a hierarchy of ranks. Specifically, we find that the shape (skewness and kurtosis) of the distribution of filter coefficients depends only on the projection of the function onto a 1-dimensional subspace specific to each rank. The characterization of natural image statistics provided by TDH filter coefficients reflects both their phase and amplitude structure, and we suggest an intuitive interpretation for the special subspace within each rank. PMID:27713838
Schwermann, Achim H; dos Santos Rolo, Tomy; Caterino, Michael S; Bechly, Günter; Schmied, Heiko; Baumbach, Tilo; van de Kamp, Thomas
2016-01-01
External and internal morphological characters of extant and fossil organisms are crucial to establishing their systematic position, ecological role and evolutionary trends. The lack of internal characters and soft-tissue preservation in many arthropod fossils, however, impedes comprehensive phylogenetic analyses and species descriptions according to taxonomic standards for Recent organisms. We found well-preserved three-dimensional anatomy in mineralized arthropods from Paleogene fissure fillings and demonstrate the value of these fossils by utilizing digitally reconstructed anatomical structure of a hister beetle. The new anatomical data facilitate a refinement of the species diagnosis and allowed us to reject a previous hypothesis of close phylogenetic relationship to an extant congeneric species. Our findings suggest that mineralized fossils, even those of macroscopically poor preservation, constitute a rich but yet largely unexploited source of anatomical data for fossil arthropods. DOI: http://dx.doi.org/10.7554/eLife.12129.001 PMID:26854367
NASA Astrophysics Data System (ADS)
Hata, Maki; Takakura, Shinichi; Matsushima, Nobuo; Hashimoto, Takeshi; Utsugi, Mitsuru
2016-10-01
At Naka-dake cone, Aso caldera, Japan, volcanic activity increases cyclically, an example of which was a phreatomagmatic eruption in September 2015. Using a three-dimensional model of electrical resistivity, we identify a magma pathway from a series of northward dipping conductive anomalies in the upper crust beneath the caldera. Our resistivity model was created from magnetotelluric measurements conducted in November-December 2015; thus, it provides the latest information about magma reservoir geometry beneath the caldera. The center of the conductive anomalies shifts from the north of Naka-dake at depths >10 km toward Naka-dake, along with a decrease in anomaly depths. The melt fraction is estimated at 13-15% at 2 km depth. Moreover, these anomalies are spatially correlated with the locations of earthquake clusters, which are distributed within resistive blocks on the conductive anomalies in the northwest of Naka-dake but distributed at the resistive sides of resistivity boundaries in the northeast.
Crossett, Ben; Edwards, Alistair V G; White, Melanie Y; Cordwell, Stuart J
2008-01-01
Standardized methods for the solubilization of proteins prior to proteomics analyses incorporating two-dimensional gel electrophoresis (2-DE) are essential for providing reproducible data that can be subjected to rigorous statistical interrogation for comparative studies investigating disease-genesis. In this chapter, we discuss the imaging and image analysis of proteins separated by 2-DE, in the context of determining protein abundance alterations related to a change in biochemical or biophysical conditions. We then describe the principles behind 2-DE gel statistical analysis, including subtraction of background noise, spot detection, gel matching, spot quantitation for data comparison, and statistical requirements to create meaningful gel data sets. We also emphasize the need to develop reproducible and robust protocols for protein sample preparation and 2-DE itself.
Austin, Peter C
2011-05-20
Propensity-score matching allows one to reduce the effects of treatment-selection bias or confounding when estimating the effects of treatments using observational data. Some authors have suggested that methods of inference appropriate for independent samples can be used for assessing the statistical significance of treatment effects when using propensity-score matching. Indeed, many authors in the applied medical literature use methods for independent samples when making inferences about treatment effects using propensity-score matched samples. Dichotomous outcomes are common in healthcare research. In this study, we used Monte Carlo simulations to examine the effect on inferences about risk differences (or absolute risk reductions) when statistical methods for independent samples are used compared with when statistical methods for paired samples are used in propensity-score matched samples. We found that compared with using methods for independent samples, the use of methods for paired samples resulted in: (i) empirical type I error rates that were closer to the advertised rate; (ii) empirical coverage rates of 95 per cent confidence intervals that were closer to the advertised rate; (iii) narrower 95 per cent confidence intervals; and (iv) estimated standard errors that more closely reflected the sampling variability of the estimated risk difference. Differences between the empirical and advertised performance of methods for independent samples were greater when the treatment-selection process was stronger than when it was weaker. We recommend using statistical methods for paired samples when using propensity-score matched samples for making inferences on the effect of treatment on the reduction in the probability of an event occurring.
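The contrast between independent-sample and paired-sample inference for a risk difference can be illustrated with a minimal sketch. It assumes the matched sample has been summarized as a 2x2 table of pair outcomes (the counts `a`, `b`, `c`, `d` and the function name are hypothetical, chosen for illustration) and uses standard textbook variance formulas, which are not necessarily the estimators evaluated in the study.

```python
import math

def risk_difference_se(a, b, c, d):
    """Risk difference and two standard errors for matched binary outcomes.

    Pair counts (illustrative naming, not from the abstract):
      a = event in both treated and control member of the pair
      b = event in treated member only
      c = event in control member only
      d = event in neither member
    """
    n = a + b + c + d                     # number of matched pairs
    p_treated = (a + b) / n               # event risk among treated subjects
    p_control = (a + c) / n               # event risk among matched controls
    rd = p_treated - p_control            # equals (b - c) / n

    # Standard error that ignores the matching (independent samples).
    se_indep = math.sqrt(p_treated * (1 - p_treated) / n
                         + p_control * (1 - p_control) / n)

    # Standard error that uses only the discordant pairs (paired samples).
    se_paired = math.sqrt((b + c) - (b - c) ** 2 / n) / n
    return rd, se_indep, se_paired

rd, se_indep, se_paired = risk_difference_se(a=40, b=25, c=15, d=120)
print(f"risk difference = {rd:.3f}")
print(f"SE (independent) = {se_indep:.4f}, SE (paired) = {se_paired:.4f}")
```

With these made-up counts the outcomes within pairs are positively correlated, so the paired standard error is smaller than the independent-sample one, which is the direction of the narrower confidence intervals reported above.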
de Matos Simoes, Ricardo; Emmert-Streib, Frank
2011-01-01
The inference of gene regulatory networks from gene expression data is a difficult problem because the performance of the inference algorithms depends on a multitude of different factors. In this paper we study two of these. First, we investigate the influence of discrete mutual information (MI) estimators on the global and local network inference performance of the C3NET algorithm. More precisely, we study 4 different MI estimators (Empirical, Miller-Madow, Shrink and Schürmann-Grassberger) in combination with 3 discretization methods (equal frequency, equal width and global equal width discretization). We observe the best global and local inference performance of C3NET for the Miller-Madow estimator with an equal width discretization. Second, our numerical analysis can be considered as a systems approach because we simulate gene expression data from an underlying gene regulatory network, instead of making a distributional assumption to sample thereof. We demonstrate that despite the popularity of the latter approach, which is the traditional way of studying MI estimators, this is in fact not supported by simulated and biological expression data because of their heterogeneity. Hence, our study provides guidance for an efficient design of a simulation study in the context of network inference, supporting a systems approach.
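To make the estimator comparison concrete, here is a minimal sketch of the empirical (plug-in) and Miller-Madow mutual information estimators combined with equal-width discretization. The function names, the bin count, and the synthetic Gaussian data are assumptions for illustration; this is not the C3NET implementation or the simulation setup of the study.

```python
import math
import random
from collections import Counter

def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width bins over its own range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0     # fall back to 1.0 for constant input
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def entropy(labels, miller_madow=False):
    """Plug-in entropy in nats, optionally with the Miller-Madow bias correction."""
    n = len(labels)
    counts = Counter(labels)
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    if miller_madow:
        h += (len(counts) - 1) / (2 * n)  # (non-empty bins - 1) / (2n)
    return h

def mutual_information(x, y, n_bins=5, miller_madow=False):
    """MI(X;Y) = H(X) + H(Y) - H(X,Y) on equal-width discretized data."""
    bx = equal_width_bins(x, n_bins)
    by = equal_width_bins(y, n_bins)
    return (entropy(bx, miller_madow) + entropy(by, miller_madow)
            - entropy(list(zip(bx, by)), miller_madow))

random.seed(0)
x = [random.gauss(0, 1) for _ in range(500)]
y_dependent = [v + 0.1 * random.gauss(0, 1) for v in x]
y_independent = [random.gauss(0, 1) for _ in range(500)]

for name, y in [("dependent", y_dependent), ("independent", y_independent)]:
    print(name,
          "empirical:", round(mutual_information(x, y), 3),
          "Miller-Madow:", round(mutual_information(x, y, miller_madow=True), 3))
```

The Miller-Madow correction only shifts each entropy term by a count-dependent constant, so the two estimators differ most when many joint bins are sparsely populated.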
Stevens, Katherine; McCabe, Christopher; Brazier, John; Roberts, Jennifer
2007-09-01
A key issue in health state valuation modelling is the choice of functional form. The two most frequently used preference based instruments adopt different approaches; one based on multi-attribute utility theory (MAUT), the other on statistical analysis. There has been no comparison of these alternative approaches in the context of health economics. We report a comparison of these approaches for the health utilities index mark 2. The statistical inference model predicts more accurately than the one based on MAUT. We discuss possible explanations for the differences in performance, the importance of the findings, and implications for future research.
Interoccurrence time statistics in the two-dimensional Burridge-Knopoff earthquake model
Hasumi, Tomohiro
2007-08-15
We have numerically investigated statistical properties of the so-called interoccurrence time or the waiting time, i.e., the time interval between successive earthquakes, based on the two-dimensional (2D) spring-block (Burridge-Knopoff) model, selecting the velocity-weakening property as the constitutive friction law. The statistical properties of frequency distribution and the cumulative distribution of the interoccurrence time are discussed by tuning the dynamical parameters, namely, a stiffness and frictional property of a fault. We optimize these model parameters to reproduce the interoccurrence time statistics in nature; the frequency and cumulative distribution can be described by the power law and Zipf-Mandelbrot type power law, respectively. In an optimal case, the b value of the Gutenberg-Richter law and the ratio of wave propagation velocity are in agreement with those derived from real earthquakes. As the threshold of magnitude is increased, the interoccurrence time distribution tends to follow an exponential distribution. Hence it is suggested that a temporal sequence of earthquakes, aside from small-magnitude events, is a Poisson process, which is observed in nature. We found that the interoccurrence time statistics derived from the 2D BK (original) model can efficiently reproduce that of real earthquakes, so that the model can be recognized as a realistic one in view of interoccurrence time statistics.
Predicting adsorption isotherms using a two-dimensional statistical associating fluid theory
NASA Astrophysics Data System (ADS)
Martinez, Alejandro; Castro, Martin; McCabe, Clare; Gil-Villegas, Alejandro
2007-02-01
A molecular thermodynamics approach is developed in order to describe the adsorption of fluids on solid surfaces. The new theory is based on the statistical associating fluid theory for potentials of variable range [A. Gil-Villegas et al., J. Chem. Phys. 106, 4168 (1997)] and uses a quasi-two-dimensional approximation to describe the properties of adsorbed fluids. The theory is tested against Gibbs ensemble Monte Carlo simulations and excellent agreement with the theoretical predictions is achieved. Additionally the authors use the new approach to describe the adsorption isotherms for nitrogen and methane on dry activated carbon.
Okamoto, Takashi; Fujita, Shuhei
2008-12-01
The statistical properties of three-dimensional normal and fractal speckle fields produced by two or three scattered waves crossed orthogonally are studied theoretically. The probability density function and the autocorrelation function of intensity are derived for speckle fields superposed with and without interference. It is shown that the spatial anisotropy of intensity distributions exists even when three scattered waves interfere with one another. This spatial anisotropy affects the power-law distribution of intensity correlation for fractal speckles and leads to intensity patterns that are not self-similar in two or three dimensions. A potential application of the superposed speckle field is proposed.
Harari, Gil
2014-01-01
Statistical significance, also known as the p-value, and the CI (Confidence Interval) are common statistical measures and are essential for the statistical analysis of studies in medicine and life sciences. These measures provide complementary information about the statistical probability and conclusions regarding the clinical significance of study findings. This article is intended to describe the methodologies, compare the methods, assess their suitability for the different needs of study results analysis and explain situations in which each method should be used.
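The complementary nature of the two measures can be sketched with a small example: a two-sided test and a confidence interval computed from the same estimate and standard error. The sketch assumes a normal approximation for a one-sample mean (a t distribution would be more appropriate for a sample this small), and the data and function name are invented for illustration.

```python
import math
from statistics import NormalDist, mean, stdev

def z_test_and_ci(sample, null_mean=0.0, level=0.95):
    """Two-sided p-value and confidence interval for a mean (normal approximation)."""
    n = len(sample)
    m = mean(sample)
    se = stdev(sample) / math.sqrt(n)               # standard error of the mean
    z = (m - null_mean) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # two-sided p-value
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return p_value, (m - z_crit * se, m + z_crit * se)

# Hypothetical measurements of a treatment effect (made-up numbers).
effects = [1.2, 0.8, 1.5, 0.3, 0.9, 1.1, 0.7, 1.4]
p_value, (ci_low, ci_high) = z_test_and_ci(effects)
print(f"p-value = {p_value:.2g}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

Under this approximation the 95% interval excludes the null value exactly when the two-sided p-value is below 0.05; the interval additionally conveys the size and precision of the effect, which the p-value alone does not.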
Halpin, Peter F; Stam, Henderikus J
2006-01-01
The application of statistical testing in psychological research over the period of 1940-1960 is examined in order to address psychologists' reconciliation of the extant controversy between the Fisher and Neyman-Pearson approaches. Textbooks of psychological statistics and the psychological journal literature are reviewed to examine the presence of what Gigerenzer (1993) called a hybrid model of statistical testing. Such a model is present in the textbooks, although the mathematically incomplete character of this model precludes the appearance of a similarly hybridized approach to statistical testing in the research literature. The implications of this hybrid model for psychological research and the statistical testing controversy are discussed.
Statistical Projections for Multi-resolution, Multi-dimensional Visual Data Exploration and Analysis
Hoa T. Nguyen; Stone, Daithi; E. Wes Bethel
2016-01-01
An ongoing challenge in visual exploration and analysis of large, multi-dimensional datasets is how to present useful, concise information to a user for some specific visualization tasks. Typical approaches to this problem have proposed either reduced-resolution versions of data, or projections of data, or both. These approaches still have limitations, such as high computational cost or susceptibility to errors. In this work, we explore the use of a statistical metric as the basis for both projections and reduced-resolution versions of data, with a particular focus on preserving one key trait in data, namely variation. We use two different case studies to explore this idea, one that uses a synthetic dataset, and another that uses a large ensemble collection produced by an atmospheric modeling code to study long-term changes in global precipitation. The primary finding of our work is that, in terms of preserving the variation signal inherent in data, using a statistical measure more faithfully preserves this key characteristic across both multi-dimensional projections and multi-resolution representations than a methodology based upon averaging.
Yang, Yuqing; Chen, Ning; Chen, Ting
2017-01-25
The inference of associations between environmental factors and microbes and among microbes is critical to interpreting metagenomic data, but compositional bias, indirect associations resulting from common factors, and variance within metagenomic sequencing data limit the discovery of associations. To account for these problems, we propose metagenomic Lognormal-Dirichlet-Multinomial (mLDM), a hierarchical Bayesian model with sparsity constraints, to estimate absolute microbial abundance and simultaneously infer both conditionally dependent associations among microbes and direct associations between microbes and environmental factors. We empirically show the effectiveness of the mLDM model using synthetic data, data from the TARA Oceans project, and a colorectal cancer dataset. Finally, we apply mLDM to 16S sequencing data from the western English Channel and report several associations. Our model can be used on both natural environmental and human metagenomic datasets, promoting the understanding of associations in the microbial community.
Blanc, Guillermo A.; Kewley, Lisa; Vogt, Frédéric P. A.; Dopita, Michael A.
2015-01-10
We present a new method for inferring the metallicity (Z) and ionization parameter (q) of H II regions and star-forming galaxies using strong nebular emission lines (SELs). We use Bayesian inference to derive the joint and marginalized posterior probability density functions for Z and q given a set of observed line fluxes and an input photoionization model. Our approach allows the use of arbitrary sets of SELs and the inclusion of flux upper limits. The method provides a self-consistent way of determining the physical conditions of ionized nebulae that is not tied to the arbitrary choice of a particular SEL diagnostic and uses all the available information. Unlike theoretically calibrated SEL diagnostics, the method is flexible and not tied to a particular photoionization model. We describe our algorithm, validate it against other methods, and present a tool that implements it called IZI. Using a sample of nearby extragalactic H II regions, we assess the performance of commonly used SEL abundance diagnostics. We also use a sample of 22 local H II regions having both direct and recombination line (RL) oxygen abundance measurements in the literature to study discrepancies in the abundance scale between different methods. We find that oxygen abundances derived through Bayesian inference using currently available photoionization models in the literature can be in good (∼30%) agreement with RL abundances, although some models perform significantly better than others. We also confirm that abundances measured using the direct method are typically ∼0.2 dex lower than both RL and photoionization-model-based abundances.
Erguler, Kamil; Stumpf, Michael P H
2011-05-01
The size and complexity of cellular systems make building predictive models an extremely difficult task. In principle dynamical time-course data can be used to elucidate the structure of the underlying molecular mechanisms, but a central and recurring problem is that many and very different models can be fitted to experimental data, especially when the latter are limited and subject to noise. Even given a model, estimating its parameters remains challenging in real-world systems. Here we present a comprehensive analysis of 180 systems biology models, which allows us to classify the parameters with respect to their contribution to the overall dynamical behaviour of the different systems. Our results reveal candidate elements of control in biochemical pathways that differentially contribute to dynamics. We introduce sensitivity profiles that concisely characterize parameter sensitivity and demonstrate how this can be connected to variability in data. Systematically linking data and model sloppiness allows us to extract features of dynamical systems that determine how well parameters can be estimated from time-course measurements, and associates the extent of data required for parameter inference with the model structure, and also with the global dynamical state of the system. The comprehensive analysis of so many systems biology models reaffirms the inability to estimate precisely most model or kinetic parameters as a generic feature of dynamical systems, and provides safe guidelines for performing better inferences and model predictions in the context of reverse engineering of mathematical models for biological systems.
Kravtsov, V.E.; Yudson, V.I.
2011-07-15
Highlights: > Statistics of normalized eigenfunctions in one-dimensional Anderson localization at E = 0 is studied. > Moments of inverse participation ratio are calculated. > Equation for generating function is derived at E = 0. > An exact solution for generating function at E = 0 is obtained. > Relation of the generating function to the phase distribution function is established. - Abstract: The one-dimensional (1d) Anderson model (AM), i.e. a tight-binding chain with random uncorrelated on-site energies, has statistical anomalies at any rational point $f = 2a/\lambda_E$, where $a$ is the lattice constant and $\lambda_E$ is the de Broglie wavelength. We develop a regular approach to anomalous statistics of normalized eigenfunctions $\psi(r)$ at such commensurability points. The approach is based on an exact integral transfer-matrix equation for a generating function $\Phi_r(u, \phi)$ ($u$ and $\phi$ have the meaning of the squared amplitude and phase of eigenfunctions, $r$ is the position of the observation point). This generating function can be used to compute local statistics of eigenfunctions of 1d AM at any disorder and to address the problem of higher-order anomalies at $f = p/q$ with $q > 2$. The descender of the generating function $P_r(\phi) \equiv \Phi_r(u = 0, \phi)$ is shown to be the distribution function of phase which determines the Lyapunov exponent and the local density of states. In the leading order in the small disorder we derived a second-order partial differential equation for the $r$-independent ('zero-mode') component $\Phi(u, \phi)$ at the $E = 0$ ($f = 1/2$) anomaly. This equation is nonseparable in variables $u$ and $\phi$. Yet, we show that due to a hidden symmetry, it is integrable and we construct an exact solution for $\Phi(u, \phi)$ explicitly in quadratures. Using this solution we computed moments $I_m = N \langle |\psi|^{2m} \rangle$ ($m \geq 1$) for a chain of the length $N \to \infty$ and found an
Freely Evolving Process and Statistics in the Two-Dimensional Granular Turbulence
NASA Astrophysics Data System (ADS)
Isobe, Masaharu
2002-08-01
We studied the macroscopic statistical properties of the freely evolving quasi-inelastic hard disk (granular) system by systematically performing large-scale (more than a million particles) event-driven molecular dynamics and found behavior remarkably analogous to an enstrophy cascade process in decaying two-dimensional fluid turbulence. There are four typical stages in the freely evolving inelastic hard disk system: homogeneous, shearing (vortex), clustering, and final state. In the shearing stage, the self-organized macroscopic coherent vortices become dominant and the enstrophy decays with power-law behavior. In the clustering stage, the energy spectra are close to the expectation of Kraichnan-Batchelor theory and the squared two-particle separation strictly obeys the Richardson law. These results indicate that the cooperative behavior of the quasi-inelastic hard disk system belongs to the same universality class as macroscopic Navier-Stokes fluid turbulence in the study of dissipative structure.
Mode-resolved travel-time statistics for elastic rays in three-dimensional billiards.
Ortega, A; Stringlo, K; Gorin, T
2012-03-01
We consider the ray limit of propagating ultrasound waves in three-dimensional bodies made from a homogeneous, isotropic, elastic material. Using a Monte Carlo approach, we simulate the propagation and proliferation of elastic rays using realistic angle-dependent reflection coefficients, taking into account mode conversion and ray splitting. For a few simple geometries, we analyze the long-time equilibrium distribution, focusing on the energy ratio between compressional and shear waves. Finally, we study the travel time statistics, i.e., the distribution of the amount of time a given trajectory spends as a compressional wave, as compared to the total travel time. These results are intimately related to recent elastodynamics experiments on Coda-wave interferometry by Lobkis and Weaver [Phys. Rev. E 78, 066212 (2008)].
Statistical properties of chaos demonstrated in a class of one-dimensional maps
NASA Astrophysics Data System (ADS)
Csordás, András; Györgyi, Géza; Szépfalusy, Péter; Tél, Tamás
1993-01-01
One-dimensional maps with complete grammar are investigated in both permanent and transient chaotic cases. The discussion focuses on statistical characteristics such as Lyapunov exponent, generalized entropies and dimensions, free energies, and their finite size corrections. Our approach is based on the eigenvalue problem of generalized Frobenius-Perron operators, which are treated numerically as well as by perturbative and other analytical methods. The examples include the universal chaos function relevant near the period doubling threshold. Special emphasis is put on the entropies and their decay rates because of their invariance under the most general class of coordinate changes. Phase-transition-like phenomena at the border state of chaos due to intermittency and super instability are presented.
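As a minimal illustration of one of the statistical characteristics named above, the sketch below estimates a Lyapunov exponent as the orbit average of ln|f'(x)|. The choice of the fully chaotic logistic map x -> 4x(1-x) is an assumption made here for illustration (it is a standard one-dimensional map whose exact exponent is ln 2), not necessarily one of the maps treated in the paper; the function name and parameters are likewise hypothetical.

```python
import math

def logistic_lyapunov(r=4.0, x0=0.3, n_steps=200_000, n_burn=1_000):
    """Estimate the Lyapunov exponent of x -> r*x*(1-x) as the orbit average of ln|f'(x)|."""
    x = x0
    for _ in range(n_burn):                 # discard the initial transient
        x = r * x * (1.0 - x)
    total = 0.0
    counted = 0
    for _ in range(n_steps):
        slope = abs(r * (1.0 - 2.0 * x))    # |f'(x)| at the current orbit point
        if slope > 0.0:                     # skip the single point x = 1/2, where ln diverges
            total += math.log(slope)
            counted += 1
        x = r * x * (1.0 - x)
    return total / counted

estimate = logistic_lyapunov()
print(estimate, math.log(2.0))              # numerical estimate next to the exact value ln 2
```

The estimate is a finite-time average, so it fluctuates around ln 2 with an error that shrinks as `n_steps` grows.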
Collisional statistics and dynamics of two-dimensional hard-disk systems: From fluid to solid.
Taloni, Alessandro; Meroz, Yasmine; Huerta, Adrián
2015-08-01
We perform extensive MD simulations of two-dimensional systems of hard disks, focusing on the collisional statistical properties. We analyze the distribution functions of velocity, free flight time, and free path length for packing fractions ranging from the fluid to the solid phase. The behaviors of the mean free flight time and path length between subsequent collisions are found to drastically change in the coexistence phase. We show that single-particle dynamical properties behave analogously in collisional and continuous-time representations, exhibiting apparent crossovers between the fluid and the solid phases. We find that, both in collisional and continuous-time representation, the mean-squared displacement, velocity autocorrelation functions, intermediate scattering functions, and self-part of the van Hove function (propagator) closely reproduce the same behavior exhibited by the corresponding quantities in granular media, colloids, and supercooled liquids close to the glass or jamming transition.
Links to sources of cancer-related statistics, including the Surveillance, Epidemiology and End Results (SEER) Program, SEER-Medicare datasets, cancer survivor prevalence data, and the Cancer Trends Progress Report.
Experiments with a three-dimensional statistical objective analysis scheme using FGGE data
NASA Technical Reports Server (NTRS)
Baker, Wayman E.; Bloom, Stephen C.; Woollen, John S.; Nestler, Mark S.; Brin, Eugenia
1987-01-01
A three-dimensional (3D), multivariate, statistical objective analysis scheme (referred to as optimum interpolation or OI) has been developed for use in numerical weather prediction studies with the FGGE data. Some novel aspects of the present scheme include: (1) a multivariate surface analysis over the oceans, which employs an Ekman balance instead of the usual geostrophic relationship, to model the pressure-wind error cross correlations, and (2) the capability to use an error correlation function which is geographically dependent. A series of 4-day data assimilation experiments are conducted to examine the importance of some of the key features of the OI in terms of their effects on forecast skill, as well as to compare the forecast skill using the OI with that utilizing a successive correction method (SCM) of analysis developed earlier. For the three cases examined, the forecast skill is found to be rather insensitive to varying the error correlation function geographically. However, significant differences are noted between forecasts from a two-dimensional (2D) version of the OI and those from the 3D OI, with the 3D OI forecasts exhibiting better forecast skill. The 3D OI forecasts are also more accurate than those from the SCM initial conditions. The 3D OI with the multivariate oceanic surface analysis was found to produce forecasts which were slightly more accurate, on the average, than a univariate version.
Statistical conservation law in two- and three-dimensional turbulent flows
NASA Astrophysics Data System (ADS)
Frishman, Anna; Boffetta, Guido; De Lillo, Filippo; Liberzon, Alex
2015-03-01
Particles in turbulence live complicated lives. It is nonetheless sometimes possible to find order in this complexity. It was proposed in Falkovich et al. [Phys. Rev. Lett. 110, 214502 (2013), 10.1103/PhysRevLett.110.214502] that pairs of Lagrangian tracers at small scales, in an incompressible isotropic turbulent flow, have a statistical conservation law. More specifically, in a d-dimensional flow the distance R(t) between two neutrally buoyant particles, raised to the power -d and averaged over velocity realizations, remains at all times equal to the initial, fixed, separation raised to the same power. In this work we present evidence from direct numerical simulations of two- and three-dimensional turbulence for this conservation. In both cases the conservation is lost when particles exit the linear flow regime. In two dimensions we show that, as an extension of the conservation law, an Evans-Cohen-Morriss or Gallavotti-Cohen type fluctuation relation exists. We also analyze data from a 3D laboratory experiment [Liberzon et al., Physica D 241, 208 (2012), 10.1016/j.physd.2011.07.008], finding that although it probes small scales they are not in the smooth regime. Thus instead of
Statistical conservation law in two- and three-dimensional turbulent flows.
Frishman, Anna; Boffetta, Guido; De Lillo, Filippo; Liberzon, Alex
2015-03-01
Particles in turbulence live complicated lives. It is nonetheless sometimes possible to find order in this complexity. It was proposed in Falkovich et al. [Phys. Rev. Lett. 110, 214502 (2013)] that pairs of Lagrangian tracers at small scales, in an incompressible isotropic turbulent flow, have a statistical conservation law. More specifically, in a d-dimensional flow the distance R(t) between two neutrally buoyant particles, raised to the power -d and averaged over velocity realizations, remains at all times equal to the initial, fixed, separation raised to the same power. In this work we present evidence from direct numerical simulations of two- and three-dimensional turbulence for this conservation. In both cases the conservation is lost when particles exit the linear flow regime. In two dimensions we show that, as an extension of the conservation law, an Evans-Cohen-Morriss or Gallavotti-Cohen type fluctuation relation exists. We also analyze data from a 3D laboratory experiment [Liberzon et al., Physica D 241, 208 (2012)], finding that although it probes small scales they are not in the smooth regime. Thus instead of ⟨R^{-3}⟩, we look for a similar, power-law-in-separation conservation law. We show that the existence of an initially slowly varying function of this form can be predicted but that it does not turn into a conservation law. We suggest that the conservation of ⟨R^{-d}⟩, demonstrated here, can be used as a check of isotropy, incompressibility, and flow dimensionality in numerical and laboratory experiments that focus on small scales.
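The suggested check can be sketched as a comparison between the ensemble average of R^{-d} at some time and the initial value R_0^{-d}. The snippet below is a minimal sketch under stated assumptions: the function names are hypothetical, and the sample separations are invented placeholder numbers, not data from the paper.

```python
def inverse_power_mean(separations, d):
    """Sample average of R**(-d) over an ensemble of pair separations."""
    return sum(r ** (-d) for r in separations) / len(separations)

def conservation_ratio(separations, r0, d):
    """Ratio <R^-d> / R0^-d; a value near 1 is consistent with the conservation law."""
    return inverse_power_mean(separations, d) / (r0 ** (-d))

# Hypothetical placeholder values: pair separations at one instant in a 2D flow,
# all pairs having started at the same separation r0 = 1.0.
sample = [0.8, 1.0, 1.3, 1.1]
print(conservation_ratio(sample, r0=1.0, d=2))
```

In practice the ratio would be tracked over time, and a systematic drift away from 1 would signal that pairs have left the linear flow regime or that one of the assumptions (isotropy, incompressibility, dimensionality) does not hold.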
Aggelopoulos, Nikolaos C
2015-08-01
Perceptual inference refers to the ability to infer sensory stimuli from predictions that result from internal neural representations built through prior experience. Methods of Bayesian statistical inference and decision theory model cognition adequately by using error sensing either in guiding action or in "generative" models that predict the sensory information. In this framework, perception can be seen as a process qualitatively distinct from sensation, a process of information evaluation using previously acquired and stored representations (memories) that is guided by sensory feedback. The stored representations can be utilised as internal models of sensory stimuli enabling long term associations, for example in operant conditioning. Evidence for perceptual inference is contributed by such phenomena as the cortical co-localisation of object perception with object memory, the response invariance in the responses of some neurons to variations in the stimulus, as well as from situations in which perception can be dissociated from sensation. In the context of perceptual inference, sensory areas of the cerebral cortex that have been facilitated by a priming signal may be regarded as comparators in a closed feedback loop, similar to the better known motor reflexes in the sensorimotor system. The adult cerebral cortex can be regarded as similar to a servomechanism, in using sensory feedback to correct internal models, producing predictions of the outside world on the basis of past experience.
Saeki, Hiroyuki; Tango, Toshiro; Wang, Jinfang
2017-01-01
In clinical investigations of diagnostic procedures to indicate noninferiority, efficacy is generally evaluated on the basis of results from independent multiple raters. For each subject, if two diagnostic procedures are performed and some units are evaluated, the difference in proportions for matched-pair data is correlated between the two diagnostic procedures and within the subject, i.e. the data are clustered. In this article, we propose a noninferiority test to infer the difference in the correlated proportions of clustered data between the two diagnostic procedures. The proposed noninferiority test was validated in a Monte Carlo simulation study. Empirical sizes of the noninferiority test were close to the nominal level. The proposed test is illustrated on data of aneurysm diagnostic procedures for patients with acute subarachnoid hemorrhage.
Current Sheet Statistics in Three-Dimensional Simulations of Coronal Heating
NASA Astrophysics Data System (ADS)
Lin, L.; Ng, C. S.; Bhattacharjee, A.
2013-04-01
In a recent numerical study [Ng et al., Astrophys. J. 747, 109, 2012], with a three-dimensional model of coronal heating using reduced magnetohydrodynamics (RMHD), we have obtained scaling results of heating rate versus Lundquist number based on a series of runs in which random photospheric motions are imposed for hundreds to thousands of Alfvén times in order to obtain converged statistical values. The heating rate found in these simulations saturates to a level that is independent of the Lundquist number. This scaling result was also supported by an analysis with the assumption of the Sweet-Parker scaling of the current sheets, as well as how the width, length and number of current sheets scale with Lundquist number. In order to test these assumptions, we have implemented an automated routine to analyze thousands of current sheets in these simulations and return statistical scalings for these quantities. It is found that the Sweet-Parker scaling is justified. However, some discrepancies are also found and require further study.
Chapman, Benjamin P.; Weiss, Alexander; Duberstein, Paul
2016-01-01
Statistical learning theory (SLT) is the statistical formulation of machine learning theory, a body of analytic methods common in “big data” problems. Regression-based SLT algorithms seek to maximize predictive accuracy for some outcome, given a large pool of potential predictors, without overfitting the sample. Research goals in psychology may sometimes call for high dimensional regression. One example is criterion-keyed scale construction, where a scale with maximal predictive validity must be built from a large item pool. Using this as a working example, we first introduce a core principle of SLT methods: minimization of expected prediction error (EPE). Minimizing EPE is fundamentally different than maximizing the within-sample likelihood, and hinges on building a predictive model of sufficient complexity to predict the outcome well, without undue complexity leading to overfitting. We describe how such models are built and refined via cross-validation. We then illustrate how three common SLT algorithms (Supervised Principal Components, Regularization, and Boosting) can be used to construct a criterion-keyed scale predicting all-cause mortality, using a large personality item pool within a population cohort. Each algorithm illustrates a different approach to minimizing EPE. Finally, we consider broader applications of SLT predictive algorithms, both as supportive analytic tools for conventional methods, and as primary analytic tools in discovery phase research. We conclude that despite their differences from the classic null-hypothesis testing approach, or perhaps because of them, SLT methods may hold value as a statistically rigorous approach to exploratory regression. PMID:27454257
Chapman, Benjamin P; Weiss, Alexander; Duberstein, Paul R
2016-12-01
Statistical learning theory (SLT) is the statistical formulation of machine learning theory, a body of analytic methods common in "big data" problems. Regression-based SLT algorithms seek to maximize predictive accuracy for some outcome, given a large pool of potential predictors, without overfitting the sample. Research goals in psychology may sometimes call for high dimensional regression. One example is criterion-keyed scale construction, where a scale with maximal predictive validity must be built from a large item pool. Using this as a working example, we first introduce a core principle of SLT methods: minimization of expected prediction error (EPE). Minimizing EPE is fundamentally different than maximizing the within-sample likelihood, and hinges on building a predictive model of sufficient complexity to predict the outcome well, without undue complexity leading to overfitting. We describe how such models are built and refined via cross-validation. We then illustrate how 3 common SLT algorithms (supervised principal components, regularization, and boosting) can be used to construct a criterion-keyed scale predicting all-cause mortality, using a large personality item pool within a population cohort. Each algorithm illustrates a different approach to minimizing EPE. Finally, we consider broader applications of SLT predictive algorithms, both as supportive analytic tools for conventional methods, and as primary analytic tools in discovery phase research. We conclude that despite their differences from the classic null-hypothesis testing approach, or perhaps because of them, SLT methods may hold value as a statistically rigorous approach to exploratory regression. (PsycINFO Database Record
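The idea of choosing model complexity by minimizing an estimate of expected prediction error can be sketched with k-fold cross-validation over a ridge penalty. This is a minimal sketch, not the authors' procedure: ridge regression stands in for the "regularization" family, the data are synthetic, and all names (`ridge_fit`, `cv_error`, the penalty grid) are assumptions made for illustration.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge coefficients: (X^T X + lam I)^{-1} X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def cv_error(X, y, lam, k=5, seed=0):
    """Mean squared error on held-out folds, a k-fold estimate of EPE."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(y)), held_out)
        beta = ridge_fit(X[train], y[train], lam)          # fit on training folds only
        errors.append(np.mean((y[held_out] - X[held_out] @ beta) ** 2))
    return float(np.mean(errors))

# Synthetic data (illustrative assumption): 60 subjects, 40 items, 3 truly predictive items.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 40))
y = X[:, :3] @ np.array([1.0, -1.0, 0.5]) + rng.normal(scale=1.0, size=60)

lambdas = [0.01, 0.1, 1.0, 10.0, 100.0]
scores = {lam: cv_error(X, y, lam) for lam in lambdas}
best_lam = min(scores, key=scores.get)                      # penalty with lowest held-out error
print(best_lam, scores[best_lam])
```

The within-sample fit always improves as the penalty shrinks, whereas the held-out error typically does not, which is the distinction between maximizing within-sample likelihood and minimizing EPE.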
Wu, Wei; Mast, Thomas G; Ziembko, Christopher; Breza, Joseph M; Contreras, Robert J
2013-01-01
We analyzed the spike discharge patterns of two types of neurons in the rodent peripheral gustatory system, Na specialists (NS) and acid generalists (AG), in response to lingual stimulation with NaCl, acetic acid, and mixtures of the two stimuli. Previous computational investigations found that both spike rate and spike timing contribute to taste quality coding. These studies used commonly accepted computational methods, but they do not provide a consistent statistical evaluation of spike trains. In this paper, we adopted a new computational framework that treated each spike train as an individual data point for computing summary statistics such as mean and variance in the spike train space. We found that these statistical summaries properly characterized the firing patterns (e.g., template and variability) and quantified the differences between NS and AG neurons. The same framework was also used to assess the discrimination performance of NS and AG neurons and to remove spontaneous background activity or "noise" from the spike train responses. The results indicated that the new metric system provided the desired decoding performance and noise-removal improved stimulus classification accuracy, especially of neurons with high spontaneous rates. In summary, this new method naturally conducts statistical analysis and neural decoding under one consistent framework, and the results demonstrated that individual peripheral-gustatory neurons generate a unique and reliable firing pattern during sensory stimulation and that this pattern can be reliably decoded.
ERIC Educational Resources Information Center
Davis, Philip M.; Solla, Leah R.
2003-01-01
Reports an analysis of American Chemical Society electronic journal downloads at Cornell University (Ithaca, New York) by individual IP (Internet Protocol) addresses. Highlights include usage statistics to evaluate library journal subscriptions; understanding scientists' reading behavior; individual use of articles and of journals; and the…
ERIC Educational Resources Information Center
Schochet, Peter Z.
2015-01-01
This report presents the statistical theory underlying the "RCT-YES" software that estimates and reports impacts for RCTs for a wide range of designs used in social policy research. The report discusses a unified, non-parametric design-based approach for impact estimation using the building blocks of the Neyman-Rubin-Holland causal…
NASA Astrophysics Data System (ADS)
Hasan, Asad; Maloney, Craig
2013-03-01
We compute the effective dispersion and density of states (DOS) of two-dimensional sub-regions of three-dimensional face centered cubic (FCC) crystals with both a direct projection-inversion technique and a Monte Carlo simulation based on a common Hamiltonian. We study sub-regions of both (111) and (100) planes. For any direction of wavevector, we show an anomalous ω^2 ~ q scaling regime at low q, where ω^2 is the energy associated with a mode of wavenumber q. This scaling should give rise to an anomalous DOS, D(ω), at low ω: D(ω) ~ ω^3 rather than the conventional Debye result D(ω) ~ ω^2. The DOS for the (100) sub-region looks to be consistent with D(ω) ~ ω^3, while the (111) shows something closer to the Debye result at the smallest frequencies. Our Monte Carlo simulation shows that finite sampling artifacts act as an effective disorder and bias D(ω) in the same way as the finite size artifacts, giving a behavior closer to D(ω) ~ ω^2 than D(ω) ~ ω^3. These results should have an important impact on interpretation of recent studies of colloidal solids where two-point displacement correlations can be obtained in real-space via microscopy.
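The step from the anomalous dispersion to the anomalous DOS can be made explicit with a standard mode-counting argument. The sketch below assumes that modes are counted in the two-dimensional sub-region, so that the number of modes with wavenumber below q grows as q^2; this is an illustrative reconstruction, not necessarily the authors' derivation.

```latex
\begin{align}
  N(q) &\propto q^{2}
  \quad\Longrightarrow\quad
  D(\omega) = \frac{dN}{d\omega} \propto q\,\frac{dq}{d\omega}, \\
  \omega^{2} \propto q
  &\;\Longrightarrow\;
  \frac{dq}{d\omega} \propto \omega
  \;\Longrightarrow\;
  D(\omega) \propto \omega^{2}\cdot\omega = \omega^{3}.
\end{align}
```

For comparison, conventional three-dimensional Debye counting with \(\omega \propto q\) and \(N(q) \propto q^{3}\) gives \(D(\omega) \propto \omega^{2}\), the result quoted in the abstract.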
NASA Astrophysics Data System (ADS)
Das Sarma, S.; Nag, Amit; Sau, Jay D.
2016-07-01
We consider a simple conceptual question with respect to Majorana zero modes in semiconductor nanowires: can the measured nonideal values of the zero-bias-conductance-peak in the tunneling experiments be used as a characteristic to predict the underlying topological nature of the proximity induced nanowire superconductivity? In particular, we define and calculate the topological visibility, which is a variation of the topological invariant associated with the scattering matrix of the system as well as the zero-bias-conductance-peak heights in the tunneling measurements, in the presence of dissipative broadening, using precisely the same realistic nanowire parameters to connect the topological invariants with the zero-bias tunneling conductance values. This dissipative broadening is present in both (the existing) tunneling measurements and also (any future) braiding experiments as an inevitable consequence of a finite braiding time. The connection between the topological visibility and the conductance allows us to obtain the visibility of realistic braiding experiments in nanowires, and to conclude that the current experimentally accessible systems with nonideal zero-bias conductance peaks may indeed manifest (with rather low visibility) non-Abelian statistics for the Majorana zero modes. In general, we find that a large (small) superconducting gap (Majorana peak splitting) is essential for the manifestation of the non-Abelian braiding statistics, and in particular, a zero-bias conductance value of around half the ideal quantized Majorana value should be sufficient for the manifestation of non-Abelian statistics in experimental nanowires. Our work also establishes that as a matter of principle the topological transition associated with the emergence of Majorana zero modes in finite nanowires is always a crossover (akin to a quantum phase transition at finite temperature) requiring the presence of dissipative broadening (which must be larger than the Majorana energy
NASA Astrophysics Data System (ADS)
Shen, Samuel S. P.; Wied, Olaf; Weithmann, Alexander; Regele, Tobias; Bailey, Barbara A.; Lawrimore, Jay H.
2016-07-01
This paper describes six different temporal climate regimes of the contiguous United States (CONUS) according to interdecadal variations of surface air temperature (SAT) and precipitation using the United States Historical Climatology Network (USHCN) monthly data (Tmax, Tmin, Tmean, and precipitation) from 1895 to 2010. Our analysis is based on the probability distribution, mean, standard deviation, skewness, kurtosis, Kolmogorov-Smirnov (KS) test, and Welch's t test. The relevant statistical parameters are computed from gridded monthly SAT and precipitation data. SAT variations lead to classification of four regimes: 1895-1930 (cool), 1931-1960 (warm), 1961-1985 (cool), and 1986-2010 (warm), while precipitation variations lead to a classification of two regimes: 1895-1975 (dry) and 1976-2010 (wet). The KS test shows that any two regimes of the above six are statistically significantly different from each other due to clear shifts of the probability density functions. Extremes of SAT and precipitation identify the ten hottest, coldest, driest, and wettest years. Welch's t test is used to discern significant differences among these extremes. The spatial patterns of the six climate regimes and some years of extreme climate are analyzed. Although the recent two decades are the warmest among the other decades since 1895 and many hottest years measured by CONUS Tmin and Tmean are in these two decades, the hottest year according to the CONUS Tmax anomalies is 1934 (1.37 °C), which is very close to the second Tmax hottest year 2006 (1.35 °C).
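The two tests named above can be sketched with the standard library alone. The snippet computes Welch's t statistic with its Welch-Satterthwaite degrees of freedom and the two-sample Kolmogorov-Smirnov statistic; p-values are omitted, the function names are hypothetical, and the two samples are invented placeholders rather than USHCN data.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom (unequal variances)."""
    na, nb = len(sample_a), len(sample_b)
    va = variance(sample_a) / na            # squared standard error of mean A
    vb = variance(sample_b) / nb            # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    dof = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, dof

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    def ecdf(sample, x):
        return sum(v <= x for v in sample) / len(sample)
    points = sorted(set(sample_a) | set(sample_b))
    return max(abs(ecdf(sample_a, x) - ecdf(sample_b, x)) for x in points)

# Hypothetical placeholder anomalies for two regimes (not real data).
regime_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
regime_2 = [2.0, 4.0, 6.0, 8.0, 10.0]
print(welch_t(regime_1, regime_2))
print(ks_statistic(regime_1, regime_2))
```

Welch's form is the natural choice when the two regimes may have different variances, since it does not pool the sample variances.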
A statistical mechanical theory for a two-dimensional model of water
NASA Astrophysics Data System (ADS)
Urbic, Tomaz; Dill, Ken A.
2010-06-01
We develop a statistical mechanical model for the thermal and volumetric properties of waterlike fluids. Each water molecule is a two-dimensional disk with three hydrogen-bonding arms. Each water interacts with neighboring waters through a van der Waals interaction and an orientation-dependent hydrogen-bonding interaction. This model, which is largely analytical, is a variant of the Truskett and Dill (TD) treatment of the "Mercedes-Benz" (MB) model. The present model gives better predictions than TD for hydrogen-bond populations in liquid water by distinguishing strong cooperative hydrogen bonds from weaker ones. We explore properties versus temperature T and pressure p. We find that the volumetric and thermal properties follow the same trends with T as real water and are in good general agreement with Monte Carlo simulations of MB water, including the density anomaly, the minimum in the isothermal compressibility, and the decreased number of hydrogen bonds for increasing temperature. The model reproduces that pressure squeezes out water's heat capacity and leads to a negative thermal expansion coefficient at low temperatures. In terms of water structuring, the variance in hydrogen-bonding angles increases with both T and p, while the variance in water density increases with T but decreases with p. Hydrogen bonding is an energy storage mechanism that leads to water's large heat capacity (for its size) and to the fragility in its cagelike structures, which are easily melted by temperature and pressure to a more van der Waals-like liquid state.
Air entrainment and bubble statistics in three-dimensional breaking waves
NASA Astrophysics Data System (ADS)
Deike, Luc; Melville, W. K.; Popinet, Stephane
2015-11-01
Wave breaking in the ocean is of fundamental importance in order to quantify wave dissipation and air-sea interaction, including gas and momentum exchange, and to improve parametrizations for weather and climate models. Here, we investigate air entrainment and bubble statistics in three-dimensional breaking waves through direct numerical simulations of the two-phase air-water flow using the Open Source solver Gerris. As in previous 2D simulations, the dissipation due to breaking is found to be in good agreement with previous experimental observations and inertial-scaling arguments. For radii larger than the Hinze scale, the bubble size distribution is found to follow a power law of the radius, r^{-3}, and to scale linearly with the time-dependent turbulent dissipation rate during the active breaking stages. The time-averaged bubble size distribution is found to follow the same power law of the radius and to scale linearly with the wave dissipation rate per unit length of breaking crest. We propose a phenomenological turbulent bubble break-up model that describes the numerical results and existing experimental results.
A one-dimensional statistical mechanics model for nucleosome positioning on genomic DNA
NASA Astrophysics Data System (ADS)
Tesoro, S.; Ali, I.; Morozov, A. N.; Sulaiman, N.; Marenduzzo, D.
2016-02-01
The first level of folding of DNA in eukaryotes is provided by the so-called ‘10 nm chromatin fibre’, where DNA wraps around histone proteins (∼10 nm in size) to form nucleosomes, which go on to create a zig-zagging bead-on-a-string structure. In this work we present a one-dimensional statistical mechanics model to study nucleosome positioning within one such 10 nm fibre. We focus on the case of genomic sheep DNA, and we start from effective potentials valid at infinite dilution and determined from high-resolution in vitro salt dialysis experiments. We study positioning within a polynucleosome chain, and compare the results for genomic DNA to that obtained in the simplest case of homogeneous DNA, where the problem can be mapped to a Tonks gas [1]. First, we consider the simple, analytically solvable, case where nucleosomes are assumed to be point-like. Then, we perform numerical simulations to gauge the effect of their finite size on the nucleosomal distribution probabilities. Finally we compare nucleosome distributions and simulated nuclease digestion patterns for the two cases (homogeneous and sheep DNA), thereby providing testable predictions of the effect of sequence on experimentally observable quantities in experiments on polynucleosome chromatin fibres reconstituted in vitro.
NASA Astrophysics Data System (ADS)
Kumar, Ranjeet; Chandra, Navin; Tomar, Surekha
2016-02-01
This paper deals with the role of triple encounters with low initial velocities and equal masses in the framework of statistical escape theory in two-dimensional space. This system is described by allowing for both energy and angular momentum conservation in the phase space. The complete statistical solutions (i.e. the semi-major axis a, the distributions of eccentricity e, and energy E_b of the final binary, escape energy E_s of the escaper and its escape velocity v_s) of the system are calculated. These are in good agreement with the numerical results of Chandra and Bhatnagar (1999) in the range of perturbing velocities v_i (10^{-1} ≤ v_i ≤ 10^{-10}) in two-dimensional space. The double limit process has been applied to the system. It is observed that when v_i → 0^+, a v_s^2 → 2/3 for all directions in two-dimensional space.
Garelli, F M; Espinosa, M O; Gürtler, R E
2012-05-01
Understanding the processes that affect Aedes aegypti (L.) (Diptera: Culicidae) may serve as a starting point to create and/or improve vector control strategies. For this purpose, we performed statistical modeling of three entomological surveys conducted in Clorinda City, northern Argentina. Previous 'basic' models of presence or absence of larvae and/or pupae (infestation) and the number of pupae in infested containers (productivity), mainly based on physical characteristics of containers, were expanded to include variables selected a priori reflecting water use practices, vector-related context factors, the history of chemical control, and climate. Model selection was performed using Akaike's Information Criterion. In total, 5,431 water-holding containers were inspected and 12,369 Ae. aegypti pupae collected from 963 positive containers. Large tanks were the most productive container type. Variables reflecting every putative process considered, except for history of chemical control, were selected in the best models obtained for infestation and productivity. The associations found were very strong, particularly in the case of infestation. Water use practices and vector-related context factors were the most important ones, as evidenced by their impact on Akaike's Information Criterion scores of the infestation model. Risk maps based on empirical data and model predictions showed a heterogeneous distribution of entomological risk. An integrated vector control strategy is recommended, aiming at community participation for healthier water use practices and targeting large tanks for key elements such as lid status, water addition frequency and water use.
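Model selection by Akaike's Information Criterion can be sketched as follows: each candidate model is scored by AIC = 2k - 2 ln L, and the model with the lowest score is preferred. The candidate names, log-likelihoods, and parameter counts below are invented placeholders for illustration, not values from the study.

```python
def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical candidates: (maximized log-likelihood, number of fitted parameters).
candidates = {
    "basic": (-520.0, 4),
    "basic + water use": (-498.5, 7),
    "full": (-497.9, 12),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)
# Delta-AIC relative to the best model; larger values mean weaker support.
delta = {name: score - scores[best] for name, score in scores.items()}

for name in candidates:
    print(f"{name}: AIC = {scores[name]:.1f}, delta = {delta[name]:.1f}")
```

With these placeholder numbers the richest model fits slightly better than the intermediate one but is penalized for its extra parameters, which is the trade-off the criterion is designed to capture.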
NASA Astrophysics Data System (ADS)
Germa, Aurelie; Connor, Laura; Connor, Chuck; Malservisi, Rocco
2015-04-01
One challenge of volcanic hazard assessment in distributed volcanic fields (large number of small-volume basaltic volcanoes along with one or more silicic central volcanoes) is to constrain the location of future activity. Although the extent of the source of melts at depth can be known using geophysical methods or the location of past eruptive vents, the location of preferential pathways and zones of higher magma flux are still unobserved. How does the spatial distribution of eruptive vents at the surface reveal the location of magma sources or focusing? When this distribution is investigated, the location of central polygenetic edifices as well as clusters of monogenetic volcanoes denote zones of high magma flux and recurrence rate, whereas areas of dispersed monogenetic vents represent zones of lower flux. Additionally, central polygenetic edifices, acting as magma filters, prevent dense mafic magmas from reaching the surface close to their central silicic system. Subsequently, the spatial distribution of mafic monogenetic vents may provide clues to the subsurface structure of a volcanic field, such as the location of magma sources, preferential magma pathways, and flux distribution across the field. Gathering such data is of high importance in improving the assessment of volcanic hazards. We are developing a modeling framework that compares outputs of statistical models of vent distribution with outputs from numerical models of subsurface magma transport. Geologic data observed at the Earth's surface are used to develop statistical models of spatial intensity (vents per unit area), volume intensity (erupted volume per unit area) and volume-flux intensity (erupted volume per unit time and area). Outputs are in the form of probability density functions assumed to represent volcanic flow output at the surface. These are then compared to outputs from conceptual models of the subsurface processes of magma storage and transport. These models are using Darcy's law
Hall effect, edge states, and Haldane exclusion statistics in two-dimensional space
NASA Astrophysics Data System (ADS)
Ye, F.; Marchetti, P. A.; Su, Z. B.; Yu, L.
2015-12-01
We clarify the relation between two kinds of statistics for particle excitations in planar systems: the braid statistics of anyons and the Haldane exclusion statistics (HES). It is shown nonperturbatively that the HES exists for incompressible anyon liquid in the presence of a Hall response. We also study the statistical properties of a specific quantum anomalous Hall model with Chern-Simons term by perturbation in both compressible and incompressible regimes, where the crucial role of edge states to the HES is shown.
On the statistical properties of Klein polyhedra in three-dimensional lattices
Illarionov, A A
2013-06-30
We obtain asymptotic formulae for the average values of the number of faces of a fixed type and of vertices of Klein polyhedra of three-dimensional integer lattices with a given determinant. Bibliography: 20 titles.
Statistical Signal Models and Algorithms for Image Analysis
1984-10-25
In this report, two-dimensional stochastic linear models are used in developing algorithms for image analysis such as classification, segmentation, and object detection in images characterized by textured backgrounds. These models generate two-dimensional random processes as outputs to which statistical inference procedures can naturally be applied. A common thread throughout our algorithms is the interpretation of the inference procedures in terms of linear prediction
NASA Astrophysics Data System (ADS)
Verma, Sanjeet K.; Oliveira, Elson P.
2013-08-01
In the present work, we applied two sets of new multi-dimensional geochemical diagrams (Verma et al., 2013) obtained from linear discriminant analysis (LDA) of natural logarithm-transformed ratios of major elements and immobile major and trace elements in acid magmas to decipher plate tectonic settings and corresponding probability estimates for Paleoproterozoic rocks from Amazonian craton, São Francisco craton, São Luís craton, and Borborema province of Brazil. The robustness of LDA minimizes the effects of petrogenetic processes and maximizes the separation among the different tectonic groups. The probability based boundaries further provide a better objective statistical method in comparison to the commonly used subjective method of determining the boundaries by eye judgment. The use of major element data readjusted to 100% on an anhydrous basis by the SINCLAS computer program also helps to minimize the effects of post-emplacement compositional changes and analytical errors on these tectonic discrimination diagrams. Fifteen case studies of acid suites highlighted the application of these diagrams and probability calculations. The first case study on Jamon and Musa granites, Carajás area (Central Amazonian Province, Amazonian craton) shows a collision setting (previously thought anorogenic). A collision setting was clearly inferred for Bom Jardim granite, Xingú area (Central Amazonian Province, Amazonian craton). The third case study on Older São Jorge, Younger São Jorge and Maloquinha granites, Tapajós area (Ventuari-Tapajós Province, Amazonian craton) indicated a within-plate setting (previously transitional between volcanic arc and within-plate). We also recognized a within-plate setting for the next three case studies on Aripuanã and Teles Pires granites (SW Amazonian craton), and Pitinga area granites (Mapuera Suite, NW Amazonian craton), which were all previously suggested to have been emplaced in post-collision to within-plate settings. The seventh case
Zhao, Xi; Dellandréa, Emmanuel; Chen, Liming; Kakadiaris, Ioannis A
2011-10-01
Three-dimensional face landmarking aims at automatically localizing facial landmarks and has a wide range of applications (e.g., face recognition, face tracking, and facial expression analysis). Existing methods assume neutral facial expressions and unoccluded faces. In this paper, we propose a general learning-based framework for reliable landmark localization on 3-D facial data under challenging conditions (i.e., facial expressions and occlusions). Our approach relies on a statistical model, called 3-D statistical facial feature model, which learns both the global variations in configurational relationships between landmarks and the local variations of texture and geometry around each landmark. Based on this model, we further propose an occlusion classifier and a fitting algorithm. Results from experiments on three publicly available 3-D face databases (FRGC, BU-3-DFE, and Bosphorus) demonstrate the effectiveness of our approach, in terms of landmarking accuracy and robustness, in the presence of expressions and occlusions.
Lange, Kenneth; Papp, Jeanette C.; Sinsheimer, Janet S.; Sobel, Eric M.
2014-01-01
Statistical genetics is undergoing the same transition to big data that all branches of applied statistics are experiencing. With the advent of inexpensive DNA sequencing, the transition is only accelerating. This brief review highlights some modern techniques with recent successes in statistical genetics. These include: (a) lasso penalized regression and association mapping, (b) ethnic admixture estimation, (c) matrix completion for genotype and sequence data, (d) the fused lasso and copy number variation, (e) haplotyping, (f) estimation of relatedness, (g) variance components models, and (h) rare variant testing. For more than a century, genetics has been both a driver and beneficiary of statistical theory and practice. This symbiotic relationship will persist for the foreseeable future. PMID:24955378
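As an illustration of item (a), lasso penalized regression, the sketch below fits a lasso by cyclic coordinate descent with soft thresholding. It is a toy written for this listing rather than code from the review: the function names, the tiny design matrix, and the penalty value are all hypothetical, and a real association-mapping analysis would rely on a tested library.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return 0.0 inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_sweeps=200):
    """Minimize (1/2n)*||y - X b||^2 + penalty*||b||_1 by cyclic coordinate descent.

    X is a list of rows (hypothetical predictors), y a list of responses.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X b while b is all zeros
    for _ in range(n_sweeps):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            norm_sq = sum(c * c for c in col) / n
            if norm_sq == 0.0:
                continue  # an all-zero predictor carries no information
            # Covariance of predictor j with the residual, adding back
            # the contribution of its own current coefficient.
            rho = sum(col[i] * residual[i] for i in range(n)) / n + norm_sq * beta[j]
            new_beta = soft_threshold(rho, penalty) / norm_sq
            delta = new_beta - beta[j]
            if delta != 0.0:
                residual = [residual[i] - delta * col[i] for i in range(n)]
                beta[j] = new_beta
    return beta


# Toy data: predictor 0 has a strong effect, predictor 1 a negligible one.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]
beta = lasso_coordinate_descent(X, y, penalty=0.2)
```

The penalty sets the weak coefficient exactly to zero while shrinking the strong one, which is the variable-selection behaviour that makes the lasso attractive when predictors far outnumber samples.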
Computationally efficient Bayesian inference for inverse problems.
Marzouk, Youssef M.; Najm, Habib N.; Rahn, Larry A.
2007-10-01
Bayesian statistics provides a foundation for inference from noisy and incomplete data, a natural mechanism for regularization in the form of prior information, and a quantitative assessment of uncertainty in the inferred results. Inverse problems - representing indirect estimation of model parameters, inputs, or structural components - can be fruitfully cast in this framework. Complex and computationally intensive forward models arising in physical applications, however, can render a Bayesian approach prohibitive. This difficulty is compounded by high-dimensional model spaces, as when the unknown is a spatiotemporal field. We present new algorithmic developments for Bayesian inference in this context, showing strong connections with the forward propagation of uncertainty. In particular, we introduce a stochastic spectral formulation that dramatically accelerates the Bayesian solution of inverse problems via rapid evaluation of a surrogate posterior. We also explore dimensionality reduction for the inference of spatiotemporal fields, using truncated spectral representations of Gaussian process priors. These new approaches are demonstrated on scalar transport problems arising in contaminant source inversion and in the inference of inhomogeneous material or transport properties. We also present a Bayesian framework for parameter estimation in stochastic models, where intrinsic stochasticity may be intermingled with observational noise. Evaluation of a likelihood function may not be analytically tractable in these cases, and thus several alternative Markov chain Monte Carlo (MCMC) schemes, operating on the product space of the observations and the parameters, are introduced.
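For readers unfamiliar with MCMC, a minimal random-walk Metropolis sampler is sketched below. It is a generic textbook scheme, not one of the product-space schemes or surrogate-posterior methods described in the abstract; the toy model (a constant signal observed with Gaussian noise under a Gaussian prior), the data values, and all parameter names are assumptions made for illustration.

```python
import math
import random


def log_posterior(theta, data, noise_sd=1.0, prior_sd=10.0):
    """Unnormalized log posterior: Gaussian likelihood plus zero-mean Gaussian prior."""
    log_prior = -0.5 * (theta / prior_sd) ** 2
    log_like = -0.5 * sum(((d - theta) / noise_sd) ** 2 for d in data)
    return log_prior + log_like


def metropolis(data, n_steps=20000, step_sd=0.5, seed=0):
    """Random-walk Metropolis chain targeting the posterior of theta."""
    rng = random.Random(seed)
    theta = 0.0
    current = log_posterior(theta, data)
    samples = []
    for _ in range(n_steps):
        proposal = theta + rng.gauss(0.0, step_sd)
        candidate = log_posterior(proposal, data)
        # Accept with probability min(1, posterior ratio); 1 - random() lies
        # in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            theta, current = proposal, candidate
        samples.append(theta)
    return samples


data = [1.8, 2.1, 2.4, 1.9, 2.3]  # hypothetical noisy measurements
samples = metropolis(data)
kept = samples[2000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
```

For this conjugate toy problem the exact posterior mean is 10.5/5.01, so the chain average can be checked directly. Every step costs one evaluation of the forward model inside the likelihood, which is why expensive forward models make plain MCMC prohibitive.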
Thermodynamics of a one-dimensional ideal gas with fractional exclusion statistics
Murthy, M.V.N.; Shankar, R.
1994-12-19
We show that the particles in the Calogero-Sutherland model obey fractional exclusion statistics as defined by Haldane. We construct anyon number densities and derive the energy distribution function. We show that the partition function factorizes in the form characteristic of an ideal gas. The virial expansion is exactly computable and interestingly it is only the second virial coefficient that encodes the statistics information.
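The abstract does not quote the distribution function. As a hedged illustration, the sketch below evaluates the occupation number usually associated with ideal fractional exclusion statistics (often called Wu's distribution), n = 1/(w + g), where w solves w^g (1 + w)^(1 - g) = exp((ε - μ)/k_B T) and g is the exclusion parameter. This form is assumed here rather than taken from the paper, and the function and argument names are hypothetical.

```python
import math


def occupation(x, g, n_iter=200):
    """Mean occupation for exclusion parameter g at x = (energy - mu) / (k_B T).

    Solves g*ln(w) + (1 - g)*ln(1 + w) = x for ln(w) by bisection, then
    returns 1 / (w + g). Requires 0 <= g <= 1, and x > 0 when g = 0.
    """
    if not 0.0 <= g <= 1.0:
        raise ValueError("this sketch only handles 0 <= g <= 1")

    def mismatch(u):
        # u = ln(w); the left-hand side increases monotonically with u.
        return g * u + (1.0 - g) * math.log1p(math.exp(u)) - x

    lo, hi = -700.0, 700.0  # bounds on ln(w) that keep exp() finite
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if mismatch(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    w = math.exp(0.5 * (lo + hi))
    return 1.0 / (w + g)


fermi_like = occupation(1.0, 1.0)  # g = 1 should reproduce Fermi-Dirac
bose_like = occupation(1.0, 0.0)   # g = 0 should reproduce Bose-Einstein
semion = occupation(0.0, 0.5)      # intermediate case g = 1/2
```

The two limits give a consistency check: g = 1 gives 1/(e^x + 1) and g = 0 gives 1/(e^x - 1), while g = 1/2 has the closed form 1/sqrt(1/4 + e^(2x)).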
NASA Technical Reports Server (NTRS)
Bonavito, N. L.; Gordon, C. L.; Inguva, R.; Serafino, G. N.; Barnes, R. A.
1994-01-01
NASA's Mission to Planet Earth (MTPE) will address important interdisciplinary and environmental issues such as global warming, ozone depletion, deforestation, acid rain, and the like with its long term satellite observations of the Earth and with its comprehensive Data and Information System. Extensive sets of satellite observations supporting MTPE will be provided by the Earth Observing System (EOS), while more specific process related observations will be provided by smaller Earth Probes. MTPE will use data from ground and airborne scientific investigations to supplement and validate the global observations obtained from satellite imagery, while the EOS satellites will support interdisciplinary research and model development. This is important for understanding the processes that control the global environment and for improving the prediction of events. In this paper we illustrate the potential for powerful artificial intelligence (AI) techniques when used in the analysis of the formidable problems that exist in the NASA Earth Science programs and of those to be encountered in the future MTPE and EOS programs. These techniques, based on the logical and probabilistic reasoning aspects of plausible inference, strongly emphasize the synergetic relation between data and information. As such, they are ideally suited for the analysis of the massive data streams to be provided by both MTPE and EOS. To demonstrate this, we address both the satellite imagery and model enhancement issues for the problem of ozone profile retrieval through a method based on plausible scientific inferencing. Since in the retrieval problem, the atmospheric ozone profile that is consistent with a given set of measured radiances may not be unique, an optimum statistical method is used to estimate a 'best' profile solution from the radiances and from additional a priori information.
Statistical Analysis of Current Sheets in Three-dimensional Magnetohydrodynamic Turbulence
NASA Astrophysics Data System (ADS)
Zhdankin, Vladimir; Uzdensky, Dmitri A.; Perez, Jean C.; Boldyrev, Stanislav
2013-07-01
We develop a framework for studying the statistical properties of current sheets in numerical simulations of magnetohydrodynamic (MHD) turbulence with a strong guide field, as modeled by reduced MHD. We describe an algorithm that identifies current sheets in a simulation snapshot and then determines their geometrical properties (including length, width, and thickness) and intensities (peak current density and total energy dissipation rate). We then apply this procedure to simulations of reduced MHD and perform a statistical analysis on the obtained population of current sheets. We evaluate the role of reconnection by separately studying the populations of current sheets which contain magnetic X-points and those which do not. We find that the statistical properties of the two populations are different in general. We compare the scaling of these properties to phenomenological predictions obtained for the inertial range of MHD turbulence. Finally, we test whether the reconnecting current sheets are consistent with the Sweet-Parker model.
Spatial statistics of magnetic field in two-dimensional chaotic flow in the resistive growth stage
NASA Astrophysics Data System (ADS)
Kolokolov, I. V.
2017-03-01
The correlation tensors of magnetic field in a two-dimensional chaotic flow of conducting fluid are studied. It is shown that there is a stage of resistive evolution where the field correlators grow exponentially with time. The two- and four-point field correlation tensors are computed explicitly in this stage in the framework of Batchelor-Kraichnan-Kazantsev model. They demonstrate strong temporal intermittency of the field fluctuations and high level of non-Gaussianity in spatial field distribution.
NASA Astrophysics Data System (ADS)
Hunziker, Jürg; Laloy, Eric; Linde, Niklas
2016-04-01
Deterministic inversion procedures can often explain field data, but they only deliver one final subsurface model that depends on the initial model and regularization constraints. This leads to poor insights about the uncertainties associated with the inferred model properties. In contrast, probabilistic inversions can provide an ensemble of model realizations that accurately span the range of possible models that honor the available calibration data and prior information allowing a quantitative description of model uncertainties. We reconsider the problem of inferring the dielectric permittivity (directly related to radar velocity) structure of the subsurface by inversion of first-arrival travel times from crosshole ground penetrating radar (GPR) measurements. We rely on the DREAM_(ZS) algorithm that is a state-of-the-art Markov chain Monte Carlo (MCMC) algorithm. Such algorithms need several orders of magnitude more forward simulations than deterministic algorithms and often become infeasible in high parameter dimensions. To enable high-resolution imaging with MCMC, we use a recently proposed dimensionality reduction approach that allows reproducing 2D multi-Gaussian fields with far fewer parameters than a classical grid discretization. We consider herein a dimensionality reduction from 5000 to 257 unknowns. The first 250 parameters correspond to a spectral representation of random and uncorrelated spatial fluctuations while the remaining seven geostatistical parameters are (1) the standard deviation of the data error, (2) the mean and (3) the variance of the relative electric permittivity, (4) the integral scale along the major axis of anisotropy, (5) the anisotropy angle, (6) the ratio of the integral scale along the minor axis of anisotropy to the integral scale along the major axis of anisotropy and (7) the shape parameter of the Matérn function. The latter essentially defines the type of covariance function (e.g., exponential, Whittle, Gaussian). We present
NASA Astrophysics Data System (ADS)
Cai, Juntao; Chen, Xiaobin; Xu, Xiwei; Tang, Ji; Wang, Lifeng; Guo, Chunling; Han, Bing; Dong, Zeyi
2017-02-01
A three-dimensional (3-D) resistivity model around the 2014 Ms6.5 Ludian earthquake was obtained. The model shows that the aftershocks were mainly distributed in a shallow inverse L-shaped conductive angular region surrounded by resistive structures. The presence of this shallow conductive zone may be the key factor leading to the severe damage and surface rupture of the Ludian earthquake. A northwest trending local resistive belt along the Baogunao-Xiaohe fault interrupts the northeast trending conductive zone at the Zhaotong-Lianfeng fault zone in the middle crust, which may be the seismogenic structure of the main shock. Based on the 3-D electrical model, combined with GPS, thermal structure, and seismic survey results, a geodynamic model is proposed to interpret the seismotectonics, deep seismogenic background, and deformation characterized by a sinistral strike slip with a tensile component of the Ludian earthquake.
NASA Astrophysics Data System (ADS)
Yamawaki, Teruo; Tanaka, Satoru; Ueki, Sadato; Hamaguchi, Hiroyuki; Nakamichi, Haruhisa; Nishimura, Takeshi; Oikawa, Jun; Tsutsui, Tomoki; Nishi, Kiyoshi; Shimizu, Hiroshi; Yamaguchi, Sosuke; Miyamachi, Hiroki; Yamasato, Hitoshi; Hayashi, Yutaka
2004-12-01
The three-dimensional P-wave velocity structure of the Bandai volcano has been revealed by tomographic inversion using approximately 2200 travel-time data collected during an active seismic survey comprising 298 temporary seismic stations and eight artificial shots. The key result of this study is the delineation of a high-velocity anomaly (Vp>4.6 km/s at sea-level) immediately below the summit peak. This feature extends to depths of 1-2 km below sea-level. The near-surface horizontal position of the high-velocity anomaly coincides well with that of a positive Bouguer gravity anomaly. Geological data demonstrate that sector collapses have occurred in all directions from the summit and that the summit crater has been repeatedly refilled with magmatic material. These observations suggest that the high-velocity region revealed in this study is a manifestation of an almost-solidified magmatic plumbing system. We have also noted that a near-surface low-velocity region (Vp<3.0 km/s at sea-level) on the southern foot of the volcano corresponds to the position of volcanic sediments including ash and debris avalanche material. In addition, we have made use of the tomographic results to recompute the hypocenters of earthquakes occurring during seismic swarms beneath the summit in 1988 and 2000. Relocating the earthquakes using the three-dimensional velocity model clearly indicates that they predominantly occurred on two steeply dipping planes. Low-frequency earthquakes observed during the swarms in 2000 occurred in the seismic gap between the two clusters. The hypocentral regions of the seismic swarms and the low-frequency earthquakes are close to the higher-velocity zone beneath the volcano's summit. These observations suggest that the recent seismic activity beneath the summit is likely associated with thermal energy being released within the solidifying magmatic plumbing system.
Singleton, J.; Harrison, N.; Mielke, C. H.; Schlueter, J. A.; Materials Science Division; LANL; Univ. of Oxford
2001-11-05
Although quasi-two-dimensional organic superconductors such as κ-(BEDT-TTF)₂Cu(NCS)₂ (BEDT-TTF ≡ bis(ethylene-dithio)tetrathiafulvalene) seem to be very clean systems, with apparent quasiparticle mean free paths of several thousand angstroms, the superconducting transition is intrinsically broad (e.g. ≈1 K wide for Tc ≈ 10 K). We propose that this is due to the extreme anisotropy of these materials, which greatly exacerbates the statistical effects of spatial variations in the potential experienced by the quasiparticles. Using a statistical model, we are able to account for the experimental observations. A parameter x̄, which characterizes the spatial potential variations, may be derived from Shubnikov-de Haas oscillation experiments. Using this value, we are able to predict a transition width which is in good agreement with that observed in megahertz penetration-depth measurements on the same sample.
NASA Astrophysics Data System (ADS)
Qi, Di; Majda, Andrew J.
2017-03-01
A low-dimensional reduced-order statistical closure model is developed for quantifying the uncertainty to changes in forcing in a barotropic turbulent system with topography involving interactions between small-scale motions and a large-scale mean flow. Imperfect model sensitivity is improved through a recent mathematical strategy for calibrating model errors in a training phase, where information theory and linear statistical response theory are combined in a systematic fashion to achieve the optimal model parameters. Statistical theories about a Gaussian invariant measure and the exact statistical energy equations are also developed for the truncated barotropic equations that can be used to improve the imperfect model prediction skill. A stringent paradigm model of 57 degrees of freedom is used to display the feasibility of the reduced-order methods. This simple model creates large-scale zonal mean flow shifting directions from westward to eastward jets with an abrupt change in amplitude when perturbations are applied, and prototype blocked and unblocked patterns can be generated in this simple model similar to the real natural system. Principal statistical responses in mean and variance can be captured by the reduced-order models with desirable accuracy and efficiency with only 3 resolved modes. An even more challenging regime with non-Gaussian equilibrium statistics using the fluctuation equations is also tested in the reduced-order models with accurate prediction using the first 5 resolved modes. These reduced-order models also show potential for uncertainty quantification and prediction in more complex realistic geophysical turbulent dynamical systems.
Heat balance statistics derived from four-dimensional assimilations with a global circulation model
NASA Technical Reports Server (NTRS)
Schubert, S. D.; Herman, G. F.
1981-01-01
The reported investigation was conducted to develop a reliable procedure for obtaining the diabatic and vertical terms required for atmospheric heat balance studies. The method developed employs a four-dimensional assimilation mode in connection with the general circulation model of NASA's Goddard Laboratory for Atmospheric Sciences. The initial analysis was conducted with data obtained in connection with the 1976 Data Systems Test. On the basis of the results of the investigation, it appears possible to use the model's observationally constrained diagnostics to provide estimates of the global distribution of virtually all of the quantities which are needed to compute the atmosphere's heat and energy balance.
2017-01-01
A major purpose of exploratory metabolic profiling is for the identification of molecular species that are statistically associated with specific biological or medical outcomes; unfortunately, the structure elucidation process of unknowns is often a major bottleneck in this process. We present here new holistic strategies that combine different statistical spectroscopic and analytical techniques to improve and simplify the process of metabolite identification. We exemplify these strategies using study data collected as part of a dietary intervention to improve health and which elicits a relatively subtle suite of changes from complex molecular profiles. We identify three new dietary biomarkers related to the consumption of peas (N-methyl nicotinic acid), apples (rhamnitol), and onions (N-acetyl-S-(1Z)-propenyl-cysteine-sulfoxide) that can be used to enhance dietary assessment and assess adherence to diet. As part of the strategy, we introduce a new probabilistic statistical spectroscopy tool, RED-STORM (Resolution EnhanceD SubseT Optimization by Reference Matching), that uses 2D J-resolved 1H NMR spectra for enhanced information recovery using the Bayesian paradigm to extract a subset of spectra with similar spectral signatures to a reference. RED-STORM provided new information for subsequent experiments (e.g., 2D-NMR spectroscopy, solid-phase extraction, liquid chromatography prefaced mass spectrometry) used to ultimately identify an unknown compound. In summary, we illustrate the benefit of acquiring J-resolved experiments alongside conventional 1D 1H NMR as part of routine metabolic profiling in large data sets and show that application of complementary statistical and analytical techniques for the identification of unknown metabolites can be used to save valuable time and resources. PMID:28240543
Tegnér, Jesper; Zenil, Hector; Kiani, Narsis A; Ball, Gordon; Gomez-Cabrero, David
2016-11-13
Systems in nature capable of collective behaviour are nonlinear, operating across several scales. Yet our ability to account for their collective dynamics differs in physics, chemistry and biology. Here, we briefly review the similarities and differences between mathematical modelling of adaptive living systems versus physico-chemical systems. We find that physics-based chemistry modelling and computational neuroscience have a shared interest in developing techniques for model reductions aiming at the identification of a reduced subsystem or slow manifold, capturing the effective dynamics. By contrast, as relations and kinetics between biological molecules are less characterized, current quantitative analysis under the umbrella of bioinformatics focuses on signal extraction, correlation, regression and machine-learning analysis. We argue that model reduction analysis and the ensuing identification of manifolds bridges physics and biology. Furthermore, modelling living systems presents deep challenges as how to reconcile rich molecular data with inherent modelling uncertainties (formalism, variables selection and model parameters). We anticipate a new generative data-driven modelling paradigm constrained by identified governing principles extracted from low-dimensional manifold analysis. The rise of a new generation of models will ultimately connect biology to quantitative mechanistic descriptions, thereby setting the stage for investigating the character of the model language and principles driving living systems. This article is part of the themed issue 'Multiscale modelling at the physics-chemistry-biology interface'.
Zenil, Hector; Kiani, Narsis A.; Ball, Gordon; Gomez-Cabrero, David
2016-01-01
Systems in nature capable of collective behaviour are nonlinear, operating across several scales. Yet our ability to account for their collective dynamics differs in physics, chemistry and biology. Here, we briefly review the similarities and differences between mathematical modelling of adaptive living systems versus physico-chemical systems. We find that physics-based chemistry modelling and computational neuroscience have a shared interest in developing techniques for model reductions aiming at the identification of a reduced subsystem or slow manifold, capturing the effective dynamics. By contrast, as relations and kinetics between biological molecules are less characterized, current quantitative analysis under the umbrella of bioinformatics focuses on signal extraction, correlation, regression and machine-learning analysis. We argue that model reduction analysis and the ensuing identification of manifolds bridges physics and biology. Furthermore, modelling living systems presents deep challenges as how to reconcile rich molecular data with inherent modelling uncertainties (formalism, variables selection and model parameters). We anticipate a new generative data-driven modelling paradigm constrained by identified governing principles extracted from low-dimensional manifold analysis. The rise of a new generation of models will ultimately connect biology to quantitative mechanistic descriptions, thereby setting the stage for investigating the character of the model language and principles driving living systems. This article is part of the themed issue ‘Multiscale modelling at the physics–chemistry–biology interface’. PMID:27698038
ERIC Educational Resources Information Center
Watson, Jane
2007-01-01
Inference, or decision making, is seen in curriculum documents as the final step in a statistical investigation. For a formal statistical enquiry this may be associated with sophisticated tests involving probability distributions. For young students without the mathematical background to perform such tests, it is still possible to draw informal…
Statistical Mechanics of the Geometric Control of Flow Topology in Two-Dimensional Turbulence
NASA Astrophysics Data System (ADS)
Nadiga, Balasubramanya; Loxley, Peter
2013-04-01
We apply the principle of maximum entropy to two-dimensional turbulence in a new fashion to predict the effect of geometry on flow topology. We consider two prototypical regimes of turbulence that lead to frequently observed self-organized coherent structures. Our theory predicts bistable behavior that exhibits hysteresis and large abrupt changes in flow topology in one regime; the other regime is predicted to exhibit monostable behavior with a continuous change of flow topology. The predictions are confirmed in fully nonlinear numerical simulations of the two-dimensional Navier-Stokes equation. These results suggest an explanation of the low frequency regime transitions that have been observed in the non-equilibrium setting of this problem. Following further development in the non-equilibrium context, we expect that insights developed in this problem should be useful in developing a better understanding of the phenomenon of low frequency regime transitions that is a pervasive feature of the weather and climate systems. Familiar occurrences of this phenomenon, wherein extreme and abrupt qualitative changes occur, seemingly randomly, after very long periods of apparent stability, include blocking in the extra-tropical winter atmosphere, the bimodality of the Kuroshio extension system, the Dansgaard-Oeschger events, and the glacial-interglacial transitions.
Boundary dynamics and the statistical mechanics of the 2 + 1-dimensional black hole
NASA Astrophysics Data System (ADS)
Bañados, Máximo; Brotz, Thorsten; Ortiz, Miguel E.
1999-04-01
We calculate the density of states of the 2 + 1-dimensional BTZ black hole in the micro- and grand-canonical ensembles. Our starting point is the relation between 2 + 1-dimensional quantum gravity and quantised Chern-Simons theory. In the micro-canonical ensemble, we find the Bekenstein-Hawking entropy by relating a Kac-Moody algebra of global gauge charges to a Virasoro algebra with a classical central charge via a twisted Sugawara construction. This construction is valid at all values of the black hole radius. At infinity it gives the asymptotic isometries of the black hole, and at the horizon it gives an explicit form for a set of deformations of the horizon whose algebra is the same Virasoro algebra. In the grand-canonical ensemble we define the partition function by using a surface term at infinity that is compatible with fixing the temperature and angular velocity of the black hole. We then compute the partition function directly in a boundary Wess-Zumino-Witten theory, and find that we obtain the correct result only after we include a source term at the horizon that induces a non-trivial spin-structure on the WZW partition function.
NASA Astrophysics Data System (ADS)
Lee, Kean Loon; Grémaud, Benoît; Miniatura, Christian
2014-10-01
As recently discovered [T. Karpiuk et al., Phys. Rev. Lett. 109, 190601 (2012), 10.1103/PhysRevLett.109.190601], Anderson localization in a bulk disordered system triggers the emergence of a coherent forward scattering (CFS) peak in momentum space, which twins the well-known coherent backscattering (CBS) peak observed in weak localization experiments. Going beyond the perturbative regime, we address here the long-time dynamics of the CFS peak in a one-dimensional random system and we relate this novel interference effect to the statistical properties of the eigenfunctions and eigenspectrum of the corresponding random Hamiltonian. Our numerical results show that the dynamics of the CFS peak is governed by the logarithmic level repulsion between localized states, with a time scale that is, with good accuracy, twice the Heisenberg time. This is in perfect agreement with recent findings based on the nonlinear sigma model. In the stationary regime, the width of the CFS peak in momentum space is inversely proportional to the localization length, reflecting the exponential decay of the eigenfunctions in real space, while its height is exactly twice the background, reflecting the Poisson statistical properties of the eigenfunctions. It would be interesting to extend our results to higher dimensional systems and other symmetry classes.
NASA Astrophysics Data System (ADS)
Jameson, A. R.; Larsen, M. L.
2016-06-01
Microphysical understanding of the variability in rain requires a statistical characterization of different drop sizes both in time and in all dimensions of space. Temporally, there have been several statistical characterizations of raindrop counts. However, temporal and spatial structures are neither equivalent nor readily translatable. While there are recent reports of the one-dimensional spatial correlation functions in rain, they can only be assumed to represent the two-dimensional (2D) correlation function under the assumption of spatial isotropy. To date, however, there are no actual observations of the 2D spatial correlation function in rain over areas. Two reasons for this deficiency are the fiscal and the physical impossibilities of assembling a dense network of instruments over even hundreds of meters, much less over kilometers. Consequently, all measurements over areas will necessarily be sparsely sampled. A dense network of data must then be estimated using interpolations from the available observations. In this work, a network of 19 optical disdrometers over a 100 m by 71 m area yields observations of drop spectra every minute. These are then interpolated to a 1 m resolution grid. Fourier techniques then yield estimates of the 2D spatial correlation functions. Preliminary examples using this technique found that steadier, light rain decorrelates spatially faster than does the convective rain, but in both cases the 2D spatial correlation functions are anisotropic, reflecting an asymmetry in the physical processes influencing the rain reaching the ground not accounted for in numerical microphysical models.
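The final step described above, estimating a 2D spatial correlation function from a gridded field by Fourier techniques, can be sketched through the Wiener-Khinchin relation: the inverse transform of the power spectrum of the mean-removed field is its autocovariance. The code below is an illustrative assumption about how such an estimate might be formed, not the authors' procedure. It uses a naive discrete Fourier transform so that it needs no external library, treats the grid as periodic (a practical analysis would zero-pad and use an FFT routine), and runs on a synthetic field rather than disdrometer data.

```python
import cmath
import math


def dft(seq, inverse=False):
    """Naive O(n^2) discrete Fourier transform of a 1D sequence."""
    n = len(seq)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(seq[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(total / n if inverse else total)
    return out


def dft2(grid, inverse=False):
    """2D transform: transform every row, then every column."""
    rows = [dft(row, inverse) for row in grid]
    cols = [dft(list(col), inverse) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def spatial_correlation(grid):
    """Circular 2D autocorrelation; entry [i][j] is the lag of i rows and j columns."""
    n_cells = len(grid) * len(grid[0])
    mean = sum(map(sum, grid)) / n_cells
    anomalies = [[v - mean for v in row] for row in grid]
    variance = sum(a * a for row in anomalies for a in row) / n_cells
    power = [[abs(c) ** 2 for c in row] for row in dft2(anomalies)]
    autocov = dft2(power, inverse=True)
    return [[c.real / (n_cells * variance) for c in row] for row in autocov]


# Synthetic anisotropic field: varies along columns, constant along rows.
grid = [[math.cos(2 * math.pi * j / 4) for j in range(8)] for i in range(8)]
corr = spatial_correlation(grid)
```

On this synthetic field the correlation stays at 1 for any row lag but reaches -1 at a column lag of 2, a simple instance of the anisotropy that a one-dimensional correlation function plus an isotropy assumption would miss.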
Three-dimensional segmentation of the heart muscle using image statistics
NASA Astrophysics Data System (ADS)
Nillesen, Maartje M.; Lopata, Richard G. P.; Gerrits, Inge H.; Kapusta, Livia; Huisman, Henkjan H.; Thijssen, Johan M.; de Korte, Chris L.
2006-03-01
Segmentation of the heart muscle in 3D echocardiographic images provides a tool for visualization of cardiac anatomy and assessment of heart function, and serves as an important pre-processing step for cardiac strain imaging. By incorporating spatial and temporal information of 3D ultrasound image sequences (4D), a fully automated method using image statistics was developed to perform 3D segmentation of the heart muscle. 3D rf-data were acquired with a Philips SONOS 7500 live 3D ultrasound system, and an X4 matrix array transducer (2-4 MHz). Left ventricular images of five healthy children were taken in transthoracic short/long axis view. As a first step, image statistics of blood and heart muscle were investigated. Next, based on these statistics, an adaptive mean squares filter was selected and applied to the images. Window size was related to speckle size (5x2 speckles). The degree of adaptive filtering was automatically steered by the local homogeneity of tissue. As a result, discrimination of heart muscle and blood was optimized, while sharpness of edges was preserved. After this pre-processing stage, homomorphic filtering and automatic thresholding were performed to obtain the inner borders of the heart muscle. Finally, a deformable contour algorithm was used to yield a closed contour of the left ventricular cavity in each elevational plane. Each contour was optimized using contours of the surrounding planes (spatial and temporal) as a limiting condition to ensure spatial and temporal continuity. Better segmentation of the ventricle was obtained using 4D information than using information of each plane separately.
Emergent exclusion statistics of quasiparticles in two-dimensional topological phases
NASA Astrophysics Data System (ADS)
Hu, Yuting; Stirling, Spencer D.; Wu, Yong-Shi
2014-03-01
We demonstrate how the generalized Pauli exclusion principle emerges for quasiparticle excitations in 2D topological phases. As an example, we examine the Levin-Wen model with the Fibonacci data (specified in the text), and construct the number operator for fluxons living on plaquettes. By numerically counting the many-body states with fluxon number fixed, the matrix of exclusion statistics parameters is identified and is shown to depend on the spatial topology (sphere or torus) of the system. Our work reveals the structure of the (many-body) Hilbert space and some general features of thermodynamics for quasiparticle excitations in topological matter.
Statistical properties of three-dimensional two-fluid plasma model
Qaisrani, M. Hasnain; Xia, ZhenWei; Zou, Dandan
2015-09-15
The nonlinear dynamics of an incompressible, non-dissipative two-fluid plasma model is investigated through classical Gibbs ensemble methods. Liouville's theorem of phase space for each wave number is proved, and the absolute equilibrium spectra for the Galerkin-truncated two-fluid model are calculated. In two-fluid theory, the equilibrium is built on the conservation of three quadratic invariants: the total energy and the self-helicities of the ion and electron fluids, respectively. The implications of statistical equilibrium spectra with arbitrary ratios of conserved invariants are discussed.
Derrida, Bernard; Meerson, Baruch; Sasorov, Pavel V
2016-04-01
Consider a one-dimensional branching Brownian motion and rescale the coordinate and time so that the rates of branching and diffusion are both equal to 1. If X_{1}(t) is the position of the rightmost particle of the branching Brownian motion at time t, the empirical velocity c of this rightmost particle is defined as c=X_{1}(t)/t. Using the Fisher-Kolmogorov-Petrovsky-Piscounov equation, we evaluate the probability distribution P(c,t) of this empirical velocity c in the long-time t limit for c>2. It is already known that, for a single seed particle, P(c,t)∼exp[-(c^{2}/4-1)t] up to a prefactor that can depend on c and t. Here we show how to determine this prefactor. The result can be easily generalized to the case of multiple seed particles and to branching random walks associated with other traveling-wave equations.
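The exponent quoted above can be motivated by a first-moment estimate. This is a standard heuristic rather than the authors' FKPP calculation, and it assumes that the rescaling means each particle diffuses with D = 1 (displacement variance 2t) and branches at rate 1 (mean population e^t):

```latex
\mathbb{E}\bigl[\#\{\text{particles beyond } ct\}\bigr]
  = e^{t}\int_{ct}^{\infty}\frac{e^{-x^{2}/(4t)}}{\sqrt{4\pi t}}\,dx
  \;\asymp\; e^{t}\,e^{-c^{2}t/4}
  = \exp\!\left[-\left(\frac{c^{2}}{4}-1\right)t\right]
  \quad\text{(to exponential order in } t\text{)}.
```

For c > 2 this expectation decays in time, and by Markov's inequality it bounds the probability that the rightmost particle lies beyond ct. That is consistent with the exponential factor stated in the abstract; the heuristic says nothing about the prefactor.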
Vorticity statistics in the direct cascade of two-dimensional turbulence.
Falkovich, Gregory; Lebedev, Vladimir
2011-04-01
For the direct cascade of steady two-dimensional (2D) Navier-Stokes turbulence, we derive analytically the probability of strong vorticity fluctuations. When ϖ is the vorticity coarse-grained over a scale R, the probability density function (PDF), P(ϖ), has a universal asymptotic behavior ln P ∼ -ϖ/ϖ_rms at ϖ ≫ ϖ_rms = [H ln(L/R)]^{1/3}, where H is the enstrophy flux and L is the pumping length. Therefore, the PDF has exponential tails and is self-similar, that is, it can be presented as a function of a single argument, ϖ/ϖ_rms, in distinction from other known direct cascades.
NASA Astrophysics Data System (ADS)
Derrida, Bernard; Meerson, Baruch; Sasorov, Pavel V.
2016-04-01
Consider a one-dimensional branching Brownian motion and rescale the coordinate and time so that the rates of branching and diffusion are both equal to 1. If X_{1}(t) is the position of the rightmost particle of the branching Brownian motion at time t, the empirical velocity c of this rightmost particle is defined as c = X_{1}(t)/t. Using the Fisher-Kolmogorov-Petrovsky-Piscounov equation, we evaluate the probability distribution P(c,t) of this empirical velocity c in the long-time t limit for c > 2. It is already known that, for a single seed particle, P(c,t) ∼ exp[-(c^{2}/4 - 1)t] up to a prefactor that can depend on c and t. Here we show how to determine this prefactor. The result can be easily generalized to the case of multiple seed particles and to branching random walks associated with other traveling-wave equations.
Statistical theory of reversals in two-dimensional confined turbulent flows
NASA Astrophysics Data System (ADS)
Shukla, Vishwanath; Fauve, Stephan; Brachet, Marc
2016-12-01
It is shown that the truncated Euler equation (TEE), i.e., a finite set of ordinary differential equations for the amplitude of the large-scale modes, can correctly describe the complex transitional dynamics that occur within the turbulent regime of a confined two-dimensional flow obeying Navier-Stokes equation (NSE) with bottom friction and a spatially periodic forcing. The random reversals of the NSE large-scale circulation on the turbulent background involve bifurcations of the probability distribution function of the large-scale circulation. We demonstrate that these NSE bifurcations are described by the related TEE microcanonical distribution which displays transitions from Gaussian to bimodal and broken ergodicity. A minimal 13-mode model reproduces these results.
NASA Technical Reports Server (NTRS)
Balkanski, Yves J.; Jacob, Daniel J.; Gardner, Geraldine M.; Graustein, William C.; Turekian, Karl K.
1993-01-01
A global three-dimensional model is used to investigate the transport and tropospheric residence time of Pb-210, an aerosol tracer produced in the atmosphere by radioactive decay of Rn-222 emitted from soils. The model uses meteorological input with 4 deg x 5 deg horizontal resolution and 4-hour temporal resolution from the Goddard Institute for Space Studies general circulation model (GCM). It computes aerosol scavenging by convective precipitation as part of the wet convective mass transport operator in order to capture the coupling between vertical transport and rainout. Scavenging in convective precipitation accounts for 74% of the global Pb-210 sink in the model; scavenging in large-scale precipitation accounts for 12%, and scavenging in dry deposition accounts for 14%. The model captures 63% of the variance of yearly mean Pb-210 concentrations measured at 85 sites around the world with negligible mean bias, lending support to the computation of aerosol scavenging. There are, however, a number of regional and seasonal discrepancies that reflect in part anomalies in GCM precipitation. Computed residence times with respect to deposition for Pb-210 aerosol in the tropospheric column are about 5 days at southern midlatitudes and 10-15 days in the tropics; values at northern midlatitudes vary from about 5 days in winter to 10 days in summer. The residence time of Pb-210 produced in the lowest 0.5 km of atmosphere is on average four times shorter than that of Pb-210 produced in the upper atmosphere. Both model and observations indicate a weaker decrease of Pb-210 concentrations between the continental mixed layer and the free troposphere than is observed for total aerosol concentrations; an explanation is that Rn-222 is transported to high altitudes in wet convective updrafts, while aerosols and soluble precursors of aerosols are scavenged by precipitation in the updrafts. Thus Pb-210 is not simply a tracer of aerosols produced in the continental boundary layer, but
NASA Astrophysics Data System (ADS)
King, Gary; Rosen, Ori; Tanner, Martin A.
2004-09-01
This collection of essays brings together a diverse group of scholars to survey the latest strategies for solving ecological inference problems in various fields. The last half-decade has witnessed an explosion of research in ecological inference--the process of trying to infer individual behavior from aggregate data. Although uncertainties and information lost in aggregation make ecological inference one of the most problematic types of research to rely on, these inferences are required in many academic fields, as well as by legislatures and the Courts in redistricting, by business in marketing research, and by governments in policy analysis.
Statistical Mechanics and Dynamics of a Three-Dimensional Glass-Forming System
NASA Astrophysics Data System (ADS)
Lerner, Edan; Procaccia, Itamar; Zylberg, Jacques
2009-03-01
In the context of a classical example of glass formation in three dimensions, we exemplify how to construct a statistical-mechanical theory of the glass transition. At the heart of the approach is a simple criterion for verifying a proper choice of upscaled quasispecies that allow the construction of a theory with a finite number of “states.” Once constructed, the theory identifies a typical scale ξ that increases rapidly with lowering the temperature and which determines the α-relaxation time τ_α as τ_α ∼ exp(μξ/T), with μ a typical chemical potential. The theory can predict relaxation times at temperatures that are inaccessible to numerical simulations.
NASA Astrophysics Data System (ADS)
Dienes, Keith R.
2006-05-01
Recent developments in string theory have reinforced the notion that the space of stable supersymmetric and nonsupersymmetric string vacua fills out a landscape whose features are largely unknown. It is then hoped that progress in extracting phenomenological predictions from string theory (such as correlations between gauge groups, matter representations, potential values of the cosmological constant, and so forth) can be achieved through statistical studies of these vacua. To date, most of the efforts in these directions have focused on type I vacua. In this note, we present the first results of a statistical study of the heterotic landscape, focusing on more than 10^5 explicit nonsupersymmetric tachyon-free heterotic string vacua and their associated gauge groups and one-loop cosmological constants. Although this study has several important limitations, we find a number of intriguing features which may be relevant for the heterotic landscape as a whole. These features include different probabilities and correlations for different possible gauge groups as functions of the number of orbifold twists. We also find a vast degeneracy amongst nonsupersymmetric string models, leading to a severe reduction in the number of realizable values of the cosmological constant as compared with naïve expectations. Finally, we find strong correlations between cosmological constants and gauge groups which suggest that heterotic string models with extremely small cosmological constants are overwhelmingly more likely to exhibit the standard model gauge group at the string scale than any of its grand-unified extensions. In all cases, heterotic world sheet symmetries such as modular invariance provide important constraints that do not appear in corresponding studies of type I vacua.
Two-dimensional wetting with binary disorder: a numerical study of the loop statistics
NASA Astrophysics Data System (ADS)
Garel, T.; Monthus, C.
2005-07-01
We numerically study the wetting (adsorption) transition of a polymer chain on a disordered substrate in 1+1 dimensions. Following the Poland-Scheraga model of DNA denaturation, we use a Fixman-Freire scheme for the entropy of loops. This allows us to consider chain lengths of order N ∼ 10^5 to 10^6, with 10^4 disorder realizations. Our study is based on the statistics of loops between two contacts with the substrate, from which we define Binder-like parameters: their crossings for various sizes N allow a precise determination of the critical temperature, and their finite-size properties yield a crossover exponent φ = 1/(2-α) ≃ 0.5. We then analyse at criticality the distribution of loop length l in both regimes l ∼ O(N) and 1 ≪ l ≪ N, as well as the finite-size properties of the contact density and energy. Our conclusion is that the critical exponents for the thermodynamics are the same as those of the pure case, except for strong logarithmic corrections to scaling. The presence of these logarithmic corrections in the thermodynamics is related to a disorder-dependent logarithmic singularity that appears in the critical loop distribution in the rescaled variable λ = l/N as λ → 1.
Pascal, Jean-Claude; Thomas, Jean-Hugh; Li, Jing-Fang
2008-10-01
It was recently shown that the statistical errors of the measurement in the acoustic energy density by the two microphone method in waveguide have little variation when the losses of coherence between microphones increase. To explain these intervals of uncertainty, the variance of the measurement is expressed in this paper as a function of the various energy quantities of the acoustic fields--energy densities and sound intensities. The necessary conditions to reach the lower bound are clarified. The results obtained are illustrated by an example of a one-dimensional partially coherent field, which allows one to specify the relationship between the coherence functions of the pressure and particle velocity and those of the two microphone signals.
Allen, J; Velsko, S
2009-11-16
This report explores the question of whether meaningful conclusions can be drawn regarding the transmission relationship between two microbial samples on the basis of differences observed between the two samples' respective genomes. Unlike similar forensic applications using human DNA, the rapid rate of microbial genome evolution combined with the dynamics of infectious disease requires a shift in thinking on what it means for two samples to 'match' in support of a forensic hypothesis. Previous outbreaks of SARS-CoV, FMDV and HIV were examined to investigate the question of how microbial sequence data can be used to draw inferences that link two infected individuals by direct transmission. The results are counterintuitive with respect to human DNA forensic applications in that some genetic change, rather than exact matching, improves confidence in inferring direct transmission links; however, too much genetic change poses challenges, which can weaken confidence in inferred links. High rates of infection coupled with relatively weak selective pressure observed in the SARS-CoV and FMDV data lead to fairly low confidence for direct transmission links. Confidence values for forensic hypotheses increased when testing for the possibility that samples are separated by at most a few intermediate hosts. Moreover, the observed outbreak conditions support the potential to provide high confidence values for hypotheses that exclude direct transmission links. Transmission inferences are based on the total number of observed or inferred genetic changes separating two sequences rather than uniquely weighing the importance of any one genetic mismatch. Thus, inferences are surprisingly robust in the presence of sequencing errors provided the error rates are randomly distributed across all samples in the reference outbreak database and the novel sequence samples in question. When the number of observed nucleotide mutations is limited due to characteristics of the outbreak or the
Viecelli, J.A.
1993-10-01
The Hamiltonian flow of a set of point vortices of like sign and strength has a low-temperature phase consisting of a rotating triangular lattice of vortices, and a normal temperature turbulent phase consisting of random clusters of vorticity that orbit about a common center along random tracks. The mean-field flow in the normal temperature phase has similarities with turbulent quasi-two-dimensional rotating laboratory and geophysical flows, whereas the low-temperature phase displays effects associated with quantum fluids. In the normal temperature phase the vortices follow power-law clustering distributions, while in the time domain random interval modulation of the vortex orbit radii fluctuations produces singular fractional exponent power-law low-frequency spectra corresponding to time autocorrelation functions with fractional exponent power-law tails. Enhanced diffusion is present in the turbulent state, whereas in the solid-body rotation state vortices thermally diffuse across the lattice. Over the entire temperature range the interaction energy of a single vortex in the field of the rest of the vortices follows positive temperature Fermi-Dirac statistics, with the zero temperature limit corresponding to the rotating crystal phase, and the infinite temperature limit corresponding to a Maxwellian distribution. Analyses of weather records dependent on the large-scale quasi-two-dimensional atmospheric circulation suggest the presence of singular fractional exponent power-law spectra and fractional exponent power-law autocorrelation tails, consistent with the theory.
NASA Astrophysics Data System (ADS)
Smith, L. W.; Al-Taie, H.; Sfigakis, F.; See, P.; Lesage, A. A. J.; Xu, B.; Griffiths, J. P.; Beere, H. E.; Jones, G. A. C.; Ritchie, D. A.; Kelly, M. J.; Smith, C. G.
2014-07-01
The properties of conductance in one-dimensional (1D) quantum wires are statistically investigated using an array of 256 lithographically identical split gates, fabricated on a GaAs/AlGaAs heterostructure. All the split gates are measured during a single cooldown under the same conditions. Electron many-body effects give rise to an anomalous feature in the conductance of a one-dimensional quantum wire, known as the "0.7 structure" (or "0.7 anomaly"). To handle the large data set, a method of automatically estimating the conductance value of the 0.7 structure is developed. Large differences are observed in the strength and value of the 0.7 structure [from 0.63 to 0.84×(2e^2/h)], despite the constant temperature and identical device design. Variations in the 1D potential profile are quantified by estimating the curvature of the barrier in the direction of electron transport, following a saddle-point model. The 0.7 structure appears to be highly sensitive to the specific confining potential within individual devices.
Malinowski, Kathleen T.; Pantarotto, Jason R.; Senan, Suresh
2010-08-01
Purpose: To investigate the feasibility of modeling Stage III lung cancer tumor and node positions from anatomical surrogates. Methods and Materials: To localize their centroids, the primary tumor and lymph nodes from 16 Stage III lung cancer patients were contoured in 10 equal-phase planning four-dimensional (4D) computed tomography (CT) image sets. The centroids of anatomical respiratory surrogates (carina, xiphoid, nipples, mid-sternum) in each image set were also localized. The correlations between target and surrogate positions were determined, and ordinary least-squares (OLS) and partial least-squares (PLS) regression models based on a subset of respiratory phases (three to eight randomly selected) were created to predict the target positions in the remaining images. The three-phase image sets that provided the best predictive information were used to create models based on either the carina alone or all surrogates. Results: The surrogate most correlated with target motion varied widely. Depending on the number of phases used to build the models, mean OLS and PLS errors were 1.0 to 1.4 mm and 0.8 to 1.0 mm, respectively. Models trained on the 0%, 40%, and 80% respiration phases had mean (± standard deviation) PLS errors of 0.8 ± 0.5 mm and 1.1 ± 1.1 mm for models based on all surrogates and carina alone, respectively. For target coordinates with motion >5 mm, the mean three-phase PLS error based on all surrogates was 1.1 mm. Conclusions: Our results establish the feasibility of inferring primary tumor and nodal motion from anatomical surrogates in 4D CT scans of Stage III lung cancer. Using inferential modeling to decrease the processing time of 4D CT scans may facilitate incorporation of patient-specific treatment margins.
Coory, M
2008-04-01
The aim of statistical analyses in cluster investigations is to estimate the probability that the aggregation of cases could be due to chance. As a result of several statistical problems - including the post-hoc nature of the analysis and the subjective nature of implied multiple comparisons - this cannot be carried out with any certainty. In cluster investigations, expert opinion should carry much more weight than P-values, which are exceedingly difficult to interpret.
NASA Astrophysics Data System (ADS)
Fyodorov, Yan V.; Bouchaud, Jean-Philippe
2008-08-01
We construct an N-dimensional Gaussian landscape with multiscale, translation invariant, logarithmic correlations and investigate the statistical mechanics of a single particle in this environment. In the limit of high dimension N → ∞ the free energy of the system and overlap function are calculated exactly using the replica trick and Parisi's hierarchical ansatz. In the thermodynamic limit, we recover the most general version of Derrida's generalized random energy model (GREM). The low-temperature behaviour depends essentially on the spectrum of length scales involved in the construction of the landscape. If the latter consists of K discrete values, the system is characterized by a K-step replica symmetry breaking solution. We argue that our construction is in fact valid in any finite spatial dimension N ≥ 1. We discuss the implications of our results for the singularity spectrum describing multifractality of the associated Boltzmann-Gibbs measure. Finally we discuss several generalizations and open problems, such as the dynamics in such a landscape and the construction of a generalized multifractal random walk.
NASA Astrophysics Data System (ADS)
Durand, Marc; Kraynik, Andrew M.; van Swol, Frank; Käfer, Jos; Quilliet, Catherine; Cox, Simon; Ataei Talebi, Shirin; Graner, François
2014-06-01
Bubble monolayers are model systems for experiments and simulations of two-dimensional packing problems of deformable objects. We explore the relation between the distributions of the number of bubble sides (topology) and the bubble areas (geometry) in the low liquid fraction limit. We use a statistical model [M. Durand, Europhys. Lett. 90, 60002 (2010), 10.1209/0295-5075/90/60002] which takes into account Plateau laws. We predict the correlation between geometrical disorder (bubble size dispersity) and topological disorder (width of bubble side number distribution) over an extended range of bubble size dispersities. Extensive data sets arising from shuffled foam experiments, surface evolver simulations, and cellular Potts model simulations all collapse surprisingly well and coincide with the model predictions, even at extremely high size dispersity. At moderate size dispersity, we recover our earlier approximate predictions [M. Durand, J. Kafer, C. Quilliet, S. Cox, S. A. Talebi, and F. Graner, Phys. Rev. Lett. 107, 168304 (2011), 10.1103/PhysRevLett.107.168304]. At extremely low dispersity, when approaching the perfectly regular honeycomb pattern, we study how both geometrical and topological disorders vanish. We identify a crystallization mechanism and explore it quantitatively in the case of bidisperse foams. Due to the deformability of the bubbles, foams can crystallize over a larger range of size dispersities than hard disks. The model predicts that the crystallization transition occurs when the ratio of largest to smallest bubble radii is 1.4.
NASA Astrophysics Data System (ADS)
Zhang, Honghai; Walker, Nicholas; Mitchell, Steven C.; Thomas, Matthew; Wahle, Andreas; Scholz, Thomas; Sonka, Milan
2006-03-01
Conventional analysis of cardiac ventricular magnetic resonance images is performed using short axis images and does not guarantee completeness and consistency of the ventricle coverage. In this paper, a four-dimensional (4D, 3D+time) left and right ventricle statistical shape model was generated from the combination of the long axis and short axis images. Iterative mutual intensity registration and interpolation were used to merge the long axis and short axis images into isotropic 4D images and simultaneously correct existing breathing artifact. Distance-based shape interpolation and approximation were used to generate complete ventricle shapes from the long axis and short axis manual segmentations. Landmarks were automatically generated and propagated to 4D data samples using rigid alignment, distance-based merging, and B-spline transform. Principal component analysis (PCA) was used in model creation and analysis. The two strongest modes of the shape model captured the most important shape feature of Tetralogy of Fallot (TOF) patients, right ventricle enlargement. Classification of cardiac images into classes of normal and TOF subjects performed on 3D and 4D models showed 100% classification correctness rates for both normal and TOF subjects using k-Nearest Neighbor (k=1 or 3) classifier and the two strongest shape modes.
NASA Astrophysics Data System (ADS)
Romé, M.; Lepreti, F.; Maero, G.; Pozzoli, R.; Vecchio, A.; Carbone, V.
2013-03-01
Highly magnetized, pure electron plasmas confined in a Penning-Malmberg trap allow one to perform experiments on the two-dimensional (2D) fluid dynamics under conditions where non-ideal effects are almost negligible. Recent results on the freely decaying 2D turbulence obtained from experiments with electron plasmas performed in the Penning-Malmberg trap ELTRAP are presented. The analysis has been applied to experimental sequences with different types of initial density distributions. The dynamical properties of the system have been investigated by means of wavelet transforms and Proper Orthogonal Decomposition (POD). The wavelet analysis shows that most of the enstrophy is contained at spatial scales corresponding to the typical size of the persistent vortices in the 2D electron plasma flow. The POD analysis allows one to identify the coherent structures which give the dominant contribution to the plasma evolution. The statistical properties of the turbulence have been investigated by means of Probability Density Functions (PDFs) and structure functions of spatial vorticity increments. The analysis evidences how the shape and evolution of the dominant coherent structures and the intermittency properties of the turbulence strongly depend on the initial conditions for the electron density.
Inverse Ising inference with correlated samples
NASA Astrophysics Data System (ADS)
Obermayer, Benedikt; Levine, Erel
2014-12-01
Correlations between two variables of a high-dimensional system can be indicative of an underlying interaction, but can also result from indirect effects. Inverse Ising inference is a method to distinguish one from the other. Essentially, the parameters of the least constrained statistical model are learned from the observed correlations such that direct interactions can be separated from indirect correlations. Among many other applications, this approach has been helpful for protein structure prediction, because residues which interact in the 3D structure often show correlated substitutions in a multiple sequence alignment. In this context, samples used for inference are not independent but share an evolutionary history on a phylogenetic tree. Here, we discuss the effects of correlations between samples on global inference. Such correlations could arise due to phylogeny but also via other slow dynamical processes. We present a simple analytical model to address the resulting inference biases, and develop an exact method accounting for background correlations in alignment data by combining phylogenetic modeling with an adaptive cluster expansion algorithm. We find that popular reweighting schemes are only marginally effective at removing phylogenetic bias, suggest a rescaling strategy that yields better results, and provide evidence that our conclusions carry over to the frequently used mean-field approach to the inverse Ising problem.
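The mean-field approach to the inverse Ising problem mentioned at the end of the abstract can be sketched in a few lines. This is the textbook naive mean-field estimate, in which the couplings are read off from the inverse of the connected correlation matrix (J_ij ≈ -(C^{-1})_ij for i ≠ j); it is not the authors' adaptive cluster expansion or phylogenetic correction, and it assumes independent samples, which is exactly the assumption the abstract questions. The function name and the synthetic three-spin data are hypothetical.

```python
import numpy as np

def mean_field_couplings(samples):
    """Naive mean-field inverse Ising estimate from an (M, N) array of +/-1 spins."""
    C = np.cov(samples, rowvar=False)   # connected correlations C_ij = <s_i s_j> - <s_i><s_j>
    J = -np.linalg.inv(C)               # mean-field relation: J = -C^{-1} off the diagonal
    np.fill_diagonal(J, 0.0)            # self-couplings carry no meaning here
    return J

# Synthetic data: spin 1 copies spin 0 except for 20% random flips; spin 2 is independent.
rng = np.random.default_rng(1)
M = 20000
s0 = rng.choice([-1, 1], size=M)
flip = rng.random(M) < 0.2
s1 = np.where(flip, -s0, s0)
s2 = rng.choice([-1, 1], size=M)
samples = np.column_stack([s0, s1, s2])

J = mean_field_couplings(samples)
```

In this toy example the estimate assigns a clearly positive coupling to the pair (0, 1) and a coupling near zero to the pairs involving the independent spin 2. With phylogenetically related samples, the covariance estimate itself becomes biased, which is the effect the abstract sets out to correct.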
NASA Astrophysics Data System (ADS)
Nair, Anish Kumar M.; Rajeev, Kunjukrishnapillai
2012-07-01
Long-term (2006-2011) monthly and seasonal mean vertical distributions of clouds and their spatial variations over the Indian subcontinent and surrounding oceanic regions have been derived using data obtained from the space-borne radar, CloudSat. Together with the data from space-borne imagers (Kalpana-1-VHRR and NOAA-AVHRR), this provides insight into the 3-dimensional distribution of clouds and its linkage with dominant tropical dynamical features, which are largely unexplored over the Indian region. Meridional cross sections of ITCZ, inferred from the vertical distribution of clouds, clearly reveal the relatively narrow structure of ITCZ flanked by thick cirrus outflows in the upper troposphere on either side. The base of cirrus clouds in the outflow region significantly increases away from the ITCZ core, while the corresponding variation in cirrus top is negligible, resulting in considerable thinning of cirrus away from the ITCZ. This provides direct observational evidence for the infrared radiative heating at cloud base and its role in regulating the cirrus lifetime through sublimation. On average, the frequency of occurrence of clouds rapidly decreases with altitude in the altitude band of 12-14 km, which corresponds to the convective tropopause altitude. North-south inclination and east-west asymmetry of ITCZ during the winter season are distinctly clear in the vertical distribution of clouds, which provide information on the pathways for inter-hemispheric transport over the Indian Ocean during this season. During the Asian summer monsoon season (June-September), substantial amounts of deep convective clouds are found to occur over the North Bay of Bengal, extending up to an altitude of >14 km, which is ~1-2 km higher than that over other deep convective regions. This has potential implications in the pumping of tropospheric airmass across the tropical tropopause over the region. This study characterizes a pool of inhibited cloudiness over the southwest Bay of
van IJsseldijk, E. A.; Valstar, E. R.; Stoel, B. C.; Nelissen, R. G. H. H.; Baka, N.; van’t Klooster, R.
2016-01-01
t Klooster, B. L. Kaptein. Three dimensional measurement of minimum joint space width in the knee from stereo radiographs using statistical shape models. Bone Joint Res 2016;320–327. DOI: 10.1302/2046-3758.58.2000626. PMID:27491660
NASA Technical Reports Server (NTRS)
Boardman, J. W.; Pieters, C. M.; Green, R. O.; Clark, R. N.; Sunshine, J.; Combe, J.-P.; Isaacson, P.; Lundeen, S. R.; Malaret, E.; McCord, T.; Nettles, J.; Petro, N. E.; Varanasi, P.; Taylor, L.
2010-01-01
The Moon Mineralogy Mapper (M3), a NASA Discovery Mission of Opportunity, was launched October 22, 2008 from Shriharikota in India on board the Indian ISRO Chandrayaan-1 spacecraft for a nominal two-year mission in a 100-km polar lunar orbit. M3 is a high-fidelity imaging spectrometer with 260 spectral bands in Target Mode and 85 spectral bands in a reduced-resolution Global Mode. Target Mode pixel sizes are nominally 70 meters and Global pixels (binned 2 by 2) are 140 meters, from the planned 100-km orbit. The mission was cut short, just before halfway, in August 2009, when the spacecraft ceased operations. Despite the abbreviated mission and numerous technical and scientific challenges during the flight, M3 was able to cover more than 95% of the Moon in Global Mode. These data, presented and analyzed here as a global whole, are revolutionizing our understanding of the Moon. Already, numerous discoveries relating to volatiles and unexpected mineralogy have been published [1], [2], [3]. The rich spectral and spatial information content of the M3 data indicates that many more discoveries and an improved understanding of the mineralogy, geology, photometry, thermal regime and volatile status of our nearest neighbor are forthcoming from these data. Sadly, only minimal high-resolution Target Mode images were acquired, as these were to be the focus of the second half of the mission. This abstract gives the reader a global overview of all the M3 data that were collected and an introduction to their rich spectral character and complexity. We employ a Principal Components statistical method to assess the underlying dimensionality of the Moon as a whole, as seen by M3, and to identify numerous areas that are low-probability targets and thus of potential interest to selenologists.
NASA Astrophysics Data System (ADS)
Eslamizadeh, H.
2017-02-01
Evaporation residue cross section, fission probability, anisotropy of fission fragment angular distribution, mass and energy distributions of fission fragments and the pre-scission neutron multiplicity for the excited compound nuclei $^{168}$Yb, $^{172}$Yb, $^{178}$W and $^{227}$Pa produced in fusion reactions have been calculated in the framework of the modified statistical model and multidimensional dynamical model. In the dynamical calculations, the dynamics of fission of excited nuclei has been studied by solving three- and four-dimensional Langevin equations with dissipation generated through the chaos-weighted wall and window friction formula. Three collective shape coordinates plus the projection of total spin of the compound nucleus to the symmetry axis, K, were considered in the four-dimensional dynamical model. A non-constant dissipation coefficient of K, $\gamma_k$, was applied in the four-dimensional dynamical calculations. A comparison of the results of the three- and four-dimensional dynamical models with the experimental data showed that the results of the four-dimensional dynamical model for the evaporation residue cross section, fission probability, anisotropy of fission fragment angular distribution, mass and energy distributions of fission fragments and the pre-scission neutron multiplicity are in better agreement with the experimental data. It was also shown that the modified statistical model can reproduce the above-mentioned experimental data by choosing appropriate values of the temperature coefficient of the effective potential, $\lambda$, and the scaling factor of the fission-barrier height, $r_s$.
Bayesian Inference of Galaxy Morphology
NASA Astrophysics Data System (ADS)
Yoon, Ilsang; Weinberg, M.; Katz, N.
2011-01-01
Reliable inference on galaxy morphology from quantitative analysis of ensemble galaxy images is a challenging but essential ingredient in studying galaxy formation and evolution, utilizing current and forthcoming large-scale surveys. To put the galaxy image decomposition problem in the broader context of statistical inference and derive rigorous statistical confidence levels for the inference, I developed a novel galaxy image decomposition tool, GALPHAT (GALaxy PHotometric ATtributes), that exploits recent developments in Bayesian computation to provide full posterior probability distributions and reliable confidence intervals for all parameters. I will highlight the significant improvements in galaxy image decomposition using GALPHAT over conventional model-fitting algorithms and introduce GALPHAT's potential to infer the statistical distribution of galaxy morphological structures, using ensemble posteriors of galaxy morphological parameters from the entire galaxy population under study.
Chang Liyun; Ho, S.-Y.; Chui, C.-S.; Lee, J.-H.; Du Yichun; Chen Tainsong
2008-06-15
We propose a new method based on a statistical analysis technique to determine the minimum setup distance of a well chamber used in the calibration of $^{192}$Ir high dose rate (HDR). The chamber should be placed at least this distance away from any wall or from the floor in order to mitigate the effect of scatter. Three different chambers were included in this study, namely, Sun Nuclear Corporation, Nucletron, and Standard Imaging. The results from this study indicated that the minimum setup distance varies depending on the particular chamber and the room architecture in which the chamber was used. Our result differs from that of a previous study by Podgorsak et al. [Med. Phys. 19, 1311-1314 (1992)], in which 25 cm was suggested, and also differs from that of the International Atomic Energy Agency (IAEA)-TECDOC-1079 report, which suggested 30 cm. The new method proposed in this study may be considered as an alternative approach to determine the minimum setup distance of a well-type chamber used in the calibration of $^{192}$Ir HDR.
Osnes, J.D.; Winberg, A.; Andersson, J.E.; Larsson, N.A.
1991-09-27
Statistical and probabilistic methods for estimating the probability that a fracture is nonconductive (or equivalently, the conductive-fracture frequency) and the distribution of the transmissivities of conductive fractures from transmissivity measurements made in single-hole injection (well) tests were developed. These methods were applied to a database consisting of over 1,000 measurements made in nearly 25 km of borehole at five sites in Sweden. The depths of the measurements ranged from near the surface to over 600 m, and packer spacings of 20 and 25 m were used. A probabilistic model that describes the distribution of a series of transmissivity measurements was derived. When the parameters of this model were estimated using maximum likelihood estimators, the resulting estimated distributions generally fit the cumulative histograms of the transmissivity measurements very well. Further, estimates of the mean transmissivity of conductive fractures based on the maximum likelihood estimates of the model's parameters were reasonable, both in magnitude and in trend, with respect to depth. The estimates of the conductive fracture probability were generated in the range of 0.5-5.0 percent, with the higher values at shallow depths and with increasingly smaller values as depth increased. An estimation procedure based on the probabilistic model and the maximum likelihood estimators of its parameters was recommended. Some guidelines regarding the design of injection test programs were drawn from the recommended estimation procedure and the parameter estimates based on the Swedish data. 24 refs., 12 figs., 14 tabs.
Thorlund, Kristian; Wetterslev, Jørn; Awad, Tahany; Thabane, Lehana; Gluud, Christian
2011-12-01
In random-effects model meta-analysis, the conventional DerSimonian-Laird (DL) estimator typically underestimates the between-trial variance. Alternative variance estimators have been proposed to address this bias. This study aims to empirically compare statistical inferences from random-effects model meta-analyses on the basis of the DL estimator and four alternative estimators, as well as distributional assumptions (normal distribution and t-distribution) about the pooled intervention effect. We evaluated the discrepancies of p-values, 95% confidence intervals (CIs) in statistically significant meta-analyses, and the degree (percentage) of statistical heterogeneity (e.g. I²) across 920 Cochrane primary outcome meta-analyses. In total, 414 of the 920 meta-analyses were statistically significant with the DL meta-analysis, and 506 were not. Compared with the DL estimator, the four alternative estimators yielded p-values and CIs that could be interpreted as discordant in up to 11.6% or 6% of the included meta-analyses, depending on whether a normal distribution or a t-distribution of the intervention effect estimates was assumed. Large discrepancies were observed for the measures of degree of heterogeneity when comparing DL with each of the four alternative estimators. Estimating the degree (percentage) of heterogeneity on the basis of less biased between-trial variance estimators seems preferable to current practice. Disclosing inferential sensitivity of p-values and CIs may also be necessary when borderline significant results have substantial impact on the conclusion. Copyright © 2012 John Wiley & Sons, Ltd.
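For readers unfamiliar with the DL estimator discussed in this abstract, the following is a minimal sketch of the standard textbook method-of-moments form, together with the I² heterogeneity percentage. It is not taken from the paper; the function name `dersimonian_laird` and the dictionary keys are illustrative assumptions, and the inputs are per-trial effect estimates with their within-trial variances.

```python
import math


def dersimonian_laird(effects, variances):
    """Method-of-moments (DerSimonian-Laird) random-effects pooling.

    effects   -- per-trial effect estimates y_i
    variances -- per-trial within-trial variances v_i (all > 0)
    Returns a dict with tau2, the pooled effect, its standard error and I^2 (%).
    Hypothetical helper for illustration only.
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two trials with matching variances")

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q and its degrees of freedom.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1

    # DL between-trial variance, truncated at zero.
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights include tau2 in every trial's variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))

    # I^2: share of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"tau2": tau2, "pooled": pooled, "se": se, "i2": i2}
```

The truncation of tau2 at zero is part of the usual definition; the alternative estimators compared in the study would replace only the tau2 step.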
NASA Technical Reports Server (NTRS)
da Silva, Arlindo M.; Norris, Peter M.
2013-01-01
Part I presented a Monte Carlo Bayesian method for constraining a complex statistical model of GCM sub-gridcolumn moisture variability using high-resolution MODIS cloud data, thereby permitting large-scale model parameter estimation and cloud data assimilation. This part performs some basic testing of this new approach, verifying that it does indeed significantly reduce mean and standard deviation biases with respect to the assimilated MODIS cloud optical depth, brightness temperature and cloud top pressure, and that it also improves the simulated rotational-Raman scattering cloud optical centroid pressure (OCP) against independent (non-assimilated) retrievals from the OMI instrument. Of particular interest, the Monte Carlo method does show skill in the especially difficult case where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach allows finite jumps into regions of non-zero cloud probability. In the example provided, the method is able to restore marine stratocumulus near the Californian coast where the background state has a clear swath. This paper also examines a number of algorithmic and physical sensitivities of the new method and provides guidance for its cost-effective implementation. One obvious difficulty for the method, and other cloud data assimilation methods as well, is the lack of information content in the cloud observables on cloud vertical structure, beyond cloud top pressure and optical thickness, thus necessitating strong dependence on the background vertical moisture structure. It is found that a simple flow-dependent correlation modification due to Riishojgaard (1998) provides some help in this respect, by better honoring inversion structures in the background state.
Application of Transformations in Parametric Inference
ERIC Educational Resources Information Center
Brownstein, Naomi; Pensky, Marianna
2008-01-01
The objective of the present paper is to provide a simple approach to statistical inference using the method of transformations of variables. We demonstrate performance of this powerful tool on examples of constructions of various estimation procedures, hypothesis testing, Bayes analysis and statistical inference for the stress-strength systems.…
Bayesian Inference: with ecological applications
Link, William A.; Barker, Richard J.
2010-01-01
This text provides a mathematically rigorous yet accessible and engaging introduction to Bayesian inference with relevant examples that will be of interest to biologists working in the fields of ecology, wildlife management and environmental studies as well as students in advanced undergraduate statistics. This text opens the door to Bayesian inference, taking advantage of modern computational efficiencies and easily accessible software to evaluate complex hierarchical models.
NASA Technical Reports Server (NTRS)
Varnai, Tamas; Marshak, Alexander
2000-01-01
This paper presents a simple approach to estimate the uncertainties that arise in satellite retrievals of cloud optical depth when the retrievals use one-dimensional radiative transfer theory for heterogeneous clouds that have variations in all three dimensions. For the first time, preliminary error bounds are set to estimate the uncertainty of cloud optical depth retrievals. These estimates can help us better understand the nature of uncertainties that three-dimensional effects can introduce into retrievals of this important product of the MODIS instrument. The probability distribution of resulting retrieval errors is examined through theoretical simulations of shortwave cloud reflection for a wide variety of cloud fields. The results are used to illustrate how retrieval uncertainties change with observable and known parameters, such as solar elevation or cloud brightness. Furthermore, the results indicate that a tendency observed in an earlier study, clouds appearing thicker for oblique sun, is indeed caused by three-dimensional radiative effects.
ERIC Educational Resources Information Center
Douglas, Jeff; Kim, Hae-Rim; Roussos, Louis; Stout, William; Zhang, Jinming
An extensive nonparametric dimensionality analysis of latent structure was conducted on three forms of the Law School Admission Test (LSAT) (December 1991, June 1992, and October 1992) using the DIMTEST model in confirmatory analyses and using DIMTEST, FAC, DETECT, HCA, PROX, and a genetic algorithm in exploratory analyses. Results indicate that…
Correlation techniques and measurements of wave-height statistics
NASA Technical Reports Server (NTRS)
Guthart, H.; Taylor, W. C.; Graf, K. A.; Douglas, D. G.
1972-01-01
Statistical measurements of wave height fluctuations have been made in a wind wave tank. The power spectral density function of temporal wave height fluctuations evidenced second-harmonic components and an f to the minus 5th power law decay beyond the second harmonic. The observations of second harmonic effects agreed very well with a theoretical prediction. From the wave statistics, surface drift currents were inferred and compared to experimental measurements with satisfactory agreement. Measurements were made of the two dimensional correlation coefficient at 15 deg increments in angle with respect to the wind vector. An estimate of the two-dimensional spatial power spectral density function was also made.
NASA Technical Reports Server (NTRS)
Stone, Peter H.; Yao, Mao-Sung
1987-01-01
The role of eddy momentum fluxes in the general circulation was investigated using a two-dimensional zonally averaged statistical-dynamical model described by Yao and Stone (1987), which is almost two orders of magnitude faster than the three-dimensional climate model of Hansen et al. (1983). Results show that the vertical structure of the meridional eddy flux has relatively little impact on the general circulation, presumably because the vertical structure is strongly constrained by the thermal wind relation and surface friction. On the other hand, it was found that, in order to simulate accurately the general circulation and its response to climate changes, parameterization of the vertically integrated meridional eddy flux of angular momentum is necessary. A new parameterization of this eddy momentum transport was carried out, which is intended to represent the transport due to large-scale transient eddies arising from baroclinic instability.
Using Alien Coins to Test Whether Simple Inference Is Bayesian
ERIC Educational Resources Information Center
Cassey, Peter; Hawkins, Guy E.; Donkin, Chris; Brown, Scott D.
2016-01-01
Reasoning and inference are well-studied aspects of basic cognition that have been explained as statistically optimal Bayesian inference. Using a simplified experimental design, we conducted quantitative comparisons between Bayesian inference and human inference at the level of individuals. In 3 experiments, with more than 13,000 participants, we…
Nonparametric inference of network structure and dynamics
NASA Astrophysics Data System (ADS)
Peixoto, Tiago P.
The network structure of complex systems determines their function and serves as evidence for the evolutionary mechanisms that lie behind them. Despite considerable effort in recent years, it remains an open challenge to formulate general descriptions of the large-scale structure of network systems and to reliably extract such information from data. Although many approaches have been proposed, few methods attempt to gauge the statistical significance of the uncovered structures, and hence the majority cannot reliably separate actual structure from stochastic fluctuations. Due to the sheer size and high dimensionality of many networks, this represents a major limitation that prevents meaningful interpretations of the results obtained with such nonstatistical methods. In this talk, I will show how these issues can be tackled in a principled and efficient fashion by formulating appropriate generative models of network structure that can have their parameters inferred from data. By employing a Bayesian description of such models, the inference can be performed in a nonparametric fashion that does not require any a priori knowledge or ad hoc assumptions about the data. I will show how this approach can be used to perform model comparison, and how hierarchical models yield the most appropriate trade-off between model complexity and quality of fit based on the statistical evidence present in the data. I will also show how this general approach can be elegantly extended to networks with edge attributes, that are embedded in latent spaces, and that change in time. The latter is obtained via a fully dynamic generative network model, based on arbitrary-order Markov chains, that can also be inferred in a nonparametric fashion. Throughout the talk I will illustrate the application of the methods with many empirical networks such as the internet at the autonomous systems level, the global airport network, the network of actors and films, social networks, citations among
Statistical Inference of the Be Star Periodicity
NASA Astrophysics Data System (ADS)
Hubert, A. M.
2007-03-01
A review of periodicities with different timescales found in photometric and line profile variability of Be stars from visual, UV, and X-ray observations is presented. A distinction is made between what are called "stable" periods, "transient" periods and cyclic variations. Firstly, the report focuses on the intrinsic variability of Be stars. I attempt to distinguish between variations due to non-radial pulsations and those due to corotating magnetic structures at or near the stellar surface. As the rotational period is a critical value for selection of processes giving rise to short-term periodicities, estimates provided by rotational modulation attributes are compared with those derived from fundamental parameters. Then, I give an overview of new spatial instrumentation, which will improve our understanding of the origin and nature of Be stars through analyses of their light curves. Secondly, periodicities with longer time scales are considered. These periods are associated with binary systems. Though binarity is irrelevant to the Be phenomenon in many cases, it cannot be excluded that a significant number of Be stars are formed through binary processes. Thus, periodicities linked to mass transfer or orbital effects are analyzed in an evolutionary scheme. Thirdly, long-term cyclic variations that reflect the dynamical state of Be star disks are briefly reviewed.
Statistical Inference for Cultural Consensus Theory
2014-02-24
Facts on Aging Quiz”. Covariates were also incorporated into the model, and it was shown that a consensus answer key that corresponded to the...Structural balance: a generalization of Heider’s theory. Psychological Review, 63, 277-293. Palmore, E. (1998). Facts on aging quiz, second edition
Statistical inference of static analysis rules
NASA Technical Reports Server (NTRS)
Engler, Dawson Richards (Inventor)
2009-01-01
Various apparatus and methods are disclosed for identifying errors in program code. Respective numbers of observances of at least one correctness rule by different code instances that relate to the at least one correctness rule are counted in the program code. Each code instance has an associated counted number of observances of the correctness rule by the code instance. Also counted are respective numbers of violations of the correctness rule by different code instances that relate to the correctness rule. Each code instance has an associated counted number of violations of the correctness rule by the code instance. A respective likelihood of the validity is determined for each code instance as a function of the counted number of observances and counted number of violations. The likelihood of validity indicates a relative likelihood that a related code instance is required to observe the correctness rule. The violations may be output in order of the likelihood of validity of a violated correctness rule.
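The abstract above says only that a likelihood of validity is computed "as a function of" the counted observances and violations; it does not name the function. As a hedged illustration, the sketch below uses a one-sample z-statistic for a proportion, which is one common choice for this kind of ranking and is an assumption here, not the patented method. The names `z_score` and `rank_violations` and the default `p0=0.9` are hypothetical.

```python
import math


def z_score(observances, violations, p0=0.9):
    """z-statistic for the observed rate of rule observance.

    p0 is an assumed baseline probability that code obeys a rule by chance;
    it is an illustrative parameter, not a value from the source.
    """
    n = observances + violations
    if n == 0:
        raise ValueError("no checks counted for this code instance")
    return (observances / n - p0) / math.sqrt(p0 * (1.0 - p0) / n)


def rank_violations(counts, p0=0.9):
    """Order code instances that have violations by decreasing z-score.

    counts maps an instance name to (observances, violations).
    Instances with many observances and few violations rank first, since
    their rare violations are the most likely to be real errors.
    """
    scored = [
        (z_score(obs, vio, p0), name)
        for name, (obs, vio) in counts.items()
        if vio > 0
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]
```

For example, an instance observed 99 times and violated once would be reported ahead of one observed once and violated once, because the first rule is far better supported by the counts.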
Evaluation and statistical inference for human connectomes.
Pestilli, Franco; Yeatman, Jason D; Rokem, Ariel; Kay, Kendrick N; Wandell, Brian A
2014-10-01
Diffusion-weighted imaging coupled with tractography is currently the only method for in vivo mapping of human white-matter fascicles. Tractography takes diffusion measurements as input and produces the connectome, a large collection of white-matter fascicles, as output. We introduce a method to evaluate the evidence supporting connectomes. Linear fascicle evaluation (LiFE) takes any connectome as input and predicts diffusion measurements as output, using the difference between the measured and predicted diffusion signals to quantify the prediction error. We use the prediction error to evaluate the evidence that supports the properties of the connectome, to compare tractography algorithms and to test hypotheses about tracts and connections.
Statistical Manual. Methods of Making Experimental Inferences
1951-06-01
Statistical inference for capture-recapture experiments
Pollock, Kenneth H.; Nichols, James D.; Brownie, Cavell; Hines, James E.
1990-01-01
This monograph presents a detailed, practical exposition on the design, analysis, and interpretation of capture-recapture studies. The Lincoln-Petersen model (Chapter 2) and the closed population models (Chapter 3) are presented only briefly because these models have been covered in detail elsewhere. The Jolly-Seber open population model, which is central to the monograph, is covered in detail in Chapter 4. In Chapter 5 we consider the "enumeration" or "calendar of captures" approach, which is widely used by mammalogists and other vertebrate ecologists. We strongly recommend that it be abandoned in favor of analyses based on the Jolly-Seber model. We consider 2 restricted versions of the Jolly-Seber model. We believe the first of these, which allows losses (mortality or emigration) but not additions (births or immigration), is likely to be useful in practice. Another series of restrictive models requires the assumptions of a constant survival rate or a constant survival rate and a constant capture rate for the duration of the study. Detailed examples are given that illustrate the usefulness of these restrictions. There often can be a substantial gain in precision over Jolly-Seber estimates. In Chapter 5 we also consider 2 generalizations of the Jolly-Seber model. The temporary trap response model allows newly marked animals to have different survival and capture rates for 1 period. The other generalization is the cohort Jolly-Seber model. Ideally all animals would be marked as young, and age effects considered by using the Jolly-Seber model on each cohort separately. In Chapter 6 we present a detailed description of an age-dependent Jolly-Seber model, which can be used when 2 or more identifiable age classes are marked. In Chapter 7 we present a detailed description of the "robust" design. Under this design each primary period contains several secondary sampling periods.
We propose an estimation procedure based on closed and open population models that allows for heterogeneity and trap response of capture rates (hence the name robust design). We begin by considering just 1 age class and then extend to 2 age classes. When there are 2 age classes it is possible to distinguish immigrants and births. In Chapter 8 we give a detailed discussion of the design of capture-recapture studies. First, capture-recapture is compared to other possible sampling procedures. Next, the design of capture-recapture studies to minimize assumption violations is considered. Finally, we consider the precision of parameter estimates and present figures on proportional standard errors for a variety of initial parameter values to aid the biologist about to plan a study. A new program, JOLLY, has been written to accompany the material on the Jolly-Seber model (Chapter 4) and its extensions (Chapter 5). Another new program, JOLLYAGE, has been written for a special case of the age-dependent model (Chapter 6) where there are only 2 age classes. In Chapter 9 a brief description of the different versions of the 2 programs is given. Chapter 10 gives a brief description of some alternative approaches that were not considered in this monograph. We believe that an excellent overall view of capture-recapture models may be obtained by reading the monograph by White et al. (1982) emphasizing closed models and then reading this monograph where we concentrate on open models. The important recent monograph by Burnham et al. (1987) could then be read if there were interest in the comparison of different populations.
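For orientation, the two-sample Lincoln-Petersen model named in this abstract can be sketched as follows. These are the standard textbook forms (the simple ratio estimator and Chapman's bias-corrected version with its usual variance approximation), not code from the JOLLY or JOLLYAGE programs; the function names and argument names are illustrative assumptions.

```python
def lincoln_petersen(n1, n2, m2):
    """Simple two-sample estimate of closed-population size.

    n1 -- animals caught and marked in the first sample
    n2 -- animals caught in the second sample
    m2 -- marked animals recaptured in the second sample (must be > 0)
    """
    if m2 <= 0:
        raise ValueError("the simple estimator needs at least one recapture")
    return n1 * n2 / m2


def chapman(n1, n2, m2):
    """Chapman's bias-corrected estimator and its approximate variance.

    Remains defined when m2 == 0, unlike the simple ratio estimator.
    """
    n_hat = (n1 + 1) * (n2 + 1) / (m2 + 1) - 1
    var = ((n1 + 1) * (n2 + 1) * (n1 - m2) * (n2 - m2)
           / ((m2 + 1) ** 2 * (m2 + 2)))
    return n_hat, var
```

Both forms assume a closed population between the two samples, which is exactly the assumption the open-population Jolly-Seber model relaxes.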
Alper, Kenneth; Raghavan, Manoj; Isenhart, Robert; Howard, Bryant; Doyle, Werner; John, Roy; Prichep, Leslie
2008-02-01
This preliminary study sought to localize epileptogenic regions in patients with partial epilepsy by analysis of interictal EEG activity utilizing variable resolution electromagnetic tomography (VARETA), a three-dimensional quantitative electroencephalographic (QEEG) frequency-domain distributed source modeling technique. The very narrow band (VNB) spectra spanned the frequency range 0.39 Hz to 19.1 Hz, in 0.39 Hz steps. These VNB spectra were compared to normative data and transformed to provide Z-scores for every scalp derivation, and the spatial distributions of the probable EEG generators of the most abnormal values were displayed on slices from a probabilistic MRI atlas. Each voxel was color-coded to represent the significance of the deviation relative to age appropriate normative values. We compared the resulting three-dimensional images to the localization of epileptogenic regions based on invasive intracranial EEG recordings of seizure onsets. The VARETA image indicated abnormal interictal spectral power values in regions of seizure onset identified by invasive monitoring, mainly in delta and theta range (1.5 to 8.0 Hz). The VARETA localization of the most abnormal voxel was congruent with the epileptogenic regions identified by intracranial recordings with regard to hemisphere in all 6 cases, and with regard to lobe in 5 cases. In contrast, abnormal findings with routine EEG agreed with invasive monitoring with regard to hemisphere in 3 cases and with regard to lobe in 2 cases. These results suggest that analysis of background interictal EEG utilizing distributed source models should be investigated further in clinical epilepsy.
Confidence set inference with a prior quadratic bound
NASA Technical Reports Server (NTRS)
Backus, George E.
1989-01-01
In the uniqueness part of a geophysical inverse problem, the observer wants to predict all likely values of P unknown numerical properties $z=(z_1,\ldots,z_P)$ of the earth from measurement of D other numerical properties $y^0=(y_1^0,\ldots,y_D^0)$, using full or partial knowledge of the statistical distribution of the random errors in $y^0$. The data space Y containing $y^0$ is D-dimensional, so when the model space X is infinite-dimensional the linear uniqueness problem usually is insoluble without prior information about the correct earth model x. If that information is a quadratic bound on x, Bayesian inference (BI) and stochastic inversion (SI) inject spurious structure into x, implied by neither the data nor the quadratic bound. Confidence set inference (CSI) provides an alternative inversion technique free of this objection. Confidence set inference is illustrated in the problem of estimating the geomagnetic field B at the core-mantle boundary (CMB) from components of B measured on or above the earth's surface.
NASA Technical Reports Server (NTRS)
Iacovazzi, Robert A., Jr.; Prabhakara, C.; Lau, William K. M. (Technical Monitor)
2001-01-01
In this study, a model is developed to estimate mesoscale-resolution atmospheric latent heating (ALH) profiles. It utilizes rain statistics deduced from Tropical Rainfall Measuring Mission (TRMM) data, and cloud vertical velocity profiles and regional surface thermodynamic climatologies derived from other available data sources. From several rain events observed over tropical ocean and land, ALH profiles retrieved by this model in convective rain regions reveal strong warming throughout most of the troposphere, while in stratiform rain regions they usually show slight cooling below the freezing level and significant warming above. The mesoscale-average, or total, ALH profiles reveal a dominant stratiform character, because stratiform rain areas are usually much larger than convective rain areas. Sensitivity tests of the model show that total ALH at a given tropospheric level varies by less than +/- 10 % when convective and stratiform rain rates and mesoscale fractional rain areas are perturbed individually by +/- 15 %. This is also found when the non-uniform convective vertical velocity profiles are replaced by one that is uniform. Larger variability of the total ALH profiles arises when climatological ocean- and land-surface temperatures (water vapor mixing ratios) are independently perturbed by +/- 1.0 K (+/- 5 %) and +/- 5.0 K (+/- 15 %), respectively. At a given tropospheric level, such perturbations can cause a +/- 25 % variation of total ALH over ocean, and a factor-of-two sensitivity over land. This sensitivity is reduced substantially if perturbations of surface thermodynamic variables do not change surface relative humidity, or are not extended throughout the entire model evaporation layer. The ALH profiles retrieved in this study agree qualitatively with tropical total diabatic heating profiles deduced in earlier studies. Also, from January and July 1999 ALH-profile climatologies generated separately with TRMM Microwave Imager and Precipitation Radar rain
Manos, Thanos; Robnik, Marko
2015-04-01
We study the quantum kicked rotator in the classically fully chaotic regime K=10 and for various values of the quantum parameter k using Izrailev's N-dimensional model for various N≤3000, which in the limit N→∞ tends to the exact quantized kicked rotator. By numerically calculating the eigenfunctions in the basis of the angular momentum we find that the localization length L for fixed parameter values has a certain distribution; in fact, its inverse is Gaussian distributed, in analogy and in connection with the distribution of finite time Lyapunov exponents of Hamilton systems. However, unlike the case of the finite time Lyapunov exponents, this distribution is found to be independent of N and thus survives the limit N=∞. This is different from the tight-binding model of Anderson localization. The reason is that the finite bandwidth approximation of the underlying Hamilton dynamical system in the Shepelyansky picture [Phys. Rev. Lett. 56, 677 (1986)] does not apply rigorously. This observation explains the strong fluctuations in the scaling laws of the kicked rotator, such as the entropy localization measure as a function of the scaling parameter Λ=L/N, where L is the theoretical value of the localization length in the semiclassical approximation. These results call for a more refined theory of the localization length in the quantum kicked rotator and in similar Floquet systems, where we must predict not only the mean value of the inverse of the localization length L but also its (Gaussian) distribution, in particular the variance. In order to complete our studies we numerically analyze the related behavior of finite time Lyapunov exponents in the standard map and of the 2×2 transfer matrix formalism. This paper extends our recent work [Phys. Rev. E 87, 062905 (2013)].
NASA Astrophysics Data System (ADS)
Yenn Chong, See; Lee, Jung-Ryul; Yik Park, Chan
2013-03-01
Conventional threshold crossing technique generally encounters the difficulty in setting a common threshold level in the extraction of the respective time-of-flights (ToFs) and amplitudes from the guided waves obtained at many different points by spatial scanning. Therefore, we propose a statistical threshold determination method through noise map generation to automatically process numerous guided waves having different propagation distances. First, a two-dimensional (2-D) noise map is generated using one-dimensional (1-D) WT magnitudes at time zero of the acquired waves. Then, the probability density functions (PDFs) of Gamma distribution, Weibull distribution and exponential distribution are used to model the measured 2-D noise map. Graphical goodness-of-fit measurements are used to find the best fit among the three theoretical distributions. Then, the threshold level is automatically determined by selecting the desired confidence level of the noise rejection in the cumulative distribution function of the best fit PDF. Based on this threshold level, the amplitudes and ToFs are extracted and mapped into a 2-D matrix array form. The threshold level determined by the noise statistics may cross the noise signal after time zero. These crossings are represented as salt-and-pepper noise in the ToF and amplitude maps but finally removed by the 1-D median filter. This proposed method was verified in a thick stainless steel hollow cylinder where guided waves were acquired in an area of 180 mm×126 mm of the cylinder by using a laser ultrasonic scanning system and an ultrasonic sensor. The Gamma distribution was estimated as the best fit to the verification experimental data by the proposed algorithm. The statistical parameters of the Gamma distribution were used to determine the threshold level appropriate for most of the guided waves. The ToFs and amplitudes of the first arrival mode were mapped into a 2-D matrix array form. Each map included 447 noisy points out of 90
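The confidence-level step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the exponential distribution (one of the three candidates named in the abstract, chosen here because its CDF inverts in closed form, whereas the study found the Gamma distribution to fit best), and the function name, the synthetic noise values and the 0.99 confidence level are all hypothetical.

```python
import math
import random

def exponential_threshold(noise_magnitudes, confidence):
    """Level below which a fraction `confidence` of exponential noise falls."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    # Maximum-likelihood estimate of the exponential scale is the sample mean.
    mean = sum(noise_magnitudes) / len(noise_magnitudes)
    # Invert the CDF F(x) = 1 - exp(-x / mean) at F = confidence.
    return -mean * math.log(1.0 - confidence)

# Synthetic stand-in for a noise map (hypothetical values, mean 0.05).
rng = random.Random(0)
noise = [rng.expovariate(1.0 / 0.05) for _ in range(5000)]
threshold = exponential_threshold(noise, 0.99)
# Fraction of noise samples that the threshold actually rejects.
rejected = sum(x <= threshold for x in noise) / len(noise)
```

With a Gamma or Weibull fit the same idea applies, but the quantile would have to be obtained numerically rather than from a closed-form expression.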
Guyonvarch, Estelle; Ramin, Elham; Kulahci, Murat; Plósz, Benedek Gy
2015-10-15
The present study aims at using statistically designed computational fluid dynamics (CFD) simulations as numerical experiments for the identification of one-dimensional (1-D) advection-dispersion models - computationally light tools, used e.g., as sub-models in systems analysis. The objective is to develop a new 1-D framework, referred to as interpreted CFD (iCFD) models, in which statistical meta-models are used to calculate the pseudo-dispersion coefficient (D) as a function of design and flow boundary conditions. The method - presented in a straightforward and transparent way - is illustrated using the example of a circular secondary settling tank (SST). First, the significant design and flow factors are screened out by applying the statistical method of two-level fractional factorial design of experiments. Second, based on the number of significant factors identified through the factor screening study and system understanding, 50 different sets of design and flow conditions are selected using Latin Hypercube Sampling (LHS). The boundary condition sets are imposed on a 2-D axi-symmetrical CFD simulation model of the SST. In the framework, to degenerate the 2-D model structure, CFD model outputs are approximated by the 1-D model through the calibration of three different model structures for D. Correlation equations for the D parameter then are identified as a function of the selected design and flow boundary conditions (meta-models), and their accuracy is evaluated against D values estimated in each numerical experiment. The evaluation and validation of the iCFD model structure is carried out using scenario simulation results obtained with parameters sampled from the corners of the LHS experimental region. For the studied SST, additional iCFD model development was carried out in terms of (i) assessing different density current sub-models; (ii) implementation of a combined flocculation, hindered, transient and compression settling velocity function; and (iii
BIE: Bayesian Inference Engine
NASA Astrophysics Data System (ADS)
Weinberg, Martin D.
2013-12-01
The Bayesian Inference Engine (BIE) is an object-oriented library of tools written in C++ designed explicitly to enable Bayesian update and model comparison for astronomical problems. To facilitate "what if" exploration, BIE provides a command line interface (written with Bison and Flex) to run input scripts. The output of the code is a simulation of the Bayesian posterior distribution from which summary statistics, e.g. moments or confidence intervals, can be determined. All of these quantities are fundamentally integrals, and the Markov Chain approach produces variates θ distributed according to P(θ|D), so moments are trivially obtained by summing over the ensemble of variates.
Strelioff, Christopher C; Crutchfield, James P; Hübler, Alfred W
2007-07-01
Markov chains are a natural and well understood tool for describing one-dimensional patterns in time or space. We show how to infer kth order Markov chains, for arbitrary k , from finite data by applying Bayesian methods to both parameter estimation and model-order selection. Extending existing results for multinomial models of discrete data, we connect inference to statistical mechanics through information-theoretic (type theory) techniques. We establish a direct relationship between Bayesian evidence and the partition function which allows for straightforward calculation of the expectation and variance of the conditional relative entropy and the source entropy rate. Finally, we introduce a method that uses finite data-size scaling with model-order comparison to infer the structure of out-of-class processes.
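As a rough illustration of Bayesian model-order selection for Markov chains, the sketch below computes the log evidence of a symbol sequence under a k-th order chain. It assumes an independent symmetric Dirichlet prior on the next-symbol distribution of every context, which gives the standard Dirichlet-multinomial closed form; the abstract does not state the prior, so this choice, the function name `log_evidence` and the toy sequence are assumptions. All orders are scored on the same symbols (those after the first `start` positions) so that the comparison is like for like.

```python
import math
from collections import Counter

def log_evidence(seq, k, alphabet, alpha=1.0, start=None):
    """Log marginal likelihood of seq[start:] under a k-th order Markov chain
    with a symmetric Dirichlet(alpha) prior for every length-k context."""
    start = k if start is None else start
    counts = Counter()   # (context, next symbol) -> occurrences
    totals = Counter()   # context -> occurrences
    for i in range(start, len(seq)):
        ctx = seq[i - k:i]
        counts[(ctx, seq[i])] += 1
        totals[ctx] += 1
    a = len(alphabet)
    logp = 0.0
    for ctx, n in totals.items():
        # Dirichlet-multinomial evidence for the counts observed after ctx.
        logp += math.lgamma(a * alpha) - math.lgamma(n + a * alpha)
        for s in alphabet:
            logp += math.lgamma(counts[(ctx, s)] + alpha) - math.lgamma(alpha)
    return logp

# Toy data: a strictly alternating binary sequence (hypothetical example).
data = "01" * 100
max_order = 2
scores = {k: log_evidence(data, k, "01", start=max_order)
          for k in range(max_order + 1)}
```

For this toy sequence the order-0 model scores far below order 1, because a memoryless source cannot explain the strict alternation. Treating the candidate orders as equally likely a priori, the order with the largest log evidence would be selected.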
NASA Astrophysics Data System (ADS)
Rajabi, Mohammad Mahdi; Ataie-Ashtiani, Behzad
2016-05-01
Bayesian inference has traditionally been conceived as the proper framework for the formal incorporation of expert knowledge in parameter estimation of groundwater models. However, conventional Bayesian inference is incapable of taking into account the imprecision essentially embedded in expert provided information. In order to solve this problem, a number of extensions to conventional Bayesian inference have been introduced in recent years. One of these extensions is 'fuzzy Bayesian inference' which is the result of integrating fuzzy techniques into Bayesian statistics. Fuzzy Bayesian inference has a number of desirable features which make it an attractive approach for incorporating expert knowledge in the parameter estimation process of groundwater models: (1) it is well adapted to the nature of expert provided information, (2) it allows uncertainty and imprecision to be modeled separately, and (3) it presents a framework for fusing expert provided information regarding the various inputs of the Bayesian inference algorithm. However, an important obstacle in employing fuzzy Bayesian inference in groundwater numerical modeling applications is the computational burden, as the required number of numerical model simulations often becomes very large and computationally infeasible. In this paper, a novel approach of accelerating the fuzzy Bayesian inference algorithm is proposed which is based on using approximate posterior distributions derived from surrogate modeling, as a screening tool in the computations. The proposed approach is first applied to a synthetic test case of seawater intrusion (SWI) in a coastal aquifer. It is shown that for this synthetic test case, the proposed approach decreases the number of required numerical simulations by an order of magnitude. Then the proposed approach is applied to a real-world test case involving three-dimensional numerical modeling of SWI in Kish Island, located in the Persian Gulf. An expert
Pataky, Todd C; Robinson, Mark A; Vanrenterghem, Jos
2016-01-01
One-dimensional (1D) kinematic, force, and EMG trajectories are often analyzed using zero-dimensional (0D) metrics like local extrema. Recently whole-trajectory 1D methods have emerged in the literature as alternatives. Since 0D and 1D methods can yield qualitatively different results, the two approaches may appear to be theoretically distinct. The purposes of this paper were (a) to clarify that 0D and 1D approaches are actually just special cases of a more general region-of-interest (ROI) analysis framework, and (b) to demonstrate how ROIs can augment statistical power. We first simulated millions of smooth, random 1D datasets to validate theoretical predictions of the 0D, 1D and ROI approaches and to emphasize how ROIs provide a continuous bridge between 0D and 1D results. We then analyzed a variety of public datasets to demonstrate potential effects of ROIs on biomechanical conclusions. Results showed, first, that a priori ROI particulars can qualitatively affect the biomechanical conclusions that emerge from analyses and, second, that ROIs derived from exploratory/pilot analyses can detect smaller biomechanical effects than are detectable using full 1D methods. We recommend regarding ROIs, like data filtering particulars and Type I error rate, as parameters which can affect hypothesis testing results, and thus as sensitivity analysis tools to ensure arbitrary decisions do not influence scientific interpretations. Last, we describe open-source Python and MATLAB implementations of 1D ROI analysis for arbitrary experimental designs ranging from one-sample t tests to MANOVA.
Operation of the Bayes Inference Engine
Hanson, K.M.; Cunningham, G.S.
1998-07-27
The authors have developed a computer application, called the Bayes Inference Engine, to enable one to make inferences about models of a physical object from radiographs taken of it. In the BIE calculational models are represented by a data-flow diagram that can be manipulated by the analyst in a graphical-programming environment. The authors demonstrate the operation of the BIE in terms of examples of two-dimensional tomographic reconstruction including uncertainty estimation.
Using scientifically and statistically sufficient statistics in comparing image segmentations.
Chi, Yueh-Yun; Muller, Keith E
2010-01-01
Automatic computer segmentation in three dimensions creates opportunity to reduce the cost of three-dimensional treatment planning of radiotherapy for cancer treatment. Comparisons between human and computer accuracy in segmenting kidneys in CT scans generate distance values far larger in number than the number of CT scans. Such high dimension, low sample size (HDLSS) data present a grand challenge to statisticians: how do we find good estimates and make credible inference? We recommend discovering and using scientifically and statistically sufficient statistics as an additional strategy for overcoming the curse of dimensionality. First, we reduced the three-dimensional array of distances for each image comparison to a histogram to be modeled individually. Second, we used non-parametric kernel density estimation to explore distributional patterns and assess multi-modality. Third, a systematic exploratory search for parametric distributions and truncated variations led to choosing a Gaussian form as approximating the distribution of a cube root transformation of distance. Fourth, representing each histogram by an individually estimated distribution eliminated the HDLSS problem by reducing on average 26,000 distances per histogram to just 2 parameter estimates. In the fifth and final step we used classical statistical methods to demonstrate that the two human observers disagreed significantly less with each other than with the computer segmentation. Nevertheless, the size of all disagreements was clinically unimportant relative to the size of a kidney. The hierarchal modeling approach to object-oriented data created response variables deemed sufficient by both the scientists and statisticians. We believe the same strategy provides a useful addition to the imaging toolkit and will succeed with many other high throughput technologies in genetics, metabolomics and chemical analysis.
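The fourth step above, collapsing each histogram of distances to two parameter estimates, can be sketched as follows. This is only an illustration of the idea under the abstract's Gaussian-after-cube-root model: the function name and the sample distances are hypothetical, and the paper's truncated variants and estimation details are not reproduced.

```python
import statistics

def cube_root_gaussian_summary(distances):
    """Reduce one histogram of non-negative distances to two estimates:
    the mean and standard deviation of the cube-root-transformed values."""
    roots = [d ** (1.0 / 3.0) for d in distances]
    return statistics.fmean(roots), statistics.stdev(roots)

# Hypothetical distances (e.g. in mm) for a single image comparison.
mu, sigma = cube_root_gaussian_summary([0.0, 1.0, 8.0, 27.0])
```

Each image comparison then contributes one (mu, sigma) pair, and classical methods can be applied to those pairs instead of the raw distances.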
Kauweloa, Kevin I; Gutierrez, Alonso N; Stathakis, Sotirios; Papanikolaou, Niko; Mavroidis, Panayiotis
2016-07-01
A toolkit has been developed for calculating the 3-dimensional biological effective dose (BED) distributions in multi-phase, external beam radiotherapy treatments such as those applied in liver stereotactic body radiation therapy (SBRT) and in multi-prescription treatments. This toolkit also provides a wide range of statistical results related to dose and BED distributions. MATLAB 2010a, version 7.10 was used to create this GUI toolkit. The input data consist of the dose distribution matrices, organ contour coordinates, and treatment planning parameters from the treatment planning system (TPS). The toolkit has the capability of calculating the multi-phase BED distributions using different formulas (denoted as true and approximate). Following the calculations of the BED distributions, the dose and BED distributions can be viewed in different projections (e.g. coronal, sagittal and transverse). The different elements of this toolkit are presented and the important steps for the execution of its calculations are illustrated. The toolkit is applied on brain, head & neck and prostate cancer patients, who received primary and boost phases in order to demonstrate its capability in calculating BED distributions, as well as measuring the inaccuracy and imprecision of the approximate BED distributions. Finally, the clinical situations in which the use of the present toolkit would have a significant clinical impact are indicated.
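For orientation, a voxel-wise multi-phase BED calculation can be sketched with the standard linear-quadratic expression BED = n d (1 + d / (α/β)), summed over phases. This is an assumption made for illustration: the abstract does not spell out the toolkit's "true" and "approximate" formulas, and the function names, dose values and α/β ratio below are hypothetical.

```python
def bed(n_fractions, dose_per_fraction, alpha_beta):
    """Linear-quadratic BED for one phase: n * d * (1 + d / (alpha/beta))."""
    return n_fractions * dose_per_fraction * (1.0 + dose_per_fraction / alpha_beta)

def multiphase_bed(phases, alpha_beta):
    """Sum the per-phase BED in every voxel.

    phases: list of (n_fractions, doses), where doses holds the
    per-fraction dose in Gy for each voxel, in the same voxel order.
    """
    total = [0.0] * len(phases[0][1])
    for n_fractions, doses in phases:
        for v, d in enumerate(doses):
            total[v] += bed(n_fractions, d, alpha_beta)
    return total

# Hypothetical two-voxel plan: a 25-fraction primary phase and a 5-fraction boost.
plan = [(25, [1.8, 1.0]), (5, [2.0, 0.5])]
total_bed = multiphase_bed(plan, alpha_beta=3.0)
```

Because BED is quadratic in the dose per fraction, computing it per phase and then summing generally differs from applying the formula once to the summed physical dose, which is one reason a voxel-wise multi-phase calculation is of interest.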
NASA Astrophysics Data System (ADS)
Bocaniov, Serghei A.; Scavia, Donald
2016-06-01
Hypoxia or low bottom water dissolved oxygen (DO) is a world-wide problem of management concern requiring an understanding and ability to monitor and predict its spatial and temporal dynamics. However, this is often made difficult in large lakes and coastal oceans because of limited spatial and temporal coverage of field observations. We used a calibrated and validated three-dimensional ecological model of Lake Erie to extend a statistical relationship between hypoxic extent and bottom water DO concentrations to explore implications of the broader temporal and spatial development and dissipation of hypoxia. We provide the first numerical demonstration that hypoxia initiates in the nearshore, not the deep portion of the basin, and that the threshold used to define hypoxia matters in both spatial and temporal dynamics and in its sensitivity to climate. We show that existing monitoring programs likely underestimate both maximum hypoxic extent and the importance of low oxygen in the nearshore, discuss implications for ecosystem and drinking water protection, and recommend how these results could be used to efficiently and economically extend monitoring programs.
Statistical Methods for Astronomy
NASA Astrophysics Data System (ADS)
Feigelson, Eric D.; Babu, G. Jogesh
Statistical methodology, with deep roots in probability theory, provides quantitative procedures for extracting scientific knowledge from astronomical data and for testing astrophysical theory. In recent decades, statistics has enormously increased in scope and sophistication. After a historical perspective, this review outlines concepts of mathematical statistics, elements of probability theory, hypothesis tests, and point estimation. Least squares, maximum likelihood, and Bayesian approaches to statistical inference are outlined. Resampling methods, particularly the bootstrap, provide valuable procedures when distribution functions of statistics are not known. Several approaches to model selection and goodness of fit are considered.
Louarn, Gaëtan; Lecoeur, Jérémie; Lebon, Eric
2008-01-01
Background and Aims In grapevine, canopy-structure-related variations in light interception and distribution affect productivity, yield and the quality of the harvested product. A simple statistical model for reconstructing three-dimensional (3D) canopy structures for various cultivar–training system (C × T) pairs has been implemented with special attention paid to balance the time required for model parameterization and accuracy of the representations from organ to stand scales. Such an approach particularly aims at overcoming the weak integration of interplant variability using the usual direct 3D measurement methods. Model This model is original in combining a turbid-medium-like envelope enclosing the volume occupied by vine shoots with the use of discrete geometric polygons representing leaves randomly located within this volume to represent plant structure. Reconstruction rules were adapted to capture the main determinants of grapevine shoot architecture and their variability. Using a simplified set of parameters, it was possible to describe (1) the 3D path of the main shoot, (2) the volume occupied by the foliage around this path and (3) the orientation of individual leaf surfaces. Model parameterization (estimation of the probability distribution for each parameter) was carried out for eight contrasting C × T pairs. Key Results and Conclusions The parameter values obtained in each situation were consistent with our knowledge of grapevine architecture. Quantitative assessments for the generated virtual scenes were carried out at the canopy and plant scales. Light interception efficiency and local variations of light transmittance within and between experimental plots were correctly simulated for all canopies studied. The approach predicted these key ecophysiological variables significantly more accurately than the classical complete digitization method with a limited number of plants. In addition, this model accurately reproduced the characteristics of a
NASA Astrophysics Data System (ADS)
Goodman, Joseph W.
2000-07-01
The Wiley Classics Library consists of selected books that have become recognized classics in their respective fields. With these new unabridged and inexpensive editions, Wiley hopes to extend the life of these important works by making them available to future generations of mathematicians and scientists. Currently available in the Series: T. W. Anderson The Statistical Analysis of Time Series T. S. Arthanari & Yadolah Dodge Mathematical Programming in Statistics Emil Artin Geometric Algebra Norman T. J. Bailey The Elements of Stochastic Processes with Applications to the Natural Sciences Robert G. Bartle The Elements of Integration and Lebesgue Measure George E. P. Box & Norman R. Draper Evolutionary Operation: A Statistical Method for Process Improvement George E. P. Box & George C. Tiao Bayesian Inference in Statistical Analysis R. W. Carter Finite Groups of Lie Type: Conjugacy Classes and Complex Characters R. W. Carter Simple Groups of Lie Type William G. Cochran & Gertrude M. Cox Experimental Designs, Second Edition Richard Courant Differential and Integral Calculus, Volume I Richard Courant Differential and Integral Calculus, Volume II Richard Courant & D. Hilbert Methods of Mathematical Physics, Volume I Richard Courant & D. Hilbert Methods of Mathematical Physics, Volume II D. R. Cox Planning of Experiments Harold S. M. Coxeter Introduction to Geometry, Second Edition Charles W. Curtis & Irving Reiner Representation Theory of Finite Groups and Associative Algebras Charles W. Curtis & Irving Reiner Methods of Representation Theory with Applications to Finite Groups and Orders, Volume I Charles W. Curtis & Irving Reiner Methods of Representation Theory with Applications to Finite Groups and Orders, Volume II Cuthbert Daniel Fitting Equations to Data: Computer Analysis of Multifactor Data, Second Edition Bruno de Finetti Theory of Probability, Volume I Bruno de Finetti Theory of Probability, Volume 2 W. Edwards Deming Sample Design in Business Research
Forward and Backward Inference in Spatial Cognition
Penny, Will D.; Zeidman, Peter; Burgess, Neil
2013-01-01
This paper shows that the various computations underlying spatial cognition can be implemented using statistical inference in a single probabilistic model. Inference is implemented using a common set of ‘lower-level’ computations involving forward and backward inference over time. For example, to estimate where you are in a known environment, forward inference is used to optimally combine location estimates from path integration with those from sensory input. To decide which way to turn to reach a goal, forward inference is used to compute the likelihood of reaching that goal under each option. To work out which environment you are in, forward inference is used to compute the likelihood of sensory observations under the different hypotheses. For reaching sensory goals that require a chaining together of decisions, forward inference can be used to compute a state trajectory that will lead to that goal, and backward inference to refine the route and estimate control signals that produce the required trajectory. We propose that these computations are reflected in recent findings of pattern replay in the mammalian brain. Specifically, that theta sequences reflect decision making, theta flickering reflects model selection, and remote replay reflects route and motor planning. We also propose a mapping of the above computational processes onto lateral and medial entorhinal cortex and hippocampus. PMID:24348230
Causal inference based on counterfactuals
Höfler, M
2005-01-01
Background The counterfactual or potential outcome model has become increasingly standard for causal inference in epidemiological and medical studies. Discussion This paper provides an overview of the counterfactual and related approaches. A variety of conceptual as well as practical issues when estimating causal effects are reviewed. These include causal interactions, imperfect experiments, adjustment for confounding, time-varying exposures, competing risks and the probability of causation. It is argued that the counterfactual model of causal effects captures the main aspects of causality in health sciences and relates to many statistical procedures. Summary Counterfactuals are the basis of causal inference in medicine and epidemiology. Nevertheless, the estimation of counterfactual differences poses several difficulties, primarily in observational studies. These problems, however, reflect fundamental barriers only when learning from observations, and this does not invalidate the counterfactual concept. PMID:16159397
Network Plasticity as Bayesian Inference
Legenstein, Robert; Maass, Wolfgang
2015-01-01
General results from statistical learning theory suggest understanding not only brain computations, but also brain plasticity, as probabilistic inference. But a model for that has been missing. We propose that inherently stochastic features of synaptic plasticity and spine motility enable cortical networks of neurons to carry out probabilistic inference by sampling from a posterior distribution of network configurations. This model provides a viable alternative to existing models that propose convergence of parameters to maximum likelihood values. It explains how priors on weight distributions and connection probabilities can be merged optimally with learned experience, how cortical networks can generalize learned information so well to novel experiences, and how they can compensate continuously for unforeseen disturbances of the network. The resulting new theory of network plasticity explains from a functional perspective a number of experimental data on stochastic aspects of synaptic plasticity that previously appeared to be quite puzzling. PMID:26545099
Bayesian Inference on Proportional Elections
Brunello, Gabriel Hideki Vatanabe; Nakano, Eduardo Yoshio
2015-01-01
Polls for majoritarian voting systems usually show estimates of the percentage of votes for each candidate. However, proportional vote systems do not necessarily guarantee that the candidate with the highest percentage of votes will be elected. Thus, traditional methods used in majoritarian elections cannot be applied to proportional elections. In this context, the purpose of this paper was to perform a Bayesian inference on proportional elections considering the Brazilian system of seats distribution. More specifically, a methodology to estimate the probability that a given party will have representation in the chamber of deputies was developed. Inferences were made on a Bayesian scenario using the Monte Carlo simulation technique, and the developed methodology was applied to data from the Brazilian elections for Members of the Legislative Assembly and Federal Chamber of Deputies in 2010. A performance rate was also presented to evaluate the efficiency of the methodology. Calculations and simulations were carried out using the free R statistical software. PMID:25786259
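The general idea of estimating a party's probability of representation by Monte Carlo can be sketched as below. This is not the paper's method or code (the authors worked in R with the Brazilian seat-distribution rules): the sketch assumes a Dirichlet posterior over vote shares obtained from poll counts with a uniform prior, and it allocates seats with the plain D'Hondt highest-averages rule as a simplification. The function names, poll counts and seat number are hypothetical.

```python
import random

def dhondt(shares, seats):
    """Allocate seats by the D'Hondt highest-averages rule."""
    won = [0] * len(shares)
    for _ in range(seats):
        # The next seat goes to the party with the largest quotient share / (won + 1).
        i = max(range(len(shares)), key=lambda j: shares[j] / (won[j] + 1))
        won[i] += 1
    return won

def prob_representation(poll_counts, seats, draws=2000, seed=0):
    """Estimate, for each party, the posterior probability of winning >= 1 seat."""
    rng = random.Random(seed)
    hits = [0] * len(poll_counts)
    for _ in range(draws):
        # One Dirichlet(counts + 1) draw via normalised gamma variates.
        g = [rng.gammavariate(c + 1, 1.0) for c in poll_counts]
        total = sum(g)
        won = dhondt([x / total for x in g], seats)
        for j, w in enumerate(won):
            hits[j] += w > 0
    return [h / draws for h in hits]

# Hypothetical poll of 1000 respondents across four parties, 10 seats at stake.
probs = prob_representation([480, 300, 200, 20], seats=10)
```

Swapping `dhondt` for a function that encodes the actual electoral quotient rules would bring the sketch closer to the setting described in the abstract.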
FUNSTAT and statistical image representations
NASA Technical Reports Server (NTRS)
Parzen, E.
1983-01-01
General ideas of functional statistical inference analysis of one sample and two samples, univariate and bivariate, are outlined. The ONESAM program is applied to analyze the univariate probability distributions of multi-spectral image data.
Multiple Instance Fuzzy Inference
2015-12-02
INFERENCE A novel fuzzy learning framework that employs fuzzy inference to solve the problem of multiple instance learning (MIL) is presented. The...fuzzy learning framework that employs fuzzy inference to solve the problem of multiple instance learning (MIL) is presented. The framework introduces a...or learned from data. In multiple instance problems, the training data is ambiguously labeled. Instances are grouped into bags, labels of bags are
Inference of Internal Stress in a Cell Monolayer
Nier, Vincent; Jain, Shreyansh; Lim, Chwee Teck; Ishihara, Shuji; Ladoux, Benoit; Marcq, Philippe
2016-01-01
We combine traction force data with Bayesian inversion to obtain an absolute estimate of the internal stress field of a cell monolayer. The method, Bayesian inversion stress microscopy, is validated using numerical simulations performed in a wide range of conditions. It is robust to changes in each ingredient of the underlying statistical model. Importantly, its accuracy does not depend on the rheology of the tissue. We apply Bayesian inversion stress microscopy to experimental traction force data measured in a narrow ring of cohesive epithelial cells, and check that the inferred stress field coincides with that obtained by direct spatial integration of the traction force data in this quasi one-dimensional geometry. PMID:27074687
Confidence set inference with a prior quadratic bound
NASA Technical Reports Server (NTRS)
Backus, George E.
1988-01-01
In the uniqueness part of a geophysical inverse problem, the observer wants to predict all likely values of P unknown numerical properties z = (z_1, ..., z_P) of the earth from measurement of D other numerical properties y(0) = (y_1(0), ..., y_D(0)) and knowledge of the statistical distribution of the random errors in y(0). The data space Y containing y(0) is D-dimensional, so when the model space X is infinite-dimensional the linear uniqueness problem usually is insoluble without prior information about the correct earth model x. If that information is a quadratic bound on x (e.g., energy or dissipation rate), Bayesian inference (BI) and stochastic inversion (SI) inject spurious structure into x, implied by neither the data nor the quadratic bound. Confidence set inference (CSI) provides an alternative inversion technique free of this objection. CSI is illustrated in the problem of estimating the geomagnetic field B at the core-mantle boundary (CMB) from components of B measured on or above the earth's surface. Neither the heat flow nor the energy bound is strong enough to permit estimation of B(r) at single points on the CMB, but the heat flow bound permits estimation of uniform averages of B(r) over discs on the CMB, and both bounds permit weighted disc-averages with continuous weighting kernels. Both bounds also permit estimation of low-degree Gauss coefficients at the CMB. The heat flow bound resolves them up to degree 8 if the crustal field at satellite altitudes must be treated as a systematic error, but can resolve to degree 11 under the most favorable statistical treatment of the crust. These two limits produce circles of confusion on the CMB with diameters of 25 deg and 19 deg respectively.
NASA Astrophysics Data System (ADS)
Balachandran, Prasanna V.; Xue, Dezhen; Lookman, Turab
2016-04-01
One of the key impediments to the development of BaTiO3-based materials as candidates to replace toxic Pb-based solid solutions is their relatively low ferroelectric Curie temperature (TC). Among many potential routes that are available to modify TC, ionic substitutions at the Ba and Ti sites remain the most common approach. Here, we perform density functional theory (DFT) calculations on a series of ATiO3 and BaBO3 perovskites, where A = Ba, Ca, Sr, Pb, Cd, Sn, and Mg and B = Ti, Zr, Hf, and Sn. Our objective is to study the relative role of A and B cations in impacting the TC of the tetragonal (P4mm) and rhombohedral (R3m) ferroelectric phases in BaTiO3-based solid solutions, respectively. Using symmetry-mode analysis, we obtain a quantitative description of the relative contributions of various divalent (A) and tetravalent (B) cations to the ferroelectric distortions. Our results show that Ca, Pb, Cd, Sn, and Mg have large mode amplitudes for ferroelectric distortion in the tetragonal phase relative to Ba, whereas Sr suppresses the distortions. On the other hand, Zr, Hf, and Sn tetravalent cations severely suppress the ferroelectric distortion in the rhombohedral phase relative to Ti. In addition to symmetry modes, our calculated unit-cell volume also agrees with the experimental trends. We subsequently utilize the symmetry modes and unit-cell volumes as features within a machine learning approach to learn TC via an inference model and uncover trends that provide insights into the design of new high-TC BaTiO3-based ferroelectrics. The inference model predicts CdTiO3-BaTiO3 solid solutions to have a higher TC and, therefore, we experimentally synthesized these solid solutions and measured their TC. Although the calculated mode strength for CdTiO3 in the tetragonal phase is even larger than that for PbTiO3, the TC of CdTiO3-BaTiO3 solid solutions in the tetragonal phase does not show any appreciable enhancement. Thus, CdTiO3-BaTiO3 does not follow the
Bayesian inference of the initial conditions from large-scale structure surveys
NASA Astrophysics Data System (ADS)
Leclercq, Florent
2016-10-01
Analysis of three-dimensional cosmological surveys has the potential to answer outstanding questions on the initial conditions from which structure appeared, and therefore on the very high energy physics at play in the early Universe. We report on recently proposed statistical data analysis methods designed to study the primordial large-scale structure via physical inference of the initial conditions in a fully Bayesian framework, and applications to the Sloan Digital Sky Survey data release 7. We illustrate how this approach led to a detailed characterization of the dynamic cosmic web underlying the observed galaxy distribution, based on the tidal environment.
Petrov, S.
1996-10-01
Languages with a solvable implication problem but without complete and consistent systems of inference rules ("poor" languages) are considered. The problem of the existence of a finite complete and consistent inference rule system for a "poor" language is stated independently of the language or rule syntax. Several properties of the problem are proved. An application of the results to the language of join dependencies is given.
An introduction to causal inference.
Pearl, Judea
2010-02-26
This paper summarizes recent advances in causal inference and underscores the paradigmatic shifts that must be undertaken in moving from traditional statistical analysis to causal analysis of multivariate data. Special emphasis is placed on the assumptions that underlie all causal inferences, the languages used in formulating those assumptions, the conditional nature of all causal and counterfactual claims, and the methods that have been developed for the assessment of such claims. These advances are illustrated using a general theory of causation based on the Structural Causal Model (SCM) described in Pearl (2000a), which subsumes and unifies other approaches to causation, and provides a coherent mathematical foundation for the analysis of causes and counterfactuals. In particular, the paper surveys the development of mathematical tools for inferring (from a combination of data and assumptions) answers to three types of causal queries: those about (1) the effects of potential interventions, (2) probabilities of counterfactuals, and (3) direct and indirect effects (also known as "mediation"). Finally, the paper defines the formal and conceptual relationships between the structural and potential-outcome frameworks and presents tools for a symbiotic analysis that uses the strong features of both. The tools are demonstrated in the analyses of mediation, causes of effects, and probabilities of causation.
Lessons from Inferentialism for Statistics Education
ERIC Educational Resources Information Center
Bakker, Arthur; Derry, Jan
2011-01-01
This theoretical paper relates recent interest in informal statistical inference (ISI) to the semantic theory termed inferentialism, a significant development in contemporary philosophy, which places inference at the heart of human knowing. This theory assists epistemological reflection on challenges in statistics education encountered when…
Campbell's and Rubin's Perspectives on Causal Inference
ERIC Educational Resources Information Center
West, Stephen G.; Thoemmes, Felix
2010-01-01
Donald Campbell's approach to causal inference (D. T. Campbell, 1957; W. R. Shadish, T. D. Cook, & D. T. Campbell, 2002) is widely used in psychology and education, whereas Donald Rubin's causal model (P. W. Holland, 1986; D. B. Rubin, 1974, 2005) is widely used in economics, statistics, medicine, and public health. Campbell's approach focuses on…
Using alien coins to test whether simple inference is Bayesian.
Cassey, Peter; Hawkins, Guy E; Donkin, Chris; Brown, Scott D
2016-03-01
Reasoning and inference are well-studied aspects of basic cognition that have been explained as statistically optimal Bayesian inference. Using a simplified experimental design, we conducted quantitative comparisons between Bayesian inference and human inference at the level of individuals. In 3 experiments, with more than 13,000 participants, we asked people for prior and posterior inferences about the probability that 1 of 2 coins would generate certain outcomes. Most participants' inferences were inconsistent with Bayes' rule. Only in the simplest version of the task did the majority of participants adhere to Bayes' rule, but even in that case, there was a significant proportion that failed to do so. The current results highlight the importance of close quantitative comparisons between Bayesian inference and human data at the individual-subject level when evaluating models of cognition.
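The normative benchmark in a task of this kind can be sketched as follows. This is a minimal illustration of Bayes' rule for two coins, not the authors' analysis code; the function name `posterior_coin_a` and the head probabilities used in the usage note are hypothetical.

```python
from math import prod


def posterior_coin_a(p_heads_a, p_heads_b, outcomes, prior_a=0.5):
    """Posterior probability that coin A (rather than coin B) generated `outcomes`.

    `outcomes` is a string of 'H' and 'T'. The head probabilities and the
    prior are illustrative assumptions, not values taken from the study.
    """
    def likelihood(p_heads):
        # Independent flips: multiply the per-flip probabilities.
        return prod(p_heads if o == "H" else 1.0 - p_heads for o in outcomes)

    numerator = prior_a * likelihood(p_heads_a)
    evidence = numerator + (1.0 - prior_a) * likelihood(p_heads_b)
    return numerator / evidence
```

For example, with assumed head probabilities of 0.8 for coin A and 0.2 for coin B and equal priors, a single observed head moves the posterior for coin A from 0.5 to 0.8, while a sequence with one head and one tail leaves it at 0.5.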
NASA Technical Reports Server (NTRS)
Stone, Peter H.; Yao, Mao-Sung
1990-01-01
A number of perpetual January simulations are carried out with a two-dimensional zonally averaged model employing various parameterizations of the eddy fluxes of heat (potential temperature) and moisture. The parameterizations are evaluated by comparing these results with the eddy fluxes calculated in a parallel simulation using a three-dimensional general circulation model with zonally symmetric forcing. The three-dimensional model's performance in turn is evaluated by comparing its results using realistic (nonsymmetric) boundary conditions with observations. Branscome's parameterization of the meridional eddy flux of heat and Leovy's parameterization of the meridional eddy flux of moisture simulate the seasonal and latitudinal variations of these fluxes reasonably well, while somewhat underestimating their magnitudes. New parameterizations of the vertical eddy fluxes are developed that take into account the enhancement of the eddy mixing slope in a growing baroclinic wave due to condensation, and also the effect of eddy fluctuations in relative humidity. The new parameterizations, when tested in the two-dimensional model, simulate the seasonal, latitudinal, and vertical variations of the vertical eddy fluxes quite well, when compared with the three-dimensional model, and only underestimate the magnitude of the fluxes by 10 to 20 percent.
Topics in inference and decision-making with partial knowledge
NASA Technical Reports Server (NTRS)
Safavian, S. Rasoul; Landgrebe, David
1990-01-01
Two essential elements needed in the process of inference and decision-making are prior probabilities and likelihood functions. When both of these components are known accurately and precisely, the Bayesian approach provides a consistent and coherent solution to the problems of inference and decision-making. In many situations, however, either one or both of the above components may not be known, or at least may not be known precisely. This problem of partial knowledge about prior probabilities and likelihood functions is addressed. There are at least two ways to cope with this lack of precise knowledge: robust methods, and interval-valued methods. First, ways of modeling imprecision and indeterminacies in prior probabilities and likelihood functions are examined; then how imprecision in the above components carries over to the posterior probabilities is examined. Finally, the problem of decision making with imprecise posterior probabilities and the consequences of such actions are addressed. Application areas where the above problems may occur are in statistical pattern recognition problems, for example, the problem of classification of high-dimensional multispectral remote sensing image data.
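How imprecision in a prior carries over to the posterior can be illustrated with a small sketch. It assumes the simplest possible case, a two-class problem with precisely known likelihoods and a prior known only to lie in an interval; the function names are hypothetical and this is not the report's own method.

```python
def posterior(prior_1, like_1, like_2):
    """Posterior probability of class 1 for a two-class problem (Bayes' rule)."""
    numerator = prior_1 * like_1
    return numerator / (numerator + (1.0 - prior_1) * like_2)


def posterior_interval(prior_lo, prior_hi, like_1, like_2):
    """Bounds on the class-1 posterior when the prior lies in [prior_lo, prior_hi].

    With positive likelihoods the posterior is monotonically increasing in
    the prior, so the bounds are attained at the interval endpoints.
    """
    return (posterior(prior_lo, like_1, like_2),
            posterior(prior_hi, like_1, like_2))
```

With an assumed prior interval of [0.2, 0.4] and likelihoods 0.9 and 0.3, the posterior for class 1 lies between 3/7 and 2/3, so a decision threshold of 0.5 falls inside the interval and the decision is indeterminate.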
Modern Statistical Methods for Astronomy
NASA Astrophysics Data System (ADS)
Feigelson, Eric D.; Babu, G. Jogesh
2012-07-01
1. Introduction; 2. Probability; 3. Statistical inference; 4. Probability distribution functions; 5. Nonparametric statistics; 6. Density estimation or data smoothing; 7. Regression; 8. Multivariate analysis; 9. Clustering, classification and data mining; 10. Nondetections: censored and truncated data; 11. Time series analysis; 12. Spatial point processes; Appendices; Index.
Developing Young Students' Informal Inference Skills in Data Analysis
ERIC Educational Resources Information Center
Paparistodemou, Efi; Meletiou-Mavrotheris, Maria
2008-01-01
This paper focuses on developing students' informal inference skills, reporting on how a group of third grade students formulated and evaluated data-based inferences using the dynamic statistics data-visualization environment TinkerPlots[TM] (Konold & Miller, 2005), software specifically designed to meet the learning needs of students in the…
Foundational Principles for Large-Scale Inference: Illustrations Through Correlation Mining.
Hero, Alfred O; Rajaratnam, Bala
2016-01-01
When can reliable inference be drawn in the "Big Data" context? This paper presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large scale inference. In large scale data applications like genomics, connectomics, and eco-informatics the dataset is often variable-rich but sample-starved: a regime where the number n of acquired samples (statistical replicates) is far fewer than the number p of observed variables (genes, neurons, voxels, or chemical constituents). Much of recent work has focused on understanding the computational complexity of proposed methods for "Big Data". Sample complexity however has received relatively less attention, especially in the setting when the sample size n is fixed, and the dimension p grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into several categories: 1) the classical asymptotic regime where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime where both variable dimension and sample size go to infinity at comparable rates; 3) the purely high dimensional asymptotic regime where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche but only the latter regime applies to exascale data dimension. We illustrate this high dimensional framework for the problem of correlation mining, where it is the matrix of pairwise and partial correlations among the variables that are of interest. Correlation mining arises in numerous applications and subsumes the regression context as a special case. We demonstrate various regimes of correlation mining based on the unifying perspective of high dimensional learning rates and sample complexity for different structured covariance models and different inference tasks.
Foundational Principles for Large-Scale Inference: Illustrations Through Correlation Mining
Hero, Alfred O.; Rajaratnam, Bala
2015-01-01
When can reliable inference be drawn in the “Big Data” context? This paper presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large scale inference. In large scale data applications like genomics, connectomics, and eco-informatics the dataset is often variable-rich but sample-starved: a regime where the number n of acquired samples (statistical replicates) is far fewer than the number p of observed variables (genes, neurons, voxels, or chemical constituents). Much of recent work has focused on understanding the computational complexity of proposed methods for “Big Data”. Sample complexity however has received relatively less attention, especially in the setting when the sample size n is fixed, and the dimension p grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into several categories: 1) the classical asymptotic regime where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime where both variable dimension and sample size go to infinity at comparable rates; 3) the purely high dimensional asymptotic regime where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche but only the latter regime applies to exascale data dimension. We illustrate this high dimensional framework for the problem of correlation mining, where it is the matrix of pairwise and partial correlations among the variables that are of interest. Correlation mining arises in numerous applications and subsumes the regression context as a special case. We demonstrate various regimes of correlation mining based on the unifying perspective of high dimensional learning rates and sample complexity for different structured covariance models and different inference tasks. PMID:27087700
ERIC Educational Resources Information Center
Jacob, Bridgette L.
2013-01-01
The difficulties introductory statistics students have with formal statistical inference are well known in the field of statistics education. "Informal" statistical inference has been studied as a means to introduce inferential reasoning well before and without the formalities of formal statistical inference. This mixed methods study…
Albert, Carlo; Ulzega, Simone; Stoop, Ruedi
2016-04-01
Parameter inference is a fundamental problem in data-driven modeling. Given observed data that is believed to be a realization of some parameterized model, the aim is to find parameter values that are able to explain the observed data. In many situations, the dominant sources of uncertainty must be included into the model for making reliable predictions. This naturally leads to stochastic models. Stochastic models render parameter inference much harder, as the aim then is to find a distribution of likely parameter values. In Bayesian statistics, which is a consistent framework for data-driven learning, this so-called posterior distribution can be used to make probabilistic predictions. We propose a novel, exact, and very efficient approach for generating posterior parameter distributions for stochastic differential equation models calibrated to measured time series. The algorithm is inspired by reinterpreting the posterior distribution as a statistical mechanics partition function of an object akin to a polymer, where the measurements are mapped on heavier beads compared to those of the simulated data. To arrive at distribution samples, we employ a Hamiltonian Monte Carlo approach combined with a multiple time-scale integration. A separation of time scales naturally arises if either the number of measurement points or the number of simulation points becomes large. Furthermore, at least for one-dimensional problems, we can decouple the harmonic modes between measurement points and solve the fastest part of their dynamics analytically. Our approach is applicable to a wide range of inference problems and is highly parallelizable.
NASA Astrophysics Data System (ADS)
Albert, Carlo; Ulzega, Simone; Stoop, Ruedi
2016-04-01
Parameter inference is a fundamental problem in data-driven modeling. Given observed data that is believed to be a realization of some parameterized model, the aim is to find parameter values that are able to explain the observed data. In many situations, the dominant sources of uncertainty must be included into the model for making reliable predictions. This naturally leads to stochastic models. Stochastic models render parameter inference much harder, as the aim then is to find a distribution of likely parameter values. In Bayesian statistics, which is a consistent framework for data-driven learning, this so-called posterior distribution can be used to make probabilistic predictions. We propose a novel, exact, and very efficient approach for generating posterior parameter distributions for stochastic differential equation models calibrated to measured time series. The algorithm is inspired by reinterpreting the posterior distribution as a statistical mechanics partition function of an object akin to a polymer, where the measurements are mapped on heavier beads compared to those of the simulated data. To arrive at distribution samples, we employ a Hamiltonian Monte Carlo approach combined with a multiple time-scale integration. A separation of time scales naturally arises if either the number of measurement points or the number of simulation points becomes large. Furthermore, at least for one-dimensional problems, we can decouple the harmonic modes between measurement points and solve the fastest part of their dynamics analytically. Our approach is applicable to a wide range of inference problems and is highly parallelizable.
Gene-network inference by message passing
NASA Astrophysics Data System (ADS)
Braunstein, A.; Pagnani, A.; Weigt, M.; Zecchina, R.
2008-01-01
The inference of gene-regulatory processes from gene-expression data belongs to the major challenges of computational systems biology. Here we address the problem from a statistical-physics perspective and develop a message-passing algorithm which is able to infer sparse, directed and combinatorial regulatory mechanisms. Using the replica technique, the algorithmic performance can be characterized analytically for artificially generated data. The algorithm is applied to genome-wide expression data of baker's yeast under various environmental conditions. We find clear cases of combinatorial control, and enrichment in common functional annotations of regulated genes and their regulators.
Estimation and Inference of Diffusion Coefficients in Complex Biomolecular Environments.
Calderon, Christopher P
2011-02-08
The 1-D diffusion coefficient associated with a charged atom fluctuating in an ion-channel binding pocket is statistically analyzed. More specifically, unconstrained and constrained molecular dynamics simulations of potassium in gramicidin A are studied. Time domain transition density based inference methods are used to fit simple stochastic differential equations and also to carry out frequentist goodness of fit tests. Particular attention is paid to varying the time between adjacent time series observations due to the well-known "non-Markovian noise" that can appear in this system due to inertia and other unresolved coordinates influencing the dynamics. Different types of non-Markovian noise are shown by the goodness of fit tests to be statistically significant on vastly different time scales. On intermediate scales, a Markovian model is not rejected by the tests; models calibrated at these intermediate scales demonstrate a predictive capability for some physical quantities. However, in this intermediate regime, ergodic sampling does not occur over the length of a time series, but a local diffusion coefficient is deemed statistically acceptable for the observed raw data. It is demonstrated that a linear mixed effects model can be used to summarize the variation induced by slow unresolved degrees of freedom acting as a non-Markovian noise source. The utility of quantitative criteria for assessing low-dimensional stochastic models calibrated from time series generated by high-dimensional biomolecular systems is briefly discussed. Less coarse-grained data summaries of this type show promise for better understanding the kinetic signature of unresolved degrees of freedom in time series coming from simulations and single-molecule experiments.
Thacker, Michael A.; Moseley, G. Lorimer
2017-01-01
Perception is seen as a process that utilises partial and noisy information to construct a coherent understanding of the world. Here we argue that the experience of pain is no different; it is based on incomplete, multimodal information, which is used to estimate potential bodily threat. We outline a Bayesian inference model, incorporating the key components of cue combination, causal inference, and temporal integration, which highlights the statistical problems in everyday perception. It is from this platform that we are able to review the pain literature, providing evidence from experimental, acute, and persistent phenomena to demonstrate the advantages of adopting a statistical account in pain. Our probabilistic conceptualisation suggests a principles-based view of pain, explaining a broad range of experimental and clinical findings and making testable predictions. PMID:28081134
Yang, X.; Juhás, P.; Billinge, S. J. L.
2014-07-19
Optimal methods are explored for obtaining one-dimensional powder pattern intensities from two-dimensional planar detectors with good estimates of their standard deviations. Methods are described to estimate uncertainties when the same image is measured in multiple frames as well as from a single frame. The importance of considering the correlation of diffraction points during the integration and the resampling process of data analysis is shown. It is found that correlations between adjacent pixels in the image can lead to seriously overestimated uncertainties if such correlations are neglected in the integration process. Off-diagonal entries in the variance–covariance (VC) matrix are problematic as virtually all data processing and modeling programs cannot handle the full VC matrix. It is shown that the off-diagonal terms come mainly from the pixel-splitting algorithm used as the default integration algorithm in many popular two-dimensional integration programs, as well as from rebinning and resampling steps later in the processing. When the full VC matrix can be propagated during the data reduction, it is possible to get accurate refined parameters and their uncertainties at the cost of increasing computational complexity. However, as this is not normally possible, the best approximate methods for data processing to estimate uncertainties on refined parameters with the greatest accuracy from just the diagonal variance terms in the VC matrix are explored.
sick: The Spectroscopic Inference Crank
NASA Astrophysics Data System (ADS)
Casey, Andrew R.
2016-03-01
There exists an inordinate amount of spectral data in both public and private astronomical archives that remain severely under-utilized. The lack of reliable open-source tools for analyzing large volumes of spectra contributes to this situation, which is poised to worsen as large surveys successively release orders of magnitude more spectra. In this article I introduce sick, the spectroscopic inference crank, a flexible and fast Bayesian tool for inferring astrophysical parameters from spectra. sick is agnostic to the wavelength coverage, resolving power, or general data format, allowing any user to easily construct a generative model for their data, regardless of its source. sick can be used to provide a nearest-neighbor estimate of model parameters, a numerically optimized point estimate, or full Markov Chain Monte Carlo sampling of the posterior probability distributions. This generality empowers any astronomer to capitalize on the plethora of published synthetic and observed spectra, and make precise inferences for a host of astrophysical (and nuisance) quantities. Model intensities can be reliably approximated from existing grids of synthetic or observed spectra using linear multi-dimensional interpolation, or a Cannon-based model. Additional phenomena that transform the data (e.g., redshift, rotational broadening, continuum, spectral resolution) are incorporated as free parameters and can be marginalized away. Outlier pixels (e.g., cosmic rays or poorly modeled regimes) can be treated with a Gaussian mixture model, and a noise model is included to account for systematically underestimated variance. Combining these phenomena into a scalar-justified, quantitative model permits precise inferences with credible uncertainties on noisy data. I describe the common model features, the implementation details, and the default behavior, which is balanced to be suitable for most astronomical applications. Using a forward model on low-resolution, high signal
Hanson, K.M.; Cunningham, G.S.
1996-04-01
The authors are developing a computer application, called the Bayes Inference Engine, to provide the means to make inferences about models of physical reality within a Bayesian framework. The construction of complex nonlinear models is achieved by a fully object-oriented design. The models are represented by a data-flow diagram that may be manipulated by the analyst through a graphical programming environment. Maximum a posteriori solutions are achieved using a general, gradient-based optimization algorithm. The application incorporates a new technique of estimating and visualizing the uncertainties in specific aspects of the model.
Inference of Isoforms from Short Sequence Reads
NASA Astrophysics Data System (ADS)
Feng, Jianxing; Li, Wei; Jiang, Tao
Due to alternative splicing events in eukaryotic species, the identification of mRNA isoforms (or splicing variants) is a difficult problem. Traditional experimental methods for this purpose are time consuming and cost ineffective. The emerging RNA-Seq technology provides a possible effective method to address this problem. Although the advantages of RNA-Seq over traditional methods in transcriptome analysis have been confirmed by many studies, the inference of isoforms from millions of short sequence reads (e.g., Illumina/Solexa reads) has remained computationally challenging. In this work, we propose a method to calculate the expression levels of isoforms and infer isoforms from short RNA-Seq reads using exon-intron boundary, transcription start site (TSS) and poly-A site (PAS) information. We first formulate the relationship among exons, isoforms, and single-end reads as a convex quadratic program, and then use an efficient algorithm (called IsoInfer) to search for isoforms. IsoInfer can calculate the expression levels of isoforms accurately if all the isoforms are known and infer novel isoforms from scratch. Our experimental tests on known mouse isoforms with both simulated expression levels and reads demonstrate that IsoInfer is able to calculate the expression levels of isoforms with an accuracy comparable to the state-of-the-art statistical method and a 60 times faster speed. Moreover, our tests on both simulated and real reads show that it achieves a good precision and sensitivity in inferring isoforms when given accurate exon-intron boundary, TSS and PAS information, especially for isoforms whose expression levels are significantly high.
Deep Learning for Population Genetic Inference
Sheehan, Sara; Song, Yun S.
2016-01-01
Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme. PMID:27018908
Reliability of the Granger causality inference
NASA Astrophysics Data System (ADS)
Zhou, Douglas; Zhang, Yaoyu; Xiao, Yanyang; Cai, David
2014-04-01
How to characterize information flows in physical, biological, and social systems remains a major theoretical challenge. Granger causality (GC) analysis has been widely used to investigate information flow through causal interactions. We address one of the central questions in GC analysis, that is, the reliability of the GC evaluation and its implications for the causal structures extracted by this analysis. Our work reveals that the manner in which a continuous dynamical process is projected or coarse-grained to a discrete process has a profound impact on the reliability of the GC inference, and different sampling may potentially yield completely opposite inferences. This inference hazard is present for both linear and nonlinear processes. We emphasize that there is a hazard of reaching incorrect conclusions about network topologies, even including statistical (such as small-world or scale-free) properties of the networks, when GC analysis is blindly applied to infer the network topology. We demonstrate this using a small-world network for which a drastic loss of small-world attributes occurs in the reconstructed network using the standard GC approach. We further show how to resolve the paradox that the GC analysis seemingly becomes less reliable when more information is incorporated using finer and finer sampling. Finally, we present strategies to overcome these inference artifacts in order to obtain a reliable GC result.
Deep Learning for Population Genetic Inference.
Sheehan, Sara; Song, Yun S
2016-03-01
Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme.
Spectral likelihood expansions for Bayesian inference
NASA Astrophysics Data System (ADS)
Nagel, Joseph B.; Sudret, Bruno
2016-03-01
A spectral approach to Bayesian inference is presented. It pursues the emulation of the posterior probability density. The starting point is a series expansion of the likelihood function in terms of orthogonal polynomials. From this spectral likelihood expansion all statistical quantities of interest can be calculated semi-analytically. The posterior is formally represented as the product of a reference density and a linear combination of polynomial basis functions. Both the model evidence and the posterior moments are related to the expansion coefficients. This formulation avoids Markov chain Monte Carlo simulation and allows one to make use of linear least squares instead. The pros and cons of spectral Bayesian inference are discussed and demonstrated on the basis of simple applications from classical statistics and inverse modeling.
NASA Astrophysics Data System (ADS)
Stone, Peter H.; Yao, Mao-Sung
1990-07-01
A number of perpetual January simulations are carried out with a two-dimensional (2-D) zonally averaged model employing various parameterizations of the eddy fluxes of heat (potential temperature) and moisture. The parameterizations are evaluated by comparing these results with the eddy fluxes calculated in a parallel simulation using a three-dimensional (3-D) general circulation model with zonally symmetric forcing. The 3-D model's performance in turn is evaluated by comparing its results using realistic (nonsymmetric) boundary conditions with observations. Branscome's parameterization of the meridional eddy flux of heat and Leovy's parameterization of the meridional eddy flux of moisture simulate the seasonal and latitudinal variations of these fluxes reasonably well, while somewhat underestimating their magnitudes. In particular, Branscome's parameterization underestimates the vertically integrated flux of heat by about 30%, mainly because it misses the secondary peak in this flux near the tropopause; and Leovy's parameterization of the meridional eddy flux of moisture underestimates the magnitude of this flux by about 20%. The analogous parameterizations of the vertical eddy fluxes of heat and moisture are found to perform much more poorly, i.e., they give fluxes only one quarter to one half as strong as those calculated in the 3-D model. New parameterizations of the vertical eddy fluxes are developed that take into account the enhancement of the eddy mixing slope in a growing baroclinic wave due to condensation, and also the effect of eddy fluctuations in relative humidity. The new parameterizations, when tested in the 2-D model, simulate the seasonal, latitudinal, and vertical variations of the vertical eddy fluxes quite well, when compared with the 3-D model, and only underestimate the magnitude of the fluxes by 10% to 20%.
Zhu, H.; Braun, W.
1999-01-01
A statistical analysis of a representative data set of 169 known protein structures was used to analyze the specificity of residue interactions between spatial neighboring strands in beta-sheets. Pairwise potentials were derived from the frequency of residue pairs in nearest contact, second nearest and third nearest contacts across neighboring beta-strands compared to the expected frequency of residue pairs in a random model. A pseudo-energy function based on these statistical pairwise potentials recognized native beta-sheets among possible alternative pairings. The native pairing was found within the three lowest energies in 73% of the cases in the training data set and in 63% of beta-sheets in a test data set of 67 proteins, which were not part of the training set. The energy function was also used to detect tripeptides, which occur frequently in beta-sheets of native proteins. The majority of native partners of tripeptides were distributed in a low energy range. Self-correcting distance geometry (SECODG) calculations using distance constraints sets derived from possible low energy pairing of beta-strands uniquely identified the native pairing of the beta-sheet in pancreatic trypsin inhibitor (BPTI). These results will be useful for predicting the structure of proteins from their amino acid sequence as well as for the design of proteins containing beta-sheets. PMID:10048326
NASA Astrophysics Data System (ADS)
Graham, D. B.; Cairns, Iver H.; Skjaeraasen, O.; Robinson, P. A.
2012-02-01
The temperature ratio Ti/Te of ions to electrons affects both the ion-damping rate and the ion-acoustic speed in plasmas. The effects of changing the ion-damping rate and ion-acoustic speed are investigated for electrostatic strong turbulence and electromagnetic strong turbulence in three dimensions. When ion damping is strong, density wells relax in place and act as nucleation sites for the formation of new wave packets. In this case, the density perturbations are primarily density wells supported by the ponderomotive force. For weak ion damping, corresponding to low Ti/Te, ion-acoustic waves are launched radially outwards when wave packets dissipate at burnout, thereby increasing the level of density perturbations in the system and thus raising the level of scattering of Langmuir waves off density perturbations. Density wells no longer relax in place, so renucleation at recent collapse sites no longer occurs; instead, wave packets form in background low density regions, such as superpositions of troughs of propagating ion-acoustic waves. This transition is found to occur at Ti/Te ≈ 0.1. The change in behavior with Ti/Te is shown to change the bulk statistical properties, scaling behavior, spectra, and field statistics of strong turbulence. For Ti/Te ≳ 0.1, the electrostatic results approach the predictions of the two-component model of Robinson and Newman, and good agreement is found for Ti/Te ≳ 0.15.
Data free inference with processed data products
Chowdhary, K.; Najm, H. N.
2014-07-12
Here, we consider the context of probabilistic inference of model parameters given error bars or confidence intervals on model output values, when the data is unavailable. We introduce a class of algorithms in a Bayesian framework, relying on maximum entropy arguments and approximate Bayesian computation methods, to generate consistent data with the given summary statistics. Once we obtain consistent data sets, we pool the respective posteriors, to arrive at a single, averaged density on the parameters. This approach allows us to perform accurate forward uncertainty propagation consistent with the reported statistics.
Generalizability Theory and Experimental Design: Incongruity between Analysis and Inference.
ERIC Educational Resources Information Center
Hopkins, Kenneth D.
1984-01-01
In behavior research using cognitive and affective measures, there is often incongruity between the statistical analysis employed and the intended inference. This paper argues that incorporating items as levels of a random facet via generalizability theory allows the statistical examination of the inferential question in the desired universe of…
1988-06-27
Optical Artificial Intelligence; Optical inference engines; Optical logic; Optical information processing...common. They arise in areas such as expert systems and other artificial intelligence systems. In recent years, the computer science language PROLOG has...cal processors should in principle be well suited for artificial intelligence applications. In recent years, symbolic logic processing. , the
Zhang, Wanfeng; Zhu, Shukui; He, Sheng; Wang, Yanxin
2015-02-06
Using comprehensive two-dimensional gas chromatography coupled to time-of-flight mass spectrometry (GC×GC/TOFMS), volatile and semi-volatile organic compounds in crude oil samples from different reservoirs or regions were analyzed for the development of a molecular fingerprint database. Based on the GC×GC/TOFMS fingerprints of crude oils, principal component analysis (PCA) and cluster analysis were used to distinguish the oil sources and find biomarkers. As a supervised technique, the geological characteristics of crude oils, including thermal maturity, sedimentary environment etc., are assigned to the principal components. The results show that tri-aromatic steroid (TAS) series are the suitable marker compounds in crude oils for the oil screening, and the relative abundances of individual TAS compounds have excellent correlation with oil sources. In order to correct the effects of some other external factors except oil sources, the variables were defined as the content ratio of some target compounds and 13 parameters were proposed for the screening of oil sources. With the developed model, the crude oils were easily discriminated, and the result is in good agreement with the practical geological setting.
Estimating uncertainty of inference for validation
Booker, Jane M; Langenbrunner, James R; Hemez, Francois M; Ross, Timothy J
2010-09-30
first in a series of inference uncertainty estimations. While the methods demonstrated are primarily statistical, these do not preclude the use of nonprobabilistic methods for uncertainty characterization. The methods presented permit accurate determinations for validation and eventual prediction. It is a goal that these methods establish a standard against which best practice may evolve for determining degree of validation.
Active inference and learning.
Friston, Karl; FitzGerald, Thomas; Rigoli, Francesco; Schwartenbeck, Philipp; O'Doherty, John; Pezzulo, Giovanni
2016-09-01
This paper offers an active inference account of choice behaviour and learning. It focuses on the distinction between goal-directed and habitual behaviour and how they contextualise each other. We show that habits emerge naturally (and autodidactically) from sequential policy optimisation when agents are equipped with state-action policies. In active inference, behaviour has explorative (epistemic) and exploitative (pragmatic) aspects that are sensitive to ambiguity and risk respectively, where epistemic (ambiguity-resolving) behaviour enables pragmatic (reward-seeking) behaviour and the subsequent emergence of habits. Although goal-directed and habitual policies are usually associated with model-based and model-free schemes, we find the more important distinction is between belief-free and belief-based schemes. The underlying (variational) belief updating provides a comprehensive (if metaphorical) process theory for several phenomena, including the transfer of dopamine responses, reversal learning, habit formation and devaluation. Finally, we show that active inference reduces to a classical (Bellman) scheme, in the absence of ambiguity.
Multimodel inference and adaptive management
Rehme, S.E.; Powell, L.A.; Allen, C.R.
2011-01-01
Ecology is an inherently complex science coping with correlated variables, nonlinear interactions and multiple scales of pattern and process, making it difficult for experiments to result in clear, strong inference. Natural resource managers, policy makers, and stakeholders rely on science to provide timely and accurate management recommendations. However, the time necessary to untangle the complexities of interactions within ecosystems is often far greater than the time available to make management decisions. One method of coping with this problem is multimodel inference. Multimodel inference assesses uncertainty by calculating likelihoods among multiple competing hypotheses, but multimodel inference results are often equivocal. Despite this, there may be pressure for ecologists to provide management recommendations regardless of the strength of their study’s inference. We reviewed papers in the Journal of Wildlife Management (JWM) and the journal Conservation Biology (CB) to quantify the prevalence of multimodel inference approaches, the resulting inference (weak versus strong), and how authors dealt with the uncertainty. Thirty-eight percent and 14%, respectively, of articles in the JWM and CB used multimodel inference approaches. Strong inference was rarely observed, with only 7% of JWM and 20% of CB articles resulting in strong inference. We found the majority of weak inference papers in both journals (59%) gave specific management recommendations. Model selection uncertainty was ignored in most recommendations for management. We suggest that adaptive management is an ideal method to resolve uncertainty when research results in weak inference.
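As a small illustration of how model selection uncertainty can be quantified in multimodel inference, the sketch below computes Akaike weights from a set of AIC values. The abstract does not say which criterion the reviewed papers used, so the choice of AIC is an assumption, and `akaike_weights` is a hypothetical helper name.

```python
import numpy as np


def akaike_weights(aic):
    """Relative support for each candidate model, given its AIC value.

    Assumption: all models were fitted to the same data, so their AIC
    values are comparable.
    """
    aic = np.asarray(aic, dtype=float)
    delta = aic - aic.min()        # AIC difference from the best model
    rel = np.exp(-0.5 * delta)     # relative likelihood of each model
    return rel / rel.sum()         # normalise so the weights sum to 1
```

When no single weight dominates, the data do not clearly favour one hypothesis, which is one informal reading of the "weak inference" the abstract describes.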
Student Performance in Curricula Centered on Simulation-Based Inference: A Preliminary Report
ERIC Educational Resources Information Center
Chance, Beth; Wong, Jimmy; Tintle, Nathan
2016-01-01
"Simulation-based inference" (e.g., bootstrapping and randomization tests) has been advocated recently with the goal of improving student understanding of statistical inference, as well as the statistical investigative process as a whole. Preliminary assessment data have been largely positive. This article describes the analysis of the…
The NIFTY way of Bayesian signal inference
Selig, Marco
2014-12-05
We introduce NIFTY, 'Numerical Information Field Theory', a software package for the development of Bayesian signal inference algorithms that operate independently from any underlying spatial grid and its resolution. A large number of Bayesian and Maximum Entropy methods for 1D signal reconstruction, 2D imaging, as well as 3D tomography, appear formally similar, but one often finds individualized implementations that are neither flexible nor easily transferable. Signal inference in the framework of NIFTY can be done in an abstract way, such that algorithms, prototyped in 1D, can be applied to real world problems in higher-dimensional settings. NIFTY as a versatile library is applicable and already has been applied in 1D, 2D, 3D and spherical settings. A recent application is the D³PO algorithm targeting the non-trivial task of denoising, deconvolving, and decomposing photon observations in high energy astronomy.
Human Inferences about Sequences: A Minimal Transition Probability Model
2016-01-01
The brain constantly infers the causes of the inputs it receives and uses these inferences to generate statistical expectations about future observations. Experimental evidence for these expectations and their violations include explicit reports, sequential effects on reaction times, and mismatch or surprise signals recorded in electrophysiology and functional MRI. Here, we explore the hypothesis that the brain acts as a near-optimal inference device that constantly attempts to infer the time-varying matrix of transition probabilities between the stimuli it receives, even when those stimuli are in fact fully unpredictable. This parsimonious Bayesian model, with a single free parameter, accounts for a broad range of findings on surprise signals, sequential effects and the perception of randomness. Notably, it explains the pervasive asymmetry between repetitions and alternations encountered in those studies. Our analysis suggests that a neural machinery for inferring transition probabilities lies at the core of human sequence knowledge. PMID:28030543
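A minimal sketch of an observer that tracks transition probabilities in a binary sequence is given below. It is not the authors' model: it assumes exponentially decaying transition counts with a uniform Beta(1, 1) prior, and the `decay` factor is an assumed stand-in for the single free parameter mentioned in the abstract. The names `predict_next` and `surprise` are hypothetical.

```python
import numpy as np


def predict_next(sequence, decay=0.9):
    """P(next item == 1) before each item after the first, for a 0/1 sequence."""
    counts = np.zeros((2, 2))              # counts[previous, next], leaky
    preds = []
    for prev, nxt in zip(sequence[:-1], sequence[1:]):
        row = counts[prev]
        # Posterior mean under a Beta(1, 1) prior on P(next == 1 | prev).
        preds.append((row[1] + 1.0) / (row.sum() + 2.0))
        counts *= decay                    # forget old observations
        counts[prev, nxt] += 1.0           # then record the new transition
    return np.array(preds)


def surprise(sequence, decay=0.9):
    """Shannon surprise (bits) of each observed item after the first."""
    p1 = predict_next(sequence, decay)
    outcomes = np.asarray(sequence[1:])
    p_obs = np.where(outcomes == 1, p1, 1.0 - p1)
    return -np.log2(p_obs)
```

Because repetitions and alternations are stored in separate cells of the count matrix, an observer of this kind can treat them differently even when the sequence is in fact unpredictable.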
Quantum-Like Representation of Non-Bayesian Inference
NASA Astrophysics Data System (ADS)
Asano, M.; Basieva, I.; Khrennikov, A.; Ohya, M.; Tanaka, Y.
2013-01-01
This research is related to the problem of "irrational decision making or inference" that has been discussed in cognitive psychology. There are some experimental studies whose statistical data cannot be described by classical probability theory. The process of decision making generating these data cannot be reduced to classical Bayesian inference. For this problem, a number of quantum-like cognitive models of decision making have been proposed. Our previous work represented classical Bayesian inference in a natural way in the framework of quantum mechanics. Using this representation, in this paper we discuss non-Bayesian (irrational) inference that is biased by effects like quantum interference. Further, we describe the "psychological factor" disturbing "rationality" as an "environment" correlating with the "main system" of usual Bayesian inference.
Statistical Inference-Based Cache Management for Mobile Learning
ERIC Educational Resources Information Center
Li, Qing; Zhao, Jianmin; Zhu, Xinzhong
2009-01-01
Supporting efficient data access in the mobile learning environment has become a hot research problem in recent years, and the problem becomes tougher when the clients are using light-weight mobile devices such as cell phones whose limited storage space prevents the clients from holding a large cache. A practical solution is to store the cache…
Drawing statistical inferences from historical census data, 1850-1950.
Davern, Michael; Ruggles, Steven; Swenson, Tami; Alexander, J Trent; Oakes, J Michael
2009-08-01
Virtually all quantitative microdata used by social scientists derive from samples that incorporate clustering, stratification, and weighting adjustments (Kish 1965, 1992). Such data can yield standard error estimates that differ dramatically from those derived from a simple random sample of the same size. Researchers using historical U.S. census microdata, however, usually apply methods designed for simple random samples. The resulting p values and confidence intervals could be inaccurate and could lead to erroneous research conclusions. Because U.S. census microdata samples are among the most widely used sources for social science and policy research, the need for reliable standard error estimation is critical. We evaluate the historical microdata samples of the Integrated Public Use Microdata Series (IPUMS) project from 1850 to 1950 in order to determine (1) the impact of sample design on standard error estimates, and (2) how to apply modern standard error estimation software to historical census samples. We exploit a unique new data source from the 1880 census to validate our methods for standard error estimation, and then we apply this approach to the 1850-1870 and 1900-1950 decennial censuses. We conclude that Taylor series estimation can be used effectively with the historical decennial census microdata samples and should be applied in research analyses that have the potential for substantial clustering effects.
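To make the contrast between simple-random-sample and design-based standard errors concrete, here is a minimal sketch of Taylor series (linearization) variance estimation for a weighted mean. It assumes a single stratum with clusters treated as sampled with replacement, which is a simplification of real census sample designs, and `linearized_se` is a hypothetical helper, not part of any IPUMS software.

```python
import numpy as np


def linearized_se(y, w, cluster):
    """Weighted mean and its Taylor-linearized standard error.

    Assumptions: one stratum, clusters sampled with replacement.
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    mean = np.sum(w * y) / np.sum(w)
    # Linearized variable for the ratio estimator sum(w*y) / sum(w).
    z = w * (y - mean) / np.sum(w)
    # Sum the linearized values within each cluster.
    _, idx = np.unique(np.asarray(cluster), return_inverse=True)
    totals = np.bincount(idx, weights=z)
    k = len(totals)
    var = k / (k - 1) * np.sum((totals - totals.mean()) ** 2)
    return mean, np.sqrt(var)
```

With equal weights and every observation in its own cluster, the result reduces to the familiar s/sqrt(n); with genuinely clustered data it is typically larger.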
Statistical Inference for a Ratio of Dispersions Using Paired Samples.
ERIC Educational Resources Information Center
Bonett, Douglas G.; Seier, Edith
2003-01-01
Derived a confidence interval for a ratio of correlated mean absolute deviations. Simulation results show that it performs well in small sample sizes across realistically nonnormal distributions and that it is almost as powerful as the most powerful test examined by R. Wilcox (1990). (SLD)
Actively Learning Specific Function Properties with Applications to Statistical Inference
2007-12-01
which are distant from their nearest neighbors. However, when searching for level-sets, we are less interested in the function away from the level...34 excludes openly gay, lesbian and bisexual students from receiving ROTC scholarships or serving in the military. Nevertheless, all ROTC classes at
Testing Manifest Monotonicity Using Order-Constrained Statistical Inference
ERIC Educational Resources Information Center
Tijmstra, Jesper; Hessen, David J.; van der Heijden, Peter G. M.; Sijtsma, Klaas
2013-01-01
Most dichotomous item response models share the assumption of latent monotonicity, which states that the probability of a positive response to an item is a nondecreasing function of a latent variable intended to be measured. Latent monotonicity cannot be evaluated directly, but it implies manifest monotonicity across a variety of observed scores,…
Outcome- and auxiliary-dependent subsampling and its statistical inference.
Wang, Xiaofei; Wu, Yougui; Zhou, Haibo
2009-11-01
The performance of a biomarker predicting clinical outcome is often evaluated in a large prospective study. Due to high costs associated with bioassay, investigators need to select a subset from all available patients for biomarker assessment. We consider an outcome- and auxiliary-dependent subsampling (OADS) scheme, in which the probability of selecting a patient into the subset depends on the patient's clinical outcome and an auxiliary variable. We propose a semiparametric empirical likelihood method to estimate the association between biomarker and clinical outcome. Asymptotic properties of the estimator are given. A simulation study shows that the proposed method outperforms alternative methods.
Statistical Inference for Detecting Structures and Anomalies in Networks
2015-08-27
community structure in dynamic networks, along with the discovery of a detectability phase transition as a function of the rate of change and the...local information, about the known nodes and their neighbors. But when this fraction crosses a critical threshold, our knowledge becomes global
Statistical Inference and Spatial Patterns in Correlates of IQ
ERIC Educational Resources Information Center
Hassall, Christopher; Sherratt, Thomas N.
2011-01-01
Cross-national comparisons of IQ have become common since the release of a large dataset of international IQ scores. However, these studies have consistently failed to consider the potential lack of independence of these scores based on spatial proximity. To demonstrate the importance of this omission, we present a re-evaluation of several…
Beyond statistical inference: A decision theory for science
KILLEEN, PETER R.
2008-01-01
Traditional null hypothesis significance testing does not yield the probability of the null or its alternative and, therefore, cannot logically ground scientific decisions. The decision theory proposed here calculates the expected utility of an effect on the basis of (1) the probability of replicating it and (2) a utility function on its size. It takes significance tests, which place all value on the replicability of an effect and none on its magnitude, as a special case, one in which the cost of a false positive is revealed to be an order of magnitude greater than the value of a true positive. More realistic utility functions credit both replicability and effect size, integrating them for a single index of merit. The analysis incorporates opportunity cost and is consistent with alternate measures of effect size, such as r² and information transmission, and with Bayesian model selection criteria. An alternate formulation is functionally equivalent to the formal theory, transparent, and easy to compute. PMID:17201351
Statistical Inferences from the Topology of Complex Networks
2016-10-04
of Mathematics at Cleveland State University. The remaining balance on the grant was sub-awarded to Dr. Bubenik at the University of Florida, where...project was to help place Topological Data Analysis on a firmer mathematical foundation, strengthening its connections to mathematics and making it...Topology Research Network, a network with close to 300 members that is funded by the NSF through the Institute of Mathematics and its Applications
Permutation inference for the general linear model
Winkler, Anderson M.; Ridgway, Gerard R.; Webster, Matthew A.; Smith, Stephen M.; Nichols, Thomas E.
2014-01-01
Permutation methods can provide exact control of false positives and allow the use of non-standard statistics, making only weak assumptions about the data. With the availability of fast and inexpensive computing, their main limitation would be some lack of flexibility to work with arbitrary experimental designs. In this paper we report results on approximate permutation methods that are more flexible with respect to the experimental design and nuisance variables, and conduct detailed simulations to identify the best method for settings that are typical for imaging research scenarios. We present a generic framework for permutation inference for complex general linear models (GLMs) when the errors are exchangeable and/or have a symmetric distribution, and show that, even in the presence of nuisance effects, these permutation inferences are powerful while providing excellent control of false positives in a wide range of common and relevant imaging research scenarios. We also demonstrate how the inference on GLM parameters, originally intended for independent data, can be used in certain special but useful cases in which independence is violated. Detailed examples of common neuroimaging applications are provided, as well as a complete algorithm – the “randomise” algorithm – for permutation inference with the GLM. PMID:24530839
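One well-known way to run a permutation test on a single regressor in the presence of nuisance variables is the Freedman-Lane procedure, sketched below. The abstract does not state which approximate method the authors recommend, so this is an illustration of the general idea rather than the "randomise" algorithm itself. The function names and the choice of a t statistic are assumptions.

```python
import numpy as np


def t_stat(y, x, Z):
    """t statistic for the coefficient of x in the model y ~ [x, Z]."""
    M = np.column_stack([x, Z])
    beta, *_ = np.linalg.lstsq(M, y, rcond=None)
    resid = y - M @ beta
    dof = len(y) - np.linalg.matrix_rank(M)
    cov = (resid @ resid / dof) * np.linalg.pinv(M.T @ M)
    return beta[0] / np.sqrt(cov[0, 0])


def freedman_lane(y, x, Z, n_perm=999, seed=0):
    """Two-sided permutation p-value for x, with nuisance regressors Z.

    Z should include an intercept column. Errors are assumed exchangeable.
    """
    rng = np.random.default_rng(seed)
    # Fit the reduced (nuisance-only) model and keep its residuals.
    gamma, *_ = np.linalg.lstsq(Z, y, rcond=None)
    fitted = Z @ gamma
    resid = y - fitted
    observed = t_stat(y, x, Z)
    exceed = 1                              # the unpermuted data count once
    for _ in range(n_perm):
        # Permute reduced-model residuals, add the nuisance fit back, refit.
        y_star = fitted + rng.permutation(resid)
        if abs(t_stat(y_star, x, Z)) >= abs(observed):
            exceed += 1
    return observed, exceed / (n_perm + 1)
```

Permuting the reduced-model residuals, rather than the raw data, is what lets the nuisance effects stay in place while the effect of interest is broken under the null.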
Evolutionary inference via the Poisson Indel Process.
Bouchard-Côté, Alexandre; Jordan, Michael I
2013-01-22
We address the problem of the joint statistical inference of phylogenetic trees and multiple sequence alignments from unaligned molecular sequences. This problem is generally formulated in terms of string-valued evolutionary processes along the branches of a phylogenetic tree. The classic evolutionary process, the TKF91 model [Thorne JL, Kishino H, Felsenstein J (1991) J Mol Evol 33(2):114-124] is a continuous-time Markov chain model composed of insertion, deletion, and substitution events. Unfortunately, this model gives rise to an intractable computational problem: The computation of the marginal likelihood under the TKF91 model is exponential in the number of taxa. In this work, we present a stochastic process, the Poisson Indel Process (PIP), in which the complexity of this computation is reduced to linear. The Poisson Indel Process is closely related to the TKF91 model, differing only in its treatment of insertions, but it has a global characterization as a Poisson process on the phylogeny. Standard results for Poisson processes allow key computations to be decoupled, which yields the favorable computational profile of inference under the PIP model. We present illustrative experiments in which Bayesian inference under the PIP model is compared with separate inference of phylogenies and alignments.
HIERARCHICAL PROBABILISTIC INFERENCE OF COSMIC SHEAR
Schneider, Michael D.; Dawson, William A.; Hogg, David W.; Marshall, Philip J.; Bard, Deborah J.; Meyers, Joshua; Lang, Dustin
2015-07-01
Point estimators for the shearing of galaxy images induced by gravitational lensing involve a complex inverse problem in the presence of noise, pixelization, and model uncertainties. We present a probabilistic forward modeling approach to gravitational lensing inference that has the potential to mitigate the biased inferences in most common point estimators and is practical for upcoming lensing surveys. The first part of our statistical framework requires specification of a likelihood function for the pixel data in an imaging survey given parameterized models for the galaxies in the images. We derive the lensing shear posterior by marginalizing over all intrinsic galaxy properties that contribute to the pixel data (i.e., not limited to galaxy ellipticities) and learn the distributions for the intrinsic galaxy properties via hierarchical inference with a suitably flexible conditional probability distribution specification. We use importance sampling to separate the modeling of small imaging areas from the global shear inference, thereby rendering our algorithm computationally tractable for large surveys. With simple numerical examples we demonstrate the improvements in accuracy from our importance sampling approach, as well as the significance of the conditional distribution specification for the intrinsic galaxy properties when the data are generated from an unknown number of distinct galaxy populations with different morphological characteristics.
Inference in Adaptive Regression via the Kac-Rice Formula
2014-05-15
Inference in Adaptive Regression via the Kac-Rice Formula Jonathan Taylor∗, Joshua Loftus, Ryan J. Tibshirani Department of Statistics Stanford...general adaptive regression setting. Our approach uses the Kac-Rice formula (as described in Adler & Taylor 2007) applied to the problem of maximizing a...
NASA Technical Reports Server (NTRS)
Wheeler, Kevin; Timucin, Dogan; Rabbette, Maura; Curry, Charles; Allan, Mark; Lvov, Nikolay; Clanton, Sam; Pilewskie, Peter
2002-01-01
The goal of visual inference programming is to develop a software framework for data analysis and to provide machine learning algorithms for interactive data exploration and visualization. The topics include: 1) Intelligent Data Understanding (IDU) framework; 2) Challenge problems; 3) What's new here; 4) Framework features; 5) Wiring diagram; 6) Generated script; 7) Results of script; 8) Initial algorithms; 9) Independent Component Analysis for instrument diagnosis; 10) Output sensory mapping virtual joystick; 11) Output sensory mapping typing; 12) Closed-loop feedback mu-rhythm control; 13) Closed-loop training; 14) Data sources; and 15) Algorithms. This paper is in viewgraph form.
Fast inference of ill-posed problems within a convex space
NASA Astrophysics Data System (ADS)
Fernandez-de-Cossio-Diaz, J.; Mulet, R.
2016-07-01
In multiple scientific and technological applications we face the problem of having low dimensional data to be justified by a linear model defined in a high dimensional parameter space. The difference in dimensionality makes the problem ill-defined: the model is consistent with the data for many values of its parameters. The objective is to find the probability distribution of parameter values consistent with the data, a problem that can be cast as the exploration of a high dimensional convex polytope. In this work we introduce a novel algorithm to solve this problem efficiently. It provides results that are statistically indistinguishable from currently used numerical techniques while its running time scales linearly with the system size. We show that the algorithm performs robustly in many abstract and practical applications. As working examples we simulate the effects of restricting reaction fluxes on the space of feasible phenotypes of a genome scale Escherichia coli metabolic network and infer the traffic flow between origin and destination nodes in a real communication network.
Shi, Runhua; McLarty, Jerry W
2009-10-01
In this article, we introduced basic concepts of statistics, type of distributions, and descriptive statistics. A few examples were also provided. The basic concepts presented herein are only a fraction of the concepts related to descriptive statistics. Also, there are many commonly used distributions not presented herein, such as Poisson distributions for rare events and exponential distributions, F distributions, and logistic distributions. More information can be found in many statistics books and publications.
ERIC Educational Resources Information Center
Callamaras, Peter
1983-01-01
This buyer's guide to seven major types of statistics software packages for microcomputers reviews Edu-Ware Statistics 3.0; Financial Planning; Speed Stat; Statistics with DAISY; Human Systems Dynamics package of Stats Plus, ANOVA II, and REGRESS II; Maxistat; and Moore-Barnes' MBC Test Construction and MBC Correlation. (MBR)
ERIC Educational Resources Information Center
Petocz, Peter; Sowey, Eric
2008-01-01
As a branch of knowledge, Statistics is ubiquitous and its applications can be found in (almost) every field of human endeavour. In this article, the authors track down the possible source of the link between the "Siren song" and applications of Statistics. Answers to their previous five questions and five new questions on Statistics are presented.
Circular inferences in schizophrenia.
Jardri, Renaud; Denève, Sophie
2013-11-01
A considerable number of recent experimental and computational studies suggest that subtle impairments of excitatory to inhibitory balance or regulation are involved in many neurological and psychiatric conditions. The current paper aims to relate, specifically and quantitatively, excitatory to inhibitory imbalance with psychotic symptoms in schizophrenia. Considering that the brain constructs hierarchical causal models of the external world, we show that the failure to maintain the excitatory to inhibitory balance results in hallucinations as well as in the formation and subsequent consolidation of delusional beliefs. Indeed, the consequence of excitatory to inhibitory imbalance in a hierarchical neural network is equated to a pathological form of causal inference called 'circular belief propagation'. In circular belief propagation, bottom-up sensory information and top-down predictions are reverberated, i.e. prior beliefs are misinterpreted as sensory observations and vice versa. As a result, these predictions are counted multiple times. Circular inference explains the emergence of erroneous percepts, the patient's overconfidence when facing probabilistic choices, the learning of 'unshakable' causal relationships between unrelated events and a paradoxical immunity to perceptual illusions, which are all known to be associated with schizophrenia.
Inferring Horizontal Gene Transfer
Lassalle, Florent; Dessimoz, Christophe
2015-01-01
Horizontal or Lateral Gene Transfer (HGT or LGT) is the transmission of portions of genomic DNA between organisms through a process decoupled from vertical inheritance. In the presence of HGT events, different fragments of the genome are the result of different evolutionary histories. This can therefore complicate the investigations of evolutionary relatedness of lineages and species. Also, as HGT can bring into genomes radically different genotypes from distant lineages, or even new genes bearing new functions, it is a major source of phenotypic innovation and a mechanism of niche adaptation. For example, of particular relevance to human health is the lateral transfer of antibiotic resistance and pathogenicity determinants, leading to the emergence of pathogenic lineages [1]. Computational identification of HGT events relies upon the investigation of sequence composition or evolutionary history of genes. Sequence composition-based ("parametric") methods search for deviations from the genomic average, whereas evolutionary history-based ("phylogenetic") approaches identify genes whose evolutionary history significantly differs from that of the host species. The evaluation and benchmarking of HGT inference methods typically rely upon simulated genomes, for which the true history is known. On real data, different methods tend to infer different HGT events, and as a result it can be difficult to ascertain all but simple and clear-cut HGT events. PMID:26020646
Moment inference from tomograms
Day-Lewis, F. D.; Chen, Y.; Singha, K.
2007-01-01
Time-lapse geophysical tomography can provide valuable qualitative insights into hydrologic transport phenomena associated with aquifer dynamics, tracer experiments, and engineered remediation. Increasingly, tomograms are used to infer the spatial and/or temporal moments of solute plumes; these moments provide quantitative information about transport processes (e.g., advection, dispersion, and rate-limited mass transfer) and controlling parameters (e.g., permeability, dispersivity, and rate coefficients). The reliability of moments calculated from tomograms is, however, poorly understood because classic approaches to image appraisal (e.g., the model resolution matrix) are not directly applicable to moment inference. Here, we present a semi-analytical approach to construct a moment resolution matrix based on (1) the classic model resolution matrix and (2) image reconstruction from orthogonal moments. Numerical results for radar and electrical-resistivity imaging of solute plumes demonstrate that moment values calculated from tomograms depend strongly on plume location within the tomogram, survey geometry, regularization criteria, and measurement error. Copyright 2007 by the American Geophysical Union.
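The spatial moments mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' moment resolution matrix. It only computes the zeroth, first, and second central moments of a gridded image, assuming a regular grid and a synthetic Gaussian plume; the function name `plume_moments` and all numerical values are hypothetical.

```python
import numpy as np

def plume_moments(conc, x, z):
    """Zeroth, first, and second central spatial moments of a 2D image.

    conc has shape (len(x), len(z)); x and z are regularly spaced (assumed).
    """
    X, Z = np.meshgrid(x, z, indexing="ij")
    cell = (x[1] - x[0]) * (z[1] - z[0])           # area of one grid cell
    m0 = conc.sum() * cell                         # zeroth moment: total mass
    cx = (conc * X).sum() * cell / m0              # first moments: centre of mass
    cz = (conc * Z).sum() * cell / m0
    var_x = (conc * (X - cx) ** 2).sum() * cell / m0   # second central moments
    var_z = (conc * (Z - cz) ** 2).sum() * cell / m0
    return m0, (cx, cz), (var_x, var_z)

# Synthetic plume standing in for a tomogram (illustrative values only).
x = np.linspace(0.0, 10.0, 201)
z = np.linspace(0.0, 5.0, 101)
X, Z = np.meshgrid(x, z, indexing="ij")
conc = np.exp(-((X - 4.0) ** 2 / (2 * 0.5 ** 2) + (Z - 2.0) ** 2 / (2 * 0.3 ** 2)))

m0, centre, spread = plume_moments(conc, x, z)
```

On a real tomogram the recovered moments would also reflect smoothing by regularization and survey geometry, which is the dependence the abstract reports.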
Inverse Ising Inference Using All the Data
NASA Astrophysics Data System (ADS)
Aurell, Erik; Ekeberg, Magnus
2012-03-01
We show that a method based on logistic regression, using all the data, solves the inverse Ising problem far better than mean-field calculations relying only on sample pairwise correlation functions, while remaining computationally feasible for hundreds of nodes. The largest improvement in reconstruction occurs for strong interactions. Using two examples, a diluted Sherrington-Kirkpatrick model and a two-dimensional lattice, we also show that interaction topologies can be recovered from few samples with good accuracy and that the use of ℓ1 regularization is beneficial in this process, pushing inference abilities further into low-temperature regimes.
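The logistic-regression idea can be sketched for a toy system. For spins s_i = ±1 the conditional probability of one spin given the others is a logistic function of 2 s_i (h_i + Σ_j J_ij s_j), so each row of couplings can be fitted by a separate regression. The code below is an illustration under assumptions, not the authors' implementation: it uses a 5-spin model sampled exactly by enumeration, plain gradient ascent, and no ℓ1 penalty; all function names and parameter values are hypothetical.

```python
import itertools
import numpy as np

def sample_ising(h, J, n_samples, rng):
    """Draw exact samples by enumerating all 2^n states (small n only)."""
    n = len(h)
    states = np.array(list(itertools.product([-1, 1], repeat=n)), dtype=float)
    log_w = states @ h + 0.5 * np.einsum("si,ij,sj->s", states, J, states)
    p = np.exp(log_w - log_w.max())
    p /= p.sum()
    return states[rng.choice(len(states), size=n_samples, p=p)]

def fit_node(samples, i, n_iter=1000, lr=0.2):
    """Maximise the conditional log-likelihood of spin i given the others."""
    y = samples[:, i]
    X = samples.copy()
    X[:, i] = 1.0                      # column i now carries the field h_i
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        margin = 2.0 * y * (X @ theta)
        weight = 1.0 / (1.0 + np.exp(margin))      # equals 1 - sigmoid(margin)
        theta += lr * (X.T @ (2.0 * y * weight)) / len(y)
    return theta

def fit_ising(samples):
    n = samples.shape[1]
    h_hat, J_hat = np.zeros(n), np.zeros((n, n))
    for i in range(n):
        theta = fit_node(samples, i)
        h_hat[i] = theta[i]
        J_hat[i] = theta
        J_hat[i, i] = 0.0
    return h_hat, 0.5 * (J_hat + J_hat.T)          # average the two estimates of J_ij

rng = np.random.default_rng(0)
n = 5
J_true = np.triu(rng.uniform(-0.5, 0.5, size=(n, n)), k=1)
J_true = J_true + J_true.T
h_true = rng.uniform(-0.2, 0.2, size=n)
samples = sample_ising(h_true, J_true, 10000, rng)
h_hat, J_hat = fit_ising(samples)
max_err = np.abs(J_hat - J_true).max()
```

For hundreds of nodes exact enumeration is impossible, so samples would come from data or Monte Carlo, and an ℓ1 penalty of the kind the abstract mentions would be added to each per-node fit.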
Developing Young Children's Emergent Inferential Practices in Statistics
ERIC Educational Resources Information Center
Makar, Katie
2016-01-01
Informal statistical inference has now been researched at all levels of schooling and initial tertiary study. Work in informal statistical inference is least understood in the early years, where children have had little if any exposure to data handling. A qualitative study in Australia was carried out through a series of teaching experiments with…
Impact of Diverse Polarisations on Clutter Statistics
2005-09-29
Statistical inference ’ (Dover Publications Inc., Mineola, New York, 2003) 12 Dudewicz, E.J., and Mishra, S.N.: ‘Modern mathematical statistics ’ (John Wiley...Impact of diverse polarisations on clutter statistics M. Rangaswamy Abstract: The author addresses the impact of diverse polarisations on clutter... statistics in the context of waveform diversity for multi-functional operation from a specific platform as well as for multiple sensing from multiple
Statistical Physics of Pairwise Probability Models
Roudi, Yasser; Aurell, Erik; Hertz, John A.
2009-01-01
Statistical models for describing the probability distribution over the states of biological systems are commonly used for dimensional reduction. Among these models, pairwise models are very attractive in part because they can be fit using a reasonable amount of data: knowledge of the mean values and correlations between pairs of elements in the system is sufficient. Not surprisingly, then, using pairwise models for studying neural data has been the focus of many studies in recent years. In this paper, we describe how tools from statistical physics can be employed for studying and using pairwise models. We build on our previous work on the subject and study the relation between different methods for fitting these models and evaluating their quality. In particular, using data from simulated cortical networks we study how the quality of various approximate methods for inferring the parameters in a pairwise model depends on the time bin chosen for binning the data. We also study the effect of the size of the time bin on the model quality itself, again using simulated data. We show that using finer time bins increases the quality of the pairwise model. We offer new ways of deriving the expressions reported in our previous work for assessing the quality of pairwise models. PMID:19949460
Causal Inference in Public Health
Glass, Thomas A.; Goodman, Steven N.; Hernán, Miguel A.; Samet, Jonathan M.
2014-01-01
Causal inference has a central role in public health; the determination that an association is causal indicates the possibility for intervention. We review and comment on the long-used guidelines for interpreting evidence as supporting a causal association and contrast them with the potential outcomes framework that encourages thinking in terms of causes that are interventions. We argue that in public health this framework is more suitable, providing an estimate of an action’s consequences rather than the less precise notion of a risk factor’s causal effect. A variety of modern statistical methods adopt this approach. When an intervention cannot be specified, causal relations can still exist, but how to intervene to change the outcome will be unclear. In application, the often-complex structure of causal processes needs to be acknowledged and appropriate data collected to study them. These newer approaches need to be brought to bear on the increasingly complex public health challenges of our globalized world. PMID:23297653
Statistical modeling of software reliability
NASA Technical Reports Server (NTRS)
Miller, Douglas R.
1992-01-01
This working paper discusses the statistical simulation part of a controlled software development experiment being conducted under the direction of the System Validation Methods Branch, Information Systems Division, NASA Langley Research Center. The experiment uses guidance and control software (GCS) aboard a fictitious planetary landing spacecraft: real-time control software operating on a transient mission. Software execution is simulated to study the statistical aspects of reliability and other failure characteristics of the software during development, testing, and random usage. Quantification of software reliability is a major goal. Various reliability concepts are discussed. Experiments are described for performing simulations and collecting appropriate simulated software performance and failure data. This data is then used to make statistical inferences about the quality of the software development and verification processes as well as inferences about the reliability of software versions and reliability growth under random testing and debugging.
Variational Bayesian Inference Algorithms for Infinite Relational Model of Network Data.
Konishi, Takuya; Kubo, Takatomi; Watanabe, Kazuho; Ikeda, Kazushi
2015-09-01
Network data, such as social networks and hyperlinks on the Web, show the relationships among objects of a single kind. Many statistical models have been proposed for analyzing these data. For modeling cluster structures of networks, the infinite relational model (IRM) was proposed as a Bayesian nonparametric extension of the stochastic block model. In this brief, we derive the inference algorithms for the IRM of network data based on the variational Bayesian (VB) inference methods. After showing the standard VB inference, we derive the collapsed VB (CVB) inference and its variant called the zeroth-order CVB inference. We compared the performances of the inference algorithms using six real network datasets. The CVB inference outperformed the VB inference in most of the datasets, and the differences were especially large in dense networks.
Bayesian inference in geomagnetism
NASA Technical Reports Server (NTRS)
Backus, George E.
1988-01-01
The inverse problem in empirical geomagnetic modeling is investigated, with critical examination of recently published studies. Particular attention is given to the use of Bayesian inference (BI) to select the damping parameter lambda in the uniqueness portion of the inverse problem. The mathematical bases of BI and stochastic inversion are explored, with consideration of bound-softening problems and resolution in linear Gaussian BI. The problem of estimating the radial magnetic field B(r) at the earth core-mantle boundary from surface and satellite measurements is then analyzed in detail, with specific attention to the selection of lambda in the studies of Gubbins (1983) and Gubbins and Bloxham (1985). It is argued that the selection method is inappropriate and leads to lambda values much larger than those that would result if a reasonable bound on the heat flow at the CMB were assumed.
Inversion of multiwavelength radiometer measurements by three-dimensional filtering
NASA Technical Reports Server (NTRS)
Rosenkranz, P. W.; Baumann, W. T.
1980-01-01
Remote sensing data from satellites typically have three dimensions: scan position, spacecraft position, and wavelength. Inversion of the radiometric data to infer geophysical parameters is a filtering problem in which the dimension of wavelength (or channel number) is transformed into a dimension of geophysical parameters, and the most general solution is a three-dimensional filter. Linear filters have the advantages of computational speed and easily described transfer functions; but often the measurements are nonlinear functions of the parameters to be inferred. To the extent that the nonlinear inversion problem is overdetermined, it can be modeled by a critically determined linear problem. As an example, inversion of Scanning Multichannel Microwave Radiometer (SMMR) data by means of a three-dimensional Wiener Filter is described. Atmospheric water vapor content, rain liquid water content, surface wind speed and surface temperature are the parameters inferred from the measurements. Nonprecipitating liquid water and water vapor scale height are also modeled but not retrieved. The a priori statistics on which the filter is trained have the effect of governing the selection of a trade-off point of noise as a function of resolution (in all three retrieval dimensions).
Inference of magnetic fields in inhomogeneous prominences
NASA Astrophysics Data System (ADS)
Milić, I.; Faurobert, M.; Atanacković, O.
2017-01-01
Context. Most of the quantitative information about the magnetic field vector in solar prominences comes from the analysis of the Hanle effect acting on lines formed by scattering. As these lines can be of non-negligible optical thickness, it is of interest to study the line formation process further. Aims: We investigate the multidimensional effects on the interpretation of spectropolarimetric observations, particularly on the inference of the magnetic field vector. We do this by analyzing the differences between multidimensional models, which involve fully self-consistent radiative transfer computations in the presence of spatial inhomogeneities and velocity fields, and those which rely on simple one-dimensional geometry. Methods: We study the formation of a prototype line in ad hoc inhomogeneous, isothermal 2D prominence models. We solve the NLTE polarized line formation problem in the presence of a large-scale oriented magnetic field. The resulting polarized line profiles are then interpreted (i.e. inverted) assuming a simple 1D slab model. Results: We find that differences between input and the inferred magnetic field vector are non-negligible. Namely, we almost universally find that the inferred field is weaker and more horizontal than the input field. Conclusions: Spatial inhomogeneities and radiative transfer have a strong effect on scattering line polarization in the optically thick lines. In real-life situations, ignoring these effects could lead to a serious misinterpretation of spectropolarimetric observations of chromospheric objects such as prominences.
Inferring sparse networks for noisy transient processes
NASA Astrophysics Data System (ADS)
Tran, Hoang M.; Bukkapatnam, Satish T. S.
2016-02-01
Inferring causal structures of real world complex networks from measured time series signals remains an open issue. The current approaches are inadequate to discern between direct versus indirect influences (i.e., the presence or absence of a directed arc connecting two nodes) in the presence of noise, sparse interactions, as well as nonlinear and transient dynamics of real world processes. We report a sparse regression (referred to as the ℓ1-min) approach with theoretical bounds on the constraints on the allowable perturbation to recover the network structure that guarantees sparsity and robustness to noise. We also introduce averaging and perturbation procedures to further enhance prediction scores (i.e., reduce inference errors), and the numerical stability of the ℓ1-min approach. Extensive investigations have been conducted with multiple benchmark simulated genetic regulatory network and Michaelis-Menten dynamics, as well as real world data sets from DREAM5 challenge. These investigations suggest that our approach can significantly improve, oftentimes by 5 orders of magnitude over the methods reported previously for inferring the structure of dynamic networks, such as Bayesian network, network deconvolution, silencing and modular response analysis methods based on optimizing for sparsity, transients, noise and high dimensionality issues.
Bayes factors and multimodel inference
Link, W.A.; Barker, R.J.; Thomson, David L.; Cooch, Evan G.; Conroy, Michael J.
2009-01-01
Multimodel inference has two main themes: model selection, and model averaging. Model averaging is a means of making inference conditional on a model set, rather than on a selected model, allowing formal recognition of the uncertainty associated with model choice. The Bayesian paradigm provides a natural framework for model averaging, and provides a context for evaluation of the commonly used AIC weights. We review Bayesian multimodel inference, noting the importance of Bayes factors. Noting the sensitivity of Bayes factors to the choice of priors on parameters, we define and propose nonpreferential priors as offering a reasonable standard for objective multimodel inference.
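The AIC weights mentioned in this abstract follow the standard formula w_i = exp(-Δ_i/2) / Σ_j exp(-Δ_j/2), where Δ_i is the difference between model i's AIC and the smallest AIC in the set. A minimal sketch, with made-up AIC values and per-model estimates used purely for illustration:

```python
import numpy as np

def aic_weights(aic):
    """Akaike weights from a list of AIC values, one per candidate model."""
    aic = np.asarray(aic, dtype=float)
    delta = aic - aic.min()            # differences from the best-scoring model
    w = np.exp(-0.5 * delta)
    return w / w.sum()

# Hypothetical AIC scores and per-model estimates of one shared parameter.
weights = aic_weights([100.0, 102.0, 110.0])
model_estimates = np.array([1.0, 1.5, 3.0])
averaged = float(weights @ model_estimates)    # model-averaged estimate
```

The weights sum to one over the model set, so the averaged estimate is conditional on that set rather than on a single selected model.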
Topology-based kernels with application to inference problems in Alzheimer's disease.
Pachauri, Deepti; Hinrichs, Chris; Chung, Moo K; Johnson, Sterling C; Singh, Vikas
2011-10-01
Alzheimer's disease (AD) research has recently witnessed a great deal of activity focused on developing new statistical learning tools for automated inference using imaging data. The workhorse for many of these techniques is the support vector machine (SVM) framework (or more generally kernel-based methods). Most of these require, as a first step, specification of a kernel matrix K between input examples (i.e., images). The inner product between images I(i) and I(j) in a feature space can generally be written in closed form and so it is convenient to treat K as "given." However, in certain neuroimaging applications such an assumption becomes problematic. As an example, it is rather challenging to provide a scalar measure of similarity between two instances of highly attributed data such as cortical thickness measures on cortical surfaces. Note that cortical thickness is known to be discriminative for neurological disorders, so leveraging such information in an inference framework, especially within a multi-modal method, is potentially advantageous. But despite being clinically meaningful, relatively few works have successfully exploited this measure for classification or regression. Motivated by these applications, our paper presents novel techniques to compute similarity matrices for such topologically-based attributed data. Our ideas leverage recent developments to characterize signals (e.g., cortical thickness) motivated by the persistence of their topological features, leading to a scheme for simple constructions of kernel matrices. As a proof of principle, on a dataset of 356 subjects from the Alzheimer's Disease Neuroimaging Initiative study, we report good performance on several statistical inference tasks without any feature selection, dimensionality reduction, or parameter tuning.
Direct evidence for a dual process model of deductive inference.
Markovits, Henry; Brunet, Marie-Laurence; Thompson, Valerie; Brisson, Janie
2013-07-01
In 2 experiments, we tested a strong version of a dual process theory of conditional inference (cf. Verschueren et al., 2005a, 2005b) that assumes that most reasoners have 2 strategies available, the choice of which is determined by situational variables, cognitive capacity, and metacognitive control. The statistical strategy evaluates inferences probabilistically, accepting those with high conditional probability. The counterexample strategy rejects inferences when a counterexample shows the inference to be invalid. To discriminate strategy use, we presented reasoners with conditional statements (if p, then q) and explicit statistical information about the relative frequency of the probability of p/q (50% vs. 90%). A statistical strategy would accept the more probable inferences more frequently, whereas the counterexample one would reject both. In Experiment 1, reasoners under time pressure used the statistical strategy more, but switched to the counterexample strategy when time constraints were removed; the former took less time than the latter. These data are consistent with the hypothesis that the statistical strategy is the default heuristic. Under a free-time condition, reasoners preferred the counterexample strategy and kept it when put under time pressure. Thus, it is not simply a lack of capacity that produces a statistical strategy; instead, it seems that time pressure disrupts the ability to make good metacognitive choices. In line with this conclusion, in a 2nd experiment, we measured reasoners' confidence in their performance; those under time pressure were less confident in the statistical than the counterexample strategy and more likely to switch strategies under free-time conditions.
ERIC Educational Resources Information Center
Petocz, Peter; Sowey, Eric
2008-01-01
In this article, the authors focus on hypothesis testing--that peculiarly statistical way of deciding things. Statistical methods for testing hypotheses were developed in the 1920s and 1930s by some of the most famous statisticians, in particular Ronald Fisher, Jerzy Neyman and Egon Pearson, who laid the foundations of almost all modern methods of…
Improving Inferences from Multiple Methods.
ERIC Educational Resources Information Center
Shotland, R. Lance; Mark, Melvin M.
1987-01-01
Multiple evaluation methods (MEMs) can cause an inferential challenge, although there are strategies to strengthen inferences. Practical and theoretical issues involved in the use by social scientists of MEMs, three potential problems in drawing inferences from MEMs, and short- and long-term strategies for alleviating these problems are outlined.…
Causal Inference and Developmental Psychology
ERIC Educational Resources Information Center
Foster, E. Michael
2010-01-01
Causal inference is of central importance to developmental psychology. Many key questions in the field revolve around improving the lives of children and their families. These include identifying risk factors that if manipulated in some way would foster child development. Such a task inherently involves causal inference: One wants to know whether…
Causal Inference in Retrospective Studies.
ERIC Educational Resources Information Center
Holland, Paul W.; Rubin, Donald B.
1988-01-01
The problem of drawing causal inferences from retrospective case-controlled studies is considered. A model for causal inference in prospective studies is applied to retrospective studies. Limitations of case-controlled studies are formulated concerning relevant parameters that can be estimated in such studies. A coffee-drinking/myocardial…
Causal inference in biology networks with integrated belief propagation.
Chang, Rui; Karr, Jonathan R; Schadt, Eric E
2015-01-01
Inferring causal relationships among molecular and higher order phenotypes is a critical step in elucidating the complexity of living systems. Here we propose a novel method for inferring causality that is no longer constrained by the conditional dependency arguments that limit the ability of statistical causal inference methods to resolve causal relationships within sets of graphical models that are Markov equivalent. Our method utilizes Bayesian belief propagation to infer the responses of perturbation events on molecular traits given a hypothesized graph structure. A distance measure between the inferred response distribution and the observed data is defined to assess the 'fitness' of the hypothesized causal relationships. To test our algorithm, we infer causal relationships within equivalence classes of gene networks in which the form of the possible functional interactions is assumed to be nonlinear, given synthetic microarray and RNA sequencing data. We also apply our method to infer causality in a real metabolic network with a v-structure and a feedback loop. We show that our method can recapitulate the causal structure and recover the feedback loop from steady-state data alone, which conventional methods cannot.
Human brain lesion-deficit inference remapped.
Mah, Yee-Haur; Husain, Masud; Rees, Geraint; Nachev, Parashkev
2014-09-01
Our knowledge of the anatomical organization of the human brain in health and disease draws heavily on the study of patients with focal brain lesions. Historically the first method of mapping brain function, it is still potentially the most powerful, establishing the necessity of any putative neural substrate for a given function or deficit. Great inferential power, however, carries a crucial vulnerability: without stronger alternatives any consistent error cannot be easily detected. A hitherto unexamined source of such error is the structure of the high-dimensional distribution of patterns of focal damage, especially in ischaemic injury (the commonest aetiology in lesion-deficit studies), where the anatomy is naturally shaped by the architecture of the vascular tree. This distribution is so complex that analysis of lesion data sets of conventional size cannot illuminate its structure, leaving us in the dark about the presence or absence of such error. To examine this crucial question we assembled the largest known set of focal brain lesions (n = 581), derived from unselected patients with acute ischaemic injury (mean age = 62.3 years, standard deviation = 17.8, male:female ratio = 0.547), visualized with diffusion-weighted magnetic resonance imaging, and processed with validated automated lesion segmentation routines. High-dimensional analysis of this data revealed a hidden bias within the multivariate patterns of damage that will consistently distort lesion-deficit maps, displacing inferred critical regions from their true locations, in a manner opaque to replication. Quantifying the size of this mislocalization demonstrates that past lesion-deficit relationships estimated with conventional inferential methodology are likely to be significantly displaced, by a magnitude dependent on the unknown underlying lesion-deficit relationship itself. Past studies therefore cannot be retrospectively corrected, except by new knowledge that would render them redundant
Social Inference Through Technology
NASA Astrophysics Data System (ADS)
Oulasvirta, Antti
Awareness cues are computer-mediated, real-time indicators of people’s undertakings, whereabouts, and intentions. Already in the mid-1970s, UNIX users could use commands such as “finger” and “talk” to find out who was online and to chat. The small icons in instant messaging (IM) applications that indicate coconversants’ presence in the discussion space are the successors of “finger” output. Similar indicators can be found in online communities, media-sharing services, Internet relay chat (IRC), and location-based messaging applications. But presence and availability indicators are only the tip of the iceberg. Technological progress has enabled richer, more accurate, and more intimate indicators. For example, there are mobile services that allow friends to query and follow each other’s locations. Remote monitoring systems developed for health care allow relatives and doctors to assess the wellbeing of homebound patients (see, e.g., Tang and Venables 2000). But users also utilize cues that have not been deliberately designed for this purpose. For example, online gamers pay attention to other characters’ behavior to infer what the other players are like “in real life.” There is a common denominator underlying these examples: shared activities rely on the technology’s representation of the remote person. The other human being is not physically present but present only through a narrow technological channel.
NASA Technical Reports Server (NTRS)
Feiveson, Alan H.; Foy, Millennia; Ploutz-Snyder, Robert; Fiedler, James
2014-01-01
Do you have elevated p-values? Is the data analysis process getting you down? Do you experience anxiety when you need to respond to criticism of statistical methods in your manuscript? You may be suffering from Insufficient Statistical Support Syndrome (ISSS). For symptomatic relief of ISSS, come for a free consultation with JSC biostatisticians at our help desk during the poster sessions at the HRP Investigators Workshop. Get answers to common questions about sample size, missing data, multiple testing, when to trust the results of your analyses and more. Side effects may include sudden loss of statistics anxiety, improved interpretation of your data, and increased confidence in your results.
Exploiting Low-Dimensional Structure in Astronomical Spectra
NASA Astrophysics Data System (ADS)
Richards, Joseph W.; Freeman, Peter E.; Lee, Ann B.; Schafer, Chad M.
2009-01-01
Dimension-reduction techniques can greatly improve statistical inference in astronomy. A standard approach is to use Principal Components Analysis (PCA). In this work, we apply a recently developed technique, diffusion maps, to astronomical spectra for data parameterization and dimensionality reduction, and develop a robust, eigenmode-based framework for regression. We show how our framework provides a computationally efficient means by which to predict redshifts of galaxies, and thus could inform more expensive redshift estimators such as template cross-correlation. It also provides a natural means by which to identify outliers (e.g., misclassified spectra, spectra with anomalous features). We analyze 3835 Sloan Digital Sky Survey spectra and show how our framework yields a more than 95% reduction in dimensionality. Finally, we show that the prediction error of the diffusion-map-based regression approach is markedly smaller than that of a similar approach based on PCA, clearly demonstrating the superiority of diffusion maps over PCA for this regression task.
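A diffusion map can be sketched in a few lines. The version below is a generic textbook construction on toy data, not the authors' pipeline for spectra: it builds a Gaussian kernel, normalizes it into a Markov matrix through its symmetric conjugate, and uses the leading non-trivial eigenvectors as coordinates. The kernel bandwidth `eps`, the two synthetic clusters, and the function name are assumptions chosen for illustration.

```python
import numpy as np

def diffusion_map(points, eps, n_coords=2, t=1):
    """Return Markov-matrix eigenvalues and the first n_coords diffusion coordinates."""
    sq_dists = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    kernel = np.exp(-sq_dists / eps)               # Gaussian affinities
    degree = kernel.sum(axis=1)
    # Symmetric matrix with the same eigenvalues as the row-normalised Markov matrix.
    sym = kernel / np.sqrt(np.outer(degree, degree))
    eigvals, eigvecs = np.linalg.eigh(sym)
    order = np.argsort(eigvals)[::-1]              # sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    psi = eigvecs / np.sqrt(degree)[:, None]       # right eigenvectors of the Markov matrix
    # Skip the trivial constant eigenvector (eigenvalue 1).
    coords = psi[:, 1:n_coords + 1] * eigvals[1:n_coords + 1] ** t
    return eigvals, coords

# Two synthetic clusters standing in for high-dimensional spectra.
rng = np.random.default_rng(1)
cluster_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(30, 2))
cluster_b = rng.normal(loc=[3.0, 0.0], scale=0.3, size=(30, 2))
points = np.vstack([cluster_a, cluster_b])

eigvals, coords = diffusion_map(points, eps=2.0)
```

In a regression setting of the kind the abstract describes, the columns of `coords` would serve as the low-dimensional predictors in place of PCA scores.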
Feature Selection for Varying Coefficient Models With Ultrahigh Dimensional Covariates.
Liu, Jingyuan; Li, Runze; Wu, Rongling
2014-01-01
This paper is concerned with feature screening and variable selection for varying coefficient models with ultrahigh dimensional covariates. We propose a new feature screening procedure for these models based on conditional correlation coefficient. We systematically study the theoretical properties of the proposed procedure, and establish their sure screening property and the ranking consistency. To enhance the finite sample performance of the proposed procedure, we further develop an iterative feature screening procedure. Monte Carlo simulation studies were conducted to examine the performance of the proposed procedures. In practice, we advocate a two-stage approach for varying coefficient models. The two stage approach consists of (a) reducing the ultrahigh dimensionality by using the proposed procedure and (b) applying regularization methods for dimension-reduced varying coefficient models to make statistical inferences on the coefficient functions. We illustrate the proposed two-stage approach by a real data example.
... population, or about 25 million Americans, has experienced tinnitus lasting at least five minutes in the past ... by NIDCD Epidemiology and Statistics Program staff: (1) tinnitus prevalence was obtained from the 2008 National Health ...
Direct Evidence for a Dual Process Model of Deductive Inference
ERIC Educational Resources Information Center
Markovits, Henry; Brunet, Marie-Laurence; Thompson, Valerie; Brisson, Janie
2013-01-01
In 2 experiments, we tested a strong version of a dual process theory of conditional inference (cf. Verschueren et al., 2005a, 2005b) that assumes that most reasoners have 2 strategies available, the choice of which is determined by situational variables, cognitive capacity, and metacognitive control. The statistical strategy evaluates inferences…
Stan: A Probabilistic Programming Language for Bayesian Inference and Optimization
ERIC Educational Resources Information Center
Gelman, Andrew; Lee, Daniel; Guo, Jiqiang
2015-01-01
Stan is a free and open-source C++ program that performs Bayesian inference or optimization for arbitrary user-specified models and can be called from the command line, R, Python, Matlab, or Julia and has great promise for fitting large and complex statistical models in many areas of application. We discuss Stan from users' and developers'…
From Blickets to Synapses: Inferring Temporal Causal Networks by Observation
ERIC Educational Resources Information Center
Fernando, Chrisantha
2013-01-01
How do human infants learn the causal dependencies between events? Evidence suggests that this remarkable feat can be achieved by observation of only a handful of examples. Many computational models have been produced to explain how infants perform causal inference without explicit teaching about statistics or the scientific method. Here, we…
Models for inference in dynamic metacommunity systems
Dorazio, R.M.; Kery, M.; Royle, J. Andrew; Plattner, M.
2010-01-01
A variety of processes are thought to be involved in the formation and dynamics of species assemblages. For example, various metacommunity theories are based on differences in the relative contributions of dispersal of species among local communities and interactions of species within local communities. Interestingly, metacommunity theories continue to be advanced without much empirical validation. Part of the problem is that statistical models used to analyze typical survey data either fail to specify ecological processes with sufficient complexity or they fail to account for errors in detection of species during sampling. In this paper, we describe a statistical modeling framework for the analysis of metacommunity dynamics that is based on the idea of adopting a unified approach, multispecies occupancy modeling, for computing inferences about individual species, local communities of species, or the entire metacommunity of species. This approach accounts for errors in detection of species during sampling and also allows different metacommunity paradigms to be specified in terms of species- and location-specific probabilities of occurrence, extinction, and colonization: all of which are estimable. In addition, this approach can be used to address inference problems that arise in conservation ecology, such as predicting temporal and spatial changes in biodiversity for use in making conservation decisions. To illustrate, we estimate changes in species composition associated with the species-specific phenologies of flight patterns of butterflies in Switzerland for the purpose of estimating regional differences in biodiversity. © 2010 by the Ecological Society of America.
Improved contact prediction in proteins: Using pseudolikelihoods to infer Potts models
NASA Astrophysics Data System (ADS)
Ekeberg, Magnus; Lövkvist, Cecilia; Lan, Yueheng; Weigt, Martin; Aurell, Erik
2013-01-01
Spatially proximate amino acids in a protein tend to coevolve. A protein's three-dimensional (3D) structure hence leaves an echo of correlations in the evolutionary record. Reverse engineering 3D structures from such correlations is an open problem in structural biology, pursued with increasing vigor as more and more protein sequences continue to fill the data banks. Within this task lies a statistical inference problem, rooted in the following: correlation between two sites in a protein sequence can arise from firsthand interaction but can also be network-propagated via intermediate sites; observed correlation is not enough to guarantee proximity. To separate direct from indirect interactions is an instance of the general problem of inverse statistical mechanics, where the task is to learn model parameters (fields, couplings) from observables (magnetizations, correlations, samples) in large systems. In the context of protein sequences, the approach has been referred to as direct-coupling analysis. Here we show that the pseudolikelihood method, applied to 21-state Potts models describing the statistical properties of families of evolutionarily related proteins, significantly outperforms existing approaches to the direct-coupling analysis, the latter being based on standard mean-field techniques. This improved performance also relies on a modified score for the coupling strength. The results are verified using known crystal structures of specific sequence instances of various protein families. Code implementing the new method can be found at http://plmdca.csc.kth.se/.
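To make the pseudolikelihood idea concrete, here is a minimal sketch of the objective for a q-state Potts model: each site's state is scored by its conditional probability given all other sites, so no global partition function is needed. This is not the authors' plmDCA code; it omits the regularization, sequence reweighting, and modified coupling score that the abstract alludes to, and the data layout (`h[i][a]`, `J[i][j][a][b]`) is an assumption made for illustration.

```python
import math


def neg_log_pseudolikelihood(seqs, h, J, q):
    """Average negative log pseudolikelihood of integer-encoded sequences.

    seqs: list of sequences, each a list of ints in range(q)
    h[i][a]: field acting on state a at site i
    J[i][j][a][b]: coupling between state a at site i and state b at site j
    """
    total = 0.0
    for s in seqs:
        length = len(s)
        for i in range(length):
            # Unnormalized log-probability of each candidate state at site i,
            # conditioned on the observed states at all other sites.
            energies = [
                h[i][a] + sum(J[i][j][a][s[j]] for j in range(length) if j != i)
                for a in range(q)
            ]
            # Log-sum-exp over the q states gives the local normalization.
            m = max(energies)
            log_z = m + math.log(sum(math.exp(e - m) for e in energies))
            total -= energies[s[i]] - log_z
    return total / len(seqs)


# Hypothetical toy input: 3 sites, 21 states, all parameters zero.
q, n_sites = 21, 3
h0 = [[0.0] * q for _ in range(n_sites)]
J0 = [[[[0.0] * q for _ in range(q)] for _ in range(n_sites)] for _ in range(n_sites)]
value = neg_log_pseudolikelihood([[0, 5, 20]], h0, J0, q)
```

With all fields and couplings at zero every conditional is uniform, so the objective equals the number of sites times log q, which gives a simple sanity check before fitting.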
The Role of Probability-Based Inference in an Intelligent Tutoring System.
ERIC Educational Resources Information Center
Mislevy, Robert J.; Gitomer, Drew H.
Probability-based inference in complex networks of interdependent variables is an active topic in statistical research, spurred by such diverse applications as forecasting, pedigree analysis, troubleshooting, and medical diagnosis. This paper concerns the role of Bayesian inference networks for updating student models in intelligent tutoring…
2014-01-01
images. To our knowledge, this challenging problem has not yet been extensively explored in computer vision. We present a novel learning based framework that uses high-level visual recognition to infer why people are performing...automatically infers why people are performing actions in images by learning from visual data and written language.
Linking numbers, spin, and statistics of solitons
NASA Technical Reports Server (NTRS)
Wilczek, F.; Zee, A.
1983-01-01
The spin and statistics of solitons in the (2 + 1)- and (3 + 1)-dimensional nonlinear sigma models are considered. For the (2 + 1)-dimensional case, there is the possibility of fractional spin and exotic statistics; for 3 + 1 dimensions, the usual spin-statistics relation is demonstrated. The linking-number interpretation of the Hopf invariant and the use of suspension considerably simplify the analysis.
Equivalent statistics and data interpretation.
Francis, Gregory
2016-10-14
Recent reform efforts in psychological science have led to a plethora of choices for scientists to analyze their data. A scientist making an inference about their data must now decide whether to report a p value, summarize the data with a standardized effect size and its confidence interval, report a Bayes Factor, or use other model comparison methods. To make good choices among these options, it is necessary for researchers to understand the characteristics of the various statistics used by the different analysis frameworks. Toward that end, this paper makes two contributions. First, it shows that for the case of a two-sample t test with known sample sizes, many different summary statistics are mathematically equivalent in the sense that they are based on the very same information in the data set. When the sample sizes are known, the p value provides as much information about a data set as the confidence interval of Cohen's d or a JZS Bayes factor. Second, this equivalence means that different analysis methods differ only in their interpretation of the empirical data. At first glance, it might seem that mathematical equivalence of the statistics suggests that it does not matter much which statistic is reported, but the opposite is true because the appropriateness of a reported statistic is relative to the inference it promotes. Accordingly, scientists should choose an analysis method appropriate for their scientific investigation. A direct comparison of the different inferential frameworks provides some guidance for scientists to make good choices and improve scientific practice.
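The equivalence described in this abstract can be illustrated with a small sketch. Assuming the equal-variance (Student's) two-sample t test, which is an assumption since the abstract does not name the variant, the t statistic and Cohen's d differ only by a factor that depends on the sample sizes, so either can be recovered from the other once n1 and n2 are known. The function names and data below are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_sd(x, y):
    """Pooled standard deviation of two samples (equal-variance assumption)."""
    n1, n2 = len(x), len(y)
    return math.sqrt(((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2))


def cohens_d(x, y):
    """Standardized mean difference."""
    return (mean(x) - mean(y)) / pooled_sd(x, y)


def t_statistic(x, y):
    """Student's two-sample t statistic."""
    n1, n2 = len(x), len(y)
    return (mean(x) - mean(y)) / (pooled_sd(x, y) * math.sqrt(1 / n1 + 1 / n2))


def d_from_t(t, n1, n2):
    """Recover Cohen's d from t using only the sample sizes."""
    return t * math.sqrt(1 / n1 + 1 / n2)


def t_from_d(d, n1, n2):
    """Recover t from Cohen's d using only the sample sizes."""
    return d / math.sqrt(1 / n1 + 1 / n2)


# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
t = t_statistic(x, y)
d = cohens_d(x, y)
```

Because the p value is a monotone function of |t| for fixed degrees of freedom (n1 + n2 - 2), the same argument extends to p: given the sample sizes, t, d, and p carry the same information about the data.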