Statistical Physics of High Dimensional Inference
NASA Astrophysics Data System (ADS)
Advani, Madhu; Ganguli, Surya
To model modern large-scale datasets, we need efficient algorithms to infer a set of P unknown model parameters from N noisy measurements. What are fundamental limits on the accuracy of parameter inference, given limited measurements, signal-to-noise ratios, prior information, and computational tractability requirements? How can we combine prior information with measurements to achieve these limits? Classical statistics gives incisive answers to these questions as the measurement density α = N/P → ∞. However, modern high-dimensional inference problems, in fields ranging from bio-informatics to economics, occur at finite α. We formulate and analyze high-dimensional inference analytically by applying the replica and cavity methods of statistical physics where data serves as quenched disorder and inferred parameters play the role of thermal degrees of freedom. Our analysis reveals that widely cherished Bayesian inference algorithms such as maximum likelihood and maximum a posteriori are suboptimal in the modern setting, and yields new tractable, optimal algorithms to replace them as well as novel bounds on the achievable accuracy of a large class of high-dimensional inference algorithms. Thanks to Stanford Graduate Fellowship and Mind Brain Computation IGERT grant for support.
High-dimensional statistical inference: From vector to matrix
NASA Astrophysics Data System (ADS)
Zhang, Anru
Statistical inference for sparse signals or low-rank matrices in high-dimensional settings is of significant interest in a range of contemporary applications. It has attracted significant recent attention in many fields including statistics, applied mathematics and electrical engineering. In this thesis, we consider several problems, including sparse signal recovery (compressed sensing under restricted isometry) and low-rank matrix recovery (matrix recovery via rank-one projections and structured matrix completion). The first part of the thesis discusses compressed sensing and affine rank minimization in both noiseless and noisy cases and establishes sharp restricted isometry conditions for sparse signal and low-rank matrix recovery. The analysis relies on a key technical tool which represents points in a polytope by convex combinations of sparse vectors. The technique is elementary yet leads to sharp results. It is shown that, in compressed sensing, $\delta_k^A < 1/3$, $\delta_k^A + \theta_{k,k}^A < 1$, or $\delta_{tk}^A < \sqrt{(t-1)/t}$ for any given constant $t \ge 4/3$ guarantees the exact recovery of all $k$-sparse signals in the noiseless case through constrained $\ell_1$ minimization, and similarly, in affine rank minimization, $\delta_r^M < 1/3$, $\delta_r^M + \theta_{r,r}^M < 1$, or $\delta_{tr}^M < \sqrt{(t-1)/t}$ ensures the exact reconstruction of all matrices with rank at most $r$ in the noiseless case via constrained nuclear norm minimization. Moreover, for any $\epsilon > 0$, $\delta_k^A < 1/3 + \epsilon$, $\delta_k^A + \theta_{k,k}^A < 1 + \epsilon$, or $\delta_{tk}^A < \sqrt{(t-1)/t} + \epsilon$ is not sufficient to guarantee the exact recovery of all $k$-sparse signals for large $k$. A similar result also holds for matrix recovery. In addition, the conditions $\delta_k^A < 1/3$, $\delta_k^A + \theta_{k,k}^A < 1$, $\delta_{tk}^A < \sqrt{(t-1)/t}$ and $\delta_r^M < 1/3$, $\delta_r^M + \theta_{r,r}^M < 1$, $\delta_{tr}^M < \sqrt{(t-1)/t}$ are also shown to be sufficient, respectively, for stable recovery of approximately sparse signals and low-rank matrices in the noisy case
NASA Astrophysics Data System (ADS)
Khan, Shahjahan
Scientific information on various data-generating processes is often presented in the form of numerical and categorical data. Except on some very rare occasions, such data generally represent a small part of the population, or selected outcomes of a data-generating process. Although valuable and useful information is lurking in the array of scientific data, it is generally unavailable to the users. Appropriate statistical methods are essential to reveal the hidden "jewels" in the mess of the raw data. Exploratory data analysis methods are used to uncover such valuable characteristics of the observed data. Statistical inference provides techniques to make valid conclusions about the unknown characteristics or parameters of the population from which scientifically drawn sample data are selected. Usually, statistical inference includes estimation of population parameters as well as performing tests of hypotheses on the parameters. However, prediction of future responses and determining the prediction distributions are also part of statistical inference. Both Classical (or Frequentist) and Bayesian approaches are used in statistical inference. The commonly used Classical approach is based on the sample data alone. In contrast, the increasingly popular Bayesian approach uses a prior distribution on the parameters along with the sample data to make inferences. Non-parametric and robust methods are also being used in situations where commonly used model assumptions are unsupported. In this chapter, we cover the philosophical and methodological aspects of both the Classical and Bayesian approaches. Moreover, some aspects of predictive inference are also included. In the absence of any evidence to support assumptions regarding the distribution of the underlying population, or if the variable is measured only on an ordinal scale, non-parametric methods are used. Robust methods are employed to avoid any significant changes in the results due to deviations from the model
Statistical inference and string theory
NASA Astrophysics Data System (ADS)
Heckman, Jonathan J.
2015-09-01
In this paper, we expose some surprising connections between string theory and statistical inference. We consider a large collective of agents sweeping out a family of nearby statistical models for an M-dimensional manifold of statistical fitting parameters. When the agents making nearby inferences align along a d-dimensional grid, we find that the pooled probability that the collective reaches a correct inference is the partition function of a nonlinear sigma model in d dimensions. Stability under perturbations to the original inference scheme requires the agents of the collective to distribute along two dimensions. Conformal invariance of the sigma model corresponds to the condition of a stable inference scheme, directly leading to the Einstein field equations for classical gravity. By summing over all possible arrangements of the agents in the collective, we reach a string theory. We also use this perspective to quantify how much an observer can hope to learn about the internal geometry of a superstring compactification. Finally, we present some brief speculative remarks on applications to the AdS/CFT correspondence and Lorentzian signature space-times.
Nanotechnology and statistical inference
NASA Astrophysics Data System (ADS)
Vesely, Sara; Vesely, Leonardo; Vesely, Alessandro
2017-08-01
We discuss some problems that arise when applying statistical inference to data with the aim of disclosing new functionalities. A predictive model analyzes the data taken from experiments on a specific material to assess the likelihood that another product, with similar structure and properties, will exhibit the same functionality. It doesn't have much predictive power if variability occurs as a consequence of a specific, non-linear behavior. We exemplify our discussion on some experiments with biased dice.
Statistical Inference: The Big Picture.
Kass, Robert E
2011-02-01
Statistics has moved beyond the frequentist-Bayesian controversies of the past. Where does this leave our ability to interpret results? I suggest that a philosophy compatible with statistical practice, labelled here statistical pragmatism, serves as a foundation for inference. Statistical pragmatism is inclusive and emphasizes the assumptions that connect statistical models with observed data. I argue that introductory courses often mis-characterize the process of statistical inference and I propose an alternative "big picture" depiction.
NASA Astrophysics Data System (ADS)
Sjöstrand, Karl; Cardenas, Valerie A.; Larsen, Rasmus; Studholme, Colin
2008-03-01
Whole-brain morphometry denotes a group of methods with the aim of relating clinical and cognitive measurements to regions of the brain. Typically, such methods require the statistical analysis of a data set with many variables (voxels and exogenous variables) paired with few observations (subjects). A common approach to this ill-posed problem is to analyze each spatial variable separately, dividing the analysis into manageable subproblems. A disadvantage of this method is that the correlation structure of the spatial variables is not taken into account. This paper investigates the use of ridge regression to address this issue, allowing for a gradual introduction of correlation information into the model. We make the connections between ridge regression and voxel-wise procedures explicit and discuss relations to other statistical methods. Results are given on an in-vivo data set of deformation based morphometry from a study of cognitive decline in an elderly population.
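One standard way to see the link between ridge regression and voxel-wise analysis is the textbook limit below. It is offered as a sketch and is not necessarily the derivation used in the paper. The symbols are assumptions introduced here: $X$ is the centered subjects-by-voxels data matrix, $y$ the clinical measurement, and $\lambda > 0$ the ridge penalty.

```latex
\hat{\beta}_\lambda = \left( X^\top X + \lambda I \right)^{-1} X^\top y ,
\qquad
\lim_{\lambda \to \infty} \lambda \, \hat{\beta}_\lambda = X^\top y .
```

Each entry of $X^\top y$ involves a single voxel only, so for very large $\lambda$ the coefficients are proportional to voxel-wise covariances with $y$. Decreasing $\lambda$ lets the correlation structure in $X^\top X$ enter gradually.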
Statistical inference and Aristotle's Rhetoric.
Macdonald, Ranald R
2004-11-01
Formal logic operates in a closed system where all the information relevant to any conclusion is present, whereas this is not the case when one reasons about events and states of the world. Pollard and Richardson drew attention to the fact that the reasoning behind statistical tests does not lead to logically justifiable conclusions. In this paper statistical inferences are defended not by logic but by the standards of everyday reasoning. Aristotle invented formal logic, but argued that people mostly get at the truth with the aid of enthymemes--incomplete syllogisms which include arguing from examples, analogies and signs. It is proposed that statistical tests work in the same way--in that they are based on examples, invoke the analogy of a model and use the size of the effect under test as a sign that the chance hypothesis is unlikely. Of existing theories of statistical inference only a weak version of Fisher's takes this into account. Aristotle anticipated Fisher by producing an argument of the form that there were too many cases in which an outcome went in a particular direction for that direction to be plausibly attributed to chance. We can therefore conclude that Aristotle would have approved of statistical inference and there is a good reason for calling this form of statistical inference classical.
Statistical learning and selective inference
Taylor, Jonathan; Tibshirani, Robert J.
2015-01-01
We describe the problem of “selective inference.” This addresses the following challenge: Having mined a set of data to find potential associations, how do we properly assess the strength of these associations? The fact that we have “cherry-picked”—searched for the strongest associations—means that we must set a higher bar for declaring significant the associations that we see. This challenge becomes more important in the era of big data and complex statistical modeling. The cherry tree (dataset) can be very large and the tools for cherry picking (statistical learning methods) are now very sophisticated. We describe some recent new developments in selective inference and illustrate their use in forward stepwise regression, the lasso, and principal components analysis. PMID:26100887
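The cost of cherry-picking can be illustrated with a small simulation. This is a minimal sketch and not the method of the paper: it swaps in the simple Sidak-style adjustment for the largest of m independent null z-scores, and the function names and the choice m = 20 are assumptions made for illustration.

```python
import random
from statistics import NormalDist


def naive_p(z):
    """Two-sided p-value for one z-score, ignoring any selection."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def max_adjusted_p(z, m):
    """P-value for the largest |z| among m independent null tests (Sidak form)."""
    return 1.0 - (1.0 - naive_p(z)) ** m


def cherry_pick_rate(m, trials, alpha=0.05, seed=0):
    """Share of pure-noise datasets in which the best of m z-scores is declared significant."""
    rng = random.Random(seed)
    naive_hits = adjusted_hits = 0
    for _ in range(trials):
        # Select the strongest association after looking at all m of them.
        best = max(abs(rng.gauss(0.0, 1.0)) for _ in range(m))
        naive_hits += naive_p(best) < alpha
        adjusted_hits += max_adjusted_p(best, m) < alpha
    return naive_hits / trials, adjusted_hits / trials


naive_rate, adjusted_rate = cherry_pick_rate(m=20, trials=2000)
print(f"naive false-positive rate: {naive_rate:.2f}, adjusted: {adjusted_rate:.2f}")
```

The naive rate should land far above the nominal 5% level, while the adjusted rate stays near it. The procedures for forward stepwise regression and the lasso are more refined than this independence-based correction.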
Statistical learning and selective inference.
Taylor, Jonathan; Tibshirani, Robert J
2015-06-23
We describe the problem of "selective inference." This addresses the following challenge: Having mined a set of data to find potential associations, how do we properly assess the strength of these associations? The fact that we have "cherry-picked"--searched for the strongest associations--means that we must set a higher bar for declaring significant the associations that we see. This challenge becomes more important in the era of big data and complex statistical modeling. The cherry tree (dataset) can be very large and the tools for cherry picking (statistical learning methods) are now very sophisticated. We describe some recent new developments in selective inference and illustrate their use in forward stepwise regression, the lasso, and principal components analysis.
Statistical inference for inverse problems
NASA Astrophysics Data System (ADS)
Bissantz, Nicolai; Holzmann, Hajo
2008-06-01
In this paper we study statistical inference for certain inverse problems. We go beyond mere estimation purposes and review and develop the construction of confidence intervals and confidence bands in some inverse problems, including deconvolution and the backward heat equation. Further, we discuss the construction of certain hypothesis tests, in particular concerning the number of local maxima of the unknown function. The methods are illustrated in a case study, where we analyze the distribution of heliocentric escape velocities of galaxies in the Centaurus galaxy cluster, and provide statistical evidence for its bimodality.
Statistical Inference at Work: Statistical Process Control as an Example
ERIC Educational Resources Information Center
Bakker, Arthur; Kent, Phillip; Derry, Jan; Noss, Richard; Hoyles, Celia
2008-01-01
To characterise statistical inference in the workplace this paper compares a prototypical type of statistical inference at work, statistical process control (SPC), with a type of statistical inference that is better known in educational settings, hypothesis testing. Although there are some similarities between the reasoning structure involved in…
Redshift data and statistical inference
NASA Technical Reports Server (NTRS)
Newman, William I.; Haynes, Martha P.; Terzian, Yervant
1994-01-01
Frequency histograms and the 'power spectrum analysis' (PSA) method, the latter developed by Yu & Peebles (1969), have been widely employed as techniques for establishing the existence of periodicities. We provide a formal analysis of these two classes of methods, including controlled numerical experiments, to better understand their proper use and application. In particular, we note that typical published applications of frequency histograms commonly employ far greater numbers of class intervals or bins than is advisable by statistical theory, sometimes giving rise to the appearance of spurious patterns. The PSA method generates a sequence of random numbers from observational data which, it is claimed, is exponentially distributed with unit mean and variance, essentially independent of the distribution of the original data. We show that the derived random process is nonstationary and produces a small but systematic bias in the usual estimate of the mean and variance. Although the derived variable may be reasonably described by an exponential distribution, the tail of the distribution is far removed from that of an exponential, thereby rendering statistical inference and confidence testing based on the tail of the distribution completely unreliable. Finally, we examine a number of astronomical examples wherein these methods have been used, giving rise to widespread acceptance of statistically unconfirmed conclusions.
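The kind of statistic at issue can be sketched in a few lines. The block below assumes the PSA power takes the usual periodogram form, (1/N)|Σ exp(2πi x_j / period)|², which is an assumption about the formula and not a quotation from the paper. The function name and the test data are hypothetical.

```python
import cmath
import math
import random


def psa_power(values, period):
    """Periodogram-style power (1/N) * |sum_j exp(2*pi*i*x_j/period)|**2 at one trial period."""
    n = len(values)
    total = sum(cmath.exp(2j * math.pi * x / period) for x in values)
    return abs(total) ** 2 / n


# Perfectly periodic data: every phase aligns, so the power equals N.
power_periodic = psa_power([0.0, 1.0, 2.0, 3.0], period=1.0)

# Null case: uniformly scattered data, averaged over many simulated datasets.
rng = random.Random(1)
null_powers = [
    psa_power([rng.uniform(0.0, 100.0) for _ in range(50)], period=1.0)
    for _ in range(500)
]
mean_null_power = sum(null_powers) / len(null_powers)
print(power_periodic, mean_null_power)
```

Under this idealized uniform null the average power is close to one, which is the behavior the exponential approximation relies on. The abstract's caution concerns real data, where the tail of the distribution departs from the exponential.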
The Reasoning behind Informal Statistical Inference
ERIC Educational Resources Information Center
Makar, Katie; Bakker, Arthur; Ben-Zvi, Dani
2011-01-01
Informal statistical inference (ISI) has been a frequent focus of recent research in statistics education. Considering the role that context plays in developing ISI calls into question the need to be more explicit about the reasoning that underpins ISI. This paper uses educational literature on informal statistical inference and philosophical…
Predict! Teaching Statistics Using Informational Statistical Inference
ERIC Educational Resources Information Center
Makar, Katie
2013-01-01
Statistics is one of the most widely used topics for everyday life in the school mathematics curriculum. Unfortunately, the statistics taught in schools focuses on calculations and procedures before students have a chance to see it as a useful and powerful tool. Researchers have found that a dominant view of statistics is as an assortment of tools…
Statistical Inference in Graphical Models
2008-06-17
beliefNetwork> </hercules> Figure 2.1. BNET XML encoding of a Bayesian Network. ... The most complete package is Kevin Murphy's Bayes Net Toolbox (BNT), an ... networks, and dynamic Bayesian networks. Since 2002, researchers at Intel have been converting BNT to an open-source C++ library called the ... of C++, and also offers interfaces for calling the library from MATLAB and R. Notably, both BNT and PNL provide learning and inference algorithms
Local and Global Thinking in Statistical Inference
ERIC Educational Resources Information Center
Pratt, Dave; Johnston-Wilder, Peter; Ainley, Janet; Mason, John
2008-01-01
In this reflective paper, we explore students' local and global thinking about informal statistical inference through our observations of 10- to 11-year-olds, challenged to infer the unknown configuration of a virtual die, but able to use the die to generate as much data as they felt necessary. We report how they tended to focus on local changes…
Ranald Macdonald and statistical inference.
Smith, Philip T
2009-05-01
Ranald Roderick Macdonald (1945-2007) was an important contributor to mathematical psychology in the UK, as a referee and action editor for British Journal of Mathematical and Statistical Psychology and as a participant and organizer at the British Psychological Society's Mathematics, statistics and computing section meetings. This appreciation argues that his most important contribution was to the foundations of significance testing, where his concern about what information was relevant in interpreting the results of significance tests led him to be a persuasive advocate for the 'Weak Fisherian' form of hypothesis testing.
Making statistical inferences about software reliability
NASA Technical Reports Server (NTRS)
Miller, Douglas R.
1986-01-01
Failure times of software undergoing random debugging can be modeled as order statistics of independent but nonidentically distributed exponential random variables. Using this model inferences can be made about current reliability and, if debugging continues, future reliability. This model also shows the difficulty inherent in statistical verification of very highly reliable software such as that used by digital avionics in commercial aircraft.
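The model in this abstract can be simulated directly. This is a minimal sketch under stated assumptions: each bug has its own exponential discovery time, a bug is fixed as soon as it is found, and the rates below are hypothetical values chosen only for illustration.

```python
import random


def simulate_debugging(bug_rates, seed=0):
    """Draw one exponential discovery time per bug and return (time, rate) pairs sorted by time."""
    rng = random.Random(seed)
    discovery = [(rng.expovariate(rate), rate) for rate in bug_rates]
    return sorted(discovery)  # order statistics of non-identical exponentials


def residual_failure_rate(discovery, t):
    """Failure rate of the program at time t: summed rates of bugs not yet found."""
    return sum(rate for time, rate in discovery if time > t)


# Hypothetical rates: a few frequent bugs and many rare ones.
bug_rates = [1.0, 0.5, 0.1] + [0.001] * 50
history = simulate_debugging(bug_rates)
failure_times = [time for time, _ in history]
rate_at_start = residual_failure_rate(history, 0.0)
rate_after_100 = residual_failure_rate(history, 100.0)
print(rate_at_start, rate_after_100)
```

After the frequent bugs are removed, the remaining failure rate is carried by rare bugs that almost never show up during testing, which is one way to see why verifying very high reliability statistically is so hard.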
Making statistical inferences about software reliability
NASA Technical Reports Server (NTRS)
Miller, Douglas R.
1988-01-01
Failure times of software undergoing random debugging can be modelled as order statistics of independent but nonidentically distributed exponential random variables. Using this model inferences can be made about current reliability and, if debugging continues, future reliability. This model also shows the difficulty inherent in statistical verification of very highly reliable software such as that used by digital avionics in commercial aircraft.
Investigating Mathematics Teachers' Thoughts of Statistical Inference
ERIC Educational Resources Information Center
Yang, Kai-Lin
2012-01-01
Research on statistical cognition and application suggests that statistical inference concepts are commonly misunderstood by students and even misinterpreted by researchers. Although some research has been done on students' misunderstanding or misconceptions of confidence intervals (CIs), few studies explore either students' or mathematics…
Inference and the introductory statistics course
NASA Astrophysics Data System (ADS)
Pfannkuch, Maxine; Regan, Matt; Wild, Chris; Budgett, Stephanie; Forbes, Sharleen; Harraway, John; Parsonage, Ross
2011-10-01
This article sets out some of the rationale and arguments for making major changes to the teaching and learning of statistical inference in introductory courses at our universities by changing from a norm-based, mathematical approach to more conceptually accessible computer-based approaches. The core problem of the inferential argument with its hypothetical probabilistic reasoning process is examined in some depth. We argue that the revolution in the teaching of inference must begin. We also discuss some perplexing issues, problematic areas and some new insights into language conundrums associated with introducing the logic of inference through randomization methods.
Bayesian Cosmological inference beyond statistical isotropy
NASA Astrophysics Data System (ADS)
Souradeep, Tarun; Das, Santanu; Wandelt, Benjamin
2016-10-01
With the advent of rich data sets, the computationally challenging task of inference in cosmology has relied on stochastic sampling methods. First, I review the widely used MCMC approach to inferring cosmological parameters and present an adaptive, improved implementation, SCoPE, developed by our group. Next, I present a general method for Bayesian inference of the underlying covariance structure of random fields on a sphere. We employ the Bipolar Spherical Harmonic (BipoSH) representation of general covariance structure on the sphere. We illustrate the efficacy of the method with a principled approach to assess violation of statistical isotropy (SI) in the sky maps of Cosmic Microwave Background (CMB) fluctuations. The general, principled approach to Bayesian inference of the covariance structure in a random field on a sphere presented here has huge potential for application to many other aspects of cosmology and astronomy, as well as to more distant areas of research such as geosciences and climate modelling.
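For readers unfamiliar with MCMC, the basic idea can be sketched with a plain random-walk Metropolis sampler. This is not SCoPE and does not reproduce its adaptive features. The one-parameter Gaussian log-posterior, the step size, and the burn-in length are all assumptions chosen for illustration.

```python
import math
import random


def metropolis(log_post, start, step, n_samples, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional log-posterior."""
    rng = random.Random(seed)
    x, lp = start, log_post(start)
    chain = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        lp_new = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            x, lp = proposal, lp_new
        chain.append(x)
    return chain


def toy_log_posterior(theta):
    """Hypothetical posterior: Gaussian with mean 1.0 and standard deviation 0.5."""
    return -0.5 * ((theta - 1.0) / 0.5) ** 2


chain = metropolis(toy_log_posterior, start=0.0, step=1.0, n_samples=20000)
kept = chain[1000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
print(f"posterior mean estimate: {posterior_mean:.2f}")
```

Real cosmological analyses sample many parameters at once and evaluate an expensive likelihood at every step, which is why adaptive and parallel schemes matter in practice.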
Thermodynamics of statistical inference by cells.
Lang, Alex H; Fisher, Charles K; Mora, Thierry; Mehta, Pankaj
2014-10-03
The deep connection between thermodynamics, computation, and information is now well established both theoretically and experimentally. Here, we extend these ideas to show that thermodynamics also places fundamental constraints on statistical estimation and learning. To do so, we investigate the constraints placed by (nonequilibrium) thermodynamics on the ability of biochemical signaling networks to estimate the concentration of an external signal. We show that accuracy is limited by energy consumption, suggesting that there are fundamental thermodynamic constraints on statistical inference.
Inference and the Introductory Statistics Course
ERIC Educational Resources Information Center
Pfannkuch, Maxine; Regan, Matt; Wild, Chris; Budgett, Stephanie; Forbes, Sharleen; Harraway, John; Parsonage, Ross
2011-01-01
This article sets out some of the rationale and arguments for making major changes to the teaching and learning of statistical inference in introductory courses at our universities by changing from a norm-based, mathematical approach to more conceptually accessible computer-based approaches. The core problem of the inferential argument with its…
Statistical Mechanics of Optimal Convex Inference in High Dimensions
NASA Astrophysics Data System (ADS)
Advani, Madhu; Ganguli, Surya
2016-07-01
A fundamental problem in modern high-dimensional data analysis involves efficiently inferring a set of P unknown model parameters governing the relationship between the inputs and outputs of N noisy measurements. Various methods have been proposed to regress the outputs against the inputs to recover the P parameters. What are fundamental limits on the accuracy of regression, given finite signal-to-noise ratios, limited measurements, prior information, and computational tractability requirements? How can we optimally combine prior information with measurements to achieve these limits? Classical statistics gives incisive answers to these questions as the measurement density α = N/P → ∞. However, these classical results are not relevant to modern high-dimensional inference problems, which instead occur at finite α. We employ replica theory to answer these questions for a class of inference algorithms, known in the statistics literature as M-estimators. These algorithms attempt to recover the P model parameters by solving an optimization problem involving minimizing the sum of a loss function that penalizes deviations between the data and model predictions, and a regularizer that leverages prior information about model parameters. Widely cherished algorithms like maximum likelihood (ML) and maximum a posteriori (MAP) inference arise as special cases of M-estimators. Our analysis uncovers fundamental limits on the inference accuracy of a subclass of M-estimators corresponding to computationally tractable convex optimization problems. These limits generalize classical statistical theorems like the Cramér-Rao bound to the high-dimensional setting with prior information. We further discover the optimal M-estimator for log-concave signal and noise distributions; we demonstrate that it can achieve our high-dimensional limits on inference accuracy, while ML and MAP cannot. Intriguingly, in high dimensions, these optimal algorithms become computationally simpler than
Pointwise probability reinforcements for robust statistical inference.
Frénay, Benoît; Verleysen, Michel
2014-02-01
Statistical inference using machine learning techniques may be difficult with small datasets because of abnormally frequent data (AFDs). AFDs are observations that are much more frequent in the training sample than they should be, with respect to their theoretical probability, and include e.g. outliers. Estimates of parameters tend to be biased towards models which support such data. This paper proposes to introduce pointwise probability reinforcements (PPRs): the probability of each observation is reinforced by a PPR and a regularisation allows controlling the amount of reinforcement which compensates for AFDs. The proposed solution is very generic, since it can be used to robustify any statistical inference method which can be formulated as a likelihood maximisation. Experiments show that PPRs can be easily used to tackle regression, classification and projection: models are freed from the influence of outliers. Moreover, outliers can be filtered manually since an abnormality degree is obtained for each observation.
Conditional statistical inference with multistage testing designs.
Zwitser, Robert J; Maris, Gunter
2015-03-01
In this paper it is demonstrated how statistical inference from multistage test designs can be made based on the conditional likelihood. Special attention is given to parameter estimation, as well as the evaluation of model fit. Two reasons are provided why the fit of simple measurement models is expected to be better in adaptive designs, compared to linear designs: more parameters are available for the same number of observations; and undesirable response behavior, like slipping and guessing, might be avoided owing to a better match between item difficulty and examinee proficiency. The results are illustrated with simulated data, as well as with real data.
Statistical Inference for Data Adaptive Target Parameters.
Hubbard, Alan E; Kherad-Pajouh, Sara; van der Laan, Mark J
2016-05-01
Consider one observes n i.i.d. copies of a random variable with a probability distribution that is known to be an element of a particular statistical model. In order to define our statistical target we partition the sample in V equal size sub-samples, and use this partitioning to define V splits in an estimation sample (one of the V subsamples) and corresponding complementary parameter-generating sample. For each of the V parameter-generating samples, we apply an algorithm that maps the sample to a statistical target parameter. We define our sample-split data adaptive statistical target parameter as the average of these V-sample specific target parameters. We present an estimator (and corresponding central limit theorem) of this type of data adaptive target parameter. This general methodology for generating data adaptive target parameters is demonstrated with a number of practical examples that highlight new opportunities for statistical learning from data. This new framework provides a rigorous statistical methodology for both exploratory and confirmatory analysis within the same data. Given that more research is becoming "data-driven", the theory developed within this paper provides a new impetus for a greater involvement of statistical inference into problems that are being increasingly addressed by clever, yet ad hoc pattern finding methods. To suggest such potential, and to verify the predictions of the theory, extensive simulation studies, along with a data analysis based on adaptively determined intervention rules are shown and give insight into how to structure such an approach. The results show that the data adaptive target parameter approach provides a general framework and resulting methodology for data-driven science.
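The sample-splitting construction described above can be sketched as follows. This is an illustrative assumption, not the authors' estimator: the parameter-generating algorithm here simply picks the column with the largest mean, and the data are hypothetical pure noise with ten columns.

```python
import random
from statistics import mean


def split_folds(n, v, seed=0):
    """Randomly partition the indices 0..n-1 into v folds of nearly equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[k::v] for k in range(v)]


def data_adaptive_estimate(rows, v=5, seed=0):
    """Average over folds of an estimate whose target was chosen on the complementary folds."""
    folds = split_folds(len(rows), v, seed)
    n_cols = len(rows[0])
    estimates = []
    for k, est_idx in enumerate(folds):
        gen_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        # Parameter-generating step: choose the column with the largest mean.
        chosen = max(range(n_cols), key=lambda c: mean(rows[i][c] for i in gen_idx))
        # Estimation step: estimate that column's mean on the held-out fold only.
        estimates.append(mean(rows[i][chosen] for i in est_idx))
    return mean(estimates)


rng = random.Random(42)
rows = [[rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(200)]
folds = split_folds(len(rows), 5)
split_estimate = data_adaptive_estimate(rows, v=5)
naive_estimate = max(mean(row[c] for row in rows) for c in range(10))
print(f"sample-split: {split_estimate:.3f}, naive max of column means: {naive_estimate:.3f}")
```

Because the target is chosen on data that are not used to estimate it, the sample-split estimate avoids the upward selection bias that tends to affect the naive maximum of the column means.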
Reasoning about Informal Statistical Inference: One Statistician's View
ERIC Educational Resources Information Center
Rossman, Allan J.
2008-01-01
This paper identifies key concepts and issues associated with the reasoning of informal statistical inference. I focus on key ideas of inference that I think all students should learn, including at secondary level as well as tertiary. I argue that a fundamental component of inference is to go beyond the data at hand, and I propose that statistical…
The Importance of Statistical Modeling in Data Analysis and Inference
ERIC Educational Resources Information Center
Rollins, Derrick, Sr.
2017-01-01
Statistical inference simply means to draw a conclusion based on information that comes from data. Error bars are the most commonly used tool for data analysis and inference in chemical engineering data studies. This work demonstrates, using common types of data collection studies, the importance of specifying the statistical model for sound…
Bayesian Statistical Inference for Coefficient Alpha. ACT Research Report Series.
ERIC Educational Resources Information Center
Li, Jun Corser; Woodruff, David J.
Coefficient alpha is a simple and very useful index of test reliability that is widely used in educational and psychological measurement. Classical statistical inference for coefficient alpha is well developed. This paper presents two methods for Bayesian statistical inference for a single sample alpha coefficient. An approximate analytic method…
Innovative Statistical Inference for Anomaly Detection in Hyperspectral Imagery
2004-09-01
By Dalton Rosario, Sensors and Electron Devices... ARL-TR-3339, September 2004. ...the effectiveness of both algorithms. Subject terms: hyperspectral anomaly detection, large sample theory.
Statistical Inference and Stochastic Simulation for Microrheology
2013-12-18
inference and stochastic simulation to analyze time series data from passive microrheology experiments of biofluids, especially mucus. During the time of the grant, progress was made on both
Statistical inference for serial dilution assay data.
Lee, M L; Whitmore, G A
1999-12-01
Serial dilution assays are widely employed for estimating substance concentrations and minimum inhibitory concentrations. The Poisson-Bernoulli model for such assays is appropriate for count data but not for continuous measurements that are encountered in applications involving substance concentrations. This paper presents practical inference methods based on a log-normal model and illustrates these methods using a case application involving bacterial toxins.
Verbal framing of statistical evidence drives children's preference inferences.
Garvin, Laura E; Woodward, Amanda L
2015-05-01
Although research has shown that statistical information can support children's inferences about specific psychological causes of others' behavior, previous work leaves open the question of how children interpret statistical information in more ambiguous situations. The current studies investigated the effect of specific verbal framing information on children's ability to infer mental states from statistical regularities in behavior. We found that preschool children inferred others' preferences from their statistically non-random choices only when they were provided with verbal information placing the person's behavior in a specifically preference-related context, not when the behavior was presented in a non-mentalistic action context or an intentional choice context. Furthermore, verbal framing information showed some evidence of supporting children's mental state inferences even from more ambiguous statistical data. These results highlight the role that specific, relevant framing information can play in supporting children's ability to derive novel insights from statistical information.
An argument for mechanism-based statistical inference in cancer
Ochs, Michael; Price, Nathan D.; Tomasetti, Cristian; Younes, Laurent
2015-01-01
Cancer is perhaps the prototypical systems disease, and as such has been the focus of extensive study in quantitative systems biology. However, translating these programs into personalized clinical care remains elusive and incomplete. In this perspective, we argue that realizing this agenda—in particular, predicting disease phenotypes, progression and treatment response for individuals—requires going well beyond standard computational and bioinformatics tools and algorithms. It entails designing global mathematical models over network-scale configurations of genomic states and molecular concentrations, and learning the model parameters from limited available samples of high-dimensional and integrative omics data. As such, any plausible design should accommodate: biological mechanism, necessary for both feasible learning and interpretable decision making; stochasticity, to deal with uncertainty and observed variation at many scales; and a capacity for statistical inference at the patient level. This program, which requires a close, sustained collaboration between mathematicians and biologists, is illustrated in several contexts, including learning bio-markers, metabolism, cell signaling, network inference and tumorigenesis. PMID:25381197
An argument for mechanism-based statistical inference in cancer.
Geman, Donald; Ochs, Michael; Price, Nathan D; Tomasetti, Cristian; Younes, Laurent
2015-05-01
Cancer is perhaps the prototypical systems disease, and as such has been the focus of extensive study in quantitative systems biology. However, translating these programs into personalized clinical care remains elusive and incomplete. In this perspective, we argue that realizing this agenda—in particular, predicting disease phenotypes, progression and treatment response for individuals—requires going well beyond standard computational and bioinformatics tools and algorithms. It entails designing global mathematical models over network-scale configurations of genomic states and molecular concentrations, and learning the model parameters from limited available samples of high-dimensional and integrative omics data. As such, any plausible design should accommodate: biological mechanism, necessary for both feasible learning and interpretable decision making; stochasticity, to deal with uncertainty and observed variation at many scales; and a capacity for statistical inference at the patient level. This program, which requires a close, sustained collaboration between mathematicians and biologists, is illustrated in several contexts, including learning biomarkers, metabolism, cell signaling, network inference and tumorigenesis.
Combining statistical inference and decisions in ecology.
Williams, Perry J; Hooten, Mevin B
2016-09-01
Statistical decision theory (SDT) is a sub-field of decision theory that formally incorporates statistical investigation into a decision-theoretic framework to account for uncertainties in a decision problem. SDT provides a unifying analysis of three types of information: statistical results from a data set, knowledge of the consequences of potential choices (i.e., loss), and prior beliefs about a system. SDT links the theoretical development of a large body of statistical methods, including point estimation, hypothesis testing, and confidence interval estimation. The theory and application of SDT have mainly been developed and published in the fields of mathematics, statistics, operations research, and other decision sciences, but have had limited exposure in ecology. Thus, we provide an introduction to SDT for ecologists and describe its utility for linking the conventionally separate tasks of statistical investigation and decision making in a single framework. We describe the basic framework of both Bayesian and frequentist SDT, its traditional use in statistics, and discuss its application to decision problems that occur in ecology. We demonstrate SDT with two types of decisions: Bayesian point estimation and an applied management problem of selecting a prescribed fire rotation for managing a grassland bird species. Central to SDT, and decision theory in general, are loss functions. Thus, we also provide basic guidance and references for constructing loss functions for an SDT problem. © 2016 by the Ecological Society of America.
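The role of the loss function in Bayesian point estimation can be illustrated with a small sketch. It assumes we already have draws from a posterior distribution (here a gamma sample stands in for one, which is purely an assumption for illustration) and picks the action that minimizes the posterior expected loss over a grid of candidates. The function names are hypothetical.

```python
import random
import statistics


def expected_loss(action, posterior_draws, loss):
    """Monte Carlo estimate of the posterior expected loss of one action."""
    return statistics.fmean(loss(theta, action) for theta in posterior_draws)


def bayes_action(candidates, posterior_draws, loss):
    """Candidate action with the smallest posterior expected loss."""
    return min(candidates, key=lambda a: expected_loss(a, posterior_draws, loss))


def squared_error(theta, action):
    return (theta - action) ** 2


def absolute_error(theta, action):
    return abs(theta - action)


rng = random.Random(1)
# Stand-in for posterior draws of a positive parameter (right-skewed on purpose).
draws = [rng.gammavariate(2.0, 1.0) for _ in range(1000)]
grid = [i / 50 for i in range(401)]  # candidate point estimates 0.00, 0.02, ..., 8.00

estimate_squared = bayes_action(grid, draws, squared_error)    # close to the posterior mean
estimate_absolute = bayes_action(grid, draws, absolute_error)  # close to the posterior median
```

With a skewed posterior the two losses give visibly different point estimates, which is why the choice of loss is part of the decision problem rather than a technical afterthought.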
Statistical Inference and Patterns of Inequality in the Global North
ERIC Educational Resources Information Center
Moran, Timothy Patrick
2006-01-01
Cross-national inequality trends have historically been a crucial field of inquiry across the social sciences, and new methodological techniques of statistical inference have recently improved the ability to analyze these trends over time. This paper applies Monte Carlo, bootstrap inference methods to the income surveys of the Luxembourg Income…
Nuclear Forensic Inferences Using Iterative Multidimensional Statistics
Robel, M; Kristo, M J; Heller, M A
2009-06-09
Nuclear forensics involves the analysis of interdicted nuclear material for specific material characteristics (referred to as 'signatures') that imply specific geographical locations, production processes, culprit intentions, etc. Predictive signatures rely on expert knowledge of physics, chemistry, and engineering to develop inferences from these material characteristics. Comparative signatures, on the other hand, rely on comparison of the material characteristics of the interdicted sample (the 'questioned sample' in FBI parlance) with those of a set of known samples. In the ideal case, the set of known samples would be a comprehensive nuclear forensics database, a database which does not currently exist. In fact, our ability to analyze interdicted samples and produce an extensive list of precise material characteristics far exceeds our ability to interpret the results. Therefore, as we seek to develop the extensive databases necessary for nuclear forensics, we must also develop the methods needed to draw inferences from comparison of our analytical results with these large, multidimensional sets of data. In the work reported here, we used a large, multidimensional dataset of results from quality control analyses of uranium ore concentrate (UOC, sometimes called 'yellowcake'). We have found that traditional multidimensional techniques, such as principal components analysis (PCA), are especially useful for understanding such datasets and drawing relevant conclusions. In particular, we have developed an iterative partial least squares-discriminant analysis (PLS-DA) procedure that has proven especially adept at identifying the production location of unknown UOC samples. By removing classes which fell far outside the initial decision boundary, and then rebuilding the PLS-DA model, we have consistently produced better and more definitive attributions than with a single pass classification approach. Performance of the iterative PLS-DA method
Statistical inference for tumor growth inhibition T/C ratio.
Wu, Jianrong
2010-09-01
The tumor growth inhibition T/C ratio is commonly used to quantify treatment effects in drug screening tumor xenograft experiments. The T/C ratio is converted to an antitumor activity rating using an arbitrary cutoff point and often without any formal statistical inference. Here, we applied a nonparametric bootstrap method and a small sample likelihood ratio statistic to make a statistical inference of the T/C ratio, including both hypothesis testing and a confidence interval estimate. Furthermore, sample size and power are also discussed for statistical design of tumor xenograft experiments. Tumor xenograft data from an actual experiment were analyzed to illustrate the application.
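A nonparametric bootstrap interval for a T/C ratio can be sketched as follows. This is a minimal percentile-bootstrap illustration, assuming the ratio is defined as mean treated tumor volume over mean control tumor volume; the paper's exact definition and its small-sample likelihood ratio statistic are not reproduced. The tumor volumes below are made-up numbers for illustration only.

```python
import random
import statistics


def tc_ratio(treated, control):
    """Assumed definition: mean treated volume divided by mean control volume."""
    return statistics.fmean(treated) / statistics.fmean(control)


def bootstrap_ci(treated, control, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval, resampling each group independently."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        t = rng.choices(treated, k=len(treated))   # resample with replacement
        c = rng.choices(control, k=len(control))
        ratios.append(tc_ratio(t, c))
    ratios.sort()
    lower = ratios[int((1 - level) / 2 * n_boot)]
    upper = ratios[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper


# Hypothetical tumor volumes (mm^3) at the end of a xenograft experiment.
treated_volumes = [310, 345, 420, 380, 300, 460, 500, 395]
control_volumes = [980, 1120, 900, 1300, 1050, 1210, 940, 1175]

point_estimate = tc_ratio(treated_volumes, control_volumes)
ci_lower, ci_upper = bootstrap_ci(treated_volumes, control_volumes)
```

A test of the hypothesis that the ratio exceeds some cutoff can then be read off from whether the cutoff lies inside the interval, instead of comparing the point estimate to an arbitrary threshold.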
Statistical Manual. Methods of Making Experimental Inferences
1951-06-01
statistical procedures. The body of the manual presents each procedure in "cook-book" style, first in general outline, then in illustration. The... illustrations are drawn in the main from the engineering sciences, though it is recognized that many of the methods originated in other fields. This work...significant error. To illustrate in terms of the simple experiment of the inclined plane: a) if an error greater than 0.040 seconds is significant, but
Simultaneous statistical inference for epigenetic data.
Schildknecht, Konstantin; Olek, Sven; Dickhaus, Thorsten
2015-01-01
Epigenetic research leads to complex data structures. Since parametric model assumptions for the distribution of epigenetic data are hard to verify we introduce in the present work a nonparametric statistical framework for two-group comparisons. Furthermore, epigenetic analyses are often performed at various genetic loci simultaneously. Hence, in order to be able to draw valid conclusions for specific loci, an appropriate multiple testing correction is necessary. Finally, with technologies available for the simultaneous assessment of many interrelated biological parameters (such as gene arrays), statistical approaches also need to deal with a possibly unknown dependency structure in the data. Our statistical approach to the nonparametric comparison of two samples with independent multivariate observables is based on recently developed multivariate multiple permutation tests. We adapt their theory in order to cope with families of hypotheses regarding relative effects. Our results indicate that the multivariate multiple permutation test keeps the pre-assigned type I error level for the global null hypothesis. In combination with the closure principle, the family-wise error rate for the simultaneous test of the corresponding locus/parameter-specific null hypotheses can be controlled. In applications we demonstrate that group differences in epigenetic data can be detected reliably with our methodology.
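The general idea of a permutation test on relative effects across several loci can be sketched as below. This is a generic max-statistic permutation test and is not the authors' procedure: the relative effect is estimated as P(X < Y) + 0.5 P(X = Y), whole observation vectors are permuted between groups so that the dependence between loci is preserved, and the maximum deviation from 0.5 across loci serves as the global test statistic. All function names are hypothetical.

```python
import random


def relative_effect(x, y):
    """Estimate P(X < Y) + 0.5 * P(X = Y) from two samples."""
    wins = sum((a < b) + 0.5 * (a == b) for a in x for b in y)
    return wins / (len(x) * len(y))


def max_stat(group_a, group_b):
    """Largest deviation of the relative effect from 0.5 over all loci (columns)."""
    n_loci = len(group_a[0])
    return max(
        abs(relative_effect([r[j] for r in group_a], [r[j] for r in group_b]) - 0.5)
        for j in range(n_loci)
    )


def permutation_pvalue(group_a, group_b, n_perm=1000, seed=0):
    """Global p-value obtained by permuting whole rows between the two groups."""
    rng = random.Random(seed)
    observed = max_stat(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # rows stay intact, so cross-locus dependence is kept
        if max_stat(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Locus-specific conclusions with family-wise error control would need an additional step such as the closure principle mentioned in the abstract; this sketch covers only the global null hypothesis.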
Introducing Statistical Inference to Biology Students through Bootstrapping and Randomization
ERIC Educational Resources Information Center
Lock, Robin H.; Lock, Patti Frazer
2008-01-01
Bootstrap methods and randomization tests are increasingly being used as alternatives to standard statistical procedures in biology. They also serve as an effective introduction to the key ideas of statistical inference in introductory courses for biology students. We discuss the use of such simulation based procedures in an integrated curriculum…
Unequal Division of Type I Risk in Statistical Inferences
ERIC Educational Resources Information Center
Meek, Gary E.; Ozgur, Ceyhun O.
2004-01-01
Introductory statistics texts give extensive coverage to two-sided inferences in hypothesis testing, interval estimation, and one-sided hypothesis tests. Very few discuss the possibility of one-sided interval estimation at all. Even fewer do so in any detail. Two of the business statistics texts we reviewed mentioned the possibility of dividing…
Inferring Demographic History Using Two-Locus Statistics.
Ragsdale, Aaron P; Gutenkunst, Ryan N
2017-06-01
Population demographic history may be learned from contemporary genetic variation data. Methods based on aggregating the statistics of many single loci into an allele frequency spectrum (AFS) have proven powerful, but such methods ignore potentially informative patterns of linkage disequilibrium (LD) between neighboring loci. To leverage such patterns, we developed a composite-likelihood framework for inferring demographic history from aggregated statistics of pairs of loci. Using this framework, we show that two-locus statistics are more sensitive to demographic history than single-locus statistics such as the AFS. In particular, two-locus statistics escape the notorious confounding of depth and duration of a bottleneck, and they provide a means to estimate effective population size based on the recombination rather than mutation rate. We applied our approach to a Zambian population of Drosophila melanogaster. Notably, using both single- and two-locus statistics, we inferred a substantially lower ancestral effective population size than previous works and did not infer a bottleneck history. Together, our results demonstrate the broad potential for two-locus statistics to enable powerful population genetic inference. Copyright © 2017 by the Genetics Society of America.
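For readers unfamiliar with linkage disequilibrium, the basic pairwise quantities can be computed from haplotype counts as in the sketch below. It shows only the standard coefficient D and the squared correlation r² for one pair of biallelic loci; the paper's composite-likelihood framework works with richer aggregated two-locus statistics that are not reproduced here. The function and argument names are hypothetical.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r^2) for two biallelic loci from the four haplotype counts."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n            # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / n    # frequency of allele A at the first locus
    p_B = (n_AB + n_aB) / n    # frequency of allele B at the second locus
    d = p_AB - p_A * p_B       # deviation from independence between the loci
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else float("nan")  # undefined if a locus is fixed
    return d, r2
```

For example, `ld_stats(50, 0, 0, 50)` describes two loci in complete association, while `ld_stats(25, 25, 25, 25)` describes two independent loci.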
Computationally Efficient Composite Likelihood Statistics for Demographic Inference.
Coffman, Alec J; Hsieh, Ping Hsun; Gravel, Simon; Gutenkunst, Ryan N
2016-02-01
Many population genetics tools employ composite likelihoods, because fully modeling genomic linkage is challenging. But traditional approaches to estimating parameter uncertainties and performing model selection require full likelihoods, so these tools have relied on computationally expensive maximum-likelihood estimation (MLE) on bootstrapped data. Here, we demonstrate that statistical theory can be applied to adjust composite likelihoods and perform robust computationally efficient statistical inference in two demographic inference tools: ∂a∂i and TRACTS. On both simulated and real data, the adjustments perform comparably to MLE bootstrapping while using orders of magnitude less computational time.
LOWER LEVEL INFERENCE CONTROL IN STATISTICAL DATABASE SYSTEMS
Lipton, D.L.; Wong, H.K.T.
1984-02-01
An inference is the process of transforming unclassified data values into confidential data values. Most previous research in inference control has studied the use of statistical aggregates to deduce individual records. However, several other types of inference are also possible. Unknown functional dependencies may be apparent to users who have 'expert' knowledge about the characteristics of a population. Some correlations between attributes may be concluded from 'commonly-known' facts about the world. To counter these threats, security managers should use random sampling of databases of similar populations, as well as expert systems. 'Expert' users of the database system may form inferences from the variable performance of the user interface. Users may observe on-line turn-around time, accounting statistics, the error message received, and the point at which an interactive protocol sequence fails. One may obtain information about the frequency distributions of attribute values, and the validity of data object names, from this information. At the back-end of a database system, improved software engineering practices will reduce opportunities to bypass functional units of the database system. The term 'data object' should be expanded to incorporate these data object types which generate new classes of threats. The security of databases and database systems must be recognized as separate but related problems. Thus, by increased awareness of lower level inferences, system security managers may effectively nullify the threat posed by lower level inferences.
The Philosophical Foundations of Prescriptive Statements and Statistical Inference
ERIC Educational Resources Information Center
Sun, Shuyan; Pan, Wei
2011-01-01
From the perspectives of the philosophy of science and statistical inference, we discuss the challenges of making prescriptive statements in quantitative research articles. We first consider the prescriptive nature of educational research and argue that prescriptive statements are a necessity in educational research. The logic of deduction,…
A Framework for Thinking about Informal Statistical Inference
ERIC Educational Resources Information Center
Makar, Katie; Rubin, Andee
2009-01-01
Informal inferential reasoning has shown some promise in developing students' deeper understanding of statistical processes. This paper presents a framework to think about three key principles of informal inference--generalizations "beyond the data," probabilistic language, and data as evidence. The authors use primary school classroom…
Targeted estimation of nuisance parameters to obtain valid statistical inference.
van der Laan, Mark J
2014-01-01
In order to obtain concrete results, we focus on estimation of the treatment specific mean, controlling for all measured baseline covariates, based on observing independent and identically distributed copies of a random variable consisting of baseline covariates, a subsequently assigned binary treatment, and a final outcome. The statistical model only assumes possible restrictions on the conditional distribution of treatment, given the covariates, the so-called propensity score. Estimators of the treatment specific mean involve estimation of the propensity score and/or estimation of the conditional mean of the outcome, given the treatment and covariates. In order to make these estimators asymptotically unbiased at any data distribution in the statistical model, it is essential to use data-adaptive estimators of these nuisance parameters such as ensemble learning, and specifically super-learning. Because such estimators involve optimal trade-off of bias and variance w.r.t. the infinite dimensional nuisance parameter itself, they result in a sub-optimal bias/variance trade-off for the resulting real-valued estimator of the estimand. We demonstrate that additional targeting of the estimators of these nuisance parameters guarantees that this bias for the estimand is second order and thereby allows us to prove theorems that establish asymptotic linearity of the estimator of the treatment specific mean under regularity conditions. These insights result in novel targeted minimum loss-based estimators (TMLEs) that use ensemble learning with additional targeted bias reduction to construct estimators of the nuisance parameters. In particular, we construct collaborative TMLEs (C-TMLEs) with known influence curve allowing for statistical inference, even though these C-TMLEs involve variable selection for the propensity score based on a criterion that measures how effective the resulting fit of the propensity score is in removing bias for the estimand. As a particular special
Statistical inference in behavior analysis: Experimental control is better
Perone, Michael
1999-01-01
Statistical inference promises automatic, objective, reliable assessments of data, independent of the skills or biases of the investigator, whereas the single-subject methods favored by behavior analysts often are said to rely too much on the investigator's subjective impressions, particularly in the visual analysis of data. In fact, conventional statistical methods are difficult to apply correctly, even by experts, and the underlying logic of null-hypothesis testing has drawn criticism since its inception. By comparison, single-subject methods foster direct, continuous interaction between investigator and subject and development of strong forms of experimental control that obviate the need for statistical inference. Treatment effects are demonstrated in experimental designs that incorporate replication within and between subjects, and the visual analysis of data is adequate when integrated into such designs. Thus, single-subject methods are ideal for shaping—and maintaining—the kind of experimental practices that will ensure the continued success of behavior analysis. PMID:22478328
Statistical detection of EEG synchrony using empirical bayesian inference.
Singh, Archana K; Asoh, Hideki; Takeda, Yuji; Phillips, Steven
2015-01-01
There is growing interest in understanding how the brain utilizes synchronized oscillatory activity to integrate information across functionally connected regions. Computing phase-locking values (PLV) between EEG signals is a popular method for quantifying such synchronizations and elucidating their role in cognitive tasks. However, the high dimensionality of PLV data incurs a serious multiple testing problem. Standard multiple testing methods in neuroimaging research (e.g., false discovery rate, FDR) suffer severe loss of power, because they fail to exploit the complex dependence structure between hypotheses that vary in spectral, temporal and spatial dimension. Previously, we showed that a hierarchical FDR and optimal discovery procedures could be effectively applied for PLV analysis to provide better power than FDR. In this article, we revisit the multiple comparison problem from a new Empirical Bayes perspective and propose the application of the local FDR method (locFDR; Efron, 2001) for PLV synchrony analysis to compute FDR as a posterior probability that an observed statistic belongs to a null hypothesis. We demonstrate the application of Efron's Empirical Bayes approach for PLV synchrony analysis for the first time. We use simulations to validate the specificity and sensitivity of locFDR and a real EEG dataset from a visual search study for experimental validation. We also compare locFDR with hierarchical FDR and optimal discovery procedures in both simulation and experimental analyses. Our simulation results showed that the locFDR can effectively control false positives without compromising on the power of PLV synchrony inference. Applied to the experimental data, locFDR detected more significant discoveries than our previously proposed methods, whereas the standard FDR method failed to detect any significant discoveries.
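The phase-locking value itself is simple to compute once instantaneous phases are available. The sketch below uses the common definition, the magnitude of the average unit phasor of the phase difference, and assumes the phases (in radians) have already been extracted from the two signals, for instance with a Hilbert or wavelet transform. It does not cover the locFDR step, and the function name is hypothetical.

```python
import cmath
import math


def phase_locking_value(phases_a, phases_b):
    """PLV in [0, 1]: magnitude of the mean unit phasor of the phase difference."""
    unit_vectors = [cmath.exp(1j * (pa - pb)) for pa, pb in zip(phases_a, phases_b)]
    return abs(sum(unit_vectors) / len(unit_vectors))
```

A value near 1 means the phase difference is nearly constant across samples, and a value near 0 means the phase differences are spread around the circle. One such value per channel pair, frequency band and time window is what creates the large multiple testing problem the abstract refers to.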
Quantitative evaluation of statistical inference in resting state functional MRI.
Yang, Xue; Kang, Hakmook; Newton, Allen; Landman, Bennett A
2012-01-01
Modern statistical inference techniques may be able to improve the sensitivity and specificity of resting state functional MRI (rs-fMRI) connectivity analysis through more realistic characterization of distributional assumptions. In simulation, the advantages of such modern methods are readily demonstrable. However, quantitative empirical validation remains elusive in vivo, as the true connectivity patterns are unknown and noise/artifact distributions are challenging to characterize with high fidelity. Recent innovations in capturing finite sample behavior of asymptotically consistent estimators (i.e., SIMulation and EXtrapolation - SIMEX) have enabled direct estimation of bias given single datasets. Herein, we leverage the theoretical core of SIMEX to study the properties of inference methods in the face of diminishing data (in contrast to increasing noise). The stability of inference methods with respect to synthetic loss of empirical data (defined as resilience) is used to quantify the empirical performance of one inference method relative to another. We illustrate this new approach in a comparison of ordinary and robust inference methods with rs-fMRI.
The NIRS Analysis Package: noise reduction and statistical inference.
Fekete, Tomer; Rubin, Denis; Carlson, Joshua M; Mujica-Parodi, Lilianne R
2011-01-01
Near infrared spectroscopy (NIRS) is a non-invasive optical imaging technique that can be used to measure cortical hemodynamic responses to specific stimuli or tasks. While analyses of NIRS data are normally adapted from established fMRI techniques, there are nevertheless substantial differences between the two modalities. Here, we investigate the impact of NIRS-specific noise; e.g., systemic (physiological), motion-related artifacts, and serial autocorrelations, upon the validity of statistical inference within the framework of the general linear model. We present a comprehensive framework for noise reduction and statistical inference, which is custom-tailored to the noise characteristics of NIRS. These methods have been implemented in a public domain Matlab toolbox, the NIRS Analysis Package (NAP). Finally, we validate NAP using both simulated and actual data, showing marked improvement in the detection power and reliability of NIRS.
Statistical inference for extinction rates based on last sightings.
Nakamura, Miguel; Del Monte-Luna, Pablo; Lluch-Belda, Daniel; Lluch-Cota, Salvador E
2013-09-21
Rates of extinction can be estimated from sighting records and are assumed to be implicitly constant by many data analysis methods. However, historical sightings are scarce. Frequently, the only information available for inferring extinction is the date of the last sighting. In this study, we developed a probabilistic model and a corresponding statistical inference procedure based on last sightings. We applied this procedure to data on recent marine extirpations and extinctions, seeking to test the null hypothesis of a constant extinction rate. We found that over the past 500 years extirpations in the ocean have been increasing but at an uncertain rate, whereas a constant rate of global marine extinctions is statistically plausible. The small sample sizes of marine extinction records generate such high uncertainty that different combinations of model inputs can yield different outputs that fit the observed data equally well. Thus, current marine extinction trends may be idiosyncratic. Copyright © 2013 Elsevier Ltd. All rights reserved.
Two dimensional unstable scar statistics.
Warne, Larry Kevin; Jorgenson, Roy Eberhardt; Kotulski, Joseph Daniel; Lee, Kelvin S. H. (ITT Industries/AES Los Angeles, CA)
2006-12-01
This report examines the localization of time harmonic high frequency modal fields in two dimensional cavities along periodic paths between opposing sides of the cavity. The cases where these orbits lead to unstable localized modes are known as scars. This paper examines the enhancements for these unstable orbits when the opposing mirrors are both convex and concave. In the latter case the construction includes the treatment of interior foci.
Statistical inference for noisy nonlinear ecological dynamic systems.
Wood, Simon N
2010-08-26
Chaotic ecological dynamic systems defy conventional statistical analysis. Systems with near-chaotic dynamics are little better. Such systems are almost invariably driven by endogenous dynamic processes plus demographic and environmental process noise, and are only observable with error. Their sensitivity to history means that minute changes in the driving noise realization, or the system parameters, will cause drastic changes in the system trajectory. This sensitivity is inherited and amplified by the joint probability density of the observable data and the process noise, rendering it useless as the basis for obtaining measures of statistical fit. Because the joint density is the basis for the fit measures used by all conventional statistical methods, this is a major theoretical shortcoming. The inability to make well-founded statistical inferences about biological dynamic models in the chaotic and near-chaotic regimes, other than on an ad hoc basis, leaves dynamic theory without the methods of quantitative validation that are essential tools in the rest of biological science. Here I show that this impasse can be resolved in a simple and general manner, using a method that requires only the ability to simulate the observed data on a system from the dynamic model about which inferences are required. The raw data series are reduced to phase-insensitive summary statistics, quantifying local dynamic structure and the distribution of observations. Simulation is used to obtain the mean and the covariance matrix of the statistics, given model parameters, allowing the construction of a 'synthetic likelihood' that assesses model fit. This likelihood can be explored using a straightforward Markov chain Monte Carlo sampler, but one further post-processing step returns pure likelihood-based inference. I apply the method to establish the dynamic nature of the fluctuations in Nicholson's classic blowfly experiments.
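The synthetic-likelihood recipe described above (simulate, reduce to summary statistics, fit a Gaussian to the simulated statistics, evaluate the observed statistics under it) can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the simulator `simulate` is a hypothetical noisy Ricker-type map, the two summary statistics (mean and lag-1 autocovariance) are chosen only for brevity, and all function names are invented for this example.

```python
import math
import random

def simulate(theta, n=200, rng=None):
    """Hypothetical toy simulator: noisy Ricker-type map observed with error."""
    rng = rng or random.Random()
    x, out = 1.0, []
    for _ in range(n):
        # Process noise enters the growth rate; observation noise is added on top.
        x = x * math.exp(theta * (1.0 - x) + rng.gauss(0.0, 0.1))
        out.append(x + rng.gauss(0.0, 0.05))
    return out

def summaries(y):
    """Two phase-insensitive summaries (an assumption): mean and lag-1 autocovariance."""
    n = len(y)
    m = sum(y) / n
    ac1 = sum((y[i] - m) * (y[i + 1] - m) for i in range(n - 1)) / (n - 1)
    return (m, ac1)

def synthetic_loglik(theta, s_obs, n_sim=100, seed=0):
    """Gaussian log-density of the observed summaries, with mean and
    covariance estimated from n_sim simulated data sets at theta."""
    rng = random.Random(seed)
    sims = [summaries(simulate(theta, rng=rng)) for _ in range(n_sim)]
    mu = [sum(s[k] for s in sims) / n_sim for k in range(2)]

    def cov(a, b):
        return sum((s[a] - mu[a]) * (s[b] - mu[b]) for s in sims) / (n_sim - 1)

    c00, c01, c11 = cov(0, 0), cov(0, 1), cov(1, 1)
    det = c00 * c11 - c01 * c01
    d0, d1 = s_obs[0] - mu[0], s_obs[1] - mu[1]
    # Quadratic form d^T C^{-1} d, using the explicit 2x2 inverse.
    quad = (c11 * d0 * d0 - 2.0 * c01 * d0 * d1 + c00 * d1 * d1) / det
    return -0.5 * quad - 0.5 * math.log(det) - math.log(2.0 * math.pi)

# "Observed" data generated at theta = 1.5 for illustration only.
s_obs = summaries(simulate(1.5, rng=random.Random(42)))
ll_near = synthetic_loglik(1.5, s_obs)
ll_far = synthetic_loglik(2.8, s_obs)
```

In practice a function like `synthetic_loglik` would be called inside an MCMC sampler over `theta`; the fixed seed here simply makes repeated evaluations reproducible.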
Breakdown of statistical inference from some random experiments
NASA Astrophysics Data System (ADS)
Kupczynski, Marian; De Raedt, Hans
2016-03-01
Many experiments can be interpreted in terms of random processes operating according to some internal protocols. When experiments are costly or cannot be repeated only one or a few finite samples are available. In this paper we study data generated by pseudo-random computer experiments operating according to particular internal protocols. We show that the standard statistical analysis performed on a sample, containing 10^5 data points or more, may sometimes be highly misleading and statistical errors largely underestimated. Our results confirm in a dramatic way the dangers of standard asymptotic statistical inference if a sample is not homogeneous. We demonstrate that analyzing various subdivisions of samples by multiple chi-square tests and chi-square frequency graphs is very effective in detecting sample inhomogeneity. Therefore, to assure correctness of the statistical inference, the above mentioned chi-square tests and other non-parametric sample homogeneity tests should be incorporated in any statistical analysis of experimental data. If such tests are not performed the reported conclusions and estimates of the errors cannot be trusted.
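The idea of checking a sample for inhomogeneity by comparing its subdivisions with a chi-square test can be illustrated with a short sketch. It is an assumption-laden example rather than the procedure used in the paper: the sample is cut into consecutive blocks, values are binned on a fixed interval, and the Pearson chi-square statistic of the resulting blocks-by-bins table is compared with an approximate critical value (Wilson-Hilferty approximation). Function names, block counts, and bin counts are all hypothetical.

```python
import math
import random

def chi_square_homogeneity(sample, n_blocks=4, n_bins=5, lo=0.0, hi=1.0):
    """Pearson chi-square statistic for homogeneity of consecutive blocks.
    Assumes the data lie in [lo, hi]. Returns (statistic, degrees_of_freedom)."""
    size = len(sample) // n_blocks
    width = (hi - lo) / n_bins
    table = []
    for b in range(n_blocks):
        counts = [0] * n_bins
        for x in sample[b * size:(b + 1) * size]:
            j = min(max(int((x - lo) / width), 0), n_bins - 1)
            counts[j] += 1
        table.append(counts)
    col_totals = [sum(row[j] for row in table) for j in range(n_bins)]
    total = sum(col_totals)
    stat = 0.0
    for row in table:
        row_total = sum(row)
        for j in range(n_bins):
            expected = row_total * col_totals[j] / total
            if expected > 0:
                stat += (row[j] - expected) ** 2 / expected
    return stat, (n_blocks - 1) * (n_bins - 1)

def chi2_critical(df, z=2.326):
    """Approximate upper quantile of chi-square (Wilson-Hilferty);
    z = 2.326 corresponds roughly to the 99th percentile."""
    return df * (1.0 - 2.0 / (9.0 * df) + z * math.sqrt(2.0 / (9.0 * df))) ** 3

rng = random.Random(1)
homogeneous = [rng.random() for _ in range(4000)]
# Inhomogeneous sample: the generating protocol changes halfway through.
mixed = [rng.random() for _ in range(2000)] + [rng.random() ** 2 for _ in range(2000)]

stat_ok, df = chi_square_homogeneity(homogeneous)
stat_bad, _ = chi_square_homogeneity(mixed)
```

A statistic far above `chi2_critical(df)`, as expected for `mixed`, signals that the blocks do not share one distribution, so pooled error estimates should not be trusted.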
Gene regulatory network inference using out of equilibrium statistical mechanics
Benecke, Arndt
2008-01-01
Spatiotemporal control of gene expression is fundamental to multicellular life. Despite prodigious efforts, the encoding of gene expression regulation in eukaryotes is not understood. Gene expression analyses nourish the hope to reverse engineer effector-target gene networks using inference techniques. Inference from noisy and circumstantial data relies on using robust models with few parameters for the underlying mechanisms. However, a systematic path to gene regulatory network reverse engineering from functional genomics data is still impeded by fundamental problems. Recently, Johannes Berg from the Theoretical Physics Institute of Cologne University has made two remarkable contributions that significantly advance the gene regulatory network inference problem. Berg, who uses gene expression data from yeast, has demonstrated a nonequilibrium regime for mRNA concentration dynamics and was able to map the gene regulatory process upon simple stochastic systems driven out of equilibrium. The impact of his demonstration is twofold, affecting both the understanding of the operational constraints under which transcription occurs and the capacity to extract relevant information from highly time-resolved expression data. Berg has used his observation to predict target genes of selected transcription factors, and thereby, in principle, demonstrated applicability of his out of equilibrium statistical mechanics approach to the gene network inference problem. PMID:19404429
Indirect Fourier transform in the context of statistical inference.
Muthig, Michael; Prévost, Sylvain; Orglmeister, Reinhold; Gradzielski, Michael
2016-09-01
Inferring structural information from the intensity of a small-angle scattering (SAS) experiment is an ill-posed inverse problem. Thus, the determination of a solution is in general non-trivial. In this work, the indirect Fourier transform (IFT), which determines the pair distance distribution function from the intensity and hence yields structural information, is discussed within two different statistical inference approaches, namely a frequentist one and a Bayesian one, in order to determine a solution objectively. From the frequentist approach the cross-validation method is obtained as a good practical objective function for selecting an IFT solution. Moreover, modern machine learning methods are employed to suppress oscillatory behaviour of the solution, hence extracting only meaningful features of the solution. By comparing the results yielded by the different methods presented here, the reliability of the outcome can be improved and thus the approach should enable more reliable information to be deduced from SAS experiments.
A model independent safeguard against background mismodeling for statistical inference
NASA Astrophysics Data System (ADS)
Priel, Nadav; Rauch, Ludwig; Landsman, Hagar; Manfredini, Alessandro; Budnik, Ranny
2017-05-01
We propose a safeguard procedure for statistical inference that provides universal protection against mismodeling of the background. The method quantifies and incorporates the signal-like residuals of the background model into the likelihood function, using information available in a calibration dataset. This prevents possible false discovery claims that may arise through unknown mismodeling, and corrects the bias in limit setting created by overestimated or underestimated background. We demonstrate how the method removes the bias created by an incomplete background model using three realistic case studies.
Statistical Inference for Big Data Problems in Molecular Biophysics
Ramanathan, Arvind; Savol, Andrej; Burger, Virginia; Quinn, Shannon; Agarwal, Pratul K; Chennubhotla, Chakra
2012-01-01
We highlight the role of statistical inference techniques in providing biological insights from analyzing long time-scale molecular simulation data. Technological and algorithmic improvements in computation have brought molecular simulations to the forefront of techniques applied to investigating the basis of living systems. While these longer simulations, increasingly complex reaching petabyte scales presently, promise a detailed view into microscopic behavior, teasing out the important information has now become a true challenge on its own. Mining this data for important patterns is critical to automating therapeutic intervention discovery, improving protein design, and fundamentally understanding the mechanistic basis of cellular homeostasis.
Statistical inference involving binomial and negative binomial parameters.
García-Pérez, Miguel A; Núñez-Antón, Vicente
2009-05-01
Statistical inference about two binomial parameters implies that they are both estimated by binomial sampling. There are occasions in which one aims at testing the equality of two binomial parameters before and after the occurrence of the first success along a sequence of Bernoulli trials. In these cases, the binomial parameter before the first success is estimated by negative binomial sampling whereas that after the first success is estimated by binomial sampling, and both estimates are related. This paper derives statistical tools to test two hypotheses, namely, that both binomial parameters equal some specified value and that both parameters are equal though unknown. Simulation studies are used to show that in small samples both tests are accurate in keeping the nominal Type-I error rates, and also to determine sample size requirements to detect large, medium, and small effects with adequate power. Additional simulations also show that the tests are sufficiently robust to certain violations of their assumptions.
Statistical inference to advance network models in epidemiology.
Welch, David; Bansal, Shweta; Hunter, David R
2011-03-01
Contact networks are playing an increasingly important role in the study of epidemiology. Most of the existing work in this area has focused on considering the effect of underlying network structure on epidemic dynamics by using tools from probability theory and computer simulation. This work has provided much insight into the role that heterogeneity in host contact patterns plays in infectious disease dynamics. Despite the important understanding afforded by the probability and simulation paradigm, this approach does not directly address important questions about the structure of contact networks such as what is the best network model for a particular mode of disease transmission, how parameter values of a given model should be estimated, or how precisely the data allow us to estimate these parameter values. We argue that these questions are best answered within a statistical framework and discuss the role of statistical inference in estimating contact networks from epidemiological data.
Online Updating of Statistical Inference in the Big Data Setting.
Schifano, Elizabeth D; Wu, Jing; Wang, Chun; Yan, Jun; Chen, Ming-Hui
2016-01-01
We present statistical methods for big data arising from online analytical processing, where large amounts of data arrive in streams and require fast analysis without storage/access to the historical data. In particular, we develop iterative estimating algorithms and statistical inferences for linear models and estimating equations that update as new data arrive. These algorithms are computationally efficient, minimally storage-intensive, and allow for possible rank deficiencies in the subset design matrices due to rare-event covariates. Within the linear model setting, the proposed online-updating framework leads to predictive residual tests that can be used to assess the goodness-of-fit of the hypothesized model. We also propose a new online-updating estimator under the estimating equation setting. Theoretical properties of the goodness-of-fit tests and proposed estimators are examined in detail. In simulation studies and real data applications, our estimator compares favorably with competing approaches under the estimating equation setting.
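To make the idea of updating a linear-model fit as data arrive in blocks, without storing historical data, more concrete, here is a minimal sketch. It is not the authors' estimator (which also covers estimating equations, rank-deficient blocks, and goodness-of-fit tests); it only shows the cumulative sufficient-statistics principle for a one-covariate least-squares fit. The class name and interface are hypothetical.

```python
class OnlineSimpleRegression:
    """Least-squares fit of y = a + b*x that keeps only running sums,
    so each block can be discarded after it has been processed."""

    def __init__(self):
        self.n = 0
        self.sx = 0.0
        self.sy = 0.0
        self.sxx = 0.0
        self.sxy = 0.0

    def update(self, xs, ys):
        """Fold one newly arrived block of observations into the running sums."""
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y

    def coefficients(self):
        """Return (intercept, slope) based on all data seen so far."""
        denom = self.n * self.sxx - self.sx ** 2
        if denom == 0:
            raise ValueError("covariate has no variation yet; slope is undefined")
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept, slope

model = OnlineSimpleRegression()
model.update([0.0, 1.0, 2.0], [2.0, 5.0, 8.0])   # first block arrives
model.update([3.0, 4.0], [11.0, 14.0])           # second block arrives
intercept, slope = model.coefficients()
```

Because the running sums are exact sufficient statistics for this model, the result after the last block equals the fit on the pooled data; storage is constant in the number of observations.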
Statistical Models for Inferring Vegetation Composition from Fossil Pollen
NASA Astrophysics Data System (ADS)
Paciorek, C.; McLachlan, J. S.; Shang, Z.
2011-12-01
Fossil pollen provide information about vegetation composition that can be used to help understand how vegetation has changed over the past. However, these data have not traditionally been analyzed in a way that allows for statistical inference about spatio-temporal patterns and trends. We build a Bayesian hierarchical model called STEPPS (Spatio-Temporal Empirical Prediction from Pollen in Sediments) that predicts forest composition in southern New England, USA, over the last two millennia based on fossil pollen. The critical relationships between abundances of tree taxa in the pollen record and abundances in actual vegetation are estimated using modern (Forest Inventory Analysis) data and (witness tree) data from colonial records. This gives us two time points at which both pollen and direct vegetation data are available. Based on these relationships, and incorporating our uncertainty about them, we predict forest composition using fossil pollen. We estimate the spatial distribution and relative abundances of tree species and draw inference about how these patterns have changed over time. Finally, we describe ongoing work to extend the modeling to the upper Midwest of the U.S., including an approach to infer tree density and thereby estimate the prairie-forest boundary in Minnesota and Wisconsin. This work is part of the PalEON project, which brings together a team of ecosystem modelers, paleoecologists, and statisticians with the goal of reconstructing vegetation responses to climate during the last two millennia in the northeastern and midwestern United States. The estimates from the statistical modeling will be used to assess and calibrate ecosystem models that are used to project ecological changes in response to global change.
The Heuristic Value of p in Inductive Statistical Inference
Krueger, Joachim I.; Heck, Patrick R.
2017-01-01
Many statistical methods yield the probability of the observed data – or data more extreme – under the assumption that a particular hypothesis is true. This probability is commonly known as ‘the’ p-value. (Null Hypothesis) Significance Testing ([NH]ST) is the most prominent of these methods. The p-value has been subjected to much speculation, analysis, and criticism. We explore how well the p-value predicts what researchers presumably seek: the probability of the hypothesis being true given the evidence, and the probability of reproducing significant results. We also explore the effect of sample size on inferential accuracy, bias, and error. In a series of simulation experiments, we find that the p-value performs quite well as a heuristic cue in inductive inference, although there are identifiable limits to its usefulness. We conclude that despite its general usefulness, the p-value cannot bear the full burden of inductive inference; it is but one of several heuristic cues available to the data analyst. Depending on the inferential challenge at hand, investigators may supplement their reports with effect size estimates, Bayes factors, or other suitable statistics, to communicate what they think the data say. PMID:28649206
Bayesian inference on the sphere beyond statistical isotropy
Das, Santanu; Souradeep, Tarun; Wandelt, Benjamin D.
2015-10-01
We present a general method for Bayesian inference of the underlying covariance structure of random fields on a sphere. We employ the Bipolar Spherical Harmonic (BipoSH) representation of general covariance structure on the sphere. We illustrate the efficacy of the method as a principled approach to assess violation of statistical isotropy (SI) in the sky maps of Cosmic Microwave Background (CMB) fluctuations. SI violation in observed CMB maps arises from known physical effects such as Doppler boost and weak lensing; from as-yet-unknown theoretical possibilities like cosmic topology and subtle violations of the cosmological principle; and from expected observational artefacts of scanning the sky with a non-circular beam, masking, foreground residuals, anisotropic noise, etc. We explicitly demonstrate the recovery of the input SI violation signals with their full statistics in simulated CMB maps. Our formalism easily adapts to exploring parametric physical models with non-SI covariance, as we illustrate for the inference of the parameters of a Doppler boosted sky map. Our approach promises to provide a robust quantitative evaluation of the evidence for SI violation related anomalies in the CMB sky by estimating the BipoSH spectra along with their complete posterior.
Statistics for nuclear engineers and scientists. Part 1. Basic statistical inference
Beggs, W.J.
1981-02-01
This report is intended for the use of engineers and scientists working in the nuclear industry, especially at the Bettis Atomic Power Laboratory. It serves as the basis for several Bettis in-house statistics courses. The objectives of the report are to introduce the reader to the language and concepts of statistics and to provide a basic set of techniques to apply to problems of the collection and analysis of data. Part 1 covers subjects of basic inference. The subjects include: descriptive statistics; probability; simple inference for normally distributed populations, and for non-normal populations as well; comparison of two populations; the analysis of variance; quality control procedures; and linear regression analysis.
Algebraic Statistical Model for Biochemical Network Dynamics Inference
Linder, Daniel F.; Rempala, Grzegorz A.
2014-01-01
With modern molecular quantification methods, such as high throughput sequencing, biologists may perform multiple complex experiments and collect longitudinal data on RNA and DNA concentrations. Such data may then be used to infer cellular level interactions between the molecular entities of interest. One method which formalizes such inference is the stoichiometric algebraic statistical model (SASM) of [2], which allows the analysis of so-called conic (or single source) networks. Despite its intuitive appeal, up until now the SASM has been studied only heuristically on a few simple examples. The current paper provides a more formal mathematical treatment of the SASM, expanding the original model to a wider class of reaction systems decomposable into multiple conic subnetworks. In particular, it is proved here that on such networks the SASM enjoys the so-called sparsistency property, that is, it asymptotically (with the number of observed network trajectories) discards the false interactions by setting their reaction rates to zero. For illustration, we apply the extended SASM to in silico data from a generic decomposable network as well as to biological data from an experimental search for a possible transcription factor for the heat shock protein 70 (Hsp70) in the zebrafish retina. PMID:25525612
Simple statistical inference algorithms for task-dependent wellness assessment.
Kailas, A; Chong, C-C; Watanabe, F
2012-07-01
Stress is a key indicator of wellness in human beings and a prime contributor to performance degradation and errors during various human tasks. The overriding purpose of this paper is to propose two algorithms (probabilistic and non-probabilistic) that iteratively track stress states to compute a wellness index in terms of the stress levels. This paper adopts the physiological viewpoint that high stress is accompanied by large deviations in biometrics such as body temperature, heart rate, etc., and the proposed algorithms iteratively track these fluctuations to compute a personalized wellness index that is correlated to the engagement levels of the tasks performed by the user. In essence, this paper presents a quantitative relationship between temperature, occupational stress, and wellness during different tasks. The simplicity of the statistical inference algorithms makes them favorable candidates for implementation on mobile platforms such as smart phones in the future, thereby providing users an inexpensive application for self-wellness monitoring for a healthier lifestyle.
Multiple Illuminant Colour Estimation via Statistical Inference on Factor Graphs.
Mutimbu, Lawrence; Robles-Kelly, Antonio
2016-08-31
This paper presents a method to recover a spatially varying illuminant colour estimate from scenes lit by multiple light sources. Starting with the image formation process, we formulate the illuminant recovery problem in a statistically data-driven setting. To do this, we use a factor graph defined across the scale space of the input image. In the graph, we utilise a set of illuminant prototypes computed using a data-driven approach. As a result, our method delivers a pixelwise illuminant colour estimate without relying on libraries or user input. The use of a factor graph also allows for the illuminant estimates to be recovered making use of a maximum a posteriori (MAP) inference process. Moreover, we compute the probability marginals by performing a Delaunay triangulation on our factor graph. We illustrate the utility of our method for pixelwise illuminant colour recovery on widely available datasets and compare against a number of alternatives. We also show sample colour correction results on real-world images.
Testing manifest monotonicity using order-constrained statistical inference.
Tijmstra, Jesper; Hessen, David J; van der Heijden, Peter G M; Sijtsma, Klaas
2013-01-01
Most dichotomous item response models share the assumption of latent monotonicity, which states that the probability of a positive response to an item is a nondecreasing function of a latent variable intended to be measured. Latent monotonicity cannot be evaluated directly, but it implies manifest monotonicity across a variety of observed scores, such as the restscore, a single item score, and in some cases the total score. In this study, we show that manifest monotonicity can be tested by means of the order-constrained statistical inference framework. We propose a procedure that uses this framework to determine whether manifest monotonicity should be rejected for specific items. This approach provides a likelihood ratio test for which the p-value can be approximated through simulation. A simulation study is presented that evaluates the Type I error rate and power of the test, and the procedure is applied to empirical data.
CALUX measurements: statistical inferences for the dose-response curve.
Elskens, M; Baston, D S; Stumpf, C; Haedrich, J; Keupers, I; Croes, K; Denison, M S; Baeyens, W; Goeyens, L
2011-09-30
Chemical Activated LUciferase gene eXpression [CALUX] is a reporter gene mammalian cell bioassay used for detection and semi-quantitative analyses of dioxin-like compounds. CALUX dose-response curves for 2,3,7,8-tetrachlorodibenzo-p-dioxin [TCDD] are typically smooth and sigmoidal when the dose is portrayed on a logarithmic scale. Non-linear regression models are used to calibrate the CALUX response versus TCDD standards and to convert the sample response into Bioanalytical EQuivalents (BEQs). Several complications may arise in terms of statistical inference, the most important being the uncertainty assessment of the predicted BEQ. This paper presents the use of linear calibration functions based on Box-Cox transformations to overcome the issue of uncertainty assessment. The main issues addressed are (i) confidence and prediction intervals for the CALUX response, (ii) confidence and prediction intervals for the predicted BEQ-value, and (iii) detection/estimation capabilities for the sigmoid and linearized models. Statistical comparisons between different calculation methods involving inverse prediction, effective concentration ratios (ECR(20-50-80)) and slope ratio were achieved with example datasets in order to provide guidance for optimizing BEQ determinations and expand assay performance with the recombinant mouse hepatoma CALUX cell line H1L6.1c3.
NASA Astrophysics Data System (ADS)
Albert, Carlo; Ulzega, Simone; Stoop, Ruedi
2016-04-01
Measured time-series of both precipitation and runoff are known to exhibit highly non-trivial statistical properties. For making reliable probabilistic predictions in hydrology, it is therefore desirable to have stochastic models with output distributions that share these properties. When parameters of such models have to be inferred from data, we also need to quantify the associated parametric uncertainty. For non-trivial stochastic models, however, this latter step is typically very demanding, both conceptually and numerically, and almost never done in hydrology. Here, we demonstrate that methods developed in statistical physics make a large class of stochastic differential equation (SDE) models amenable to a full-fledged Bayesian parameter inference. For concreteness we demonstrate these methods by means of a simple yet non-trivial toy SDE model. We consider a natural catchment that can be described by a linear reservoir, at the scale of observation. All the neglected processes are assumed to happen at much shorter time-scales and are therefore modeled with a Gaussian white noise term, the standard deviation of which is assumed to scale linearly with the system state (water volume in the catchment). Even for constant input, the outputs of this simple non-linear SDE model show a wealth of desirable statistical properties, such as fat-tailed distributions and long-range correlations. Standard algorithms for Bayesian inference fail, for models of this kind, because their likelihood functions are extremely high-dimensional intractable integrals over all possible model realizations. The use of Kalman filters is illegitimate due to the non-linearity of the model. Particle filters could be used but become increasingly inefficient with growing number of data points. Hamiltonian Monte Carlo algorithms allow us to translate this inference problem to the problem of simulating the dynamics of a statistical mechanics system and give us access to most sophisticated methods
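The toy model described above, a linear reservoir driven by white noise whose standard deviation scales with the stored volume, can be simulated forward with an Euler-Maruyama scheme. The abstract does not give the equation, so the form dV = (r - V/k) dt + beta * V dW used below is an assumption, as are the parameter names and values; this sketch covers only forward simulation, not the Hamiltonian Monte Carlo inference.

```python
import math
import random

def simulate_reservoir(v0=1.0, inflow=1.0, k=1.0, beta=0.3,
                       dt=0.01, n_steps=1000, seed=0):
    """Euler-Maruyama path of the assumed SDE
        dV = (inflow - V / k) dt + beta * V dW,
    where V is the water volume, k the residence time, and beta the
    noise scale (all hypothetical names)."""
    rng = random.Random(seed)
    v = v0
    path = [v0]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))       # Brownian increment over dt
        v += (inflow - v / k) * dt + beta * v * dw
        v = max(v, 0.0)                          # guard against discretisation undershoot
        path.append(v)
    return path

noisy_path = simulate_reservoir()
# With beta = 0 the scheme reduces to the deterministic linear reservoir,
# which relaxes to the steady state inflow * k.
deterministic_path = simulate_reservoir(v0=2.0, beta=0.0)
```

Runoff in this model would be `V / k` at each step; a likelihood for the parameters involves integrating over all such paths, which is the intractable integral the abstract refers to.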
NASA Astrophysics Data System (ADS)
Vali Ahmadi, Mohammad; Doostparast, Mahdi; Ahmadi, Jafar
2015-04-01
In manufacturing industries, the lifetime of an item is usually characterised by a random variable X and considered to be satisfactory if X exceeds a given lower lifetime limit L. The probability of a satisfactory item is then ηL := P(X ≥ L), called conforming rate. In industrial companies, however, the lifetime performance index, proposed by Montgomery and denoted by CL, is widely used as a process capability index instead of the conforming rate. Assuming a parametric model for the random variable X, we show that there is a connection between the conforming rate and the lifetime performance index. Consequently, the statistical inferences about ηL and CL are equivalent. Hence, we restrict ourselves to statistical inference for CL based on generalised order statistics, which contains several ordered data models such as usual order statistics, progressively Type-II censored data and records. Various point and interval estimators for the parameter CL are obtained and optimal critical regions for the hypothesis testing problems concerning CL are proposed. Finally, two real data-sets on the lifetimes of insulating fluid and ball bearings, due to Nelson (1982) and Caroni (2002), respectively, and a simulated sample are analysed.
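The connection between the conforming rate and the lifetime performance index can be made explicit in one simple case. The sketch below assumes an exponential lifetime with mean theta, which is an assumption of this example and not necessarily the parametric family used in the paper, together with the usual definition C_L = (mu - L) / sigma. For the exponential, mu = sigma = theta, so C_L = 1 - L/theta and P(X >= L) = exp(-L/theta) = exp(C_L - 1). Function names are hypothetical.

```python
import math

def lifetime_performance_index(theta, lower_limit):
    """C_L = (mu - L) / sigma; for an exponential lifetime mu = sigma = theta."""
    return 1.0 - lower_limit / theta

def conforming_rate(theta, lower_limit):
    """eta_L = P(X >= L) for an exponential lifetime with mean theta."""
    return math.exp(-lower_limit / theta)

def conforming_rate_from_index(c_l):
    """Under the exponential assumption, eta_L = exp(C_L - 1)."""
    return math.exp(c_l - 1.0)

def estimate_index(lifetimes, lower_limit):
    """Plug-in estimate of C_L from a complete (uncensored) sample,
    using the sample mean as the estimate of theta."""
    theta_hat = sum(lifetimes) / len(lifetimes)
    return lifetime_performance_index(theta_hat, lower_limit)

c_l = lifetime_performance_index(theta=2.0, lower_limit=1.0)
eta = conforming_rate(theta=2.0, lower_limit=1.0)
```

Because eta_L is a strictly increasing function of C_L in this setting, a test or interval for one translates directly into a test or interval for the other, which is the equivalence the abstract states for its parametric model.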
Statistical inference of regulatory networks for circadian regulation.
Aderhold, Andrej; Husmeier, Dirk; Grzegorczyk, Marco
2014-06-01
We assess the accuracy of various state-of-the-art statistics and machine learning methods for reconstructing gene and protein regulatory networks in the context of circadian regulation. Our study draws on the increasing availability of gene expression and protein concentration time series for key circadian clock components in Arabidopsis thaliana. In addition, gene expression and protein concentration time series are simulated from a recently published regulatory network of the circadian clock in A. thaliana, in which protein and gene interactions are described by a Markov jump process based on Michaelis-Menten kinetics. We closely follow recent experimental protocols, including the entrainment of seedlings to different light-dark cycles and the knock-out of various key regulatory genes. Our study provides relative network reconstruction accuracy scores for a critical comparative performance evaluation, and sheds light on a series of highly relevant questions: it quantifies the influence of systematically missing values related to unknown protein concentrations and mRNA transcription rates, it investigates the dependence of the performance on the network topology and the degree of recurrency, it provides deeper insight into when and why non-linear methods fail to outperform linear ones, it offers improved guidelines on parameter settings in different inference procedures, and it suggests new hypotheses about the structure of the central circadian gene regulatory network in A. thaliana.
Multivariate Statistical Inference of Lightning Occurrence, and Using Lightning Observations
NASA Technical Reports Server (NTRS)
Boccippio, Dennis
2004-01-01
Two classes of multivariate statistical inference using TRMM Lightning Imaging Sensor, Precipitation Radar, and Microwave Imager observations are studied, using nonlinear classification neural networks as inferential tools. The very large and globally representative data sample provided by TRMM allows both training and validation (without overfitting) of neural networks with many degrees of freedom. In the first study, the flashing / non-flashing condition of storm complexes is diagnosed using radar, passive microwave and/or environmental observations as neural network inputs. The diagnostic skill of these simple lightning/no-lightning classifiers can be quite high over land (above 80% Probability of Detection; below 20% False Alarm Rate). In the second, passive microwave and lightning observations are used to diagnose radar reflectivity vertical structure. A priori diagnosis of hydrometeor vertical structure is highly important for improved rainfall retrieval from either orbital radars (e.g., the future Global Precipitation Mission "mothership") or radiometers (e.g., operational SSM/I and future Global Precipitation Mission passive microwave constellation platforms); we explore the incremental benefit to such diagnosis provided by lightning observations.
Statistical inference for community detection in signed networks
NASA Astrophysics Data System (ADS)
Zhao, Xuehua; Yang, Bo; Liu, Xueyan; Chen, Huiling
2017-04-01
The problem of community detection in networks has received wide attention and has proved to be computationally challenging. In recent years, with the surge of signed networks containing both positive and negative links, finding community structure in such networks has become a research focus in network science. Although many methods have been proposed to address the problem, their performance depends heavily on predefined optimization objectives or heuristics, which usually fail to describe the intrinsic structure of communities accurately. In this study, we present a statistical inference method for community detection in signed networks, in which a probabilistic model is proposed to model signed networks and an expectation-maximization-based parameter estimation method is derived to find communities in them. In addition, to efficiently analyze signed networks without any a priori information, a model selection criterion is proposed to automatically determine the number of communities. In our experiments, the proposed method is tested on synthetic and real-world signed networks and compared with current methods. The experimental results show that the proposed method finds communities in signed networks more efficiently and accurately than current methods. Notably, the proposed method is mathematically principled.
Physics of epigenetic landscapes and statistical inference by cells
NASA Astrophysics Data System (ADS)
Lang, Alex H.
Biology is currently in the midst of a revolution. Great technological advances have led to unprecedented quantitative data at the whole genome level. However, new techniques are needed to deal with this deluge of high-dimensional data. Therefore, statistical physics has the potential to help develop systems biology level models that can incorporate complex data. Additionally, physicists have made great strides in understanding non-equilibrium thermodynamics. However, the consequences of these advances have yet to be fully incorporated into biology. There are three specific problems that I address in my dissertation. First, a common metaphor for describing development is a rugged "epigenetic landscape" where cell fates are represented as attracting valleys resulting from a complex regulatory network. I introduce a framework for explicitly constructing epigenetic landscapes that combines genomic data with techniques from spin-glass physics. The model reproduces known reprogramming protocols and identifies candidate transcription factors for reprogramming to novel cell fates, suggesting epigenetic landscapes are a powerful paradigm for understanding cellular identity. Second, I examine the dynamics of cellular reprogramming. By reanalyzing all available time-series data, I show that gene expression dynamics during reprogramming follow a simple one-dimensional reaction coordinate that is independent of both the time and details of experimental protocol used. I show that such a reaction coordinate emerges naturally from epigenetic landscape models of cell identity where cellular reprogramming is viewed as a "barrier-crossing" between the starting and ending cell fates. Overall, the analysis and model suggest that gene expression dynamics during reprogramming follow a canonical trajectory consistent with the idea of an "optimal path" in gene expression space for reprogramming. Third, an important task of cells is to perform complex computations in response to
Development of Statistical Methods Using Predictive Inference and Entropy.
1986-03-01
Inference and Entropy APPENDIX B: Achievable Accuracy in Parametric Estimation of Multivariate Spectra...(1986e). "Achievable Accuracy in Parametric Estimation of Multivariate Spectra". Draft. Larimore, W.E. (1983a). "Predictive inference, sufficiency... PARAMETRIC ESTIMATION OF MULTIVARIATE SPECTRA By Wallace E. Larimore, Scientific Systems Inc., Cambridge, Massachusetts, U.S.A. Research Sponsored by the
Mechanical stress inference for two dimensional cell arrays.
Chiou, Kevin K; Hufnagel, Lars; Shraiman, Boris I
2012-01-01
Many morphogenetic processes involve mechanical rearrangements of epithelial tissues that are driven by precisely regulated cytoskeletal forces and cell adhesion. The mechanical state of the cell and intercellular adhesion are not only the targets of regulation, but are themselves the likely signals that coordinate developmental process. Yet, because it is difficult to directly measure mechanical stress in vivo on sub-cellular scale, little is understood about the role of mechanics in development. Here we present an alternative approach which takes advantage of the recent progress in live imaging of morphogenetic processes and uses computational analysis of high resolution images of epithelial tissues to infer relative magnitude of forces acting within and between cells. We model intracellular stress in terms of bulk pressure and interfacial tension, allowing these parameters to vary from cell to cell and from interface to interface. Assuming that epithelial cell layers are close to mechanical equilibrium, we use the observed geometry of the two dimensional cell array to infer interfacial tensions and intracellular pressures. Here we present the mathematical formulation of the proposed Mechanical Inverse method and apply it to the analysis of epithelial cell layers observed at the onset of ventral furrow formation in the Drosophila embryo and in the process of hair-cell determination in the avian cochlea. The analysis reveals mechanical anisotropy in the former process and mechanical heterogeneity, correlated with cell differentiation, in the latter process. The proposed method opens a way for quantitative and detailed experimental tests of models of cell and tissue mechanics.
Wilkinson, Michael
2014-03-01
Decisions about support for predictions of theories in light of data are made using statistical inference. The dominant approach in sport and exercise science is the Neyman-Pearson (N-P) significance-testing approach. When applied correctly it provides a reliable procedure for making dichotomous decisions for accepting or rejecting zero-effect null hypotheses with known and controlled long-run error rates. Type I and type II error rates must be specified in advance and the latter controlled by conducting an a priori sample size calculation. The N-P approach does not provide the probability of hypotheses or indicate the strength of support for hypotheses in light of data, yet many scientists believe it does. Outcomes of analyses allow conclusions only about the existence of non-zero effects, and provide no information about the likely size of true effects or their practical/clinical value. Bayesian inference can show how much support data provide for different hypotheses, and how personal convictions should be altered in light of data, but the approach is complicated by formulating probability distributions about prior subjective estimates of population effects. A pragmatic solution is magnitude-based inference, which allows scientists to estimate the true magnitude of population effects and how likely they are to exceed an effect magnitude of practical/clinical importance, thereby integrating elements of subjective Bayesian-style thinking. While this approach is gaining acceptance, progress might be hastened if scientists appreciate the shortcomings of traditional N-P null hypothesis significance testing.
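The a priori sample size calculation mentioned in this abstract can be illustrated with a minimal sketch. It assumes a two-sided, two-group comparison of means and uses the standard normal approximation n = 2((z_{1-α/2} + z_{1-β})/d)^2 per group; the function name `n_per_group` and the default values for alpha and power are illustrative assumptions, not taken from the article.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided two-sample test.

    effect_size is the standardized mean difference (Cohen's d).
    Uses the normal approximation, so it slightly underestimates the
    sample size an exact t-test calculation would give.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value fixing the type I error rate
    z_beta = z.inv_cdf(power)           # quantile fixing the type II error rate (1 - power)
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


print(n_per_group(0.5))  # medium standardized effect
print(n_per_group(0.2))  # small effect needs far more participants
```

Fixing alpha and power before data collection is what gives the N-P procedure its controlled long-run error rates; the calculation says nothing about the probability of any hypothesis.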
Building Intuitions about Statistical Inference Based on Resampling
ERIC Educational Resources Information Center
Watson, Jane; Chance, Beth
2012-01-01
Formal inference, which makes theoretical assumptions about distributions and applies hypothesis testing procedures with null and alternative hypotheses, is notoriously difficult for tertiary students to master. The debate about whether this content should appear in Years 11 and 12 of the "Australian Curriculum: Mathematics" has gone on…
Bayesian Statistical Inference in Psychology: Comment on Trafimow (2003)
ERIC Educational Resources Information Center
Lee, Michael D.; Wagenmakers, Eric-Jan
2005-01-01
D. Trafimow presented an analysis of null hypothesis significance testing (NHST) using Bayes's theorem. Among other points, he concluded that NHST is logically invalid, but that logically valid Bayesian analyses are often not possible. The latter conclusion reflects a fundamental misunderstanding of the nature of Bayesian inference. This view…
Statistical Inference in the Learning of Novel Phonetic Categories
ERIC Educational Resources Information Center
Zhao, Yuan
2010-01-01
Learning a phonetic category (or any linguistic category) requires integrating different sources of information. A crucial unsolved problem for phonetic learning is how this integration occurs: how can we update our previous knowledge about a phonetic category as we hear new exemplars of the category? One model of learning is Bayesian Inference,…
Some challenges with statistical inference in adaptive designs.
Hung, H M James; Wang, Sue-Jane; Yang, Peiling
2014-01-01
Adaptive designs have generated a great deal of attention in the clinical trial community. The literature contains many statistical methods to deal with added statistical uncertainties concerning the adaptations. Increasingly encountered in regulatory applications are adaptive statistical information designs that allow modification of sample size or related statistical information and adaptive selection designs that allow selection of doses or patient populations during the course of a clinical trial. For adaptive statistical information designs, a few statistical testing methods are mathematically equivalent, as a number of articles have stipulated, but arguably there are large differences in their practical ramifications. We pinpoint some undesirable features of these methods in this work. For adaptive selection designs, the selection based on biomarker data for testing the correlated clinical endpoints may increase statistical uncertainty in terms of type I error probability, and most importantly the increased statistical uncertainty may be impossible to assess.
Statistical mechanics of complex neural systems and high dimensional data
NASA Astrophysics Data System (ADS)
Advani, Madhu; Lahiri, Subhaneil; Ganguli, Surya
2013-03-01
Recent experimental advances in neuroscience have opened new vistas into the immense complexity of neuronal networks. This proliferation of data challenges us on two parallel fronts. First, how can we form adequate theoretical frameworks for understanding how dynamical network processes cooperate across widely disparate spatiotemporal scales to solve important computational problems? Second, how can we extract meaningful models of neuronal systems from high dimensional datasets? To aid in these challenges, we give a pedagogical review of a collection of ideas and theoretical methods arising at the intersection of statistical physics, computer science and neurobiology. We introduce the interrelated replica and cavity methods, which originated in statistical physics as powerful ways to quantitatively analyze large highly heterogeneous systems of many interacting degrees of freedom. We also introduce the closely related notion of message passing in graphical models, which originated in computer science as a distributed algorithm capable of solving large inference and optimization problems involving many coupled variables. We then show how both the statistical physics and computer science perspectives can be applied in a wide diversity of contexts to problems arising in theoretical neuroscience and data analysis. Along the way we discuss spin glasses, learning theory, illusions of structure in noise, random matrices, dimensionality reduction and compressed sensing, all within the unified formalism of the replica method. Moreover, we review recent conceptual connections between message passing in graphical models, and neural computation and learning. Overall, these ideas illustrate how statistical physics and computer science might provide a lens through which we can uncover emergent computational functions buried deep within the dynamical complexities of neuronal networks.
Statistical inference for stochastic simulation models--theory and application.
Hartig, Florian; Calabrese, Justin M; Reineking, Björn; Wiegand, Thorsten; Huth, Andreas
2011-08-01
Statistical models are the traditional choice to test scientific theories when observations, processes or boundary conditions are subject to stochasticity. Many important systems in ecology and biology, however, are difficult to capture with statistical models. Stochastic simulation models offer an alternative, but they were hitherto associated with a major disadvantage: their likelihood functions can usually not be calculated explicitly, and thus it is difficult to couple them to well-established statistical theory such as maximum likelihood and Bayesian statistics. A number of new methods, among them Approximate Bayesian Computing and Pattern-Oriented Modelling, bypass this limitation. These methods share three main principles: aggregation of simulated and observed data via summary statistics, likelihood approximation based on the summary statistics, and efficient sampling. We discuss principles as well as advantages and caveats of these methods, and demonstrate their potential for integrating stochastic simulation models into a unified framework for statistical modelling.
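The three principles listed in this abstract (summary statistics, likelihood approximation, sampling) can be sketched with the simplest member of the family, rejection-based Approximate Bayesian Computation. This is a toy illustration under stated assumptions, not the authors' code: the simulator is a Gaussian with unknown mean, the summary statistic is the sample mean, the prior is uniform, and all names (`abc_rejection`, `eps`, `n_draws`) are hypothetical.

```python
import random
from statistics import mean


def abc_rejection(observed, simulate, summary, prior_sample, eps, n_draws, rng):
    """Keep prior draws whose simulated summary lands within eps of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)                # draw a candidate parameter from the prior
        s_sim = summary(simulate(theta, rng))    # run the stochastic simulator, then aggregate
        if abs(s_sim - s_obs) <= eps:            # accept/reject stands in for the likelihood
            accepted.append(theta)
    return accepted


rng = random.Random(0)
observed = [rng.gauss(2.0, 1.0) for _ in range(50)]  # synthetic "data" with true mean 2.0

posterior = abc_rejection(
    observed,
    simulate=lambda mu, r: [r.gauss(mu, 1.0) for _ in range(50)],
    summary=mean,
    prior_sample=lambda r: r.uniform(-5.0, 5.0),
    eps=0.1,
    n_draws=5000,
    rng=rng,
)
print(len(posterior), mean(posterior))
```

Plain rejection sampling wastes most draws when the prior is wide; the efficient sampling the abstract refers to would typically replace the loop with MCMC or sequential Monte Carlo, which this sketch does not attempt.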
Statistical Machine Learning for Structured and High Dimensional Data
2014-09-17
AFRL-OSR-VA-TR-2014-0234 STATISTICAL MACHINE LEARNING FOR STRUCTURED AND HIGH DIMENSIONAL DATA Larry Wasserman CARNEGIE MELLON UNIVERSITY Final...14-06-2014 Final Dec 2009 - Aug 2014 Statistical Machine Learning for Structured and High Dimensional...area of resource-constrained statistical estimation. machine learning, high-dimensional statistics...Research under
PROBLEMS OF STATISTICAL INFERENCE FOR BIRTH AND DEATH QUEUEING MODELS
A large sample theory is presented for birth and death queueing processes which are ergodic and metrically transitive. The theory is applied to make...inferences about how arrival and service rates vary with the number in the system. Likelihood ratio tests and maximum likelihood estimators are...derived for simple models which describe this variation. Composite hypotheses such as that the arrival rate does not vary with the number in the system are
Statistical Inferences from the Topology of Complex Networks
2016-10-04
for visualization, they are unsuitable for further statistical analysis or machine learning. The main goal of this project was to develop a new summary...compatible with statistics and machine learning. This goal was met with the development of a new summary, the “persistence landscape”. This summary is...main results were published in the Journal of Machine Learning Research in a paper titled “Statistical topological data analysis using persistence
Statistical inference and sensitivity to sampling in 11-month-old infants.
Xu, Fei; Denison, Stephanie
2009-07-01
Research on initial conceptual knowledge and research on early statistical learning mechanisms have been, for the most part, two separate enterprises. We report a study with 11-month-old infants investigating whether they are sensitive to sampling conditions and whether they can integrate intentional information in a statistical inference task. Previous studies found that infants were able to make inferences from samples to populations, and vice versa [Xu, F., & Garcia, V. (2008). Intuitive statistics by 8-month-old infants. Proceedings of the National Academy of Sciences of the United States of America, 105, 5012-5015]. We found that when employing this statistical inference mechanism, infants are sensitive to whether a sample was randomly drawn from a population or not, and they take into account intentional information (e.g., explicitly expressed preference, visual access) when computing the relationship between samples and populations. Our results suggest that domain-specific knowledge is integrated with statistical inference mechanisms early in development.
For and Against Methodologies: Some Perspectives on Recent Causal and Statistical Inference Debates.
Greenland, Sander
2017-01-01
I present an overview of two methods controversies that are central to analysis and inference: That surrounding causal modeling as reflected in the "causal inference" movement, and that surrounding null bias in statistical methods as applied to causal questions. Human factors have expanded what might otherwise have been narrow technical discussions into broad philosophical debates. There seem to be misconceptions about the requirements and capabilities of formal methods, especially in notions that certain assumptions or models (such as potential-outcome models) are necessary or sufficient for valid inference. I argue that, once these misconceptions are removed, most elements of the opposing views can be reconciled. The chief problem of causal inference then becomes one of how to teach sound use of formal methods (such as causal modeling, statistical inference, and sensitivity analysis), and how to apply them without generating the overconfidence and misinterpretations that have ruined so many statistical practices.
The Role of the Sampling Distribution in Understanding Statistical Inference
ERIC Educational Resources Information Center
Lipson, Kay
2003-01-01
Many statistics educators believe that few students develop the level of conceptual understanding essential for them to apply correctly the statistical techniques at their disposal and to interpret their outcomes appropriately. It is also commonly believed that the sampling distribution plays an important role in developing this understanding.…
Inference Based on Simple Step Statistics for the Location Model.
1981-07-01
function. Let T_{N,k}(θ) = Σ a_k(i) V_i(θ). Then T_{N,k} is called the k-step statistic. Noether (1973) studied the 1-step statistic with particular emphasis on...opposed to the sign statistic. These latter two comparisons were first discussed by Noether (1973) in a somewhat different setting. Notice that the...obtained by Noether (1973). If k = 3, we seek the (C + 1)st and (2N - b1 - b2 - C)th ordered Walsh averages in D. The algorithm of Section 3 modified to
ERIC Educational Resources Information Center
Larwin, Karen H.; Larwin, David A.
2011-01-01
Bootstrapping methods and random distribution methods are increasingly recommended as better approaches for teaching students about statistical inference in introductory-level statistics courses. The authors examined the effect of teaching undergraduate business statistics students using random distribution and bootstrapping simulations. It is the…
Statistical Inferences from Formaldehyde DNA-Protein Cross-Link Data
Physiologically-based pharmacokinetic (PBPK) modeling has reached considerable sophistication in its application in the pharmacological and environmental health areas. Yet, mature methodologies for making statistical inferences have not been routinely incorporated in these applic...
Wang, Xiaoxiao; Wang, Huan; Huang, Jinfeng; Zhou, Yifeng; Tzvetanov, Tzvetomir
2017-01-01
The contrast sensitivity function that spans the two dimensions of contrast and spatial frequency is crucial in predicting functional vision both in research and clinical applications. In this study, the use of Bayesian inference was proposed to determine the parameters of the two-dimensional contrast sensitivity function. Two-dimensional Bayesian inference was extensively simulated in comparison to classical one-dimensional measures. Its performance on two-dimensional data gathered with different sampling algorithms was also investigated. The results showed that the two-dimensional Bayesian inference method significantly improved the accuracy and precision of the contrast sensitivity function, as compared to the more common one-dimensional estimates. In addition, applying two-dimensional Bayesian estimation to the final data set showed similar levels of reliability and efficiency across widely disparate and established sampling methods (from classical one-dimensional sampling, such as Ψ or staircase, to more novel multi-dimensional sampling methods, such as quick contrast sensitivity function and Fisher information gain). Furthermore, the improvements observed following the application of Bayesian inference were maintained even when the prior poorly matched the subject's contrast sensitivity function. Simulation results were confirmed in a psychophysical experiment. The results indicated that two-dimensional Bayesian inference of contrast sensitivity function data provides similar estimates across a wide range of sampling methods. The present study likely has implications for the measurement of contrast sensitivity function in various settings (including research and clinical settings) and would facilitate the comparison of existing data from previous studies. PMID:28119563
Statistical Inference Models for Image Datasets with Systematic Variations
Kim, Won Hwa; Bendlin, Barbara B.; Chung, Moo K.; Johnson, Sterling C.; Singh, Vikas
2016-01-01
Statistical analysis of longitudinal or cross sectional brain imaging data to identify effects of neurodegenerative diseases is a fundamental task in various studies in neuroscience. However, when there are systematic variations in the images due to parameter changes such as changes in the scanner protocol, hardware changes, or when combining data from multi-site studies, the statistical analysis becomes problematic. Motivated by this scenario, the goal of this paper is to develop a unified statistical solution to the problem of systematic variations in statistical image analysis. Based in part on recent literature in harmonic analysis on diffusion maps, we propose an algorithm which compares operators that are resilient to the systematic variations. These operators are derived from the empirical measurements of the image data and provide an efficient surrogate to capturing the actual changes across images. We also establish a connection between our method to the design of wavelets in non-Euclidean space. To evaluate the proposed ideas, we present various experimental results on detecting changes in simulations as well as show how the method offers improved statistical power in the analysis of real longitudinal PIB-PET imaging data acquired from participants at risk for Alzheimer’s disease (AD). PMID:26989336
Statistical Inference Models for Image Datasets with Systematic Variations.
Kim, Won Hwa; Bendlin, Barbara B; Chung, Moo K; Johnson, Sterling C; Singh, Vikas
2015-06-01
Statistical analysis of longitudinal or cross sectional brain imaging data to identify effects of neurodegenerative diseases is a fundamental task in various studies in neuroscience. However, when there are systematic variations in the images due to parameter changes such as changes in the scanner protocol, hardware changes, or when combining data from multi-site studies, the statistical analysis becomes problematic. Motivated by this scenario, the goal of this paper is to develop a unified statistical solution to the problem of systematic variations in statistical image analysis. Based in part on recent literature in harmonic analysis on diffusion maps, we propose an algorithm which compares operators that are resilient to the systematic variations. These operators are derived from the empirical measurements of the image data and provide an efficient surrogate to capturing the actual changes across images. We also establish a connection between our method to the design of wavelets in non-Euclidean space. To evaluate the proposed ideas, we present various experimental results on detecting changes in simulations as well as show how the method offers improved statistical power in the analysis of real longitudinal PIB-PET imaging data acquired from participants at risk for Alzheimer's disease (AD).
Statistical inference in behavior analysis: Friend or foe?
Baron, Alan
1999-01-01
Behavior analysts are undecided about the proper role to be played by inferential statistics in behavioral research. The traditional view, as expressed in Sidman's Tactics of Scientific Research (1960), was that inferential statistics has no place within a science that focuses on the steady-state behavior of individual organisms. Despite this admonition, there have been steady inroads of statistical techniques into behavior analysis since then, as evidenced by publications in the Journal of the Experimental Analysis of Behavior. The issues raised by these developments were considered at a panel held at the 24th annual convention of the Association for Behavior Analysis, Orlando, Florida (May, 1998). The proceedings are reported in this and the following articles. PMID:22478323
Contrasting Diversity Values: Statistical Inferences Based on Overlapping Confidence Intervals
MacGregor-Fors, Ian; Payton, Mark E.
2013-01-01
Ecologists often contrast diversity (species richness and abundances) using tests for comparing means or indices. However, many popular software applications do not support performing standard inferential statistics for estimates of species richness and/or density. In this study we simulated the behavior of asymmetric log-normal confidence intervals and determined an interval level that mimics statistical tests with P(α) = 0.05 when confidence intervals from two distributions do not overlap. Our results show that 84% confidence intervals robustly mimic 0.05 statistical tests for asymmetric confidence intervals, as has been demonstrated for symmetric ones in the past. Finally, we provide detailed user-guides for calculating 84% confidence intervals in two of the most robust and highly-used freeware related to diversity measurements for wildlife (i.e., EstimateS, Distance). PMID:23437239
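Why 84% intervals approximate a 0.05 test can be checked with a short calculation. The sketch below covers only the symmetric, normal-theory case that the abstract says was demonstrated previously; the asymmetric log-normal intervals studied in the paper are not simulated here. The function name `implied_alpha` and the `se_ratio` parameter are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def implied_alpha(ci_level, se_ratio=1.0):
    """Two-sided p-value at the point where two symmetric CIs just stop overlapping.

    Assumes independent, normally distributed estimates; se_ratio is
    se2 / se1 (1.0 means equal standard errors).
    """
    std_normal = NormalDist()
    z_ci = std_normal.inv_cdf(0.5 + ci_level / 2)  # half-width multiplier of each interval
    # Intervals touch when |difference| = z_ci * (se1 + se2); the standard error of
    # the difference is sqrt(se1^2 + se2^2), which gives the test's z-score below.
    z_diff = z_ci * (1 + se_ratio) / sqrt(1 + se_ratio ** 2)
    return 2 * (1 - std_normal.cdf(z_diff))


print(round(implied_alpha(0.84), 3))  # close to 0.05
print(round(implied_alpha(0.95), 3))  # far stricter than 0.05
```

Under these assumptions, non-overlap of two 95% intervals corresponds to a much smaller significance level than 0.05, which is why a narrower interval level is needed to mimic the usual test. The correspondence weakens as the two standard errors become unequal.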
A Test by Any Other Name: P Values, Bayes Factors, and Statistical Inference.
Stern, Hal S
2016-01-01
Procedures used for statistical inference are receiving increased scrutiny as the scientific community studies the factors associated with ensuring reproducible research. This note addresses recent negative attention directed at p values, the relationship of confidence intervals and tests, and the role of Bayesian inference and Bayes factors, with an eye toward better understanding these different strategies for statistical inference. We argue that researchers and data analysts too often resort to binary decisions (e.g., whether to reject or accept the null hypothesis) in settings where this may not be required.
Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction
ERIC Educational Resources Information Center
Imbens, Guido W.; Rubin, Donald B.
Most questions in social and biomedical sciences are causal in nature: what would happen to individuals, or to groups, if part of their environment were changed? In this groundbreaking text, two world-renowned experts present statistical methods for studying such questions. This book starts with the notion of potential outcomes, each corresponding…
Statistical Inference and Simulation with StatKey
ERIC Educational Resources Information Center
Quinn, Anne
2016-01-01
While looking for an inexpensive technology package to help students in statistics classes, the author found StatKey, a free Web-based app. Not only is StatKey useful for students' year-end projects, but it is also valuable for helping students learn fundamental content such as the central limit theorem. Using StatKey, students can engage in…
Statistical inference and forensic evidence: evaluating a bullet lead match.
Kaasa, Suzanne O; Peterson, Tiamoyo; Morris, Erin K; Thompson, William C
2007-10-01
This experiment tested the ability of undergraduate mock jurors (N=295) to draw appropriate conclusions from statistical data on the diagnostic value of forensic evidence. Jurors read a summary of a homicide trial in which the key evidence was a bullet lead "match" that was either highly diagnostic, non-diagnostic, or of unknown diagnostic value. There was also a control condition in which the forensic "match" was not presented. The results indicate that jurors as a group used the statistics appropriately to distinguish diagnostic from non-diagnostic forensic evidence, giving considerable weight to the former and little or no weight to the latter. However, this effect was attributable to responses of a subset of jurors who expressed confidence in their ability to use statistical data. Jurors who lacked confidence in their statistical ability failed to distinguish highly diagnostic from non-diagnostic forensic evidence; they gave no weight to the forensic evidence regardless of its diagnostic value. Confident jurors also gave more weight to evidence of unknown diagnostic value. Theoretical and legal implications are discussed.
ERIC Educational Resources Information Center
Knapp, Thomas R.; Noblitt, Gerald L.; Viragoontavan, Sunanta
2000-01-01
There is a trend toward abandoning traditional parametric approaches to data analysis, with all their restrictive assumptions, in favor of computer-intensive nonparametric inferential statistical procedures, such as the jackknife and the bootstrap, that are based on resampling of the sample data. These techniques are compared with the parametric…
Technology Focus: Using Technology to Explore Statistical Inference
ERIC Educational Resources Information Center
Garofalo, Joe; Juersivich, Nicole
2007-01-01
There is much research that documents what many teachers know, that students struggle with many concepts in probability and statistics. This article presents two sample activities the authors use to help preservice teachers develop ideas about how they can use technology to promote their students' ability to understand mathematics and connect…
Trans-dimensional Bayesian inference for large sequential data sets
NASA Astrophysics Data System (ADS)
Mandolesi, E.; Dettmer, J.; Dosso, S. E.; Holland, C. W.
2015-12-01
This work develops a sequential Monte Carlo method to infer seismic parameters of layered seabeds from large sequential reflection-coefficient data sets. The approach provides parameter estimates and uncertainties along survey tracks with the goal to aid in the detection of unexploded ordnance in shallow water. The sequential data are acquired by a moving platform with source and receiver array towed close to the seabed. This geometry requires consideration of spherical reflection coefficients, computed efficiently by massively parallel implementation of the Sommerfeld integral via Levin integration on a graphics processing unit. The seabed is parametrized with a trans-dimensional model to account for changes in the environment (i.e. changes in layering) along the track. The method combines advanced Markov chain Monte Carlo methods (annealing) with particle filtering (resampling). Since data from closely-spaced source transmissions (pings) often sample similar environments, the solution from one ping can be utilized to efficiently estimate the posterior for data from subsequent pings. Since reflection-coefficient data are highly informative, the likelihood function can be extremely peaked, resulting in little overlap between posteriors of adjacent pings. This is addressed by adding bridging distributions (via annealed importance sampling) between pings for more efficient transitions. The approach assumes the environment to be changing slowly enough to justify the local 1D parametrization. However, bridging allows rapid changes between pings to be addressed and we demonstrate the method to be stable in such situations. Results are in terms of trans-D parameter estimates and uncertainties along the track. The algorithm is examined for realistic simulated data along a track and applied to a dataset collected by an autonomous underwater vehicle on the Malta Plateau, Mediterranean Sea. [Work supported by the SERDP, DoD.]
Inferring Master Painters' Esthetic Biases from the Statistics of Portraits.
Aleem, Hassan; Correa-Herran, Ivan; Grzywacz, Norberto M
2017-01-01
The Processing Fluency Theory posits that the ease of sensory information processing in the brain facilitates esthetic pleasure. Accordingly, the theory would predict that master painters should display biases toward visual properties such as symmetry, balance, and moderate complexity. Have these biases been occurring and if so, have painters been optimizing these properties (fluency variables)? Here, we address these questions with statistics of portrait paintings from the Early Renaissance period. To do this, we first developed different computational measures for each of the aforementioned fluency variables. Then, we measured their statistics in 153 portraits from 26 master painters, in 27 photographs of people in three controlled poses, and in 38 quickly snapped photographs of individual persons. A statistical comparison between Early Renaissance portraits and quickly snapped photographs revealed that painters showed a bias toward balance, symmetry, and moderate complexity. However, a comparison between portraits and controlled-pose photographs showed that painters did not optimize each of these properties. Instead, different painters presented biases toward different, narrow ranges of fluency variables. Further analysis suggested that the painters' individuality stemmed in part from having to resolve the tension between complexity vs. symmetry and balance. We additionally found that constraints on the use of different painting materials by distinct painters modulated these fluency variables systematically. In conclusion, the Processing Fluency Theory of Esthetic Pleasure would need expansion if we were to apply it to the history of visual art, since it cannot explain the lack of optimization of each fluency variable. To expand the theory, we propose the existence of a Neuroesthetic Space, which encompasses the possible values that each of the fluency variables can reach in any given art period. We discuss the neural mechanisms of this Space and propose that it
Statistical Inference of Biometrical Genetic Model With Cultural Transmission.
Guo, Xiaobo; Ji, Tian; Wang, Xueqin; Zhang, Heping; Zhong, Shouqiang
2013-01-01
Twin and family studies establish the foundation for studying the genetic, environmental and cultural transmission effects for phenotypes. In this work, we make use of the well established statistical methods and theory for mixed models to assess cultural transmission in twin and family studies. Specifically, we address two critical yet poorly understood issues: the model identifiability in assessing cultural transmission for twin and family data and the biases in the estimates when sub-models are used. We apply our models and theory to two real data sets. A simulation is conducted to verify the bias in the estimates of genetic effects when the working model is a sub-model.
Image analysis and statistical inference in neuroimaging with R.
Tabelow, K; Clayden, J D; de Micheaux, P Lafaye; Polzehl, J; Schmid, V J; Whitcher, B
2011-04-15
R is a language and environment for statistical computing and graphics. It can be considered an alternative implementation of the S language developed in the 1970s and 1980s for data analysis and graphics (Becker and Chambers, 1984; Becker et al., 1988). The R language is part of the GNU project and offers versions that compile and run on almost every major operating system currently available. We highlight several R packages built specifically for the analysis of neuroimaging data in the context of functional MRI, diffusion tensor imaging, and dynamic contrast-enhanced MRI. We review their methodology and give an overview of their capabilities for neuroimaging. In addition we summarize some of the current activities in the area of neuroimaging software development in R.
Inferences about time course of Weber's Law violate statistical principles.
Foster, Rachel M; Franz, Volker H
2013-01-15
Recently, Holmes et al. (2011b) suggested that grasping is only subject to Weber's Law at early but not late points of a grasping movement. They therefore conclude that distinct visual computations and information may guide early and late portions of grasping. Here, we argue that their results can be explained by an interesting statistical artifact, and cannot be considered indicative of the presence or absence of Weber's Law during early portions of grasping. Our argument has implications for other studies using similar methodology (e.g., Heath et al., 2011, Holmes et al., 2011a, 2012), and also for the analysis of temporal data (often called time series) in general. Copyright © 2012 Elsevier Ltd. All rights reserved.
ERIC Educational Resources Information Center
Denbleyker, John Nickolas
2012-01-01
The shortcomings of the proportion above cut (PAC) statistic used so prominently in the educational landscape render it a very problematic measure for making correct inferences with student test data. The limitations of PAC-based statistics are more pronounced with cross-test comparisons due to their dependency on cut-score locations. A better…
Statistical Inference and Sensitivity to Sampling in 11-Month-Old Infants
ERIC Educational Resources Information Center
Xu, Fei; Denison, Stephanie
2009-01-01
Research on initial conceptual knowledge and research on early statistical learning mechanisms have been, for the most part, two separate enterprises. We report a study with 11-month-old infants investigating whether they are sensitive to sampling conditions and whether they can integrate intentional information in a statistical inference task.…
Two-dimensional disordered Ising model within nonextensive statistics
NASA Astrophysics Data System (ADS)
Borodikhin, V. N.
2017-06-01
In this work, the two-dimensional disordered Ising model with nonextensive Tsallis statistics has been studied for the first time. The critical temperatures and critical indices have been determined for both disordered and uniform models. A new type of critical behavior has been revealed for the disordered model with nonextensive statistics. It has been shown that, within the nonextensive statistics of the two-dimensional Ising model, the Harris criterion is also valid.
A Test By Any Other Name: P-values, Bayes Factors and Statistical Inference
Stern, Hal S.
2016-01-01
The exchange between Hoijtink, van Kooten and Hulsker (in press) (HKH) and Morey, Wagenmakers, and Rouder (in press) (MWR) in this issue is focused on the use of Bayes factors for statistical inference but raises a number of more general questions about Bayesian and frequentist approaches to inference. This note addresses recent negative attention directed at p-values, the relationship of confidence intervals and tests, and the role of Bayesian inference and Bayes factors, with an eye towards better understanding these different strategies for statistical inference. We argue that researchers and data analysts too often resort to binary decisions (e.g., whether to reject or accept the null hypothesis) in settings where this may not be required. PMID:26881954
Statistical inference from capture data on closed animal populations
Otis, David L.; Burnham, Kenneth P.; White, Gary C.; Anderson, David R.
1978-01-01
The estimation of animal abundance is an important problem in both the theoretical and applied biological sciences. Serious work to develop estimation methods began during the 1950s, with a few attempts before that time. The literature on estimation methods has increased tremendously during the past 25 years (Cormack 1968, Seber 1973). However, in large part, the problem remains unsolved. Past efforts toward comprehensive and systematic estimation of density (D) or population size (N) have been inadequate, in general. While more than 200 papers have been published on the subject, one is generally left without a unified approach to the estimation of abundance of an animal population. This situation is unfortunate because a number of pressing research problems require such information. In addition, a wide array of environmental assessment studies and biological inventory programs require the estimation of animal abundance. These needs have been further emphasized by the requirement for the preparation of Environmental Impact Statements imposed by the National Environmental Policy Act in 1970. This publication treats inference procedures for certain types of capture data on closed animal populations. This includes multiple capture-recapture studies (variously called capture-mark-recapture, mark-recapture, or tag-recapture studies) involving livetrapping techniques and removal studies involving kill traps or at least temporary removal of captured individuals during the study. Animals do not necessarily need to be physically trapped; visual sightings of marked animals and electrofishing studies also produce data suitable for the methods described in this monograph. To provide a frame of reference for what follows, we give an example of a capture-recapture experiment to estimate population size of small animals using live traps. The general field experiment is similar for all capture-recapture studies (a removal study is, of course, slightly different). A typical
Difference to Inference: teaching logical and statistical reasoning through on-line interactivity.
Malloy, T E
2001-05-01
Difference to Inference is an on-line JAVA program that simulates theory testing and falsification through research design and data collection in a game format. The program, based on cognitive and epistemological principles, is designed to support learning of the thinking skills underlying deductive and inductive logic and statistical reasoning. Difference to Inference has database connectivity so that game scores can be counted as part of course grades.
Statistical inference for nanopore sequencing with a biased random walk model.
Emmett, Kevin J; Rosenstein, Jacob K; van de Meent, Jan-Willem; Shepard, Ken L; Wiggins, Chris H
2015-04-21
Nanopore sequencing promises long read-lengths and single-molecule resolution, but the stochastic motion of the DNA molecule inside the pore is, as of this writing, a barrier to high accuracy reads. We develop a method of statistical inference that explicitly accounts for this error, and demonstrate that high accuracy (>99%) sequence inference is feasible even under highly diffusive motion by using a hidden Markov model to jointly analyze multiple stochastic reads. Using this model, we place bounds on achievable inference accuracy under a range of experimental parameters.
Social Inferences from Faces: Ambient Images Generate a Three-Dimensional Model
ERIC Educational Resources Information Center
Sutherland, Clare A. M.; Oldmeadow, Julian A.; Santos, Isabel M.; Towler, John; Burt, D. Michael; Young, Andrew W.
2013-01-01
Three experiments are presented that investigate the two-dimensional valence/trustworthiness by dominance model of social inferences from faces (Oosterhof & Todorov, 2008). Experiment 1 used image averaging and morphing techniques to demonstrate that consistent facial cues subserve a range of social inferences, even in a highly variable sample of…
Schumacher, Johannes; Wunderle, Thomas; Fries, Pascal; Jäkel, Frank; Pipa, Gordon
2015-08-01
In neuroscience, data are typically generated from neural network activity. The resulting time series represent measurements from spatially distributed subsystems with complex interactions, weakly coupled to a high-dimensional global system. We present a statistical framework to estimate the direction of information flow and its delay in measurements from systems of this type. Informed by differential topology, gaussian process regression is employed to reconstruct measurements of putative driving systems from measurements of the driven systems. These reconstructions serve to estimate the delay of the interaction by means of an analytical criterion developed for this purpose. The model accounts for a range of possible sources of uncertainty, including temporally evolving intrinsic noise, while assuming complex nonlinear dependencies. Furthermore, we show that if information flow is delayed, this approach also allows for inference in strong coupling scenarios of systems exhibiting synchronization phenomena. The validity of the method is demonstrated with a variety of delay-coupled chaotic oscillators. In addition, we show that these results seamlessly transfer to local field potentials in cat visual cortex.
Ganju, Jitendra; Yu, Xinxin; Ma, Guoguang Julie
2013-01-01
Formal inference in randomized clinical trials is based on controlling the type I error rate associated with a single pre-specified statistic. The deficiency of using just one method of analysis is that it depends on assumptions that may not be met. For robust inference, we propose pre-specifying multiple test statistics and relying on the minimum p-value for testing the null hypothesis of no treatment effect. The null hypothesis associated with the various test statistics is that the treatment groups are indistinguishable. The critical value for hypothesis testing comes from permutation distributions. Rejection of the null hypothesis when the smallest p-value is less than the critical value controls the type I error rate at its designated value. Even if one of the candidate test statistics has low power, the adverse effect on the power of the minimum p-value statistic is small. Its use is illustrated with examples. We conclude that it is better to rely on the minimum p-value rather than a single statistic, particularly when that single statistic is the logrank test, because of the cost and complexity of many survival trials.
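The minimum p-value procedure described above can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation: the two candidate statistics (absolute difference in means and in medians), the function name `min_p_permutation_test`, and the default number of relabelings are all assumptions chosen to keep the example small. Each relabeling of the pooled data yields a p-value per statistic; the smallest of these is then referred to its own permutation distribution.

```python
import random
from statistics import mean, median


def min_p_permutation_test(x, y, n_perm=500, seed=0):
    """Return (observed minimum p-value, permutation-adjusted p-value)."""
    rng = random.Random(seed)
    # Hypothetical pre-specified statistics; larger values are more extreme.
    stats = [
        lambda a, b: abs(mean(a) - mean(b)),
        lambda a, b: abs(median(a) - median(b)),
    ]
    pooled = list(x) + list(y)
    n = len(x)

    # Row 0 holds the observed statistics; later rows are random relabelings.
    rows = [[s(x, y) for s in stats]]
    for _ in range(n_perm):
        rng.shuffle(pooled)
        rows.append([s(pooled[:n], pooled[n:]) for s in stats])

    def p_values(row):
        # Permutation p-value of each statistic within the same reference set.
        return [
            sum(r[j] >= row[j] for r in rows) / len(rows)
            for j in range(len(stats))
        ]

    min_ps = [min(p_values(row)) for row in rows]
    observed = min_ps[0]
    # Share of relabelings whose minimum p-value is at least as small.
    adjusted = sum(m <= observed for m in min_ps) / len(min_ps)
    return observed, adjusted


control = [1, 2, 3, 4, 5, 6, 7, 8]
treated = [11, 12, 13, 14, 15, 16, 17, 18]
observed, adjusted = min_p_permutation_test(control, treated)
print(f"min p = {observed:.4f}, adjusted p = {adjusted:.4f}")
```

Comparing the adjusted value, rather than the raw minimum, against the nominal level is what keeps the type I error rate at its designated value; the raw minimum alone would be too liberal because it is the best of several looks at the same data.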
Bayesian Inference of High-Dimensional Dynamical Ocean Models
NASA Astrophysics Data System (ADS)
Lin, J.; Lermusiaux, P. F. J.; Lolla, S. V. T.; Gupta, A.; Haley, P. J., Jr.
2015-12-01
This presentation addresses a holistic set of challenges in high-dimensional ocean Bayesian nonlinear estimation: i) predict the probability distribution functions (pdfs) of large nonlinear dynamical systems using stochastic partial differential equations (PDEs); ii) assimilate data using Bayes' law with these pdfs; iii) predict the future data that optimally reduce uncertainties; and iv) rank the known and learn the new model formulations themselves. Overall, we allow the joint inference of the state, equations, geometry, boundary conditions and initial conditions of dynamical models. Examples are provided for time-dependent fluid and ocean flows, including cavity, double-gyre and Strait flows with jets and eddies. The Bayesian model inference, based on limited observations, is illustrated first by the estimation of obstacle shapes and positions in fluid flows. Next, the Bayesian inference of biogeochemical reaction equations and of their states and parameters is presented, illustrating how PDE-based machine learning can rigorously guide the selection and discovery of complex ecosystem models. Finally, the inference of multiscale bottom gravity current dynamics is illustrated, motivated in part by classic overflows and dense water formation sites and their relevance to climate monitoring and dynamics. This is joint work with our MSEAS group at MIT.
Protein and gene model inference based on statistical modeling in k-partite graphs.
Gerster, Sarah; Qeli, Ermir; Ahrens, Christian H; Bühlmann, Peter
2010-07-06
One of the major goals of proteomics is the comprehensive and accurate description of a proteome. Shotgun proteomics, the method of choice for the analysis of complex protein mixtures, requires that experimentally observed peptides are mapped back to the proteins they were derived from. This process is also known as protein inference. We present Markovian Inference of Proteins and Gene Models (MIPGEM), a statistical model based on clearly stated assumptions to address the problem of protein and gene model inference for shotgun proteomics data. In particular, we are dealing with dependencies among peptides and proteins using a Markovian assumption on k-partite graphs. We are also addressing the problems of shared peptides and ambiguous proteins by scoring the encoding gene models. Empirical results on two control datasets with synthetic mixtures of proteins and on complex protein samples of Saccharomyces cerevisiae, Drosophila melanogaster, and Arabidopsis thaliana suggest that the results with MIPGEM are competitive with existing tools for protein inference.
Assessing colour-dependent occupation statistics inferred from galaxy group catalogues
NASA Astrophysics Data System (ADS)
Campbell, Duncan; van den Bosch, Frank C.; Hearin, Andrew; Padmanabhan, Nikhil; Berlind, Andreas; Mo, H. J.; Tinker, Jeremy; Yang, Xiaohu
2015-09-01
We investigate the ability of current implementations of galaxy group finders to recover colour-dependent halo occupation statistics. To test the fidelity of group catalogue inferred statistics, we run three different group finders used in the literature over a mock that includes galaxy colours in a realistic manner. Overall, the resulting mock group catalogues are remarkably similar, and most colour-dependent statistics are recovered with reasonable accuracy. However, it is also clear that certain systematic errors arise as a consequence of correlated errors in group membership determination, central/satellite designation, and halo mass assignment. We introduce a new statistic, the halo transition probability (HTP), which captures the combined impact of all these errors. As a rule of thumb, errors tend to equalize the properties of distinct galaxy populations (i.e. red versus blue galaxies or centrals versus satellites), and to result in inferred occupation statistics that are more accurate for red galaxies than for blue galaxies. A statistic that is particularly poorly recovered from the group catalogues is the red fraction of central galaxies as a function of halo mass. Group finders do a good job in recovering galactic conformity, but also have a tendency to introduce weak conformity when none is present. We conclude that proper inference of colour-dependent statistics from group catalogues is best achieved using forward modelling (i.e. running group finders over mock data) or by implementing a correction scheme based on the HTP, as long as the latter is not too strongly model dependent.
Statistical inference for remote sensing-based estimates of net deforestation
Ronald E. McRoberts; Brian F. Walters
2012-01-01
Statistical inference requires expression of an estimate in probabilistic terms, usually in the form of a confidence interval. An approach to constructing confidence intervals for remote sensing-based estimates of net deforestation is illustrated. The approach is based on post-classification methods using two independent forest/non-forest classifications because...
Young Children's Use of Statistical Sampling Evidence to Infer the Subjectivity of Preferences
ERIC Educational Resources Information Center
Ma, Lili; Xu, Fei
2011-01-01
A crucial task in social interaction involves understanding subjective mental states. Here we report two experiments with toddlers exploring whether they can use statistical evidence to infer the subjective nature of preferences. We found that 2-year-olds were likely to interpret another person's nonrandom sampling behavior as a cue for a…
NASA Astrophysics Data System (ADS)
Bakker, Arthur; Ben-Zvi, Dani; Makar, Katie
2017-01-01
To understand how statistical and other types of reasoning are coordinated with actions to reduce uncertainty, we conducted a case study in vocational education that involved statistical hypothesis testing. We analyzed an intern's research project in a hospital laboratory in which reducing uncertainties was crucial to make a valid statistical inference. In his project, the intern, Sam, investigated whether patients' blood could be sent through pneumatic post without influencing the measurement of particular blood components. We asked, in the process of making a statistical inference, how are reasons and actions coordinated to reduce uncertainty? For the analysis, we used the semantic theory of inferentialism, specifically, the concept of webs of reasons and actions—complexes of interconnected reasons for facts and actions; these reasons include premises and conclusions, inferential relations, implications, motives for action, and utility of tools for specific purposes in a particular context. Analysis of interviews with Sam, his supervisor and teacher as well as video data of Sam in the classroom showed that many of Sam's actions aimed to reduce variability, rule out errors, and thus reduce uncertainties so as to arrive at a valid inference. Interestingly, the decisive factor was not the outcome of a t test but of the reference change value, a clinical chemical measure of analytic and biological variability. With insights from this case study, we expect that students can be better supported in connecting statistics with context and in dealing with uncertainty.
Inferring the connectivity of coupled oscillators from time-series statistical similarity analysis
Tirabassi, Giulio; Sevilla-Escoboza, Ricardo; Buldú, Javier M.; Masoller, Cristina
2015-01-01
A system composed of interacting dynamical elements can be represented by a network, where the nodes represent the elements that constitute the system, and the links account for their interactions, which arise due to a variety of mechanisms, and which are often unknown. A popular method for inferring the system connectivity (i.e., the set of links among pairs of nodes) is to perform a statistical similarity analysis of the time-series collected from the dynamics of the nodes. Here, by considering two systems of coupled oscillators (Kuramoto phase oscillators and Rössler chaotic electronic oscillators) with known and controllable coupling conditions, we aim at testing the performance of this inference method, by using linear and nonlinear statistical similarity measures. We find that, under adequate conditions, the network links can be perfectly inferred, i.e., no mistakes are made regarding the presence or absence of links. These conditions for perfect inference require: i) an appropriate choice of the observed variable to be analysed, ii) an appropriate interaction strength, and iii) an adequate thresholding of the similarity matrix. For the dynamical units considered here we find that the linear statistical similarity measure performs, in general, better than the nonlinear ones. PMID:26042395
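The thresholding step described in this abstract can be illustrated with a minimal sketch. It assumes the absolute Pearson correlation as the linear similarity measure and uses toy noisy signals rather than Kuramoto or Rössler oscillators; the function names, the threshold of 0.5, and the data are hypothetical choices, not taken from the paper.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def infer_links(series, threshold):
    """Return the set of node pairs whose |correlation| exceeds the threshold."""
    links = set()
    for i in range(len(series)):
        for j in range(i + 1, len(series)):
            if abs(pearson(series[i], series[j])) > threshold:
                links.add((i, j))
    return links


# Toy data (assumption): nodes 0 and 1 share a common signal, node 2 is pure noise.
rng = random.Random(0)
common = [rng.gauss(0, 1) for _ in range(500)]
series = [
    [c + 0.3 * rng.gauss(0, 1) for c in common],
    [c + 0.3 * rng.gauss(0, 1) for c in common],
    [rng.gauss(0, 1) for _ in range(500)],
]
links = infer_links(series, threshold=0.5)
print(links)
```

In this toy setting the only pair that survives thresholding is the one sharing a signal. With real oscillator data, the choice of observed variable and of the threshold matters, as the abstract stresses.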
Elucidating the Foundations of Statistical Inference with 2 x 2 Tables
Choi, Leena; Blume, Jeffrey D.; Dupont, William D.
2015-01-01
To many, the foundations of statistical inference are cryptic and irrelevant to routine statistical practice. The analysis of 2 x 2 contingency tables, omnipresent in the scientific literature, is a case in point. Fisher's exact test is routinely used even though it has been fraught with controversy for over 70 years. The problem, not widely acknowledged, is that several different p-values can be associated with a single table, making scientific inference inconsistent. The root cause of this controversy lies in the table's origins and the manner in which nuisance parameters are eliminated. However, fundamental statistical principles (e.g., sufficiency, ancillarity, conditionality, and likelihood) can shed light on the controversy and guide our approach in using this test. In this paper, we use these fundamental principles to show how much information is lost when the table's origins are ignored and when various approaches are used to eliminate unknown nuisance parameters. We present novel likelihood contours to aid in the visualization of information loss and show that the information loss is often virtually non-existent. We find that problems arising from the discreteness of the sample space are exacerbated by p-value-based inference. Accordingly, methods that are less sensitive to this discreteness - likelihood ratios, posterior probabilities and mid-p-values - lead to more consistent inferences. PMID:25849515
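To make concrete how one table can carry more than one p-value, the sketch below computes a one-sided Fisher exact p-value and the corresponding mid-p-value from the hypergeometric distribution with all margins fixed. It is an illustration under stated assumptions (one-sided alternative, hypothetical function names and example table), not the authors' analysis.

```python
from math import comb


def hypergeom_pmf(k, row1, row2, col1):
    """P(top-left cell = k) when both row totals and the first column total are fixed."""
    return comb(row1, k) * comb(row2, col1 - k) / comb(row1 + row2, col1)


def fisher_one_sided(a, b, c, d):
    """Return (exact p, mid-p) for the table [[a, b], [c, d]], alternative: a is large."""
    row1, row2, col1 = a + b, c + d, a + c
    k_max = min(row1, col1)
    p_obs = hypergeom_pmf(a, row1, row2, col1)
    p_more = sum(hypergeom_pmf(k, row1, row2, col1) for k in range(a + 1, k_max + 1))
    # The exact p-value counts the observed table fully; the mid-p counts half of it.
    return p_obs + p_more, 0.5 * p_obs + p_more


# Hypothetical example table [[3, 1], [1, 3]].
p_exact, p_mid = fisher_one_sided(3, 1, 1, 3)
print(p_exact, p_mid)
```

For this table the exact p-value is 17/70 and the mid-p-value is 9/70. The gap between them comes entirely from the probability of the observed table, which is the discreteness effect the abstract refers to.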
Elucidating the foundations of statistical inference with 2 x 2 tables.
Choi, Leena; Blume, Jeffrey D; Dupont, William D
2015-01-01
To many, the foundations of statistical inference are cryptic and irrelevant to routine statistical practice. The analysis of 2 x 2 contingency tables, omnipresent in the scientific literature, is a case in point. Fisher's exact test is routinely used even though it has been fraught with controversy for over 70 years. The problem, not widely acknowledged, is that several different p-values can be associated with a single table, making scientific inference inconsistent. The root cause of this controversy lies in the table's origins and the manner in which nuisance parameters are eliminated. However, fundamental statistical principles (e.g., sufficiency, ancillarity, conditionality, and likelihood) can shed light on the controversy and guide our approach in using this test. In this paper, we use these fundamental principles to show how much information is lost when the table's origins are ignored and when various approaches are used to eliminate unknown nuisance parameters. We present novel likelihood contours to aid in the visualization of information loss and show that the information loss is often virtually non-existent. We find that problems arising from the discreteness of the sample space are exacerbated by p-value-based inference. Accordingly, methods that are less sensitive to this discreteness - likelihood ratios, posterior probabilities and mid-p-values - lead to more consistent inferences.
Emmert-Streib, Frank; Dehmer, Matthias; Haibe-Kains, Benjamin
2014-01-01
In this paper, we shed light on approaches that are currently used to infer networks from gene expression data with respect to their biological meaning. As we will show, the biological interpretation of these networks depends on the chosen theoretical perspective. For this reason, we distinguish a statistical perspective from a mathematical modeling perspective and elaborate their differences and implications. Our results indicate the imperative need for a genomic network ontology in order to avoid increasing confusion about the biological interpretation of inferred networks, which can be aggravated further by approaches that integrate multiple data sets or data types.
High-Dimensional Statistical Learning: Roots, Justifications, and Potential Machineries
Zollanvari, Amin
2015-01-01
High-dimensional data generally refer to data in which the number of variables is larger than the sample size. Analyzing such datasets poses great challenges for classical statistical learning because the finite-sample performance of methods developed within classical statistical learning does not live up to classical asymptotic premises in which the sample size unboundedly grows for a fixed dimensionality of observations. Much work has been done in developing mathematical–statistical techniques for analyzing high-dimensional data. Despite remarkable progress in this field, many practitioners still utilize classical methods for analyzing such datasets. This state of affairs can be attributed, in part, to a lack of knowledge and, in part, to the ready-to-use computational and statistical software packages that are well developed for classical techniques. Moreover, many scientists working in a specific field of high-dimensional statistical learning are either not aware of other existing machineries in the field or are not willing to try them out. The primary goal in this work is to bring together various machineries of high-dimensional analysis, give an overview of the important results, and present the operating conditions upon which they are grounded. When appropriate, readers are referred to relevant review articles for more information on a specific subject. PMID:27081307
High-Dimensional Statistical Learning: Roots, Justifications, and Potential Machineries.
Zollanvari, Amin
2015-01-01
High-dimensional data generally refer to data in which the number of variables is larger than the sample size. Analyzing such datasets poses great challenges for classical statistical learning because the finite-sample performance of methods developed within classical statistical learning does not live up to classical asymptotic premises in which the sample size unboundedly grows for a fixed dimensionality of observations. Much work has been done in developing mathematical-statistical techniques for analyzing high-dimensional data. Despite remarkable progress in this field, many practitioners still utilize classical methods for analyzing such datasets. This state of affairs can be attributed, in part, to a lack of knowledge and, in part, to the ready-to-use computational and statistical software packages that are well developed for classical techniques. Moreover, many scientists working in a specific field of high-dimensional statistical learning are either not aware of other existing machineries in the field or are not willing to try them out. The primary goal in this work is to bring together various machineries of high-dimensional analysis, give an overview of the important results, and present the operating conditions upon which they are grounded. When appropriate, readers are referred to relevant review articles for more information on a specific subject.
Dimensionality reduction and network inference for sea surface temperature data
NASA Astrophysics Data System (ADS)
Falasca, Fabrizio; Bracco, Annalisa; Nenes, Athanasios; Dovrolis, Constantine; Fountalis, Ilias
2017-04-01
Earth's climate is a complex dynamical system. The underlying components of the system interact with each other (in a linear or nonlinear way) on several spatial and time scales. Network science provides a set of tools to study the structure and dynamics of such systems. Here we propose an application of a novel network inference method, δ-MAPS, to investigate sea surface temperature (SST) fields in reanalyses and models. δ-MAPS first identifies the underlying components (domains) of the system, modeling them as spatially contiguous, potentially overlapping regions of highly correlated temporal activity, and then infers the weighted and potentially lagged interactions between them. The SST network is represented as a weighted and directed graph. Edge direction captures the temporal ordering of events, while edge weights capture the magnitude of the interaction between the domains. We focus on two reanalysis datasets (HadISST and COBE) and on a dozen runs of the CESM model (extracted from the so-called large ensemble). The networks are built using 45 years of data every 3 years for the total dataset temporal coverage (from 1871 to 2015 for HadISST, from 1891 to 2015 for COBE and from 1920 to 2100 for CESM members). We then explore similarities and differences between reanalyses and models in terms of the domains identified, the networks inferred and their time evolution. The spatial extent and shape of the identified domains are consistent between observations and models. According to our analysis the largest SST domain always corresponds to the El Niño Southern Oscillation (ENSO) while most of the other domains correspond to known climate modes. However, the network structure shows significant differences. For example, the unique role played by the South Tropical Atlantic in the observed network is not captured by any model run. Regarding the time evolution of the system we focus on the strength of ENSO: while we observe a positive trend for observations and
Statistical Inference Following Sample Size Adjustment Based on the 50%-Conditional-Power Principle.
Joshua Chen, Y H; Yuan, Shuai S; Li, Xiaoming
2017-08-29
Sample size adjustment at an interim analysis can mitigate the risk of failing to meet the study objective due to a lower than expected treatment effect. Without modification to the conventional statistical methods, the type I error rate will be inflated, primarily because the sample size is increased when the interim observed treatment effect is close to null or no treatment effect. Modifications to the conventional statistical methods, such as changing critical values or using weighted test statistics, have been proposed primarily to address such a scenario, at the cost of flexibility or interpretability. In reality, increasing sample size when interim results indicate no or very small treatment effect could unnecessarily waste limited resources on an ineffective drug candidate. Such considerations lead to the recently increased interest in sample size adjustment based on promising interim results. The 50% conditional power principle allows a sample size increase only when the unblinded interim results are promising, that is, when the conditional power is greater than 50%. The conventional unweighted test statistics and critical values can be used without inflation of the type I error rate. In this paper, statistical inference following such a design is assessed. As shown in the numerical study, the bias of the conventional maximum likelihood estimate (MLE) and the coverage error of its conventional confidence interval are generally small following sample size adjustment. We recommend use of conventional, MLE-based statistical inference when applying the 50% conditional power principle for sample size adjustment. In this way, consistent statistics will be used in both hypothesis testing and statistical inference.
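The 50% rule can be sketched numerically. The abstract does not give the paper's exact definition of conditional power, so the code below assumes a common textbook form: a one-sided z-test, conditional power evaluated under the current trend (the interim effect estimate is taken as the true effect), and the originally planned sample size. The function names and the critical value 1.96 are illustrative assumptions.

```python
from math import erf, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def conditional_power(z_interim, info_frac, z_crit=1.96):
    """Conditional power under the current-trend assumption.

    z_interim: interim z-statistic; info_frac: information fraction in (0, 1).
    """
    return 1.0 - norm_cdf((z_crit - z_interim / sqrt(info_frac)) / sqrt(1.0 - info_frac))


def may_increase_sample_size(z_interim, info_frac, z_crit=1.96):
    """Apply the 50% rule: an increase is allowed only if conditional power exceeds 0.5."""
    return conditional_power(z_interim, info_frac, z_crit) > 0.5


# Hypothetical interim looks at half of the planned information.
print(conditional_power(2.0, 0.5), may_increase_sample_size(2.0, 0.5))
print(conditional_power(1.0, 0.5), may_increase_sample_size(1.0, 0.5))
```

Under these assumptions the rule reduces to checking whether the interim z-statistic exceeds the critical value times the square root of the information fraction, since conditional power equals exactly 50% at that point.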
ERIC Educational Resources Information Center
Sotos, Ana Elisa Castro; Vanhoof, Stijn; Van den Noortgate, Wim; Onghena, Patrick
2007-01-01
A solid understanding of "inferential statistics" is of major importance for designing and interpreting empirical results in any scientific discipline. However, students are prone to many misconceptions regarding this topic. This article structurally summarizes and describes these misconceptions by presenting a systematic review of publications…
Statistical inference of the generation probability of T-cell receptors from sequence repertoires.
Murugan, Anand; Mora, Thierry; Walczak, Aleksandra M; Callan, Curtis G
2012-10-02
Stochastic rearrangement of germline V-, D-, and J-genes to create variable coding sequence for certain cell surface receptors is at the origin of immune system diversity. This process, known as "VDJ recombination", is implemented via a series of stochastic molecular events involving gene choices and random nucleotide insertions between, and deletions from, genes. We use large sequence repertoires of the variable CDR3 region of human CD4+ T-cell receptor beta chains to infer the statistical properties of these basic biochemical events. Because any given CDR3 sequence can be produced in multiple ways, the probability distribution of hidden recombination events cannot be inferred directly from the observed sequences; we therefore develop a maximum likelihood inference method to achieve this end. To separate the properties of the molecular rearrangement mechanism from the effects of selection, we focus on nonproductive CDR3 sequences in T-cell DNA. We infer the joint distribution of the various generative events that occur when a new T-cell receptor gene is created. We find a rich picture of correlation (and absence thereof), providing insight into the molecular mechanisms involved. The generative event statistics are consistent between individuals, suggesting a universal biochemical process. Our probabilistic model predicts the generation probability of any specific CDR3 sequence by the primitive recombination process, allowing us to quantify the potential diversity of the T-cell repertoire and to understand why some sequences are shared between individuals. We argue that the use of formal statistical inference methods, of the kind presented in this paper, will be essential for quantitative understanding of the generation and evolution of diversity in the adaptive immune system.
Research participant compensation: A matter of statistical inference as well as ethics.
Swanson, David M; Betensky, Rebecca A
2015-11-01
The ethics of compensation of research subjects for participation in clinical trials has been debated for years. One ethical issue of concern is variation among subjects in the level of compensation for identical treatments. Surprisingly, the impact of variation on the statistical inferences made from trial results has not been examined. We seek to identify how variation in compensation may influence any existing dependent censoring in clinical trials, thereby also influencing inference about the survival curve, hazard ratio, or other measures of treatment efficacy. In simulation studies, we consider a model for how compensation structure may influence the censoring model. Under existing dependent censoring, we estimate survival curves under different compensation structures and observe how these structures induce variability in the estimates. We show through this model that if the compensation structure affects the censoring model and dependent censoring is present, then variation in that structure induces variation in the estimates and affects the accuracy of estimation and inference on treatment efficacy. From the perspectives of both ethics and statistical inference, standardization and transparency in the compensation of participants in clinical trials are warranted.
Evaluation of statistical inference on empirical resting state fMRI.
Yang, Xue; Kang, Hakmook; Newton, Allen T; Landman, Bennett A
2014-04-01
Modern statistical inference techniques may be able to improve the sensitivity and specificity of resting state functional magnetic resonance imaging (rs-fMRI) connectivity analysis through more realistic assumptions. In simulation, the advantages of such methods are readily demonstrable. However, quantitative empirical validation remains elusive in vivo as the true connectivity patterns are unknown and noise distributions are challenging to characterize, especially in ultra-high field (e.g., 7T fMRI). Though the physiological characteristics of the fMRI signal are difficult to replicate in controlled phantom studies, it is critical that the performance of statistical techniques be evaluated. The SIMulation EXtrapolation (SIMEX) method has enabled estimation of bias with asymptotically consistent estimators on empirical finite sample data by adding simulated noise. To avoid the requirement of accurate estimation of noise structure, the proposed quantitative evaluation approach leverages the theoretical core of SIMEX to study the properties of inference methods in the face of diminishing data (in contrast to increasing noise). The performance of ordinary and robust inference methods in simulation and empirical rs-fMRI is compared using the proposed quantitative evaluation approach. This study provides a simple but powerful method for comparing a proxy for inference accuracy using empirical data.
Goyal, Ravi; De Gruttola, Victor
2017-07-25
Analysis of sexual history data intended to describe sexual networks presents many challenges arising from the fact that most surveys collect information on only a very small fraction of the population of interest. In addition, partners are rarely identified and responses are subject to reporting biases. Typically, each network statistic of interest, such as mean number of sexual partners for men or women, is estimated independently of other network statistics. There is, however, a complex relationship among network statistics, and knowledge of these relationships can aid in addressing the concerns mentioned earlier. We develop a novel method that constrains a posterior predictive distribution of a collection of network statistics in order to leverage the relationships among network statistics in making inference about network properties of interest. The method ensures that inference on network properties is compatible with an actual network. Through extensive simulation studies, we also demonstrate that use of this method can improve estimates in settings where there is uncertainty that arises both from sampling and from systematic reporting bias compared with currently available approaches to estimation. To illustrate the method, we apply it to estimate network statistics using data from the Chicago Health and Social Life Survey. Copyright © 2017 John Wiley & Sons, Ltd.
Emura, Takeshi; Konno, Yoshihiko; Michimae, Hirofumi
2015-07-01
Doubly truncated data consist of samples whose observed values fall between the right- and left-truncation limits. With such samples, the distribution function of interest is estimated using the nonparametric maximum likelihood estimator (NPMLE) that is obtained through a self-consistency algorithm. Owing to the complicated asymptotic distribution of the NPMLE, the bootstrap method has been suggested for statistical inference. This paper proposes a closed-form estimator for the asymptotic covariance function of the NPMLE, which is a computationally attractive alternative to bootstrapping. Furthermore, we develop various statistical inference procedures, such as confidence intervals, goodness-of-fit tests, and confidence bands, to demonstrate the usefulness of the proposed covariance estimator. Simulations are performed to compare the proposed method with both the bootstrap and jackknife methods. The methods are illustrated using the childhood cancer dataset.
Zhu, Hongjian
2016-12-12
Seamless phase II/III clinical trials have attracted increasing attention recently. They mainly use Bayesian response adaptive randomization (RAR) designs. There has been little research into seamless clinical trials using frequentist RAR designs because of the difficulty in performing valid statistical inference following this procedure. The well-designed frequentist RAR designs can target theoretically optimal allocation proportions, and they have explicit asymptotic results. In this paper, we study the asymptotic properties of frequentist RAR designs with adjusted target allocation proportions, and investigate statistical inference for this procedure. The properties of the proposed design provide an important theoretical foundation for advanced seamless clinical trials. Our numerical studies demonstrate that the design is ethical and efficient.
Trans-dimensional Bayesian inference for gravitational lens substructures
NASA Astrophysics Data System (ADS)
Brewer, Brendon J.; Huijser, David; Lewis, Geraint F.
2016-01-01
We introduce a Bayesian solution to the problem of inferring the density profile of strong gravitational lenses when the lens galaxy may contain multiple dark or faint substructures. The source and lens models are based on a superposition of an unknown number of non-negative basis functions (or 'blobs') whose form was chosen with speed as a primary criterion. The prior distribution for the blobs' properties is specified hierarchically, so the mass function of substructures is a natural output of the method. We use reversible jump Markov Chain Monte Carlo within Diffusive Nested Sampling to sample the posterior distribution and evaluate the marginal likelihood of the model, including the summation over the unknown number of blobs in the source and the lens. We demonstrate the method on two simulated data sets: one with a single substructure, and the other with 10. We also apply the method to the g-band image of the 'Cosmic Horseshoe' system, and find evidence for more than zero substructures. However, these have large spatial extent and probably only point to misspecifications in the model (such as the shape of the smooth lens component or the point-spread function), which are difficult to guard against in full generality.
NASA Astrophysics Data System (ADS)
Cocco, Simona; Monasson, Rémi; Weigt, Martin
2013-12-01
We consider the Hopfield-Potts model for the covariation between residues in protein families recently introduced in Cocco, Monasson, Weigt (2013). The patterns of the model are inferred from the data within a new gauge, more symmetric in the residues. We compute the statistical error bars on the pattern components. Results are illustrated on real data for a response regulator receiver domain (Pfam ID PF00072) family.
PyClone: statistical inference of clonal population structure in cancer.
Roth, Andrew; Khattra, Jaswinder; Yap, Damian; Wan, Adrian; Laks, Emma; Biele, Justina; Ha, Gavin; Aparicio, Samuel; Bouchard-Côté, Alexandre; Shah, Sohrab P
2014-04-01
We introduce PyClone, a statistical model for inference of clonal population structures in cancers. PyClone is a Bayesian clustering method for grouping sets of deeply sequenced somatic mutations into putative clonal clusters while estimating their cellular prevalences and accounting for allelic imbalances introduced by segmental copy-number changes and normal-cell contamination. Single-cell sequencing validation demonstrates PyClone's accuracy.
A variance components model for statistical inference on functional connectivity networks.
Fiecas, Mark; Cribben, Ivor; Bahktiari, Reyhaneh; Cummine, Jacqueline
2017-01-24
We propose a variance components linear modeling framework to conduct statistical inference on functional connectivity networks that directly accounts for the temporal autocorrelation inherent in functional magnetic resonance imaging (fMRI) time series data and for the heterogeneity across subjects in the study. The novel method estimates the autocorrelation structure in a nonparametric and subject-specific manner, and estimates the variance due to the heterogeneity using iterative least squares. We apply the new model to a resting-state fMRI study to compare the functional connectivity networks in both typical and reading impaired young adults in order to characterize the resting state networks that are related to reading processes. Using simulated data, we also compare the performance of our model to other methods of statistical inference on functional connectivity networks that do not account for the temporal autocorrelation or heterogeneity across the subjects, and show that accounting for these sources of variation and covariation results in more powerful tests for statistical inference.
Yang, Lan-Yan; Chi, Yunchan; Chow, Shein-Chung
2011-05-01
In clinical research, it is not uncommon to modify a trial procedure and/or statistical methods of ongoing clinical trials through protocol amendments. A major modification to the study protocol could result in a shift in target patient population. In addition, frequent and significant modifications could lead to a totally different study that is unable to address the medical questions that the original study intended to answer. In this article, we propose a logistic regression model for statistical inference based on a binary study endpoint for trials with protocol amendments. Under the proposed method, sample size adjustment is also derived.
Inference in infinite-dimensional inverse problems - Discretization and duality
NASA Technical Reports Server (NTRS)
Stark, Philip B.
1992-01-01
Many techniques for solving inverse problems involve approximating the unknown model, a function, by a finite-dimensional 'discretization' or parametric representation. The uncertainty in the computed solution is sometimes taken to be the uncertainty within the parametrization; this can result in unwarranted confidence. The theory of conjugate duality can overcome the limitations of discretization within the 'strict bounds' formalism, a technique for constructing confidence intervals for functionals of the unknown model incorporating certain types of prior information. The usual computational approach to strict bounds approximates the 'primal' problem in a way that the resulting confidence intervals are at most long enough to have the nominal coverage probability. There is another approach based on 'dual' optimization problems that gives confidence intervals with at least the nominal coverage probability. The pair of intervals derived by the two approaches bracket a correct confidence interval. The theory is illustrated with gravimetric, seismic, geomagnetic, and helioseismic problems and a numerical example in seismology.
Multi-Dimensional Inference and Confidential Data Protection with Decision Tree Methods
2002-01-01
critical information technologies today. The pressing demand for such a protection technique is partly due to the trend of information sharing between insti...
Inference for High-dimensional Differential Correlation Matrices
Cai, T. Tony; Zhang, Anru
2015-01-01
Motivated by differential co-expression analysis in genomics, we consider in this paper estimation and testing of high-dimensional differential correlation matrices. An adaptive thresholding procedure is introduced and theoretical guarantees are given. Minimax rate of convergence is established and the proposed estimator is shown to be adaptively rate-optimal over collections of paired correlation matrices with approximately sparse differences. Simulation results show that the procedure significantly outperforms two other natural methods that are based on separate estimation of the individual correlation matrices. The procedure is also illustrated through an analysis of a breast cancer dataset, which provides evidence at the gene co-expression level that several genes, of which a subset has been previously verified, are associated with breast cancer. Hypothesis testing on the differential correlation matrices is also considered. A test, which is particularly well suited for testing against sparse alternatives, is introduced. In addition, other related problems, including estimation of a single sparse correlation matrix, estimation of the differential covariance matrices, and estimation of the differential cross-correlation matrices, are also discussed. PMID:26500380
Inference for High-dimensional Differential Correlation Matrices.
Cai, T Tony; Zhang, Anru
2016-01-01
Motivated by differential co-expression analysis in genomics, we consider in this paper estimation and testing of high-dimensional differential correlation matrices. An adaptive thresholding procedure is introduced and theoretical guarantees are given. Minimax rate of convergence is established and the proposed estimator is shown to be adaptively rate-optimal over collections of paired correlation matrices with approximately sparse differences. Simulation results show that the procedure significantly outperforms two other natural methods that are based on separate estimation of the individual correlation matrices. The procedure is also illustrated through an analysis of a breast cancer dataset, which provides evidence at the gene co-expression level that several genes, of which a subset has been previously verified, are associated with breast cancer. Hypothesis testing on the differential correlation matrices is also considered. A test, which is particularly well suited for testing against sparse alternatives, is introduced. In addition, other related problems, including estimation of a single sparse correlation matrix, estimation of the differential covariance matrices, and estimation of the differential cross-correlation matrices, are also discussed.
Statistical entropy of charged two-dimensional black holes
NASA Astrophysics Data System (ADS)
Teo, Edward
1998-06-01
The statistical entropy of a five-dimensional black hole in Type II string theory was recently derived by showing that it is U-dual to the three-dimensional Bañados-Teitelboim-Zanelli black hole, and using Carlip's method to count the microstates of the latter. This is valid even for the non-extremal case, unlike the derivation which relies on D-brane techniques. In this letter, I shall exploit the U-duality that exists between the five-dimensional black hole and the two-dimensional charged black hole of McGuigan, Nappi and Yost, to microscopically compute the entropy of the latter. It is shown that this result agrees with previous calculations using thermodynamic arguments.
Graffelman, Jan; Sánchez, Milagros; Cook, Samantha; Moreno, Victor
2013-01-01
In genetic association studies, tests for Hardy-Weinberg proportions are often employed as a quality control procedure. Missing genotypes are typically discarded prior to testing. In this paper we show that inference for Hardy-Weinberg proportions can be biased when missing values are discarded. We propose to use multiple imputation of missing values in order to improve inference for Hardy-Weinberg proportions. For imputation we employ a multinomial logit model that uses information from allele intensities and/or neighbouring markers. Analysis of an empirical data set of single nucleotide polymorphisms possibly related to colon cancer reveals that missing genotypes are not missing completely at random. Deviation from Hardy-Weinberg proportions is mostly due to a lack of heterozygotes. Inbreeding coefficients estimated by multiple imputation of the missing values are typically lower than inbreeding coefficients estimated by discarding the missing values. Accounting for missing values by multiple imputation qualitatively changed the results of 10 to 17% of the statistical tests performed. Estimates of inbreeding coefficients obtained by multiple imputation showed high correlation with estimates obtained by single imputation using an external reference panel. Our conclusion is that imputation of missing data leads to improved statistical inference for Hardy-Weinberg proportions.
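For readers unfamiliar with the quantities involved, the following sketch computes the classical chi-square statistic for Hardy-Weinberg proportions and the inbreeding coefficient for a single biallelic marker with complete genotype counts. It does not implement the multiple imputation proposed in the paper; the function name and the example counts are hypothetical, and both alleles are assumed to be present in the sample.

```python
def hwe_summary(n_aa, n_ab, n_bb):
    """Return (chi-square statistic, inbreeding coefficient) for genotype counts AA, AB, BB."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)  # counts under Hardy-Weinberg
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    f = 1.0 - n_ab / expected[1]  # positive f indicates a lack of heterozygotes
    return chi2, f


# Hypothetical counts with fewer heterozygotes than expected.
chi2, f = hwe_summary(30, 40, 30)
print(chi2, f)
```

With these counts the expected numbers are 25, 50 and 25, giving a chi-square statistic of 4 and an inbreeding coefficient of 0.2. A positive coefficient corresponds to the heterozygote deficit mentioned in the abstract.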
NASA Astrophysics Data System (ADS)
Savoy, H.; Renard, P.; Straubhaar, J.; Rubin, Y.
2016-12-01
Contaminant transport in aquifers is highly dependent on the spatial heterogeneity of hydraulic conductivity. This heterogeneity can exhibit complex spatial patterns in certain geologies, such as from braided river deposits. In order to address this spatial complexity, multipoint statistics methods can be used to generate random fields based on training images. This poster explores inferring conceptual models of heterogeneity from a variety of possible training images using multi-scale and multi-type hydrogeological data via the Method of Anchored Distributions (MAD). MAD has previously been applied in the inference of variogram parameters and this study is the first application of MAD to multipoint statistics and training images as random parameters. The collection of training images used in the study includes images inspired by natural channel networks plus variogram- and object-based random fields with approximately the same low-order statistics. The goal of this study is to showcase the applicability of coupling MAD and multipoint statistics, two generic methods that can constrain the uncertainty attributed to spatial heterogeneity in hydrology and other environmental sciences.
Local dependence in random graph models: characterization, properties and statistical inference.
Schweinberger, Michael; Handcock, Mark S
2015-06-01
Dependent phenomena, such as relational, spatial and temporal phenomena, tend to be characterized by local dependence in the sense that units which are close in a well-defined sense are dependent. In contrast with spatial and temporal phenomena, though, relational phenomena tend to lack a natural neighbourhood structure in the sense that it is unknown which units are close and thus dependent. Owing to the challenge of characterizing local dependence and constructing random graph models with local dependence, many conventional exponential family random graph models induce strong dependence and are not amenable to statistical inference. We take first steps to characterize local dependence in random graph models, inspired by the notion of finite neighbourhoods in spatial statistics and M-dependence in time series, and we show that local dependence endows random graph models with desirable properties which make them amenable to statistical inference. We show that random graph models with local dependence satisfy a natural domain consistency condition which every model should satisfy, but conventional exponential family random graph models do not satisfy. In addition, we establish a central limit theorem for random graph models with local dependence, which suggests that random graph models with local dependence are amenable to statistical inference. We discuss how random graph models with local dependence can be constructed by exploiting either observed or unobserved neighbourhood structure. In the absence of observed neighbourhood structure, we take a Bayesian view and express the uncertainty about the neighbourhood structure by specifying a prior on a set of suitable neighbourhood structures. We present simulation results and applications to two real world networks with 'ground truth'.
Hupé, Jean-Michel
2015-01-01
Published studies using functional and structural MRI include many errors in the way data are analyzed and conclusions reported. This was observed when working on a comprehensive review of the neural bases of synesthesia, but these errors are probably endemic to neuroimaging studies. All studies reviewed had based their conclusions on Null Hypothesis Significance Tests (NHST). NHST have, however, been criticized since their inception because they are more appropriate for making decisions related to a null hypothesis (as in manufacturing) than for making inferences about behavioral and neuronal processes. Here I focus on a few key problems of NHST related to brain imaging techniques, and explain why or when we should not rely on “significance” tests. I also observed that, often, the ill-posed logic of NHST was not even correctly applied, and describe what I identified as common mistakes or at least problematic practices in published papers, in light of what could be considered the very basics of statistical inference. MRI statistics also involve much more complex issues than standard statistical inference. Analysis pipelines vary a lot between studies, even for those using the same software, and there is no consensus on which pipeline is best. I propose a synthetic view of the logic behind the possible methodological choices, and warn against the usage and interpretation of two statistical methods popular in brain imaging studies, the false discovery rate (FDR) procedure and permutation tests. I suggest that current models for the analysis of brain imaging data suffer from serious limitations and call for a revision taking into account the “new statistics” (confidence intervals) logic. PMID:25745383
Local dependence in random graph models: characterization, properties and statistical inference
Schweinberger, Michael; Handcock, Mark S.
2015-01-01
Dependent phenomena, such as relational, spatial and temporal phenomena, tend to be characterized by local dependence in the sense that units which are close in a well-defined sense are dependent. In contrast with spatial and temporal phenomena, though, relational phenomena tend to lack a natural neighbourhood structure in the sense that it is unknown which units are close and thus dependent. Owing to the challenge of characterizing local dependence and constructing random graph models with local dependence, many conventional exponential family random graph models induce strong dependence and are not amenable to statistical inference. We take first steps to characterize local dependence in random graph models, inspired by the notion of finite neighbourhoods in spatial statistics and M-dependence in time series, and we show that local dependence endows random graph models with desirable properties which make them amenable to statistical inference. We show that random graph models with local dependence satisfy a natural domain consistency condition which every model should satisfy, but conventional exponential family random graph models do not satisfy. In addition, we establish a central limit theorem for random graph models with local dependence, which suggests that random graph models with local dependence are amenable to statistical inference. We discuss how random graph models with local dependence can be constructed by exploiting either observed or unobserved neighbourhood structure. In the absence of observed neighbourhood structure, we take a Bayesian view and express the uncertainty about the neighbourhood structure by specifying a prior on a set of suitable neighbourhood structures. We present simulation results and applications to two real world networks with ‘ground truth’. PMID:26560142
Moon, Inkyu; Javidi, Bahram
2006-01-01
We present a new statistical approach to real-time sensing and recognition of microorganisms using digital holographic microscopy. We numerically produce many section images at different depths along a longitudinal direction from the single digital hologram of three-dimensional (3D) microorganisms in the Fresnel domain. For volumetric 3D recognition, the test pixel points are randomly selected from the section image; this procedure can be repeated with different specimens of the same microorganism. The multivariate joint density functions are calculated from the pixel values of each section image at the same random pixel points. The parameters of the statistical distributions are compared using maximum likelihood estimation and statistical inference algorithms. The performance of the proposed system is illustrated with preliminary experimental results.
Schmidt, Paul; Schmid, Volker J; Gaser, Christian; Buck, Dorothea; Bührlen, Susanne; Förschler, Annette; Mühlau, Mark
2013-01-01
Aiming at iron-related T2-hypointensity, which is related to normal aging and neurodegenerative processes, we here present two practicable approaches, based on Bayesian inference, for preprocessing and statistical analysis of a complex set of structural MRI data. In particular, Markov Chain Monte Carlo methods were used to simulate posterior distributions. First, we rendered a segmentation algorithm that uses outlier detection based on model checking techniques within a Bayesian mixture model. Second, we rendered an analytical tool comprising a Bayesian regression model with smoothness priors (in the form of Gaussian Markov random fields) mitigating the necessity to smooth data prior to statistical analysis. For validation, we used simulated data and MRI data of 27 healthy controls (age: [Formula: see text]; range, [Formula: see text]). We first observed robust segmentation of both simulated T2-hypointensities and gray-matter regions known to be T2-hypointense. Second, simulated data and images of segmented T2-hypointensity were analyzed. We found not only robust identification of simulated effects but also a biologically plausible age-related increase of T2-hypointensity primarily within the dentate nucleus but also within the globus pallidus, substantia nigra, and red nucleus. Our results indicate that fully Bayesian inference can successfully be applied for preprocessing and statistical analysis of structural MRI data.
A Comprehensive Statistical Model for Cell Signaling and Protein Activity Inference
Yörük, Erdem; Ochs, Michael F.; Geman, Donald; Younes, Laurent
2010-01-01
Protein signaling networks play a central role in transcriptional regulation and the etiology of many diseases. Statistical methods, particularly Bayesian networks, have been widely used to model cell signaling, mostly for model organisms and with focus on uncovering connectivity rather than inferring aberrations. Extensions to mammalian systems have not yielded compelling results, likely due to greatly increased complexity and limited proteomic measurements in vivo. In this study, we propose a comprehensive statistical model that is anchored to a predefined core topology, has limited complexity due to parameter sharing and uses microarray data of mRNA transcripts as the only observable components of signaling. Specifically, we account for cell heterogeneity and a multi-level process, representing signaling as a Bayesian network at the cell level, modeling measurements as ensemble averages at the tissue level and incorporating patient-to-patient differences at the population level. Motivated by the goal of identifying individual protein abnormalities as potential therapeutic targets, we applied our method to the RAS-RAF network using a breast cancer study with 118 patients. We demonstrated rigorous statistical inference, established reproducibility through simulations and the ability to recover receptor status from available microarray data. PMID:20855924
A statistical model for brain networks inferred from large-scale electrophysiological signals.
Obando, Catalina; De Vico Fallani, Fabrizio
2017-03-01
Network science has been extensively developed to characterize the structural properties of complex systems, including brain networks inferred from neuroimaging data. As a result of the inference process, networks estimated from experimentally obtained biological data represent one instance of a larger number of realizations with similar intrinsic topology. A modelling approach is therefore needed to support statistical inference on the bottom-up local connectivity mechanisms influencing the formation of the estimated brain networks. Here, we adopted a statistical model based on exponential random graph models (ERGMs) to reproduce brain networks, or connectomes, estimated by spectral coherence between high-density electroencephalographic (EEG) signals. ERGMs are made up by different local graph metrics, whereas the parameters weight the respective contribution in explaining the observed network. We validated this approach in a dataset of N = 108 healthy subjects during eyes-open (EO) and eyes-closed (EC) resting-state conditions. Results showed that the tendency to form triangles and stars, reflecting clustering and node centrality, better explained the global properties of the EEG connectomes than other combinations of graph metrics. In particular, the synthetic networks generated by this model configuration replicated the characteristic differences found in real brain networks, with EO eliciting significantly higher segregation in the alpha frequency band (8-13 Hz) than EC. Furthermore, the fitted ERGM parameter values provided complementary information showing that clustering connections are significantly more represented from EC to EO in the alpha range, but also in the beta band (14-29 Hz), which is known to play a crucial role in cortical processing of visual input and externally oriented attention. Taken together, these findings support the current view of the functional segregation and integration of the brain in terms of modules and hubs, and provide a
Statistical mechanics of two-dimensional and geophysical flows
NASA Astrophysics Data System (ADS)
Bouchet, Freddy; Venaille, Antoine
2012-06-01
The theoretical study of the self-organization of two-dimensional and geophysical turbulent flows is addressed based on statistical mechanics methods. This review is a self-contained presentation of classical and recent works on this subject, from the statistical mechanics basis of the theory up to applications to Jupiter’s troposphere and ocean vortices and jets. Emphasis has been placed on examples with available analytical treatment in order to favor better understanding of the physics and dynamics. After a brief presentation of the 2D Euler and quasi-geostrophic equations, the specificity of two-dimensional and geophysical turbulence is emphasized. The equilibrium microcanonical measure is built from the Liouville theorem. Important statistical mechanics concepts (large deviations and mean field approach) and thermodynamic concepts (ensemble inequivalence and negative heat capacity) are briefly explained and described. On this theoretical basis, we predict the output of the long time evolution of complex turbulent flows as statistical equilibria. This is applied to make quantitative models of two-dimensional turbulence, the Great Red Spot and other Jovian vortices, ocean jets like the Gulf Stream, and ocean vortices. A detailed comparison between these statistical equilibria and real flow observations is provided. We also present recent results for non-equilibrium situations, for the studies of either the relaxation towards equilibrium or non-equilibrium steady states. In this last case, forces and dissipation are in a statistical balance; fluxes of conserved quantity characterize the system and microcanonical or other equilibrium measures no longer describe the system.
Three enhancements to the inference of statistical protein-DNA potentials.
AlQuraishi, Mohammed; McAdams, Harley H
2013-03-01
The energetics of protein-DNA interactions are often modeled using so-called statistical potentials, that is, energy models derived from the atomic structures of protein-DNA complexes. Many statistical protein-DNA potentials based on differing theoretical assumptions have been investigated, but little attention has been paid to the types of data and the parameter estimation process used in deriving the statistical potentials. We describe three enhancements to statistical potential inference that significantly improve the accuracy of predicted protein-DNA interactions: (i) incorporation of binding energy data of protein-DNA complexes, in conjunction with their X-ray crystal structures, (ii) use of spatially-aware parameter fitting, and (iii) use of ensemble-based parameter fitting. We apply these enhancements to three widely-used statistical potentials and use the resulting enhanced potentials in a structure-based prediction of the DNA binding sites of proteins. These enhancements are directly applicable to all statistical potentials used in protein-DNA modeling, and we show that they can improve the accuracy of predicted DNA binding sites by up to 21%. Copyright © 2012 Wiley Periodicals, Inc.
Statistical thermodynamics of a two-dimensional relativistic gas.
Montakhab, Afshin; Ghodrat, Malihe; Barati, Mahmood
2009-03-01
In this paper we study a fully relativistic model of a two-dimensional hard-disk gas. This model avoids the general problems associated with relativistic particle collisions and is therefore an ideal system to study relativistic effects in statistical thermodynamics. We study this model using molecular-dynamics simulation, concentrating on the velocity distribution functions. We obtain results for x and y components of velocity in the rest frame (Γ) as well as the moving frame (Γ′). Our results confirm that the Jüttner distribution is the correct generalization of the Maxwell-Boltzmann distribution. We obtain the same "temperature" parameter β for both frames, consistent with a recent study of a limited one-dimensional model. We also address the controversial topic of temperature transformation. We show that while local thermal equilibrium holds in the moving frame, relying on statistical methods such as distribution functions or the equipartition theorem is ultimately inconclusive in deciding on a correct temperature transformation law (if any).
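For orientation, the standard form of the Jüttner velocity distribution can be written as follows. This is a textbook expression, not a formula quoted from the paper; the particle mass m, the speed of light c and the spatial dimension d are symbols introduced here, and the normalization constant is omitted.

```latex
% Juttner distribution over velocities in d spatial dimensions (unnormalized).
% The factor \gamma^{d+2} is the Jacobian of the map from momentum
% \mathbf{p} = \gamma m \mathbf{v} to velocity \mathbf{v}.
\[
  f(\mathbf{v}) \;\propto\; \gamma(v)^{\,d+2}\,
  \exp\!\bigl(-\beta\, m c^{2}\, \gamma(v)\bigr),
  \qquad
  \gamma(v) = \Bigl(1 - \frac{v^{2}}{c^{2}}\Bigr)^{-1/2},
  \quad |\mathbf{v}| < c .
\]
% Two-dimensional case (d = 2), the setting of the hard-disk gas:
\[
  f(\mathbf{v}) \;\propto\; \gamma(v)^{4}\, \exp\!\bigl(-\beta\, m c^{2}\, \gamma(v)\bigr).
\]
% Low-velocity limit: \gamma \approx 1 + v^{2}/(2c^{2}), so
% f(\mathbf{v}) \propto \exp(-\beta m v^{2}/2), the Maxwell-Boltzmann form.
```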
n-dimensional Statistical Inverse Graphical Hydraulic Test Simulator
2012-09-12
nSIGHTS (n-dimensional Statistical Inverse Graphical Hydraulic Test Simulator) is a comprehensive well test analysis software package. It provides a user-interface, a well test analysis model and many tools to analyze both field and simulated data. The well test analysis model simulates a single-phase, one-dimensional, radial/non-radial flow regime, with a borehole at the center of the modeled flow system. nSIGHTS solves the radially symmetric n-dimensional forward flow problem using a solver based on a graph-theoretic approach. The results of the forward simulation are pressure, and flow rate, given all the input parameters. The parameter estimation portion of nSIGHTS uses a perturbation-based approach to interpret the best-fit well and reservoir parameters, given an observed dataset of pressure and flow rate.
Jones, Graham; Sagitov, Serik; Oxelman, Bengt
2013-05-01
Polyploidy is an important speciation mechanism, particularly in land plants. Allopolyploid species are formed after hybridization between otherwise intersterile parental species. Recent theoretical progress has led to successful implementation of species tree models that take population genetic parameters into account. However, these models have not included allopolyploid hybridization and the special problems imposed when species trees of allopolyploids are inferred. Here, 2 new models for the statistical inference of the evolutionary history of allopolyploids are evaluated using simulations and demonstrated on 2 empirical data sets. It is assumed that there has been a single hybridization event between 2 diploid species resulting in a genomic allotetraploid. The evolutionary history can be represented as a species network or as a multilabeled species tree, in which some pairs of tips are labeled with the same species. In one of the models (AlloppMUL), the multilabeled species tree is inferred directly. This is the simplest model and the most widely applicable, since fewer assumptions are made. The second model (AlloppNET) incorporates the hybridization event explicitly which means that fewer parameters need to be estimated. Both models are implemented in the BEAST framework. Simulations show that both models are useful and that AlloppNET is more accurate if the assumptions it is based on are valid. The models are demonstrated on previously analyzed data from the genera Pachycladon (Brassicaceae) and Silene (Caryophyllaceae).
Chen, Zhe; Putrino, David F; Ghosh, Soumya; Barbieri, Riccardo; Brown, Emery N
2011-04-01
The ability to accurately infer functional connectivity between ensemble neurons using experimentally acquired spike train data is currently an important research objective in computational neuroscience. Point process generalized linear models and maximum likelihood estimation have been proposed as effective methods for the identification of spiking dependency between neurons. However, unfavorable experimental conditions occasionally result in insufficient data collection due to factors such as low neuronal firing rates or brief recording periods, and in these cases, the standard maximum likelihood estimate becomes unreliable. The present study compares the performance of different statistical inference procedures when applied to the estimation of functional connectivity in neuronal assemblies with sparse spiking data. Four inference methods were compared: maximum likelihood estimation, penalized maximum likelihood estimation using either ℓ2 or ℓ1 regularization, and hierarchical Bayesian estimation based on a variational Bayes algorithm. Algorithmic performances were compared using well-established goodness-of-fit measures in benchmark simulation studies. The hierarchical Bayesian approach performed favorably when compared with the other algorithms, and it was then successfully applied to real spiking data recorded from the cat motor cortex. The identification of spiking dependencies in physiologically acquired data was encouraging, since their sparse nature would have previously precluded them from successful analysis using traditional methods.
Ciftçi, Koray; Sankur, Bülent; Kahya, Yasemin P; Akin, Ata
2008-09-01
Functional near-infrared spectroscopy (fNIRS) is an emerging technique for monitoring the concentration changes of oxy- and deoxy-hemoglobin (oxy-Hb and deoxy-Hb) in the brain. An important consideration in fNIRS-based neuroimaging modality is to conduct group-level analysis from a set of time series measured from a group of subjects. We investigate the feasibility of multilevel statistical inference for fNIRS. As a case study, we search for hemodynamic activations in the prefrontal cortex during Stroop interference. Hierarchical general linear model (GLM) is used for making this multilevel analysis. Activation patterns both at the subject and group level are investigated on a comparative basis using various classical and Bayesian inference methods. All methods showed consistent left lateral prefrontal cortex activation for oxy-Hb during interference condition, while the effects were much less pronounced for deoxy-Hb. Our analysis showed that mixed effects or Bayesian models are more convenient for faithful analysis of fNIRS data. We arrived at two important conclusions. First, fNIRS has the capability to identify activations at the group level, and second, the mixed effects or Bayesian model is the appropriate mechanism to pass from subject to group-level inference.
Approximation of epidemic models by diffusion processes and their statistical inference.
Guy, Romain; Larédo, Catherine; Vergu, Elisabeta
2015-02-01
Multidimensional continuous-time Markov jump processes [Formula: see text] on [Formula: see text] form a usual set-up for modeling [Formula: see text]-like epidemics. However, when facing incomplete epidemic data, inference based on [Formula: see text] is not easy to achieve. Here, we start building a new framework for the estimation of key parameters of epidemic models based on statistics of diffusion processes approximating [Formula: see text]. First, previous results on the approximation of density-dependent [Formula: see text]-like models by diffusion processes with small diffusion coefficient [Formula: see text], where [Formula: see text] is the population size, are generalized to non-autonomous systems. Second, our previous inference results on discretely observed diffusion processes with small diffusion coefficient are extended to time-dependent diffusions. Consistent and asymptotically Gaussian estimates are obtained for a fixed number [Formula: see text] of observations, which corresponds to the epidemic context, and for [Formula: see text]. A correction term, which yields better estimates non-asymptotically, is also included. Finally, performances and robustness of our estimators with respect to various parameters such as [Formula: see text] (the basic reproduction number), [Formula: see text], [Formula: see text] are investigated on simulations. Two models, [Formula: see text] and [Formula: see text], corresponding to single and recurrent outbreaks, respectively, are used to simulate data. The findings indicate that our estimators have good asymptotic properties and behave noticeably well for realistic numbers of observations and population sizes. This study lays the foundations of a generic inference method currently under extension to incompletely observed epidemic data. Indeed, contrary to the majority of current inference techniques for partially observed processes, which necessitate computer-intensive simulations, our method being mostly an
A statistical method for lung tumor segmentation uncertainty in PET images based on user inference.
Zheng, Chaojie; Wang, Xiuying; Feng, Dagan
2015-01-01
PET has been widely accepted as an effective imaging modality for lung tumor diagnosis and treatment. However, standard criteria for delineating tumor boundaries from PET have yet to be developed, largely due to the relatively low quality of PET images, uncertain tumor boundary definition, and the variety of tumor characteristics. In this paper, we propose a statistical solution to segmentation uncertainty on the basis of user inference. We first define the uncertainty segmentation band on the basis of a segmentation probability map constructed from the Random Walks (RW) algorithm; then, based on the extracted features of the user inference, we use Principal Component Analysis (PCA) to formulate the statistical model for labeling the uncertainty band. We validated our method on 10 lung PET-CT phantom studies from the public RIDER collections [1] and 16 clinical PET studies where tumors were manually delineated by two experienced radiologists. The methods were validated using the Dice similarity coefficient (DSC) to measure the spatial volume overlap. Our method achieved an average DSC of 0.878 ± 0.078 on phantom studies and 0.835 ± 0.039 on clinical studies.
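The Dice similarity coefficient used for validation in this abstract has a simple definition, DSC = 2|A ∩ B| / (|A| + |B|) for two binary segmentations A and B. The sketch below assumes the masks are given as flattened sequences of 0/1 voxel labels; the function name and the toy masks are hypothetical and unrelated to the study's data.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks of equal length."""
    a = [bool(v) for v in mask_a]
    b = [bool(v) for v in mask_b]
    if len(a) != len(b):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    if total == 0:
        # Convention assumed here: two empty masks count as perfect agreement
        return 1.0
    return 2.0 * overlap / total


# Toy example: 4 reference voxels, 6 candidate voxels, 4 shared
reference = [0, 1, 1, 0, 1, 1, 0, 0]
candidate = [0, 1, 1, 1, 1, 1, 1, 0]
print(dice_coefficient(reference, candidate))  # 2 * 4 / (4 + 6) = 0.8
```

A DSC of 1 means the two delineations coincide exactly and 0 means they share no voxel, so the reported averages of 0.878 and 0.835 indicate substantial but not complete overlap with the reference delineations.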
Statistical inference of seabed sound-speed structure in the Gulf of Oman Basin.
Sagers, Jason D; Knobles, David P
2014-06-01
Addressed is the statistical inference of the sound-speed depth profile of a thick soft seabed from broadband sound propagation data recorded in the Gulf of Oman Basin in 1977. The acoustic data are in the form of time series signals recorded on a sparse vertical line array and generated by explosive sources deployed along a 280 km track. The acoustic data offer a unique opportunity to study a deep-water bottom-limited thickly sedimented environment because of the large number of time series measurements, very low seabed attenuation, and auxiliary measurements. A maximum entropy method is employed to obtain a conditional posterior probability distribution (PPD) for the sound-speed ratio and the near-surface sound-speed gradient. The multiple data samples allow for a determination of the average error constraint value required to uniquely specify the PPD for each data sample. Two complicating features of the statistical inference study are addressed: (1) the need to develop an error function that can both utilize the measured multipath arrival structure and mitigate the effects of data errors and (2) the effect of small bathymetric slopes on the structure of the bottom interacting arrivals.
Jagiella, Nick; Rickert, Dennis; Theis, Fabian J; Hasenauer, Jan
2017-02-22
Mechanistic understanding of multi-scale biological processes, such as cell proliferation in a changing biological tissue, is readily facilitated by computational models. While tools exist to construct and simulate multi-scale models, the statistical inference of the unknown model parameters remains an open problem. Here, we present and benchmark a parallel approximate Bayesian computation sequential Monte Carlo (pABC SMC) algorithm, tailored for high-performance computing clusters. pABC SMC is fully automated and returns reliable parameter estimates and confidence intervals. By running the pABC SMC algorithm for ∼10^6 hr, we parameterize multi-scale models that accurately describe quantitative growth curves and histological data obtained in vivo from individual tumor spheroid growth in media droplets. The models capture the hybrid deterministic-stochastic behaviors of 10^5-10^6 cells growing in a 3D dynamically changing nutrient environment. The pABC SMC algorithm reliably converges to a consistent set of parameters. Our study demonstrates a proof of principle for robust, data-driven modeling of multi-scale biological systems and the feasibility of multi-scale model parameterization through statistical inference.
Moon, Inkyu; Yi, Faliu; Javidi, Bahram
2010-01-01
We overview an approach to providing automated three-dimensional (3D) sensing and recognition of biological micro/nanoorganisms integrating Gabor digital holographic microscopy and statistical sampling methods. For 3D data acquisition of biological specimens, a coherent beam propagates through the specimen, and its transversely and longitudinally magnified diffraction pattern observed by the microscope objective is optically recorded with an image sensor array interfaced with a computer. 3D visualization of the biological specimen from the magnified diffraction pattern is accomplished by using the computational Fresnel propagation algorithm. For 3D recognition of the biological specimen, a watershed image segmentation algorithm is applied to automatically remove the unnecessary background parts in the reconstructed holographic image. Statistical estimation and inference algorithms are developed for the automatically segmented holographic image. Overviews of preliminary experimental results illustrate how the holographic image reconstructed from the Gabor digital hologram of a biological specimen contains important information for microbial recognition. PMID:22163664
NASA Technical Reports Server (NTRS)
Lerner, Jeffrey A.; Jedlovec, Gary J.; Atkinson, Robert J.
1998-01-01
Ever since the first satellite image loops from the 6.3 micron water vapor channel on the METEOSAT-1 in 1978, there have been numerous efforts (many to a great degree of success) to relate the water vapor radiance patterns to familiar atmospheric dynamic quantities. The realization of these efforts is becoming evident with the merging of satellite derived winds into predictive models (Velden et al., 1997; Swadley and Goerss, 1989). Another parameter that has been quantified from satellite water vapor channel measurements is upper tropospheric relative humidity (UTH) (e.g., Soden and Bretherton, 1996; Schmetz and Turpeinen, 1988). These humidity measurements, in turn, can be used to quantify upper tropospheric water vapor and its transport to more accurately diagnose climate changes (Lerner et al., 1998; Schmetz et al. 1995a) and quantify radiative processes in the upper troposphere. Also apparent in water vapor imagery animations are regions of subsiding and ascending air flow. Indeed, a component of the translated motions we observe are due to vertical velocities. The few attempts at exploiting this information have been met with a fair degree of success. Picon and Desbois (1990) statistically related Meteosat monthly mean water vapor radiances to six standard pressure levels of the European Centre for Medium Range Weather Forecast (ECMWF) model vertical velocities and found correlation coefficients of about 0.50 or less. This paper presents some preliminary results of viewing climatological satellite water vapor data in a different fashion. Specifically, we attempt to infer the three dimensional flow characteristics of the mid- to upper troposphere as portrayed by GOES VAS during the warm ENSO event (1987) and a subsequent cold period in 1998.
Social inferences from faces: ambient images generate a three-dimensional model.
Sutherland, Clare A M; Oldmeadow, Julian A; Santos, Isabel M; Towler, John; Michael Burt, D; Young, Andrew W
2013-04-01
Three experiments are presented that investigate the two-dimensional valence/trustworthiness by dominance model of social inferences from faces (Oosterhof & Todorov, 2008). Experiment 1 used image averaging and morphing techniques to demonstrate that consistent facial cues subserve a range of social inferences, even in a highly variable sample of 1000 ambient images (images that are intended to be representative of those encountered in everyday life, see Jenkins, White, Van Montfort, & Burton, 2011). Experiment 2 then tested Oosterhof and Todorov's two-dimensional model on this extensive sample of face images. The original two dimensions were replicated and a novel 'youthful-attractiveness' factor also emerged. Experiment 3 successfully cross-validated the three-dimensional model using face averages directly constructed from the factor scores. These findings highlight the utility of the original trustworthiness and dominance dimensions, but also underscore the need to utilise varied face stimuli: with a more realistically diverse set of face images, social inferences from faces show a more elaborate underlying structure than hitherto suggested. Copyright © 2012 Elsevier B.V. All rights reserved.
Coulson, Melissa; Healey, Michelle; Fidler, Fiona; Cumming, Geoff
2010-01-01
A statistically significant result, and a non-significant result may differ little, although significance status may tempt an interpretation of difference. Two studies are reported that compared interpretation of such results presented using null hypothesis significance testing (NHST), or confidence intervals (CIs). Authors of articles published in psychology, behavioral neuroscience, and medical journals were asked, via email, to interpret two fictitious studies that found similar results, one statistically significant, and the other non-significant. Responses from 330 authors varied greatly, but interpretation was generally poor, whether results were presented as CIs or using NHST. However, when interpreting CIs respondents who mentioned NHST were 60% likely to conclude, unjustifiably, the two results conflicted, whereas those who interpreted CIs without reference to NHST were 95% likely to conclude, justifiably, the two results were consistent. Findings were generally similar for all three disciplines. An email survey of academic psychologists confirmed that CIs elicit better interpretations if NHST is not invoked. Improved statistical inference can result from encouragement of meta-analytic thinking and use of CIs but, for full benefit, such highly desirable statistical reform requires also that researchers interpret CIs without recourse to NHST. PMID:21607077
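The central point of this abstract, that one significant and one non-significant result can still be consistent with each other, can be illustrated with a small calculation. The sketch below is not taken from the paper: the two effect estimates and their standard errors are hypothetical, and a normal approximation is assumed for the p-values and intervals.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def ci95(estimate, se):
    """95% confidence interval under a normal approximation."""
    z = STD_NORMAL.inv_cdf(0.975)
    return (estimate - z * se, estimate + z * se)

def two_sided_p(estimate, se):
    """Two-sided p-value for the null hypothesis of zero effect."""
    z = abs(estimate / se)
    return 2 * (1 - STD_NORMAL.cdf(z))

# Hypothetical studies (estimate, standard error); not data from the paper.
study_a = (3.6, 1.5)
study_b = (2.2, 1.6)

p_a = two_sided_p(*study_a)  # below 0.05: "significant"
p_b = two_sided_p(*study_b)  # above 0.05: "non-significant"

# Compare the studies directly through the difference of the estimates.
diff = study_a[0] - study_b[0]
se_diff = (study_a[1] ** 2 + study_b[1] ** 2) ** 0.5
p_diff = two_sided_p(diff, se_diff)
ci_diff = ci95(diff, se_diff)

print(f"p_a={p_a:.3f}, p_b={p_b:.3f}, p_diff={p_diff:.3f}")
print(f"95% CI for the difference: ({ci_diff[0]:.2f}, {ci_diff[1]:.2f})")
```

The interval for the difference spans zero, so the two hypothetical studies do not conflict even though their significance labels differ.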
Inferences on weather extremes and weather-related disasters: a review of statistical methods
NASA Astrophysics Data System (ADS)
Visser, H.; Petersen, A. C.
2012-02-01
The study of weather extremes and their impacts, such as weather-related disasters, plays an important role in research of climate change. Due to the great societal consequences of extremes - historically, now and in the future - the peer-reviewed literature on this theme has been growing enormously since the 1980s. Data sources have a wide origin, from century-long climate reconstructions from tree rings to relatively short (30 to 60 yr) databases with disaster statistics and human impacts. When scanning peer-reviewed literature on weather extremes and its impacts, it is noticeable that many different methods are used to make inferences. However, discussions on these methods are rare. Such discussions are important since a particular methodological choice might substantially influence the inferences made. A calculation of a return period of once in 500 yr, based on a normal distribution will deviate from that based on a Gumbel distribution. And the particular choice between a linear or a flexible trend model might influence inferences as well. In this article, a concise overview of statistical methods applied in the field of weather extremes and weather-related disasters is given. Methods have been evaluated as to stationarity assumptions, the choice for specific probability density functions (PDFs) and the availability of uncertainty information. As for stationarity assumptions, the outcome was that good testing is essential. Inferences on extremes may be wrong if data are assumed stationary while they are not. The same holds for the block-stationarity assumption. As for PDF choices it was found that often more than one PDF shape fits to the same data. From a simulation study the conclusion can be drawn that both the generalized extreme value (GEV) distribution and the log-normal PDF fit very well to a variety of indicators. The application of the normal and Gumbel distributions is more limited. As for uncertainty, it is advisable to test conclusions on extremes
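The remark that a 500-yr return level depends on the assumed distribution can be made concrete with a short calculation. This is a minimal sketch rather than the authors' procedure: the mean and standard deviation of the annual maxima are hypothetical, and the Gumbel parameters are obtained by moment matching, which is an assumption made here for simplicity.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329

def normal_return_level(mean, sd, period_years):
    """Value exceeded on average once per period under a normal model."""
    p = 1 - 1 / period_years
    return NormalDist(mean, sd).inv_cdf(p)

def gumbel_return_level(mean, sd, period_years):
    """Same quantile under a Gumbel model with matched mean and variance."""
    beta = sd * math.sqrt(6) / math.pi   # scale from the variance
    mu = mean - EULER_GAMMA * beta       # location from the mean
    p = 1 - 1 / period_years
    return mu - beta * math.log(-math.log(p))

# Hypothetical annual-maximum statistics (arbitrary units).
mean, sd = 50.0, 10.0
normal_500 = normal_return_level(mean, sd, 500)
gumbel_500 = gumbel_return_level(mean, sd, 500)

print(f"500-yr level, normal: {normal_500:.1f}")
print(f"500-yr level, Gumbel: {gumbel_500:.1f}")
```

With identical first two moments, the heavier upper tail of the Gumbel distribution gives a noticeably higher 500-yr level than the normal distribution.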
On statistical inference in time series analysis of the evolution of road safety.
Commandeur, Jacques J F; Bijleveld, Frits D; Bergel-Hayat, Ruth; Antoniou, Constantinos; Yannis, George; Papadimitriou, Eleonora
2013-11-01
Data collected for building a road safety observatory usually include observations made sequentially through time. Examples of such data, called time series data, include annual (or monthly) number of road traffic accidents, traffic fatalities or vehicle kilometers driven in a country, as well as the corresponding values of safety performance indicators (e.g., data on speeding, seat belt use, alcohol use, etc.). Some commonly used statistical techniques imply assumptions that are often violated by the special properties of time series data, namely serial dependency among disturbances associated with the observations. The first objective of this paper is to demonstrate the impact of such violations to the applicability of standard methods of statistical inference, which leads to an under or overestimation of the standard error and consequently may produce erroneous inferences. Moreover, having established the adverse consequences of ignoring serial dependency issues, the paper aims to describe rigorous statistical techniques used to overcome them. In particular, appropriate time series analysis techniques of varying complexity are employed to describe the development over time, relating the accident-occurrences to explanatory factors such as exposure measures or safety performance indicators, and forecasting the development into the near future. Traditional regression models (whether they are linear, generalized linear or nonlinear) are shown not to naturally capture the inherent dependencies in time series data. Dedicated time series analysis techniques, such as the ARMA-type and DRAG approaches are discussed next, followed by structural time series models, which are a subclass of state space methods. The paper concludes with general recommendations and practice guidelines for the use of time series models in road safety research. Copyright © 2012 Elsevier Ltd. All rights reserved.
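The claim that ignoring serial dependency distorts the standard error can be checked with a small simulation. The sketch below is illustrative only and assumes a first-order autoregressive, AR(1), disturbance with a hypothetical coefficient of 0.7; it is not one of the models or data sets used in the paper.

```python
import random
import statistics

def ar1_series(n, rho, rng):
    """Simulate a stationary AR(1) series with unit innovation variance."""
    x = rng.gauss(0, 1) / (1 - rho ** 2) ** 0.5  # draw from the stationary law
    series = []
    for _ in range(n):
        series.append(x)
        x = rho * x + rng.gauss(0, 1)
    return series

def naive_se_of_mean(series):
    """Standard error of the mean that assumes independent observations."""
    return statistics.stdev(series) / len(series) ** 0.5

rng = random.Random(0)
n, rho, replicates = 200, 0.7, 500  # hypothetical settings

means, naive_ses = [], []
for _ in range(replicates):
    s = ar1_series(n, rho, rng)
    means.append(statistics.fmean(s))
    naive_ses.append(naive_se_of_mean(s))

actual_se = statistics.stdev(means)            # spread of the mean across runs
average_naive_se = statistics.fmean(naive_ses)
ratio = actual_se / average_naive_se

print(f"actual SE {actual_se:.3f}, naive SE {average_naive_se:.3f}, ratio {ratio:.2f}")
```

With positive serial correlation the naive formula understates the true sampling variability, which is the underestimation the abstract warns about.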
Inference of reaction rate parameters based on summary statistics from experiments
Khalil, Mohammad; Chowdhary, Kamaljit Singh; Safta, Cosmin; ...
2016-10-15
Here, we present the results of an application of Bayesian inference and maximum entropy methods for the estimation of the joint probability density for the Arrhenius rate parameters of the rate coefficient of the H2/O2-mechanism chain branching reaction H + O2 → OH + O. Available published data is in the form of summary statistics in terms of nominal values and error bars of the rate coefficient of this reaction at a number of temperature values obtained from shock-tube experiments. Our approach relies on generating data, in this case OH concentration profiles, consistent with the given summary statistics, using Approximate Bayesian Computation methods and a Markov Chain Monte Carlo procedure. The approach permits the forward propagation of parametric uncertainty through the computational model in a manner that is consistent with the published statistics. A consensus joint posterior on the parameters is obtained by pooling the posterior parameter densities given each consistent data set. To expedite this process, we construct efficient surrogates for the OH concentration using a combination of Padé and polynomial approximants. These surrogate models adequately represent forward model observables and their dependence on input parameters and are computationally efficient to allow their use in the Bayesian inference procedure. We also utilize Gauss-Hermite quadrature with Gaussian proposal probability density functions for moment computation resulting in orders of magnitude speedup in data likelihood evaluation. Despite the strong non-linearity in the model, the consistent data sets all result in nearly Gaussian conditional parameter probability density functions. The technique also accounts for nuisance parameters in the form of Arrhenius parameters of other rate coefficients with prescribed uncertainty. The resulting pooled parameter probability density function is propagated through stoichiometric hydrogen-air auto-ignition computations to illustrate
Inference of reaction rate parameters based on summary statistics from experiments
Khalil, Mohammad; Chowdhary, Kamaljit Singh; Safta, Cosmin; Sargsyan, Khachik; Najm, Habib N.
2016-10-15
Here, we present the results of an application of Bayesian inference and maximum entropy methods for the estimation of the joint probability density for the Arrhenius rate parameters of the rate coefficient of the H2/O2-mechanism chain branching reaction H + O2 → OH + O. Available published data is in the form of summary statistics in terms of nominal values and error bars of the rate coefficient of this reaction at a number of temperature values obtained from shock-tube experiments. Our approach relies on generating data, in this case OH concentration profiles, consistent with the given summary statistics, using Approximate Bayesian Computation methods and a Markov Chain Monte Carlo procedure. The approach permits the forward propagation of parametric uncertainty through the computational model in a manner that is consistent with the published statistics. A consensus joint posterior on the parameters is obtained by pooling the posterior parameter densities given each consistent data set. To expedite this process, we construct efficient surrogates for the OH concentration using a combination of Padé and polynomial approximants. These surrogate models adequately represent forward model observables and their dependence on input parameters and are computationally efficient to allow their use in the Bayesian inference procedure. We also utilize Gauss-Hermite quadrature with Gaussian proposal probability density functions for moment computation resulting in orders of magnitude speedup in data likelihood evaluation. Despite the strong non-linearity in the model, the consistent data sets all result in nearly Gaussian conditional parameter probability density functions. The technique also accounts for nuisance parameters in the form of Arrhenius parameters of other rate coefficients with prescribed uncertainty. The resulting pooled parameter probability density function is propagated through stoichiometric hydrogen-air auto
Marzouk, Youssef
2016-08-31
Predictive simulation of complex physical systems increasingly rests on the interplay of experimental observations with computational models. Key inputs, parameters, or structural aspects of models may be incomplete or unknown, and must be developed from indirect and limited observations. At the same time, quantified uncertainties are needed to qualify computational predictions in the support of design and decision-making. In this context, Bayesian statistics provides a foundation for inference from noisy and limited data, but at prohibitive computational expense. This project intends to make rigorous predictive modeling feasible in complex physical systems, via accelerated and scalable tools for uncertainty quantification, Bayesian inference, and experimental design. Specific objectives are as follows: 1. Develop adaptive posterior approximations and dimensionality reduction approaches for Bayesian inference in high-dimensional nonlinear systems. 2. Extend accelerated Bayesian methodologies to large-scale sequential data assimilation, fully treating nonlinear models and non-Gaussian state and parameter distributions. 3. Devise efficient surrogate-based methods for Bayesian model selection and the learning of model structure. 4. Develop scalable simulation/optimization approaches to nonlinear Bayesian experimental design, for both parameter inference and model selection. 5. Demonstrate these inferential tools on chemical kinetic models in reacting flow, constructing and refining thermochemical and electrochemical models from limited data. Demonstrate Bayesian filtering on canonical stochastic PDEs and in the dynamic estimation of inhomogeneous subsurface properties and flow fields.
Statistical Downscaling in Multi-dimensional Wave Climate Forecast
NASA Astrophysics Data System (ADS)
Camus, P.; Méndez, F. J.; Medina, R.; Losada, I. J.; Cofiño, A. S.; Gutiérrez, J. M.
2009-04-01
Wave climate at a particular site is defined by the statistical distribution of sea state parameters, such as significant wave height, mean wave period, mean wave direction, wind velocity, wind direction and storm surge. Nowadays, long-term time series of these parameters are available from reanalysis databases obtained by numerical models. The Self-Organizing Map (SOM) technique is applied to characterize multi-dimensional wave climate, obtaining the relevant "wave types" spanning the historical variability. This technique summarizes the multi-dimensional wave climate in terms of a set of clusters projected onto a low-dimensional lattice with a spatial organization, providing Probability Density Functions (PDFs) on the lattice. On the other hand, wind and storm surge depend on instantaneous local large-scale sea level pressure (SLP) fields while waves depend on the recent history of these fields (say, 1 to 5 days). Thus, these variables are associated with large-scale atmospheric circulation patterns. In this work, a nearest-neighbors analog method is used to predict monthly multi-dimensional wave climate. This method establishes relationships between the large-scale atmospheric circulation patterns from numerical models (SLP fields as predictors) and local wave databases of observations (monthly wave climate SOM PDFs as predictand) to set up statistical models. A wave reanalysis database, developed by Puertos del Estado (Ministerio de Fomento), is considered as the historical time series of local variables. The simultaneous SLP fields calculated by NCEP atmospheric reanalysis are used as predictors. Several applications with different sizes of the sea level pressure grid and with different temporal domain resolutions are compared to obtain the optimal statistical model that best represents the monthly wave climate at a particular site. In this work we examine the potential skill of this downscaling approach considering perfect-model conditions, but we will also analyze the
McDonald, L.L.; Erickson, W.P.; Strickland, M.D.
1995-12-31
The objective of the Coastal Habitat Injury Assessment study was to document and quantify injury to biota of the shallow subtidal, intertidal, and supratidal zones throughout the shoreline affected by oil or cleanup activity associated with the Exxon Valdez oil spill. The results of these studies were to be used to support the Trustee's Type B Natural Resource Damage Assessment under the Comprehensive Environmental Response, Compensation, and Liability Act of 1980 (CERCLA). A probability based stratified random sample of shoreline segments was selected with probability proportional to size from each of 15 strata (5 habitat types crossed with 3 levels of potential oil impact) based on those data available in July, 1989. Three study regions were used: Prince William Sound, Cook Inlet/Kenai Peninsula, and Kodiak/Alaska Peninsula. A Geographic Information System was utilized to combine oiling and habitat data and to select the probability sample of study sites. Quasi-experiments were conducted where randomly selected oiled sites were compared to matched reference sites. Two levels of statistical inferences, philosophical bases, and limitations are discussed and illustrated with example data from the resulting studies. 25 refs., 4 figs., 1 tab.
Inference of Functionally-Relevant N-acetyltransferase Residues Based on Statistical Correlations.
Neuwald, Andrew F; Altschul, Stephen F
2016-12-01
Over evolutionary time, members of a superfamily of homologous proteins sharing a common structural core diverge into subgroups filling various functional niches. At the sequence level, such divergence appears as correlations that arise from residue patterns distinct to each subgroup. Such a superfamily may be viewed as a population of sequences corresponding to a complex, high-dimensional probability distribution. Here we model this distribution as hierarchical interrelated hidden Markov models (hiHMMs), which describe these sequence correlations implicitly. By characterizing such correlations one may hope to obtain information regarding functionally-relevant properties that have thus far evaded detection. To do so, we infer a hiHMM distribution from sequence data using Bayes' theorem and Markov chain Monte Carlo (MCMC) sampling, which is widely recognized as the most effective approach for characterizing a complex, high dimensional distribution. Other routines then map correlated residue patterns to available structures with a view to hypothesis generation. When applied to N-acetyltransferases, this reveals sequence and structural features indicative of functionally important, yet generally unknown biochemical properties. Even for sets of proteins for which nothing is known beyond unannotated sequences and structures, this can lead to helpful insights. We describe, for example, a putative coenzyme-A-induced-fit substrate binding mechanism mediated by arginine residue switching between salt bridge and π-π stacking interactions. A suite of programs implementing this approach is available (psed.igs.umaryland.edu).
Inference of Functionally-Relevant N-acetyltransferase Residues Based on Statistical Correlations
Neuwald, Andrew F.
2016-01-01
Over evolutionary time, members of a superfamily of homologous proteins sharing a common structural core diverge into subgroups filling various functional niches. At the sequence level, such divergence appears as correlations that arise from residue patterns distinct to each subgroup. Such a superfamily may be viewed as a population of sequences corresponding to a complex, high-dimensional probability distribution. Here we model this distribution as hierarchical interrelated hidden Markov models (hiHMMs), which describe these sequence correlations implicitly. By characterizing such correlations one may hope to obtain information regarding functionally-relevant properties that have thus far evaded detection. To do so, we infer a hiHMM distribution from sequence data using Bayes’ theorem and Markov chain Monte Carlo (MCMC) sampling, which is widely recognized as the most effective approach for characterizing a complex, high dimensional distribution. Other routines then map correlated residue patterns to available structures with a view to hypothesis generation. When applied to N-acetyltransferases, this reveals sequence and structural features indicative of functionally important, yet generally unknown biochemical properties. Even for sets of proteins for which nothing is known beyond unannotated sequences and structures, this can lead to helpful insights. We describe, for example, a putative coenzyme-A-induced-fit substrate binding mechanism mediated by arginine residue switching between salt bridge and π-π stacking interactions. A suite of programs implementing this approach is available (psed.igs.umaryland.edu). PMID:28002465
Palstra, Friso P; Heyer, Evelyne; Austerlitz, Frédéric
2015-06-01
The demographic history of modern humans constitutes a combination of expansions, colonizations, contractions, and remigrations. The advent of large scale genetic data combined with statistically refined methods facilitates inference of this complex history. Here we study the demographic history of two genetically admixed ethnic groups in Central Asia, an area characterized by high levels of genetic diversity and a history of recurrent immigration. Using Approximate Bayesian Computation, we infer that the timing of admixture markedly differs between the two groups. Admixture in the traditionally agricultural Tajiks could be dated back to the onset of the Neolithic transition in the region, whereas admixture in Kyrgyz is more recent, and may have involved the westward movement of Turkic peoples. These results are confirmed by a coalescent method that fits an isolation-with-migration model to the genetic data, with both Central Asian groups having received gene flow from the extremities of Eurasia. Interestingly, our analyses also uncover signatures of gene flow from Eastern to Western Eurasia during Paleolithic times. In conclusion, the high genetic diversity currently observed in these two Central Asian peoples most likely reflects the effects of recurrent immigration that likely started before historical times. Conversely, conquests during historical times may have had a relatively limited genetic impact. These results emphasize the need for a better understanding of the genetic consequences of transmission of culture and technological innovations, as well as those of invasions and conquests.
Johnson, Eric D; Tubau, Elisabet
2016-09-27
Presenting natural frequencies facilitates Bayesian inferences relative to using percentages. Nevertheless, many people, including highly educated and skilled reasoners, still fail to provide Bayesian responses to these computationally simple problems. We show that the complexity of relational reasoning (e.g., the structural mapping between the presented and requested relations) can help explain the remaining difficulties. With a non-Bayesian inference that required identical arithmetic but afforded a more direct structural mapping, performance was universally high. Furthermore, reducing the relational demands of the task through questions that directed reasoners to use the presented statistics, as compared with questions that prompted the representation of a second, similar sample, also significantly improved reasoning. Distinct error patterns were also observed between these presented- and similar-sample scenarios, which suggested differences in relational-reasoning strategies. On the other hand, while higher numeracy was associated with better Bayesian reasoning, higher-numerate reasoners were not immune to the relational complexity of the task. Together, these findings validate the relational-reasoning view of Bayesian problem solving and highlight the importance of considering not only the presented task structure, but also the complexity of the structural alignment between the presented and requested relations.
Staude, Benjamin; Grün, Sonja; Rotter, Stefan
2009-01-01
The extent to which groups of neurons exhibit higher-order correlations in their spiking activity is a controversial issue in current brain research. A major difficulty is that currently available tools for the analysis of massively parallel spike trains (N > 10) for higher-order correlations typically require vast sample sizes. While multiple single-cell recordings become increasingly available, experimental approaches to investigate the role of higher-order correlations suffer from the limitations of available analysis techniques. We have recently presented a novel method for cumulant-based inference of higher-order correlations (CuBIC) that detects correlations of higher order even from relatively short data stretches of length T = 10–100 s. CuBIC employs the compound Poisson process (CPP) as a statistical model for the population spike counts, and assumes spike trains to be stationary in the analyzed data stretch. In the present study, we describe a non-stationary version of the CPP by decoupling the correlation structure from the spiking intensity of the population. This allows us to adapt CuBIC to time-varying firing rates. Numerical simulations reveal that the adaptation corrects for false positive inference of correlations in data with pure rate co-variation, while allowing for temporal variations of the firing rates has a surprisingly small effect on CuBIC's sensitivity for correlations. PMID:20725510
One-dimensional statistical parametric mapping in Python.
Pataky, Todd C
2012-01-01
Statistical parametric mapping (SPM) is a topological methodology for detecting field changes in smooth n-dimensional continua. Many classes of biomechanical data are smooth and contained within discrete bounds and as such are well suited to SPM analyses. The current paper accompanies release of 'SPM1D', a free and open-source Python package for conducting SPM analyses on a set of registered 1D curves. Three example applications are presented: (i) kinematics, (ii) ground reaction forces and (iii) contact pressure distribution in probabilistic finite element modelling. In addition to offering a high-level interface to a variety of common statistical tests like t tests, regression and ANOVA, SPM1D also emphasises fundamental concepts of SPM theory through stand-alone example scripts. Source code and documentation are available at: www.tpataky.net/spm1d/.
Lagrangian statistics in forced two-dimensional turbulence
NASA Astrophysics Data System (ADS)
Kamps, Oliver; Friedrich, Rudolf
2007-11-01
In recent years the Lagrangian description of turbulent flows has attracted much interest from the experimental point of view and is also the focus of numerical and analytical investigations. We present detailed numerical investigations of Lagrangian tracer particles in the inverse energy cascade of two-dimensional turbulence. In the first part we focus on the shape and scaling properties of the probability distribution functions for the velocity increments and compare them to the Eulerian case and the increment statistics in three dimensions. Motivated by our observations we address the important question of translating increment statistics from one frame of reference to the other [1]. To reveal the underlying physical mechanism we determine numerically the involved transition probabilities. In this way we shed light on the source of Lagrangian intermittency. [1] R. Friedrich, R. Grauer, H. Hohmann, O. Kamps, A Corrsin type approximation for Lagrangian fluid Turbulence, arXiv:0705.3132
Specificity and timescales of cortical adaptation as inferences about natural movie statistics.
Snow, Michoel; Coen-Cagli, Ruben; Schwartz, Odelia
2016-10-01
Adaptation is a phenomenological umbrella term under which a variety of temporal contextual effects are grouped. Previous models have shown that some aspects of visual adaptation reflect optimal processing of dynamic visual inputs, suggesting that adaptation should be tuned to the properties of natural visual inputs. However, the link between natural dynamic inputs and adaptation is poorly understood. Here, we extend a previously developed Bayesian modeling framework for spatial contextual effects to the temporal domain. The model learns temporal statistical regularities of natural movies and links these statistics to adaptation in primary visual cortex via divisive normalization, a ubiquitous neural computation. In particular, the model divisively normalizes the present visual input by the past visual inputs only to the degree that these are inferred to be statistically dependent. We show that this flexible form of normalization reproduces classical findings on how brief adaptation affects neuronal selectivity. Furthermore, prior knowledge acquired by the Bayesian model from natural movies can be modified by prolonged exposure to novel visual stimuli. We show that this updating can explain classical results on contrast adaptation. We also simulate the recent finding that adaptation maintains population homeostasis, namely, a balanced level of activity across a population of neurons with different orientation preferences. Consistent with previous disparate observations, our work further clarifies the influence of stimulus-specific and neuronal-specific normalization signals in adaptation.
Geomechanical changes inferred from variations of statistical properties of microseismic events
NASA Astrophysics Data System (ADS)
Grob, M.; van der Baan, M.; Chorney, D.; Jain, P.
2012-12-01
Microseismic events created during hydraulic fracturing of oil and gas reservoirs or geothermal fields are used to infer geomechanical properties of the medium, such as the stress field. Moment tensor analysis is usually the method chosen to obtain these properties. However, moment tensor inversion is a complex analysis and requires very high quality data, which is hard to obtain in these environments. We suggest using statistical analysis instead, particularly looking at the spatial and temporal variations of the b and D values, which quantify the magnitude distribution and the spatial distribution of the events, respectively. The advantage of b and D statistical analysis is that it requires only the location and magnitude of the events, which are routinely computed. The b and D coefficients could even be calculated for near-real time characterisation of differences between various stages, thus helping to adapt the fracturing strategy as it proceeds. As the statistical analysis is performed over a cluster of events, it determines properties for a wide area of the reservoir and not only at the exact location of an event. We show how b and D can be related to the stress field through case studies in different environments and geomechanical simulations based on bonded particle modelling. Computing the b value and the D fractal dimension is a simple procedure given a sufficiently large microseismic dataset and can reveal pertinent information on the local in situ stress regime in the reservoir.
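To show how little input a b-value estimate needs, here is a minimal sketch that uses the classical Aki maximum-likelihood estimator, b = log10(e) / (mean(M) - Mc). The abstract does not say which estimator the authors use, so this choice is an assumption, as are the synthetic catalog, the magnitude of completeness Mc, and the function name `aki_b_value`.

```python
import math
import random

def aki_b_value(magnitudes, completeness_magnitude):
    """Aki maximum-likelihood b-value for continuous (unbinned) magnitudes."""
    above = [m for m in magnitudes if m >= completeness_magnitude]
    mean_excess = sum(above) / len(above) - completeness_magnitude
    return math.log10(math.e) / mean_excess

# Synthetic catalog: under the Gutenberg-Richter law, magnitudes above Mc
# are exponentially distributed with rate b * ln(10).
rng = random.Random(1)
true_b, mc = 1.0, -2.0  # hypothetical values for a microseismic catalog
catalog = [mc + rng.expovariate(true_b * math.log(10)) for _ in range(5000)]

b_hat = aki_b_value(catalog, mc)
print(f"estimated b = {b_hat:.2f} (true value {true_b})")
```

Applying the same function to events grouped by stage or by time window would give the kind of spatial and temporal b-value variations the abstract describes; binned magnitudes would need a bin-width correction that is omitted here.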
Specificity and timescales of cortical adaptation as inferences about natural movie statistics
Snow, Michoel; Coen-Cagli, Ruben; Schwartz, Odelia
2016-01-01
Adaptation is a phenomenological umbrella term under which a variety of temporal contextual effects are grouped. Previous models have shown that some aspects of visual adaptation reflect optimal processing of dynamic visual inputs, suggesting that adaptation should be tuned to the properties of natural visual inputs. However, the link between natural dynamic inputs and adaptation is poorly understood. Here, we extend a previously developed Bayesian modeling framework for spatial contextual effects to the temporal domain. The model learns temporal statistical regularities of natural movies and links these statistics to adaptation in primary visual cortex via divisive normalization, a ubiquitous neural computation. In particular, the model divisively normalizes the present visual input by the past visual inputs only to the degree that these are inferred to be statistically dependent. We show that this flexible form of normalization reproduces classical findings on how brief adaptation affects neuronal selectivity. Furthermore, prior knowledge acquired by the Bayesian model from natural movies can be modified by prolonged exposure to novel visual stimuli. We show that this updating can explain classical results on contrast adaptation. We also simulate the recent finding that adaptation maintains population homeostasis, namely, a balanced level of activity across a population of neurons with different orientation preferences. Consistent with previous disparate observations, our work further clarifies the influence of stimulus-specific and neuronal-specific normalization signals in adaptation. PMID:27699416
Demidenko, Eugene; Williams, Benjamin B; Flood, Ann Barry; Swartz, Harold M
2013-05-30
This paper develops a new metric, the standard error of inverse prediction (SEIP), for a dose-response relationship (calibration curve) when dose is estimated from response via inverse regression. SEIP can be viewed as a generalization of the coefficient of variation to the regression problem in which x is predicted from a y-value. We employ nonstandard statistical methods to treat the inverse prediction, which has an infinite mean and variance due to the presence of a normally distributed variable in the denominator. We develop confidence intervals and hypothesis testing for SEIP on the basis of the normal approximation and using exact statistical inference based on the noncentral t-distribution. We derive the power functions for both approaches and test them via statistical simulations. The theoretical SEIP, as the ratio of the regression standard error to the slope, is viewed as the reciprocal of the signal-to-noise ratio, a popular measure in signal processing. The SEIP, as a figure of merit for inverse prediction, can be used for comparison of calibration curves with different dependent variables and slopes. We illustrate our theory with electron paramagnetic resonance tooth dosimetry for a rapid estimation of the radiation dose received in the event of nuclear terrorism.
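The ratio described above, regression standard error divided by slope, can be sketched as a plug-in estimate from an ordinary least-squares calibration line. This is only an illustration of the definition given in the abstract; it does not reproduce the paper's confidence intervals or tests, and the function name `seip` and the sample numbers are hypothetical.

```python
import math


def seip(doses, responses):
    """Plug-in SEIP estimate: residual standard error divided by |slope|.

    Fits responses = intercept + slope * doses by ordinary least squares.
    Illustrative sketch only; interval estimation is not covered here.
    """
    n = len(doses)
    if n != len(responses) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(doses) / n
    mean_y = sum(responses) / n
    sxx = sum((x - mean_x) ** 2 for x in doses)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(doses, responses))
    if sxx == 0:
        raise ValueError("doses must not all be equal")
    slope = sxy / sxx
    if slope == 0:
        raise ValueError("zero slope: inverse prediction is undefined")
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(doses, responses))
    sigma = math.sqrt(rss / (n - 2))  # regression standard error
    return sigma / abs(slope)


if __name__ == "__main__":
    # Hypothetical calibration data: dose in Gy, response in arbitrary units.
    dose = [0.0, 1.0, 2.0, 3.0]
    signal = [0.1, 1.9, 4.1, 5.9]
    print(f"SEIP = {seip(dose, signal):.4f} (in dose units)")
```

Because the residual standard error and the slope scale together, rescaling the response leaves the ratio unchanged, which is why the abstract can propose it for comparing calibration curves with different dependent variables.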
ERIC Educational Resources Information Center
Henriques, Ana; Oliveira, Hélia
2016-01-01
This paper reports on the results of a study investigating the potential to embed Informal Statistical Inference in statistical investigations, using TinkerPlots, for assisting 8th grade students' informal inferential reasoning to emerge, particularly their articulations of uncertainty. Data collection included students' written work on a…
Validi, AbdoulAhad
2014-03-01
This study introduces a non-intrusive approach in the context of low-rank separated representation to construct a surrogate of high-dimensional stochastic functions, e.g., PDEs/ODEs, in order to decrease the computational cost of Markov Chain Monte Carlo simulations in Bayesian inference. The surrogate model is constructed via a regularized alternative least-square regression with Tikhonov regularization using a roughening matrix computing the gradient of the solution, in conjunction with a perturbation-based error indicator to detect optimal model complexities. The model approximates a vector of a continuous solution at discrete values of a physical variable. The required number of random realizations to achieve a successful approximation linearly depends on the function dimensionality. The computational cost of the model construction is quadratic in the number of random inputs, which potentially tackles the curse of dimensionality in high-dimensional stochastic functions. Furthermore, this vector-valued separated representation-based model, in comparison to the available scalar-valued case, leads to a significant reduction in the cost of approximation by an order of magnitude equal to the vector size. The performance of the method is studied through its application to three numerical examples including a 41-dimensional elliptic PDE and a 21-dimensional cavity flow.
Zhang, Kai; Traskin, Mikhail; Small, Dylan S
2012-03-01
For group-randomized trials, randomization inference based on rank statistics provides robust, exact inference against nonnormal distributions. However, in a matched-pair design, the currently available rank-based statistics lose significant power compared to normal linear mixed model (LMM) test statistics when the LMM is true. In this article, we investigate and develop an optimal test statistic over all statistics in the form of the weighted sum of signed Mann-Whitney-Wilcoxon statistics under certain assumptions. This test is almost as powerful as the LMM even when the LMM is true, but it is much more powerful for heavy tailed distributions. A simulation study is conducted to examine the power.
Charte, Francisco; Rivera, Antonio J; del Jesus, María J; Herrera, Francisco
2014-10-01
Multilabel classification (MLC) has generated considerable research interest in recent years, as a technique that can be applied to many real-world scenarios. To process them with binary or multiclass classifiers, methods for transforming multilabel data sets (MLDs) have been proposed, as well as adapted algorithms able to work with this type of data sets. However, until now, few studies have addressed the problem of how to deal with MLDs having a large number of labels. This characteristic can be defined as high dimensionality in the label space (output attributes), in contrast to the traditional high dimensionality problem, which is usually focused on the feature space (by means of feature selection) or sample space (by means of instance selection). The purpose of this paper is to analyze dimensionality in the label space in MLDs, and to present a transformation methodology based on the use of association rules to discover label dependencies. These dependencies are used to reduce the label space, to ease the work of any MLC algorithm, and to infer the deleted labels in a final postprocessing stage. The proposed process is validated in an extensive experimentation with several MLDs and classification algorithms, resulting in a statistically significant improvement of performance in some cases, as will be shown.
Sex, lies, and statistics: inferences from the child sexual abuse accommodation syndrome.
Weiss, Kenneth J; Curcio Alexander, Julia
2013-01-01
Victims of child sexual abuse often recant their complaints or do not report incidents, making prosecution of offenders difficult. The child with sexual abuse accommodation syndrome (CSAAS) has been used to explain this phenomenon by identifying common behavioral responses. Unlike PTSD but like rape trauma syndrome, CSAAS is not an official diagnostic term and should not be used as evidence of a defendant's guilt or to imply probative value in prosecutions. Courts have grappled with the ideal use of CSAAS in the evaluation of child witness testimony. Expert testimony should be helpful to the jurors without prejudicing them. The New Jersey Supreme Court ruled recently that statistical evidence about CSAAS implying the probability that a child is truthful runs the risk of confusing jury members and biasing them against the defendant. We review the parameters of expert testimony and its admissibility in this area, concluding that statistics about CSAAS should not be used to draw inferences about the victim's credibility or the defendant's guilt.
GWIS: Genome-Wide Inferred Statistics for Functions of Multiple Phenotypes.
Nieuwboer, Harold A; Pool, René; Dolan, Conor V; Boomsma, Dorret I; Nivard, Michel G
2016-10-06
Here we present a method of genome-wide inferred study (GWIS) that provides an approximation of genome-wide association study (GWAS) summary statistics for a variable that is a function of phenotypes for which GWAS summary statistics, phenotypic means, and covariances are available. A GWIS can be performed regardless of sample overlap between the GWAS of the phenotypes on which the function depends. Because a GWIS provides association estimates and their standard errors for each SNP, a GWIS can form the basis for polygenic risk scoring, LD score regression, Mendelian randomization studies, biological annotation, and other analyses. GWISs can also be used to boost power of a GWAS meta-analysis where cohorts have not measured all constituent phenotypes in the function. We demonstrate the accuracy of a BMI GWIS by performing power simulations and type I error simulations under varying circumstances, and we apply a GWIS by reconstructing a body mass index (BMI) GWAS based on a weight GWAS and a height GWAS. Furthermore, we apply a GWIS to further our understanding of the underlying genetic structure of bipolar disorder and schizophrenia and their relation to educational attainment. Our analyses suggest that the previously reported genetic correlation between schizophrenia and educational attainment is probably induced by the observed genetic correlation between schizophrenia and bipolar disorder and the previously reported genetic correlation between bipolar disorder and educational attainment. Copyright © 2016 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
Univariate description and bivariate statistical inference: the first step delving into data.
Zhang, Zhongheng
2016-03-01
In observational studies, the first step is usually to explore data distribution and the baseline differences between groups. Data description includes their central tendency (e.g., mean, median, and mode) and dispersion (e.g., standard deviation, range, interquartile range). There are a variety of bivariate statistical inference methods, such as Student's t-test, the Mann-Whitney U test and the Chi-square test, for normal, skewed and categorical data, respectively. The article shows how to perform these analyses with R code. Furthermore, I believe that the automation of the whole workflow is of paramount importance in that (I) it allows others to repeat your results; (II) you can easily find out how you performed the analysis during revision; (III) it spares data input by hand and is less error-prone; and (IV) when you correct your original dataset, the final result can be automatically corrected by executing the code. Therefore, the process of making a publication-quality table incorporating all the abovementioned statistics and P values is provided, allowing readers to customize the code to their own needs.
Valid statistical inference methods for a case-control study with missing data.
Tian, Guo-Liang; Zhang, Chi; Jiang, Xuejun
2016-05-19
The main objective of this paper is to derive the valid sampling distribution of the observed counts in a case-control study with missing data under the assumption of missing at random by employing the conditional sampling method and the mechanism augmentation method. The proposed sampling distribution, called the case-control sampling distribution, can be used to calculate the standard errors of the maximum likelihood estimates of parameters via the Fisher information matrix and to generate independent samples for constructing small-sample bootstrap confidence intervals. Theoretical comparisons of the new case-control sampling distribution with two existing sampling distributions exhibit a large difference. Simulations are conducted to investigate the influence of the three different sampling distributions on statistical inferences. One finding is that the conclusion by the Wald test for testing independency under the two existing sampling distributions could be completely different (even contradictory) from the Wald test for testing the equality of the success probabilities in control/case groups under the proposed distribution. A real cervical cancer data set is used to illustrate the proposed statistical methods.
Sojoudi, Alireza; Goodyear, Bradley G
2016-12-01
Spontaneous fluctuations of blood-oxygenation level-dependent functional magnetic resonance imaging (BOLD fMRI) signals are highly synchronous between brain regions that serve similar functions. This provides a means to investigate functional networks; however, most analysis techniques assume functional connections are constant over time. This may be problematic in the case of neurological disease, where functional connections may be highly variable. Recently, several methods have been proposed to determine moment-to-moment changes in the strength of functional connections over an imaging session (so-called dynamic connectivity). Here, a novel analysis framework based on a hierarchical observation modeling approach is proposed to permit statistical inference of the presence of dynamic connectivity. A two-level linear model composed of overlapping sliding windows of fMRI signals, incorporating the fact that overlapping windows are not independent, is described. To test this approach, datasets were synthesized whereby functional connectivity was either constant (significant or insignificant) or modulated by an external input. The method successfully determines the statistical significance of a functional connection in phase with the modulation, and it exhibits greater sensitivity and specificity in detecting regions with variable connectivity when compared with sliding-window correlation analysis. For real data, this technique possesses greater reproducibility and provides a more discriminative estimate of dynamic connectivity than sliding-window correlation analysis. Hum Brain Mapp 37:4566-4580, 2016. © 2016 Wiley Periodicals, Inc.
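For orientation, the baseline that the abstract compares against, sliding-window correlation, can be sketched in a few lines. This is not the hierarchical two-level model proposed in the paper; the function names, window length, and step size below are assumptions made for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant window")
    return sxy / math.sqrt(sxx * syy)


def sliding_window_correlation(x, y, window, step=1):
    """Correlation of x and y inside each window of `window` samples.

    With step < window, consecutive windows overlap and share samples,
    so the resulting estimates are not independent of one another.
    """
    if len(x) != len(y):
        raise ValueError("signals must have the same length")
    if window < 2 or window > len(x):
        raise ValueError("window must be between 2 and the signal length")
    starts = range(0, len(x) - window + 1, step)
    return [pearson(x[s:s + window], y[s:s + window]) for s in starts]


if __name__ == "__main__":
    # Two synthetic signals whose coupling flips sign halfway through.
    t = [i / 10.0 for i in range(200)]
    a = [math.sin(v) for v in t]
    b = [math.sin(v) if i < 100 else -math.sin(v) for i, v in enumerate(t)]
    for r in sliding_window_correlation(a, b, window=40, step=20):
        print(f"{r:+.2f}")
```

The dependence between overlapping windows noted in the docstring is the issue the abstract says its two-level linear model accounts for; treating these per-window values as independent samples would overstate the evidence for dynamic connectivity.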
Lagrangian statistics in weakly forced two-dimensional turbulence
Rivera, Michael K.; Ecke, Robert E.
2016-01-14
Measurements of Lagrangian single-point and multiple-point statistics in a quasi-two-dimensional stratified layer system are reported. The system consists of a layer of salt water over an immiscible layer of Fluorinert and is forced electromagnetically so that mean-squared vorticity is injected at a well-defined spatial scale r_{i}. Simultaneous cascades develop in which enstrophy flows predominately to small scales whereas energy cascades, on average, to larger scales. Lagrangian correlations and one- and two-point displacements are measured for random initial conditions and for initial positions within topological centers and saddles. Some of the behavior of these quantities can be understood in terms of the trapping characteristics of long-lived centers, the slow motion near strong saddles, and the rapid fluctuations outside of either centers or saddles. We also present statistics of Lagrangian velocity fluctuations using energy spectra in frequency space and structure functions in real space. We compare with complementary Eulerian velocity statistics. We find that simultaneous inverse energy and enstrophy ranges present in spectra are not directly echoed in real-space moments of velocity difference. Nevertheless, the spectral ranges line up well with features of moment ratios, indicating that although the moments are not exhibiting unambiguous scaling, the behavior of the probability distribution functions is changing over short ranges of length scales. Furthermore, implications for understanding weakly forced 2D turbulence with simultaneous inverse and direct cascades are discussed.
Lagrangian statistics in weakly forced two-dimensional turbulence
Rivera, Michael K.; Ecke, Robert E.
2016-01-14
Measurements of Lagrangian single-point and multiple-point statistics in a quasi-two-dimensional stratified layer system are reported. The system consists of a layer of salt water over an immiscible layer of Fluorinert and is forced electromagnetically so that mean-squared vorticity is injected at a well-defined spatial scale ri. Simultaneous cascades develop in which enstrophy flows predominately to small scales whereas energy cascades, on average, to larger scales. Lagrangian correlations and one- and two-point displacements are measured for random initial conditions and for initial positions within topological centers and saddles. Some of the behavior of these quantities can be understood in terms of the trapping characteristics of long-lived centers, the slow motion near strong saddles, and the rapid fluctuations outside of either centers or saddles. We also present statistics of Lagrangian velocity fluctuations using energy spectra in frequency space and structure functions in real space. We compare with complementary Eulerian velocity statistics. We find that simultaneous inverse energy and enstrophy ranges present in spectra are not directly echoed in real-space moments of velocity difference. Nevertheless, the spectral ranges line up well with features of moment ratios, indicating that although the moments are not exhibiting unambiguous scaling, the behavior of the probability distribution functions is changing over short ranges of length scales. Furthermore, implications for understanding weakly forced 2D turbulence with simultaneous inverse and direct cascades are discussed.
Emmert-Streib, Frank; Glazko, Galina V; Altay, Gökmen; de Matos Simoes, Ricardo
2012-01-01
In this paper, we present a systematic and conceptual overview of methods for inferring gene regulatory networks from observational gene expression data. Further, we discuss two classic approaches to infer causal structures and compare them with contemporary methods by providing a conceptual categorization thereof. We complement the above by surveying global and local evaluation measures for assessing the performance of inference algorithms.
Inference and Decoding of Motor Cortex Low-Dimensional Dynamics via Latent State-Space Models
Aghagolzadeh, Mehdi; Truccolo, Wilson
2016-01-01
Motor cortex neuronal ensemble spiking activity exhibits strong low-dimensional collective dynamics (i.e., coordinated modes of activity) during behavior. Here, we demonstrate that these low-dimensional dynamics, revealed by unsupervised latent state-space models, can provide as accurate or better reconstruction of movement kinematics as direct decoding from the entire recorded ensemble. Ensembles of single neurons were recorded with triple microelectrode arrays (MEAs) implanted in ventral and dorsal premotor (PMv, PMd) and primary motor (M1) cortices while nonhuman primates performed 3-D reach-to-grasp actions. Low-dimensional dynamics were estimated via various types of latent state-space models including, for example, Poisson linear dynamic system (PLDS) models. Decoding from low-dimensional dynamics was implemented via point process and Kalman filters coupled in series. We also examined decoding based on a predictive subsampling of the recorded population. In this case, a supervised greedy procedure selected neuronal subsets that optimized decoding performance. When comparing decoding based on predictive subsampling and latent state-space models, the size of the neuronal subset was set to the same number of latent state dimensions. Overall, our findings suggest that information about naturalistic reach kinematics present in the recorded population is preserved in the inferred low-dimensional motor cortex dynamics. Furthermore, decoding based on unsupervised PLDS models may also outperform previous approaches based on direct decoding from the recorded population or on predictive subsampling. PMID:26336135
Inference and Decoding of Motor Cortex Low-Dimensional Dynamics via Latent State-Space Models.
Aghagolzadeh, Mehdi; Truccolo, Wilson
2016-02-01
Motor cortex neuronal ensemble spiking activity exhibits strong low-dimensional collective dynamics (i.e., coordinated modes of activity) during behavior. Here, we demonstrate that these low-dimensional dynamics, revealed by unsupervised latent state-space models, can provide as accurate or better reconstruction of movement kinematics as direct decoding from the entire recorded ensemble. Ensembles of single neurons were recorded with triple microelectrode arrays (MEAs) implanted in ventral and dorsal premotor (PMv, PMd) and primary motor (M1) cortices while nonhuman primates performed 3-D reach-to-grasp actions. Low-dimensional dynamics were estimated via various types of latent state-space models including, for example, Poisson linear dynamic system (PLDS) models. Decoding from low-dimensional dynamics was implemented via point process and Kalman filters coupled in series. We also examined decoding based on a predictive subsampling of the recorded population. In this case, a supervised greedy procedure selected neuronal subsets that optimized decoding performance. When comparing decoding based on predictive subsampling and latent state-space models, the size of the neuronal subset was set to the same number of latent state dimensions. Overall, our findings suggest that information about naturalistic reach kinematics present in the recorded population is preserved in the inferred low-dimensional motor cortex dynamics. Furthermore, decoding based on unsupervised PLDS models may also outperform previous approaches based on direct decoding from the recorded population or on predictive subsampling.
Lu, Tsui-Shan; Longnecker, Matthew P; Zhou, Haibo
2017-03-15
An outcome-dependent sampling (ODS) scheme is a cost-effective sampling scheme where one observes the exposure with a probability that depends on the outcome. Well-known designs of this kind are the case-control design for binary response, the case-cohort design for failure time data, and the general ODS design for a continuous response. While substantial work has been carried out for the univariate response case, statistical inference and design for the ODS with multivariate cases remain under-developed. Motivated by the need in biological studies to take advantage of the available responses for subjects in a cluster, we propose a multivariate outcome-dependent sampling (multivariate-ODS) design that is based on a general selection of the continuous responses within a cluster. The proposed inference procedure for the multivariate-ODS design is semiparametric, where all the underlying distributions of covariates are modeled nonparametrically using empirical likelihood methods. We show that the proposed estimator is consistent and establish its asymptotic normality. Simulation studies show that the proposed estimator is more efficient than the estimator obtained using only the simple-random-sample portion of the multivariate-ODS or the estimator from a simple random sample with the same sample size. The multivariate-ODS design together with the proposed estimator provides an approach to further improve study efficiency for a given fixed study budget. We illustrate the proposed design and estimator with an analysis of association of polychlorinated biphenyl exposure to hearing loss in children born to the Collaborative Perinatal Study. Copyright © 2016 John Wiley & Sons, Ltd.
Duchesne, Thierry; Fortin, Daniel; Rivest, Louis-Paul
2015-01-01
Animal movement has a fundamental impact on population and community structure and dynamics. Biased correlated random walks (BCRW) and step selection functions (SSF) are commonly used to study movements. Because no studies have contrasted the parameters and the statistical properties of their estimators for models constructed under these two Lagrangian approaches, it remains unclear whether or not they allow for similar inference. First, we used the Weak Law of Large Numbers to demonstrate that the log-likelihood function for estimating the parameters of BCRW models can be approximated by the log-likelihood of SSFs. Second, we illustrated the link between the two approaches by fitting BCRW with maximum likelihood and with SSF to simulated movement data in virtual environments and to the trajectory of bison (Bison bison L.) trails in natural landscapes. Using simulated and empirical data, we found that the parameters of a BCRW estimated directly from maximum likelihood and by fitting an SSF were remarkably similar. Movement analysis is increasingly used as a tool for understanding the influence of landscape properties on animal distribution. In the rapidly developing field of movement ecology, management and conservation biologists must decide which method they should implement to accurately assess the determinants of animal movement. We showed that BCRW and SSF can provide similar insights into the environmental features influencing animal movements. Both techniques have advantages. BCRW has already been extended to allow for multi-state modeling. Unlike BCRW, however, SSF can be estimated using most statistical packages, it can simultaneously evaluate habitat selection and movement biases, and can easily integrate a large number of movement taxes at multiple scales. SSF thus offers a simple, yet effective, statistical technique to identify movement taxis.
Duchesne, Thierry; Fortin, Daniel; Rivest, Louis-Paul
2015-01-01
Animal movement has a fundamental impact on population and community structure and dynamics. Biased correlated random walks (BCRW) and step selection functions (SSF) are commonly used to study movements. Because no studies have contrasted the parameters and the statistical properties of their estimators for models constructed under these two Lagrangian approaches, it remains unclear whether or not they allow for similar inference. First, we used the Weak Law of Large Numbers to demonstrate that the log-likelihood function for estimating the parameters of BCRW models can be approximated by the log-likelihood of SSFs. Second, we illustrated the link between the two approaches by fitting BCRW with maximum likelihood and with SSF to simulated movement data in virtual environments and to the trajectory of bison (Bison bison L.) trails in natural landscapes. Using simulated and empirical data, we found that the parameters of a BCRW estimated directly from maximum likelihood and by fitting an SSF were remarkably similar. Movement analysis is increasingly used as a tool for understanding the influence of landscape properties on animal distribution. In the rapidly developing field of movement ecology, management and conservation biologists must decide which method they should implement to accurately assess the determinants of animal movement. We showed that BCRW and SSF can provide similar insights into the environmental features influencing animal movements. Both techniques have advantages. BCRW has already been extended to allow for multi-state modeling. Unlike BCRW, however, SSF can be estimated using most statistical packages, it can simultaneously evaluate habitat selection and movement biases, and can easily integrate a large number of movement taxes at multiple scales. SSF thus offers a simple, yet effective, statistical technique to identify movement taxis. PMID:25898019
The statistical distributions of one-dimensional “turbulence”
NASA Astrophysics Data System (ADS)
Peyrard, Michel
2004-06-01
We study a one-dimensional discrete analog of the von Kármán flow widely investigated in turbulence, made of a lattice of anharmonic oscillators excited by both ends in the presence of a dissipative term proportional to the second-order finite difference of the velocities, similar to the viscous term in a fluid. The dynamics of the model shows striking similarities with an actual turbulent flow, both at local and global scales. Calculations of the probability distribution function of velocity increments, extensively studied in turbulence, with a very large number of points in order to determine accurately the statistics of rare events, allow us to provide a meaningful comparison of different theoretical expressions of the PDFs.
Velocity statistics in two-dimensional granular turbulence
NASA Astrophysics Data System (ADS)
Isobe, Masaharu
2003-10-01
We studied the macroscopic statistical properties on the freely evolving quasielastic hard disk (granular) system by performing a large-scale (up to a few million particles) event-driven molecular dynamics systematically and found it to be remarkably analogous to an enstrophy cascade process in the decaying two-dimensional fluid turbulence. There are four typical stages in the freely evolving inelastic hard disk system, which are homogeneous, shearing (vortex), clustering, and final state. In the shearing stage, the self-organized macroscopic coherent vortices become dominant. In the clustering stage, the energy spectra are close to the expectation of Kraichnan-Batchelor theory and the squared two-particle separation strictly obeys Richardson law.
Velocity statistics in two-dimensional granular turbulence.
Isobe, Masaharu
2003-10-01
We studied the macroscopic statistical properties on the freely evolving quasielastic hard disk (granular) system by performing a large-scale (up to a few million particles) event-driven molecular dynamics systematically and found it to be remarkably analogous to an enstrophy cascade process in the decaying two-dimensional fluid turbulence. There are four typical stages in the freely evolving inelastic hard disk system, which are homogeneous, shearing (vortex), clustering, and final state. In the shearing stage, the self-organized macroscopic coherent vortices become dominant. In the clustering stage, the energy spectra are close to the expectation of Kraichnan-Batchelor theory and the squared two-particle separation strictly obeys Richardson law.
NASA Astrophysics Data System (ADS)
Hu, Zixi; Yao, Zhewei; Li, Jinglai
2017-03-01
Many scientific and engineering problems require performing Bayesian inference for unknowns of infinite dimension. In such problems, many standard Markov Chain Monte Carlo (MCMC) algorithms become arbitrarily slow under mesh refinement, which is referred to as being dimension dependent. To this end, a family of dimension-independent MCMC algorithms, known as the preconditioned Crank-Nicolson (pCN) methods, was proposed to sample the infinite-dimensional parameters. In this work we develop an adaptive version of the pCN algorithm, where the covariance operator of the proposal distribution is adjusted based on sampling history to improve the simulation efficiency. We show that the proposed algorithm satisfies an important ergodicity condition under some mild assumptions. Finally, we provide numerical examples to demonstrate the performance of the proposed method.
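To make the pCN idea concrete, here is a minimal sketch of the standard (non-adaptive) pCN sampler on a finite-dimensional discretisation. It assumes a standard Gaussian prior N(0, I) and a user-supplied negative log-likelihood; the adaptive covariance update developed in the paper is not implemented, and the function name `pcn_sample`, the step parameter `beta`, and the toy likelihood are hypothetical.

```python
import math
import random


def pcn_sample(neg_log_likelihood, dim, n_steps, beta=0.2, seed=0):
    """Standard pCN sampler with an N(0, I) prior (illustrative sketch).

    Proposal:   v = sqrt(1 - beta^2) * u + beta * xi,  xi ~ N(0, I)
    Acceptance: min(1, exp(Phi(u) - Phi(v))), with Phi the negative
    log-likelihood. The prior does not appear in the acceptance ratio
    because the proposal preserves it.
    Returns (list of samples, acceptance rate).
    """
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must lie in (0, 1]")
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - beta ** 2)
    u = [0.0] * dim
    phi_u = neg_log_likelihood(u)
    samples, accepted = [], 0
    for _ in range(n_steps):
        v = [scale * ui + beta * rng.gauss(0.0, 1.0) for ui in u]
        phi_v = neg_log_likelihood(v)
        log_alpha = phi_u - phi_v
        if log_alpha >= 0.0 or rng.random() < math.exp(log_alpha):
            u, phi_u = v, phi_v
            accepted += 1
        samples.append(list(u))
    return samples, accepted / n_steps


if __name__ == "__main__":
    # Toy likelihood: each coordinate observed as 1.0 with unit noise variance.
    def phi(u):
        return 0.5 * sum((ui - 1.0) ** 2 for ui in u)

    chain, rate = pcn_sample(phi, dim=50, n_steps=2000, beta=0.2)
    print(f"acceptance rate: {rate:.2f}")
```

Since the acceptance ratio involves only the likelihood term, the acceptance rate does not collapse as `dim` grows, which is the dimension-independence property the abstract refers to. An adaptive variant would replace the identity covariance of `xi` with one estimated from the chain history.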
Conn, Paul B.; Johnson, Devin S.; Ver Hoef, Jay M.; Hooten, Mevin B.; London, Joshua M.; Boveng, Peter L.
2015-01-01
Ecologists often fit models to survey data to estimate and explain variation in animal abundance. Such models typically require that animal density remains constant across the landscape where sampling is being conducted, a potentially problematic assumption for animals inhabiting dynamic landscapes or otherwise exhibiting considerable spatiotemporal variation in density. We review several concepts from the burgeoning literature on spatiotemporal statistical models, including the nature of the temporal structure (i.e., descriptive or dynamical) and strategies for dimension reduction to promote computational tractability. We also review several features as they specifically relate to abundance estimation, including boundary conditions, population closure, choice of link function, and extrapolation of predicted relationships to unsampled areas. We then compare a suite of novel and existing spatiotemporal hierarchical models for animal count data that permit animal density to vary over space and time, including formulations motivated by resource selection and allowing for closed populations. We gauge the relative performance (bias, precision, computational demands) of alternative spatiotemporal models when confronted with simulated and real data sets from dynamic animal populations. For the latter, we analyze spotted seal (Phoca largha) counts from an aerial survey of the Bering Sea where the quantity and quality of suitable habitat (sea ice) changed dramatically while surveys were being conducted. Simulation analyses suggested that multiple types of spatiotemporal models provide reasonable inference (low positive bias, high precision) about animal abundance, but have potential for overestimating precision. Analysis of spotted seal data indicated that several model formulations, including those based on a log-Gaussian Cox process, had a tendency to overestimate abundance. By contrast, a model that included a population closure assumption and a scale prior on total
Statistical Inference of a RANS closure for a Jet-in-Crossflow simulation
NASA Astrophysics Data System (ADS)
Heyse, Jan; Edeling, Wouter; Iaccarino, Gianluca
2016-11-01
The jet-in-crossflow is found in several engineering applications, such as discrete film cooling for turbine blades, where a coolant injected through holes in the blade's surface protects the component from the hot gases leaving the combustion chamber. Experimental measurements using MRI techniques have been completed for a single-hole injection into a turbulent crossflow, providing a full 3D averaged velocity field. For such flows of engineering interest, Reynolds-Averaged Navier-Stokes (RANS) turbulence closure models are often the only viable computational option. However, RANS models are known to provide poor predictions in the region close to the injection point. Since these models are calibrated on simple canonical flow problems, the obtained closure coefficient estimates are unlikely to extrapolate well to more complex flows. We will therefore calibrate the parameters of a RANS model using statistical inference techniques informed by the experimental jet-in-crossflow data. The obtained probabilistic parameter estimates can in turn be used to compute flow fields with quantified uncertainty. Stanford Graduate Fellowship in Science and Engineering.
Racing to learn: statistical inference and learning in a single spiking neuron with adaptive kernels
Afshar, Saeed; George, Libin; Tapson, Jonathan; van Schaik, André; Hamilton, Tara J.
2014-01-01
This paper describes the Synapto-dendritic Kernel Adapting Neuron (SKAN), a simple spiking neuron model that performs statistical inference and unsupervised learning of spatiotemporal spike patterns. SKAN is the first proposed neuron model to investigate the effects of dynamic synapto-dendritic kernels and demonstrate their computational power even at the single neuron scale. The rule-set defining the neuron is simple: there are no complex mathematical operations such as normalization, exponentiation or even multiplication. The functionalities of SKAN emerge from the real-time interaction of simple additive and binary processes. Like a biological neuron, SKAN is robust to signal and parameter noise, and can utilize both in its operations. At the network scale neurons are locked in a race with each other with the fastest neuron to spike effectively “hiding” its learnt pattern from its neighbors. The robustness to noise, high speed, and simple building blocks not only make SKAN an interesting neuron model in computational neuroscience, but also make it ideal for implementation in digital and analog neuromorphic systems which is demonstrated through an implementation in a Field Programmable Gate Array (FPGA). Matlab, Python, and Verilog implementations of SKAN are available at: http://www.uws.edu.au/bioelectronics_neuroscience/bens/reproducible_research. PMID:25505378
Statistical inference of selection and divergence of the rice blast resistance gene Pi-ta.
Amei, Amei; Lee, Seonghee; Mysore, Kirankumar S; Jia, Yulin
2014-10-21
The resistance gene Pi-ta has been effectively used to control rice blast disease, but some populations of cultivated and wild rice have evolved resistance. Insights into the evolutionary processes that led to this resistance during crop domestication may be inferred from the population history of domesticated and wild rice strains. In this study, we applied a recently developed statistical method, time-dependent Poisson random field model, to examine the evolution of the Pi-ta gene in cultivated and weedy rice. Our study suggests that the Pi-ta gene may have more recently introgressed into cultivated rice, indica and japonica, and U.S. weedy rice from the wild species, O. rufipogon. In addition, the Pi-ta gene is under positive selection in japonica, tropical japonica, U.S. cultivars and U.S. weedy rice. We also found that sequences of two domains of the Pi-ta gene, the nucleotide binding site and leucine-rich repeat domain, are highly conserved among all rice accessions examined. Our results provide a valuable analytical tool for understanding the evolution of disease resistance genes in crop plants.
Ogunnaike, Babatunde A; Gelmi, Claudio A; Edwards, Jeremy S
2010-05-21
Gene expression studies generate large quantities of data with the defining characteristic that the number of genes (whose expression profiles are to be determined) exceeds the number of available replicates by several orders of magnitude. Standard spot-by-spot analysis still seeks to extract useful information for each gene on the basis of the number of available replicates, and thus plays to the weakness of microarrays. On the other hand, because of the data volume, treating the entire data set as an ensemble and developing theoretical distributions for these ensembles provides a framework that plays instead to the strength of microarrays. We present theoretical results showing that, under reasonable assumptions, the distribution of microarray intensities follows the Gamma model, with the biological interpretations of the model parameters emerging naturally. We subsequently establish that for each microarray data set, the fractional intensities can be represented as a mixture of Beta densities, and develop a procedure for using these results to draw statistical inference regarding differential gene expression. We illustrate the results with experimental data from gene expression studies on Deinococcus radiodurans following DNA damage using cDNA microarrays.
Maximum entropy approach to statistical inference for an ocean acoustic waveguide.
Knobles, D P; Sagers, J D; Koch, R A
2012-02-01
A conditional probability distribution suitable for estimating the statistical properties of ocean seabed parameter values inferred from acoustic measurements is derived from a maximum entropy principle. The specification of the expectation value for an error function constrains the maximization of an entropy functional. This constraint determines the sensitivity factor (β) to the error function of the resulting probability distribution, which is a canonical form that provides a conservative estimate of the uncertainty of the parameter values. From the conditional distribution, marginal distributions for individual parameters can be determined from integration over the other parameters. The approach is an alternative to obtaining the posterior probability distribution without an intermediary determination of the likelihood function followed by an application of Bayes' rule. In this paper the expectation value that specifies the constraint is determined from the values of the error function for the model solutions obtained from a sparse number of data samples. The method is applied to ocean acoustic measurements taken on the New Jersey continental shelf. The marginal probability distribution for the values of the sound speed ratio at the surface of the seabed and the source levels of a towed source are examined for different geoacoustic model representations.
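The canonical form mentioned in the abstract above can be sketched as a short worked derivation. This is a generic maximum entropy calculation, not the authors' exact formulation: the symbols m (parameter vector), d (data), E(m) (error function) and the uniform integration measure are assumptions made for illustration.

```latex
% Maximize the entropy functional subject to normalization and a fixed
% expectation value of the error function E(m).
\begin{align}
  S[p] &= -\int p(m \mid d)\,\ln p(m \mid d)\,\mathrm{d}m, \\
  \int p(m \mid d)\,\mathrm{d}m &= 1, \qquad
  \int p(m \mid d)\,E(m)\,\mathrm{d}m = \langle E \rangle .
\end{align}
% Introducing Lagrange multipliers and setting the variation to zero
% gives the canonical distribution with sensitivity factor \beta:
\begin{align}
  p(m \mid d) &= \frac{e^{-\beta E(m)}}{Z(\beta)}, \qquad
  Z(\beta) = \int e^{-\beta E(m)}\,\mathrm{d}m, \\
  -\frac{\partial \ln Z}{\partial \beta} &= \langle E \rangle
  \quad \text{(this condition fixes } \beta \text{)} .
\end{align}
% Marginal for a single parameter m_i: integrate over all the others.
\begin{equation}
  p(m_i \mid d) = \int p(m \mid d) \prod_{j \neq i} \mathrm{d}m_j .
\end{equation}
```

In this sketch a smaller constraint value ⟨E⟩ corresponds to a larger β and hence a more sharply peaked distribution.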
Toward 'smart' DNA microarrays: algorithms for improving data quality and statistical inference
NASA Astrophysics Data System (ADS)
Bakewell, David J. G.; Wit, Ernst
2007-12-01
DNA microarrays are a laboratory tool for understanding biological processes at the molecular scale and future applications of this technology include healthcare, agriculture, and environment. Despite their usefulness, however, the information microarrays make available to the end-user is not used optimally, and the data is often noisy and of variable quality. This paper describes the use of hierarchical Maximum Likelihood Estimation (MLE) for generating algorithms that improve the quality of microarray data and enhance statistical inference about gene behavior. The paper describes examples of recent work that improves microarray performance, demonstrated using data from both Monte Carlo simulations and published experiments. One example looks at the variable quality of cDNA spots on a typical microarray surface. It is shown how algorithms, derived using MLE, are used to "weight" these spots according to their morphological quality, and subsequently lead to improved detection of gene activity. Another example, briefly discussed, addresses the "noisy data about too many genes" issue confronting many analysts who are also interested in the collective action of a group of genes, often organized as a pathway or complex. Preliminary work is described where MLE is used to "share" variance information across a pre-assigned group of genes of interest, leading to improved detection of gene activity.
NASA Astrophysics Data System (ADS)
Knobles, David; Stotts, Steven; Sagers, Jason
2012-03-01
Why can one obtain from similar measurements a greater amount of information about cosmological parameters than seabed parameters in ocean waveguides? The cosmological measurements are in the form of a power spectrum constructed from spatial correlations of temperature fluctuations within the microwave background radiation. The seabed acoustic measurements are in the form of spatial correlations along the length of a spatial aperture. This study explores the above question from the perspective of posterior probability distributions obtained from maximizing a relative entropy functional. An answer is in part that the seabed in shallow ocean environments generally has large temporal and spatial inhomogeneities, whereas the early universe was a nearly homogeneous cosmological soup with small but important fluctuations. Acoustic propagation models used in shallow water acoustics generally do not capture spatial and temporal variability sufficiently well, which leads to model error dominating the statistical inference problem. This is not the case in cosmology. Further, the physics of the acoustic modes in cosmology is that of a standing wave with simple initial conditions, whereas for underwater acoustics it is a traveling wave in a strongly inhomogeneous bounded medium.
Sassenhagen, Jona; Alday, Phillip M
2016-11-01
Experimental research on behavior and cognition frequently rests on stimulus or subject selection where not all characteristics can be fully controlled, even when attempting strict matching. For example, when contrasting patients to controls, variables such as intelligence or socioeconomic status are often correlated with patient status. Similarly, when presenting word stimuli, variables such as word frequency are often correlated with primary variables of interest. One procedure very commonly employed to control for such nuisance effects is conducting inferential tests on confounding stimulus or subject characteristics. For example, if word length is not significantly different for two stimulus sets, they are considered as matched for word length. Such a test has high error rates and is conceptually misguided. It reflects a common misunderstanding of statistical tests: interpreting significance as referring not to inference about a particular population parameter, but to (1) the sample in question and (2) the practical relevance of a sample difference (so that a nonsignificant test is taken to indicate evidence for the absence of relevant differences). We show inferential testing for assessing nuisance effects to be inappropriate both pragmatically and philosophically, present a survey showing its high prevalence, and briefly discuss an alternative in the form of regression including nuisance variables.
Schlichting, Margaret L; Guarino, Katharine F; Schapiro, Anna C; Turk-Browne, Nicholas B; Preston, Alison R
2017-01-01
Despite the importance of learning and remembering across the lifespan, little is known about how the episodic memory system develops to support the extraction of associative structure from the environment. Here, we relate individual differences in volumes along the hippocampal long axis to performance on statistical learning and associative inference tasks (both of which require encoding associations that span multiple episodes) in a developmental sample ranging from ages 6 to 30 years. Relating age to volume, we found dissociable patterns across the hippocampal long axis, with opposite nonlinear volume changes in the head and body. These structural differences were paralleled by performance gains across the age range on both tasks, suggesting improvements in the cross-episode binding ability from childhood to adulthood. Controlling for age, we also found that smaller hippocampal heads were associated with superior behavioral performance on both tasks, consistent with this region's hypothesized role in forming generalized codes spanning events. Collectively, these results highlight the importance of examining hippocampal development as a function of position along the hippocampal axis and suggest that the hippocampal head is particularly important in encoding associative structure across development.
Brannigan, V.M.; Bier, V.M.; Berg, C.
1992-09-01
Toxic torts are product liability cases dealing with alleged injuries due to chemical or biological hazards such as radiation, thalidomide, or Agent Orange. Toxic tort cases typically rely more heavily than other product liability cases on indirect or statistical proof of injury. However, there have been only a handful of actual legal decisions regarding the use of such statistical evidence, and most of those decisions have been inconclusive. Recently, a major case from the Fifth Circuit, involving allegations that Bendectin (a morning sickness drug) caused birth defects, was decided entirely on the basis of statistical inference. This paper examines both the conceptual basis of that decision and the relationships among statistical inference, scientific evidence, and the rules of product liability in general. 23 refs.
NASA Astrophysics Data System (ADS)
Hasan, A.; Maloney, C. E.
2014-12-01
We compute the effective dispersion and vibrational density of states (DOS) of two-dimensional subregions of three-dimensional face-centered-cubic crystals using both a direct projection-inversion technique and a Monte Carlo simulation based on a common underlying Hamiltonian. We study both a (111) and (100) plane. We show that for any given direction of wave vector, both (111) and (100) show an anomalous ω² ∼ q regime at low q, where ω² is the energy associated with the given mode and q is its wave number. The ω² ∼ q scaling should be expected to give rise to an anomalous DOS, D(ω), at low ω: D(ω) ∼ ω³ rather than the conventional Debye result, D(ω) ∼ ω². The DOS for (100) looks to be consistent with D(ω) ∼ ω³, while (111) shows something closer to the conventional Debye result at the smallest frequencies. In addition to the direct projection-inversion calculation, we perform Monte Carlo simulations to study the effects of finite sampling statistics. We show that finite sampling artifacts act as an effective disorder and bias D(ω), giving a behavior closer to D(ω) ∼ ω² than D(ω) ∼ ω³. These results should have an important impact on the interpretation of recent studies of colloidal solids where the two-point displacement correlations can be obtained directly in real space via microscopy.
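The exponents quoted in the abstract above can be checked with a standard mode-counting argument. The following is a sketch under stated assumptions rather than the authors' derivation: it assumes isotropic mode counting in d dimensions, with d = 2 for the planar subregion.

```latex
% Number of modes in a shell of wave numbers (q, q + dq) in d dimensions:
\begin{equation}
  D(\omega)\,\mathrm{d}\omega \propto q^{\,d-1}\,\mathrm{d}q
  \quad\Longrightarrow\quad
  D(\omega) \propto q^{\,d-1}\,\frac{\mathrm{d}q}{\mathrm{d}\omega} .
\end{equation}
% Anomalous dispersion in a two-dimensional subregion (d = 2):
\begin{equation}
  q \propto \omega^{2}
  \;\Longrightarrow\;
  D(\omega) \propto \omega^{2} \cdot 2\omega \propto \omega^{3} .
\end{equation}
% For comparison, linear dispersion in a three-dimensional crystal (d = 3):
\begin{equation}
  q \propto \omega
  \;\Longrightarrow\;
  D(\omega) \propto \omega^{2} .
\end{equation}
```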
NASA Astrophysics Data System (ADS)
Riccio, A.; Caporaso, L.; di Giuseppe, F.; Bonafè, G.; Gobbi, G. P.; Angelini, A.
2010-09-01
The availability of low-cost commercial LIDAR/ceilometers now provides the opportunity to use these active instruments widely for continuous observation of the planetary boundary layer (PBL) evolution, which could serve both air-quality model initialization and the evaluation of numerical weather prediction systems. Their range-corrected signal is in fact proportional to the aerosol backscatter cross section, and therefore, in clear conditions, it allows the PBL evolution to be tracked using aerosols as markers. The LIDAR signal is then processed to retrieve an estimate of the PBL mixing height. A standard approach uses the so-called wavelet covariance transform (WCT) method, which consists of the convolution of the vertical signal with a step function and is able to detect local discontinuities in the backscatter profile. There are, nevertheless, several drawbacks that have to be considered when the WCT method is employed. Since water droplets may have a very large extinction and backscattering cross section, the presence of rain, clouds or fog decreases the returning signal, causing interference and uncertainties in the mixing height retrievals. Moreover, if vertical mixing is scarce, aerosols remain suspended in a persistent residual layer, which is detected even if it is not significantly connected to the actual mixing height. Finally, multiple layers are also a cause of uncertainty. In this work we present a novel methodology to infer the height of planetary boundary layers (PBLs) from LIDAR data which corrects the unrealistic fluctuations introduced by the WCT method. It implements the assimilation of WCT-PBL height estimates into a Bayesian statistical inference procedure which includes a physical model for the boundary layer (bulk model) as the first-guess hypothesis. A hierarchical Bayesian Markov chain Monte Carlo (MCMC) approach is then used to explore the posterior state space and calculate the data likelihood of previously assigned
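The WCT step described in the abstract above (convolving the vertical signal with a step function to find a discontinuity) might look roughly like the following. This is a minimal sketch of a Haar-type wavelet covariance transform, not the authors' code: the function names, the evenly spaced height grid, and the synthetic profile are all assumptions for illustration.

```python
def haar_wct(profile, heights, dilation):
    """Haar wavelet covariance transform of a backscatter profile.

    Assumes evenly spaced heights. For each level, the signal summed over
    the half-window below is compared with the half-window above; a large
    positive value marks a sharp decrease of backscatter with height.
    Levels too close to either end of the profile are returned as None.
    """
    dz = heights[1] - heights[0]
    half = int(round(dilation / (2 * dz)))  # half-window in grid points
    result = []
    for i in range(len(profile)):
        lo, hi = i - half, i + half
        if lo < 0 or hi > len(profile):
            result.append(None)
            continue
        below = sum(profile[lo:i])
        above = sum(profile[i:hi])
        result.append((below - above) * dz / dilation)
    return result


def mixing_height(profile, heights, dilation):
    """Height at which the transform is largest (illustrative estimator)."""
    wct = haar_wct(profile, heights, dilation)
    valid = [(w, h) for w, h in zip(wct, heights) if w is not None]
    return max(valid)[1]


# Hypothetical clear-sky profile: aerosol-rich layer below 1000 m.
heights = list(range(0, 2000, 10))
profile = [1.0 if h < 1000 else 0.2 for h in heights]
estimate = mixing_height(profile, heights, dilation=200)
```

On a profile with clouds, fog, or a residual layer, the largest transform value need not coincide with the true mixing height, which is the kind of failure the abstract describes.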
Stang, Andreas; Deckert, Markus; Poole, Charles; Rothman, Kenneth J
2017-01-01
Since its introduction in the twentieth century, null hypothesis significance testing (NHST), a hybrid of significance testing (ST) advocated by Fisher and null hypothesis testing (NHT) developed by Neyman and Pearson, has become widely adopted but has also been a source of debate. The principal alternative to such testing is estimation with point estimates and confidence intervals (CI). Our aim was to estimate time trends in NHST, ST, NHT and CI reporting in abstracts of major medical and epidemiological journals. We reviewed 89,533 abstracts in five major medical journals and seven major epidemiological journals, 1975-2014, and estimated time trends in the proportions of abstracts containing statistical inference. In those abstracts, we estimated time trends in the proportions relying on NHST and its major variants, ST and NHT, and in the proportions reporting CIs without explicit use of NHST (CI-only approach). The CI-only approach rose monotonically during the study period in the abstracts of all journals. In Epidemiology abstracts, as a result of the journal's editorial policy, the CI-only approach has always been the most common approach. In the other 11 journals, the NHST approach started out more common, but by 2014, this disparity had narrowed, disappeared or reversed in 9 of them. The exceptions were JAMA, New England Journal of Medicine, and Lancet abstracts, where the predominance of the NHST approach prevailed over time. In 2014, the CI-only approach is as popular as the NHST approach in the abstracts of 4 of the epidemiology journals: the American Journal of Epidemiology (48%), the Annals of Epidemiology (55%), Epidemiology (79%) and the International Journal of Epidemiology (52%). The reporting of CIs without explicitly interpreting them as statistical tests is becoming more common in abstracts, particularly in epidemiology journals. Although NHST is becoming less popular in abstracts of most epidemiology journals studied and some widely read medical
Statistical properties of two-dimensional magnetohydrodynamic turbulence
NASA Astrophysics Data System (ADS)
Biskamp, D.; Welter, H.; Walter, M.
1990-12-01
The statistical properties of two-dimensional (2-D) magnetohydrodynamic (MHD) turbulence are studied by means of high-resolution numerical simulations. As a theoretical point of reference, the β model of intermittent turbulence is adapted to the MHD case. Comparison of simulation results for energy spectra with the β-model predictions shows intermittency corrections to be small, δ < 0.2, while fourth-order correlation functions exhibit a stronger effect, δ ≃ 0.35, consistent with the numerically observed Reynolds-number dependence of the flatness factor F ∝ R_λ^(1/2). An argument is given that this scaling, valid for R_λ ∼ 10², is, however, not characteristic of the asymptotic regime R_λ → ∞, where a constant value of F is to be expected. The probability distributions of the field differences δv(x,t), δB(x,t) are Gaussian for large separation x or t, approaching an approximately exponential distribution for x, t → 0. This behavior can be understood by a simple probabilistic argument. The probability distribution of the local energy dissipation rate ε is roughly consistent with a log-normal distribution at larger ε but shows a different behavior at small ε.
Gross, Kevin; Rosenheim, Jay A
2011-10-01
Secondary pest outbreaks occur when the use of a pesticide to reduce densities of an unwanted target pest species triggers subsequent outbreaks of other pest species. Although secondary pest outbreaks are thought to be familiar in agriculture, their rigorous documentation is made difficult by the challenges of performing randomized experiments at suitable scales. Here, we quantify the frequency and monetary cost of secondary pest outbreaks elicited by early-season applications of broad-spectrum insecticides to control the plant bug Lygus spp. (primarily L. hesperus) in cotton grown in the San Joaquin Valley, California, USA. We do so by analyzing pest-control management practices for 969 cotton fields spanning nine years and 11 private ranches. Our analysis uses statistical methods to draw formal causal inferences from nonexperimental data that have become popular in public health and economics, but that are not yet widely known in ecology or agriculture. We find that, in fields that received an early-season broad-spectrum insecticide treatment for Lygus, 20.2% +/- 4.4% (mean +/- SE) of late-season pesticide costs were attributable to secondary pest outbreaks elicited by the early-season insecticide application for Lygus. In 2010 U.S. dollars, this equates to an additional $6.00 +/- $1.30 (mean +/- SE) per acre in management costs. To the extent that secondary pest outbreaks may be driven by eliminating pests' natural enemies, these figures place a lower bound on the monetary value of ecosystem services provided by native communities of arthropod predators and parasitoids in this agricultural system.
Vincent, Martin; Mundbjerg, Kamilla; Skou Pedersen, Jakob; Liang, Gangning; Jones, Peter A; Ørntoft, Torben Falck; Dalsgaard Sørensen, Karina; Wiuf, Carsten
2017-02-21
The study of epigenetic heterogeneity at the level of individual cells and in whole populations is the key to understanding cellular differentiation, organismal development, and the evolution of cancer. We develop a statistical method, epiG, to infer and differentiate between different epi-allelic haplotypes, annotated with CpG methylation status and DNA polymorphisms, from whole-genome bisulfite sequencing data, and nucleosome occupancy from NOMe-seq data. We demonstrate the capabilities of the method by inferring allele-specific methylation and nucleosome occupancy in cell lines, and colon and tumor samples, and by benchmarking the method against independent experimental data.
Sweeney, Elizabeth M; Shinohara, Russell T; Shiee, Navid; Mateen, Farrah J; Chudgar, Avni A; Cuzzocreo, Jennifer L; Calabresi, Peter A; Pham, Dzung L; Reich, Daniel S; Crainiceanu, Ciprian M
2013-01-01
Magnetic resonance imaging (MRI) can be used to detect lesions in the brains of multiple sclerosis (MS) patients and is essential for diagnosing the disease and monitoring its progression. In practice, lesion load is often quantified by either manual or semi-automated segmentation of MRI, which is time-consuming, costly, and associated with large inter- and intra-observer variability. We propose OASIS is Automated Statistical Inference for Segmentation (OASIS), an automated statistical method for segmenting MS lesions in MRI studies. We use logistic regression models incorporating multiple MRI modalities to estimate voxel-level probabilities of lesion presence. Intensity-normalized T1-weighted, T2-weighted, fluid-attenuated inversion recovery and proton density volumes from 131 MRI studies (98 MS subjects, 33 healthy subjects) with manual lesion segmentations were used to train and validate our model. Within this set, OASIS detected lesions with a partial area under the receiver operating characteristic curve for clinically relevant false positive rates of 1% and below of 0.59% (95% CI: [0.50%, 0.67%]) at the voxel level. An experienced MS neuroradiologist compared these segmentations to those produced by LesionTOADS, image segmentation software that provides segmentation of both lesions and normal brain structures. For lesions, OASIS out-performed LesionTOADS in 74% (95% CI: [65%, 82%]) of cases for the 98 MS subjects. To further validate the method, we applied OASIS to 169 MRI studies acquired at a separate center. The neuroradiologist again compared the OASIS segmentations to those from LesionTOADS. For lesions, OASIS ranked higher than LesionTOADS in 77% (95% CI: [71%, 83%]) of cases. For a randomly selected subset of 50 of these studies, one additional radiologist and one neurologist also scored the images. Within this set, the neuroradiologist ranked OASIS higher than LesionTOADS in 76% (95% CI: [64%, 88%]) of cases, the neurologist 66% (95% CI: [52%, 78
NASA Technical Reports Server (NTRS)
Slater, G. L.; Freeland, S. L.; Hoeksema, T.; Zhao, X.; Hudson, H. S.
1995-01-01
The Yohkoh/SXT images provide full-disk coverage of the solar corona, usually extending before and after one of the large-scale eruptive events that occur in the polar crown. These produce large arcades of X-ray loops, often with a cusp-shaped coronal extension, and are known to be associated with coronal mass ejections. The Yohkoh prototype of such events occurred 12 Nov. 1991. This allows us to determine heights from the apparent rotation rates of these structures. In comparison with magnetic-field extrapolations from Wilcox Solar Observatory, we use this tool to infer the three-dimensional structure of the corona in particular cases: 24 Jan. 1992, 24 Feb. 1993, 14 Apr. 1994, and 13 Nov. 1994. The last event is a long-duration flare event.
Convertino, Matteo; Mangoubi, Rami S.; Linkov, Igor; Lowry, Nathan C.; Desai, Mukund
2012-01-01
Shannon entropy of pixel intensity. To test our approach, we specifically use the green band of Landsat images for a water conservation area in the Florida Everglades. We validate our predictions against data of species occurrences for a twenty-eight-year period for both wet and dry seasons. Our method correctly predicts 73% of species richness. For species turnover, the newly proposed KL divergence prediction is nearly 100% accurate. This represents a significant improvement over the more conventional Shannon entropy difference, which provides 85% accuracy. Furthermore, we find that changes in soil and water patterns, as measured by fluctuations of the Shannon entropy for the red and blue bands respectively, are positively correlated with changes in vegetation. The fluctuations are smaller in the wet season when compared to the dry season. Conclusions/Significance: Texture-based statistical multiresolution image analysis is a promising method for quantifying interseasonal differences and, consequently, the degree to which vegetation, soil, and water patterns vary. The proposed automated method for quantifying species richness and turnover can also provide analysis at higher spatial and temporal resolution than is currently obtainable from expensive monitoring campaigns, thus enabling more prompt, more cost-effective inference and decision-making support regarding anomalous variations in biodiversity. Additionally, a matrix-based visualization of the statistical multiresolution analysis is presented to facilitate both insight and quick recognition of anomalous data. PMID:23115629
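The two quantities compared in the abstract above, Shannon entropy of pixel intensity and the KL divergence between intensity distributions, can be sketched as follows. This is an illustrative sketch only: the function names, the use of base-2 logarithms, the `eps` floor, and the toy distributions are assumptions and are not taken from the paper.

```python
import math
from collections import Counter


def intensity_distribution(pixels, levels=256):
    """Normalized histogram of integer pixel intensities in [0, levels)."""
    counts = Counter(pixels)
    n = len(pixels)
    return [counts.get(v, 0) / n for v in range(levels)]


def shannon_entropy(p):
    """Shannon entropy in bits; zero-probability bins contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def kl_divergence(p, q, eps=1e-12):
    """KL divergence D(p || q) in bits.

    Bins where q is zero are floored at eps (an assumption made here to
    keep the sum finite when q has empty bins).
    """
    return sum(x * math.log2(x / max(y, eps)) for x, y in zip(p, q) if x > 0)


# Two hypothetical two-level "images" with mirrored intensity distributions.
dry = [0.9, 0.1]
wet = [0.1, 0.9]
entropy_difference = abs(shannon_entropy(dry) - shannon_entropy(wet))
divergence = kl_divergence(dry, wet)
```

In this toy case the entropy difference is zero even though the distributions differ strongly, while the KL divergence is clearly positive. That is one plausible reason a divergence-based measure could detect changes that an entropy difference misses.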
Wallace, D L; Perlman, M D
1980-06-01
This report describes the research activities of the Department of Statistics, University of Chicago, during the period June 15, 1975 to July 30, 1979. Nine research projects are briefly described on the following subjects: statistical computing and approximation techniques in statistics; numerical computation of first passage distributions; probabilities of large deviations; combining independent tests of significance; small-sample efficiencies of tests and estimates; improved procedures for simultaneous estimation and testing of many correlations; statistical computing and improved regression methods; comparison of several populations; and unbiasedness in multivariate statistics. A description of the statistical consultation activities of the Department that are of interest to DOE, in particular, the scientific interactions between the Department and the scientists at Argonne National Laboratories, is given. A list of publications issued during the term of the contract is included.
Garcia-Retamero, Rocio; Hoffrage, Ulrich
2013-04-01
Doctors and patients have difficulty inferring the predictive value of a medical test from information about the prevalence of a disease and the sensitivity and false-positive rate of the test. Previous research has established that communicating such information in a format the human mind is adapted to (namely natural frequencies), as compared to probabilities, boosts accuracy of diagnostic inferences. In a study, we investigated to what extent these inferences can be improved, beyond the effect of natural frequencies, by providing visual aids. Participants were 81 doctors and 81 patients who made diagnostic inferences about three medical tests on the basis of information about prevalence of a disease, and the sensitivity and false-positive rate of the tests. Half of the participants received the information in natural frequencies, while the other half received the information in probabilities. Half of the participants only received numerical information, while the other half additionally received a visual aid representing the numerical information. In addition, participants completed a numeracy scale. Our study showed three important findings: (1) doctors and patients made more accurate inferences when information was communicated in natural frequencies as compared to probabilities; (2) visual aids boosted accuracy even when the information was provided in natural frequencies; and (3) doctors were more accurate in their diagnostic inferences than patients, though differences in accuracy disappeared when differences in numerical skills were controlled for. Our findings have important implications for medical practice as they suggest suitable ways to communicate quantitative medical data.
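The inference task described in the abstract above is an application of Bayes' rule, and the two presentation formats can be compared in a short sketch. The numbers used here (1% prevalence, 80% sensitivity, 10% false-positive rate, a reference population of 1000) are hypothetical and are not taken from the study; the function names are likewise illustrative.

```python
def ppv_from_probabilities(prevalence, sensitivity, false_positive_rate):
    """Positive predictive value via Bayes' rule in probability format."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * false_positive_rate
    return true_pos / (true_pos + false_pos)


def ppv_from_natural_frequencies(population, prevalence, sensitivity,
                                 false_positive_rate):
    """Same inference expressed as counts of people in a reference population.

    Returns (sick and positive, healthy and positive, predictive value).
    """
    sick = population * prevalence
    healthy = population - sick
    sick_and_positive = sick * sensitivity
    healthy_and_positive = healthy * false_positive_rate
    ppv = sick_and_positive / (sick_and_positive + healthy_and_positive)
    return sick_and_positive, healthy_and_positive, ppv


# Hypothetical test: 1% prevalence, 80% sensitivity, 10% false positives.
ppv = ppv_from_probabilities(0.01, 0.80, 0.10)
sick_pos, healthy_pos, ppv_nf = ppv_from_natural_frequencies(1000, 0.01, 0.80, 0.10)
```

In natural-frequency terms, about 8 of 1000 people are sick and test positive, and about 99 are healthy and test positive, so roughly 8 out of 107 positives are true cases. Both formats give the same answer; only the presentation differs.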
2010-01-01
Background: The rate of emergence of human pathogens is steadily increasing; most of these novel agents originate in wildlife. Bats, remarkably, are the natural reservoirs of many of the most pathogenic viruses in humans. There are two bat genome projects currently underway, a circumstance that promises to speed the discovery of host factors important in the coevolution of bats with their viruses. These genomes, however, are not yet assembled and one of them will provide only low coverage, making the inference of most genes of immunological interest error-prone. Many more wildlife genome projects are underway and intend to provide only shallow coverage. Results: We have developed a statistical method for the assembly of gene families from partial genomes. The method takes full advantage of the quality scores generated by base-calling software, incorporating them into a complete probabilistic error model, to overcome the limitation inherent in the inference of gene family members from partial sequence information. We validated the method by inferring the human IFNA genes from the genome trace archives, and used it to infer 61 type-I interferon genes, and single type-II interferon genes in the bats Pteropus vampyrus and Myotis lucifugus. We confirmed our inferences by direct cloning and sequencing of IFNA, IFNB, IFND, and IFNK in P. vampyrus, and by demonstrating transcription of some of the inferred genes by known interferon-inducing stimuli. Conclusion: The statistical trace assembler described here provides a reliable method for extracting information from the many available and forthcoming partial or shallow genome sequencing projects, thereby facilitating the study of a wider variety of organisms with ecological and biomedical significance to humans than would otherwise be possible. PMID:20663124
Kepler, Thomas B; Sample, Christopher; Hudak, Kathryn; Roach, Jeffrey; Haines, Albert; Walsh, Allyson; Ramsburg, Elizabeth A
2010-07-21
The rate of emergence of human pathogens is steadily increasing; most of these novel agents originate in wildlife. Bats, remarkably, are the natural reservoirs of many of the most pathogenic viruses in humans. There are two bat genome projects currently underway, a circumstance that promises to speed the discovery of host factors important in the coevolution of bats with their viruses. These genomes, however, are not yet assembled and one of them will provide only low coverage, making the inference of most genes of immunological interest error-prone. Many more wildlife genome projects are underway and intend to provide only shallow coverage. We have developed a statistical method for the assembly of gene families from partial genomes. The method takes full advantage of the quality scores generated by base-calling software, incorporating them into a complete probabilistic error model, to overcome the limitation inherent in the inference of gene family members from partial sequence information. We validated the method by inferring the human IFNA genes from the genome trace archives, and used it to infer 61 type-I interferon genes, and single type-II interferon genes in the bats Pteropus vampyrus and Myotis lucifugus. We confirmed our inferences by direct cloning and sequencing of IFNA, IFNB, IFND, and IFNK in P. vampyrus, and by demonstrating transcription of some of the inferred genes by known interferon-inducing stimuli. The statistical trace assembler described here provides a reliable method for extracting information from the many available and forthcoming partial or shallow genome sequencing projects, thereby facilitating the study of a wider variety of organisms with ecological and biomedical significance to humans than would otherwise be possible.
ERIC Educational Resources Information Center
Page, Robert; Satake, Eiki
2017-01-01
While interest in Bayesian statistics has been growing in statistics education, the treatment of the topic is still inadequate in both textbooks and the classroom. Because so many fields of study lead to careers that involve a decision-making process requiring an understanding of Bayesian methods, it is becoming increasingly clear that Bayesian…
Dimensionality reduction and polynomial chaos acceleration of Bayesian inference in inverse problems
Marzouk, Youssef M.; Najm, Habib N.
2009-04-01
We consider a Bayesian approach to nonlinear inverse problems in which the unknown quantity is a spatial or temporal field, endowed with a hierarchical Gaussian process prior. Computational challenges in this construction arise from the need for repeated evaluations of the forward model (e.g., in the context of Markov chain Monte Carlo) and are compounded by high dimensionality of the posterior. We address these challenges by introducing truncated Karhunen-Loeve expansions, based on the prior distribution, to efficiently parameterize the unknown field and to specify a stochastic forward problem whose solution captures that of the deterministic forward model over the support of the prior. We seek a solution of this problem using Galerkin projection on a polynomial chaos basis, and use the solution to construct a reduced-dimensionality surrogate posterior density that is inexpensive to evaluate. We demonstrate the formulation on a transient diffusion equation with prescribed source terms, inferring the spatially-varying diffusivity of the medium from limited and noisy data.
NASA Astrophysics Data System (ADS)
Saatchi, R.
2004-03-01
The aim of the study was to automate the identification of a saccade-related visual evoked potential (EP) called the lambda wave. The lambda waves were extracted from single trials of electroencephalogram (EEG) waveforms using independent component analysis (ICA). A trial was a set of EEG waveforms recorded from 64 scalp electrode locations while a saccade was performed. Forty saccade-related EEG trials (recorded from four normal subjects) were used in the study. The number of waveforms per trial was reduced from 64 to 22 by pre-processing. The application of ICA to the resulting waveforms produced 880 components (i.e. 4 subjects × 10 trials per subject × 22 components per trial). The components were divided into 373 lambda and 507 nonlambda waves by visual inspection and then they were represented by one spatial and two temporal features. The classification performance of a Bayesian approach called predictive statistical diagnosis (PSD) was compared with that of a fuzzy logic approach called a fuzzy inference system (FIS). The outputs from the two classification approaches were then combined and the resulting discrimination accuracy was evaluated. For each approach, half the data from the lambda and nonlambda wave categories were used to determine the operating parameters of the classification schemes while the rest (i.e. the validation set) were used to evaluate their classification accuracies. The sensitivity and specificity values when the classification approaches were applied to the lambda wave validation data set were as follows: for the PSD 92.51% and 91.73% respectively, for the FIS 95.72% and 89.76% respectively, and for the combined FIS and PSD approach 97.33% and 97.24% respectively (classification threshold was 0.5). The devised signal processing techniques together with the classification approaches provided for an effective extraction and classification of the single-trial lambda waves. However, as only four subjects were included, it will be
Statistical Inference in the Wright-Fisher Model Using Allele Frequency Data.
Tataru, Paula; Simonsen, Maria; Bataillon, Thomas; Hobolth, Asger
2016-08-02
The Wright-Fisher model provides an elegant mathematical framework for understanding allele frequency data. In particular, the model can be used to infer the demographic history of species and identify loci under selection. A crucial quantity for inference under the Wright-Fisher model is the distribution of allele frequencies (DAF). Despite the apparent simplicity of the model, the calculation of the DAF is challenging. We review and discuss strategies for approximating the DAF, and how these are used in methods that perform inference from allele frequency data. Various evolutionary forces can be incorporated in the Wright-Fisher model, and we consider these in turn. We begin our review with the basic bi-allelic Wright-Fisher model where random genetic drift is the only evolutionary force. We then consider mutation, migration, and selection. In particular, we compare diffusion-based and moment-based methods in terms of accuracy, computational efficiency, and analytical tractability. We conclude with a brief overview of the multi-allelic process with a general mutation model. [Allele frequency, diffusion, inference, moments, selection, Wright-Fisher.].
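As an illustration of the basic bi-allelic case with drift as the only evolutionary force, the sketch below simulates one allele-frequency trajectory. It is a toy example under stated assumptions, not one of the approximation methods the review discusses: the function name, the population size of 100 gene copies, and the seed are all hypothetical. Each generation, the number of copies of the focal allele is drawn from a binomial distribution whose success probability is the current frequency.

```python
import random


def wright_fisher_drift(n_copies, initial_freq, generations, rng):
    """Simulate an allele-frequency trajectory under pure genetic drift.

    n_copies is the number of gene copies in the population (2N for N diploids).
    Each generation resamples n_copies alleles with replacement, which is a
    Binomial(n_copies, freq) draw for the count of the focal allele.
    """
    freq = initial_freq
    trajectory = [freq]
    for _ in range(generations):
        count = sum(1 for _ in range(n_copies) if rng.random() < freq)
        freq = count / n_copies
        trajectory.append(freq)
    return trajectory


rng = random.Random(0)  # fixed seed so the run is repeatable
traj = wright_fisher_drift(n_copies=100, initial_freq=0.5, generations=50, rng=rng)
lost = wright_fisher_drift(n_copies=100, initial_freq=0.0, generations=10, rng=rng)
```

Repeating the simulation many times and histogramming the final frequencies gives a Monte Carlo estimate of the distribution of allele frequencies, which is the quantity that the diffusion-based and moment-based methods approximate analytically.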
Moura, Lidia MVR; Westover, M Brandon; Kwasnik, David; Cole, Andrew J; Hsu, John
2017-01-01
The elderly population faces an increasing number of cases of chronic neurological conditions, such as epilepsy and Alzheimer’s disease. Because the elderly with epilepsy are commonly excluded from randomized controlled clinical trials, there are few rigorous studies to guide clinical practice. When the elderly are eligible for trials, they either rarely participate or frequently have poor adherence to therapy, thus limiting both generalizability and validity. In contrast, large observational data sets are increasingly available, but are susceptible to bias when using common analytic approaches. Recent developments in causal inference-analytic approaches also introduce the possibility of emulating randomized controlled trials to yield valid estimates. We provide a practical example of the application of the principles of causal inference to a large observational data set of patients with epilepsy. This review also provides a framework for comparative-effectiveness research in chronic neurological conditions. PMID:28115873
2011-04-30
Commander, Naval Sea Systems Command • Army Contracting Command, U.S. Army Materiel Command • Program Manager, Airborne, Maritime and Fixed Station...are in the area of the Design and Acquisition of Military Assets. Specific domains of interests include the concept of value and its integration...inference may point to areas where the test may be modified or additional control measures may be introduced to increase the likelihood of obtaining
Bayesian Statistical Inference in Ion-Channel Models with Exact Missed Event Correction.
Epstein, Michael; Calderhead, Ben; Girolami, Mark A; Sivilotti, Lucia G
2016-07-26
The stochastic behavior of single ion channels is most often described as an aggregated continuous-time Markov process with discrete states. For ligand-gated channels each state can represent a different conformation of the channel protein or a different number of bound ligands. Single-channel recordings show only whether the channel is open or shut: states of equal conductance are aggregated, so transitions between them have to be inferred indirectly. The requirement to filter noise from the raw signal further complicates the modeling process, as it limits the time resolution of the data. The consequence of the reduced bandwidth is that openings or shuttings that are shorter than the resolution cannot be observed; these are known as missed events. Postulated models fitted using filtered data must therefore explicitly account for missed events to avoid bias in the estimation of rate parameters and therefore assess parameter identifiability accurately. In this article, we present the first, to our knowledge, Bayesian modeling of ion-channels with exact missed events correction. Bayesian analysis represents uncertain knowledge of the true value of model parameters by considering these parameters as random variables. This allows us to gain a full appreciation of parameter identifiability and uncertainty when estimating values for model parameters. However, Bayesian inference is particularly challenging in this context as the correction for missed events increases the computational complexity of the model likelihood. Nonetheless, we successfully implemented a two-step Markov chain Monte Carlo method that we called "BICME", which performs Bayesian inference in models of realistic complexity. The method is demonstrated on synthetic and real single-channel data from muscle nicotinic acetylcholine channels. We show that parameter uncertainty can be characterized more accurately than with maximum-likelihood methods. Our code for performing inference in these ion channel
Using Action Research to Develop a Course in Statistical Inference for Workplace-Based Adults
ERIC Educational Resources Information Center
Forbes, Sharleen
2014-01-01
Many adults who need an understanding of statistical concepts have limited mathematical skills. They need a teaching approach that includes as little mathematical context as possible. Iterative participatory qualitative research (action research) was used to develop a statistical literacy course for adult learners informed by teaching in…
From a Logical Point of View: An Illuminating Perspective in Teaching Statistical Inference
ERIC Educational Resources Information Center
Sowey, Eric R
2005-01-01
Offering perspectives in the teaching of statistics assists students, immersed in the study of detail, to see the leading principles of the subject more clearly. Especially helpful can be a perspective on the logic of statistical inductive reasoning. Such a perspective can bring to prominence a broad principle on which both interval estimation and…
Nonequilibrium statistical mechanics in one-dimensional bose gases
NASA Astrophysics Data System (ADS)
Baldovin, F.; Cappellaro, A.; Orlandini, E.; Salasnich, L.
2016-06-01
We study cold dilute gases made of bosonic atoms, showing that in the mean-field one-dimensional regime they support stable out-of-equilibrium states. Starting from the 3D Boltzmann-Vlasov equation with contact interaction, we derive an effective 1D Landau-Vlasov equation under the condition of a strong transverse harmonic confinement. We investigate the existence of out-of-equilibrium states, obtaining stability criteria similar to those of classical plasmas.
Multiple processes in two-dimensional visual statistical learning
Hoshino, Eiichi; Mogi, Ken
2017-01-01
Knowledge about the arrangement of visual elements is an important aspect of perception. This study investigates whether humans learn rules of two-dimensional abstract patterns (exemplars) generated from Reber's artificial grammar. The key question is whether the subjects can implicitly learn them without explicit instructions, and, if so, how they use the acquired knowledge to judge new patterns (probes) in relation to their finite experience of the exemplars. The analysis was conducted using dissimilarities among patterns, which are defined with n-gram probabilities and the Levenshtein distance. The results show that subjects are able to learn rules of two-dimensional visual patterns (exemplars) and make categorical judgment of probes based on knowledge of exemplar-based representation. Our analysis revealed that subjects' judgments of probes were related to the degree of dissimilarities between the probes and exemplars. The result suggests the coexistence of configural and element-based processing in exemplar-based representations. Exemplar-based representation was preferred to prototypical representation through tasks requiring discrimination, recognition and working memory. Relations of the studied judgment processes to the neural basis are discussed. We conclude that knowledge of a finite experience of two-dimensional visual patterns would be crystalized in different levels of relations among visual elements. PMID:28212388
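One of the two dissimilarity measures named in this abstract is the Levenshtein distance. The sketch below shows the standard dynamic-programming computation on one-dimensional strings; how the study encodes two-dimensional patterns as sequences is not stated in the abstract, so the string inputs here are an assumption made for illustration.

```python
def levenshtein(a, b):
    """Minimum number of single-symbol insertions, deletions, and substitutions
    needed to turn sequence a into sequence b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, symbol_a in enumerate(a, start=1):
        curr = [i]  # deleting the first i symbols of a reaches an empty b
        for j, symbol_b in enumerate(b, start=1):
            cost = 0 if symbol_a == symbol_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


distance = levenshtein("kitten", "sitting")
```

A probe could then be scored by, for example, its minimum distance to any exemplar; that scoring rule is a hypothetical choice, not one the abstract specifies.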
Salehi, Sohrab; Steif, Adi; Roth, Andrew; Aparicio, Samuel; Bouchard-Côté, Alexandre; Shah, Sohrab P
2017-03-01
Next-generation sequencing (NGS) of bulk tumour tissue can identify constituent cell populations in cancers and measure their abundance. This requires computational deconvolution of allelic counts from somatic mutations, which may be incapable of fully resolving the underlying population structure. Single cell sequencing (SCS) is a more direct method, although its replacement of NGS is impeded by technical noise and sampling limitations. We propose ddClone, which analytically integrates NGS and SCS data, leveraging their complementary attributes through joint statistical inference. We show on real and simulated datasets that ddClone produces more accurate results than can be achieved by either method alone.
Shakhnovich, Boris E; Harvey, John M; Comeau, Steve; Lorenz, David; DeLisi, Charles; Shakhnovich, Eugene
2003-01-01
The problem of functional annotation based on homology modeling is primary to current bioinformatics research. Researchers have noted regularities in sequence, structure and even chromosome organization that allow valid functional cross-annotation. However, these methods provide a lot of false negatives due to limited specificity inherent in the system. We want to create an evolutionarily inspired organization of data that would approach the issue of structure-function correlation from a new, probabilistic perspective. Such organization has possible applications in phylogeny, modeling of functional evolution and structural determination. ELISA (Evolutionary Lineage Inferred from Structural Analysis, ) is an online database that combines functional annotation with structure and sequence homology modeling to place proteins into sequence-structure-function "neighborhoods". The atomic unit of the database is a set of sequences and structural templates that those sequences encode. A graph that is built from the structural comparison of these templates is called PDUG (protein domain universe graph). We introduce a method of functional inference through a probabilistic calculation done on an arbitrary set of PDUG nodes. Further, all PDUG structures are mapped onto all fully sequenced proteomes allowing an easy interface for evolutionary analysis and research into comparative proteomics. ELISA is the first database with applicability to evolutionary structural genomics explicitly in mind. Availability: The database is available at . PMID:12952559
Reconstructing a B-cell clonal lineage. I. Statistical inference of unobserved ancestors
Kepler, Thomas B
2013-01-01
One of the key phenomena in the adaptive immune response to infection and immunization is affinity maturation, during which antibody genes are mutated and selected, typically resulting in a substantial increase in binding affinity to the eliciting antigen. Advances in technology on several fronts have made it possible to clone large numbers of heavy-chain light-chain pairs from individual B cells and thereby identify whole sets of clonally related antibodies. These collections could provide the information necessary to reconstruct their own history - the sequence of changes introduced into the lineage during the development of the clone - and to study affinity maturation in detail. But the success of such a program depends entirely on accurately inferring the founding ancestor and the other unobserved intermediates. Given a set of clonally related immunoglobulin V-region genes, the method described here allows one to compute the posterior distribution over their possible ancestors, thereby giving a thorough accounting of the uncertainty inherent in the reconstruction. I demonstrate the application of this method on heavy-chain and light-chain clones, assess the reliability of the inference, and discuss the sources of uncertainty. PMID:24555054
Sandoval-Castellanos, Edson; Palkopoulou, Eleftheria; Dalén, Love
2014-01-01
Inference of population demographic history has vastly improved in recent years due to a number of technological and theoretical advances including the use of ancient DNA. Approximate Bayesian computation (ABC) stands among the most promising methods due to its simple theoretical fundament and exceptional flexibility. However, limited availability of user-friendly programs that perform ABC analysis renders it difficult to implement, and hence programming skills are frequently required. In addition, there is limited availability of programs able to deal with heterochronous data. Here we present the software BaySICS: Bayesian Statistical Inference of Coalescent Simulations. BaySICS provides an integrated and user-friendly platform that performs ABC analyses by means of coalescent simulations from DNA sequence data. It estimates historical demographic population parameters and performs hypothesis testing by means of Bayes factors obtained from model comparisons. Although providing specific features that improve inference from datasets with heterochronous data, BaySICS also has several capabilities making it a suitable tool for analysing contemporary genetic datasets. Those capabilities include joint analysis of independent tables, a graphical interface and the implementation of Markov-chain Monte Carlo without likelihoods.
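The core idea of ABC can be shown with a rejection sampler on a toy problem. The sketch below is not BaySICS and uses no coalescent simulation: it assumes a binomial model with a uniform prior on the success probability, and the function name, observed data, tolerance, and seed are all hypothetical. Parameters are drawn from the prior, data are simulated, and a draw is kept only when the simulated summary statistic lies within the tolerance of the observed one.

```python
import random


def abc_rejection(observed_successes, n_trials, n_draws, tolerance, rng):
    """Approximate posterior sample for a binomial success probability.

    No likelihood is evaluated: parameters are accepted when simulated data
    resemble the observed data closely enough.
    """
    accepted = []
    for _ in range(n_draws):
        p = rng.random()  # draw from the uniform prior on [0, 1)
        simulated = sum(1 for _ in range(n_trials) if rng.random() < p)
        if abs(simulated - observed_successes) <= tolerance:
            accepted.append(p)
    return accepted


rng = random.Random(1)  # fixed seed so the run is repeatable
posterior_sample = abc_rejection(observed_successes=30, n_trials=100,
                                 n_draws=5000, tolerance=2, rng=rng)
posterior_mean = sum(posterior_sample) / len(posterior_sample)
```

Shrinking the tolerance makes the approximation closer to the exact posterior at the cost of fewer accepted draws, which is the basic trade-off in any ABC analysis.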
A statistical formulation of one-dimensional electron fluid turbulence
NASA Technical Reports Server (NTRS)
Fyfe, D.; Montgomery, D.
1977-01-01
A one-dimensional electron fluid model is investigated using the mathematical methods of modern fluid turbulence theory. Non-dissipative equilibrium canonical distributions are determined in a phase space whose co-ordinates are the real and imaginary parts of the Fourier coefficients for the field variables. Spectral densities are calculated, yielding a wavenumber electric field energy spectrum proportional to k to the negative second power for large wavenumbers. The equations of motion are numerically integrated and the resulting spectra are found to compare well with the theoretical predictions.
NASA Astrophysics Data System (ADS)
Al-Yousef, Ali Abdallah
Reservoir characterization is one of the most important factors in successful reservoir management. In water injection projects, a knowledge of reservoir heterogeneities and discontinuities is particularly important to maximize oil recovery. This research project presents a new technique to quantify communication between injection and production wells in a reservoir based on temporal fluctuations in rates. The technique combines a nonlinear signal processing model and multiple linear regression (MLR) to provide information about permeability trends and the presence of flow barriers. The method was tested in synthetic fields using rates generated by a numerical simulator and then applied to producing fields in Argentina, the North Sea, Texas, and Wyoming. Results indicate that the model coefficients (weights) between wells are consistent with the known geology and relative location between wells; they are independent of injection/production rates. The developed procedure provides parameters (time constants) that explicitly indicate the attenuation and time lag between injector and producer pairs. The new procedure allows for a better insight into the well-to-well connectivities for the fields than MLR. Complex geological conditions are often not easily identified using the weights and time constants values individually. However, combining both sets of parameters in certain representations enhances the inference about the geological features. The applications of the new representations to numerically simulated fields and then to real fields indicate that these representations are capable of identifying whether the connectivity of an injector-producer well pair is through fractures, a high-permeability layer, or through partially completed wells. The technique may produce negative weights for some well pairs. Because there is no physical explanation in waterfloods for negative weights, these are also investigated. The negative weights have at least three causes
Inferring the hosts of coronavirus using dual statistical models based on nucleotide composition.
Tang, Qin; Song, Yulong; Shi, Mijuan; Cheng, Yingyin; Zhang, Wanting; Xia, Xiao-Qin
2015-11-26
Many coronaviruses are capable of interspecies transmission. Some of them have caused worldwide panic as emerging human pathogens in recent years, e.g., severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory syndrome coronavirus (MERS-CoV). In order to assess their threat to humans, we attempted to infer the potential hosts of coronaviruses using a dual-model approach based on nineteen parameters computed from spike genes of coronaviruses. Both the support vector machine (SVM) model and the Mahalanobis distance (MD) discriminant model achieved high accuracies in leave-one-out cross-validation of training data consisting of 730 representative coronaviruses (99.86% and 98.08% respectively). Predictions on 47 additional coronaviruses precisely conformed to conclusions or speculations by other researchers. Our approach is implemented as a web server that can be accessed at http://bioinfo.ihb.ac.cn/seq2hosts.
Hirose, Osamu; Yoshida, Ryo; Imoto, Seiya; Yamaguchi, Rui; Higuchi, Tomoyuki; Charnock-Jones, D Stephen; Print, Cristin; Miyano, Satoru
2008-04-01
Statistical inference of gene networks by using time-course microarray gene expression profiles is an essential step towards understanding the temporal structure of gene regulatory mechanisms. Unfortunately, most of the current studies have been limited to analysing a small number of genes because the length of time-course gene expression profiles is fairly short. One promising approach to overcome such a limitation is to infer gene networks by exploring the potential transcriptional modules which are sets of genes sharing a common function or involved in the same pathway. In this article, we present a novel approach based on the state space model to identify the transcriptional modules and module-based gene networks simultaneously. The state space model has the potential to infer large-scale gene networks, e.g. of order 10^3, from time-course gene expression profiles. Particularly, we succeeded in the identification of a cell cycle system by using the gene expression profiles of Saccharomyces cerevisiae in which the length of the time-course and number of genes were 24 and 4382, respectively. However, when analysing shorter time-course data, e.g. of length 10 or less, the parameter estimations of the state space model often fail due to overfitting. To extend the applicability of the state space model, we provide an approach to use the technical replicates of gene expression profiles, which are often measured in duplicate or triplicate. The use of technical replicates is important for achieving highly-efficient inferences of gene networks with short time-course data. The potential of the proposed method has been demonstrated through the time-course analysis of the gene expression profiles of human umbilical vein endothelial cells (HUVECs) undergoing growth factor deprivation-induced apoptosis. Supplementary Information and the software (TRANS-MNET) are available at http://daweb.ism.ac.jp/~yoshidar/software/ssm/.
Statistical Inference on Optimal Points to Evaluate Multi-State Classification Systems
2014-09-18
invasive tool for evaluating graft function [37]. This study suggested that the biomarkers transforming growth factor-β1 (TGF- β1), angiotensinogen ( AGT ...broken into three classes: NKF, NKF+, CAN) to classify kidney function. Feature Class Mean Standard Deviation Median Range AGT NKF 15.47 16.02 8.02...utilized both the AGT and TGF-β1 biomarkers, splitting the two dimensional parameter space into regions for classification using arrays. However, the
Multivariate statistics process control for dimensionality reduction in structural assessment
NASA Astrophysics Data System (ADS)
Mujica, L. E.; Vehí, J.; Ruiz, M.; Verleysen, M.; Staszewski, W.; Worden, K.
2008-01-01
This paper presents the advantages of using techniques such as principal component analysis (PCA), partial least squares (PLS), and their extensions, multiway PCA (MPCA) and multiway PLS (MPLS), for reducing dimensionality in damage identification problems, in particular for detecting and locating impacts in a part of a commercial aircraft wing flap. It is shown that applying MPCA and MPLS is convenient in systems in which many sensors monitor the structure, because the reciprocal relation between signals is taken into account. The methodology for detecting and locating the impact follows the philosophy of case-based reasoning, where single PCA and PLS are also used for organizing previous knowledge in memory. Sixteen approaches combining those techniques have been implemented. Results from all of them are presented, compared and discussed.
Statistical validation of high-dimensional models of growing networks
NASA Astrophysics Data System (ADS)
Medo, Matúš
2014-03-01
The abundance of models of complex networks and the current insufficient validation standards make it difficult to judge which models are strongly supported by data and which are not. We focus here on likelihood maximization methods for models of growing networks with many parameters and compare their performance on artificial and real datasets. While high dimensionality of the parameter space harms the performance of direct likelihood maximization on artificial data, this can be improved by introducing a suitable penalization term. Likelihood maximization on real data shows that the presented approach is able to discriminate among available network models. To make large-scale datasets accessible to this kind of analysis, we propose a subset sampling technique and show that it yields substantial model evidence in a fraction of time necessary for the analysis of the complete data.
NASA Astrophysics Data System (ADS)
Hashim, Mohammad Firdaus Abu; Ramakrishnan, Sivakumar; Mohamad, Ahmad Azmin
2014-06-01
Owing to its low environmental impact and rechargeability, the nickel metal hydride battery is considered one of the most promising candidate batteries for electric vehicles. The energy delivered by the battery depends heavily on its discharge profile, and it is generally difficult to track the trend of the dissipation of the energy stored in the battery for informative analysis. One-dimensional and two-dimensional thermal models were developed in Matlab; these models are capable of predicting the temperature distributions inside a cell. The simulated results were validated against referenced experimental data using Minitab software. For the 1-dimensional model, the correlations between experimental and predicted results at 60 minutes, 90 minutes, and 114 minutes, in the direction of thermal dissipation from the positive to the negative electrode, are 34%, 83%, and 94%, respectively, while for the 2-dimensional model the correlations at the same times are 44%, 93%, and 95%. These correlations between experimental and predicted results indicate that the thermal behavior under natural convection can be well fitted after around 90 minutes, and that the 2-dimensional model predicts the results more accurately than the 1-dimensional model. Based on the simulation results, it can be concluded that the 1-dimensional and 2-dimensional models predict nearly similar thermal behavior under natural convection, while the 2-dimensional model was used to predict thermal behavior under forced convection for better accuracy.
NASA Astrophysics Data System (ADS)
Ravela, S.
2016-12-01
We propose that statistically identifying a compact manifold on which the solution of a nonlinear, possibly chaotic, dynamical system lies can provide a new mechanism for many difficult geophysical inference problems in the presence of uncertainty. This includes managing uncertainty and propagating it, synthesizing reduced models, data assimilation, and adaptive sampling. In this talk, we use randomized representations of manifolds produced from snapshots of initial condition (or parameter) perturbations and then show that a diffeomorphic realignment in the presence of observations is an effective data assimilation procedure that offers improvements over mixture and kernel-based methods for non-Gaussian inference. We then discuss a targeted re-sampling strategy on the manifold to improve uncertainty representation and further propagate uncertainty. Examples from chaotic and nonlinear dynamical systems suggest that this approach is promising for solving the inference problems and show the direct relevance of manifold learning to geophysical problems.
Recent Developments in Statistical Inference: Quasi-Experiments and Perquimans County.
ERIC Educational Resources Information Center
Cox, Gary W.
1988-01-01
Critiques "The Statistical Analysis of Quasi-Experiments" by Achen and examines its relevance for historians. Discusses the problems that arise when quasi-experiments involving nonrandom assignment or nonrandom selection are analyzed as if they were true experiments. Concludes that Achen's book will help historians recognize these…
Statistical inference of selection and divergence of rice blast resistance gene Pi-ta
USDA-ARS?s Scientific Manuscript database
The resistance gene Pi-ta has been effectively used to control rice blast disease worldwide. A few recent studies have described the possible evolution of Pi-ta in cultivated and weedy rice. However, evolutionary statistics used for the studies are too limited to precisely understand selection and d...
Computer-Based Instruction in Statistical Inference; Final Report. Technical Memorandum (TM Series).
ERIC Educational Resources Information Center
Rosenbaum, J.; And Others
A two-year investigation into the development of computer-assisted instruction (CAI) for the improvement of undergraduate training in statistics was undertaken. The first year was largely devoted to designing PLANIT (Programming LANguage for Interactive Teaching) which reduces, or completely eliminates, the need an author of CAI lessons would…
Fu, Ji-Meng; Winchester, J.W.
1994-03-01
Nitrogen in fresh waters of three rivers in northern Florida, the Apalachicola-Chattahoochee-Flint (ACF) River system, Ochlockonee (Och), and Sopchoppy (Sop), is inferred to be derived mostly from atmospheric deposition. Because the N:P mole ratios in the rivers are nearly three times higher than the Redfield ratio for aquatic photosynthesis, N is saturated in the ecosystems, not a limiting nutrient, although it may be chemically transformed. Absolute principal component analysis (APCA), a receptor model, was applied to many years of monitoring data for Apalachicola River water and rainfall over its basin in order to better understand aquatic chemistry of nitrogen in the watershed. The APCA model aged rain and groundwater. In the fresh rain component, the ratio of atmospheric nitrate to sulfate is close to that in rainwater, as if some samples had been collected following very recent rainfall. The aged rain component of the river water is distinguished by a low NO₃⁻
Statistical inference for the additive hazards model under outcome-dependent sampling.
Yu, Jichang; Liu, Yanyan; Sandler, Dale P; Zhou, Haibo
2015-09-01
Cost-effective study design and proper inference procedures for data from such designs are always of particular interests to study investigators. In this article, we propose a biased sampling scheme, an outcome-dependent sampling (ODS) design for survival data with right censoring under the additive hazards model. We develop a weighted pseudo-score estimator for the regression parameters for the proposed design and derive the asymptotic properties of the proposed estimator. We also provide some suggestions for using the proposed method by evaluating the relative efficiency of the proposed method against simple random sampling design and derive the optimal allocation of the subsamples for the proposed design. Simulation studies show that the proposed ODS design is more powerful than other existing designs and the proposed estimator is more efficient than other estimators. We apply our method to analyze a cancer study conducted at NIEHS, the Cancer Incidence and Mortality of Uranium Miners Study, to study the risk of radon exposure to cancer.
Statistical inference for the additive hazards model under outcome-dependent sampling
Yu, Jichang; Liu, Yanyan; Sandler, Dale P.; Zhou, Haibo
2015-01-01
Cost-effective study design and proper inference procedures for data from such designs are always of particular interests to study investigators. In this article, we propose a biased sampling scheme, an outcome-dependent sampling (ODS) design for survival data with right censoring under the additive hazards model. We develop a weighted pseudo-score estimator for the regression parameters for the proposed design and derive the asymptotic properties of the proposed estimator. We also provide some suggestions for using the proposed method by evaluating the relative efficiency of the proposed method against simple random sampling design and derive the optimal allocation of the subsamples for the proposed design. Simulation studies show that the proposed ODS design is more powerful than other existing designs and the proposed estimator is more efficient than other estimators. We apply our method to analyze a cancer study conducted at NIEHS, the Cancer Incidence and Mortality of Uranium Miners Study, to study the risk of radon exposure to cancer. PMID:26379363
Soap film flows: Statistics of two-dimensional turbulence
Vorobieff, P.; Rivera, M.; Ecke, R.E.
1999-08-01
Soap film flows provide a very convenient laboratory model for studies of two-dimensional (2-D) hydrodynamics including turbulence. For a gravity-driven soap film channel with a grid of equally spaced cylinders inserted in the flow, we have measured the simultaneous velocity and thickness fields in the irregular flow downstream from the cylinders. The velocity field is determined by a modified digital particle image velocimetry method and the thickness from the light scattered by the particles in the film. From these measurements, we compute the decay of mean energy, enstrophy, and thickness fluctuations with downstream distance, and the structure functions of velocity, vorticity, thickness fluctuation, and vorticity flux. From these quantities we determine the microscale Reynolds number of the flow R_λ ≈ 100 and the integral and dissipation scales of 2D turbulence. We also obtain quantitative measures of the degree to which our flow can be considered incompressible and isotropic as a function of downstream distance. We find coarsening of characteristic spatial scales, qualitative correspondence of the decay of energy and enstrophy with the Batchelor model, scaling of energy in k space consistent with the k^(-3) spectrum of the Kraichnan-Batchelor enstrophy-scaling picture, and power-law scalings of the structure functions of velocity, vorticity, vorticity flux, and thickness. These results are compared with models of 2-D turbulence and with numerical simulations. © 1999 American Institute of Physics.
Fragmentation and exfoliation of 2-dimensional materials: a statistical approach
NASA Astrophysics Data System (ADS)
Kouroupis-Agalou, Konstantinos; Liscio, Andrea; Treossi, Emanuele; Ortolani, Luca; Morandi, Vittorio; Pugno, Nicola Maria; Palermo, Vincenzo
2014-05-01
The main advantage for applications of graphene and related 2D materials is that they can be produced on large scales by liquid phase exfoliation. The exfoliation process shall be considered as a particular fragmentation process, where the 2D character of the exfoliated objects will influence significantly fragmentation dynamics as compared to standard materials. Here, we used automatized image processing of Atomic Force Microscopy (AFM) data to measure, one by one, the exact shape and size of thousands of nanosheets obtained by exfoliation of an important 2D-material, boron nitride, and used different statistical functions to model the asymmetric distribution of nanosheet sizes typically obtained. Being the resolution of AFM much larger than the average sheet size, analysis could be performed directly at the nanoscale and at the single sheet level. We find that the size distribution of the sheets at a given time follows a log-normal distribution, indicating that the exfoliation process has a "typical" scale length that changes with time and that exfoliation proceeds through the formation of a distribution of random cracks that follow Poisson statistics. The validity of this model implies that the size distribution does not depend on the different preparation methods used, but is a common feature in the exfoliation of this material and thus probably for other 2D materials.
Jacobs, Kevin B; Yeager, Meredith; Wacholder, Sholom; Craig, David; Kraft, Peter; Hunter, David J; Paschal, Justin; Manolio, Teri A; Tucker, Margaret; Hoover, Robert N; Thomas, Gilles D; Chanock, Stephen J; Chatterjee, Nilanjan
2009-11-01
Aggregate results from genome-wide association studies (GWAS), such as genotype frequencies for cases and controls, were until recently often made available on public websites because they were thought to disclose negligible information concerning an individual's participation in a study. Homer et al. recently suggested that a method for forensic detection of an individual's contribution to an admixed DNA sample could be applied to aggregate GWAS data. Using a likelihood-based statistical framework, we developed an improved statistic that uses genotype frequencies and individual genotypes to infer whether a specific individual or any close relatives participated in the GWAS and, if so, what the participant's phenotype status is. Our statistic compares the logarithm of genotype frequencies, in contrast to that of Homer et al., which is based on differences in either SNP probe intensity or allele frequencies. We derive the theoretical power of our test statistics and explore the empirical performance in scenarios with varying numbers of randomly chosen or top-associated SNPs.
Statistical inference for classification of RRIM clone series using near IR reflectance properties
NASA Astrophysics Data System (ADS)
Ismail, Faridatul Aima; Madzhi, Nina Korlina; Hashim, Hadzli; Abdullah, Noor Ezan; Khairuzzaman, Noor Aishah; Azmi, Azrie Faris Mohd; Sampian, Ahmad Faiz Mohd; Harun, Muhammad Hafiz
2015-08-01
The RRIM clones are a rubber breeding series produced by RRIM (Rubber Research Institute of Malaysia) through its "rubber breeding program" to improve latex yield and produce clones attractive to farmers. The objective of this work is to analyse measurements made by an optical sensing device on latex of selected clone series. The device transmits near-infrared (NIR) light, and the measured reflectance is converted into a voltage. The reflectance index values obtained from the voltage were analyzed using statistical techniques in order to discriminate among the clones. From the statistical results using error plots and a one-way ANOVA test, there is overwhelming evidence of discrimination among the RRIM 2002, RRIM 2007 and RRIM 3001 clone series, with p value = 0.000. RRIM 2008 cannot be discriminated from RRIM 2014; however, both of these groups are distinct from the other clones.
NASA Technical Reports Server (NTRS)
Abbey, Craig K.; Eckstein, Miguel P.
2002-01-01
We consider estimation and statistical hypothesis testing on classification images obtained from the two-alternative forced-choice experimental paradigm. We begin with a probabilistic model of task performance for simple forced-choice detection and discrimination tasks. Particular attention is paid to general linear filter models because these models lead to a direct interpretation of the classification image as an estimate of the filter weights. We then describe an estimation procedure for obtaining classification images from observer data. A number of statistical tests are presented for testing various hypotheses from classification images based on some more compact set of features derived from them. As an example of how the methods we describe can be used, we present a case study investigating detection of a Gaussian bump profile.
Exploring the Connection Between Sampling Problems in Bayesian Inference and Statistical Mechanics
NASA Technical Reports Server (NTRS)
Pohorille, Andrew
2006-01-01
The Bayesian and statistical mechanical communities often share the same objective in their work - estimating and integrating probability distribution functions (pdfs) describing stochastic systems, models or processes. Frequently, these pdfs are complex functions of random variables exhibiting multiple, well separated local minima. Conventional strategies for sampling such pdfs are inefficient, sometimes leading to an apparent non-ergodic behavior. Several recently developed techniques for handling this problem have been successfully applied in statistical mechanics. In the multicanonical and Wang-Landau Monte Carlo (MC) methods, the correct pdfs are recovered from uniform sampling of the parameter space by iteratively establishing proper weighting factors connecting these distributions. Trivial generalizations allow for sampling from any chosen pdf. The closely related transition matrix method relies on estimating transition probabilities between different states. All these methods proved to generate estimates of pdfs with high statistical accuracy. In another MC technique, parallel tempering, several random walks, each corresponding to a different value of a parameter (e.g. "temperature"), are generated and occasionally exchanged using the Metropolis criterion. This method can be considered as a statistically correct version of simulated annealing. An alternative approach is to represent the set of independent variables as a Hamiltonian system. Considerable progress has been made in understanding how to ensure that the system obeys the equipartition theorem or, equivalently, that coupling between the variables is correctly described. Then a host of techniques developed for dynamical systems can be used. Among them, probably the most powerful is the Adaptive Biasing Force method, in which thermodynamic integration and biased sampling are combined to yield very efficient estimates of pdfs. The third class of methods deals with transitions between states described
Li, Changyang; Wang, Xiuying; Eberl, Stefan; Fulham, Michael; Yin, Yong; Dagan Feng, David
2015-01-01
Automated and general medical image segmentation can be challenging because the foreground and the background may have complicated and overlapping density distributions in medical imaging. Conventional region-based level set algorithms often assume piecewise constant or piecewise smooth for segments, which are implausible for general medical image segmentation. Furthermore, low contrast and noise make identification of the boundaries between foreground and background difficult for edge-based level set algorithms. Thus, to address these problems, we suggest a supervised variational level set segmentation model to harness the statistical region energy functional with a weighted probability approximation. Our approach models the region density distributions by using the mixture-of-mixtures Gaussian model to better approximate real intensity distributions and distinguish statistical intensity differences between foreground and background. The region-based statistical model in our algorithm can intuitively provide better performance on noisy images. We constructed a weighted probability map on graphs to incorporate spatial indications from user input with a contextual constraint based on the minimization of contextual graphs energy functional. We measured the performance of our approach on ten noisy synthetic images and 58 medical datasets with heterogeneous intensities and ill-defined boundaries and compared our technique to the Chan-Vese region-based level set model, the geodesic active contour model with distance regularization, and the random walker model. Our method consistently achieved the highest Dice similarity coefficient when compared to the other methods.
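The Dice similarity coefficient used above as the evaluation metric can be illustrated with a minimal sketch. The function name `dice_coefficient`, the flat 0/1 mask representation, and the toy masks are assumptions made for illustration only; they are not taken from the study, which does not describe its implementation.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient of two binary masks given as flat 0/1 sequences."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")
    size_a = sum(1 for v in mask_a if v)          # foreground pixels in mask A
    size_b = sum(1 for v in mask_b if v)          # foreground pixels in mask B
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    if size_a + size_b == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * overlap / (size_a + size_b)


# Hypothetical toy masks: 3 foreground pixels each, 2 of them shared.
predicted = [0, 1, 1, 1, 0, 0]
reference = [0, 0, 1, 1, 1, 0]
print(dice_coefficient(predicted, reference))
```

A value of 1 indicates identical masks and 0 indicates no overlap; for the toy masks the coefficient is 2·2/(3+3).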
Inferring earthquake statistics from soft-glass dynamics below yield stress
NASA Astrophysics Data System (ADS)
Kumar, Pinaki; Toschi, Federico; Benzi, Roberto; Trampert, Jeannot
2016-11-01
The current practice to generate synthetic earthquake catalogs employs purely statistical models, mechanical methods based on ad-hoc constitutive friction laws or a combination of the above. We adopt a new numerical approach based on the multi-component Lattice Boltzmann method to simulate yield stress materials. Below yield stress, under shear forcing, we find that the highly intermittent in time, irreversible T1 topological changes in the soft-glass (termed plastic events) bear a statistical resemblance to seismic events, radiating elastic perturbations through the system. Statistical analysis reveals scaling laws for magnitude similar to the Gutenberg-Richter law for quakes, a recurrence time scale with similar slope, a well-defined clustering of events into causal-aftershock sequences and Poisson events leading to the Omori law. Additionally space intermittency reveals a complex multi-fractal structure, like real quakes, and a characterization of the stick-slip behavior in terms of avalanche size and time distribution agrees with the de-pinning transition. The model system once properly tuned using real earthquake data, may help highlighting the origin of scaling in phenomenological seismic power laws. This research was partly funded by the Shell-NWO/FOM programme "Computational sciences for energy research" under Project Number 14CSER022.
Statistical inference on censored data for targeted clinical trials under enrichment design.
Chen, Chen-Fang; Lin, Jr-Rung; Liu, Jen-Pei
2013-01-01
For the traditional clinical trials, inclusion and exclusion criteria are usually based on some clinical endpoints; the genetic or genomic variability of the trial participants are not totally utilized in the criteria. After completion of the human genome project, the disease targets at the molecular level can be identified and can be utilized for the treatment of diseases. However, the accuracy of diagnostic devices for identification of such molecular targets is usually not perfect. Some of the patients enrolled in targeted clinical trials with a positive result for the molecular target might not have the specific molecular targets. As a result, the treatment effect may be underestimated in the patient population truly with the molecular target. To resolve this issue, under the exponential distribution, we develop inferential procedures for the treatment effects of the targeted drug based on the censored endpoints in the patients truly with the molecular targets. Under an enrichment design, we propose using the expectation-maximization algorithm in conjunction with the bootstrap technique to incorporate the inaccuracy of the diagnostic device for detection of the molecular targets on the inference of the treatment effects. A simulation study was conducted to empirically investigate the performance of the proposed methods. Simulation results demonstrate that under the exponential distribution, the proposed estimator is nearly unbiased with adequate precision, and the confidence interval can provide adequate coverage probability. In addition, the proposed testing procedure can adequately control the size with sufficient power. On the other hand, when the proportional hazard assumption is violated, additional simulation studies show that the type I error rate is not controlled at the nominal level and is an increasing function of the positive predictive value. A numerical example illustrates the proposed procedures.
Bayesian inference based on dual generalized order statistics from the exponentiated Weibull model
NASA Astrophysics Data System (ADS)
Al Sobhi, Mashail M.
2015-02-01
Bayesian estimation for the two parameters and the reliability function of the exponentiated Weibull model are obtained based on dual generalized order statistics (DGOS). Also, Bayesian prediction bounds for future DGOS from exponentiated Weibull model are obtained. The symmetric and asymmetric loss functions are considered for Bayesian computations. The Markov chain Monte Carlo (MCMC) methods are used for computing the Bayes estimates and prediction bounds. The results have been specialized to the lower record values. Comparisons are made between Bayesian and maximum likelihood estimators via Monte Carlo simulation.
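For orientation, the distribution and reliability function being estimated can be sketched as follows. This assumes the common two-parameter exponentiated Weibull form F(x) = (1 - exp(-x^alpha))^theta with unit scale; the abstract does not state its parametrization, so the form, the function names `ew_cdf` and `ew_reliability`, and the parameter names are all assumptions.

```python
import math


def ew_cdf(x, alpha, theta):
    """CDF of the (assumed) two-parameter exponentiated Weibull distribution."""
    if x <= 0:
        return 0.0
    return (1.0 - math.exp(-(x ** alpha))) ** theta


def ew_reliability(x, alpha, theta):
    """Reliability (survival) function R(x) = 1 - F(x)."""
    return 1.0 - ew_cdf(x, alpha, theta)


# With alpha = theta = 1 the model reduces to the unit exponential distribution.
print(ew_reliability(1.0, 1.0, 1.0), math.exp(-1.0))
```

The sketch covers only the model itself; the Bayesian estimation, loss functions, and MCMC computations described in the abstract are not reproduced here.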
Teshima, Tara Lynn; Patel, Vaibhav; Mainprize, James G; Edwards, Glenn; Antonyshyn, Oleh M
2015-07-01
The utilization of three-dimensional modeling technology in craniomaxillofacial surgery has grown exponentially during the last decade. Future development, however, is hindered by the lack of a normative three-dimensional anatomic dataset and a statistical mean three-dimensional virtual model. The purpose of this study is to develop and validate a protocol to generate a statistical three-dimensional virtual model based on a normative dataset of adult skulls. Two hundred adult skull CT images were reviewed. The average three-dimensional skull was computed by processing each CT image in the series using thin-plate spline geometric morphometric protocol. Our statistical average three-dimensional skull was validated by reconstructing patient-specific topography in cranial defects. The experiment was repeated 4 times. In each case, computer-generated cranioplasties were compared directly to the original intact skull. The errors describing the difference between the prediction and the original were calculated. A normative database of 33 adult human skulls was collected. Using 21 anthropometric landmark points, a protocol for three-dimensional skull landmarking and data reduction was developed and a statistical average three-dimensional skull was generated. Our results show the root mean square error (RMSE) for restoration of a known defect using the native best match skull, our statistical average skull, and worst match skull was 0.58, 0.74, and 4.4 mm, respectively. The ability to statistically average craniofacial surface topography will be a valuable instrument for deriving missing anatomy in complex craniofacial defects and deficiencies as well as in evaluating morphologic results of surgery.
NASA Astrophysics Data System (ADS)
Goddard, C. R.; Pascoe, D. J.; Anfinogentov, S.; Nakariakov, V. M.
2017-09-01
Aims: We carry out a statistical study of the inferred coronal loop cross-sectional density profiles using extreme ultraviolet (EUV) imaging data from the Atmospheric Imaging Assembly (AIA) on board the Solar Dynamics Observatory (SDO). Methods: We analysed 233 coronal loops observed during 2015/2016. We consider three models for the density profile; the step function (model S), the linear transition region profile (model L), and a Gaussian profile (model G). Bayesian inference is used to compare the three corresponding forward modelled intensity profiles for each loop. These are constructed by integrating the square of the density from a cylindrical loop cross-section along the line of sight, assuming an isothermal cross-section, and applying the instrumental point spread function. Results: Calculating the Bayes factors for comparisons between the models, it was found that in 47% of cases there is very strong evidence for model L over model S and in 45% of cases very strong evidence for model G over S. Using multiple permutations of the Bayes factor the favoured density profile for each loop was determined for multiple evidence thresholds. There were a similar number of cases where model L or G are favoured, showing evidence for inhomogeneous layers and constantly varying density cross-sections, subject to our assumptions and simplifications. Conclusions: For sufficiently well resolved loop threads with no visible substructure it has been shown that using Bayesian inference and the observed intensity profile we can distinguish between the proposed density profiles at a given AIA wavelength and spatial resolution. We have found very strong evidence for inhomogeneous layers, with model L being the most general, and a tendency towards thicker or even continuous layers.
NASA Astrophysics Data System (ADS)
Calderon, Christopher P.; Weiss, Lucien E.; Moerner, W. E.
2014-05-01
Experimental advances have improved the two- (2D) and three-dimensional (3D) spatial resolution that can be extracted from in vivo single-molecule measurements. This enables researchers to quantitatively infer the magnitude and directionality of forces experienced by biomolecules in their native environment. Situations where such force information is relevant range from mitosis to directed transport of protein cargo along cytoskeletal structures. Models commonly applied to quantify single-molecule dynamics assume that effective forces and velocity in the x ,y (or x ,y,z) directions are statistically independent, but this assumption is physically unrealistic in many situations. We present a hypothesis testing approach capable of determining if there is evidence of statistical dependence between positional coordinates in experimentally measured trajectories; if the hypothesis of independence between spatial coordinates is rejected, then a new model accounting for 2D (3D) interactions can and should be considered. Our hypothesis testing technique is robust, meaning it can detect interactions, even if the noise statistics are not well captured by the model. The approach is demonstrated on control simulations and on experimental data (directed transport of intraflagellar transport protein 88 homolog in the primary cilium).
Anderson, Eric C
2012-11-08
Advances in genotyping that allow tens of thousands of individuals to be genotyped at a moderate number of single nucleotide polymorphisms (SNPs) permit parentage inference to be pursued on a very large scale. The intergenerational tagging this capacity allows is revolutionizing the management of cultured organisms (cows, salmon, etc.) and is poised to do the same for scientific studies of natural populations. Currently, however, there are no likelihood-based methods of parentage inference which are implemented in a manner that allows them to quickly handle a very large number of potential parents or parent pairs. Here we introduce an efficient likelihood-based method applicable to the specialized case of cultured organisms in which both parents can be reliably sampled. We develop a Markov chain representation for the cumulative number of Mendelian incompatibilities between an offspring and its putative parents and we exploit it to develop a fast algorithm for simulation-based estimates of statistical confidence in SNP-based assignments of offspring to pairs of parents. The method is implemented in the freely available software SNPPIT. We describe the method in detail, then assess its performance in a large simulation study using known allele frequencies at 96 SNPs from ten hatchery salmon populations. The simulations verify that the method is fast and accurate and that 96 well-chosen SNPs can provide sufficient power to identify the correct pair of parents from amongst millions of candidate pairs.
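The quantity at the heart of this approach, the number of Mendelian incompatibilities between an offspring and a putative parent pair, can be sketched as below. This is only a minimal illustration of the counting step, not the SNPPIT implementation or its Markov chain representation; the 0/1/2 genotype coding (copies of one allele at a biallelic SNP), the function names, and the toy genotypes are assumptions, and genotyping error and missing data are ignored.

```python
# Number of copies of the counted allele that a parent with genotype g can transmit.
TRANSMITTABLE = {0: {0}, 1: {0, 1}, 2: {1}}


def is_incompatible(offspring, parent_1, parent_2):
    """True if the offspring genotype cannot arise from one allele of each parent."""
    possible = {x + y
                for x in TRANSMITTABLE[parent_1]
                for y in TRANSMITTABLE[parent_2]}
    return offspring not in possible


def count_incompatibilities(offspring, parent_1, parent_2):
    """Count incompatible loci across aligned genotype vectors."""
    return sum(is_incompatible(o, a, b)
               for o, a, b in zip(offspring, parent_1, parent_2))


# Hypothetical genotypes at five SNPs.
child = [0, 1, 2, 2, 1]
mother = [0, 0, 2, 0, 2]
father = [1, 0, 1, 1, 2]
print(count_incompatibilities(child, mother, father))
```

In the toy data the second, fourth, and fifth loci are incompatible; for example, two parents with genotype 0 cannot produce an offspring with genotype 1.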
Statistical Inference in Hidden Markov Models Using k-Segment Constraints
Titsias, Michalis K.; Holmes, Christopher C.; Yau, Christopher
2016-01-01
Hidden Markov models (HMMs) are one of the most widely used statistical methods for analyzing sequence data. However, the reporting of output from HMMs has largely been restricted to the presentation of the most-probable (MAP) hidden state sequence, found via the Viterbi algorithm, or the sequence of most probable marginals using the forward–backward algorithm. In this article, we expand the amount of information we could obtain from the posterior distribution of an HMM by introducing linear-time dynamic programming recursions that, conditional on a user-specified constraint in the number of segments, allow us to (i) find MAP sequences, (ii) compute posterior probabilities, and (iii) simulate sample paths. We collectively call these recursions k-segment algorithms and illustrate their utility using simulated and real examples. We also highlight the prospective and retrospective use of k-segment constraints for fitting HMMs or exploring existing model fits. Supplementary materials for this article are available online. PMID:27226674
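For orientation, the unconstrained MAP sequence mentioned in the abstract above is what the standard Viterbi recursion returns. The sketch below is that textbook algorithm in log space, not the authors' k-segment recursions; the toy two-state model and the helper `count_segments` are hypothetical and only show the segment count that a k-segment constraint would bound.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most probable hidden path and its log-probability."""
    # best[s]: best log-probability of any path ending in state s
    best = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    back = []  # back[t][s]: best predecessor of s at step t + 1
    for o in obs[1:]:
        prev = best
        best, pointers = {}, {}
        for s in states:
            r = max(states, key=lambda p: prev[p] + log_trans[p][s])
            best[s] = prev[r] + log_trans[r][s] + log_emit[s][o]
            pointers[s] = r
        back.append(pointers)
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


def count_segments(path):
    """Number of maximal runs of identical states in a path."""
    return 1 + sum(a != b for a, b in zip(path, path[1:]))


# Toy model (assumed values): sticky transitions, informative emissions.
states = ["A", "B"]
log_start = {s: math.log(0.5) for s in states}
log_trans = {p: {s: math.log(0.9 if p == s else 0.1) for s in states}
             for p in states}
log_emit = {"A": {"x": math.log(0.9), "y": math.log(0.1)},
            "B": {"x": math.log(0.1), "y": math.log(0.9)}}

path, log_prob = viterbi("xxyyy", states, log_start, log_trans, log_emit)
print(path, count_segments(path))
```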
Simulations and cosmological inference: A statistical model for power spectra means and covariances
Schneider, Michael D.; Knox, Lloyd; Habib, Salman; Heitmann, Katrin; Higdon, David; Nakhleh, Charles
2008-09-15
We describe an approximate statistical model for the sample variance distribution of the nonlinear matter power spectrum that can be calibrated from limited numbers of simulations. Our model retains the common assumption of a multivariate normal distribution for the power spectrum band powers but takes full account of the (parameter-dependent) power spectrum covariance. The model is calibrated using an extension of the framework in Habib et al. (2007) to train Gaussian processes for the power spectrum mean and covariance given a set of simulation runs over a hypercube in parameter space. We demonstrate the performance of this machinery by estimating the parameters of a power-law model for the power spectrum. Within this framework, our calibrated sample variance distribution is robust to errors in the estimated covariance and shows rapid convergence of the posterior parameter constraints with the number of training simulations.
NASA Astrophysics Data System (ADS)
Doss, F. W.; Drake, R. P.; Kuranz, C. C.
2011-11-01
A laser-driven experiment produces images of dense shocked material by x-ray transmission. The post-shock material is sufficiently dense that no significant signal passes through the dense layer, and therefore the shock compression cannot be directly measured by comparing transmitted intensities. One could try to determine the shock compression ratio by observing the ratio of the total distance travelled by the shock to the dense post-shock layer width, but small deviations of the angle of the shock with respect to the angle of imaging create large asymmetric errors in observation. A statistical approach to recovering shock compression by appropriately combining data from several experiments is developed, using fits to a simple model for the shock and shock tube geometry.
Statistical Inference in Hidden Markov Models Using k-Segment Constraints.
Titsias, Michalis K; Holmes, Christopher C; Yau, Christopher
2016-01-02
Hidden Markov models (HMMs) are one of the most widely used statistical methods for analyzing sequence data. However, the reporting of output from HMMs has largely been restricted to the presentation of the most-probable (MAP) hidden state sequence, found via the Viterbi algorithm, or the sequence of most probable marginals using the forward-backward algorithm. In this article, we expand the amount of information we could obtain from the posterior distribution of an HMM by introducing linear-time dynamic programming recursions that, conditional on a user-specified constraint in the number of segments, allow us to (i) find MAP sequences, (ii) compute posterior probabilities, and (iii) simulate sample paths. We collectively call these recursions k-segment algorithms and illustrate their utility using simulated and real examples. We also highlight the prospective and retrospective use of k-segment constraints for fitting HMMs or exploring existing model fits. Supplementary materials for this article are available online.
Statistical inference methods for two crossing survival curves: a comparison of methods.
Li, Huimin; Han, Dong; Hou, Yawen; Chen, Huilin; Chen, Zheng
2015-01-01
A common problem that is encountered in medical applications is the overall homogeneity of survival distributions when two survival curves cross each other. A survey demonstrated that under this condition, which was an obvious violation of the assumption of proportional hazard rates, the log-rank test was still used in 70% of studies. Several statistical methods have been proposed to solve this problem. However, in many applications, it is difficult to specify the types of survival differences and choose an appropriate method prior to analysis. Thus, we conducted an extensive series of Monte Carlo simulations to investigate the power and type I error rate of these procedures under various patterns of crossing survival curves with different censoring rates and distribution parameters. Our objective was to evaluate the strengths and weaknesses of tests in different situations and for various censoring rates and to recommend an appropriate test that will not fail for a wide range of applications. Simulation studies demonstrated that adaptive Neyman's smooth tests and the two-stage procedure offer higher power and greater stability than other methods when the survival distributions cross at early, middle or late times. Even for proportional hazards, both methods maintain acceptable power compared with the log-rank test. In terms of the type I error rate, Renyi and Cramér-von Mises tests are relatively conservative, whereas the statistics of the Lin-Xu test exhibit apparent inflation as the censoring rate increases. Other tests produce results close to the nominal 0.05 level. In conclusion, adaptive Neyman's smooth tests and the two-stage procedure are found to be the most stable and feasible approaches for a variety of situations and censoring rates. Therefore, they are applicable to a wider spectrum of alternatives compared with other tests.
Ma, Yan; Zhang, Wei; Lyman, Stephen; Huang, Yihe
2017-05-04
To identify the most appropriate imputation method for missing data in the HCUP State Inpatient Databases (SID) and assess the impact of different missing data methods on racial disparities research. HCUP SID. A novel simulation study compared four imputation methods (random draw, hot deck, joint multiple imputation [MI], conditional MI) for missing values for multiple variables, including race, gender, admission source, median household income, and total charges. The simulation was built on real data from the SID to retain their hierarchical data structures and missing data patterns. Additional predictive information from the U.S. Census and American Hospital Association (AHA) database was incorporated into the imputation. Conditional MI prediction was equivalent or superior to the best performing alternatives for all missing data structures and substantially outperformed each of the alternatives in various scenarios. Conditional MI substantially improved statistical inferences for racial health disparities research with the SID. © Health Research and Educational Trust.
Statistically Based Inference of Physical Rock Properties of Main Rock Types in Germany
NASA Astrophysics Data System (ADS)
Koch, A.; Jorand, R.; Clauser, C.
2009-12-01
A major obstacle for an increased use of geothermal energy often lies in the high success risk for the development of geothermal reservoirs due to the unknown rock properties. In general, the ranges of thermal and hydraulic properties (thermal conductivity, volumetric heat capacity, porosity, permeability) in existing compilations of rock properties are too large to be useful to constrain properties for specific sites. Usually, conservative assumptions are made about these properties, resulting in greater drilling depth and increased exploration cost. In this study, data from direct measurements on more than 600 core samples from different borehole locations and depths enable us to derive statistical moments of the desired properties for selected main rock types in the German subsurface. Modern core scanning technology allowed rapid, high-resolution measurement of thermal conductivity, sonic velocity, and gamma density on a large number of samples. In addition, we measured porosity, bulk density, and matrix density based on Archimedes’ principle and pycnometer analysis. Tests on a smaller collection of samples also include specific heat capacity, hydraulic permeability, and radiogenic heat production rate. In addition, we complemented the petrophysical measurements by quantitative mineralogical analysis. The results reveal that even for the same main rock type the results differ significantly depending on geologic age, origin, compaction, and mineralogical composition. For example, water-saturated thermal conductivity of tight Palaeozoic sandstones from the Lower Rhine Embayment and the Ruhr Area is 4.0±0.7 W m⁻¹ K⁻¹ and 4.6±0.6 W m⁻¹ K⁻¹, respectively, which is nearly identical to values for the Lower Triassic Bunter sandstone in southwest Germany (high in quartz showing an average value of 4.3±0.4 W m⁻¹ K⁻¹). In contrast, saturated thermal conductivity of Upper Triassic sandstone in the same area is considerably lower at 2.5±0.1 W m⁻¹ K⁻¹ (Schilf
Statistical inference from multiple iTRAQ experiments without using common reference standards.
Herbrich, Shelley M; Cole, Robert N; West, Keith P; Schulze, Kerry; Yager, James D; Groopman, John D; Christian, Parul; Wu, Lee; O'Meally, Robert N; May, Damon H; McIntosh, Martin W; Ruczinski, Ingo
2013-02-01
Isobaric tags for relative and absolute quantitation (iTRAQ) is a prominent mass spectrometry technology for protein identification and quantification that is capable of analyzing multiple samples in a single experiment. Frequently, iTRAQ experiments are carried out using an aliquot from a pool of all samples, or "masterpool", in one of the channels as a reference sample standard to estimate protein relative abundances in the biological samples and to combine abundance estimates from multiple experiments. In this manuscript, we show that using a masterpool is counterproductive. We obtain more precise estimates of protein relative abundance by using the available biological data instead of the masterpool and do not need to occupy a channel that could otherwise be used for another biological sample. In addition, we introduce a simple statistical method to associate proteomic data from multiple iTRAQ experiments with a numeric response and show that this approach is more powerful than the conventionally employed masterpool-based approach. We illustrate our methods using data from four replicate iTRAQ experiments on aliquots of the same pool of plasma samples and from a 406-sample project designed to identify plasma proteins that covary with nutrient concentrations in chronically undernourished children from South Asia.
Statistical inference methods for recurrent event processes with shape and size parameters
WANG, MEI-CHENG; HUANG, CHIUNG-YU
2015-01-01
This paper proposes a unified framework to characterize the rate function of a recurrent event process through shape and size parameters. In contrast to the intensity function, which is the event occurrence rate conditional on the event history, the rate function is the occurrence rate unconditional on the event history, and thus it can be interpreted as a population-averaged count of events in unit time. In this paper, shape and size parameters are introduced and used to characterize the association between the rate function λ(·) and a random variable X. Measures of association between X and λ(·) are defined via shape- and size-based coefficients. Rate-independence of X and λ(·) is studied through tests of shape-independence and size-independence, where the shape- and size-based test statistics can be used separately or in combination. These tests can be applied when X is a covariable possibly correlated with the recurrent event process through λ(·) or, in the one-sample setting, when X is the censoring time at which the observation of N(·) is terminated. The proposed tests are shape- and size-based, so when a null hypothesis is rejected, the test results can serve to distinguish the source of violation. PMID:26412863
Statistical inference methods for recurrent event processes with shape and size parameters.
Wang, Mei-Cheng; Huang, Chiung-Yu
2014-09-01
This paper proposes a unified framework to characterize the rate function of a recurrent event process through shape and size parameters. In contrast to the intensity function, which is the event occurrence rate conditional on the event history, the rate function is the occurrence rate unconditional on the event history, and thus it can be interpreted as a population-averaged count of events in unit time. In this paper, shape and size parameters are introduced and used to characterize the association between the rate function λ(·) and a random variable X. Measures of association between X and λ(·) are defined via shape- and size-based coefficients. Rate-independence of X and λ(·) is studied through tests of shape-independence and size-independence, where the shape- and size-based test statistics can be used separately or in combination. These tests can be applied when X is a covariable possibly correlated with the recurrent event process through λ(·) or, in the one-sample setting, when X is the censoring time at which the observation of N(·) is terminated. The proposed tests are shape- and size-based, so when a null hypothesis is rejected, the test results can serve to distinguish the source of violation.
Statistical Inference from Multiple iTRAQ Experiments without Using Common Reference Standards
Herbrich, Shelley M.; Cole, Robert N.; West, Keith P.; Schulze, Kerry; Yager, James D.; Groopman, John D.; Christian, Parul; Wu, Lee; O’Meally, Robert N.; May, Damon H.; McIntosh, Martin W.; Ruczinski, Ingo
2014-01-01
Isobaric tags for relative and absolute quantitation (iTRAQ) is a prominent mass spectrometry technology for protein identification and quantification that is capable of analyzing multiple samples in a single experiment. Frequently, iTRAQ experiments are carried out using an aliquot from a pool of all samples, or “masterpool”, in one of the channels as a reference sample standard to estimate protein relative abundances in the biological samples and to combine abundance estimates from multiple experiments. In this manuscript, we show that using a masterpool is counterproductive. We obtain more precise estimates of protein relative abundance by using the available biological data instead of the masterpool and do not need to occupy a channel that could otherwise be used for another biological sample. In addition, we introduce a simple statistical method to associate proteomic data from multiple iTRAQ experiments with a numeric response and show that this approach is more powerful than the conventionally employed masterpool-based approach. We illustrate our methods using data from four replicate iTRAQ experiments on aliquots of the same pool of plasma samples and from a 406-sample project designed to identify plasma proteins that covary with nutrient concentrations in chronically undernourished children from South Asia. PMID:23270375
NASA Astrophysics Data System (ADS)
Kononova, Olga; Jones, Lee; Barsegov, V.
2013-09-01
Cooperativity is a hallmark of proteins, many of which show a modular architecture comprising discrete structural domains. Detecting and describing dynamic couplings between structural regions is difficult in view of the many-body nature of protein-protein interactions. By utilizing the GPU-based computational acceleration, we carried out simulations of the protein forced unfolding for the dimer WW - WW of the all-β-sheet WW domains used as a model multidomain protein. We found that while the physically non-interacting identical protein domains (WW) show nearly symmetric mechanical properties at low tension, reflected, e.g., in the similarity of their distributions of unfolding times, these properties become distinctly different when tension is increased. Moreover, the uncorrelated unfolding transitions at a low pulling force become increasingly more correlated (dependent) at higher forces. Hence, the applied force not only breaks "the mechanical symmetry" but also couples the physically non-interacting protein domains forming a multi-domain protein. We call this effect "the topological coupling." We developed a new theory, inspired by order statistics, to characterize protein-protein interactions in multi-domain proteins. The method utilizes the squared-Gaussian model, but it can also be used in conjunction with other parametric models for the distribution of unfolding times. The formalism can be taken to the single-molecule experimental lab to probe mechanical cooperativity and domain communication in multi-domain proteins.
Menon, Ravishankar; Gerstoft, Peter; Hodgkiss, William S
2012-11-01
Cross-correlations of diffuse noise fields can be used to extract environmental information. The influence of directional sources (usually ships) often results in a bias of the travel time estimates obtained from the cross-correlations. Using an array of sensors, insights from random matrix theory on the behavior of the eigenvalues of the sample covariance matrix (SCM) in an isotropic noise field are used to isolate the diffuse noise component from the directional sources. A sequential hypothesis testing of the eigenvalues of the SCM reveals eigenvalues dominated by loud sources that are statistical outliers for the assumed diffuse noise model. Travel times obtained from cross-correlations using only the diffuse noise component (i.e., by discarding or attenuating the outliers) converge to the predicted travel times based on the known array sensor spacing and measured sound speed at the site and are stable temporally (i.e., unbiased estimates). Data from the Shallow Water 2006 experiment demonstrates the effectiveness of this approach and that the signal-to-noise ratio builds up as the square root of time, as predicted by theory.
Statistical analysis of EQ-5D profiles: does the use of value sets bias inference?
Parkin, David; Rice, Nigel; Devlin, Nancy
2010-01-01
Health state profile data, such as those provided by the EQ-5D, are widely collected in clinical trials, population surveys, and a growing range of other important health sector applications. However, these profile data are difficult to summarize to give an overall view of the health of a given population that can be analyzed for differences between groups or within groups over time. A common way of short cutting this problem is to transform profiles into a single number, or index, using sets of weights, often elicited from the general public in the form of values. Are there any problems with this procedure? In this article, the authors demonstrate the underlying effects of the use of value sets as a means of weighting profile data. They show that any set of weights introduces an exogenous source of variance to health profile data. These can distort findings about the significance of changes in health between groups or over time. No set of weights is neutral in its effect. If a summary of patient-reported outcomes is required, it may be better to use an instrument that yields this directly (such as the EQ VAS) along with the descriptive instrument. If this is not possible, researchers should have a clear rationale for their choice of weights and be aware that those weights may exert a nontrivial effect on their analysis. This article focuses on the EQ-5D, but the arguments and their implications for statistical analysis are relevant to all health state descriptive systems.
Moon, Inkyu; Javidi, Bahram
2009-03-15
We present a statistical approach to recognize three-dimensional (3D) objects with a small number of photons captured by using integral imaging (II). For 3D recognition of the events, the photon-limited elemental image set of a 3D object is obtained using the II technique. A computational geometrical ray propagation algorithm and the parametric maximum likelihood estimator are applied to the photon-limited elemental image set to reconstruct the irradiance of the original 3D scene voxels. The sampling distributions for the statistical parameters of the reconstructed image are determined. Finally, hypothesis testing for the equality of the statistical parameters between reference and input data sets is performed for statistical classification of populations on the basis of sampling distribution information. It is shown that large data sets of photon-limited 3D images can be converted into sampling distributions with their own statistical parameters, resulting in a substantial data dimensionality reduction for processing.
Shu-Jiang, Liu; Zhan-Ying, Chen; Yin-Zhong, Chang; Shi-Lian, Wang; Qi, Li; Yuan-Qing, Fan
2013-10-11
Multidimensional gas chromatography is widely applied to atmospheric xenon monitoring for the Comprehensive Nuclear-Test-Ban Treaty (CTBT). To improve the capability for xenon sampling from the atmosphere, sampling techniques have been investigated in detail. The sampling techniques are designed on the basis of xenon outflow curves, which are influenced by many factors; the injection condition is one of the key factors. In this paper, the xenon outflow curves of single-pulse injection in two-dimensional gas chromatography have been measured and fitted with an exponentially modified Gaussian distribution. An inference formula for the xenon outflow curve of six-pulse injection is derived and compared with the curve fitted to the six-pulse measurements. As a result, the curves of both the one-pulse and six-pulse injections obey the exponentially modified Gaussian distribution when the temperature of the activated carbon column is 26 °C and the flow rate of the carrier gas is 35.6 mL min⁻¹. The retention time of the xenon peak for one-pulse injection is 215 min, and the peak width is 138 min. For the six-pulse injection, however, the retention time is delayed to 255 min, and the peak width broadens to 222 min. According to the inferred formula of the xenon outflow curve for the six-pulse injection, the inferred retention time is 243 min, the relative deviation of the retention time is 4.7%, and the inferred peak width is 225 min, with a relative deviation of 1.3%.
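The relative deviations quoted in the abstract above can be checked with a few lines of arithmetic. The sketch assumes the deviation is taken relative to the measured (fitted) six-pulse value, which the abstract does not state explicitly; the function name is hypothetical.

```python
def relative_deviation_percent(inferred, measured):
    """Absolute deviation of the inferred value, as a percentage of the
    measured value (assumed definition)."""
    return abs(inferred - measured) / measured * 100.0


# Six-pulse injection, values in minutes as reported in the abstract.
retention = relative_deviation_percent(inferred=243, measured=255)
width = relative_deviation_percent(inferred=225, measured=222)
print(f"retention time: {retention:.2f}%  peak width: {width:.2f}%")
```

With the rounded inputs this gives about 4.7% for the retention time and about 1.35% for the peak width, in line with the reported 4.7% and 1.3% once rounding of the published values is allowed for.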
Constrained statistical inference: sample-size tables for ANOVA and regression
Vanbrabant, Leonard; Van De Schoot, Rens; Rosseel, Yves
2015-01-01
Researchers in the social and behavioral sciences often have clear expectations about the order/direction of the parameters in their statistical model. For example, a researcher might expect that regression coefficient β1 is larger than β2 and β3. The corresponding hypothesis is H: β1 > {β2, β3} and this is known as an (order) constrained hypothesis. A major advantage of testing such a hypothesis is that power can be gained and inherently a smaller sample size is needed. This article discusses this gain in sample size reduction when an increasing number of constraints is included in the hypothesis. The main goal is to present sample-size tables for constrained hypotheses. A sample-size table contains the necessary sample size at a pre-specified power (say, 0.80) for an increasing number of constraints. To obtain sample-size tables, two Monte Carlo simulations were performed, one for ANOVA and one for multiple regression. Three results are salient. First, in an ANOVA the needed sample size decreases by 30–50% when complete ordering of the parameters is taken into account. Second, small deviations from the imposed order have only a minor impact on the power. Third, at the maximum number of constraints, the linear regression results are comparable with the ANOVA results. However, in the case of fewer constraints, ordering the parameters (e.g., β1 > β2) results in a higher power than assigning a positive or a negative sign to the parameters (e.g., β1 > 0). PMID:25628587
Constrained statistical inference: sample-size tables for ANOVA and regression.
Vanbrabant, Leonard; Van De Schoot, Rens; Rosseel, Yves
2014-01-01
Researchers in the social and behavioral sciences often have clear expectations about the order/direction of the parameters in their statistical model. For example, a researcher might expect that regression coefficient β1 is larger than β2 and β3. The corresponding hypothesis is H: β1 > {β2, β3} and this is known as an (order) constrained hypothesis. A major advantage of testing such a hypothesis is that power can be gained and inherently a smaller sample size is needed. This article discusses this gain in sample size reduction when an increasing number of constraints is included in the hypothesis. The main goal is to present sample-size tables for constrained hypotheses. A sample-size table contains the necessary sample size at a pre-specified power (say, 0.80) for an increasing number of constraints. To obtain sample-size tables, two Monte Carlo simulations were performed, one for ANOVA and one for multiple regression. Three results are salient. First, in an ANOVA the needed sample size decreases by 30-50% when complete ordering of the parameters is taken into account. Second, small deviations from the imposed order have only a minor impact on the power. Third, at the maximum number of constraints, the linear regression results are comparable with the ANOVA results. However, in the case of fewer constraints, ordering the parameters (e.g., β1 > β2) results in a higher power than assigning a positive or a negative sign to the parameters (e.g., β1 > 0).
Can we infer the effect of river works on streamflow statistics?
NASA Astrophysics Data System (ADS)
Ganora, Daniele
2016-04-01
Most of our river network system is affected by anthropic pressure of different types. While climate and land use change are widely recognized as important factors, the effects of "in-line" water infrastructures on the global behavior of the river system are often overlooked. This is due to the difficulty in including local "physical" knowledge (e.g., the hydraulic behavior of a river reach with levees during a flood) into large-scale models that provide a statistical description of the streamflow, and which are the basis for the implementation of resources/risk management plans (e.g., regional models for prediction of the flood frequency curve). This work presents some preliminary applications regarding two widely used hydrological signatures, the flow duration curve and the flood frequency curve. We adopt a pragmatic (i.e., reliable and implementable at large scales) and parsimonious (i.e., requiring few data) framework of analysis, considering that we operate in a complex system (many river works already exist, and many others could be built in the future). In the first case, a method is proposed to correct observations of streamflow affected by the presence of upstream run-of-the-river power plants in order to provide the "natural" flow duration curve, using only simple information about the plant (i.e., the maximum intake flow). The second case regards the effects of flood-protection works on the downstream sections, to support the application of along-stream cost-benefit analysis in the flood risk management context. Current applications and possible future developments are discussed.
Walker, Martin; Basáñez, María-Gloria; Ouédraogo, André Lin; Hermsen, Cornelus; Bousema, Teun; Churcher, Thomas S
2015-01-16
Quantitative molecular methods (QMMs) such as quantitative real-time polymerase chain reaction (q-PCR), reverse-transcriptase PCR (qRT-PCR) and quantitative nucleic acid sequence-based amplification (QT-NASBA) are increasingly used to estimate pathogen density in a variety of clinical and epidemiological contexts. These methods are often classified as semi-quantitative, yet estimates of reliability or sensitivity are seldom reported. Here, a statistical framework is developed for assessing the reliability (uncertainty) of pathogen densities estimated using QMMs and the associated diagnostic sensitivity. The method is illustrated with quantification of Plasmodium falciparum gametocytaemia by QT-NASBA. The reliability of pathogen (e.g. gametocyte) densities, and the accompanying diagnostic sensitivity, estimated by two contrasting statistical calibration techniques, are compared; a traditional method and a mixed model Bayesian approach. The latter accounts for statistical dependence of QMM assays run under identical laboratory protocols and permits structural modelling of experimental measurements, allowing precision to vary with pathogen density. Traditional calibration cannot account for inter-assay variability arising from imperfect QMMs and generates estimates of pathogen density that have poor reliability, are variable among assays and inaccurately reflect diagnostic sensitivity. The Bayesian mixed model approach assimilates information from replica QMM assays, improving reliability and inter-assay homogeneity, providing an accurate appraisal of quantitative and diagnostic performance. Bayesian mixed model statistical calibration supersedes traditional techniques in the context of QMM-derived estimates of pathogen density, offering the potential to improve substantially the depth and quality of clinical and epidemiological inference for a wide variety of pathogens.
Two-Dimensional Hermite Filters Simplify the Description of High-Order Statistics of Natural Images.
Hu, Qin; Victor, Jonathan D
2016-09-01
Natural image statistics play a crucial role in shaping biological visual systems, understanding their function and design principles, and designing effective computer-vision algorithms. High-order statistics are critical for conveying local features, but they are challenging to study - largely because their number and variety is large. Here, via the use of two-dimensional Hermite (TDH) functions, we identify a covert symmetry in high-order statistics of natural images that simplifies this task. This emerges from the structure of TDH functions, which are an orthogonal set of functions that are organized into a hierarchy of ranks. Specifically, we find that the shape (skewness and kurtosis) of the distribution of filter coefficients depends only on the projection of the function onto a 1-dimensional subspace specific to each rank. The characterization of natural image statistics provided by TDH filter coefficients reflects both their phase and amplitude structure, and we suggest an intuitive interpretation for the special subspace within each rank.
Hawe, David; Hernández Fernández, Francisco R; O'Suilleabháin, Liam; Huang, Jian; Wolsztynski, Eric; O'Sullivan, Finbarr
2012-05-01
In dynamic mode, positron emission tomography (PET) can be used to track the evolution of injected radio-labelled molecules in living tissue. This is a powerful diagnostic imaging technique that provides a unique opportunity to probe the status of healthy and pathological tissue by examining how it processes substrates. The spatial aspect of PET is well established in the computational statistics literature. This article focuses on its temporal aspect. The interpretation of PET time-course data is complicated because the measured signal is a combination of vascular delivery and tissue retention effects. If the arterial time-course is known, the tissue time-course can typically be expressed in terms of a linear convolution between the arterial time-course and the tissue residue. In statistical terms, the residue function is essentially a survival function - a familiar life-time data construct. Kinetic analysis of PET data is concerned with estimation of the residue and associated functionals such as flow, flux, volume of distribution and transit time summaries. This review emphasises a nonparametric approach to the estimation of the residue based on a piecewise linear form. Rapid implementation of this by quadratic programming is described. The approach provides a reference for statistical assessment of widely used one- and two-compartmental model forms. We illustrate the method with data from two of the most well-established PET radiotracers, ¹⁵O-H₂O and ¹⁸F-fluorodeoxyglucose, used for assessment of blood perfusion and glucose metabolism respectively. The presentation illustrates the use of two open-source tools, AMIDE and R, for PET scan manipulation and model inference.
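The convolution relationship stated in the abstract above (tissue time-course equals the arterial time-course convolved with the tissue residue) can be sketched on a uniform time grid. This is a forward model only, written with a simple Riemann sum; it is not the nonparametric estimation procedure of the review, and the function name, the grid spacing `dt`, and the sample values are assumptions for illustration.

```python
def tissue_curve(arterial, residue, dt):
    """Discrete linear convolution of an arterial input with a residue.

    arterial[j] and residue[k] are samples at times j*dt and k*dt; the
    residue is treated as zero beyond its last sample.
    """
    out = []
    for i in range(len(arterial)):
        total = 0.0
        for j in range(i + 1):
            k = i - j  # time elapsed since the tracer arrived
            if k < len(residue):
                total += arterial[j] * residue[k]
        out.append(total * dt)
    return out


# Hypothetical inputs: a two-sample arterial bolus and a decaying residue
# (a survival-type function starting at 1 and non-increasing).
arterial = [1.0, 1.0, 0.0, 0.0]
residue = [1.0, 0.5, 0.25]
print(tissue_curve(arterial, residue, dt=1.0))
```

Because the residue behaves like a survival function, any candidate residue passed to such a forward model should start at its maximum and never increase.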
Grodwohl, Jean-Baptiste
2016-08-01
This paper gives a detailed narrative of a controversial piece of empirical research in postwar population genetics, the analysis of the cytological polymorphisms of an Australian grasshopper, Moraba scurra. This research intertwined key technical developments in three research areas during the 1950s and 1960s: it involved Dobzhansky's empirical research program on cytological polymorphisms, the mathematical theory of natural selection in two-locus systems, and the building of reliable estimates of natural selection in the wild. In the mid-1950s the cytologist Michael White discovered an interesting case of epistasis in populations of Moraba scurra. These observations were widely disseminated when theoretical population geneticist Richard Lewontin represented White's data on adaptive topographies. These topographies connected the information on the genetic structure of these grasshopper populations with the formal framework of theoretical population genetics. As such, they appeared at the time as the most successful application of two-locus models of natural selection to an empirical study system. However, this connection generated paradoxical results: in the landscapes, all grasshopper populations were located on a ridge (an unstable equilibrium) while they were expected to reach a peak. This puzzling result fueled years of research and triggered a controversy attracting contributors from Australia, the United States and the United Kingdom. While the original problem seemed, at first, purely empirical, the subsequent controversy affected the main mathematical tools used in the study of two-gene systems under natural selection. Adaptive topographies and their underlying mathematical structure, Wright's mean fitness equations, were submitted to close scrutiny. Suspicion eventually shifted to the statistical machinery used in data analysis, reflecting the crucial role of statistical inference in applied population genetics. In the 1950s and 1960s, population geneticists were
A three-dimensional statistical mechanical model of folding double-stranded chain molecules
NASA Astrophysics Data System (ADS)
Zhang, Wenbing; Chen, Shi-Jie
2001-05-01
Based on a graphical representation of intrachain contacts, we have developed a new three-dimensional model for the statistical mechanics of double-stranded chain molecules. The theory has been tested and validated for the cubic lattice chain conformations. The statistical mechanical model can be applied to the equilibrium folding thermodynamics of a large class of chain molecules, including protein β-hairpin conformations and RNA secondary structures. The application of a previously developed two-dimensional model to RNA secondary structure folding thermodynamics generally overestimates the breadth of the melting curves [S-J. Chen and K. A. Dill, Proc. Natl. Acad. Sci. U.S.A. 97, 646 (2000)], suggesting an underestimation of the sharpness of the conformational transitions. In this work, we show that the new three-dimensional model gives much sharper melting curves than the two-dimensional model. We believe that the new three-dimensional model may give much better predictions for the thermodynamic properties of RNA conformational changes than the previous two-dimensional model.
NASA Astrophysics Data System (ADS)
von Nessi, G. T.; Hole, M. J.; The MAST Team
2014-11-01
We present recent results and technical breakthroughs for the Bayesian inference of tokamak equilibria using force-balance as a prior constraint. Issues surrounding model parameter representation and posterior analysis are discussed and addressed. These points motivate the recent advancements embodied in the Bayesian Equilibrium Analysis and Simulation Tool (BEAST) software being presently utilized to study equilibria on the Mega-Ampere Spherical Tokamak (MAST) experiment in the UK (von Nessi et al 2012 J. Phys. A 46 185501). State-of-the-art results of using BEAST to study MAST equilibria are reviewed, with recent code advancements being systematically presented throughout the manuscript.
Schimek, Michael G; Budinská, Eva; Kugler, Karl G; Švendová, Vendula; Ding, Jie; Lin, Shili
2015-06-01
High-throughput sequencing techniques are increasingly affordable and produce massive amounts of data. Together with other high-throughput technologies, such as microarrays, there is an enormous amount of resources in databases. The collection of these valuable data has been routine for more than a decade. Despite different technologies, many experiments share the same goal. For instance, the aims of RNA-seq studies often coincide with those of differential gene expression experiments based on microarrays. As such, it would be logical to utilize all available data. However, there is a lack of biostatistical tools for the integration of results obtained from different technologies. Although diverse technological platforms produce different raw data, one commonality for experiments with the same goal is that all the outcomes can be transformed into a platform-independent data format - rankings - for the same set of items. Here we present the R package TopKLists, which allows for statistical inference on the lengths of informative (top-k) partial lists, for stochastic aggregation of full or partial lists, and for graphical exploration of the input and consolidated output. A graphical user interface has also been implemented for providing access to the underlying algorithms. To illustrate the applicability and usefulness of the package, we integrated microRNA data of non-small cell lung cancer across different measurement techniques and drew conclusions. The package can be obtained from CRAN under an LGPL-3 license.
Braiding statistics and classification of two-dimensional charge-2m superconductors
NASA Astrophysics Data System (ADS)
Wang, Chenjie
2016-08-01
We study braiding statistics between quasiparticles and vortices in two-dimensional charge-2m (in units of e) superconductors that are coupled to a Z_{2m} dynamical gauge field, where m is any positive integer. We show that there exist 16m types of braiding statistics when m is odd, but only 4m types when m is even. Based on the braiding statistics, we obtain a classification of topological phases of charge-2m superconductors, or formally speaking, a classification of symmetry-protected topological phases, as well as invertible topological phases, of two-dimensional gapped fermions with Z_{2m}^f symmetry. Interestingly, we find that there is no nontrivial fermionic symmetry-protected topological phase with Z_4^f symmetry.
NASA Astrophysics Data System (ADS)
Ye, Fei; Marchetti, P. A.; Su, Z. B.; Yu, L.
2017-09-01
The relation between braid and exclusion statistics is examined in one-dimensional systems, within the framework of Chern–Simons statistical transmutation in gauge invariant form with an appropriate dimensional reduction. If the matter action is anomalous, as for chiral fermions, a relation between braid and exclusion statistics can be established explicitly for both mutual and nonmutual cases. However, if it is not anomalous, the exclusion statistics of emergent low energy excitations is not necessarily connected to the braid statistics of the physical charged fields of the system. Finally, we also discuss the bosonization of one-dimensional anyonic systems through T-duality. Dedicated to the memory of Mario Tonin.
NASA Astrophysics Data System (ADS)
Pandarinath, Kailasa
2014-12-01
Several new multi-dimensional tectonomagmatic discrimination diagrams employing log-ratio variables of chemical elements and a probability-based procedure have been developed during the last 10 years for basic-ultrabasic, intermediate and acid igneous rocks. Numerous extensive evaluations of these newly developed diagrams have indicated their successful application in inferring the original tectonic setting of younger and older as well as sea-water and hydrothermally altered volcanic rocks. In the present study, these diagrams were applied to Precambrian rocks of Mexico (southern and north-eastern) and Argentina. The study indicated the original tectonic setting of Precambrian rocks from the Oaxaca Complex of southern Mexico as follows: (1) dominant rift (within-plate) setting for rocks of 1117-988 Ma age; (2) dominant rift and less-dominant arc setting for rocks of 1157-1130 Ma age; and (3) a combined tectonic setting of collision and rift for Etla Granitoid Pluton (917 Ma age). The diagrams have indicated the original tectonic setting of the Precambrian rocks from north-eastern Mexico as: (1) a dominant arc tectonic setting for the rocks of 988 Ma age; and (2) an arc and collision setting for the rocks of 1200-1157 Ma age. Similarly, the diagrams have indicated the dominant original tectonic setting for the Precambrian rocks from Argentina as: (1) within-plate (continental rift-ocean island) and continental rift (CR) setting for the rocks of 800 Ma and 845 Ma age, respectively; and (2) an arc setting for the rocks of 1174-1169 Ma and of 1212-1188 Ma age. The inferred tectonic settings for these Precambrian rocks are, in general, in accordance with the tectonic settings reported in the literature, though some of the diagrams give inconsistent inferences of tectonic setting. The present study confirms the importance of these newly developed discriminant-function based diagrams in inferring the original tectonic setting of
Kimura, S; Araki, D; Matsumura, K; Okada-Hatakeyama, M
2012-02-01
Voit and Almeida have proposed the decoupling approach as a method for inferring the S-system models of genetic networks. The decoupling approach defines the inference of a genetic network as a problem requiring the solutions of sets of algebraic equations. The computation can be accomplished in a very short time, as the approach estimates S-system parameters without solving any of the differential equations. Yet the defined algebraic equations are non-linear, which sometimes prevents us from finding reasonable S-system parameters. In this study, we propose a new technique to overcome this drawback of the decoupling approach. This technique transforms the problem of solving each set of algebraic equations into a one-dimensional function optimization problem. The computation can still be accomplished in a relatively short time, as the problem is transformed by solving a linear programming problem. We confirm the effectiveness of the proposed approach through numerical experiments. Copyright © 2011 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Rezaei Kh., S.; Bailer-Jones, C. A. L.; Hanson, R. J.; Fouesneau, M.
2017-02-01
We present a non-parametric model for inferring the three-dimensional (3D) distribution of dust density in the Milky Way. Our approach uses the extinction measured towards stars at different locations in the Galaxy at approximately known distances. Each extinction measurement is proportional to the integrated dust density along its line of sight (LoS). Making simple assumptions about the spatial correlation of the dust density, we can infer the most probable 3D distribution of dust across the entire observed region, including along sight lines which were not observed. This is possible because our model employs a Gaussian process to connect all LoS. We demonstrate the capability of our model to capture detailed dust density variations using mock data and simulated data from the Gaia Universe Model Snapshot. We then apply our method to a sample of giant stars observed by APOGEE and Kepler to construct a 3D dust map over a small region of the Galaxy. Owing to our smoothness constraint and its isotropy, we provide one of the first maps which does not show the "fingers of God" effect.
NASA Astrophysics Data System (ADS)
Norris, P. M.; da Silva, A. M., Jr.
2016-12-01
Norris and da Silva recently published a method to constrain a statistical model of sub-gridcolumn moisture variability using high-resolution satellite cloud data. The method can be used for large-scale model parameter estimation or cloud data assimilation (CDA). The gridcolumn model includes assumed-PDF intra-layer horizontal variability and a copula-based inter-layer correlation model. The observables used are MODIS cloud-top pressure, brightness temperature and cloud optical thickness, but the method should be extensible to direct cloudy radiance assimilation for a small number of channels. The algorithm is a form of Bayesian inference with a Markov chain Monte Carlo (MCMC) approach to characterizing the posterior distribution. This approach is especially useful in cases where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach is not gradient-based and allows jumps into regions of non-zero cloud probability. In the example provided, the method is able to restore marine stratocumulus near the Californian coast where the background state has a clear swath. The new approach not only significantly reduces mean and standard deviation biases with respect to the assimilated observables, but also improves the simulated rotational-Raman scattering cloud optical centroid pressure against independent (non-assimilated) retrievals from the OMI instrument. One obvious difficulty for the method, and other CDA methods, is the lack of information content in passive cloud observables on cloud vertical structure, beyond cloud-top and thickness, thus necessitating strong dependence on the background vertical moisture structure. It is found that a simple flow-dependent correlation modification due to Riishojgaard is helpful, better honoring inversion structures in the background state.
Anomalous wave function statistics on a one-dimensional lattice with power-law disorder.
Titov, M; Schomerus, H
2003-10-24
Within a general framework, we discuss the wave function statistics in the Lloyd model of Anderson localization on a one-dimensional lattice with a Cauchy distribution for random on-site potential. We demonstrate that already in leading order in the disorder strength, there exists a hierarchy of anomalies in the probability distributions of the wave function, the conductance, and the local density of states, for every energy which corresponds to a rational ratio of wavelength to lattice constant. Power-law rather than log-normal tails dominate the short-distance wave-function statistics.
Kelleher, B; Goulding, D; Huyet, G; Viktorov, E A; Erneux, T; Hegarty, S P
2011-08-01
Noise-induced excitability is a prevalent feature in many nonlinear dynamical systems. The optically injected semiconductor laser is one of the simplest such systems and is readily amenable to both experimental and theoretical analysis. We show that the dimensionality of this system may be tuned experimentally and that this has a strong signature on the interspike statistics. The phase of the slave laser is resolved experimentally in the frame of the master laser, allowing an examination of the dynamics at extremely low injection strengths where intensity measurements alone cannot determine the dynamics fully. Generic phase equations are found for the different dimensional scenarios. When the dimensionality is greater than 1, we show that a precursor of a homoclinic bifurcation generates a noise-induced frequency and that the homoclinic bifurcation admits a bistability in the system.
A statistical theory of sound radiation from a two-dimensional lined duct
NASA Technical Reports Server (NTRS)
Cho, Y. C.; Watson, W. R.
1979-01-01
A statistical theory coupled with a finite element theory is employed for investigation of sound radiation from a two-dimensional lined duct. The analysis does not utilize duct modes, and can be applied to a non-uniform duct with variable wall liner properties. Numerical results are presented for various shapes of the incident wave. The results are in good agreement with the Wiener-Hopf calculation for cases where the latter can be made.
Schwermann, Achim H; dos Santos Rolo, Tomy; Caterino, Michael S; Bechly, Günter; Schmied, Heiko; Baumbach, Tilo; van de Kamp, Thomas
2016-01-01
External and internal morphological characters of extant and fossil organisms are crucial to establishing their systematic position, ecological role and evolutionary trends. The lack of internal characters and soft-tissue preservation in many arthropod fossils, however, impedes comprehensive phylogenetic analyses and species descriptions according to taxonomic standards for Recent organisms. We found well-preserved three-dimensional anatomy in mineralized arthropods from Paleogene fissure fillings and demonstrate the value of these fossils by utilizing digitally reconstructed anatomical structure of a hister beetle. The new anatomical data facilitate a refinement of the species diagnosis and allowed us to reject a previous hypothesis of close phylogenetic relationship to an extant congeneric species. Our findings suggest that mineralized fossils, even those of macroscopically poor preservation, constitute a rich but yet largely unexploited source of anatomical data for fossil arthropods. DOI: http://dx.doi.org/10.7554/eLife.12129.001 PMID:26854367
NASA Astrophysics Data System (ADS)
Hata, Maki; Takakura, Shinichi; Matsushima, Nobuo; Hashimoto, Takeshi; Utsugi, Mitsuru
2016-10-01
At Naka-dake cone, Aso caldera, Japan, volcanic activity is raised cyclically, an example of which was a phreatomagmatic eruption in September 2015. Using a three-dimensional model of electrical resistivity, we identify a magma pathway from a series of northward dipping conductive anomalies in the upper crust beneath the caldera. Our resistivity model was created from magnetotelluric measurements conducted in November-December 2015; thus, it provides the latest information about magma reservoir geometry beneath the caldera. The center of the conductive anomalies shifts from the north of Naka-dake at depths >10 km toward Naka-dake, along with a decrease in anomaly depths. The melt fraction is estimated at 13-15% at 2 km depth. Moreover, these anomalies are spatially correlated with the locations of earthquake clusters, which are distributed within resistive blocks on the conductive anomalies in the northwest of Naka-dake but distributed at the resistive sides of resistivity boundaries in the northeast.
Schwermann, Achim H; Dos Santos Rolo, Tomy; Caterino, Michael S; Bechly, Günter; Schmied, Heiko; Baumbach, Tilo; van de Kamp, Thomas
2016-02-05
External and internal morphological characters of extant and fossil organisms are crucial to establishing their systematic position, ecological role and evolutionary trends. The lack of internal characters and soft-tissue preservation in many arthropod fossils, however, impedes comprehensive phylogenetic analyses and species descriptions according to taxonomic standards for Recent organisms. We found well-preserved three-dimensional anatomy in mineralized arthropods from Paleogene fissure fillings and demonstrate the value of these fossils by utilizing digitally reconstructed anatomical structure of a hister beetle. The new anatomical data facilitate a refinement of the species diagnosis and allowed us to reject a previous hypothesis of close phylogenetic relationship to an extant congeneric species. Our findings suggest that mineralized fossils, even those of macroscopically poor preservation, constitute a rich but yet largely unexploited source of anatomical data for fossil arthropods.
Two-Dimensional Hermite Filters Simplify the Description of High-Order Statistics of Natural Images
Hu, Qin; Victor, Jonathan D.
2016-01-01
Natural image statistics play a crucial role in shaping biological visual systems, understanding their function and design principles, and designing effective computer-vision algorithms. High-order statistics are critical for conveying local features, but they are challenging to study – largely because their number and variety is large. Here, via the use of two-dimensional Hermite (TDH) functions, we identify a covert symmetry in high-order statistics of natural images that simplifies this task. This emerges from the structure of TDH functions, which are an orthogonal set of functions that are organized into a hierarchy of ranks. Specifically, we find that the shape (skewness and kurtosis) of the distribution of filter coefficients depends only on the projection of the function onto a 1-dimensional subspace specific to each rank. The characterization of natural image statistics provided by TDH filter coefficients reflects both their phase and amplitude structure, and we suggest an intuitive interpretation for the special subspace within each rank. PMID:27713838
Crossett, Ben; Edwards, Alistair V G; White, Melanie Y; Cordwell, Stuart J
2008-01-01
Standardized methods for the solubilization of proteins prior to proteomics analyses incorporating two-dimensional gel electrophoresis (2-DE) are essential for providing reproducible data that can be subjected to rigorous statistical interrogation for comparative studies investigating disease-genesis. In this chapter, we discuss the imaging and image analysis of proteins separated by 2-DE, in the context of determining protein abundance alterations related to a change in biochemical or biophysical conditions. We then describe the principles behind 2-DE gel statistical analysis, including subtraction of background noise, spot detection, gel matching, spot quantitation for data comparison, and statistical requirements to create meaningful gel data sets. We also emphasize the need to develop reproducible and robust protocols for protein sample preparation and 2-DE itself.
de Matos Simoes, Ricardo; Emmert-Streib, Frank
2011-01-01
The inference of gene regulatory networks from gene expression data is a difficult problem because the performance of the inference algorithms depends on a multitude of different factors. In this paper we study two of these. First, we investigate the influence of discrete mutual information (MI) estimators on the global and local network inference performance of the C3NET algorithm. More precisely, we study 4 different MI estimators (Empirical, Miller-Madow, Shrink and Schürmann-Grassberger) in combination with 3 discretization methods (equal frequency, equal width and global equal width discretization). We observe the best global and local inference performance of C3NET for the Miller-Madow estimator with an equal width discretization. Second, our numerical analysis can be considered as a systems approach because we simulate gene expression data from an underlying gene regulatory network, instead of making a distributional assumption to sample thereof. We demonstrate that despite the popularity of the latter approach, which is the traditional way of studying MI estimators, this is in fact not supported by simulated and biological expression data because of their heterogeneity. Hence, our study provides guidance for an efficient design of a simulation study in the context of network inference, supporting a systems approach.
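To make the estimator comparison concrete, the sketch below computes mutual information from equal-width discretized data with the empirical (plug-in) estimator and with the Miller-Madow bias correction, which adds (m - 1)/(2n) to each entropy, where m is the number of non-empty bins and n the sample size. It is an illustration only, under the assumption of a fixed number of bins per variable; the function names are hypothetical and this is not the C3NET implementation.

```python
import numpy as np

def entropy(counts, miller_madow=False):
    """Entropy (in nats) from a vector of bin counts."""
    counts = counts[counts > 0]          # ignore empty bins
    n = counts.sum()
    p = counts / n
    h = -np.sum(p * np.log(p))
    if miller_madow:
        # Miller-Madow correction: (non-empty bins - 1) / (2 * sample size)
        h += (len(counts) - 1) / (2.0 * n)
    return h

def mutual_information(x, y, bins=5, miller_madow=False):
    """MI = H(X) + H(Y) - H(X, Y) after equal-width discretization."""
    # An integer `bins` gives equal-width bins over each variable's range.
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    hx = entropy(joint.sum(axis=1), miller_madow)
    hy = entropy(joint.sum(axis=0), miller_madow)
    hxy = entropy(joint.ravel(), miller_madow)
    return hx + hy - hxy
```

For two identical binary variables with balanced classes, the empirical estimate equals log 2, and the Miller-Madow version adds a small positive term that shrinks as the sample grows.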
Austin, Peter C
2011-05-20
Propensity-score matching allows one to reduce the effects of treatment-selection bias or confounding when estimating the effects of treatments when using observational data. Some authors have suggested that methods of inference appropriate for independent samples can be used for assessing the statistical significance of treatment effects when using propensity-score matching. Indeed, many authors in the applied medical literature use methods for independent samples when making inferences about treatment effects using propensity-score matched samples. Dichotomous outcomes are common in healthcare research. In this study, we used Monte Carlo simulations to examine the effect on inferences about risk differences (or absolute risk reductions) when statistical methods for independent samples are used compared with when statistical methods for paired samples are used in propensity-score matched samples. We found that compared with using methods for independent samples, the use of methods for paired samples resulted in: (i) empirical type I error rates that were closer to the advertised rate; (ii) empirical coverage rates of 95 per cent confidence intervals that were closer to the advertised rate; (iii) narrower 95 per cent confidence intervals; and (iv) estimated standard errors that more closely reflected the sampling variability of the estimated risk difference. Differences between the empirical and advertised performance of methods for independent samples were greater when the treatment-selection process was stronger compared with when treatment-selection process was weaker. We recommend using statistical methods for paired samples when using propensity-score matched samples for making inferences on the effect of treatment on the reduction in the probability of an event occurring.
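The contrast between independent-sample and paired-sample inference for a risk difference can be sketched with the usual large-sample standard errors. This is only an illustration under stated assumptions: the matched pairs are summarized as a 2x2 table of pair outcomes, the variance formulas are the textbook ones for two independent proportions and for a difference of paired proportions, and the function name is hypothetical. It is not necessarily the exact procedure evaluated in the study.

```python
import math

def risk_difference_se(a, b, c, d):
    """Risk difference with independent-sample and paired-sample standard errors.

    Counts of matched pairs:
      a: event in both treated and control subject
      b: event in treated subject only
      c: event in control subject only
      d: event in neither
    """
    n = a + b + c + d                      # number of matched pairs
    p_t = (a + b) / n                      # risk in treated subjects
    p_c = (a + c) / n                      # risk in control subjects
    diff = p_t - p_c
    # Treats the two arms as independent samples of size n each.
    se_indep = math.sqrt(p_t * (1 - p_t) / n + p_c * (1 - p_c) / n)
    # Uses only the discordant pairs, accounting for the matching.
    se_paired = math.sqrt((b + c) - (b - c) ** 2 / n) / n
    return diff, se_indep, se_paired

diff, se_indep, se_paired = risk_difference_se(40, 10, 20, 30)
```

In this made-up table the outcomes within pairs are positively associated, so the paired standard error comes out smaller than the independent-sample one, in line with the narrower intervals reported for paired methods.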
Carlsen, Michelle; Fu, Guifang; Bushman, Shaun; Corcoran, Christopher
2016-02-01
Genome-wide data with millions of single-nucleotide polymorphisms (SNPs) can be highly correlated due to linkage disequilibrium (LD). The ultrahigh dimensionality of big data brings unprecedented challenges to statistical modeling such as noise accumulation, the curse of dimensionality, computational burden, spurious correlations, and a processing and storing bottleneck. The traditional statistical approaches lose their power due to n << p (n is the number of observations and p is the number of SNPs) and the complex correlation structure among SNPs. In this article, we propose an integrated distance correlation ridge regression (DCRR) approach to accommodate the ultrahigh dimensionality, joint polygenic effects of multiple loci, and the complex LD structures. Initially, a distance correlation (DC) screening approach is used to extensively remove noise, after which LD structure is addressed using a ridge penalized multiple logistic regression (LRR) model. The false discovery rate, true positive discovery rate, and computational cost were simultaneously assessed through a large number of simulations. A binary trait of Arabidopsis thaliana, the hypersensitive response to the bacterial elicitor AvrRpm1, was analyzed in 84 inbred lines (28 susceptibilities and 56 resistances) with 216,130 SNPs. Compared to previous SNP discovery methods implemented on the same data set, the DCRR approach successfully detected the causative SNP while dramatically reducing spurious associations and computational time. Copyright © 2016 by the Genetics Society of America.
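The screening step can be illustrated with the standard sample distance correlation of Székely and co-authors, computed from double-centered pairwise distance matrices. The sketch below is a small, unoptimized illustration; the function names and the `keep` parameter are hypothetical, and it is not the authors' DCRR code (the subsequent ridge-penalized logistic regression is omitted).

```python
import numpy as np

def _centered_dist(v):
    """Pairwise Euclidean distance matrix, double-centered."""
    v = np.asarray(v, dtype=float).reshape(len(v), -1)
    d = np.sqrt(((v[:, None, :] - v[None, :, :]) ** 2).sum(axis=-1))
    return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()

def distance_correlation(x, y):
    """Sample distance correlation between two samples of equal length."""
    a, b = _centered_dist(x), _centered_dist(y)
    dcov2 = (a * b).mean()                       # squared distance covariance
    dvar_x, dvar_y = (a * a).mean(), (b * b).mean()
    if dvar_x * dvar_y == 0:
        return 0.0                               # a constant variable carries no signal
    return np.sqrt(max(dcov2, 0.0) / np.sqrt(dvar_x * dvar_y))

def screen(X, y, keep):
    """Indices of the `keep` columns of X with the largest distance correlation to y."""
    scores = np.array([distance_correlation(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:keep]
```

Because each call builds an n-by-n distance matrix, this form suits small n (such as the 84 lines mentioned above) but would need a faster algorithm for large samples.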
Impact of data resolution on three-dimensional structure inference methods.
Park, Jincheol; Lin, Shili
2016-02-06
Assays that are capable of detecting genome-wide chromatin interactions have produced massive amounts of data and led to a greater understanding of the chromosomal three-dimensional (3D) structure. As technology becomes more sophisticated, higher-and-higher resolution data are being produced, going from the initial 1 Megabase (Mb) resolution to the current 10 Kilobase (Kb) or even 1 Kb resolution. The availability of genome-wide interaction data necessitates development of analytical methods to recover the underlying 3D spatial chromatin structure, but challenges abound. Most of the methods were proposed for analyzing data at low resolution (1 Mb). Their behaviors are thus unknown for higher resolution data. For such data, one of the key features is the high proportion of "0" contact counts among all available data, in other words, the excess of zeros. To address the issue of excess of zeros, in this paper, we propose a truncated Random effect EXpression (tREX) method that can handle data at various resolutions. We then assess the performance of tREX and a number of leading existing methods for recovering the underlying chromatin 3D structure. This was accomplished by creating in-silico data to mimic multiple levels of resolution and submitting the methods to a "stress test". Finally, we applied tREX and the comparison methods to a Hi-C dataset for which FISH measurements are available to evaluate estimation accuracy. The proposed tREX method achieves consistently good performance in all 30 simulated settings considered. It is not only robust to resolution level and underlying parameters, but also insensitive to model misspecification. This conclusion is based on observations made in terms of 3D structure estimation accuracy and preservation of topologically associated domains. Application of the methods to the human lymphoblastoid cell line data on chromosomes 14 and 22 further substantiates the superior performance of tREX: the constructed 3D structure from tREX is
Artigaud, Sébastien; Gauthier, Olivier; Pichereau, Vianney
2013-11-01
Two-dimensional electrophoresis is a crucial method in proteomics that allows the characterization of proteins' function and expression. This usually implies the identification of proteins that are differentially expressed between two contrasting conditions, for example, healthy versus diseased in human proteomics biomarker discovery and stressful conditions versus control in animal experimentation. The statistical procedures that lead to such identifications are critical steps in the 2-DE analysis workflow. They include a normalization step and a test and probability correction for multiple testing. Statistical issues caused by the high dimensionality of the data and large-scale multiple testing have been a more active topic in transcriptomics than proteomics, especially in microarray analysis. We thus propose to adapt innovative statistical tools developed for microarray analysis and incorporate them in the 2-DE analysis pipeline. In this article, we evaluate the performance of different normalization procedures, different statistical tests and false discovery rate calculation methods with both real and simulated datasets. We demonstrate that the use of statistical procedures adapted from microarrays leads to a notable increase in power as well as a minimization of the false-positive discovery rate. More specifically, we obtained the best results in terms of reliability and sensibility when using the 'moderated t-test' from Smyth in association with the classic false discovery rate from Benjamini and Hochberg. The methods discussed are freely available in the 'prot2D' open source R-package from Bioconductor (http://www.bioconductor.org//) under the terms of the GNU General Public License (version 2 or later). Contact: sebastien.artigaud@univ-brest.fr or sebastien.artigaud@gmx.com.
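The multiple-testing correction mentioned above, the Benjamini-Hochberg false discovery rate procedure, can be written in a few lines. The following is a generic sketch of the step-up adjustment, with a hypothetical function name and made-up p-values; it is not code from the 'prot2D' package.

```python
import numpy as np

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = len(p)
    order = np.argsort(p)                          # ranks from smallest to largest
    scaled = p[order] * m / np.arange(1, m + 1)    # p_(i) * m / i
    # Enforce monotonicity, working down from the largest p-value.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.clip(adjusted, 0.0, 1.0)
    return out

# Made-up p-values for four protein spots.
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
# adj is approximately [0.02, 0.04, 0.04, 0.02]
```

Spots whose adjusted value falls below the chosen FDR level (for example 0.05) would be reported as differentially expressed.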
Stevens, Katherine; McCabe, Christopher; Brazier, John; Roberts, Jennifer
2007-09-01
A key issue in health state valuation modelling is the choice of functional form. The two most frequently used preference based instruments adopt different approaches; one based on multi-attribute utility theory (MAUT), the other on statistical analysis. There has been no comparison of these alternative approaches in the context of health economics. We report a comparison of these approaches for the health utilities index mark 2. The statistical inference model predicts more accurately than the one based on MAUT. We discuss possible explanations for the differences in performance, the importance of the findings, and implications for future research.
Interoccurrence time statistics in the two-dimensional Burridge-Knopoff earthquake model
Hasumi, Tomohiro
2007-08-15
We have numerically investigated statistical properties of the so-called interoccurrence time or the waiting time, i.e., the time interval between successive earthquakes, based on the two-dimensional (2D) spring-block (Burridge-Knopoff) model, selecting the velocity-weakening property as the constitutive friction law. The statistical properties of frequency distribution and the cumulative distribution of the interoccurrence time are discussed by tuning the dynamical parameters, namely, a stiffness and frictional property of a fault. We optimize these model parameters to reproduce the interoccurrence time statistics in nature; the frequency and cumulative distribution can be described by the power law and Zipf-Mandelbrot type power law, respectively. In an optimal case, the b value of the Gutenberg-Richter law and the ratio of wave propagation velocity are in agreement with those derived from real earthquakes. As the threshold of magnitude is increased, the interoccurrence time distribution tends to follow an exponential distribution. Hence it is suggested that a temporal sequence of earthquakes, aside from small-magnitude events, is a Poisson process, which is observed in nature. We found that the interoccurrence time statistics derived from the 2D BK (original) model can efficiently reproduce that of real earthquakes, so that the model can be recognized as a realistic one in view of interoccurrence time statistics.
Predicting adsorption isotherms using a two-dimensional statistical associating fluid theory
NASA Astrophysics Data System (ADS)
Martinez, Alejandro; Castro, Martin; McCabe, Clare; Gil-Villegas, Alejandro
2007-02-01
A molecular thermodynamics approach is developed in order to describe the adsorption of fluids on solid surfaces. The new theory is based on the statistical associating fluid theory for potentials of variable range [A. Gil-Villegas et al., J. Chem. Phys. 106, 4168 (1997)] and uses a quasi-two-dimensional approximation to describe the properties of adsorbed fluids. The theory is tested against Gibbs ensemble Monte Carlo simulations and excellent agreement with the theoretical predictions is achieved. Additionally the authors use the new approach to describe the adsorption isotherms for nitrogen and methane on dry activated carbon.
Predicting adsorption isotherms using a two-dimensional statistical associating fluid theory.
Martinez, Alejandro; Castro, Martin; McCabe, Clare; Gil-Villegas, Alejandro
2007-02-21
A molecular thermodynamics approach is developed in order to describe the adsorption of fluids on solid surfaces. The new theory is based on the statistical associating fluid theory for potentials of variable range [A. Gil-Villegas et al., J. Chem. Phys. 106, 4168 (1997)] and uses a quasi-two-dimensional approximation to describe the properties of adsorbed fluids. The theory is tested against Gibbs ensemble Monte Carlo simulations and excellent agreement with the theoretical predictions is achieved. Additionally the authors use the new approach to describe the adsorption isotherms for nitrogen and methane on dry activated carbon.
Okamoto, Takashi; Fujita, Shuhei
2008-12-01
The statistical properties of three-dimensional normal and fractal speckle fields produced by two or three scattered waves crossed orthogonally are studied theoretically. The probability density function and the autocorrelation function of intensity are derived for speckle fields superposed with and without interference. It is shown that the spatial anisotropy of intensity distributions exists even when three scattered waves interfere with one another. This spatial anisotropy affects the power-law distribution of intensity correlation for fractal speckles and leads to intensity patterns that are not self-similar in two or three dimensions. A potential application of the superposed speckle field is proposed.
Harari, Gil
2014-01-01
Statistical significance, expressed by the p-value, and the confidence interval (CI) are common statistical measures and are essential for the statistical analysis of studies in medicine and life sciences. These measures provide complementary information about the statistical probability and conclusions regarding the clinical significance of study findings. This article is intended to describe the methodologies, compare the methods, assess their suitability for the different needs of study results analysis, and explain situations in which each method should be used.
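How the two measures complement each other can be seen by computing both from the same sample. The sketch below is an illustration only: it uses a normal approximation (a t-distribution would be more appropriate for a sample this small), and the data values and the function name `z_test_and_ci` are made up for the example.

```python
import math
from statistics import NormalDist, mean, stdev


def z_test_and_ci(sample, mu0=0.0, alpha=0.05):
    """Two-sided z-test of the mean against mu0, plus the (1 - alpha) confidence interval."""
    n = len(sample)
    m = mean(sample)
    se = stdev(sample) / math.sqrt(n)  # standard error of the mean
    z = (m - mu0) / se
    std_normal = NormalDist()
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    ci = (m - z_crit * se, m + z_crit * se)
    return p_value, ci


# Hypothetical changes in a clinical measurement after treatment (invented values).
changes = [-4.1, -2.3, 0.5, -3.8, -1.2, -5.0, -2.9, 1.1, -3.3, -0.7, -2.6, -4.4]
p_value, ci = z_test_and_ci(changes, mu0=0.0, alpha=0.05)
print(f"p-value = {p_value:.5f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The p-value says only whether a zero effect is plausible, whereas the interval also shows the range of effect sizes compatible with the data, which is what bears on clinical significance.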
Three Dimensional Measurements of Boundary Layer Statistics using Scanning Doppler Lidar
NASA Astrophysics Data System (ADS)
Frehlich, R.
2008-12-01
Accurate measurement and modeling of the boundary layer are challenging, especially for the stable night time boundary layer, the highly turbulent boundary layer, and the early morning transition to convection. High quality profiles of mean and turbulent statistics of the night time boundary layer are logistically difficult to obtain using instrumented towers or instrumented research aircraft. One of the fundamental limits to the accuracy of atmospheric estimates of mean and turbulent quantities is the number of independent samples of the relevant processes. Traditional measurements from towers, sodars, radar profilers, and instrumented aircraft essentially produce a spatial sample of the atmosphere along a line defined by the mean wind (or aircraft trajectory). Advanced three dimensional measurements of the boundary layer provide the highest statistical accuracy, which is essential to understand complex rapidly changing processes. The development of eye-safe scanning Doppler lidars and processing algorithms to correct for the spatial filtering by the laser pulse smoothing and the contribution from estimation error has produced profiles of the mean velocity and key turbulence statistics (the energy dissipation rate, velocity variance, and turbulence length scale) for two orthogonal horizontal velocity components. This requires accurate information about the sensing volume of the lidar measurements as well as the statistical properties of the estimation error. The various processing techniques and fundamental assumptions for the analysis of scanning Doppler lidar data will be presented for various atmospheric conditions. Unresolved issues for future work will also be outlined.
Halpin, Peter F; Stam, Henderikus J
2006-01-01
The application of statistical testing in psychological research over the period of 1940-1960 is examined in order to address psychologists' reconciliation of the extant controversy between the Fisher and Neyman-Pearson approaches. Textbooks of psychological statistics and the psychological journal literature are reviewed to examine the presence of what Gigerenzer (1993) called a hybrid model of statistical testing. Such a model is present in the textbooks, although the mathematically incomplete character of this model precludes the appearance of a similarly hybridized approach to statistical testing in the research literature. The implications of this hybrid model for psychological research and the statistical testing controversy are discussed.
Yang, Yuqing; Chen, Ning; Chen, Ting
2017-01-25
The inference of associations between environmental factors and microbes and among microbes is critical to interpreting metagenomic data, but compositional bias, indirect associations resulting from common factors, and variance within metagenomic sequencing data limit the discovery of associations. To account for these problems, we propose metagenomic Lognormal-Dirichlet-Multinomial (mLDM), a hierarchical Bayesian model with sparsity constraints, to estimate absolute microbial abundance and simultaneously infer both conditionally dependent associations among microbes and direct associations between microbes and environmental factors. We empirically show the effectiveness of the mLDM model using synthetic data, data from the TARA Oceans project, and a colorectal cancer dataset. Finally, we apply mLDM to 16S sequencing data from the western English Channel and report several associations. Our model can be used on both natural environmental and human metagenomic datasets, promoting the understanding of associations in the microbial community.
Statistical Projections for Multi-resolution, Multi-dimensional Visual Data Exploration and Analysis
Hoa T. Nguyen; Stone, Daithi; E. Wes Bethel
2016-01-01
An ongoing challenge in visual exploration and analysis of large, multi-dimensional datasets is how to present useful, concise information to a user for some specific visualization tasks. Typical approaches to this problem have proposed either reduced-resolution versions of data, or projections of data, or both. These approaches still have limitations, such as high computational cost or susceptibility to errors. In this work, we explore the use of a statistical metric as the basis for both projections and reduced-resolution versions of data, with a particular focus on preserving one key trait in data, namely variation. We use two different case studies to explore this idea, one that uses a synthetic dataset, and another that uses a large ensemble collection produced by an atmospheric modeling code to study long-term changes in global precipitation. The primary finding of our work is that, in terms of preserving the variation signal inherent in data, a statistical measure preserves this key characteristic more faithfully across both multi-dimensional projections and multi-resolution representations than a methodology based upon averaging.
NASA Astrophysics Data System (ADS)
Blanc, Guillermo A.; Kewley, Lisa; Vogt, Frédéric P. A.; Dopita, Michael A.
2015-01-01
We present a new method for inferring the metallicity (Z) and ionization parameter (q) of H II regions and star-forming galaxies using strong nebular emission lines (SELs). We use Bayesian inference to derive the joint and marginalized posterior probability density functions for Z and q given a set of observed line fluxes and an input photoionization model. Our approach allows the use of arbitrary sets of SELs and the inclusion of flux upper limits. The method provides a self-consistent way of determining the physical conditions of ionized nebulae that is not tied to the arbitrary choice of a particular SEL diagnostic and uses all the available information. Unlike theoretically calibrated SEL diagnostics, the method is flexible and not tied to a particular photoionization model. We describe our algorithm, validate it against other methods, and present a tool that implements it called IZI. Using a sample of nearby extragalactic H II regions, we assess the performance of commonly used SEL abundance diagnostics. We also use a sample of 22 local H II regions having both direct and recombination line (RL) oxygen abundance measurements in the literature to study discrepancies in the abundance scale between different methods. We find that oxygen abundances derived through Bayesian inference using currently available photoionization models in the literature can be in good (~30%) agreement with RL abundances, although some models perform significantly better than others. We also confirm that abundances measured using the direct method are typically ~0.2 dex lower than both RL and photoionization-model-based abundances.
Blanc, Guillermo A.; Kewley, Lisa; Vogt, Frédéric P. A.; Dopita, Michael A.
2015-01-10
We present a new method for inferring the metallicity (Z) and ionization parameter (q) of H II regions and star-forming galaxies using strong nebular emission lines (SELs). We use Bayesian inference to derive the joint and marginalized posterior probability density functions for Z and q given a set of observed line fluxes and an input photoionization model. Our approach allows the use of arbitrary sets of SELs and the inclusion of flux upper limits. The method provides a self-consistent way of determining the physical conditions of ionized nebulae that is not tied to the arbitrary choice of a particular SEL diagnostic and uses all the available information. Unlike theoretically calibrated SEL diagnostics, the method is flexible and not tied to a particular photoionization model. We describe our algorithm, validate it against other methods, and present a tool that implements it called IZI. Using a sample of nearby extragalactic H II regions, we assess the performance of commonly used SEL abundance diagnostics. We also use a sample of 22 local H II regions having both direct and recombination line (RL) oxygen abundance measurements in the literature to study discrepancies in the abundance scale between different methods. We find that oxygen abundances derived through Bayesian inference using currently available photoionization models in the literature can be in good (∼30%) agreement with RL abundances, although some models perform significantly better than others. We also confirm that abundances measured using the direct method are typically ∼0.2 dex lower than both RL and photoionization-model-based abundances.
Erguler, Kamil; Stumpf, Michael P H
2011-05-01
The size and complexity of cellular systems make building predictive models an extremely difficult task. In principle dynamical time-course data can be used to elucidate the structure of the underlying molecular mechanisms, but a central and recurring problem is that many and very different models can be fitted to experimental data, especially when the latter are limited and subject to noise. Even given a model, estimating its parameters remains challenging in real-world systems. Here we present a comprehensive analysis of 180 systems biology models, which allows us to classify the parameters with respect to their contribution to the overall dynamical behaviour of the different systems. Our results reveal candidate elements of control in biochemical pathways that differentially contribute to dynamics. We introduce sensitivity profiles that concisely characterize parameter sensitivity and demonstrate how this can be connected to variability in data. Systematically linking data and model sloppiness allows us to extract features of dynamical systems that determine how well parameters can be estimated from time-course measurements, and associates the extent of data required for parameter inference with the model structure, and also with the global dynamical state of the system. The comprehensive analysis of so many systems biology models reaffirms the inability to estimate precisely most model or kinetic parameters as a generic feature of dynamical systems, and provides safe guidelines for performing better inferences and model predictions in the context of reverse engineering of mathematical models for biological systems.
Bei, Yuanzhe; Hong, Pengyu
2016-12-19
Performing statistical tests is an important step in analyzing genome-wide datasets for detecting genomic features differentially expressed between conditions. Each type of statistical test has its own advantages in characterizing certain aspects of differences between population means and often assumes a relatively simple data distribution (e.g., Gaussian, Poisson, negative binomial, etc.), which may not be well met by the datasets of interest. Making insufficient distributional assumptions can lead to inferior results when dealing with complex differential expression patterns. We propose to capture differential expression information more comprehensively by integrating multiple test statistics, each of which has relatively limited capacity to summarize the observed differential expression information. This work addresses a general application scenario, in which users want to detect as many differentially expressed features (DEFs) as possible while requiring the false discovery rate (FDR) to be lower than a cut-off. We treat each test statistic as a basic attribute, and model the detection of differentially expressed genomic features as learning a discriminant boundary in a multi-dimensional space of basic attributes. We mathematically formulated our goal as a constrained optimization problem aiming to maximize discoveries satisfying a user-defined FDR. An effective algorithm, Discriminant-Cut, has been developed to solve an instantiation of this problem. Extensive comparisons of Discriminant-Cut with 13 existing methods were carried out to demonstrate its robustness and effectiveness. We have developed a novel machine learning methodology for robust differential expression analysis, which can be a new avenue to significantly advance research on large-scale differential expression analysis.
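The constraint of keeping the FDR below a cut-off while maximizing discoveries can be made concrete with the classical single-statistic baseline. The sketch below implements the standard Benjamini-Hochberg step-up procedure; it is not the Discriminant-Cut algorithm described in the abstract, and the p-values are invented for illustration.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return the sorted indices of hypotheses rejected by the Benjamini-Hochberg procedure."""
    m = len(p_values)
    # Indices of the hypotheses ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: find the largest rank k with p_(k) <= k * fdr / m.
        if p_values[idx] <= rank * fdr / m:
            largest_k = rank
    return sorted(order[:largest_k])


# Hypothetical p-values for ten genomic features.
p_values = [0.041, 0.001, 0.216, 0.008, 0.039, 0.06, 0.074, 0.205, 0.212, 0.042]
discoveries = benjamini_hochberg(p_values, fdr=0.05)
print(discoveries)  # indices of features declared differentially expressed
```

A method that combines several test statistics, as the abstract proposes, would be compared against a baseline of this kind at the same FDR cut-off.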
Kravtsov, V.E.; Yudson, V.I.
2011-07-15
Highlights: > Statistics of normalized eigenfunctions in one-dimensional Anderson localization at E = 0 is studied. > Moments of inverse participation ratio are calculated. > Equation for generating function is derived at E = 0. > An exact solution for generating function at E = 0 is obtained. > Relation of the generating function to the phase distribution function is established. - Abstract: The one-dimensional (1d) Anderson model (AM), i.e. a tight-binding chain with random uncorrelated on-site energies, has statistical anomalies at any rational point $f = 2a/\lambda_E$, where $a$ is the lattice constant and $\lambda_E$ is the de Broglie wavelength. We develop a regular approach to anomalous statistics of normalized eigenfunctions $\psi(r)$ at such commensurability points. The approach is based on an exact integral transfer-matrix equation for a generating function $\Phi_r(u, \phi)$ ($u$ and $\phi$ have a meaning of the squared amplitude and phase of eigenfunctions, $r$ is the position of the observation point). This generating function can be used to compute local statistics of eigenfunctions of 1d AM at any disorder and to address the problem of higher-order anomalies at $f = p/q$ with $q > 2$. The descender of the generating function $P_r(\phi) \equiv \Phi_r(u=0, \phi)$ is shown to be the distribution function of phase which determines the Lyapunov exponent and the local density of states. In the leading order in the small disorder we derived a second-order partial differential equation for the r-independent ('zero-mode') component $\Phi(u, \phi)$ at the E = 0 ($f = 1/2$) anomaly. This equation is nonseparable in variables $u$ and $\phi$. Yet, we show that due to a hidden symmetry, it is integrable and we construct an exact solution for $\Phi(u, \phi)$ explicitly in quadratures. Using this solution we computed moments $I_m = N \langle |\psi|^{2m} \rangle$ ($m \geq 1$) for a chain of the length $N \to \infty$ and found an
Statistical properties of chaos demonstrated in a class of one-dimensional maps
NASA Astrophysics Data System (ADS)
Csordás, András; Györgyi, Géza; Szépfalusy, Péter; Tél, Tamás
1993-01-01
One-dimensional maps with complete grammar are investigated in both permanent and transient chaotic cases. The discussion focuses on statistical characteristics such as Lyapunov exponent, generalized entropies and dimensions, free energies, and their finite size corrections. Our approach is based on the eigenvalue problem of generalized Frobenius-Perron operators, which are treated numerically as well as by perturbative and other analytical methods. The examples include the universal chaos function relevant near the period doubling threshold. Special emphasis is put on the entropies and their decay rates because of their invariance under the most general class of coordinate changes. Phase-transition-like phenomena at the border state of chaos due to intermittency and super instability are presented.
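One of the statistical characteristics listed above, the Lyapunov exponent, can be estimated directly as the orbit average of the logarithm of the map's slope. The sketch below uses the fully chaotic logistic map x -> 4x(1 - x) as a stand-in example (an assumption, since the abstract does not name a specific map); for this map the exact value is ln 2. The initial condition and iteration counts are arbitrary.

```python
import math


def logistic_lyapunov(x0=0.1234, n_steps=100_000, burn_in=1_000):
    """Estimate the Lyapunov exponent of x -> 4x(1 - x) as the orbit average of ln|f'(x)|."""
    x = x0
    for _ in range(burn_in):  # discard the initial transient
        x = 4.0 * x * (1.0 - x)
    total = 0.0
    counted = 0
    for _ in range(n_steps):
        slope = abs(4.0 * (1.0 - 2.0 * x))  # |f'(x)| for f(x) = 4x(1 - x)
        if slope > 0.0:  # skip the measure-zero point x = 0.5 where ln diverges
            total += math.log(slope)
            counted += 1
        x = 4.0 * x * (1.0 - x)
    return total / counted


lam = logistic_lyapunov()
print(f"estimated Lyapunov exponent = {lam:.4f}, ln 2 = {math.log(2):.4f}")
```

The operator-based approach in the abstract obtains such quantities from eigenvalues of generalized Frobenius-Perron operators; the orbit average here is only the simplest numerical cross-check.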
Mode-resolved travel-time statistics for elastic rays in three-dimensional billiards.
Ortega, A; Stringlo, K; Gorin, T
2012-03-01
We consider the ray limit of propagating ultrasound waves in three-dimensional bodies made from a homogeneous, isotropic, elastic material. Using a Monte Carlo approach, we simulate the propagation and proliferation of elastic rays using realistic angle-dependent reflection coefficients, taking into account mode conversion and ray splitting. For a few simple geometries, we analyze the long-time equilibrium distribution, focusing on the energy ratio between compressional and shear waves. Finally, we study the travel time statistics, i.e., the distribution of the amount of time a given trajectory spends as a compressional wave, as compared to the total travel time. These results are intimately related to recent elastodynamics experiments on Coda-wave interferometry by Lobkis and Weaver [Phys. Rev. E 78, 066212 (2008)].
Statistical properties of chaos demonstrated in a class of one-dimensional maps.
Csordas, Andras; Gyorgyi, Geza; Szepfalusy, Peter; Tel, Tamas
1993-01-01
One-dimensional maps with complete grammar are investigated in both permanent and transient chaotic cases. The discussion focuses on statistical characteristics such as Lyapunov exponent, generalized entropies and dimensions, free energies, and their finite size corrections. Our approach is based on the eigenvalue problem of generalized Frobenius-Perron operators, which are treated numerically as well as by perturbative and other analytical methods. The examples include the universal chaos function relevant near the period doubling threshold. Special emphasis is put on the entropies and their decay rates because of their invariance under the most general class of coordinate changes. Phase-transition-like phenomena at the border state of chaos due to intermittency and super instability are presented.
A Method to Categorize 2-Dimensional Patterns Using Statistics of Spatial Organization.
López-Sauceda, Juan; Rueda-Contreras, Mara D
2017-01-01
We developed a measurement framework of spatial organization to categorize 2-dimensional patterns from 2 multiscalar biological architectures. We propose that underlying shapes of biological entities can be approached using the statistical concept of degrees of freedom, defining it through expansion of area variability in a pattern. To help scope this suggestion, we developed a mathematical argument recognizing the deep foundations of area variability in a polygonal pattern (spatial heterogeneity). This measure uses a parameter called eutacticity. Our measuring platform of spatial heterogeneity can assign particular ranges of distribution of spatial areas for 2 biological architectures: ecological patterns of Namibia fairy circles and epithelial sheets. The spatial organizations of our 2 analyzed biological architectures are demarcated by being in a particular position among spatial order and disorder. We suggest that this theoretical platform can give us some insights about the nature of shapes in biological systems to understand organizational constraints.
NASA Astrophysics Data System (ADS)
Villeta, M.; Sanz-Lobera, A.; González, C.; Sebastián, M. A.
2009-11-01
The implementation of Statistical Process Control (SPC) requires the use of measurement systems. The inherent variability of these systems influences the reliability of the measurement results obtained and, as a consequence, the SPC results. This paper investigates the influence of measurement uncertainty on the analysis of process capability. It seeks to reduce the effect of measurement uncertainty in order to approach the capability that the productive process really has. Both processes centered at a nominal value and off-center processes are considered, and a criterion is proposed that allows validating the adequacy of the dimensional measurement systems used in an SPC implementation.
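The effect of measurement uncertainty on capability indices can be sketched with the standard definitions Cp = (USL - LSL) / (6 sigma) and Cpk = min(USL - mean, mean - LSL) / (3 sigma). The example assumes that measurement error is independent of the process variation, so the observed variance is the sum of the process and measurement variances; this is a textbook simplification, not the criterion proposed in the paper, and all numerical values are hypothetical.

```python
import math


def capability(mean, sigma, lsl, usl):
    """Return the process capability indices (Cp, Cpk) for given specification limits."""
    cp = (usl - lsl) / (6.0 * sigma)
    cpk = min(usl - mean, mean - lsl) / (3.0 * sigma)
    return cp, cpk


def process_sigma(sigma_observed, sigma_measurement):
    """Remove measurement variance from observed variance (assumes independent errors)."""
    variance = sigma_observed**2 - sigma_measurement**2
    if variance <= 0.0:
        raise ValueError("measurement uncertainty exceeds the observed spread")
    return math.sqrt(variance)


# Hypothetical off-center process: specification 8.8 to 11.2, observed mean 10.2.
lsl, usl, mean_obs = 8.8, 11.2, 10.2
sigma_obs, sigma_meas = 0.5, 0.3

cp_obs, cpk_obs = capability(mean_obs, sigma_obs, lsl, usl)
sigma_proc = process_sigma(sigma_obs, sigma_meas)
cp_true, cpk_true = capability(mean_obs, sigma_proc, lsl, usl)
print(f"observed:  Cp = {cp_obs:.3f}, Cpk = {cpk_obs:.3f}")
print(f"corrected: Cp = {cp_true:.3f}, Cpk = {cpk_true:.3f}")
```

Under these assumptions the observed indices understate the capability the process really has, which is the bias the paper sets out to reduce.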
A Method to Categorize 2-Dimensional Patterns Using Statistics of Spatial Organization
López-Sauceda, Juan; Rueda-Contreras, Mara D
2017-01-01
We developed a measurement framework of spatial organization to categorize 2-dimensional patterns from 2 multiscalar biological architectures. We propose that underlying shapes of biological entities can be approached using the statistical concept of degrees of freedom, defining it through expansion of area variability in a pattern. To help scope this suggestion, we developed a mathematical argument recognizing the deep foundations of area variability in a polygonal pattern (spatial heterogeneity). This measure uses a parameter called eutacticity. Our measuring platform of spatial heterogeneity can assign particular ranges of distribution of spatial areas for 2 biological architectures: ecological patterns of Namibia fairy circles and epithelial sheets. The spatial organizations of our 2 analyzed biological architectures are demarcated by being in a particular position among spatial order and disorder. We suggest that this theoretical platform can give us some insights about the nature of shapes in biological systems to understand organizational constraints. PMID:28469379
Statistical mechanics of two-dimensional foams: Physical foundations of the model.
Durand, Marc
2015-12-01
In a recent series of papers, a statistical model that accounts for correlations between topological and geometrical properties of a two-dimensional shuffled foam has been proposed and compared with experimental and numerical data. Here, the various assumptions on which the model is based are exposed and justified: the equiprobability hypothesis of the foam configurations is argued. The range of correlations between bubbles is discussed, and the mean-field approximation that is used in the model is detailed. The two self-consistency equations associated with this mean-field description can be interpreted as the conservation laws of number of sides and bubble curvature, respectively. Finally, the use of a "Grand-Canonical" description, in which the foam constitutes a reservoir of sides and curvature, is justified.
Freely Evolving Process and Statistics in the Two-Dimensional Granular Turbulence
NASA Astrophysics Data System (ADS)
Isobe, Masaharu
2002-08-01
We studied the macroscopic statistical properties of the freely evolving quasi-inelastic hard disk (granular) system by systematically performing large-scale (more than a million particles) event-driven molecular dynamics and found that its evolution is remarkably analogous to an enstrophy cascade process in decaying two-dimensional fluid turbulence. There are four typical stages in the freely evolving inelastic hard disk system: homogeneous, shearing (vortex), clustering, and final state. In the shearing stage, the self-organized macroscopic coherent vortices become dominant and the enstrophy decays with power-law behavior. In the clustering stage, the energy spectra are close to the expectation of Kraichnan-Batchelor theory and the squared two-particle separation strictly obeys the Richardson law. These results indicate that the cooperative behavior of the quasi-inelastic hard disk system belongs to the same universality class as macroscopic Navier-Stokes fluid turbulence in the study of dissipative structure.
Collisional statistics and dynamics of two-dimensional hard-disk systems: From fluid to solid.
Taloni, Alessandro; Meroz, Yasmine; Huerta, Adrián
2015-08-01
We perform extensive MD simulations of two-dimensional systems of hard disks, focusing on the collisional statistical properties. We analyze the distribution functions of velocity, free flight time, and free path length for packing fractions ranging from the fluid to the solid phase. The behaviors of the mean free flight time and path length between subsequent collisions are found to drastically change in the coexistence phase. We show that single-particle dynamical properties behave analogously in collisional and continuous-time representations, exhibiting apparent crossovers between the fluid and the solid phases. We find that, both in collisional and continuous-time representation, the mean-squared displacement, velocity autocorrelation functions, intermediate scattering functions, and self-part of the van Hove function (propagator) closely reproduce the same behavior exhibited by the corresponding quantities in granular media, colloids, and supercooled liquids close to the glass or jamming transition.
ERIC Educational Resources Information Center
Smith, A. Delany; Henson, Robin K.
This paper addresses the state of the art regarding the use of statistical significance tests (SSTs). How social science research will be conducted in the future is impacted directly by current debates regarding hypothesis testing. This paper: (1) briefly explicates the current debate on hypothesis testing; (2) reviews the newly published report…
Statistical conservation law in two- and three-dimensional turbulent flows.
Frishman, Anna; Boffetta, Guido; De Lillo, Filippo; Liberzon, Alex
2015-03-01
Particles in turbulence live complicated lives. It is nonetheless sometimes possible to find order in this complexity. It was proposed in Falkovich et al. [Phys. Rev. Lett. 110, 214502 (2013)] that pairs of Lagrangian tracers at small scales, in an incompressible isotropic turbulent flow, have a statistical conservation law. More specifically, in a d-dimensional flow the distance R(t) between two neutrally buoyant particles, raised to the power -d and averaged over velocity realizations, remains at all times equal to the initial, fixed, separation raised to the same power. In this work we present evidence from direct numerical simulations of two- and three-dimensional turbulence for this conservation. In both cases the conservation is lost when particles exit the linear flow regime. In two dimensions we show that, as an extension of the conservation law, an Evans-Cohen-Morriss or Gallavotti-Cohen type fluctuation relation exists. We also analyze data from a 3D laboratory experiment [Liberzon et al., Physica D 241, 208 (2012)], finding that although it probes small scales they are not in the smooth regime. Thus instead of $\langle R^{-3} \rangle$, we look for a similar, power-law-in-separation conservation law. We show that the existence of an initially slowly varying function of this form can be predicted but that it does not turn into a conservation law. We suggest that the conservation of $\langle R^{-d} \rangle$, demonstrated here, can be used as a check of isotropy, incompressibility, and flow dimensionality in numerical and laboratory experiments that focus on small scales.
Statistical conservation law in two- and three-dimensional turbulent flows
NASA Astrophysics Data System (ADS)
Frishman, Anna; Boffetta, Guido; De Lillo, Filippo; Liberzon, Alex
2015-03-01
Particles in turbulence live complicated lives. It is nonetheless sometimes possible to find order in this complexity. It was proposed in Falkovich et al. [Phys. Rev. Lett. 110, 214502 (2013), 10.1103/PhysRevLett.110.214502] that pairs of Lagrangian tracers at small scales, in an incompressible isotropic turbulent flow, have a statistical conservation law. More specifically, in a d-dimensional flow the distance R(t) between two neutrally buoyant particles, raised to the power -d and averaged over velocity realizations, remains at all times equal to the initial, fixed, separation raised to the same power. In this work we present evidence from direct numerical simulations of two- and three-dimensional turbulence for this conservation. In both cases the conservation is lost when particles exit the linear flow regime. In two dimensions we show that, as an extension of the conservation law, an Evans-Cohen-Morriss or Gallavotti-Cohen type fluctuation relation exists. We also analyze data from a 3D laboratory experiment [Liberzon et al., Physica D 241, 208 (2012), 10.1016/j.physd.2011.07.008], finding that although it probes small scales they are not in the smooth regime. Thus instead of
Experiments with a three-dimensional statistical objective analysis scheme using FGGE data
NASA Technical Reports Server (NTRS)
Baker, Wayman E.; Bloom, Stephen C.; Woollen, John S.; Nestler, Mark S.; Brin, Eugenia
1987-01-01
A three-dimensional (3D), multivariate, statistical objective analysis scheme (referred to as optimum interpolation or OI) has been developed for use in numerical weather prediction studies with the FGGE data. Some novel aspects of the present scheme include: (1) a multivariate surface analysis over the oceans, which employs an Ekman balance instead of the usual geostrophic relationship, to model the pressure-wind error cross correlations, and (2) the capability to use an error correlation function which is geographically dependent. A series of 4-day data assimilation experiments are conducted to examine the importance of some of the key features of the OI in terms of their effects on forecast skill, as well as to compare the forecast skill using the OI with that utilizing a successive correction method (SCM) of analysis developed earlier. For the three cases examined, the forecast skill is found to be rather insensitive to varying the error correlation function geographically. However, significant differences are noted between forecasts from a two-dimensional (2D) version of the OI and those from the 3D OI, with the 3D OI forecasts exhibiting better forecast skill. The 3D OI forecasts are also more accurate than those from the SCM initial conditions. The 3D OI with the multivariate oceanic surface analysis was found to produce forecasts which were slightly more accurate, on the average, than a univariate version.
Saeki, Hiroyuki; Tango, Toshiro; Wang, Jinfang
2017-01-01
In clinical investigations of diagnostic procedures to indicate noninferiority, efficacy is generally evaluated on the basis of results from independent multiple raters. For each subject, if two diagnostic procedures are performed and some units are evaluated, the difference in proportions for matched-pair data is correlated between the two diagnostic procedures and within the subject, i.e. the data are clustered. In this article, we propose a noninferiority test to infer the difference in the correlated proportions of clustered data between the two diagnostic procedures. The proposed noninferiority test was validated in a Monte Carlo simulation study. Empirical sizes of the noninferiority test were close to the nominal level. The proposed test is illustrated on data of aneurysm diagnostic procedures for patients with acute subarachnoid hemorrhage.
Links to sources of cancer-related statistics, including the Surveillance, Epidemiology and End Results (SEER) Program, SEER-Medicare datasets, cancer survivor prevalence data, and the Cancer Trends Progress Report.
Aggelopoulos, Nikolaos C
2015-08-01
Perceptual inference refers to the ability to infer sensory stimuli from predictions that result from internal neural representations built through prior experience. Methods of Bayesian statistical inference and decision theory model cognition adequately by using error sensing either in guiding action or in "generative" models that predict the sensory information. In this framework, perception can be seen as a process qualitatively distinct from sensation, a process of information evaluation using previously acquired and stored representations (memories) that is guided by sensory feedback. The stored representations can be utilised as internal models of sensory stimuli enabling long term associations, for example in operant conditioning. Evidence for perceptual inference is contributed by such phenomena as the cortical co-localisation of object perception with object memory, the response invariance in the responses of some neurons to variations in the stimulus, as well as from situations in which perception can be dissociated from sensation. In the context of perceptual inference, sensory areas of the cerebral cortex that have been facilitated by a priming signal may be regarded as comparators in a closed feedback loop, similar to the better known motor reflexes in the sensorimotor system. The adult cerebral cortex can be regarded as similar to a servomechanism, in using sensory feedback to correct internal models, producing predictions of the outside world on the basis of past experience.
Current Sheet Statistics in Three-Dimensional Simulations of Coronal Heating
NASA Astrophysics Data System (ADS)
Lin, L.; Ng, C. S.; Bhattacharjee, A.
2013-04-01
In a recent numerical study [Ng et al., Astrophys. J. 747, 109, 2012], with a three-dimensional model of coronal heating using reduced magnetohydrodynamics (RMHD), we have obtained scaling results of heating rate versus Lundquist number based on a series of runs in which random photospheric motions are imposed for hundreds to thousands of Alfvén times in order to obtain converged statistical values. The heating rate found in these simulations saturates to a level that is independent of the Lundquist number. This scaling result was also supported by an analysis with the assumption of the Sweet-Parker scaling of the current sheets, as well as how the width, length and number of current sheets scale with Lundquist number. In order to test these assumptions, we have implemented an automated routine to analyze thousands of current sheets in these simulations and return statistical scalings for these quantities. It is found that the Sweet-Parker scaling is justified. However, some discrepancies are also found and require further study.
NASA Astrophysics Data System (ADS)
Evans, J.; Nord, R. S.
1987-04-01
We consider the angular distribution of the diffracted intensity, I(q), for systems of disordered one-dimensional double-spaced islands with domain boundaries. Although I(q) is directly determined by the spatial pair correlations, it is often naturally reexpressed in terms of island size and separation distributions. We analyze the effect on I(q) of various approximate specifications of the island statistics. In particular, we highlight the approximations implicit in Guinier (-type) formulations, and provide a new very accurate approximation. Motivated by the scarcity of analysis for kinetically limited island growth (often seen in chemisorption), we consider such an irreversible cooperative filling model for which exact results are available for the (highly nontrivial) island statistics [and thus for I(q)]. We find that the (exact) integral-order beam intensity effectively disappears at saturation (where neighboring islands are out of phase) due to a propensity for cancellation of (a sum over) spatial pair correlations. This feature is missing not only in island-size-broadening-model (ISBM) calculations (which neglect the interisland interface), but also in Guinier formations. There is also significant interference at the half-order beams, and again Guinier formulations are inaccurate. Determination of average island size from these beam widths, via the usual ISBM algorithm, results in underestimation by a factor increasing to ~3 as the coverage increases to saturation.
Extreme value statistics for two-dimensional convective penetration in a pre-main sequence star
NASA Astrophysics Data System (ADS)
Pratt, J.; Baraffe, I.; Goffrey, T.; Constantino, T.; Viallet, M.; Popov, M. V.; Walder, R.; Folini, D.
2017-08-01
Context. In the interior of stars, a convectively unstable zone typically borders a zone that is stable to convection. Convective motions can penetrate the boundary between these zones, creating a layer characterized by intermittent convective mixing, and gradual erosion of the density and temperature stratification. Aims: We examine a penetration layer formed between a central radiative zone and a large convection zone in the deep interior of a young low-mass star. Using the Multidimensional Stellar Implicit Code (MUSIC) to simulate two-dimensional compressible stellar convection in a spherical geometry over long times, we produce statistics that characterize the extent and impact of convective penetration in this layer. Methods: We apply extreme value theory to the maximal extent of convective penetration at any time. We compare statistical results from simulations which treat non-local convection, throughout a large portion of the stellar radius, with simulations designed to treat local convection in a small region surrounding the penetration layer. For each of these situations, we compare simulations of different resolution, which have different velocity magnitudes. We also compare statistical results between simulations that radiate energy at a constant rate to those that allow energy to radiate from the stellar surface according to the local surface temperature. Results: Based on the frequency and depth of penetrating convective structures, we observe two distinct layers that form between the convection zone and the stable radiative zone. We show that the probability density function of the maximal depth of convective penetration at any time corresponds closely in space with the radial position where internal waves are excited. We find that the maximal penetration depth can be modeled by a Weibull distribution with a small shape parameter. Using these results, and building on established scalings for diffusion enhanced by large-scale convective motions, we
Chapman, Benjamin P; Weiss, Alexander; Duberstein, Paul R
2016-12-01
Statistical learning theory (SLT) is the statistical formulation of machine learning theory, a body of analytic methods common in "big data" problems. Regression-based SLT algorithms seek to maximize predictive accuracy for some outcome, given a large pool of potential predictors, without overfitting the sample. Research goals in psychology may sometimes call for high dimensional regression. One example is criterion-keyed scale construction, where a scale with maximal predictive validity must be built from a large item pool. Using this as a working example, we first introduce a core principle of SLT methods: minimization of expected prediction error (EPE). Minimizing EPE is fundamentally different than maximizing the within-sample likelihood, and hinges on building a predictive model of sufficient complexity to predict the outcome well, without undue complexity leading to overfitting. We describe how such models are built and refined via cross-validation. We then illustrate how 3 common SLT algorithms (supervised principal components, regularization, and boosting) can be used to construct a criterion-keyed scale predicting all-cause mortality, using a large personality item pool within a population cohort. Each algorithm illustrates a different approach to minimizing EPE. Finally, we consider broader applications of SLT predictive algorithms, both as supportive analytic tools for conventional methods, and as primary analytic tools in discovery phase research. We conclude that despite their differences from the classic null-hypothesis testing approach, or perhaps because of them, SLT methods may hold value as a statistically rigorous approach to exploratory regression.
NASA Astrophysics Data System (ADS)
Chavanis, Pierre-Henri
2014-04-01
We complement the literature on the statistical mechanics of point vortices in two-dimensional hydrodynamics. Using a maximum entropy principle, we determine the multi-species Boltzmann-Poisson equation and establish a form of Virial theorem. Using a maximum entropy production principle (MEPP), we derive a set of relaxation equations towards statistical equilibrium. These relaxation equations can be used as a numerical algorithm to compute the maximum entropy state. We mention the analogies with the Fokker-Planck equations derived by Debye and Hückel for electrolytes. We then consider the limit of strong mixing (or low energy). To leading order, the relationship between the vorticity and the stream function at equilibrium is linear and the maximization of the entropy becomes equivalent to the minimization of the enstrophy. This expansion is similar to the Debye-Hückel approximation for electrolytes, except that the temperature is negative instead of positive so that the effective interaction between like-sign vortices is attractive instead of repulsive. This leads to an organization at large scales presenting geometry-induced phase transitions, instead of Debye shielding. We compare the results obtained with point vortices to those obtained in the context of the statistical mechanics of continuous vorticity fields described by the Miller-Robert-Sommeria (MRS) theory. At linear order, we get the same results but differences appear at the next order. In particular, the MRS theory predicts a transition between sinh and tanh-like ω - ψ relationships depending on the sign of Ku - 3 (where Ku is the Kurtosis) while there is no such transition for point vortices which always show a sinh-like ω - ψ relationship. We derive the form of the relaxation equations in the strong mixing limit and show that the enstrophy plays the role of a Lyapunov functional.
Yoshimatsu, Katsunori; Kawahara, Yasuhiro; Schneider, Kai; Okamoto, Naoya; Farge, Marie
2011-09-15
Scale-dependent and geometrical statistics of three-dimensional incompressible homogeneous magnetohydrodynamic turbulence without mean magnetic field are examined by means of the orthogonal wavelet decomposition. The flow is computed by direct numerical simulation with a Fourier spectral method at resolution 512^3 and a unit magnetic Prandtl number. Scale-dependent second and higher order statistics of the velocity and magnetic fields allow us to quantify their intermittency in terms of spatial fluctuations of the energy spectra, the flatness, and the probability distribution functions at different scales. Different scale-dependent relative helicities, e.g., kinetic, cross, and magnetic relative helicities, yield geometrical information on alignment between the different scale-dependent fields. At each scale, the alignment between the velocity and magnetic field is found to be more pronounced than the other alignments considered here, i.e., the scale-dependent alignment between the velocity and vorticity, the scale-dependent alignment between the magnetic field and its vector potential, and the scale-dependent alignment between the magnetic field and the current density. Finally, statistical scale-dependent analyses of both Eulerian and Lagrangian accelerations and the corresponding time-derivatives of the magnetic field are performed. It is found that the Lagrangian acceleration does not exhibit substantially stronger intermittency compared to the Eulerian acceleration, in contrast to hydrodynamic turbulence where the Lagrangian acceleration shows much stronger intermittency than the Eulerian acceleration. The Eulerian time-derivative of the magnetic field is more intermittent than the Lagrangian time-derivative of the magnetic field.
Chapman, Benjamin P.; Weiss, Alexander; Duberstein, Paul
2016-01-01
Statistical learning theory (SLT) is the statistical formulation of machine learning theory, a body of analytic methods common in “big data” problems. Regression-based SLT algorithms seek to maximize predictive accuracy for some outcome, given a large pool of potential predictors, without overfitting the sample. Research goals in psychology may sometimes call for high dimensional regression. One example is criterion-keyed scale construction, where a scale with maximal predictive validity must be built from a large item pool. Using this as a working example, we first introduce a core principle of SLT methods: minimization of expected prediction error (EPE). Minimizing EPE is fundamentally different than maximizing the within-sample likelihood, and hinges on building a predictive model of sufficient complexity to predict the outcome well, without undue complexity leading to overfitting. We describe how such models are built and refined via cross-validation. We then illustrate how three common SLT algorithms (Supervised Principal Components, Regularization, and Boosting) can be used to construct a criterion-keyed scale predicting all-cause mortality, using a large personality item pool within a population cohort. Each algorithm illustrates a different approach to minimizing EPE. Finally, we consider broader applications of SLT predictive algorithms, both as supportive analytic tools for conventional methods, and as primary analytic tools in discovery phase research. We conclude that despite their differences from the classic null-hypothesis testing approach, or perhaps because of them, SLT methods may hold value as a statistically rigorous approach to exploratory regression. PMID: 27454257
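The idea of choosing model complexity by cross-validated prediction error, rather than by within-sample fit, can be illustrated with a minimal sketch. This is not the authors' code: it uses ridge regression as one example of regularization, synthetic data, and hypothetical helper names (`solve`, `ridge_fit`, `cv_error`, `select_lambda`), all of which are assumptions made for illustration.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ridge_fit(X, y, lam):
    """Ridge coefficients from (X^T X + lam I) w = X^T y (no intercept)."""
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(A, b)

def cv_error(X, y, lam, k=5):
    """Mean squared error on held-out folds, a k-fold estimate of EPE."""
    n = len(y)
    total = 0.0
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        w = ridge_fit([X[i] for i in train], [y[i] for i in train], lam)
        for i in test:
            pred = sum(wj * xj for wj, xj in zip(w, X[i]))
            total += (y[i] - pred) ** 2
    return total / n

def select_lambda(X, y, grid, k=5):
    """Return the penalty with the lowest cross-validated error."""
    errors = {lam: cv_error(X, y, lam, k) for lam in grid}
    return min(errors, key=errors.get), errors

# Synthetic "item pool": 20 candidate predictors, only 2 carry signal.
rng = random.Random(0)
n, p = 60, 20
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
true_w = [1.0, -1.0] + [0.0] * (p - 2)
y = [sum(w * x for w, x in zip(true_w, row)) + rng.gauss(0, 1) for row in X]
best_lam, errors = select_lambda(X, y, [0.01, 0.1, 1.0, 10.0, 100.0])
print(best_lam, errors[best_lam])
```

The penalty is picked on data the model did not see during fitting, which is what distinguishes minimizing estimated EPE from maximizing within-sample likelihood. In practice an established library would normally replace the hand-written solver.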
Statistics of extreme waves in the framework of one-dimensional Nonlinear Schrodinger Equation
NASA Astrophysics Data System (ADS)
Agafontsev, Dmitry; Zakharov, Vladimir
2013-04-01
We examine the statistics of extreme waves for the one-dimensional classical focusing Nonlinear Schrodinger (NLS) equation, iΨ_t + Ψ_xx + |Ψ|^2 Ψ = 0, (1) as well as the influence of the first nonlinear term beyond Eq. (1) - the six-wave interactions - on the statistics of waves in the framework of a generalized NLS equation accounting for six-wave interactions, damping (linear dissipation, two- and three-photon absorption) and pumping terms. We solve these equations numerically in a box with periodic boundary conditions starting from the initial data Ψ|_{t=0} = F(x) + ?(x), where F(x) is an exact modulationally unstable solution of Eq. (1) seeded by stochastic noise ?(x) with fixed statistical properties. We examine two types of initial conditions F(x): (a) condensate state F(x) = 1 for Eq. (1)-(2) and (b) cnoidal wave for Eq. (1). The development of modulation instability in Eq. (1)-(2) leads to formation of one-dimensional wave turbulence. In the integrable case the turbulence is called integrable and relaxes to one of infinite possible stationary states. Addition of the six-wave interactions term leads to the appearance of collapses that eventually are regularized by the damping terms. The energy lost during regularization of collapses in (2) is restored by the pumping term. In the latter case the system does not demonstrate relaxation-like behavior. We measure the evolution of spectra I_k = <|Ψ_k|^2>, spatial correlation functions and the PDFs for wave amplitudes |Ψ|, paying special attention to the formation of "fat tails" on the PDFs. For the classical integrable NLS equation (1) with condensate initial condition we observe Rayleigh tails for extremely large waves and a "breathing region" for middle waves with oscillations of the frequency of waves appearance with time, while nonintegrable NLS equation with damping and pumping terms (2) with the absence of six-wave interactions α = 0 demonstrates perfectly Rayleigh PDFs without any oscillations with
A Study of the Statistical Inference Criteria: Can We Agree on When to Use Z versus "t"?
ERIC Educational Resources Information Center
Ozgur, Ceyhun; Strasser, Sandra E.
2004-01-01
Authors who write introductory business statistics texts do not agree on when to use a t distribution and when to use a Z distribution in both the construction of confidence intervals and the use of hypothesis testing. In a survey of textbooks written in the last 15 years, we found the decision rules to be contradictory and, at times, the…
ERIC Educational Resources Information Center
Schochet, Peter Z.
2015-01-01
This report presents the statistical theory underlying the "RCT-YES" software that estimates and reports impacts for RCTs for a wide range of designs used in social policy research. The report discusses a unified, non-parametric design-based approach for impact estimation using the building blocks of the Neyman-Rubin-Holland causal…
Wu, Wei; Mast, Thomas G; Ziembko, Christopher; Breza, Joseph M; Contreras, Robert J
2013-01-01
We analyzed the spike discharge patterns of two types of neurons in the rodent peripheral gustatory system, Na specialists (NS) and acid generalists (AG), in response to lingual stimulation with NaCl, acetic acid, and mixtures of the two stimuli. Previous computational investigations found that both spike rate and spike timing contribute to taste quality coding. These studies used commonly accepted computational methods, but they do not provide a consistent statistical evaluation of spike trains. In this paper, we adopted a new computational framework that treated each spike train as an individual data point for computing summary statistics such as mean and variance in the spike train space. We found that these statistical summaries properly characterized the firing patterns (e.g., template and variability) and quantified the differences between NS and AG neurons. The same framework was also used to assess the discrimination performance of NS and AG neurons and to remove spontaneous background activity or "noise" from the spike train responses. The results indicated that the new metric system provided the desired decoding performance and that noise removal improved stimulus classification accuracy, especially for neurons with high spontaneous rates. In summary, this new method naturally conducts statistical analysis and neural decoding under one consistent framework, and the results demonstrated that individual peripheral-gustatory neurons generate a unique and reliable firing pattern during sensory stimulation and that this pattern can be reliably decoded.
ERIC Educational Resources Information Center
Davis, Philip M.; Solla, Leah R.
2003-01-01
Reports an analysis of American Chemical Society electronic journal downloads at Cornell University (Ithaca, New York) by individual IP (Internet Protocol) addresses. Highlights include usage statistics to evaluate library journal subscriptions; understanding scientists' reading behavior; individual use of articles and of journals; and the…
NASA Astrophysics Data System (ADS)
Hasan, Asad; Maloney, Craig
2013-03-01
We compute the effective dispersion and density of states (DOS) of two-dimensional sub-regions of three dimensional face centered cubic (FCC) crystals with both a direct projection-inversion technique and a Monte Carlo simulation based on a common Hamiltonian. We study sub-regions of both (111) and (100) planes. For any direction of wavevector, we show an anomalous ω² ~ q scaling regime at low q where ω² is the energy associated with a mode of wavenumber q. This scaling should give rise to an anomalous DOS, Dω, at low ω: Dω ~ ω³ rather than the conventional Debye result: Dω ~ ω². The DOS for the (100) sub-region looks to be consistent with Dω ~ ω³, while the (111) shows something closer to the Debye result at the smallest frequencies. Our Monte Carlo simulation shows that finite sampling artifacts act as an effective disorder and bias the Dω in the same way as the finite size artifacts, giving a behavior closer to Dω ~ ω² than Dω ~ ω³. These results should have an important impact on interpretation of recent studies of colloidal solids where two-point displacement correlations can be obtained in real-space via microscopy.
Larmat, Carene; Maceira, Monica; Higdon, David M.; ...
2017-08-29
Seismic inversions produce seismic models, which are 3-dimensional (3D) images of wave velocity of the entire planet retrieved by fitting seismic measurements made on records of past earthquakes or other seismic events. Computing power of the TeraFlop era, along with the dataflow from new, very dense, seismic arrays, has led to a new generation of 3D seismic Earth models with an unprecedented level of resolution. Here we compare two recent models of western United States from the Dynamic North America (DNA) seismic imaging effort. The two models only differ in the wave propagation that was used for their inversion: one is based on ray theory (RT), and the other on finite frequency (FF). We evaluate the two models using an independent numerical method and statistical tests. We show that they differ in how they produce seismic signals from a subset of earthquakes that were used in the original inversion and were recorded on the US array. This is especially true for measurements done in the Yellowstone area which has a large negative seismic anomaly. This result is of importance for seismologists who have been debating on the practical benefit of using FF in ill-posed Earth inversions. Model evaluation, such as the one reported here, represents an opportunity for collaboration between geophysical and statistical communities. Finally, more opportunities should arise with the upcoming Exascale era, which will provide enough computational power to explore together several sources of errors in models with thousands of parameters, opening the way of uncertainty quantification of seismic models.
NASA Astrophysics Data System (ADS)
Das Sarma, S.; Nag, Amit; Sau, Jay D.
2016-07-01
We consider a simple conceptual question with respect to Majorana zero modes in semiconductor nanowires: can the measured nonideal values of the zero-bias-conductance-peak in the tunneling experiments be used as a characteristic to predict the underlying topological nature of the proximity induced nanowire superconductivity? In particular, we define and calculate the topological visibility, which is a variation of the topological invariant associated with the scattering matrix of the system as well as the zero-bias-conductance-peak heights in the tunneling measurements, in the presence of dissipative broadening, using precisely the same realistic nanowire parameters to connect the topological invariants with the zero-bias tunneling conductance values. This dissipative broadening is present in both (the existing) tunneling measurements and also (any future) braiding experiments as an inevitable consequence of a finite braiding time. The connection between the topological visibility and the conductance allows us to obtain the visibility of realistic braiding experiments in nanowires, and to conclude that the current experimentally accessible systems with nonideal zero-bias conductance peaks may indeed manifest (with rather low visibility) non-Abelian statistics for the Majorana zero modes. In general, we find that a large (small) superconducting gap (Majorana peak splitting) is essential for the manifestation of the non-Abelian braiding statistics, and in particular, a zero-bias conductance value of around half the ideal quantized Majorana value should be sufficient for the manifestation of non-Abelian statistics in experimental nanowires. Our work also establishes that as a matter of principle the topological transition associated with the emergence of Majorana zero modes in finite nanowires is always a crossover (akin to a quantum phase transition at finite temperature) requiring the presence of dissipative broadening (which must be larger than the Majorana energy
NASA Astrophysics Data System (ADS)
Shen, Samuel S. P.; Wied, Olaf; Weithmann, Alexander; Regele, Tobias; Bailey, Barbara A.; Lawrimore, Jay H.
2016-07-01
This paper describes six different temporal climate regimes of the contiguous United States (CONUS) according to interdecadal variations of surface air temperature (SAT) and precipitation using the United States Historical Climatology Network (USHCN) monthly data (Tmax, Tmin, Tmean, and precipitation) from 1895 to 2010. Our analysis is based on the probability distribution, mean, standard deviation, skewness, kurtosis, Kolmogorov-Smirnov (KS) test, and Welch's t test. The relevant statistical parameters are computed from gridded monthly SAT and precipitation data. SAT variations lead to classification of four regimes: 1895-1930 (cool), 1931-1960 (warm), 1961-1985 (cool), and 1986-2010 (warm), while precipitation variations lead to a classification of two regimes: 1895-1975 (dry) and 1976-2010 (wet). The KS test shows that any two regimes of the above six are statistically significantly different from each other due to clear shifts of the probability density functions. Extremes of SAT and precipitation identify the ten hottest, coldest, driest, and wettest years. Welch's t test is used to discern significant differences among these extremes. The spatial patterns of the six climate regimes and some years of extreme climate are analyzed. Although the recent two decades are the warmest among the other decades since 1895 and many hottest years measured by CONUS Tmin and Tmean are in these two decades, the hottest year according to the CONUS Tmax anomalies is 1934 (1.37 °C), which is very close to the second Tmax hottest year 2006 (1.35 °C).
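The two tests named above can be sketched in a few lines. This is a minimal illustration and not the paper's analysis: the function names `welch_t` and `ks_statistic` and the anomaly values are made up for the example, and only the test statistics are computed (p-values would need the t and Kolmogorov distributions, which SciPy's `scipy.stats.ttest_ind(..., equal_var=False)` and `scipy.stats.ks_2samp` provide).

```python
import bisect
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances
    se2 = va / na + vb / nb  # squared standard error of the mean difference
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:  # the supremum is attained at an observed value
        fa = bisect.bisect_right(a, x) / len(a)
        fb = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d

# Hypothetical temperature anomalies (deg C) for two regimes, invented here.
cool_regime = [-0.42, -0.10, -0.35, 0.05, -0.28, -0.51, -0.19]
warm_regime = [0.31, 0.12, 0.48, 0.27, 0.55, 0.09]

t, df = welch_t(cool_regime, warm_regime)
d = ks_statistic(cool_regime, warm_regime)
print(f"Welch t = {t:.3f}, df = {df:.1f}, KS D = {d:.3f}")
```

Welch's form is the natural choice when the two regimes have different lengths and variances, since it does not pool the variances; the KS statistic compares the whole distributions rather than only their means.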
A one-dimensional statistical mechanics model for nucleosome positioning on genomic DNA
NASA Astrophysics Data System (ADS)
Tesoro, S.; Ali, I.; Morozov, A. N.; Sulaiman, N.; Marenduzzo, D.
2016-02-01
The first level of folding of DNA in eukaryotes is provided by the so-called ‘10 nm chromatin fibre’, where DNA wraps around histone proteins (∼10 nm in size) to form nucleosomes, which go on to create a zig-zagging bead-on-a-string structure. In this work we present a one-dimensional statistical mechanics model to study nucleosome positioning within one such 10 nm fibre. We focus on the case of genomic sheep DNA, and we start from effective potentials valid at infinite dilution and determined from high-resolution in vitro salt dialysis experiments. We study positioning within a polynucleosome chain, and compare the results for genomic DNA to those obtained in the simplest case of homogeneous DNA, where the problem can be mapped to a Tonks gas [1]. First, we consider the simple, analytically solvable, case where nucleosomes are assumed to be point-like. Then, we perform numerical simulations to gauge the effect of their finite size on the nucleosomal distribution probabilities. Finally we compare nucleosome distributions and simulated nuclease digestion patterns for the two cases (homogeneous and sheep DNA), thereby providing testable predictions of the effect of sequence on experimentally observable quantities in experiments on polynucleosome chromatin fibres reconstituted in vitro.
Air entrainment and bubble statistics in three-dimensional breaking waves
NASA Astrophysics Data System (ADS)
Deike, Luc; Melville, W. K.; Popinet, Stephane
2015-11-01
Wave breaking in the ocean is of fundamental importance in order to quantify wave dissipation and air-sea interaction, including gas and momentum exchange, and to improve parametrizations for weather and climate models. Here, we investigate air entrainment and bubble statistics in three-dimensional breaking waves through direct numerical simulations of the two-phase air-water flow using the Open Source solver Gerris. As in previous 2D simulations, the dissipation due to breaking is found to be in good agreement with previous experimental observations and inertial-scaling arguments. For radii larger than the Hinze scale, the bubble size distribution is found to follow a power law of the radius, r^(-3), and to scale linearly with the time dependent turbulent dissipation rate during the active breaking stages. The time-averaged bubble size distribution is found to follow the same power law of the radius and to scale linearly with the wave dissipation rate per unit length of breaking crest. We propose a phenomenological turbulent bubble break-up model that describes the numerical results and existing experimental results.
A three-dimensional statistical approach to improved image quality for multislice helical CT
Thibault, Jean-Baptiste; Sauer, Ken D.; Bouman, Charles A.; Hsieh, Jiang
2007-11-15
Multislice helical computed tomography scanning offers the advantages of faster acquisition and wide organ coverage for routine clinical diagnostic purposes. However, image reconstruction is faced with the challenges of three-dimensional cone-beam geometry, data completeness issues, and low dosage. Of all available reconstruction methods, statistical iterative reconstruction (IR) techniques appear particularly promising since they provide the flexibility of accurate physical noise modeling and geometric system description. In this paper, we present the application of Bayesian iterative algorithms to real 3D multislice helical data to demonstrate significant image quality improvement over conventional techniques. We also introduce a novel prior distribution designed to provide flexibility in its parameters to fine-tune image quality. Specifically, enhanced image resolution and lower noise have been achieved, concurrently with the reduction of helical cone-beam artifacts, as demonstrated by phantom studies. Clinical results also illustrate the capabilities of the algorithm on real patient data. Although computational load remains a significant challenge for practical development, superior image quality combined with advancements in computing technology make IR techniques a legitimate candidate for future clinical applications.
A statistical mechanical theory for a two-dimensional model of water
NASA Astrophysics Data System (ADS)
Urbic, Tomaz; Dill, Ken A.
2010-06-01
We develop a statistical mechanical model for the thermal and volumetric properties of waterlike fluids. Each water molecule is a two-dimensional disk with three hydrogen-bonding arms. Each water interacts with neighboring waters through a van der Waals interaction and an orientation-dependent hydrogen-bonding interaction. This model, which is largely analytical, is a variant of the Truskett and Dill (TD) treatment of the "Mercedes-Benz" (MB) model. The present model gives better predictions than TD for hydrogen-bond populations in liquid water by distinguishing strong cooperative hydrogen bonds from weaker ones. We explore properties versus temperature T and pressure p. We find that the volumetric and thermal properties follow the same trends with T as real water and are in good general agreement with Monte Carlo simulations of MB water, including the density anomaly, the minimum in the isothermal compressibility, and the decreased number of hydrogen bonds for increasing temperature. The model reproduces that pressure squeezes out water's heat capacity and leads to a negative thermal expansion coefficient at low temperatures. In terms of water structuring, the variance in hydrogen-bonding angles increases with both T and p, while the variance in water density increases with T but decreases with p. Hydrogen bonding is an energy storage mechanism that leads to water's large heat capacity (for its size) and to the fragility in its cagelike structures, which are easily melted by temperature and pressure to a more van der Waals-like liquid state.
NASA Astrophysics Data System (ADS)
Kravtsov, V. E.; Yudson, V. I.
2011-07-01
The one-dimensional (1d) Anderson model (AM), i.e. a tight-binding chain with random uncorrelated on-site energies, has statistical anomalies at any rational point f = 2a/λ_E, where a is the lattice constant and λ_E is the de Broglie wavelength. We develop a regular approach to anomalous statistics of normalized eigenfunctions ψ(r) at such commensurability points. The approach is based on an exact integral transfer-matrix equation for a generating function Φ_r(u, ϕ) (u and ϕ have the meaning of the squared amplitude and phase of eigenfunctions, r is the position of the observation point). This generating function can be used to compute local statistics of eigenfunctions of 1d AM at any disorder and to address the problem of higher-order anomalies at f = p/q with q > 2. The descender of the generating function P_r(ϕ) ≡ Φ_r(u=0, ϕ) is shown to be the distribution function of phase which determines the Lyapunov exponent and the local density of states. In the leading order in the small disorder we derived a second-order partial differential equation for the r-independent ("zero-mode") component Φ(u, ϕ) at the E = 0 (f = 1/2) anomaly. This equation is nonseparable in variables u and ϕ. Yet, we show that due to a hidden symmetry, it is integrable and we construct an exact solution for Φ(u, ϕ) explicitly in quadratures. Using this solution we computed moments I_m = N⟨|ψ|^{2m}⟩ (m ≥ 1) for a chain of the length N → ∞ and found an essential difference between their m-behavior in the center-of-band anomaly and for energies outside this anomaly. Outside the anomaly the "extrinsic" localization length defined from the Lyapunov exponent coincides with that defined from the inverse participation ratio ("intrinsic" localization length). This is not the case at the E = 0 anomaly where the extrinsic localization length is smaller than the intrinsic one. At E = 0 one also observes an anomalous enhancement of large moments compatible with existence of yet
NASA Astrophysics Data System (ADS)
Abramson, Louis Evan; Imacs Cluster Building Survey
2015-01-01
The growth of galaxies is a central theme of the cosmological narrative, but we do not yet understand how these objects build their stellar populations over time. Largely, this is because star formation histories must be inferred from statistical metrics (at z > 0), e.g., the cosmic star formation rate density, the stellar mass function, and the SFR/stellar mass relation. The relationship between these observations and the behavior of individual systems is unclear, but it deeply affects views on galaxy evolution. Here, I discuss key issues complicating this relationship, and explore attempts to deal with them from both 'population-down' and 'galaxy-up' perspectives. I suggest that these interpretations ultimately differ in their emphasis on astrophysical processes that 'quench' versus those that diversify galaxies, and the extent to which individual star formation histories encode these processes. I close by highlighting observations which might soon reveal the accuracy of either vision.
NASA Astrophysics Data System (ADS)
Kumar, Ranjeet; Chandra, Navin; Tomar, Surekha
2016-02-01
This paper deals with the role of triple encounters with low initial velocities and equal masses in the framework of statistical escape theory in two-dimensional space. This system is described by allowing for both energy and angular momentum conservation in the phase space. The complete statistical solutions (i.e. the semi-major axis 'a', the distributions of eccentricity 'e', and energy E_b of the final binary, escape energy E_s of the escaper and its escape velocity v_s) of the system are calculated. These are in good agreement with the numerical results of Chandra and Bhatnagar (1999) in the range of perturbing velocities v_i (10^{-10} ≤ v_i ≤ 10^{-1}) in two-dimensional space. The double limit process has been applied to the system. It is observed that when v_i → 0^{+}, a v_s^2 → 2/3 for all directions in two-dimensional space.
Garelli, F M; Espinosa, M O; Gürtler, R E
2012-05-01
Understanding the processes that affect Aedes aegypti (L.) (Diptera: Culicidae) may serve as a starting point to create and/or improve vector control strategies. For this purpose, we performed statistical modeling of three entomological surveys conducted in Clorinda City, northern Argentina. Previous 'basic' models of presence or absence of larvae and/or pupae (infestation) and the number of pupae in infested containers (productivity), mainly based on physical characteristics of containers, were expanded to include variables selected a priori reflecting water use practices, vector-related context factors, the history of chemical control, and climate. Model selection was performed using Akaike's Information Criterion. In total, 5,431 water-holding containers were inspected and 12,369 Ae. aegypti pupae collected from 963 positive containers. Large tanks were the most productive container type. Variables reflecting every putative process considered, except for history of chemical control, were selected in the best models obtained for infestation and productivity. The associations found were very strong, particularly in the case of infestation. Water use practices and vector-related context factors were the most important ones, as evidenced by their impact on Akaike's Information Criterion scores of the infestation model. Risk maps based on empirical data and model predictions showed a heterogeneous distribution of entomological risk. An integrated vector control strategy is recommended, aiming at community participation for healthier water use practices and targeting large tanks for key elements such as lid status, water addition frequency and water use.
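The model-selection step described above can be sketched in a few lines. This is a minimal illustration of ranking candidate models by Akaike's Information Criterion, AIC = 2k - 2 ln L, and not the authors' analysis: the model names, log-likelihoods, and parameter counts below are invented placeholders, and the helper names `aic` and `rank_by_aic` are hypothetical.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: 2k - 2 ln(L)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def rank_by_aic(models):
    """Rank models given as {name: (log_likelihood, n_params)}.

    Returns rows (name, AIC, delta AIC, Akaike weight), best model first.
    """
    scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
    best = min(scores.values())
    # Relative likelihood of each model with respect to the best one.
    rel = {name: math.exp(-0.5 * (s - best)) for name, s in scores.items()}
    total = sum(rel.values())
    rows = [(name, scores[name], scores[name] - best, rel[name] / total)
            for name in scores]
    return sorted(rows, key=lambda row: row[1])


# Placeholder values for illustration only; they are not taken from the study.
candidates = {
    "basic": (-520.0, 4),
    "basic_plus_water_use": (-498.0, 7),
    "full": (-497.5, 11),
}
ranking = rank_by_aic(candidates)
for name, score, delta, weight in ranking:
    print(f"{name:22s} AIC={score:8.1f} dAIC={delta:6.1f} weight={weight:.3f}")
```

A lower AIC is preferred; the delta column shows how much support a model loses relative to the best one, which is the sense in which a variable's "impact on AIC scores" can be read.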
NASA Astrophysics Data System (ADS)
Germa, Aurelie; Connor, Laura; Connor, Chuck; Malservisi, Rocco
2015-04-01
One challenge of volcanic hazard assessment in distributed volcanic fields (large number of small-volume basaltic volcanoes along with one or more silicic central volcanoes) is to constrain the location of future activity. Although the extent of the source of melts at depth can be known using geophysical methods or the location of past eruptive vents, the location of preferential pathways and zones of higher magma flux are still unobserved. How does the spatial distribution of eruptive vents at the surface reveal the location of magma sources or focusing? When this distribution is investigated, the location of central polygenetic edifices as well as clusters of monogenetic volcanoes denote zones of high magma flux and recurrence rate, whereas areas of dispersed monogenetic vents represent zones of lower flux. Additionally, central polygenetic edifices, acting as magma filters, prevent dense mafic magmas from reaching the surface close to their central silicic system. Subsequently, the spatial distribution of mafic monogenetic vents may provide clues to the subsurface structure of a volcanic field, such as the location of magma sources, preferential magma pathways, and flux distribution across the field. Gathering such data is of high importance in improving the assessment of volcanic hazards. We are developing a modeling framework that compares outputs of statistical models of vent distribution with outputs from numerical models of subsurface magma transport. Geologic data observed at the Earth's surface are used to develop statistical models of spatial intensity (vents per unit area), volume intensity (erupted volume per unit area) and volume-flux intensity (erupted volume per unit time and area). Outputs are in the form of probability density functions assumed to represent volcanic flow output at the surface. These are then compared to outputs from conceptual models of the subsurface processes of magma storage and transport. These models are using Darcy's law
NASA Astrophysics Data System (ADS)
Guber, A.; Pachepsky, Y. A.; Yakirevich, A.; Gish, T. J.; Nicholson, T. J.; Cady, R.
2012-12-01
The objective of this study was to assess the importance of existing and new monitoring data for calibration of the HYDRUS-3D model. A tracer experiment was carried out at the USDA-ARS OPE3 field site. A pulse of KCl solution was applied on a 13 x 14 m irrigation plot and the Cl concentrations were monitored for 131 days at three depths in 12 observation wells installed within, and at distances of 7 m and 14 m from, the irrigation plot. Distributions of soil material were obtained from soil cores and hydraulic properties for each material were estimated using the ROSETTA software. The HYDRUS-3D model was manually calibrated on Cl time series measured in the observation wells. Local sensitivity analysis was conducted for the saturated hydraulic conductivity by varying values of Ksat for each soil material. The sensitivity indices (Si) were computed for 8 soil materials in 256 simulation runs. The importance of existing and new observation locations for the HYDRUS-3D calibration was evaluated using the Observation-Prediction (OPR) statistics. The time-averaged Si values obtained in the HYDRUS-3D simulations were used to calculate the matrix of sensitivities for the OPR method, while the observation weights in the OPR were represented by the proportion of simulation days with nonzero Si values in the total number of simulated days. The results showed that the number of depths for Cl monitoring in the existing wells can be reduced to one. To reduce the uncertainty in Si values, new observation wells should be installed in the zone where the transition from relatively high to low concentrations occurs. The outcome of this study can provide information for future data collection and monitoring efforts to improve the reliability of 3D model calibrations.
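The abstract does not define its sensitivity index, so the following is only a sketch of one common choice for local sensitivity analysis: a normalized central-difference index, S = (dY/Y)/(dK/K), obtained by perturbing one parameter at a time. The function `toy_model` is a hypothetical stand-in for a simulator run (it is not HYDRUS-3D), and the parameter names are invented for illustration.

```python
def relative_sensitivity(model, params, name, rel_step=0.01):
    """Normalized local sensitivity of model output to one parameter.

    Perturbs params[name] by +/- rel_step (relative) and returns
    (dY / Y) / (dK / K) estimated with a central difference.
    """
    base = params[name]
    up = dict(params)
    up[name] = base * (1.0 + rel_step)
    down = dict(params)
    down[name] = base * (1.0 - rel_step)
    y0 = model(params)
    return (model(up) - model(down)) / y0 / (2.0 * rel_step)


def toy_model(p):
    """Hypothetical stand-in for a simulator: output = k1^2 * sqrt(k2)."""
    return p["ksat_material_1"] ** 2 * p["ksat_material_2"] ** 0.5


# Illustrative parameter values only.
params = {"ksat_material_1": 3.0, "ksat_material_2": 4.0}
s1 = relative_sensitivity(toy_model, params, "ksat_material_1")
s2 = relative_sensitivity(toy_model, params, "ksat_material_2")
print(s1, s2)
```

For a power-law response the index recovers the exponent, which makes the toy case easy to check by hand; with a real simulator each evaluation of `model` would be one simulation run.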
NASA Astrophysics Data System (ADS)
Eduardo, S.; Wagner-Riddle, C.; Warland, J.
2009-05-01
Lagrangian dispersion methods have been used as an alternative to infer scalar source/sink distributions and fluxes inside plant canopies. Warland and Thurtell (2000) proposed a method (hereafter WT analysis) to relate source and concentration profiles of scalars within plant canopies using a 'dispersion matrix', which is calculated using turbulence statistics, represented by the Lagrangian time scale (TL) and the standard deviation of the vertical wind speed (σw). The objective of this study was to assess the effect of different turbulence statistics parameterizations on the net flux estimates provided by WT analysis in a corn field for various atmospheric stability conditions. The WT analysis requires the specification of the turbulence statistics in advance, so parameterizations proposed by Raupach (1989) (TSR), Leuning (2000) (TSL), Denmead (2000) and Styles (2009) (TSDS) were used. The TL and σw profiles were corrected for stability conditions according to the methodology proposed by Leuning (2000). The field experiment was carried out in a corn field during the field season in 2007 at the Elora Research Station, Elora, ON. Profiles of water vapour and CO2 concentrations were measured using a multiport sampling system connected to an infrared gas analyzer LI6262 (Li-Cor, Inc., Lincoln, NE, USA) at 6 heights inside and two heights above the canopy. The estimates of CO2 and latent heat fluxes, provided by the sum of source strength profiles from WT analysis, were compared with the measurements taken using an eddy covariance system set up at the same site. In addition, a profile of leaf area index was obtained using an area meter LI-3100C (Li-Cor, Inc., Lincoln, NE, USA) to estimate some turbulence statistics. The concentration profiles of CO2 and H2O presented larger gradients close to the ground. During the growing season lower CO2 concentrations were observed at 1/2 of the canopy height during daytime, indicating the existence of a strong CO2 sink at this height. The measurements
Zhang, Miaomiao; Wells, William M; Golland, Polina
2016-10-01
Using image-based descriptors to investigate clinical hypotheses and therapeutic implications is challenging due to the notorious "curse of dimensionality" coupled with a small sample size. In this paper, we present a low-dimensional analysis of anatomical shape variability in the space of diffeomorphisms and demonstrate its benefits for clinical studies. To combat the high dimensionality of the deformation descriptors, we develop a probabilistic model of principal geodesic analysis in a bandlimited low-dimensional space that still captures the underlying variability of image data. We demonstrate the performance of our model on a set of 3D brain MRI scans from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. Our model yields a more compact representation of group variation at substantially lower computational cost than models based on the high-dimensional state-of-the-art approaches such as tangent space PCA (TPCA) and probabilistic principal geodesic analysis (PPGA).
Hall effect, edge states, and Haldane exclusion statistics in two-dimensional space
NASA Astrophysics Data System (ADS)
Ye, F.; Marchetti, P. A.; Su, Z. B.; Yu, L.
2015-12-01
We clarify the relation between two kinds of statistics for particle excitations in planar systems: the braid statistics of anyons and the Haldane exclusion statistics (HES). It is shown nonperturbatively that the HES exists for incompressible anyon liquid in the presence of a Hall response. We also study the statistical properties of a specific quantum anomalous Hall model with Chern-Simons term by perturbation in both compressible and incompressible regimes, where the crucial role of edge states to the HES is shown.
Wells, William M.; Golland, Polina
2017-01-01
Using image-based descriptors to investigate clinical hypotheses and therapeutic implications is challenging due to the notorious “curse of dimensionality” coupled with a small sample size. In this paper, we present a low-dimensional analysis of anatomical shape variability in the space of diffeomorphisms and demonstrate its benefits for clinical studies. To combat the high dimensionality of the deformation descriptors, we develop a probabilistic model of principal geodesic analysis in a bandlimited low-dimensional space that still captures the underlying variability of image data. We demonstrate the performance of our model on a set of 3D brain MRI scans from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database. Our model yields a more compact representation of group variation at substantially lower computational cost than models based on the high-dimensional state-of-the-art approaches such as tangent space PCA (TPCA) and probabilistic principal geodesic analysis (PPGA). PMID:28664199
On the statistical properties of Klein polyhedra in three-dimensional lattices
Illarionov, A A
2013-06-30
We obtain asymptotic formulae for the average values of the number of faces of a fixed type and of vertices of Klein polyhedra of three-dimensional integer lattices with a given determinant. Bibliography: 20 titles.
NASA Technical Reports Server (NTRS)
Norris, Peter M.; Da Silva, Arlindo M.
2016-01-01
A method is presented to constrain a statistical model of sub-gridcolumn moisture variability using high-resolution satellite cloud data. The method can be used for large-scale model parameter estimation or cloud data assimilation. The gridcolumn model includes assumed probability density function (PDF) intra-layer horizontal variability and a copula-based inter-layer correlation model. The observables used in the current study are Moderate Resolution Imaging Spectroradiometer (MODIS) cloud-top pressure, brightness temperature and cloud optical thickness, but the method should be extensible to direct cloudy radiance assimilation for a small number of channels. The algorithm is a form of Bayesian inference with a Markov chain Monte Carlo (MCMC) approach to characterizing the posterior distribution. This approach is especially useful in cases where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach is not gradient-based and allows jumps into regions of non-zero cloud probability. The current study uses a skewed-triangle distribution for layer moisture. The article also includes a discussion of the Metropolis and multiple-try Metropolis versions of MCMC.
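To make the Metropolis step mentioned above concrete, here is a minimal random-walk Metropolis sampler. It is a generic sketch under simple assumptions (one-dimensional state, symmetric Gaussian proposal, a standard normal target chosen purely for illustration), not the gridcolumn moisture model or the multiple-try variant discussed in the article. The function name `metropolis` and its parameters are hypothetical.

```python
import math
import random


def metropolis(log_density, x0, n_steps, step_size, rng):
    """Random-walk Metropolis sampler for a 1-D unnormalized log density.

    Returns the list of samples and the fraction of accepted proposals.
    """
    x = x0
    logp = log_density(x)
    samples = []
    accepted = 0
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)
        logp_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)); no gradient needed.
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            x, logp = proposal, logp_new
            accepted += 1
        samples.append(x)
    return samples, accepted / n_steps


def log_standard_normal(x):
    """Illustrative target: standard normal, up to an additive constant."""
    return -0.5 * x * x


rng = random.Random(0)
samples, acceptance = metropolis(log_standard_normal, 0.0, 20000, 2.4, rng)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"mean={mean:.3f} var={var:.3f} acceptance={acceptance:.2f}")
```

Because acceptance depends only on a density ratio, a proposal can land in a region that no infinitesimal perturbation of the current state would reach, which is the property the abstract contrasts with linearized, gradient-based assimilation.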
Statistical Signal Models and Algorithms for Image Analysis
1984-10-25
In this report, two-dimensional stochastic linear models are used in developing algorithms for image analysis such as classification, segmentation, and object detection in images characterized by textured backgrounds. These models generate two-dimensional random processes as outputs to which statistical inference procedures can naturally be applied. A common thread throughout our algorithms is the interpretation of the inference procedures in terms of linear prediction
NASA Astrophysics Data System (ADS)
Verma, Sanjeet K.; Oliveira, Elson P.
2013-08-01
In the present work, we applied two sets of new multi-dimensional geochemical diagrams (Verma et al., 2013) obtained from linear discriminant analysis (LDA) of natural logarithm-transformed ratios of major elements and immobile major and trace elements in acid magmas to decipher plate tectonic settings and corresponding probability estimates for Paleoproterozoic rocks from the Amazonian craton, São Francisco craton, São Luís craton, and Borborema province of Brazil. The robustness of LDA minimizes the effects of petrogenetic processes and maximizes the separation among the different tectonic groups. The probability-based boundaries further provide a more objective statistical method in comparison to the commonly used subjective method of determining the boundaries by eye judgment. The use of major element data readjusted to 100% on an anhydrous basis with the SINCLAS computer program also helps to minimize the effects of post-emplacement compositional changes and analytical errors on these tectonic discrimination diagrams. Fifteen case studies of acid suites highlighted the application of these diagrams and probability calculations. The first case study on Jamon and Musa granites, Carajás area (Central Amazonian Province, Amazonian craton) shows a collision setting (previously thought anorogenic). A collision setting was clearly inferred for Bom Jardim granite, Xingú area (Central Amazonian Province, Amazonian craton). The third case study on Older São Jorge, Younger São Jorge and Maloquinha granites, Tapajós area (Ventuari-Tapajós Province, Amazonian craton) indicated a within-plate setting (previously transitional between volcanic arc and within-plate). We also recognized a within-plate setting for the next three case studies on Aripuanã and Teles Pires granites (SW Amazonian craton), and Pitinga area granites (Mapuera Suite, NW Amazonian craton), which were all previously suggested to have been emplaced in post-collision to within-plate settings. The seventh case
Miecznikowski, Jeffrey C; Damodaran, Senthilkumar; Sellers, Kimberly F; Rabin, Richard A
2010-12-15
Numerous gel-based software packages exist to detect protein changes potentially associated with disease. The data, however, are abundant with technical and structural complexities, making statistical analysis a difficult task. A particularly important topic is how the various software packages handle missing data. To date, no one has extensively studied the impact that interpolating missing data has on subsequent analysis of protein spots. This work highlights the existing algorithms for handling missing data in two-dimensional gel analysis and performs a thorough comparison of the various algorithms and statistical tests on simulated and real datasets. For imputation methods, the best results in terms of root mean squared error are obtained using the least squares method of imputation along with the expectation maximization (EM) algorithm approach to estimate missing values with an array covariance structure. The bootstrapped versions of the statistical tests offer the most liberal option for determining protein spot significance while the generalized family wise error rate (gFWER) should be considered for controlling the multiple testing error. In summary, we advocate for a three-step statistical analysis of two-dimensional gel electrophoresis (2-DE) data with a data imputation step, choice of statistical test, and lastly an error control method in light of multiple testing. When determining the choice of statistical test, it is worth considering whether the protein spots will be subjected to mass spectrometry. If this is the case, a more liberal test such as the percentile-based bootstrap t can be employed. For error control in electrophoresis experiments, we advocate that gFWER be controlled for multiple testing rather than the false discovery rate.
Zhao, Xi; Dellandréa, Emmanuel; Chen, Liming; Kakadiaris, Ioannis A
2011-10-01
Three-dimensional face landmarking aims at automatically localizing facial landmarks and has a wide range of applications (e.g., face recognition, face tracking, and facial expression analysis). Existing methods assume neutral facial expressions and unoccluded faces. In this paper, we propose a general learning-based framework for reliable landmark localization on 3-D facial data under challenging conditions (i.e., facial expressions and occlusions). Our approach relies on a statistical model, called 3-D statistical facial feature model, which learns both the global variations in configurational relationships between landmarks and the local variations of texture and geometry around each landmark. Based on this model, we further propose an occlusion classifier and a fitting algorithm. Results from experiments on three publicly available 3-D face databases (FRGC, BU-3-DFE, and Bosphorus) demonstrate the effectiveness of our approach, in terms of landmarking accuracy and robustness, in the presence of expressions and occlusions.
Applying Clustering to Statistical Analysis of Student Reasoning about Two-Dimensional Kinematics
ERIC Educational Resources Information Center
Springuel, R. Padraic; Wittman, Michael C.; Thompson, John R.
2007-01-01
We use clustering, an analysis method not presently common to the physics education research community, to group and characterize student responses to written questions about two-dimensional kinematics. Previously, clustering has been used to analyze multiple-choice data; we analyze free-response data that includes both sketches of vectors and…
Spatial mapping and statistical reproducibility of an array of 256 one-dimensional quantum wires
NASA Astrophysics Data System (ADS)
Al-Taie, H.; Smith, L. W.; Lesage, A. A. J.; See, P.; Griffiths, J. P.; Beere, H. E.; Jones, G. A. C.; Ritchie, D. A.; Kelly, M. J.; Smith, C. G.
2015-08-01
We utilize a multiplexing architecture to measure the conductance properties of an array of 256 split gates. We investigate the reproducibility of the pinch off and one-dimensional definition voltage as a function of spatial location on two different cooldowns, and after illuminating the device. The reproducibility of both these properties on the two cooldowns is high, the result of the density of the two-dimensional electron gas returning to a similar state after thermal cycling. The spatial variation of the pinch-off voltage reduces after illumination; however, the variation of the one-dimensional definition voltage increases due to an anomalous feature in the center of the array. A technique which quantifies the homogeneity of split-gate properties across the array is developed which captures the experimentally observed trends. In addition, the one-dimensional definition voltage is used to probe the density of the wafer at each split gate in the array on a micron scale using a capacitive model.
Lange, Kenneth; Papp, Jeanette C.; Sinsheimer, Janet S.; Sobel, Eric M.
2014-01-01
Statistical genetics is undergoing the same transition to big data that all branches of applied statistics are experiencing. With the advent of inexpensive DNA sequencing, the transition is only accelerating. This brief review highlights some modern techniques with recent successes in statistical genetics. These include: (a) lasso penalized regression and association mapping, (b) ethnic admixture estimation, (c) matrix completion for genotype and sequence data, (d) the fused lasso and copy number variation, (e) haplotyping, (f) estimation of relatedness, (g) variance components models, and (h) rare variant testing. For more than a century, genetics has been both a driver and beneficiary of statistical theory and practice. This symbiotic relationship will persist for the foreseeable future. PMID:24955378
Thermodynamics of a one-dimensional ideal gas with fractional exclusion statistics
Murthy, M.V.N.; Shankar, R. )
1994-12-19
We show that the particles in the Calogero-Sutherland model obey fractional exclusion statistics as defined by Haldane. We construct anyon number densities and derive the energy distribution function. We show that the partition function factorizes in the form characteristic of an ideal gas. The virial expansion is exactly computable and interestingly it is only the second virial coefficient that encodes the statistics information.
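As a worked illustration of a distribution function for fractional exclusion statistics, the sketch below evaluates the standard Haldane-Wu form: n = 1/(w + g), where w solves w^g (1 + w)^(1 - g) = exp((ε - μ)/kT). This is the textbook expression (usually attributed to Wu) and is assumed here for illustration; the abstract does not state the authors' formula, so it may be presented differently in the paper. The function name and the bisection solver are choices made for this example.

```python
import math


def haldane_wu_occupation(x, g):
    """Mean occupation for reduced energy x = (eps - mu)/(k T).

    g is the statistics parameter: g = 0 is Bose, g = 1 is Fermi.
    Solves g*ln(w) + (1 - g)*ln(1 + w) = x for w by bisection.
    """
    if not 0.0 <= g <= 1.0:
        raise ValueError("g must lie in [0, 1]")
    if g == 0.0 and x <= 0.0:
        raise ValueError("the Bose case requires eps > mu")

    def f(w):
        return g * math.log(w) + (1.0 - g) * math.log1p(w) - x

    lo, hi = 1e-300, 1.0
    while f(hi) < 0.0:  # f is increasing in w, so expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    w = 0.5 * (lo + hi)
    return 1.0 / (w + g)


x = 1.0
n_bose = haldane_wu_occupation(x, 0.0)
n_semion = haldane_wu_occupation(x, 0.5)
n_fermi = haldane_wu_occupation(x, 1.0)
print(n_bose, n_semion, n_fermi)
```

The two limits reproduce the Bose-Einstein and Fermi-Dirac distributions, and for g = 1/2 the equation can be solved in closed form, n = 1/sqrt(1/4 + exp(2x)), which gives a convenient check of the numerical solver.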
Computationally efficient Bayesian inference for inverse problems.
Marzouk, Youssef M.; Najm, Habib N.; Rahn, Larry A.
2007-10-01
Bayesian statistics provides a foundation for inference from noisy and incomplete data, a natural mechanism for regularization in the form of prior information, and a quantitative assessment of uncertainty in the inferred results. Inverse problems - representing indirect estimation of model parameters, inputs, or structural components - can be fruitfully cast in this framework. Complex and computationally intensive forward models arising in physical applications, however, can render a Bayesian approach prohibitive. This difficulty is compounded by high-dimensional model spaces, as when the unknown is a spatiotemporal field. We present new algorithmic developments for Bayesian inference in this context, showing strong connections with the forward propagation of uncertainty. In particular, we introduce a stochastic spectral formulation that dramatically accelerates the Bayesian solution of inverse problems via rapid evaluation of a surrogate posterior. We also explore dimensionality reduction for the inference of spatiotemporal fields, using truncated spectral representations of Gaussian process priors. These new approaches are demonstrated on scalar transport problems arising in contaminant source inversion and in the inference of inhomogeneous material or transport properties. We also present a Bayesian framework for parameter estimation in stochastic models, where intrinsic stochasticity may be intermingled with observational noise. Evaluation of a likelihood function may not be analytically tractable in these cases, and thus several alternative Markov chain Monte Carlo (MCMC) schemes, operating on the product space of the observations and the parameters, are introduced.
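The way a prior regularizes an inverse problem and yields a quantitative uncertainty can be seen in the simplest possible case. The sketch below assumes a scalar unknown θ, a linear forward model y = a·θ + noise with known Gaussian noise, and a Gaussian prior, for which the posterior is available in closed form. It does not implement the stochastic spectral surrogate or the MCMC schemes of the report; all names and numbers are illustrative assumptions.

```python
def gaussian_posterior(observations, forward_gain, noise_std,
                       prior_mean, prior_std):
    """Posterior mean and standard deviation of theta.

    Model: y_i = forward_gain * theta + e_i, e_i ~ N(0, noise_std^2),
    with prior theta ~ N(prior_mean, prior_std^2).
    """
    prior_precision = 1.0 / prior_std ** 2
    data_precision = len(observations) * forward_gain ** 2 / noise_std ** 2
    post_precision = prior_precision + data_precision
    post_mean = (prior_mean * prior_precision
                 + forward_gain * sum(observations) / noise_std ** 2) / post_precision
    return post_mean, (1.0 / post_precision) ** 0.5


# Illustrative numbers only: two noisy observations of a unit-gain forward model.
post_mean, post_std = gaussian_posterior([2.0, 2.0], forward_gain=1.0,
                                         noise_std=1.0, prior_mean=0.0,
                                         prior_std=1.0)
print(post_mean, post_std)
```

With few data the estimate is pulled toward the prior mean, and the posterior standard deviation quantifies the remaining uncertainty; with an expensive nonlinear forward model no closed form exists, which is the setting where surrogate posteriors become attractive.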
Spatial statistics of magnetic field in two-dimensional chaotic flow in the resistive growth stage
NASA Astrophysics Data System (ADS)
Kolokolov, I. V.
2017-03-01
The correlation tensors of magnetic field in a two-dimensional chaotic flow of conducting fluid are studied. It is shown that there is a stage of resistive evolution where the field correlators grow exponentially with time. The two- and four-point field correlation tensors are computed explicitly in this stage in the framework of Batchelor-Kraichnan-Kazantsev model. They demonstrate strong temporal intermittency of the field fluctuations and high level of non-Gaussianity in spatial field distribution.
NASA Astrophysics Data System (ADS)
Rezanezhad, V.; Wang, L.; Holschneider, M.
2015-12-01
Co-seismic surface displacements are in general related to the spatial slip distribution on a fault surface by linear integral equations. A parametric expansion of the fault slip distribution in a finite number of known basis functions yields a set of observation equations expressed in a simple vector form. In geodetic inversion the source is generally parameterized with some fixed number of unknowns, and this parametrization yields non-unique results. The Bayesian procedure lets us include a priori information to overcome this problem. Nevertheless, the fixed number of unknowns is usually not treated as a parameter that can be constrained by the data, even though the number of parameters can give us an idea of the complexity of the source. To perform such an inversion we need to apply a trans-dimensional procedure in which the number of parameters is itself a parameter of the problem. Here we apply the trans-dimensional approach in a Bayesian framework to invert the co-seismic offsets of the 2004 M6.0 Parkfield earthquake. The trans-dimensional approach has an upper limit on the number of parameters, which is set by the number of observations. In other words, beyond this limit we cannot have a larger number of parameters in our inversion, regardless of the complexity of the source. In the case of geodetic static data, which decay as 1/r^2, where r is the distance between the source and the observation point, one can apply a priori information in order to obtain a tessellation that represents this resolving power. This means that if the source is complex, we can have finer patches close to GPS sites in order to obtain a better-resolved slip distribution. The results of synthetic tests and of the 2004 M6.0 Parkfield earthquake show that the data limit the number of parameters. The results for the 2004 M6.0 Parkfield earthquake indicate that this event has a homogeneous slip distribution. In our results, the hypocenter slip which is present in the inversions with fixed number
NASA Technical Reports Server (NTRS)
Bonavito, N. L.; Gordon, C. L.; Inguva, R.; Serafino, G. N.; Barnes, R. A.
1994-01-01
NASA's Mission to Planet Earth (MTPE) will address important interdisciplinary and environmental issues such as global warming, ozone depletion, deforestation, acid rain, and the like with its long term satellite observations of the Earth and with its comprehensive Data and Information System. Extensive sets of satellite observations supporting MTPE will be provided by the Earth Observing System (EOS), while more specific process related observations will be provided by smaller Earth Probes. MTPE will use data from ground and airborne scientific investigations to supplement and validate the global observations obtained from satellite imagery, while the EOS satellites will support interdisciplinary research and model development. This is important for understanding the processes that control the global environment and for improving the prediction of events. In this paper we illustrate the potential for powerful artificial intelligence (AI) techniques when used in the analysis of the formidable problems that exist in the NASA Earth Science programs and of those to be encountered in the future MTPE and EOS programs. These techniques, based on the logical and probabilistic reasoning aspects of plausible inference, strongly emphasize the synergetic relation between data and information. As such, they are ideally suited for the analysis of the massive data streams to be provided by both MTPE and EOS. To demonstrate this, we address both the satellite imagery and model enhancement issues for the problem of ozone profile retrieval through a method based on plausible scientific inferencing. Since in the retrieval problem, the atmospheric ozone profile that is consistent with a given set of measured radiances may not be unique, an optimum statistical method is used to estimate a 'best' profile solution from the radiances and from additional a priori information.
Melville, C A; Johnson, P C D; Smiley, E; Simpson, N; McConnachie, A; Purves, D; Osugo, M; Cooper, S-A
2016-01-01
Diagnosing mental ill-health using categorical classification systems has limited validity for clinical practice and research. Dimensions of psychopathology have greater validity than categorical diagnoses in the general population, but dimensional models have not had a significant impact on our understanding of mental ill-health and problem behaviours experienced by adults with intellectual disabilities. This paper systematically reviews the methods and findings from intellectual disabilities studies that use statistical methods to identify dimensions of psychopathology from data collected using structured assessments of psychopathology. The PRISMA framework for systematic review was used to identify studies for inclusion. Study methods were compared to best-practice guidelines on the use of exploratory factor analysis. Data from the 20 studies included suggest that it is possible to use statistical methods to model dimensions of psychopathology experienced by adults with intellectual disabilities. However, none of the studies used methods recommended for the analysis of non-continuous psychopathology data and all 20 studies used statistical methods that produce unstable results that lack reliability. Statistical modelling is a promising methodology to improve our understanding of mental ill-health experienced by adults with intellectual disabilities but future studies should use robust statistical methods to build on the existing evidence base. Copyright © 2016 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Hunziker, Jürg; Laloy, Eric; Linde, Niklas
2016-04-01
Deterministic inversion procedures can often explain field data, but they only deliver one final subsurface model that depends on the initial model and regularization constraints. This leads to poor insights about the uncertainties associated with the inferred model properties. In contrast, probabilistic inversions can provide an ensemble of model realizations that accurately span the range of possible models that honor the available calibration data and prior information allowing a quantitative description of model uncertainties. We reconsider the problem of inferring the dielectric permittivity (directly related to radar velocity) structure of the subsurface by inversion of first-arrival travel times from crosshole ground penetrating radar (GPR) measurements. We rely on the DREAM_(ZS) algorithm that is a state-of-the-art Markov chain Monte Carlo (MCMC) algorithm. Such algorithms need several orders of magnitude more forward simulations than deterministic algorithms and often become infeasible in high parameter dimensions. To enable high-resolution imaging with MCMC, we use a recently proposed dimensionality reduction approach that allows reproducing 2D multi-Gaussian fields with far fewer parameters than a classical grid discretization. We consider herein a dimensionality reduction from 5000 to 257 unknowns. The first 250 parameters correspond to a spectral representation of random and uncorrelated spatial fluctuations while the remaining seven geostatistical parameters are (1) the standard deviation of the data error, (2) the mean and (3) the variance of the relative electric permittivity, (4) the integral scale along the major axis of anisotropy, (5) the anisotropy angle, (6) the ratio of the integral scale along the minor axis of anisotropy to the integral scale along the major axis of anisotropy and (7) the shape parameter of the Matérn function. The latter essentially defines the type of covariance function (e.g., exponential, Whittle, Gaussian). We present
Statistical Analysis of Current Sheets in Three-dimensional Magnetohydrodynamic Turbulence
NASA Astrophysics Data System (ADS)
Zhdankin, Vladimir; Uzdensky, Dmitri A.; Perez, Jean C.; Boldyrev, Stanislav
2013-07-01
We develop a framework for studying the statistical properties of current sheets in numerical simulations of magnetohydrodynamic (MHD) turbulence with a strong guide field, as modeled by reduced MHD. We describe an algorithm that identifies current sheets in a simulation snapshot and then determines their geometrical properties (including length, width, and thickness) and intensities (peak current density and total energy dissipation rate). We then apply this procedure to simulations of reduced MHD and perform a statistical analysis on the obtained population of current sheets. We evaluate the role of reconnection by separately studying the populations of current sheets which contain magnetic X-points and those which do not. We find that the statistical properties of the two populations are different in general. We compare the scaling of these properties to phenomenological predictions obtained for the inertial range of MHD turbulence. Finally, we test whether the reconnecting current sheets are consistent with the Sweet-Parker model.
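The identification step described above can be illustrated with a minimal sketch. It assumes a simple rule that the abstract does not spell out: a current sheet is taken to be a connected set of grid cells where the current density magnitude exceeds a chosen threshold. The function name `find_current_sheets`, the threshold rule, the restriction to a 2D slice, and the non-periodic boundaries are all assumptions made for illustration, not the authors' algorithm.

```python
from collections import deque

import numpy as np


def find_current_sheets(j, threshold):
    """Label 4-connected regions of a 2D array where |j| > threshold.

    Returns the integer label array and a list with one summary per region
    (cell count and peak |j|). Hypothetical helper for illustration only.
    """
    j = np.asarray(j, dtype=float)
    mask = np.abs(j) > threshold
    labels = np.zeros(j.shape, dtype=int)
    sheets = []
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue  # cell already belongs to a labelled region
        label = len(sheets) + 1
        labels[start] = label
        queue = deque([start])
        cells = []
        while queue:  # breadth-first flood fill over above-threshold cells
            y, x = queue.popleft()
            cells.append((y, x))
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                yy, xx = y + dy, x + dx
                inside = 0 <= yy < j.shape[0] and 0 <= xx < j.shape[1]
                if inside and mask[yy, xx] and not labels[yy, xx]:
                    labels[yy, xx] = label
                    queue.append((yy, xx))
        ys, xs = zip(*cells)
        sheets.append({
            "label": label,
            "n_cells": len(cells),
            "peak": float(np.abs(j[ys, xs]).max()),
        })
    return labels, sheets
```

Geometrical properties such as length, width, and thickness would then be measured per labelled region; how the paper defines those measures is not stated in the abstract, so they are left out here.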
STATISTICAL ANALYSIS OF CURRENT SHEETS IN THREE-DIMENSIONAL MAGNETOHYDRODYNAMIC TURBULENCE
Zhdankin, Vladimir; Boldyrev, Stanislav; Uzdensky, Dmitri A.; Perez, Jean C.
2013-07-10
We develop a framework for studying the statistical properties of current sheets in numerical simulations of magnetohydrodynamic (MHD) turbulence with a strong guide field, as modeled by reduced MHD. We describe an algorithm that identifies current sheets in a simulation snapshot and then determines their geometrical properties (including length, width, and thickness) and intensities (peak current density and total energy dissipation rate). We then apply this procedure to simulations of reduced MHD and perform a statistical analysis on the obtained population of current sheets. We evaluate the role of reconnection by separately studying the populations of current sheets which contain magnetic X-points and those which do not. We find that the statistical properties of the two populations are different in general. We compare the scaling of these properties to phenomenological predictions obtained for the inertial range of MHD turbulence. Finally, we test whether the reconnecting current sheets are consistent with the Sweet-Parker model.
NASA Astrophysics Data System (ADS)
Cai, Juntao; Chen, Xiaobin; Xu, Xiwei; Tang, Ji; Wang, Lifeng; Guo, Chunling; Han, Bing; Dong, Zeyi
2017-02-01
A three-dimensional (3-D) resistivity model around the 2014 Ms6.5 Ludian earthquake was obtained. The model shows that the aftershocks were mainly distributed in a shallow inverse L-shaped conductive angular region surrounded by resistive structures. The presence of this shallow conductive zone may be the key factor leading to the severe damage and surface rupture of the Ludian earthquake. A northwest trending local resistive belt along the Baogunao-Xiaohe fault interrupts the northeast trending conductive zone at the Zhaotong-Lianfeng fault zone in the middle crust, which may be the seismogenic structure of the main shock. Based on the 3-D electrical model, combined with GPS, thermal structure, and seismic survey results, a geodynamic model is proposed to interpret the seismotectonics, deep seismogenic background, and deformation characterized by a sinistral strike slip with a tensile component of the Ludian earthquake.
NASA Astrophysics Data System (ADS)
Yamawaki, Teruo; Tanaka, Satoru; Ueki, Sadato; Hamaguchi, Hiroyuki; Nakamichi, Haruhisa; Nishimura, Takeshi; Oikawa, Jun; Tsutsui, Tomoki; Nishi, Kiyoshi; Shimizu, Hiroshi; Yamaguchi, Sosuke; Miyamachi, Hiroki; Yamasato, Hitoshi; Hayashi, Yutaka
2004-12-01
The three-dimensional P-wave velocity structure of the Bandai volcano has been revealed by tomographic inversion using approximately 2200 travel-time data collected during an active seismic survey comprising 298 temporary seismic stations and eight artificial shots. The key result of this study is the delineation of a high-velocity anomaly (Vp>4.6 km/s at sea-level) immediately below the summit peak. This feature extends to depths of 1-2 km below sea-level. The near-surface horizontal position of the high-velocity anomaly coincides well with that of a positive Bouguer gravity anomaly. Geological data demonstrate that sector collapses have occurred in all directions from the summit and that the summit crater has been repeatedly refilled with magmatic material. These observations suggest that the high-velocity region revealed in this study is a manifestation of an almost-solidified magmatic plumbing system. We have also noted that a near-surface low-velocity region (Vp<3.0 km/s at sea-level) on the southern foot of the volcano corresponds to the position of volcanic sediments including ash and debris avalanche material. In addition, we have made use of the tomographic results to recompute the hypocenters of earthquakes occurring during seismic swarms beneath the summit in 1988 and 2000. Relocating the earthquakes using the three-dimensional velocity model clearly indicates that they predominantly occurred on two steeply dipping planes. Low-frequency earthquakes observed during the swarms in 2000 occurred in the seismic gap between the two clusters. The hypocentral regions of the seismic swarms and the low-frequency earthquakes are close to the higher-velocity zone beneath the volcano's summit. These observations suggest that the recent seismic activity beneath the summit is likely associated with thermal energy being released within the solidifying magmatic plumbing system.
Heat balance statistics derived from four-dimensional assimilations with a global circulation model
NASA Technical Reports Server (NTRS)
Schubert, S. D.; Herman, G. F.
1981-01-01
The reported investigation was conducted to develop a reliable procedure for obtaining the diabatic and vertical terms required for atmospheric heat balance studies. The method developed employs a four-dimensional assimilation mode in connection with the general circulation model of NASA's Goddard Laboratory for Atmospheric Sciences. The initial analysis was conducted with data obtained in connection with the 1976 Data Systems Test. On the basis of the results of the investigation, it appears possible to use the model's observationally constrained diagnostics to provide estimates of the global distribution of virtually all of the quantities which are needed to compute the atmosphere's heat and energy balance.
Singleton, J.; Harrison, N.; Mielke, C. H.; Schlueter, J. A.; Materials Science Division; LANL; Univ. of Oxford
2001-11-05
Although quasi-two-dimensional organic superconductors such as κ-(BEDT-TTF)₂Cu(NCS)₂ (BEDT-TTF ≡ bis(ethylene-dithio)tetrathiafulvalene) seem to be very clean systems, with apparent quasiparticle mean free paths of several thousand angstroms, the superconducting transition is intrinsically broad (e.g. ≈1 K wide for Tc ≈ 10 K). We propose that this is due to the extreme anisotropy of these materials, which greatly exacerbates the statistical effects of spatial variations in the potential experienced by the quasiparticles. Using a statistical model, we are able to account for the experimental observations. A parameter x̄, which characterizes the spatial potential variations, may be derived from Shubnikov-de Haas oscillation experiments. Using this value, we are able to predict a transition width which is in good agreement with that observed in megahertz penetration-depth measurements on the same sample.
Chavanis, P H
1998-12-30
The statistical mechanics of two-dimensional vortices and stellar systems both at equilibrium and out of equilibrium are discussed, with emphasis on the analogies (and on the differences) between these two systems. Limitations of statistical theory and problems posed by the long-range nature of the interactions are described in detail. Special attention is devoted to the problem of "incomplete relaxation" and, in the case of stellar systems, to the "gravothermal catastrophe." The relaxation toward equilibrium, possibly restricted to a "maximum entropy bubble," is described with the aid of a maximum entropy production principle (MEPP). The relation with Fokker-Planck equations is made explicit and the structure of the diffusion current analyzed in terms of a pure diffusion compensated by an appropriate friction or a drift.
Zenil, Hector; Kiani, Narsis A.; Ball, Gordon; Gomez-Cabrero, David
2016-01-01
Systems in nature capable of collective behaviour are nonlinear, operating across several scales. Yet our ability to account for their collective dynamics differs in physics, chemistry and biology. Here, we briefly review the similarities and differences between mathematical modelling of adaptive living systems versus physico-chemical systems. We find that physics-based chemistry modelling and computational neuroscience have a shared interest in developing techniques for model reductions aiming at the identification of a reduced subsystem or slow manifold, capturing the effective dynamics. By contrast, as relations and kinetics between biological molecules are less characterized, current quantitative analysis under the umbrella of bioinformatics focuses on signal extraction, correlation, regression and machine-learning analysis. We argue that model reduction analysis and the ensuing identification of manifolds bridges physics and biology. Furthermore, modelling living systems presents deep challenges as how to reconcile rich molecular data with inherent modelling uncertainties (formalism, variables selection and model parameters). We anticipate a new generative data-driven modelling paradigm constrained by identified governing principles extracted from low-dimensional manifold analysis. The rise of a new generation of models will ultimately connect biology to quantitative mechanistic descriptions, thereby setting the stage for investigating the character of the model language and principles driving living systems. This article is part of the themed issue ‘Multiscale modelling at the physics–chemistry–biology interface’. PMID:27698038
Tegnér, Jesper; Zenil, Hector; Kiani, Narsis A; Ball, Gordon; Gomez-Cabrero, David
2016-11-13
Systems in nature capable of collective behaviour are nonlinear, operating across several scales. Yet our ability to account for their collective dynamics differs in physics, chemistry and biology. Here, we briefly review the similarities and differences between mathematical modelling of adaptive living systems versus physico-chemical systems. We find that physics-based chemistry modelling and computational neuroscience have a shared interest in developing techniques for model reductions aiming at the identification of a reduced subsystem or slow manifold, capturing the effective dynamics. By contrast, as relations and kinetics between biological molecules are less characterized, current quantitative analysis under the umbrella of bioinformatics focuses on signal extraction, correlation, regression and machine-learning analysis. We argue that model reduction analysis and the ensuing identification of manifolds bridges physics and biology. Furthermore, modelling living systems presents deep challenges as how to reconcile rich molecular data with inherent modelling uncertainties (formalism, variables selection and model parameters). We anticipate a new generative data-driven modelling paradigm constrained by identified governing principles extracted from low-dimensional manifold analysis. The rise of a new generation of models will ultimately connect biology to quantitative mechanistic descriptions, thereby setting the stage for investigating the character of the model language and principles driving living systems. This article is part of the themed issue 'Multiscale modelling at the physics-chemistry-biology interface'.
Boundary dynamics and the statistical mechanics of the 2 + 1-dimensional black hole
NASA Astrophysics Data System (ADS)
Bañados, Máximo; Brotz, Thorsten; Ortiz, Miguel E.
1999-04-01
We calculate the density of states of the 2 + 1-dimensional BTZ black hole in the micro- and grand-canonical ensembles. Our starting point is the relation between 2 + 1-dimensional quantum gravity and quantised Chern-Simons theory. In the micro-canonical ensemble, we find the Bekenstein-Hawking entropy by relating a Kac-Moody algebra of global gauge charges to a Virasoro algebra with a classical central charge via a twisted Sugawara construction. This construction is valid at all values of the black hole radius. At infinity it gives the asymptotic isometries of the black hole, and at the horizon it gives an explicit form for a set of deformations of the horizon whose algebra is the same Virasoro algebra. In the grand-canonical ensemble we define the partition function by using a surface term at infinity that is compatible with fixing the temperature and angular velocity of the black hole. We then compute the partition function directly in a boundary Wess-Zumino-Witten theory, and find that we obtain the correct result only after we include a source term at the horizon that induces a non-trivial spin-structure on the WZW partition function.
Statistical Mechanics of the Geometric Control of Flow Topology in Two-Dimensional Turbulence
NASA Astrophysics Data System (ADS)
Nadiga, Balasubramanya; Loxley, Peter
2013-04-01
We apply the principle of maximum entropy to two-dimensional turbulence in a new fashion to predict the effect of geometry on flow topology. We consider two prototypical regimes of turbulence that lead to frequently observed self-organized coherent structures. Our theory predicts bistable behavior that exhibits hysteresis and large abrupt changes in flow topology in one regime; the other regime is predicted to exhibit monostable behavior with a continuous change of flow topology. The predictions are confirmed in fully nonlinear numerical simulations of the two-dimensional Navier-Stokes equation. These results suggest an explanation of the low frequency regime transitions that have been observed in the non-equilibrium setting of this problem. Following further development in the non-equilibrium context, we expect that insights developed in this problem should be useful in developing a better understanding of the phenomenon of low frequency regime transitions that is a pervasive feature of the weather and climate systems. Familiar occurrences of this phenomenon---wherein extreme and abrupt qualitative changes occur, seemingly randomly, after very long periods of apparent stability---include blocking in the extra-tropical winter atmosphere, the bimodality of the Kuroshio extension system, the Dansgaard-Oeschger events, and the glacial-interglacial transitions.
2017-01-01
A major purpose of exploratory metabolic profiling is for the identification of molecular species that are statistically associated with specific biological or medical outcomes; unfortunately, the structure elucidation process of unknowns is often a major bottleneck in this process. We present here new holistic strategies that combine different statistical spectroscopic and analytical techniques to improve and simplify the process of metabolite identification. We exemplify these strategies using study data collected as part of a dietary intervention to improve health and which elicits a relatively subtle suite of changes from complex molecular profiles. We identify three new dietary biomarkers related to the consumption of peas (N-methyl nicotinic acid), apples (rhamnitol), and onions (N-acetyl-S-(1Z)-propenyl-cysteine-sulfoxide) that can be used to enhance dietary assessment and assess adherence to diet. As part of the strategy, we introduce a new probabilistic statistical spectroscopy tool, RED-STORM (Resolution EnhanceD SubseT Optimization by Reference Matching), that uses 2D J-resolved 1H NMR spectra for enhanced information recovery using the Bayesian paradigm to extract a subset of spectra with similar spectral signatures to a reference. RED-STORM provided new information for subsequent experiments (e.g., 2D-NMR spectroscopy, solid-phase extraction, liquid chromatography prefaced mass spectrometry) used to ultimately identify an unknown compound. In summary, we illustrate the benefit of acquiring J-resolved experiments alongside conventional 1D 1H NMR as part of routine metabolic profiling in large data sets and show that application of complementary statistical and analytical techniques for the identification of unknown metabolites can be used to save valuable time and resources. PMID:28240543
Statistical significance of variables driving systematic variation in high-dimensional data
Chung, Neo Christopher; Storey, John D.
2015-01-01
Motivation: There are a number of well-established methods such as principal component analysis (PCA) for automatically capturing systematic variation due to latent variables in large-scale genomic data. PCA and related methods may directly provide a quantitative characterization of a complex biological variable that is otherwise difficult to precisely define or model. An unsolved problem in this context is how to systematically identify the genomic variables that are drivers of systematic variation captured by PCA. Principal components (PCs) (and other estimates of systematic variation) are directly constructed from the genomic variables themselves, making measures of statistical significance artificially inflated when using conventional methods due to over-fitting. Results: We introduce a new approach called the jackstraw that allows one to accurately identify genomic variables that are statistically significantly associated with any subset or linear combination of PCs. The proposed method can greatly simplify complex significance testing problems encountered in genomics and can be used to identify the genomic variables significantly associated with latent variables. Using simulation, we demonstrate that our method attains accurate measures of statistical significance over a range of relevant scenarios. We consider yeast cell-cycle gene expression data, and show that the proposed method can be used to straightforwardly identify genes that are cell-cycle regulated with an accurate measure of statistical significance. We also analyze gene expression data from post-trauma patients, allowing the gene expression data to provide a molecularly driven phenotype. Using our method, we find a greater enrichment for inflammatory-related gene sets compared to the original analysis that uses a clinically defined, although likely imprecise, phenotype. The proposed method provides a useful bridge between large-scale quantifications of systematic variation and gene
Brands, H; Maassen, S R; Clercx, H J
1999-09-01
In this paper the applicability of a statistical-mechanical theory to freely decaying two-dimensional (2D) turbulence on a bounded domain is investigated. We consider an ensemble of direct numerical simulations in a square box with stress-free boundaries, with a Reynolds number that is of the same order as in experiments on 2D decaying Navier-Stokes turbulence. The results of these simulations are compared with the corresponding statistical equilibria, calculated from different stages of the evolution. It is shown that the statistical equilibria calculated from early times of the Navier-Stokes evolution do not correspond to the dynamical quasistationary states. At best, the global topological structure is correctly predicted from a relatively late time in the Navier-Stokes evolution, when the quasistationary state has almost been reached. This failure of the (basically inviscid) statistical-mechanical theory is related to viscous dissipation and net leakage of vorticity in the Navier-Stokes dynamics at moderate values of the Reynolds number.
NASA Astrophysics Data System (ADS)
Qi, Di; Majda, Andrew J.
2017-03-01
A low-dimensional reduced-order statistical closure model is developed for quantifying the uncertainty to changes in forcing in a barotropic turbulent system with topography involving interactions between small-scale motions and a large-scale mean flow. Imperfect model sensitivity is improved through a recent mathematical strategy for calibrating model errors in a training phase, where information theory and linear statistical response theory are combined in a systematic fashion to achieve the optimal model parameters. Statistical theories about a Gaussian invariant measure and the exact statistical energy equations are also developed for the truncated barotropic equations that can be used to improve the imperfect model prediction skill. A stringent paradigm model of 57 degrees of freedom is used to display the feasibility of the reduced-order methods. This simple model creates large-scale zonal mean flow shifting directions from westward to eastward jets with an abrupt change in amplitude when perturbations are applied, and prototype blocked and unblocked patterns can be generated in this simple model similar to the real natural system. Principal statistical responses in mean and variance can be captured by the reduced-order models with desirable accuracy and efficiency with only 3 resolved modes. An even more challenging regime with non-Gaussian equilibrium statistics using the fluctuation equations is also tested in the reduced-order models with accurate prediction using the first 5 resolved modes. These reduced-order models also show potential for uncertainty quantification and prediction in more complex realistic geophysical turbulent dynamical systems.
Moon, Inkyu; Javidi, Bahram; Yi, Faliu; Boss, Daniel; Marquet, Pierre
2012-04-23
In this paper, we present an automated approach to quantify information about three-dimensional (3D) morphology, hemoglobin content and density of mature red blood cells (RBCs) using off-axis digital holographic microscopy (DHM) and statistical algorithms. The digital hologram of RBCs is recorded by a CCD camera using an off-axis interferometry setup and quantitative phase images of RBCs are obtained by a numerical reconstruction algorithm. In order to remove unnecessary parts and obtain clear targets in the reconstructed phase image with many RBCs, the marker-controlled watershed segmentation algorithm is applied to the phase image. Each RBC in the segmented phase image is three-dimensionally investigated. Characteristic properties such as projected cell surface, average phase, sphericity coefficient, mean corpuscular hemoglobin (MCH) and MCH surface density of each RBC are quantitatively measured. We experimentally demonstrate that joint statistical distributions of the characteristic parameters of RBCs can be obtained by our algorithm and efficiently used as a feature pattern to discriminate between RBC populations that differ in shape and hemoglobin content. Our study opens the possibility of automated RBC quantitative analysis suitable for the rapid classification of a large number of RBCs from an individual blood specimen, which is a fundamental step to develop a diagnostic approach based on DHM. © 2012 Optical Society of America
NASA Astrophysics Data System (ADS)
Jameson, A. R.; Larsen, M. L.
2016-06-01
Microphysical understanding of the variability in rain requires a statistical characterization of different drop sizes both in time and in all dimensions of space. Temporally, there have been several statistical characterizations of raindrop counts. However, temporal and spatial structures are neither equivalent nor readily translatable. While there are recent reports of the one-dimensional spatial correlation functions in rain, they can only be assumed to represent the two-dimensional (2D) correlation function under the assumption of spatial isotropy. To date, however, there are no actual observations of the (2D) spatial correlation function in rain over areas. Two reasons for this deficiency are the fiscal and the physical impossibilities of assembling a dense network of instruments over even hundreds of meters much less over kilometers. Consequently, all measurements over areas will necessarily be sparsely sampled. A dense network of data must then be estimated using interpolations from the available observations. In this work, a network of 19 optical disdrometers over a 100 m by 71 m area yield observations of drop spectra every minute. These are then interpolated to a 1 m resolution grid. Fourier techniques then yield estimates of the 2D spatial correlation functions. Preliminary examples using this technique found that steadier, light rain decorrelates spatially faster than does the convective rain, but in both cases the 2D spatial correlation functions are anisotropic, reflecting an asymmetry in the physical processes influencing the rain reaching the ground not accounted for in numerical microphysical models.
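The Fourier step mentioned above can be sketched as follows, assuming the interpolated field is already available on a regular grid. The sketch uses the Wiener-Khinchin relation (the autocovariance is the inverse transform of the power spectrum) with zero padding to avoid wrap-around. The function name `spatial_correlation_2d`, the biased normalization by the zero-lag value, and the random test field are assumptions for illustration and are not taken from the paper.

```python
import numpy as np


def spatial_correlation_2d(field):
    """Estimate the normalized 2D spatial autocorrelation of a gridded field.

    The field is demeaned, zero-padded to twice its size in each direction,
    and the autocovariance is obtained from the power spectrum. The result is
    shifted so that zero lag sits at index (ny, nx) of the returned array.
    A constant field has zero variance and is not handled here.
    """
    f = np.asarray(field, dtype=float)
    f = f - f.mean()
    ny, nx = f.shape
    spectrum = np.fft.rfft2(f, s=(2 * ny, 2 * nx))
    acov = np.fft.irfft2(np.abs(spectrum) ** 2, s=(2 * ny, 2 * nx))
    corr = acov / acov[0, 0]  # biased estimate, equal to 1 at zero lag
    return np.fft.fftshift(corr)


# Hypothetical stand-in for a field interpolated to a 1 m grid over 100 m by 71 m.
rng = np.random.default_rng(0)
grid = rng.normal(size=(71, 100))
corr = spatial_correlation_2d(grid)
```

Comparing `corr[71, 100:]` (lags along x) with `corr[71:, 100]` (lags along y) gives a first look at anisotropy of the kind the abstract reports; a real analysis would also need to account for the interpolation itself, which this sketch ignores.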
NASA Astrophysics Data System (ADS)
Lee, Kean Loon; Grémaud, Benoît; Miniatura, Christian
2014-10-01
As recently discovered [T. Karpiuk et al., Phys. Rev. Lett. 109, 190601 (2012), 10.1103/PhysRevLett.109.190601], Anderson localization in a bulk disordered system triggers the emergence of a coherent forward scattering (CFS) peak in momentum space, which twins the well-known coherent backscattering (CBS) peak observed in weak localization experiments. Going beyond the perturbative regime, we address here the long-time dynamics of the CFS peak in a one-dimensional random system and we relate this novel interference effect to the statistical properties of the eigenfunctions and eigenspectrum of the corresponding random Hamiltonian. Our numerical results show that the dynamics of the CFS peak is governed by the logarithmic level repulsion between localized states, with a time scale that is, with good accuracy, twice the Heisenberg time. This is in perfect agreement with recent findings based on the nonlinear sigma model. In the stationary regime, the width of the CFS peak in momentum space is inversely proportional to the localization length, reflecting the exponential decay of the eigenfunctions in real space, while its height is exactly twice the background, reflecting the Poisson statistical properties of the eigenfunctions. It would be interesting to extend our results to higher dimensional systems and other symmetry classes.
NASA Astrophysics Data System (ADS)
Derrida, Bernard; Meerson, Baruch; Sasorov, Pavel V.
2016-04-01
Consider a one-dimensional branching Brownian motion and rescale the coordinate and time so that the rates of branching and diffusion are both equal to 1. If X_{1}(t) is the position of the rightmost particle of the branching Brownian motion at time t, the empirical velocity c of this rightmost particle is defined as c = X_{1}(t)/t. Using the Fisher-Kolmogorov-Petrovsky-Piscounov equation, we evaluate the probability distribution P(c,t) of this empirical velocity c in the long-time t limit for c > 2. It is already known that, for a single seed particle, P(c,t) ∼ exp[-(c^{2}/4 - 1)t] up to a prefactor that can depend on c and t. Here we show how to determine this prefactor. The result can be easily generalized to the case of multiple seed particles and to branching random walks associated with other traveling-wave equations.
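The leading exponential quoted in this abstract can be motivated by a standard first-moment estimate. The following is a heuristic sketch, not the authors' derivation. It assumes the normalization in which a single particle's position at time t is Gaussian with variance 2t (diffusion rate 1) and the mean population grows as e^{t} (binary branching at rate 1); the symbol \asymp denotes equality to exponential accuracy only.

```latex
\begin{align}
\mathbb{E}\big[\#\{\text{particles beyond } ct\}\big]
  &= e^{t}\int_{ct}^{\infty}\frac{e^{-x^{2}/(4t)}}{\sqrt{4\pi t}}\,dx
  \;\asymp\; \exp\!\Big[-\Big(\frac{c^{2}}{4}-1\Big)t\Big], \\
\Pr\big[X_{1}(t) > ct\big]
  &\le \mathbb{E}\big[\#\{\text{particles beyond } ct\}\big].
\end{align}
```

For c > 2 the exponent is negative, so the bound decays in time, consistent with the stated form of P(c,t). The prefactor is what this estimate cannot supply.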
Statistical theory of reversals in two-dimensional confined turbulent flows
NASA Astrophysics Data System (ADS)
Shukla, Vishwanath; Fauve, Stephan; Brachet, Marc
2016-12-01
It is shown that the truncated Euler equation (TEE), i.e., a finite set of ordinary differential equations for the amplitude of the large-scale modes, can correctly describe the complex transitional dynamics that occur within the turbulent regime of a confined two-dimensional flow obeying Navier-Stokes equation (NSE) with bottom friction and a spatially periodic forcing. The random reversals of the NSE large-scale circulation on the turbulent background involve bifurcations of the probability distribution function of the large-scale circulation. We demonstrate that these NSE bifurcations are described by the related TEE microcanonical distribution which displays transitions from Gaussian to bimodal and broken ergodicity. A minimal 13-mode model reproduces these results.
Vorticity statistics in the direct cascade of two-dimensional turbulence.
Falkovich, Gregory; Lebedev, Vladimir
2011-04-01
For the direct cascade of steady two-dimensional (2D) Navier-Stokes turbulence, we derive analytically the probability of strong vorticity fluctuations. When ϖ is the vorticity coarse-grained over a scale R, the probability density function (PDF), P(ϖ), has a universal asymptotic behavior ln P ∼ -ϖ/ϖ_{rms} at ϖ ≫ ϖ_{rms} = [H ln(L/R)]^{1/3}, where H is the enstrophy flux and L is the pumping length. Therefore, the PDF has exponential tails and is self-similar, that is, it can be presented as a function of a single argument, ϖ/ϖ_{rms}, in distinction from other known direct cascades.
Three-dimensional segmentation of the heart muscle using image statistics
NASA Astrophysics Data System (ADS)
Nillesen, Maartje M.; Lopata, Richard G. P.; Gerrits, Inge H.; Kapusta, Livia; Huisman, Henkjan H.; Thijssen, Johan M.; de Korte, Chris L.
2006-03-01
Segmentation of the heart muscle in 3D echocardiographic images provides a tool for visualization of cardiac anatomy and assessment of heart function, and serves as an important pre-processing step for cardiac strain imaging. By incorporating spatial and temporal information of 3D ultrasound image sequences (4D), a fully automated method using image statistics was developed to perform 3D segmentation of the heart muscle. 3D rf-data were acquired with a Philips SONOS 7500 live 3D ultrasound system, and an X4 matrix array transducer (2-4 MHz). Left ventricular images of five healthy children were taken in transthoracic short/long axis view. As a first step, image statistics of blood and heart muscle were investigated. Next, based on these statistics, an adaptive mean squares filter was selected and applied to the images. Window size was related to speckle size (5x2 speckles). The degree of adaptive filtering was automatically steered by the local homogeneity of tissue. As a result, discrimination of heart muscle and blood was optimized, while sharpness of edges was preserved. After this pre-processing stage, homomorphic filtering and automatic thresholding were performed to obtain the inner borders of the heart muscle. Finally, a deformable contour algorithm was used to yield a closed contour of the left ventricular cavity in each elevational plane. Each contour was optimized using contours of the surrounding planes (spatial and temporal) as a limiting condition to ensure spatial and temporal continuity. Better segmentation of the ventricle was obtained using 4D information than using information of each plane separately.
NASA Technical Reports Server (NTRS)
Balkanski, Yves J.; Jacob, Daniel J.; Gardner, Geraldine M.; Graustein, William C.; Turekian, Karl K.
1993-01-01
A global three-dimensional model is used to investigate the transport and tropospheric residence time of Pb-210, an aerosol tracer produced in the atmosphere by radioactive decay of Rn-222 emitted from soils. The model uses meteorological input with 4 deg x 5 deg horizontal resolution and 4-hour temporal resolution from the Goddard Institute for Space Studies general circulation model (GCM). It computes aerosol scavenging by convective precipitation as part of the wet convective mass transport operator in order to capture the coupling between vertical transport and rainout. Scavenging in convective precipitation accounts for 74% of the global Pb-210 sink in the model; scavenging in large-scale precipitation accounts for 12%, and scavenging in dry deposition accounts for 14%. The model captures 63% of the variance of yearly mean Pb-210 concentrations measured at 85 sites around the world with negligible mean bias, lending support to the computation of aerosol scavenging. There are, however, a number of regional and seasonal discrepancies that reflect in part anomalies in GCM precipitation. Computed residence times with respect to deposition for Pb-210 aerosol in the tropospheric column are about 5 days at southern midlatitudes and 10-15 days in the tropics; values at northern midlatitudes vary from about 5 days in winter to 10 days in summer. The residence time of Pb-210 produced in the lowest 0.5 km of atmosphere is on average four times shorter than that of Pb-210 produced in the upper atmosphere. Both model and observations indicate a weaker decrease of Pb-210 concentrations between the continental mixed layer and the free troposphere than is observed for total aerosol concentrations; an explanation is that Rn-222 is transported to high altitudes in wet convective updrafts, while aerosols and soluble precursors of aerosols are scavenged by precipitation in the updrafts. Thus Pb-210 is not simply a tracer of aerosols produced in the continental boundary layer, but
NASA Astrophysics Data System (ADS)
Heimbach, P.; Bugnion, V.
2008-12-01
We present a new and original approach to understanding the sensitivity of the Greenland ice sheet to key model parameters and environmental conditions. At the heart of this approach is the use of an adjoint ice sheet model. MacAyeal (1992) introduced adjoints in the context of applying control theory to estimate basal sliding parameters (basal shear stress, basal friction) of an ice stream model which minimize a least-squares model vs. observation misfit. Since then, this method has become widespread to fit ice stream models to the increasing number and diversity of satellite observations, and to estimate uncertain model parameters. However, no attempt has been made to extend this method to comprehensive ice sheet models. Here, we present a first step toward moving beyond limiting the use of control theory to ice stream models. We have generated an adjoint of the three-dimensional thermo-mechanical ice sheet model SICOPOLIS of Greve (1997). The adjoint was generated using the automatic differentiation (AD) tool TAF. TAF generates exact source code representing the tangent linear and adjoint model of the parent model provided. Model sensitivities are given by the partial derivatives of a scalar-valued model diagnostic or "cost function" with respect to the controls, and can be efficiently calculated via the adjoint. An effort to generate an efficient adjoint with the newly developed open-source AD tool OpenAD is also under way. To gain insight into the adjoint solutions, we explore various cost functions, such as local and domain-integrated ice temperature, total ice volume or the velocity of ice at the margins of the ice sheet. Elements of our control space include initial cold ice temperatures, surface mass balance, as well as parameters such as appear in Glen's flow law, or in the surface degree-day or basal sliding parameterizations. Sensitivity maps provide a comprehensive view, and allow a quantification of where and to which variables the ice sheet model is
Emergent exclusion statistics of quasiparticles in two-dimensional topological phases
NASA Astrophysics Data System (ADS)
Hu, Yuting; Stirling, Spencer D.; Wu, Yong-Shi
2014-03-01
We demonstrate how the generalized Pauli exclusion principle emerges for quasiparticle excitations in 2D topological phases. As an example, we examine the Levin-Wen model with the Fibonacci data (specified in the text), and construct the number operator for fluxons living on plaquettes. By numerically counting the many-body states with fluxon number fixed, the matrix of exclusion statistics parameters is identified and is shown to depend on the spatial topology (sphere or torus) of the system. Our work reveals the structure of the (many-body) Hilbert space and some general features of thermodynamics for quasiparticle excitations in topological matter.
Statistical properties of three-dimensional two-fluid plasma model
Qaisrani, M. Hasnain; Xia, ZhenWei; Zou, Dandan
2015-09-15
The nonlinear dynamics of the incompressible, non-dissipative two-fluid plasma model is investigated through classical Gibbs ensemble methods. Liouville's theorem of phase space for each wave number is proved, and the absolute equilibrium spectra for the Galerkin-truncated two-fluid model are calculated. In two-fluid theory, the equilibrium is built on the conservation of three quadratic invariants: the total energy and the self-helicities of the ion and electron fluids, respectively. The implications of statistical equilibrium spectra with arbitrary ratios of conserved invariants are discussed.
ERIC Educational Resources Information Center
Watson, Jane
2007-01-01
Inference, or decision making, is seen in curriculum documents as the final step in a statistical investigation. For a formal statistical enquiry this may be associated with sophisticated tests involving probability distributions. For young students without the mathematical background to perform such tests, it is still possible to draw informal…
NASA Astrophysics Data System (ADS)
King, Gary; Rosen, Ori; Tanner, Martin A.
2004-09-01
This collection of essays brings together a diverse group of scholars to survey the latest strategies for solving ecological inference problems in various fields. The last half-decade has witnessed an explosion of research in ecological inference--the process of trying to infer individual behavior from aggregate data. Although uncertainties and information lost in aggregation make ecological inference one of the most problematic types of research to rely on, these inferences are required in many academic fields, as well as by legislatures and the Courts in redistricting, by business in marketing research, and by governments in policy analysis.
Statistical Mechanics and Dynamics of a Three-Dimensional Glass-Forming System
NASA Astrophysics Data System (ADS)
Lerner, Edan; Procaccia, Itamar; Zylberg, Jacques
2009-03-01
In the context of a classical example of glass formation in three dimensions, we exemplify how to construct a statistical-mechanical theory of the glass transition. At the heart of the approach is a simple criterion for verifying a proper choice of upscaled quasispecies that allow the construction of a theory with a finite number of “states.” Once constructed, the theory identifies a typical scale ξ that increases rapidly with lowering the temperature and which determines the α-relaxation time τ_{α} as τ_{α} ∼ exp(μξ/T), with μ a typical chemical potential. The theory can predict relaxation times at temperatures that are inaccessible to numerical simulations.
NASA Astrophysics Data System (ADS)
Dienes, Keith R.
2006-05-01
Recent developments in string theory have reinforced the notion that the space of stable supersymmetric and nonsupersymmetric string vacua fills out a landscape whose features are largely unknown. It is then hoped that progress in extracting phenomenological predictions from string theory (such as correlations between gauge groups, matter representations, potential values of the cosmological constant, and so forth) can be achieved through statistical studies of these vacua. To date, most of the efforts in these directions have focused on type I vacua. In this note, we present the first results of a statistical study of the heterotic landscape, focusing on more than 10^{5} explicit nonsupersymmetric tachyon-free heterotic string vacua and their associated gauge groups and one-loop cosmological constants. Although this study has several important limitations, we find a number of intriguing features which may be relevant for the heterotic landscape as a whole. These features include different probabilities and correlations for different possible gauge groups as functions of the number of orbifold twists. We also find a vast degeneracy amongst nonsupersymmetric string models, leading to a severe reduction in the number of realizable values of the cosmological constant as compared with naïve expectations. Finally, we find strong correlations between cosmological constants and gauge groups which suggest that heterotic string models with extremely small cosmological constants are overwhelmingly more likely to exhibit the standard model gauge group at the string scale than any of its grand-unified extensions. In all cases, heterotic world sheet symmetries such as modular invariance provide important constraints that do not appear in corresponding studies of type I vacua.
Statistics of velocity and temperature fluctuations in two-dimensional Rayleigh-Bénard convection
NASA Astrophysics Data System (ADS)
Zhang, Yang; Huang, Yong-Xiang; Jiang, Nan; Liu, Yu-Lu; Lu, Zhi-Ming; Qiu, Xiang; Zhou, Quan
2017-08-01
We investigate fluctuations of the velocity and temperature fields in two-dimensional (2D) Rayleigh-Bénard (RB) convection by means of direct numerical simulations (DNS) over the Rayleigh number range 10^{6} ≤ Ra ≤ 10^{10} and for a fixed Prandtl number Pr = 5.3 and aspect ratio Γ = 1. Our results show that there exists a counter-gradient turbulent transport of energy from fluctuations to the mean flow both locally and globally, implying that the Reynolds stress is one of the driving mechanisms of the large-scale circulation in 2D turbulent RB convection besides the buoyancy of thermal plumes. We also find that the viscous boundary layer (BL) thicknesses near the horizontal conducting plates and near the vertical sidewalls, δ_{u} and δ_{v}, are almost the same for a given Ra, and they scale with the Rayleigh and Reynolds numbers as ∼Ra^{-0.26±0.03} and ∼Re^{-0.43±0.04}. Furthermore, the thermal BL thickness δ_{θ} defined based on the root-mean-square (rms) temperature profiles is found to agree with Prandtl-Blasius predictions from the scaling point of view. In addition, the probability density functions of turbulent energy ε_{u'} and thermal ε_{θ'} dissipation rates, calculated, respectively, within the viscous and thermal BLs, are found to be always non-log-normal and obey approximately a Bramwell-Holdsworth-Pinton distribution first introduced to characterize rare fluctuations in a confined turbulent flow and critical phenomena.
Two-dimensional wetting with binary disorder: a numerical study of the loop statistics
NASA Astrophysics Data System (ADS)
Garel, T.; Monthus, C.
2005-07-01
We numerically study the wetting (adsorption) transition of a polymer chain on a disordered substrate in 1+1 dimension. Following the Poland-Scheraga model of DNA denaturation, we use a Fixman-Freire scheme for the entropy of loops. This allows us to consider chain lengths of order N ∼ 10^{5} to 10^{6}, with 10^{4} disorder realizations. Our study is based on the statistics of loops between two contacts with the substrate, from which we define Binder-like parameters: their crossings for various sizes N allow a precise determination of the critical temperature, and their finite-size properties yield a crossover exponent φ = 1/(2-α) ≃ 0.5. We then analyse at criticality the distribution of loop length l in both regimes l ∼ O(N) and 1 ≪ l ≪ N, as well as the finite-size properties of the contact density and energy. Our conclusion is that the critical exponents for the thermodynamics are the same as those of the pure case, except for strong logarithmic corrections to scaling. The presence of these logarithmic corrections in the thermodynamics is related to a disorder-dependent logarithmic singularity that appears in the critical loop distribution in the rescaled variable λ = l/N as λ → 1.
Allen, J; Velsko, S
2009-11-16
This report explores the question of whether meaningful conclusions can be drawn regarding the transmission relationship between two microbial samples on the basis of differences observed between the two samples' respective genomes. Unlike similar forensic applications using human DNA, the rapid rate of microbial genome evolution combined with the dynamics of infectious disease require a shift in thinking on what it means for two samples to 'match' in support of a forensic hypothesis. Previous outbreaks of SARS-CoV, FMDV and HIV were examined to investigate the question of how microbial sequence data can be used to draw inferences that link two infected individuals by direct transmission. The results are counterintuitive with respect to human DNA forensic applications in that some genetic change, rather than exact matching, improves confidence in inferring direct transmission links; however, too much genetic change poses challenges, which can weaken confidence in inferred links. High rates of infection coupled with relatively weak selective pressure observed in the SARS-CoV and FMDV data lead to fairly low confidence for direct transmission links. Confidence values for forensic hypotheses increased when testing for the possibility that samples are separated by at most a few intermediate hosts. Moreover, the observed outbreak conditions support the potential to provide high confidence values for hypotheses that exclude direct transmission links. Transmission inferences are based on the total number of observed or inferred genetic changes separating two sequences rather than uniquely weighing the importance of any one genetic mismatch. Thus, inferences are surprisingly robust in the presence of sequencing errors provided the error rates are randomly distributed across all samples in the reference outbreak database and the novel sequence samples in question. When the number of observed nucleotide mutations is limited due to characteristics of the outbreak or the
Pascal, Jean-Claude; Thomas, Jean-Hugh; Li, Jing-Fang
2008-10-01
It was recently shown that the statistical errors in the measurement of the acoustic energy density by the two-microphone method in a waveguide vary little when the losses of coherence between microphones increase. To explain these intervals of uncertainty, the variance of the measurement is expressed in this paper as a function of the various energy quantities of the acoustic fields: energy densities and sound intensities. The necessary conditions to reach the lower bound are clarified. The results obtained are illustrated by an example of a one-dimensional partially coherent field, which allows one to specify the relationship between the coherence functions of the pressure and particle velocity and those of the two microphone signals.
Guo, Yu; Graber, Armin; McBurney, Robert N; Balasubramanian, Raji
2010-09-03
Data generated using 'omics' technologies are characterized by high dimensionality, where the number of features measured per subject vastly exceeds the number of subjects in the study. In this paper, we consider issues relevant in the design of biomedical studies in which the goal is the discovery of a subset of features and an associated algorithm that can predict a binary outcome, such as disease status. We compare the performance of four commonly used classifiers (K-Nearest Neighbors, Prediction Analysis for Microarrays, Random Forests and Support Vector Machines) in high-dimensionality data settings. We evaluate the effects of varying levels of signal-to-noise ratio in the dataset, imbalance in class distribution and choice of metric for quantifying performance of the classifier. To guide study design, we present a summary of the key characteristics of 'omics' data profiled in several human or animal model experiments utilizing high-content mass spectrometry and multiplexed immunoassay based techniques. The analysis of data from seven 'omics' studies revealed that the average magnitude of effect size observed in human studies was markedly lower when compared to that in animal studies. The data measured in human studies were characterized by higher biological variation and the presence of outliers. The results from simulation studies indicated that the classifier Prediction Analysis for Microarrays (PAM) had the highest power when the class conditional feature distributions were Gaussian and outcome distributions were balanced. Random Forests was optimal when feature distributions were skewed and when class distributions were unbalanced. We provide a free open-source R statistical software library (MVpower) that implements the simulation strategy proposed in this paper. No single classifier had optimal performance under all settings. Simulation studies provide useful guidance for the design of biomedical studies involving high-dimensionality data.
NASA Astrophysics Data System (ADS)
Smith, L. W.; Al-Taie, H.; Sfigakis, F.; See, P.; Lesage, A. A. J.; Xu, B.; Griffiths, J. P.; Beere, H. E.; Jones, G. A. C.; Ritchie, D. A.; Kelly, M. J.; Smith, C. G.
2014-07-01
The properties of conductance in one-dimensional (1D) quantum wires are statistically investigated using an array of 256 lithographically identical split gates, fabricated on a GaAs/AlGaAs heterostructure. All the split gates are measured during a single cooldown under the same conditions. Electron many-body effects give rise to an anomalous feature in the conductance of a one-dimensional quantum wire, known as the "0.7 structure" (or "0.7 anomaly"). To handle the large data set, a method of automatically estimating the conductance value of the 0.7 structure is developed. Large differences are observed in the strength and value of the 0.7 structure [from 0.63 to 0.84 × (2e^{2}/h)], despite the constant temperature and identical device design. Variations in the 1D potential profile are quantified by estimating the curvature of the barrier in the direction of electron transport, following a saddle-point model. The 0.7 structure appears to be highly sensitive to the specific confining potential within individual devices.
NASA Astrophysics Data System (ADS)
Dashti-Naserabadi, H.; Najafi, M. N.
2015-05-01
We consider the three-dimensional (3D) Bak-Tang-Wiesenfeld model in a cubic lattice. Along with analyzing the 3D problem, the geometrical structure of the two-dimensional (2D) cross section of waves is investigated. By analyzing the statistical observables defined in the cross sections, it is shown that the model in that plane (named the 2D-induced model) is in the critical state and fulfills the finite-size scaling hypothesis. The analysis of the critical loops that are interfaces of the 2D-induced model is of special importance in this paper. Most importantly, we see that their fractal dimension is D_{f} = 1.387 ± 0.005, which is compatible with the fractal dimension of the external perimeter of geometrical spin clusters of the 2D critical Ising model. Some hyperscaling relations between the exponents of the model are proposed and numerically confirmed. We then address the problem of conformal invariance of the mentioned domain walls using Schramm-Loewner evolution (SLE). We found that they are described by SLE with the diffusivity parameter κ = 2.8 ± 0.2, nearly consistent with the observed fractal dimension.
Viecelli, J.A.
1993-10-01
The Hamiltonian flow of a set of point vortices of like sign and strength has a low-temperature phase consisting of a rotating triangular lattice of vortices, and a normal temperature turbulent phase consisting of random clusters of vorticity that orbit about a common center along random tracks. The mean-field flow in the normal temperature phase has similarities with turbulent quasi-two-dimensional rotating laboratory and geophysical flows, whereas the low-temperature phase displays effects associated with quantum fluids. In the normal temperature phase the vortices follow power-law clustering distributions, while in the time domain random interval modulation of the vortex orbit radii fluctuations produces singular fractional exponent power-law low-frequency spectra corresponding to time autocorrelation functions with fractional exponent power-law tails. Enhanced diffusion is present in the turbulent state, whereas in the solid-body rotation state vortices thermally diffuse across the lattice. Over the entire temperature range the interaction energy of a single vortex in the field of the rest of the vortices follows positive temperature Fermi-Dirac statistics, with the zero temperature limit corresponding to the rotating crystal phase, and the infinite temperature limit corresponding to a Maxwellian distribution. Analyses of weather records dependent on the large-scale quasi-two-dimensional atmospheric circulation suggest the presence of singular fractional exponent power-law spectra and fractional exponent power-law autocorrelation tails, consistent with the theory.
Malinowski, Kathleen T.; Pantarotto, Jason R.; Senan, Suresh
2010-08-01
Purpose: To investigate the feasibility of modeling Stage III lung cancer tumor and node positions from anatomical surrogates. Methods and Materials: To localize their centroids, the primary tumor and lymph nodes from 16 Stage III lung cancer patients were contoured in 10 equal-phase planning four-dimensional (4D) computed tomography (CT) image sets. The centroids of anatomical respiratory surrogates (carina, xiphoid, nipples, mid-sternum) in each image set were also localized. The correlations between target and surrogate positions were determined, and ordinary least-squares (OLS) and partial least-squares (PLS) regression models based on a subset of respiratory phases (three to eight randomly selected) were created to predict the target positions in the remaining images. The three-phase image sets that provided the best predictive information were used to create models based on either the carina alone or all surrogates. Results: The surrogate most correlated with target motion varied widely. Depending on the number of phases used to build the models, mean OLS and PLS errors were 1.0 to 1.4 mm and 0.8 to 1.0 mm, respectively. Models trained on the 0%, 40%, and 80% respiration phases had mean (± standard deviation) PLS errors of 0.8 ± 0.5 mm and 1.1 ± 1.1 mm for models based on all surrogates and carina alone, respectively. For target coordinates with motion >5 mm, the mean three-phase PLS error based on all surrogates was 1.1 mm. Conclusions: Our results establish the feasibility of inferring primary tumor and nodal motion from anatomical surrogates in 4D CT scans of Stage III lung cancer. Using inferential modeling to decrease the processing time of 4D CT scans may facilitate incorporation of patient-specific treatment margins.
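The OLS part of the modeling described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the study's code. The data are synthetic and noise-free, and the two surrogate coordinates, the coefficient matrix `true_coef`, and the helper names are all hypothetical. Only the idea of training on three of ten respiratory phases (here phases 0, 4 and 8, standing in for 0%, 40% and 80%) and predicting the rest is taken from the abstract. PLS is not shown.

```python
import numpy as np


def design_matrix(surrogates):
    """Prepend an intercept column to the surrogate coordinates."""
    return np.column_stack([np.ones(len(surrogates)), surrogates])


def fit_ols(surrogates, targets):
    """Least-squares fit of targets ~ intercept + surrogates."""
    coef, *_ = np.linalg.lstsq(design_matrix(surrogates), targets, rcond=None)
    return coef


def predict(coef, surrogates):
    """Apply a fitted coefficient matrix to new surrogate positions."""
    return design_matrix(surrogates) @ coef


# Hypothetical, noise-free data: 10 phases, 2 surrogate coordinates,
# 3 target coordinates linearly related to the surrogates.
phases = np.arange(10)
angle = 2 * np.pi * phases / 10
surrogates = np.column_stack([np.sin(angle), np.cos(angle)])
true_coef = np.array([[1.0, -2.0, 0.5],    # intercepts
                      [0.3, 0.0, 1.2],     # slopes for surrogate 1
                      [-0.7, 0.4, 0.0]])   # slopes for surrogate 2
targets = design_matrix(surrogates) @ true_coef

train = [0, 4, 8]                           # three training phases
coef = fit_ols(surrogates[train], targets[train])
predicted = predict(coef, surrogates)       # predictions for all ten phases
max_error = np.abs(predicted - targets).max()
```

Because the synthetic targets are exactly linear in the surrogates, three phases determine the three coefficients per target coordinate and `max_error` is at the level of floating-point round-off. With real, noisy centroids the error would be nonzero, which is the quantity the abstract reports in millimetres.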
Liu, Siyuan; Davis, Joe M
2006-09-08
A theory is proposed for the dependence on saturation of the average minimum resolution R(*) in point-process statistical-overlap theory for two-dimensional separations. Peak maxima are modelled by clusters of overlapping circles in hexagonal arrangements similar to close-packed layers. Such clusters exist only for specific circle numbers, but equations are derived that facilitate prediction of equivalent cluster properties for any number of circles. A metric is proposed for the average minimum resolution that separates two such clusters into two maxima. From this metric, the average minimum resolution of the two nearest-neighbor single-component peaks (SCPs), one in each cluster, is calculated. Its value varies with the number of SCPs in both clusters. These resolutions are weighted by the probability that the two clusters contain the postulated numbers of SCPs and summed to give R(*), which decreases with increasing saturation. The dependence of R(*) on saturation is combined with a theory correcting the probability of overlap in a reduced square for boundary effects. The numbers of maxima in simulations of 75, 150, and 300 randomly distributed bi-Gaussians having exponential heights and aspect ratios of 1, 30, and 60 are compared to predictions. Excellent agreement between maxima numbers and theory is found at low and high saturation. Good estimates of the numbers of bi-Gaussians in simulations are calculated by fitting theory to numbers of maxima using least-squares regression. The theory is applied to mimicked GC x GCs of 93 compounds having many correlated retention times, with predictions that agree fairly well with maxima numbers.
NASA Astrophysics Data System (ADS)
Fyodorov, Yan V.; Bouchaud, Jean-Philippe
2008-08-01
We construct an N-dimensional Gaussian landscape with multiscale, translation invariant, logarithmic correlations and investigate the statistical mechanics of a single particle in this environment. In the limit of high dimension N → ∞ the free energy of the system and overlap function are calculated exactly using the replica trick and Parisi's hierarchical ansatz. In the thermodynamic limit, we recover the most general version of Derrida's generalized random energy model (GREM). The low-temperature behaviour depends essentially on the spectrum of length scales involved in the construction of the landscape. If the latter consists of K discrete values, the system is characterized by a K-step replica symmetry breaking solution. We argue that our construction is in fact valid in any finite spatial dimension N ≥ 1. We discuss the implications of our results for the singularity spectrum describing multifractality of the associated Boltzmann-Gibbs measure. Finally we discuss several generalizations and open problems, such as the dynamics in such a landscape and the construction of a generalized multifractal random walk.
NASA Astrophysics Data System (ADS)
Zhang, Honghai; Walker, Nicholas; Mitchell, Steven C.; Thomas, Matthew; Wahle, Andreas; Scholz, Thomas; Sonka, Milan
2006-03-01
Conventional analysis of cardiac ventricular magnetic resonance images is performed using short axis images and does not guarantee completeness and consistency of the ventricle coverage. In this paper, a four-dimensional (4D, 3D+time) left and right ventricle statistical shape model was generated from the combination of the long axis and short axis images. Iterative mutual intensity registration and interpolation were used to merge the long axis and short axis images into isotropic 4D images and simultaneously correct existing breathing artifact. Distance-based shape interpolation and approximation were used to generate complete ventricle shapes from the long axis and short axis manual segmentations. Landmarks were automatically generated and propagated to 4D data samples using rigid alignment, distance-based merging, and B-spline transform. Principal component analysis (PCA) was used in model creation and analysis. The two strongest modes of the shape model captured the most important shape feature of Tetralogy of Fallot (TOF) patients, right ventricle enlargement. Classification of cardiac images into classes of normal and TOF subjects performed on 3D and 4D models showed 100% classification correctness rates for both normal and TOF subjects using k-Nearest Neighbor (k=1 or 3) classifier and the two strongest shape modes.
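The final classification step (PCA shape modes followed by a k-Nearest Neighbor vote) can be sketched in a few lines of Python. Everything concrete here is an assumption for illustration: the random "shape vectors", the size of the class offset, and the helper names `pca_scores` and `knn_predict` are hypothetical and unrelated to the paper's landmark data.

```python
import numpy as np

def pca_scores(shapes, n_modes=2):
    """Project mean-centred shape vectors onto the strongest PCA modes."""
    centred = shapes - shapes.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_modes].T

def knn_predict(train_x, train_y, query, k=1):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    dist = np.linalg.norm(train_x - query, axis=1)
    nearest = train_y[np.argsort(dist)[:k]]
    return np.bincount(nearest).argmax()

# Hypothetical data: 20 "normal" and 20 "enlarged" shapes, 30 coordinates each.
rng = np.random.default_rng(0)
direction = np.ones(30) / np.sqrt(30)
normal = rng.normal(0.0, 1.0, size=(20, 30))
enlarged = rng.normal(0.0, 1.0, size=(20, 30)) + 20.0 * direction
shapes = np.vstack([normal, enlarged])
labels = np.array([0] * 20 + [1] * 20)

scores = pca_scores(shapes, n_modes=2)

# Leave-one-out classification in the space of the two strongest modes.
correct = 0
for i in range(len(shapes)):
    keep = np.arange(len(shapes)) != i
    correct += knn_predict(scores[keep], labels[keep], scores[i], k=3) == labels[i]
accuracy = correct / len(shapes)
```

For brevity the PCA basis is computed once from all samples, including the one left out; a stricter evaluation would refit the basis inside the loop.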
NASA Astrophysics Data System (ADS)
Durand, Marc; Kraynik, Andrew M.; van Swol, Frank; Käfer, Jos; Quilliet, Catherine; Cox, Simon; Ataei Talebi, Shirin; Graner, François
2014-06-01
Bubble monolayers are model systems for experiments and simulations of two-dimensional packing problems of deformable objects. We explore the relation between the distributions of the number of bubble sides (topology) and the bubble areas (geometry) in the low liquid fraction limit. We use a statistical model [M. Durand, Europhys. Lett. 90, 60002 (2010), 10.1209/0295-5075/90/60002] which takes into account Plateau laws. We predict the correlation between geometrical disorder (bubble size dispersity) and topological disorder (width of bubble side number distribution) over an extended range of bubble size dispersities. Extensive data sets arising from shuffled foam experiments, surface evolver simulations, and cellular Potts model simulations all collapse surprisingly well and coincide with the model predictions, even at extremely high size dispersity. At moderate size dispersity, we recover our earlier approximate predictions [M. Durand, J. Kafer, C. Quilliet, S. Cox, S. A. Talebi, and F. Graner, Phys. Rev. Lett. 107, 168304 (2011), 10.1103/PhysRevLett.107.168304]. At extremely low dispersity, when approaching the perfectly regular honeycomb pattern, we study how both geometrical and topological disorders vanish. We identify a crystallization mechanism and explore it quantitatively in the case of bidisperse foams. Due to the deformability of the bubbles, foams can crystallize over a larger range of size dispersities than hard disks. The model predicts that the crystallization transition occurs when the ratio of largest to smallest bubble radii is 1.4.
NASA Astrophysics Data System (ADS)
Stock, Eduardo Velasco; da Silva, Roberto; Fernandes, H. A.
2017-07-01
In this paper, we propose a stochastic model which describes two species of particles moving in counterflow. The model generalizes the theoretical framework that describes the transport in random systems by taking into account two different scenarios: particles can work as mobile obstacles, whereas particles of one species move in the opposite direction to the particles of the other species, or particles of a given species work as fixed obstacles remaining in their places during the time evolution. We conduct a detailed study of the statistics of the crossing time of particles, as well as the effects of the lateral transitions on the time required for the system to reach a state of complete geographic separation of species. The spatial effects of jamming are also studied by looking into the deformation of the concentration of particles in the two-dimensional corridor. Finally, we observe in our study the formation of patterns of lanes which reach the steady state regardless of the initial conditions used for the evolution. A similar result is also observed in real experiments involving the motion of charged colloids and in simulations of pedestrian dynamics based on Langevin equations, when periodic boundary conditions are considered (particles counterflow in a ring symmetry). The results obtained through Monte Carlo simulations and numerical integrations are in good agreement with each other. However, unlike previous studies, the dynamics considered in this work are not Newton-based, and therefore, even artificial situations of self-propelled objects should be studied in this first-principles modeling.
NASA Astrophysics Data System (ADS)
Romé, M.; Lepreti, F.; Maero, G.; Pozzoli, R.; Vecchio, A.; Carbone, V.
2013-03-01
Highly magnetized, pure electron plasmas confined in a Penning-Malmberg trap allow one to perform experiments on the two-dimensional (2D) fluid dynamics under conditions where non-ideal effects are almost negligible. Recent results on the freely decaying 2D turbulence obtained from experiments with electron plasmas performed in the Penning-Malmberg trap ELTRAP are presented. The analysis has been applied to experimental sequences with different types of initial density distributions. The dynamical properties of the system have been investigated by means of wavelet transforms and Proper Orthogonal Decomposition (POD). The wavelet analysis shows that most of the enstrophy is contained at spatial scales corresponding to the typical size of the persistent vortices in the 2D electron plasma flow. The POD analysis allows one to identify the coherent structures which give the dominant contribution to the plasma evolution. The statistical properties of the turbulence have been investigated by means of Probability Density Functions (PDFs) and structure functions of spatial vorticity increments. The analysis evidences how the shape and evolution of the dominant coherent structures and the intermittency properties of the turbulence strongly depend on the initial conditions for the electron density.
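Proper Orthogonal Decomposition, as used in this abstract to extract dominant coherent structures, is commonly computed from the singular value decomposition of a snapshot matrix. The sketch below assumes that approach; the one-dimensional synthetic field, the noise level, and the function name `pod` are illustrative assumptions and do not reproduce the ELTRAP analysis.

```python
import numpy as np

def pod(snapshots):
    """POD of a (n_times, n_points) snapshot matrix via the SVD.

    Returns the spatial modes (as rows), the temporal coefficients, and the
    fraction of fluctuation energy captured by each mode.
    """
    fluct = snapshots - snapshots.mean(axis=0)        # remove the time mean
    u, s, vt = np.linalg.svd(fluct, full_matrices=False)
    energy = s**2 / np.sum(s**2)
    return vt, u * s, energy

# Hypothetical 1D record: one dominant structure oscillating in time, plus weak noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 64)
t = np.linspace(0.0, 1.0, 50)
structure = np.exp(-((x - 0.5) ** 2) / 0.01)
field = np.outer(np.cos(2 * np.pi * t), structure)
field += 0.01 * rng.normal(size=field.shape)

modes, coeffs, energy = pod(field)
```

Here `coeffs @ modes` rebuilds the mean-subtracted field, and `energy[0]` close to one indicates a single dominant structure; for 2D images each frame would first be flattened into one row of the snapshot matrix.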
Coory, M
2008-04-01
The aim of statistical analyses in cluster investigations is to estimate the probability that the aggregation of cases could be due to chance. As a result of several statistical problems - including the post-hoc nature of the analysis and the subjective nature of implied multiple comparisons - this cannot be carried out with any certainty. In cluster investigations, expert opinion should carry much more weight than P-values, which are exceedingly difficult to interpret.
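A small Python sketch can show why implied multiple comparisons make such P-values hard to interpret. It assumes case counts are Poisson distributed and that comparisons are independent; the counts used and the function names `poisson_tail` and `any_of_m` are hypothetical.

```python
import math

def poisson_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    below = sum(math.exp(-expected) * expected**i / math.factorial(i)
                for i in range(observed))
    return 1.0 - below

def any_of_m(p_single, m):
    """Chance that at least one of m independent comparisons is this extreme."""
    return 1.0 - (1.0 - p_single) ** m

# Hypothetical numbers: 8 cases observed where 3 were expected.
p = poisson_tail(8, 3.0)
# The same excess, if it could have been noticed in any of 100 areas or periods.
p_100 = any_of_m(p, 100)
```

Under these assumptions the single-comparison probability is small, but the chance of seeing such an excess somewhere among 100 candidate areas or periods is large. Because the number of implied comparisons in a post-hoc investigation is not known, the appropriate correction is not known either.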
NASA Astrophysics Data System (ADS)
Nair, Anish Kumar M.; Rajeev, Kunjukrishnapillai
2012-07-01
Long-term (2006-2011) monthly and seasonal mean vertical distributions of clouds and their spatial variations over the Indian subcontinent and surrounding oceanic regions have been derived using data obtained from the space-borne radar, CloudSat. Together with the data from space-borne imagers (Kalpana-1-VHRR and NOAA-AVHRR), this provides insight into the 3-dimensional distribution of clouds and its linkage with dominant tropical dynamical features, which are largely unexplored over the Indian region. Meridional cross sections of ITCZ, inferred from the vertical distribution of clouds, clearly reveal the relatively narrow structure of ITCZ flanked by thick cirrus outflows in the upper troposphere on either side. The base of cirrus clouds in the outflow region significantly increases away from the ITCZ core, while the corresponding variation in cirrus top is negligible, resulting in considerable thinning of cirrus away from the ITCZ. This provides direct observational evidence for the infrared radiative heating at cloud base and its role in regulating the cirrus lifetime through sublimation. On average, the frequency of occurrence of clouds rapidly decreases with altitude in the altitude band of 12-14 km, which corresponds to the convective tropopause altitude. North-south inclination and east-west asymmetry of ITCZ during the winter season are distinctly clear in the vertical distribution of clouds, which provide information on the pathways for inter-hemispheric transport over the Indian Ocean during this season. During the Asian summer monsoon season (June-September), substantial amounts of deep convective clouds are found to occur over the North Bay of Bengal, extending up to an altitude of >14 km, which is ~1-2 km higher than that over other deep convective regions. This has potential implications in the pumping of tropospheric airmass across the tropical tropopause over the region. This study characterizes a pool of inhibited cloudiness over the southwest Bay of
Inverse Ising inference with correlated samples
NASA Astrophysics Data System (ADS)
Obermayer, Benedikt; Levine, Erel
2014-12-01
Correlations between two variables of a high-dimensional system can be indicative of an underlying interaction, but can also result from indirect effects. Inverse Ising inference is a method to distinguish one from the other. Essentially, the parameters of the least constrained statistical model are learned from the observed correlations such that direct interactions can be separated from indirect correlations. Among many other applications, this approach has been helpful for protein structure prediction, because residues which interact in the 3D structure often show correlated substitutions in a multiple sequence alignment. In this context, samples used for inference are not independent but share an evolutionary history on a phylogenetic tree. Here, we discuss the effects of correlations between samples on global inference. Such correlations could arise due to phylogeny but also via other slow dynamical processes. We present a simple analytical model to address the resulting inference biases, and develop an exact method accounting for background correlations in alignment data by combining phylogenetic modeling with an adaptive cluster expansion algorithm. We find that popular reweighting schemes are only marginally effective at removing phylogenetic bias, suggest a rescaling strategy that yields better results, and provide evidence that our conclusions carry over to the frequently used mean-field approach to the inverse Ising problem.
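The mean-field approach mentioned at the end of this abstract is often implemented in its naive form, where the couplings are taken as the negative inverse of the connected correlation matrix. The Python sketch below assumes that form and, unlike the paper, treats the samples as independent (no phylogenetic correction or reweighting); the ridge term, the toy three-spin data, and the name `mean_field_couplings` are illustrative assumptions.

```python
import numpy as np

def mean_field_couplings(samples, ridge=1e-3):
    """Naive mean-field estimate J = -C^{-1} (off-diagonal) from +/-1 samples."""
    centred = samples - samples.mean(axis=0)
    corr = centred.T @ centred / len(samples)     # connected correlations C_ij
    corr += ridge * np.eye(corr.shape[0])         # small ridge keeps C invertible
    couplings = -np.linalg.inv(corr)
    np.fill_diagonal(couplings, 0.0)              # self-couplings are not defined
    return couplings

# Hypothetical samples: spin 1 copies spin 0 with probability 0.9,
# spin 2 is independent of both.
rng = np.random.default_rng(0)
m = 5000
s0 = rng.choice([-1, 1], size=m)
flip = rng.random(m) < 0.1
s1 = np.where(flip, -s0, s0)
s2 = rng.choice([-1, 1], size=m)
samples = np.column_stack([s0, s1, s2]).astype(float)

J = mean_field_couplings(samples)
```

In this toy case the inferred coupling between spins 0 and 1 is strongly positive while couplings to the independent spin stay near zero. Samples sharing a common history would violate the independence assumed here, which is the bias the abstract addresses.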
NASA Astrophysics Data System (ADS)
Mackay, Robert Malcolm
The two-dimensional statistical dynamical climate model that has recently been developed at the Global Change Research Center and the Oregon Graduate Institute of Science & Technology (GCRC 2D climate model) is presented and several new results obtained using the model are discussed. The model solves the 2-D primitive equations in finite difference form (mass continuity, Newton's second law, and the first law of thermodynamics) for the prognostic variables zonal mean density, zonal mean zonal velocity, zonal mean meridional velocity, and zonal mean temperature on a grid that has 18 nodes in latitude and 9 vertical nodes (plus the surface). The equation of state, p = ρRT, and an assumed hydrostatic atmosphere, Δp = −ρgΔz, are used to diagnostically calculate the zonal mean pressure and vertical velocity for each grid node, and the moisture balance equation is used to estimate the precipitation rate. The performance of the model at simulating the two-dimensional temperature, zonal winds, and mass stream function is explored. The strengths and weaknesses of the model are highlighted and suggestions for future model improvements are given. The parameterization of the transient eddy fluxes of heat and momentum developed by Stone and Yao (1987 and 1990) are used with small modifications. These modifications are shown to help the performance of the model at simulating the observed climate system as well as increase the model's computational stability. Following earlier work that analyzed the response of the zonal wind fields predicted by three GCM simulations for a doubling of atmospheric CO2, the response of the GCRC 2D model's zonal wind fields is also explored for the same experiment. Unlike the GCM simulations, our 2D model results in distinct patterns of change. It is suggested that the observed changes in zonal winds for the 2×CO2 experiment are related to the increase in the upper level temperature gradients predicted by our model and most climate
van IJsseldijk, E. A.; Valstar, E. R.; Stoel, B. C.; Nelissen, R. G. H. H.; Baka, N.; van’t Klooster, R.
2016-01-01
t Klooster, B. L. Kaptein. Three dimensional measurement of minimum joint space width in the knee from stereo radiographs using statistical shape models. Bone Joint Res 2016;320–327. DOI: 10.1302/2046-3758.58.2000626. PMID:27491660