Wijeysundera, Duminda N; Austin, Peter C; Hux, Janet E; Beattie, W Scott; Laupacis, Andreas
2009-01-01
Randomized trials generally use "frequentist" statistics based on P-values and 95% confidence intervals. Frequentist methods have limitations that might be overcome, in part, by Bayesian inference. To illustrate these advantages, we re-analyzed randomized trials published in four general medical journals during 2004. We used Medline to identify randomized superiority trials with two parallel arms, individual-level randomization and dichotomous or time-to-event primary outcomes. Studies with P<0.05 in favor of the intervention were deemed "positive"; otherwise, they were "negative." We used several prior distributions and exact conjugate analyses to calculate Bayesian posterior probabilities for clinically relevant effects. Of 88 included studies, 39 were positive using a frequentist analysis. Although the Bayesian posterior probabilities of any benefit (relative risk or hazard ratio<1) were high in positive studies, these probabilities were lower and variable for larger benefits. The positive studies had only moderate probabilities for exceeding the effects that were assumed for calculating the sample size. By comparison, there were moderate probabilities of any benefit in negative studies. Bayesian and frequentist analyses complement each other when interpreting the results of randomized trials. Future reports of randomized trials should include both.
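As a rough illustration of the kind of conjugate calculation this abstract describes, the sketch below updates a normal prior on the log relative risk with an approximate normal likelihood and reads off posterior probabilities of "any benefit" and of a larger benefit. The trial summary numbers and the prior are invented for illustration; they are not taken from any of the re-analyzed trials.

```python
import numpy as np
from scipy import stats

# Hypothetical trial summary (illustrative only): estimated log relative risk and its
# standard error from a two-arm trial.
log_rr_hat, se = np.log(0.80), 0.12

# Conjugate normal prior on the log relative risk; sd = 10 approximates a vague prior.
prior_mean, prior_sd = 0.0, 10.0

# Normal-normal conjugate update (precision-weighted average).
post_prec = 1 / prior_sd**2 + 1 / se**2
post_mean = (prior_mean / prior_sd**2 + log_rr_hat / se**2) / post_prec
post_sd = np.sqrt(1 / post_prec)

# Posterior probability of "any benefit" (RR < 1) and of a larger benefit (RR < 0.75).
p_any_benefit = stats.norm.cdf(0.0, loc=post_mean, scale=post_sd)
p_large_benefit = stats.norm.cdf(np.log(0.75), loc=post_mean, scale=post_sd)
print(p_any_benefit, p_large_benefit)
```

With an informative or skeptical prior substituted for the vague one, the same two lines of updating show how the posterior probability of a large benefit can drop even when the probability of any benefit stays high, which is the pattern the abstract reports.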
Krefeld-Schwalb, Antonia; Witte, Erich H.; Zenker, Frank
2018-01-01
In psychology as elsewhere, the main statistical inference strategy to establish empirical effects is null-hypothesis significance testing (NHST). The recent failure to replicate allegedly well-established NHST-results, however, implies that such results lack sufficient statistical power, and thus feature unacceptably high error-rates. Using data-simulation to estimate the error-rates of NHST-results, we advocate the research program strategy (RPS) as a superior methodology. RPS integrates Frequentist with Bayesian inference elements, and leads from a preliminary discovery against a (random) H0-hypothesis to a statistical H1-verification. Not only do RPS-results feature significantly lower error-rates than NHST-results, RPS also addresses key-deficits of a “pure” Frequentist and a standard Bayesian approach. In particular, RPS aggregates underpowered results safely. RPS therefore provides a tool to regain the trust the discipline had lost during the ongoing replicability-crisis.
Advances in Bayesian Modeling in Educational Research
ERIC Educational Resources Information Center
Levy, Roy
2016-01-01
In this article, I provide a conceptually oriented overview of Bayesian approaches to statistical inference and contrast them with frequentist approaches that currently dominate conventional practice in educational research. The features and advantages of Bayesian approaches are illustrated with examples spanning several statistical modeling…
Suggestions for presenting the results of data analyses
Anderson, David R.; Link, William A.; Johnson, Douglas H.; Burnham, Kenneth P.
2001-01-01
We give suggestions for the presentation of research results from frequentist, information-theoretic, and Bayesian analysis paradigms, followed by several general suggestions. The information-theoretic and Bayesian methods offer alternative approaches to data analysis and inference compared to traditionally used methods. Guidance is lacking on the presentation of results under these alternative procedures and on nontesting aspects of classical frequentist methods of statistical analysis. Null hypothesis testing has come under intense criticism. We recommend less reporting of the results of statistical tests of null hypotheses in cases where the null is surely false anyway, or where the null hypothesis is of little interest to science or management.
Zonta, Zivko J; Flotats, Xavier; Magrí, Albert
2014-08-01
The procedure commonly used for the assessment of the parameters included in activated sludge models (ASMs) relies on the estimation of their optimal value within a confidence region (i.e. frequentist inference). Once optimal values are estimated, parameter uncertainty is computed through the covariance matrix. However, alternative approaches based on the consideration of the model parameters as probability distributions (i.e. Bayesian inference) may be of interest. The aim of this work is to apply (and compare) both Bayesian and frequentist inference methods when assessing uncertainty for an ASM-type model that considers intracellular storage and biomass growth simultaneously. Practical identifiability was addressed exclusively considering respirometric profiles based on the oxygen uptake rate and with the aid of probabilistic global sensitivity analysis. Parameter uncertainty was thus estimated according to both the Bayesian and frequentist inferential procedures. Results were compared in order to highlight the strengths and weaknesses of both approaches. Since it was demonstrated that Bayesian inference can be reduced to a frequentist approach under particular hypotheses, the former can be considered the more general methodology. Hence, the use of Bayesian inference is encouraged for tackling inferential issues in ASM environments.
Love, Jeffrey J.
2012-01-01
Statistical analysis is made of rare, extreme geophysical events recorded in historical data, counting the number of events $k$ with sizes that exceed chosen thresholds during specific durations of time $\tau$. Under transformations that stabilize data and model-parameter variances, the most likely Poisson-event occurrence rate, $k/\tau$, applies for frequentist inference and, also, for Bayesian inference with a Jeffreys prior that ensures posterior invariance under changes of variables. Frequentist confidence intervals and Bayesian (Jeffreys) credibility intervals are approximately the same and easy to calculate: $(1/\tau)[(\sqrt{k} - z/2)^{2}, (\sqrt{k} + z/2)^{2}]$, where $z$ is a parameter that specifies the width, $z=1$ ($z=2$) corresponding to $1\sigma$, $68.3\%$ ($2\sigma$, $95.4\%$). If only a few events have been observed, as is usually the case for extreme events, then these "error-bar" intervals might be considered to be relatively wide. From historical records, we estimate most likely long-term occurrence rates, 10-yr occurrence probabilities, and intervals of frequentist confidence and Bayesian credibility for large earthquakes, explosive volcanic eruptions, and magnetic storms.
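The interval quoted in the abstract is simple enough to compute directly. The sketch below implements that formula; the event counts and observation window are made-up numbers.

```python
import numpy as np

def poisson_rate_interval(k, tau, z=1.0):
    """Variance-stabilized interval for a Poisson occurrence rate k/tau, as quoted
    in the abstract: (1/tau) * [(sqrt(k) - z/2)**2, (sqrt(k) + z/2)**2]."""
    lo = max(np.sqrt(k) - z / 2.0, 0.0) ** 2 / tau
    hi = (np.sqrt(k) + z / 2.0) ** 2 / tau
    return lo, hi

# Example: 4 extreme events observed in 100 years (made-up numbers).
print(poisson_rate_interval(4, 100.0, z=1.0))  # ~ (0.0225, 0.0625) events per year
print(poisson_rate_interval(4, 100.0, z=2.0))  # ~ (0.01, 0.09) events per year
```

As the abstract notes, with only a handful of observed events these intervals widen quickly as $k$ decreases.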
Frequentist Model Averaging in Structural Equation Modelling.
Jin, Shaobo; Ankargren, Sebastian
2018-06-04
Model selection from a set of candidate models plays an important role in many structural equation modelling applications. However, traditional model selection methods introduce extra randomness that is not accounted for by post-model selection inference. In the current study, we propose a model averaging technique within the frequentist statistical framework. Instead of selecting an optimal model, the contributions of all candidate models are acknowledged. Valid confidence intervals and a test statistic are proposed. A simulation study shows that the proposed method is able to produce a robust mean-squared error, a better coverage probability, and a better goodness-of-fit test compared to model selection. It is an interesting compromise between model selection and the full model.
Can the behavioral sciences self-correct? A social epistemic study.
Romero, Felipe
2016-12-01
Advocates of the self-corrective thesis argue that scientific method will refute false theories and find closer approximations to the truth in the long run. I discuss a contemporary interpretation of this thesis in terms of frequentist statistics in the context of the behavioral sciences. First, I identify experimental replications and systematic aggregation of evidence (meta-analysis) as the self-corrective mechanism. Then, I present a computer simulation study of scientific communities that implement this mechanism to argue that frequentist statistics may converge upon a correct estimate or not depending on the social structure of the community that uses it. Based on this study, I argue that methodological explanations of the "replicability crisis" in psychology are limited and propose an alternative explanation in terms of biases. Finally, I conclude suggesting that scientific self-correction should be understood as an interaction effect between inference methods and social structures. Copyright © 2016 Elsevier Ltd. All rights reserved.
Bayes in biological anthropology.
Konigsberg, Lyle W; Frankenberg, Susan R
2013-12-01
In this article, we both contend and illustrate that biological anthropologists, particularly in the Americas, often think like Bayesians but act like frequentists when it comes to analyzing a wide variety of data. In other words, while our research goals and perspectives are rooted in probabilistic thinking and rest on prior knowledge, we often proceed to use statistical hypothesis tests and confidence interval methods unrelated (or tenuously related) to the research questions of interest. We advocate for applying Bayesian analyses to a number of different bioanthropological questions, especially since many of the programming and computational challenges to doing so have been overcome in the past two decades. To facilitate such applications, this article explains Bayesian principles and concepts, and provides concrete examples of Bayesian computer simulations and statistics that address questions relevant to biological anthropology, focusing particularly on bioarchaeology and forensic anthropology. It also simultaneously reviews the use of Bayesian methods and inference within the discipline to date. This article is intended to act as a primer to Bayesian methods and inference in biological anthropology, explaining the relationships of various methods to likelihoods or probabilities and to classical statistical models. Our contention is not that traditional frequentist statistics should be rejected outright, but that there are many situations where biological anthropology is better served by taking a Bayesian approach. To this end it is hoped that the examples provided in this article will assist researchers in choosing from among the broad array of statistical methods currently available. Copyright © 2013 Wiley Periodicals, Inc.
IMNN: Information Maximizing Neural Networks
NASA Astrophysics Data System (ADS)
Charnock, Tom; Lavaux, Guilhem; Wandelt, Benjamin D.
2018-04-01
This software trains artificial neural networks to find non-linear functionals of data that maximize Fisher information: information maximizing neural networks (IMNNs). Compressing large data sets to a manageable number of summaries vastly simplifies both frequentist and Bayesian inference, but heuristically chosen summaries may inadvertently miss important information. Likelihood-free inference based on automatically derived IMNN summaries produces summaries that are good approximations to sufficient statistics. IMNNs are robustly capable of automatically finding optimal, non-linear summaries of the data even in cases where linear compression fails: inferring the variance of Gaussian signal in the presence of noise, inferring cosmological parameters from mock simulations of the Lyman-α forest in quasar spectra, and inferring frequency-domain parameters from LISA-like detections of gravitational waveforms. In this final case, the IMNN summary outperforms linear data compression by avoiding the introduction of spurious likelihood maxima.
Certified Reduced Basis Model Characterization: a Frequentistic Uncertainty Framework
2011-01-11
It then follows that the Legendre coefficient random vector, $(Z[0], Z[1], \ldots, Z[I])(\omega)$, is $(I+1)$-variate normally distributed with mean (δ...). Note that each two-sided inequality represents two constraints. PDE-based statistical inference: we now proceed to the parametrized partial... appearance of defects or geometric variations relative to an initial baseline, or perhaps manufacturing departures from nominal specifications; if our...
[Bayesian statistics in medicine -- part II: main applications and inference].
Montomoli, C; Nichelatti, M
2008-01-01
Bayesian statistics is not only used when dealing with 2-way tables; it can also be used for inferential purposes. Using the basic concepts presented in the first part, this paper aims to give a simple overview of Bayesian methods by introducing their foundation (Bayes' theorem) and then applying this rule to a very simple practical example; whenever possible, the elementary processes at the basis of the analysis are compared to those of frequentist (classical) statistical analysis. Bayesian reasoning is naturally connected to medical activity, since it appears to be quite similar to a diagnostic process.
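A minimal worked example of "applying this rule" in a diagnostic setting: Bayes' theorem turns a test's sensitivity and specificity plus a disease prevalence into the post-test probability of disease. The numbers are assumed for illustration, not taken from the paper.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Bayes' theorem for a diagnostic test: P(disease | positive test result)."""
    p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    return sensitivity * prevalence / p_positive

# Assumed numbers: a fairly accurate test applied to a rare condition.
print(positive_predictive_value(prevalence=0.01, sensitivity=0.95, specificity=0.90))
# ~0.09: even after a positive result, the disease probability stays below 10%.
```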
Rodgers, Joseph Lee
2016-01-01
The Bayesian-frequentist debate typically portrays these statistical perspectives as opposing views. However, both Bayesian and frequentist statisticians have expanded their epistemological basis away from a singular focus on the null hypothesis, to a broader perspective involving the development and comparison of competing statistical/mathematical models. For frequentists, statistical developments such as structural equation modeling and multilevel modeling have facilitated this transition. For Bayesians, the Bayes factor has facilitated this transition. The Bayes factor is treated in articles within this issue of Multivariate Behavioral Research. The current presentation provides brief commentary on those articles and more extended discussion of the transition toward a modern modeling epistemology. In certain respects, Bayesians and frequentists share common goals.
Bayesian data analysis in population ecology: motivations, methods, and benefits
Dorazio, Robert
2016-01-01
During the 20th century ecologists largely relied on the frequentist system of inference for the analysis of their data. However, in the past few decades ecologists have become increasingly interested in the use of Bayesian methods of data analysis. In this article I provide guidance to ecologists who would like to decide whether Bayesian methods can be used to improve their conclusions and predictions. I begin by providing a concise summary of Bayesian methods of analysis, including a comparison of differences between Bayesian and frequentist approaches to inference when using hierarchical models. Next I provide a list of problems where Bayesian methods of analysis may arguably be preferred over frequentist methods. These problems are usually encountered in analyses based on hierarchical models of data. I describe the essentials required for applying modern methods of Bayesian computation, and I use real-world examples to illustrate these methods. I conclude by summarizing what I perceive to be the main strengths and weaknesses of using Bayesian methods to solve ecological inference problems.
"Magnitude-based inference": a statistical review.
Welsh, Alan H; Knight, Emma J
2015-04-01
We consider "magnitude-based inference" and its interpretation by examining in detail its use in the problem of comparing two means. We extract from the spreadsheets, which are provided to users of the analysis (http://www.sportsci.org/), a precise description of how "magnitude-based inference" is implemented. We compare the implemented version of the method with general descriptions of it and interpret the method in familiar statistical terms. We show that "magnitude-based inference" is not a progressive improvement on modern statistics. The additional probabilities introduced are not directly related to the confidence interval but, rather, are interpretable either as P values for two different nonstandard tests (for different null hypotheses) or as approximate Bayesian calculations, which also lead to a type of test. We also discuss sample size calculations associated with "magnitude-based inference" and show that the substantial reduction in sample sizes claimed for the method (30% of the sample size obtained from standard frequentist calculations) is not justifiable so the sample size calculations should not be used. Rather than using "magnitude-based inference," a better solution is to be realistic about the limitations of the data and use either confidence intervals or a fully Bayesian analysis.
[Bayesian approach for the cost-effectiveness evaluation of healthcare technologies].
Berchialla, Paola; Gregori, Dario; Brunello, Franco; Veltri, Andrea; Petrinco, Michele; Pagano, Eva
2009-01-01
The development of Bayesian statistical methods for the assessment of the cost-effectiveness of health care technologies is reviewed. Although many studies adopt a frequentist approach, several authors have advocated the use of Bayesian methods in health economics. Emphasis has been placed on the advantages of the Bayesian approach, which include: (i) the ability to make more intuitive and meaningful inferences; (ii) the ability to tackle complex problems, such as allowing for the inclusion of patients who generate no cost, thanks to the availability of powerful computational algorithms; (iii) the importance of a full use of quantitative and structural prior information to produce realistic inferences. Much literature comparing the cost-effectiveness of two treatments is based on the incremental cost-effectiveness ratio. However, new methods are arising with the purpose of decision making. These methods are based on a net benefits approach. In the present context, the cost-effectiveness acceptability curves have been pointed out to be intrinsically Bayesian in their formulation. They plot the probability of a positive net benefit against the threshold cost of a unit increase in efficacy. A case study is presented in order to illustrate Bayesian statistics in cost-effectiveness analysis. Emphasis is placed on the cost-effectiveness acceptability curves. Advantages and disadvantages of the method described in this paper are compared to those of frequentist methods and discussed.
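A cost-effectiveness acceptability curve of the kind described can be traced directly from posterior draws of incremental cost and incremental effectiveness. In the sketch below those draws are simulated rather than taken from a fitted model, and all values are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for posterior draws of incremental effectiveness (e.g. QALYs) and incremental
# cost; in a real analysis these would come from the fitted Bayesian model.
d_effect = rng.normal(0.05, 0.03, 10_000)
d_cost = rng.normal(1500.0, 600.0, 10_000)

# Cost-effectiveness acceptability curve: P(net benefit > 0) as a function of the
# willingness-to-pay threshold (the threshold cost of a unit increase in efficacy).
for wtp in (10_000, 30_000, 50_000, 80_000):
    net_benefit = wtp * d_effect - d_cost
    print(wtp, np.mean(net_benefit > 0))
```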
Kruschke, John K; Liddell, Torrin M
2018-02-01
In the practice of data analysis, there is a conceptual distinction between hypothesis testing, on the one hand, and estimation with quantified uncertainty on the other. Among frequentists in psychology, a shift of emphasis from hypothesis testing to estimation has been dubbed "the New Statistics" (Cumming 2014). A second conceptual distinction is between frequentist methods and Bayesian methods. Our main goal in this article is to explain how Bayesian methods achieve the goals of the New Statistics better than frequentist methods. The article reviews frequentist and Bayesian approaches to hypothesis testing and to estimation with confidence or credible intervals. The article also describes Bayesian approaches to meta-analysis, randomized controlled trials, and power analysis.
NASA Astrophysics Data System (ADS)
Hincks, Ian; Granade, Christopher; Cory, David G.
2018-01-01
The analysis of photon count data from the standard nitrogen vacancy (NV) measurement process is treated as a statistical inference problem. This has applications toward gaining better and more rigorous error bars for tasks such as parameter estimation (e.g. magnetometry), tomography, and randomized benchmarking. We start by providing a summary of the standard phenomenological model of the NV optical process in terms of Lindblad jump operators. This model is used to derive random variables describing emitted photons during measurement, to which finite visibility, dark counts, and imperfect state preparation are added. NV spin-state measurement is then stated as an abstract statistical inference problem consisting of an underlying biased coin obstructed by three Poisson rates. Relevant frequentist and Bayesian estimators are provided, discussed, and quantitatively compared. We show numerically that the risk of the maximum likelihood estimator is well approximated by the Cramér-Rao bound, for which we provide a simple formula. Of the estimators, we in particular promote the Bayes estimator, owing to its slightly better risk performance, and straightforward error propagation into more complex experiments. This is illustrated on experimental data, where quantum Hamiltonian learning is performed and cross-validated in a fully Bayesian setting, and compared to a more traditional weighted least squares fit.
NASA Astrophysics Data System (ADS)
von der Linden, Wolfgang; Dose, Volker; von Toussaint, Udo
2014-06-01
Preface; Part I. Introduction: 1. The meaning of probability; 2. Basic definitions; 3. Bayesian inference; 4. Combinatorics; 5. Random walks; 6. Limit theorems; 7. Continuous distributions; 8. The central limit theorem; 9. Poisson processes and waiting times; Part II. Assigning Probabilities: 10. Transformation invariance; 11. Maximum entropy; 12. Qualified maximum entropy; 13. Global smoothness; Part III. Parameter Estimation: 14. Bayesian parameter estimation; 15. Frequentist parameter estimation; 16. The Cramér-Rao inequality; Part IV. Testing Hypotheses: 17. The Bayesian way; 18. The frequentist way; 19. Sampling distributions; 20. Bayesian vs frequentist hypothesis tests; Part V. Real World Applications: 21. Regression; 22. Inconsistent data; 23. Unrecognized signal contributions; 24. Change point problems; 25. Function estimation; 26. Integral equations; 27. Model selection; 28. Bayesian experimental design; Part VI. Probabilistic Numerical Techniques: 29. Numerical integration; 30. Monte Carlo methods; 31. Nested sampling; Appendixes; References; Index.
Bayesian Methods for Determining the Importance of Effects
USDA-ARS?s Scientific Manuscript database
Criticisms have plagued the frequentist null-hypothesis significance testing (NHST) procedure since the day it was created from the Fisher Significance Test and Hypothesis Test of Jerzy Neyman and Egon Pearson. Alternatives to NHST exist in frequentist statistics, but competing methods are also avai...
Nonparametric predictive inference for combining diagnostic tests with parametric copula
NASA Astrophysics Data System (ADS)
Muhammad, Noryanti; Coolen, F. P. A.; Coolen-Maturi, T.
2017-09-01
Measuring the accuracy of diagnostic tests is crucial in many application areas including medicine and health care. The Receiver Operating Characteristic (ROC) curve is a popular statistical tool for describing the performance of diagnostic tests. The area under the ROC curve (AUC) is often used as a measure of the overall performance of the diagnostic test. In this paper, we are interested in developing strategies for combining test results in order to increase diagnostic accuracy. We introduce nonparametric predictive inference (NPI) for combining two diagnostic test results, taking the dependence structure into account using a parametric copula. NPI is a frequentist statistical framework for inference on a future observation based on past data observations. NPI uses lower and upper probabilities to quantify uncertainty and is based on only a few modelling assumptions. A copula is a well-known statistical concept for modelling dependence between random variables: it is a joint distribution function whose marginals are all uniformly distributed, and it can be used to model the dependence separately from the marginal distributions. In this research, we estimate the copula density using a parametric method, namely maximum likelihood estimation (MLE). We investigate the performance of the proposed method via data sets from the literature and discuss the results to show how our method performs for different families of copulas. Finally, we briefly outline related challenges and opportunities for future research.
A bayesian approach to classification criteria for spectacled eiders
Taylor, B.L.; Wade, P.R.; Stehn, R.A.; Cochrane, J.F.
1996-01-01
To facilitate decisions to classify species according to risk of extinction, we used Bayesian methods to analyze trend data for the Spectacled Eider, an arctic sea duck. Trend data from three independent surveys of the Yukon-Kuskokwim Delta were analyzed individually and in combination to yield posterior distributions for population growth rates. We used classification criteria developed by the recovery team for Spectacled Eiders that seek to equalize errors of under- or overprotecting the species. We conducted both a Bayesian decision analysis and a frequentist (classical statistical inference) decision analysis. Bayesian decision analyses are computationally easier, yield basically the same results, and yield results that are easier to explain to nonscientists. With the exception of the aerial survey analysis of the 10 most recent years, both Bayesian and frequentist methods indicated that an endangered classification is warranted. The discrepancy between surveys warrants further research. Although the trend data are abundance indices, we used a preliminary estimate of absolute abundance to demonstrate how to calculate extinction distributions using the joint probability distributions for population growth rate and variance in growth rate generated by the Bayesian analysis. Recent apparent increases in abundance highlight the need for models that apply to declining and then recovering species.
Fisher, Neyman, and Bayes at FDA.
Rubin, Donald B
2016-01-01
The wise use of statistical ideas in practice essentially requires some Bayesian thinking, in contrast to the classical rigid frequentist dogma. This dogma too often has seemed to influence the applications of statistics, even at agencies like the FDA. Greg Campbell was one of the most important advocates there for more nuanced modes of thought, especially Bayesian statistics. Because two brilliant statisticians, Ronald Fisher and Jerzy Neyman, are often credited with instilling the traditional frequentist approach in current practice, I argue that both men were actually seeking very Bayesian answers, and neither would have endorsed the rigid application of their ideas.
Bayesian Inference on the Radio-quietness of Gamma-ray Pulsars
NASA Astrophysics Data System (ADS)
Yu, Hoi-Fung; Hui, Chung Yue; Kong, Albert K. H.; Takata, Jumpei
2018-04-01
For the first time, we demonstrate the use of a robust Bayesian approach to analyze the populations of radio-quiet (RQ) and radio-loud (RL) gamma-ray pulsars. We quantify their differences and obtain their distributions of the radio-cone opening half-angle δ and the magnetic inclination angle α by Bayesian inference. In contrast to conventional frequentist point estimation, which can be non-representative when the distribution is highly skewed or multi-modal (often the case when data points are scarce), Bayesian statistics displays the complete posterior distribution, so that the uncertainties can be readily obtained regardless of skewness and modality. We found that the spin period, the magnetic field strength at the light cylinder, the spin-down power, the gamma-ray-to-X-ray flux ratio, and the spectral curvature significance of the two groups of pulsars exhibit significant differences at the 99% level. Using Bayesian inference, we are able to infer the values and uncertainties of δ and α from the distribution of RQ and RL pulsars. We found that δ is between 10° and 35° and the distribution of α is skewed toward large values.
Hozo, Iztok; Schell, Michael J; Djulbegovic, Benjamin
2008-07-01
The absolute truth in research is unobtainable, as no evidence or research hypothesis is ever 100% conclusive. Therefore, all data and inferences can in principle be considered "inconclusive." Scientific inference and decision-making need to take into account errors, which are unavoidable in the research enterprise. These errors can occur at the level of conclusions, which aim to discern the truthfulness of a research hypothesis based on the accuracy of the research evidence and hypothesis, and at the level of decisions, the goal of which is to enable optimal decision-making under present and specific circumstances. To optimize the chance of both correct conclusions and correct decisions, a synthesis of all major statistical approaches to clinical research is needed. The integration of these approaches (frequentist, Bayesian, and decision-analytic) can be accomplished through formal risk:benefit (R:B) analysis. This chapter illustrates the rational choice of a research hypothesis using R:B analysis based on a decision-theoretic expected utility framework and the concept of "acceptable regret" to calculate the threshold probability of the "truth" above which the benefit of accepting a research hypothesis outweighs its risks.
Automatic physical inference with information maximizing neural networks
NASA Astrophysics Data System (ADS)
Charnock, Tom; Lavaux, Guilhem; Wandelt, Benjamin D.
2018-04-01
Compressing large data sets to a manageable number of summaries that are informative about the underlying parameters vastly simplifies both frequentist and Bayesian inference. When only simulations are available, these summaries are typically chosen heuristically, so they may inadvertently miss important information. We introduce a simulation-based machine learning technique that trains artificial neural networks to find nonlinear functionals of data that maximize Fisher information: information maximizing neural networks (IMNNs). In test cases where the posterior can be derived exactly, likelihood-free inference based on automatically derived IMNN summaries produces nearly exact posteriors, showing that these summaries are good approximations to sufficient statistics. In a series of numerical examples of increasing complexity and astrophysical relevance we show that IMNNs are robustly capable of automatically finding optimal, nonlinear summaries of the data even in cases where linear compression fails: inferring the variance of Gaussian signal in the presence of noise, inferring cosmological parameters from mock simulations of the Lyman-α forest in quasar spectra, and inferring frequency-domain parameters from LISA-like detections of gravitational waveforms. In this final case, the IMNN summary outperforms linear data compression by avoiding the introduction of spurious likelihood maxima. We anticipate that the automatic physical inference method described in this paper will be essential to obtain both accurate and precise cosmological parameter estimates from complex and large astronomical data sets, including those from LSST and Euclid.
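A toy numerical illustration of the compression idea in the paper's first test case (inferring the variance of a zero-mean Gaussian signal), not the IMNN training procedure itself: a nonlinear summary retains essentially the full Fisher information while a linear one does not. The finite-difference estimator and all settings below are assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

def fisher_info_of_summary(summary_fn, theta, d_theta=0.05, n_sims=20_000, n_data=50):
    """Crude finite-difference estimate of the Fisher information carried by a scalar
    summary s of n_data draws from N(0, theta): F ~ (d E[s]/d theta)**2 / Var(s)."""
    def summaries(th):
        x = rng.normal(0.0, np.sqrt(th), size=(n_sims, n_data))
        return summary_fn(x)
    s_plus, s_minus, s_mid = summaries(theta + d_theta), summaries(theta - d_theta), summaries(theta)
    d_mean = (s_plus.mean() - s_minus.mean()) / (2 * d_theta)
    return d_mean**2 / s_mid.var()

theta = 1.0  # variance of the Gaussian signal
print("linear summary (sample mean)   :", fisher_info_of_summary(lambda x: x.mean(axis=1), theta))
print("nonlinear summary (mean square):", fisher_info_of_summary(lambda x: (x**2).mean(axis=1), theta))
# The full-data Fisher information for n_data = 50 is n_data / (2 * theta**2) = 25.
```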
Inference in the age of big data: Future perspectives on neuroscience.
Bzdok, Danilo; Yeo, B T Thomas
2017-07-15
Neuroscience is undergoing faster changes than ever before. For over 100 years, our field qualitatively described and invasively manipulated single or few organisms to gain anatomical, physiological, and pharmacological insights. In the last 10 years, neuroscience has spawned quantitative datasets of unprecedented breadth (e.g., microanatomy, synaptic connections, and optogenetic brain-behavior assays) and size (e.g., cognition, brain imaging, and genetics). While growing data availability and information granularity have been amply discussed, we direct attention to a less explored question: How will the unprecedented data richness shape data analysis practices? Statistical reasoning is becoming more important to distill neurobiological knowledge from healthy and pathological brain measurements. We argue that large-scale data analysis will use more statistical models that are non-parametric and generative and that mix frequentist and Bayesian aspects, while supplementing classical hypothesis testing with out-of-sample predictions. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
A Tutorial in Bayesian Potential Outcomes Mediation Analysis.
Miočević, Milica; Gonzalez, Oscar; Valente, Matthew J; MacKinnon, David P
2018-01-01
Statistical mediation analysis is used to investigate intermediate variables in the relation between independent and dependent variables. Causal interpretation of mediation analyses is challenging because randomization of subjects to levels of the independent variable does not rule out the possibility of unmeasured confounders of the mediator to outcome relation. Furthermore, commonly used frequentist methods for mediation analysis compute the probability of the data given the null hypothesis, which is not the probability of a hypothesis given the data as in Bayesian analysis. Under certain assumptions, applying the potential outcomes framework to mediation analysis allows for the computation of causal effects, and statistical mediation in the Bayesian framework gives indirect effects probabilistic interpretations. This tutorial combines causal inference and Bayesian methods for mediation analysis so the indirect and direct effects have both causal and probabilistic interpretations. Steps in Bayesian causal mediation analysis are shown in the application to an empirical example.
A Bayesian test for Hardy–Weinberg equilibrium of biallelic X-chromosomal markers
Puig, X; Ginebra, J; Graffelman, J
2017-01-01
The X chromosome is a relatively large chromosome, harboring a lot of genetic information. Much of the statistical analysis of X-chromosomal information is complicated by the fact that males only have one copy. Recently, frequentist statistical tests for Hardy–Weinberg equilibrium have been proposed specifically for dealing with markers on the X chromosome. Bayesian test procedures for Hardy–Weinberg equilibrium for the autosomes have been described, but Bayesian work on the X chromosome in this context is lacking. This paper gives the first Bayesian approach for testing Hardy–Weinberg equilibrium with biallelic markers at the X chromosome. Marginal and joint posterior distributions for the inbreeding coefficient in females and the male to female allele frequency ratio are computed, and used for statistical inference. The paper gives a detailed account of the proposed Bayesian test, and illustrates it with data from the 1000 Genomes project. In that implementation, a novel approach to tackle multiple testing from a Bayesian perspective through posterior predictive checks is used.
Rhodes, Kirsty M; Turner, Rebecca M; White, Ian R; Jackson, Dan; Spiegelhalter, David J; Higgins, Julian P T
2016-12-20
Many meta-analyses combine results from only a small number of studies, a situation in which the between-study variance is imprecisely estimated when standard methods are applied. Bayesian meta-analysis allows incorporation of external evidence on heterogeneity, providing the potential for more robust inference on the effect size of interest. We present a method for performing Bayesian meta-analysis using data augmentation, in which we represent an informative conjugate prior for between-study variance by pseudo data and use meta-regression for estimation. To assist in this, we derive predictive inverse-gamma distributions for the between-study variance expected in future meta-analyses. These may serve as priors for heterogeneity in new meta-analyses. In a simulation study, we compare approximate Bayesian methods using meta-regression and pseudo data against fully Bayesian approaches based on importance sampling techniques and Markov chain Monte Carlo (MCMC). We compare the frequentist properties of these Bayesian methods with those of the commonly used frequentist DerSimonian and Laird procedure. The method is implemented in standard statistical software and provides a less complex alternative to standard MCMC approaches. An importance sampling approach produces almost identical results to standard MCMC approaches, and results obtained through meta-regression and pseudo data are very similar. On average, data augmentation provides closer results to MCMC, if implemented using restricted maximum likelihood estimation rather than DerSimonian and Laird or maximum likelihood estimation. The methods are applied to real datasets, and an extension to network meta-analysis is described. The proposed method facilitates Bayesian meta-analysis in a way that is accessible to applied researchers. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
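The sketch below is not the data-augmentation/meta-regression implementation the paper proposes; it is a brute-force grid version of the same random-effects model, included to show what an informative inverse-gamma prior on the between-study variance does when only a few studies are available. Study estimates, within-study variances, and the prior's parameters are all invented.

```python
import numpy as np
from scipy import stats

# Assumed small meta-analysis: study-level log odds ratios and within-study variances.
y = np.array([-0.35, -0.10, -0.52, 0.05, -0.28])
v = np.array([0.04, 0.09, 0.12, 0.06, 0.10])

# Informative inverse-gamma prior on the between-study variance tau^2, standing in for
# external evidence on heterogeneity (parameter values assumed here).
a_prior, b_prior = 2.0, 0.15

# Brute-force joint grid posterior for (theta, tau^2) with a flat prior on theta.
theta_grid = np.linspace(-1.0, 0.5, 301)
tau2_grid = np.linspace(1e-4, 1.0, 300)
logpost = np.empty((theta_grid.size, tau2_grid.size))
for j, tau2 in enumerate(tau2_grid):
    sd = np.sqrt(v + tau2)
    loglike = stats.norm.logpdf(y[None, :], theta_grid[:, None], sd[None, :]).sum(axis=1)
    logpost[:, j] = loglike + stats.invgamma.logpdf(tau2, a_prior, scale=b_prior)

post = np.exp(logpost - logpost.max())
post /= post.sum()
theta_marginal = post.sum(axis=1)
print("posterior mean effect:", (theta_marginal * theta_grid).sum())
print("P(effect < 0):       ", theta_marginal[theta_grid < 0].sum())
```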
A Bayesian approach to the statistical analysis of device preference studies.
Fu, Haoda; Qu, Yongming; Zhu, Baojin; Huster, William
2012-01-01
Drug delivery devices are required to have excellent technical specifications to deliver drugs accurately, and in addition, the devices should provide a satisfactory experience to patients because this can have a direct effect on drug compliance. To compare patients' experience with two devices, cross-over studies with patient-reported outcomes (PRO) as response variables are often used. Because of the strength of cross-over designs, each subject can directly compare the two devices by using the PRO variables, and variables indicating preference (preferring A, preferring B, or no preference) can be easily derived. Traditionally, methods based on frequentist statistics can be used to analyze such preference data, but there are some limitations for the frequentist methods. Recently, Bayesian methods are considered an acceptable method by the US Food and Drug Administration to design and analyze device studies. In this paper, we propose a Bayesian statistical method to analyze the data from preference trials. We demonstrate that the new Bayesian estimator enjoys some optimal properties versus the frequentist estimator. Copyright © 2012 John Wiley & Sons, Ltd.
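One simple way such preference data could be handled in a Bayesian framework is a Dirichlet-multinomial model for the three preference categories; the paper's actual estimator may differ, and the counts below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical cross-over preference counts: prefer device A, prefer device B, no preference.
counts = np.array([48, 30, 22])

# Dirichlet(1, 1, 1) prior combined with multinomial counts gives a Dirichlet posterior
# over the three preference probabilities.
draws = rng.dirichlet(counts + 1, size=20_000)

# Posterior probability that device A is preferred more often than device B, and a 95%
# credible interval for the difference in preference probabilities.
diff = draws[:, 0] - draws[:, 1]
print(np.mean(diff > 0), np.quantile(diff, [0.025, 0.975]))
```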
Objectively combining AR5 instrumental period and paleoclimate climate sensitivity evidence
NASA Astrophysics Data System (ADS)
Lewis, Nicholas; Grünwald, Peter
2018-03-01
Combining instrumental period evidence regarding equilibrium climate sensitivity with largely independent paleoclimate proxy evidence should enable a more constrained sensitivity estimate to be obtained. Previous, subjective Bayesian approaches involved selection of a prior probability distribution reflecting the investigators' beliefs about climate sensitivity. Here a recently developed approach employing two different statistical methods—objective Bayesian and frequentist likelihood-ratio—is used to combine instrumental period and paleoclimate evidence based on data presented and assessments made in the IPCC Fifth Assessment Report. Probabilistic estimates from each source of evidence are represented by posterior probability density functions (PDFs) of physically-appropriate form that can be uniquely factored into a likelihood function and a noninformative prior distribution. The three-parameter form is shown accurately to fit a wide range of estimated climate sensitivity PDFs. The likelihood functions relating to the probabilistic estimates from the two sources are multiplicatively combined and a prior is derived that is noninformative for inference from the combined evidence. A posterior PDF that incorporates the evidence from both sources is produced using a single-step approach, which avoids the order-dependency that would arise if Bayesian updating were used. Results are compared with an alternative approach using the frequentist signed root likelihood ratio method. Results from these two methods are effectively identical, and provide a 5-95% range for climate sensitivity of 1.1-4.05 K (median 1.87 K).
Fourtune, Lisa; Prunier, Jérôme G; Paz-Vinas, Ivan; Loot, Géraldine; Veyssière, Charlotte; Blanchet, Simon
2018-04-01
Identifying landscape features that affect functional connectivity among populations is a major challenge in fundamental and applied sciences. Landscape genetics combines landscape and genetic data to address this issue, with the main objective of disentangling direct and indirect relationships among an intricate set of variables. Causal modeling has strong potential to address the complex nature of landscape genetic data sets. However, this statistical approach was not initially developed to address the pairwise distance matrices commonly used in landscape genetics. Here, we aimed to extend the applicability of two causal modeling methods, namely maximum-likelihood path analysis and the directional separation test, by developing statistical approaches aimed at handling distance matrices and improving functional connectivity inference. Using simulations, we showed that these approaches greatly improved the robustness of the absolute (using a frequentist approach) and relative (using an information-theoretic approach) fits of the tested models. We used an empirical data set combining genetic information on a freshwater fish species (Gobio occitaniae) and detailed landscape descriptors to demonstrate the usefulness of causal modeling to identify functional connectivity in wild populations. Specifically, we demonstrated how direct and indirect relationships involving altitude, temperature, and oxygen concentration influenced within- and between-population genetic diversity of G. occitaniae.
Combining statistical inference and decisions in ecology
Williams, Perry J.; Hooten, Mevin B.
2016-01-01
Statistical decision theory (SDT) is a sub-field of decision theory that formally incorporates statistical investigation into a decision-theoretic framework to account for uncertainties in a decision problem. SDT provides a unifying analysis of three types of information: statistical results from a data set, knowledge of the consequences of potential choices (i.e., loss), and prior beliefs about a system. SDT links the theoretical development of a large body of statistical methods including point estimation, hypothesis testing, and confidence interval estimation. The theory and application of SDT have mainly been developed and published in the fields of mathematics, statistics, operations research, and other decision sciences, but have had limited exposure in ecology. Thus, we provide an introduction to SDT for ecologists and describe its utility for linking the conventionally separate tasks of statistical investigation and decision making in a single framework. We describe the basic framework of both Bayesian and frequentist SDT, its traditional use in statistics, and discuss its application to decision problems that occur in ecology. We demonstrate SDT with two types of decisions: Bayesian point estimation, and an applied management problem of selecting a prescribed fire rotation for managing a grassland bird species. Central to SDT, and decision theory in general, are loss functions. Thus, we also provide basic guidance and references for constructing loss functions for an SDT problem.
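Loss functions are central here, and the link between a chosen loss and the resulting Bayes estimate is easy to demonstrate: the decision that minimizes posterior expected loss is the posterior mean under squared-error loss, the median under absolute-error loss, and an off-center quantile under asymmetric linear loss. The posterior draws below are simulated stand-ins for output from a fitted model.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated posterior draws for a parameter of interest (stand-ins for model output).
draws = rng.gamma(shape=3.0, scale=2.0, size=50_000)

# The Bayes decision minimizes posterior expected loss, so the point estimate depends
# on the loss function chosen for the decision problem.
estimate_squared_error = draws.mean()        # squared-error loss -> posterior mean
estimate_absolute_error = np.median(draws)   # absolute-error loss -> posterior median

# Asymmetric linear loss with overestimation penalized 3x more than underestimation
# -> the 1 / (1 + 3) = 0.25 posterior quantile.
estimate_asymmetric = np.quantile(draws, 0.25)

print(estimate_squared_error, estimate_absolute_error, estimate_asymmetric)
```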
Ritter, Marie; Sauter, Disa A
2017-01-01
Group membership is important for how we perceive others, but although perceivers can accurately infer group membership from facial expressions and spoken language, it is not clear whether listeners can identify in- and out-group members from non-verbal vocalizations. In the current study, we examined perceivers' ability to identify group membership from non-verbal vocalizations of laughter, testing the following predictions: (1) listeners can distinguish between laughter from different nationalities and (2) between laughter from their in-group, a close out-group, and a distant out-group, and (3) greater exposure to laughter from members of other cultural groups is associated with better performance. Listeners (n = 814) took part in an online forced-choice classification task in which they were asked to judge the origin of 24 laughter segments. The responses were analyzed using frequentist and Bayesian statistical analyses. Both kinds of analyses showed that listeners were unable to accurately identify group identity from laughter. Furthermore, exposure did not affect performance. These results provide a strong and clear demonstration that group identity cannot be inferred from laughter.
A Bayesian approach to truncated data sets: An application to Malmquist bias in Supernova Cosmology
NASA Astrophysics Data System (ADS)
March, Marisa Cristina
2018-01-01
A problem commonly encountered in statistical analysis of data is that of truncated data sets. A truncated data set is one in which a number of data points are completely missing from a sample; this is in contrast to a censored sample, in which partial information is missing from some data points. In astrophysics this problem is commonly seen in a magnitude-limited survey, such that the survey is incomplete at fainter magnitudes; that is, certain faint objects are simply not observed. The effect of this 'missing data' is manifested as Malmquist bias and can result in biases in parameter inference if it is not accounted for. In frequentist methodologies the Malmquist bias is often corrected for by analysing many simulations and computing the appropriate correction factors. One problem with this methodology is that the corrections are model dependent. In this poster we derive a Bayesian methodology for accounting for truncated data sets in problems of parameter inference and model selection. We first show the methodology for a simple Gaussian linear model and then go on to show the method for accounting for a truncated data set in the case of cosmological parameter inference with a magnitude-limited supernova Ia survey.
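A minimal sketch of the Bayesian treatment of truncation for the simple Gaussian case mentioned in the abstract: the likelihood of each observed point is divided by the selection probability, so the posterior is centred near the true mean even though the naive sample mean is biased. All numbers (true mean, scatter, magnitude limit) are assumed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Assumed set-up: true magnitudes ~ N(mu_true, sigma), but only objects brighter than
# (i.e. with magnitude below) the survey limit m_lim are observed.
mu_true, sigma, m_lim = 20.0, 1.0, 20.5
full_sample = rng.normal(mu_true, sigma, 5_000)
observed = full_sample[full_sample < m_lim]

# Grid posterior for mu with a flat prior. The key step is dividing each point's
# likelihood by the selection probability Phi((m_lim - mu) / sigma), which is what
# accounts for the truncation (Malmquist bias).
mu_grid = np.linspace(19.0, 21.0, 401)
loglike = np.array([
    stats.norm.logpdf(observed, m, sigma).sum()
    - observed.size * stats.norm.logcdf(m_lim, m, sigma)
    for m in mu_grid
])
post = np.exp(loglike - loglike.max())
post /= post.sum()

print("naive mean of observed magnitudes:", observed.mean())          # biased low
print("posterior mean accounting for truncation:", (post * mu_grid).sum())
```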
Sinharay, Sandip; Jensen, Jens Ledet
2018-06-27
In educational and psychological measurement, researchers and/or practitioners are often interested in examining whether the ability of an examinee is the same over two sets of items. Such problems can arise in measurement of change, detection of cheating on unproctored tests, erasure analysis, detection of item preknowledge, etc. Traditional frequentist approaches that are used in such problems include the Wald test, the likelihood ratio test, and the score test (e.g., Fischer, Appl Psychol Meas 27:3-26, 2003; Finkelman, Weiss, & Kim-Kang, Appl Psychol Meas 34:238-254, 2010; Glas & Dagohoy, Psychometrika 72:159-180, 2007; Guo & Drasgow, Int J Sel Assess 18:351-364, 2010; Klauer & Rettig, Br J Math Stat Psychol 43:193-206, 1990; Sinharay, J Educ Behav Stat 42:46-68, 2017). This paper shows that approaches based on higher-order asymptotics (e.g., Barndorff-Nielsen & Cox, Inference and asymptotics. Springer, London, 1994; Ghosh, Higher order asymptotics. Institute of Mathematical Statistics, Hayward, 1994) can also be used to test for the equality of the examinee ability over two sets of items. The modified signed likelihood ratio test (e.g., Barndorff-Nielsen, Biometrika 73:307-322, 1986) and the Lugannani-Rice approximation (Lugannani & Rice, Adv Appl Prob 12:475-490, 1980), both of which are based on higher-order asymptotics, are shown to provide some improvement over the traditional frequentist approaches in three simulations. Two real data examples are also provided.
Bayesian statistics in radionuclide metrology: measurement of a decaying source
NASA Astrophysics Data System (ADS)
Bochud, François O.; Bailat, Claude J.; Laedermann, Jean-Pascal
2007-08-01
The most intuitive way of defining a probability is perhaps through the frequency at which it appears when a large number of trials are realized in identical conditions. The probability derived from the obtained histogram characterizes the so-called frequentist or conventional statistical approach. In this sense, probability is defined as a physical property of the observed system. By contrast, in Bayesian statistics, a probability is not a physical property or a directly observable quantity, but a degree of belief or an element of inference. The goal of this paper is to show how Bayesian statistics can be used in radionuclide metrology and what its advantages and disadvantages are compared with conventional statistics. This is performed through the example of an yttrium-90 source typically encountered in environmental surveillance measurement. Because of the very low activity of this kind of source and the short half-life of the radionuclide, this measurement takes several days, during which the source decays significantly. Several methods are proposed to compute simultaneously the number of unstable nuclei at a given reference time, the decay constant and the background. Asymptotically, all approaches give the same result. However, Bayesian statistics produces coherent estimates and confidence intervals in a much smaller number of measurements. Apart from the conceptual understanding of statistics, the main difficulty that could deter radionuclide metrologists from using Bayesian statistics is the complexity of the computation.
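A simplified numerical sketch in the spirit of this example, assuming the expected daily counts are an exponentially decaying source term plus a constant background; the paper's actual treatment of the number of unstable nuclei, and its comparison of several estimation methods, is more detailed. All counts here are simulated.

```python
import numpy as np
from scipy import optimize

rng = np.random.default_rng(5)

# Assumed set-up: daily counts from a decaying yttrium-90 source (half-life ~2.67 d)
# on top of a constant background, over ten one-day counting intervals.
decay_const = np.log(2) / 2.67               # per day
t = np.arange(10.0)                          # interval start times (days)
source_true, background_true = 200.0, 5.0    # counts per day at t = 0, and background
counts = rng.poisson(source_true * np.exp(-decay_const * t) + background_true)

def neg_loglike(source, background):
    mu = source * np.exp(-decay_const * t) + background
    return np.sum(mu - counts * np.log(mu))  # Poisson negative log-likelihood (up to a constant)

# Frequentist point estimate: maximum likelihood on the log scale to keep rates positive.
mle = optimize.minimize(lambda p: neg_loglike(*np.exp(p)), x0=np.log([100.0, 1.0]),
                        method="Nelder-Mead")
print("MLE (source, background):", np.exp(mle.x))

# Bayesian alternative with flat priors: the same likelihood evaluated on a grid gives a
# full joint posterior rather than a single point.
source_grid = np.linspace(120.0, 280.0, 161)
background_grid = np.linspace(0.1, 15.0, 150)
logpost = np.array([[-neg_loglike(s, b) for b in background_grid] for s in source_grid])
post = np.exp(logpost - logpost.max())
post /= post.sum()
print("posterior mean source rate:", (post.sum(axis=1) * source_grid).sum())
```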
Reply to Rouder (2014): good frequentist properties raise confidence.
Sanborn, Adam N; Hills, Thomas T; Dougherty, Michael R; Thomas, Rick P; Yu, Erica C; Sprenger, Amber M
2014-04-01
Established psychological results have been called into question by demonstrations that statistical significance is easy to achieve, even in the absence of an effect. One often-warned-against practice, choosing when to stop the experiment on the basis of the results, is guaranteed to produce significant results. In response to these demonstrations, Bayes factors have been proposed as an antidote to this practice, because they are invariant with respect to how an experiment was stopped. Should researchers only care about the resulting Bayes factor, without concern for how it was produced? Yu, Sprenger, Thomas, and Dougherty (2014) and Sanborn and Hills (2014) demonstrated that Bayes factors are sometimes strongly influenced by the stopping rules used. However, Rouder (2014) has provided a compelling demonstration that despite this influence, the evidence supplied by Bayes factors remains correct. Here we address why the ability to influence Bayes factors should still matter to researchers, despite the correctness of the evidence. We argue that good frequentist properties mean that results will more often agree with researchers' statistical intuitions, and good frequentist properties control the number of studies that will later be refuted. Both help raise confidence in psychological results.
An introduction to using Bayesian linear regression with clinical data.
Baldwin, Scott A; Larson, Michael J
2017-11-01
Statistical training in psychology focuses on frequentist methods. Bayesian methods are an alternative to standard frequentist methods. This article provides researchers with an introduction to fundamental ideas in Bayesian modeling. We use data from an electroencephalogram (EEG) and anxiety study to illustrate Bayesian models. Specifically, the models examine the relationship between error-related negativity (ERN), a particular event-related potential, and trait anxiety. Methodological topics covered include: how to set up a regression model in a Bayesian framework, specifying priors, examining convergence of the model, visualizing and interpreting posterior distributions, interval estimates, expected and predicted values, and model comparison tools. We also discuss situations where Bayesian methods can outperform frequentist methods as well as how to specify more complicated regression models. Finally, we conclude with recommendations about reporting guidelines for those using Bayesian methods in their own research. We provide data and R code for replicating our analyses. Copyright © 2017 Elsevier Ltd. All rights reserved.
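The article's models were fit in R with full priors; the sketch below is only a stripped-down conjugate version (residual standard deviation treated as known) to show how a prior and likelihood combine into a posterior and a credible interval for the anxiety slope. The data are simulated, not from the EEG study.

```python
import numpy as np

rng = np.random.default_rng(6)

# Simulated data standing in for the example: trait anxiety scores predicting ERN amplitude.
n = 80
anxiety = rng.normal(0.0, 1.0, n)
ern = -1.0 - 0.4 * anxiety + rng.normal(0.0, 1.0, n)

X = np.column_stack([np.ones(n), anxiety])
sigma = 1.0                                   # residual sd treated as known for simplicity

# Normal prior beta ~ N(0, tau^2 I); with a normal likelihood the posterior is also normal.
tau = 5.0
prior_precision = np.eye(2) / tau**2
post_cov = np.linalg.inv(prior_precision + X.T @ X / sigma**2)
post_mean = post_cov @ (X.T @ ern / sigma**2)

# Posterior mean and 95% credible interval for the anxiety slope.
slope, slope_sd = post_mean[1], np.sqrt(post_cov[1, 1])
print(slope, slope - 1.96 * slope_sd, slope + 1.96 * slope_sd)
```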
Predictive inference for best linear combination of biomarkers subject to limits of detection.
Coolen-Maturi, Tahani
2017-08-15
Measuring the accuracy of diagnostic tests is crucial in many application areas including medicine, machine learning and credit scoring. The receiver operating characteristic (ROC) curve is a useful tool to assess the ability of a diagnostic test to discriminate between two classes or groups. In practice, multiple diagnostic tests or biomarkers are combined to improve diagnostic accuracy. Often, biomarker measurements are undetectable either below or above the so-called limits of detection (LoD). In this paper, nonparametric predictive inference (NPI) for best linear combination of two or more biomarkers subject to limits of detection is presented. NPI is a frequentist statistical method that is explicitly aimed at using few modelling assumptions, enabled through the use of lower and upper probabilities to quantify uncertainty. The NPI lower and upper bounds for the ROC curve subject to limits of detection are derived, where the objective function to maximize is the area under the ROC curve. In addition, the paper discusses the effect of restriction on the linear combination's coefficients on the analysis. Examples are provided to illustrate the proposed method. Copyright © 2017 John Wiley & Sons, Ltd.
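The NPI machinery itself (lower and upper probabilities, handling of limits of detection) is not reproduced here; the sketch below only illustrates the underlying objective of choosing a linear combination of two biomarkers to maximize an empirical AUC, using simulated data.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated biomarker pairs for healthy and diseased groups (no limits of detection here).
cov = [[1.0, 0.3], [0.3, 1.0]]
healthy = rng.multivariate_normal([0.0, 0.0], cov, 200)
diseased = rng.multivariate_normal([1.0, 0.7], cov, 200)

def empirical_auc(scores_healthy, scores_diseased):
    # Mann-Whitney form of the AUC: proportion of (diseased, healthy) pairs correctly ordered.
    return np.mean(scores_diseased[:, None] > scores_healthy[None, :])

# Search over linear combinations marker1 + alpha * marker2 for the coefficient that
# maximizes the empirical AUC.
alphas = np.linspace(-3.0, 3.0, 121)
aucs = [empirical_auc(healthy @ np.array([1.0, a]), diseased @ np.array([1.0, a]))
        for a in alphas]
print("best alpha:", alphas[int(np.argmax(aucs))], "AUC:", max(aucs))
```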
NASA Astrophysics Data System (ADS)
Jones, Bernard J. T.
2017-04-01
Preface; Notation and conventions; Part I. 100 Years of Cosmology: 1. Emerging cosmology; 2. The cosmic expansion; 3. The cosmic microwave background; 4. Recent cosmology; Part II. Newtonian Cosmology: 5. Newtonian cosmology; 6. Dark energy cosmological models; 7. The early universe; 8. The inhomogeneous universe; 9. The inflationary universe; Part III. Relativistic Cosmology: 10. Minkowski space; 11. The energy momentum tensor; 12. General relativity; 13. Space-time geometry and calculus; 14. The Einstein field equations; 15. Solutions of the Einstein equations; 16. The Robertson-Walker solution; 17. Congruences, curvature and Raychaudhuri; 18. Observing and measuring the universe; Part IV. The Physics of Matter and Radiation: 19. Physics of the CMB radiation; 20. Recombination of the primeval plasma; 21. CMB polarisation; 22. CMB anisotropy; Part V. Precision Tools for Precision Cosmology: 23. Likelihood; 24. Frequentist hypothesis testing; 25. Statistical inference: Bayesian; 26. CMB data processing; 27. Parametrising the universe; 28. Precision cosmology; 29. Epilogue; Appendix A. SI, CGS and Planck units; Appendix B. Magnitudes and distances; Appendix C. Representing vectors and tensors; Appendix D. The electromagnetic field; Appendix E. Statistical distributions; Appendix F. Functions on a sphere; Appendix G. Acknowledgements; References; Index.
A critique of statistical hypothesis testing in clinical research
Raha, Somik
2011-01-01
Many have documented the difficulty of using the current paradigm of Randomized Controlled Trials (RCTs) to test and validate the effectiveness of alternative medical systems such as Ayurveda. This paper critiques the applicability of RCTs for all clinical knowledge-seeking endeavors, of which Ayurveda research is a part. This is done by examining statistical hypothesis testing, the underlying foundation of RCTs, from a practical and philosophical perspective. In the philosophical critique, the two main worldviews of probability are the Bayesian and the frequentist. The frequentist worldview is a special case of the Bayesian worldview requiring the unrealistic assumptions of knowing nothing about the universe and believing that all observations are unrelated to each other. Many have claimed that the first belief is necessary for science, and this claim is debunked by comparing variations in learning with different prior beliefs. Moving beyond the Bayesian and frequentist worldviews, the notion of hypothesis testing itself is challenged on the grounds that a hypothesis is an unclear distinction, and assigning a probability to an unclear distinction is an exercise that does not lead to clarity of action. This critique is of the theory itself and not any particular application of statistical hypothesis testing. A decision-making frame is proposed as a way of both addressing this critique and transcending ideological debates on probability. An example of a Bayesian decision-making approach is shown as an alternative to statistical hypothesis testing, utilizing data from a past clinical trial that studied the effect of Aspirin on heart attacks in a sample population of doctors. Because a major reason for the prevalence of RCTs in academia is legislation requiring them, the ethics of legislating the use of statistical methods for clinical research is also examined. PMID:22022152
What Is the Probability You Are a Bayesian?
ERIC Educational Resources Information Center
Wulff, Shaun S.; Robinson, Timothy J.
2014-01-01
Bayesian methodology continues to be widely used in statistical applications. As a result, it is increasingly important to introduce students to Bayesian thinking at early stages in their mathematics and statistics education. While many students in upper level probability courses can recite the differences in the Frequentist and Bayesian…
Kernel learning at the first level of inference.
Cawley, Gavin C; Talbot, Nicola L C
2014-05-01
Kernel learning methods, whether Bayesian or frequentist, typically involve multiple levels of inference, with the coefficients of the kernel expansion being determined at the first level and the kernel and regularisation parameters carefully tuned at the second level, a process known as model selection. Model selection for kernel machines is commonly performed via optimisation of a suitable model selection criterion, often based on cross-validation or theoretical performance bounds. However, if there are a large number of kernel parameters, as for instance in the case of automatic relevance determination (ARD), there is a substantial risk of over-fitting the model selection criterion, resulting in poor generalisation performance. In this paper we investigate the possibility of learning the kernel, for the Least-Squares Support Vector Machine (LS-SVM) classifier, at the first level of inference, i.e. parameter optimisation. The kernel parameters and the coefficients of the kernel expansion are jointly optimised at the first level of inference, minimising a training criterion with an additional regularisation term acting on the kernel parameters. The key advantage of this approach is that the values of only two regularisation parameters need be determined in model selection, substantially alleviating the problem of over-fitting the model selection criterion. The benefits of this approach are demonstrated using a suite of synthetic and real-world binary classification benchmark problems, where kernel learning at the first level of inference is shown to be statistically superior to the conventional approach, improves on our previous work (Cawley and Talbot, 2007) and is competitive with Multiple Kernel Learning approaches, but with reduced computational expense. Copyright © 2014 Elsevier Ltd. All rights reserved.
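For orientation, here is a minimal sketch of the first level of inference for an LS-SVM classifier: for a fixed RBF kernel width and regularisation parameter gamma (the quantities conventionally tuned at the second level), the expansion coefficients and bias are obtained by solving a single linear system. This is not the authors' joint kernel-learning scheme, and the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic two-class data with labels in {-1, +1}.
X = np.vstack([rng.normal(-1.0, 1.0, (40, 2)), rng.normal(+1.0, 1.0, (40, 2))])
y = np.hstack([-np.ones(40), np.ones(40)])

def rbf(A, B, width):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * width ** 2))

def train_lssvm(X, y, gamma, width):
    """First level of inference: solve the LS-SVM classifier's linear system for (alpha, b)."""
    n = len(y)
    Omega = (y[:, None] * y[None, :]) * rbf(X, X, width)
    A = np.block([[np.zeros((1, 1)), y[None, :]],
                  [y[:, None],       Omega + np.eye(n) / gamma]])
    sol = np.linalg.solve(A, np.hstack([0.0, np.ones(n)]))
    return sol[1:], sol[0]          # alpha (one coefficient per training point), bias b

alpha, b = train_lssvm(X, y, gamma=10.0, width=1.0)
decision = rbf(X, X, 1.0) @ (alpha * y) + b
print("training accuracy:", float((np.sign(decision) == y).mean()))
```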
“Magnitude-based Inference”: A Statistical Review
Welsh, Alan H.; Knight, Emma J.
2015-01-01
Purpose: We consider “magnitude-based inference” and its interpretation by examining in detail its use in the problem of comparing two means. Methods: We extract from the spreadsheets, which are provided to users of the analysis (http://www.sportsci.org/), a precise description of how “magnitude-based inference” is implemented. We compare the implemented version of the method with general descriptions of it and interpret the method in familiar statistical terms. Results and Conclusions: We show that “magnitude-based inference” is not a progressive improvement on modern statistics. The additional probabilities introduced are not directly related to the confidence interval but, rather, are interpretable either as P values for two different nonstandard tests (for different null hypotheses) or as approximate Bayesian calculations, which also lead to a type of test. We also discuss sample size calculations associated with “magnitude-based inference” and show that the substantial reduction in sample sizes claimed for the method (30% of the sample size obtained from standard frequentist calculations) is not justifiable, so the sample size calculations should not be used. Rather than using “magnitude-based inference,” a better solution is to be realistic about the limitations of the data and use either confidence intervals or a fully Bayesian analysis. PMID: 25051387
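To see what the review means by the "additional probabilities", here is a small sketch of how magnitude-based probabilities are typically computed from an estimated effect, its standard error, and a smallest worthwhile change, using a normal approximation. This is not extracted from the sportsci.org spreadsheets, and the numbers are invented for illustration.

```python
from scipy import stats

# "Magnitude-based" style probabilities for a difference in means (illustrative numbers):
# observed effect, its standard error, and the smallest worthwhile change (SWC).
effect, se, swc = 1.2, 0.9, 0.5

p_beneficial = 1 - stats.norm.cdf((swc - effect) / se)    # P(true effect > +SWC)
p_harmful    = stats.norm.cdf((-swc - effect) / se)       # P(true effect < -SWC)
p_trivial    = 1 - p_beneficial - p_harmful

print(f"beneficial {p_beneficial:.2f}, trivial {p_trivial:.2f}, harmful {p_harmful:.2f}")
# As the review notes, these quantities are interpretable as one-sided p-values against
# shifted null hypotheses (or as flat-prior Bayesian posterior probabilities), not as
# something fundamentally new.
```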
A Bayesian nonparametric method for prediction in EST analysis
Lijoi, Antonio; Mena, Ramsés H; Prünster, Igor
2007-01-01
Background Expressed sequence tags (ESTs) analyses are a fundamental tool for gene identification in organisms. Given a preliminary EST sample from a certain library, several statistical prediction problems arise. In particular, it is of interest to estimate how many new genes can be detected in a future EST sample of given size and also to determine the gene discovery rate: these estimates represent the basis for deciding whether to proceed sequencing the library and, in case of a positive decision, a guideline for selecting the size of the new sample. Such information is also useful for establishing sequencing efficiency in experimental design and for measuring the degree of redundancy of an EST library. Results In this work we propose a Bayesian nonparametric approach for tackling statistical problems related to EST surveys. In particular, we provide estimates for: a) the coverage, defined as the proportion of unique genes in the library represented in the given sample of reads; b) the number of new unique genes to be observed in a future sample; c) the discovery rate of new genes as a function of the future sample size. The Bayesian nonparametric model we adopt conveys, in a statistically rigorous way, the available information into prediction. Our proposal has appealing properties over frequentist nonparametric methods, which become unstable when prediction is required for large future samples. EST libraries, previously studied with frequentist methods, are analyzed in detail. Conclusion The Bayesian nonparametric approach we undertake yields valuable tools for gene capture and prediction in EST libraries. The estimators we obtain do not feature the kind of drawbacks associated with frequentist estimators and are reliable for any size of the additional sample. PMID:17868445
Li, Shi; Mukherjee, Bhramar; Batterman, Stuart; Ghosh, Malay
2013-12-01
Case-crossover designs are widely used to study short-term exposure effects on the risk of acute adverse health events. While the frequentist literature on this topic is vast, there is no Bayesian work in this general area. The contribution of this paper is twofold. First, the paper establishes Bayesian equivalence results that require characterization of the set of priors under which the posterior distributions of the risk ratio parameters based on a case-crossover and time-series analysis are identical. Second, the paper studies inferential issues under case-crossover designs in a Bayesian framework. Traditionally, a conditional logistic regression is used for inference on risk-ratio parameters in case-crossover studies. We consider instead a more general full likelihood-based approach which makes less restrictive assumptions on the risk functions. Formulation of a full likelihood leads to growth in the number of parameters proportional to the sample size. We propose a semi-parametric Bayesian approach using a Dirichlet process prior to handle the random nuisance parameters that appear in a full likelihood formulation. We carry out a simulation study to compare the Bayesian methods based on full and conditional likelihood with the standard frequentist approaches for case-crossover and time-series analysis. The proposed methods are illustrated through the Detroit Asthma Morbidity, Air Quality and Traffic study, which examines the association between acute asthma risk and ambient air pollutant concentrations. © 2013, The International Biometric Society.
The Importance of Practice in the Development of Statistics.
1983-01-01
Technical Summary Report #2471: The Importance of Practice in the Development of Statistics. Topics include component analysis, bioassay, limits for a ratio, quality control, sampling inspection, non-parametric tests, transformation theory, ARIMA time series models, sequential tests, cumulative sum charts, data analysis plotting techniques, and a resolution of the Bayes-frequentist controversy.
Nasejje, Justine B; Mwambi, Henry G; Achia, Thomas N O
2015-10-01
Infant and child mortality rates are among the health indicators of importance in a given community or country. The fourth Millennium Development Goal called for all United Nations member countries to have reduced their infant and child mortality rates by two-thirds by 2015. Uganda is one of the countries in Sub-Saharan Africa with high infant and child mortality rates; it is therefore important to use sound statistical methods to determine which factors are strongly associated with child mortality, which in turn will help inform the design of intervention strategies. The Uganda Demographic Health Survey (UDHS), funded by USAID, UNFPA, UNICEF, Irish Aid and the United Kingdom government, provides a data set that is rich in information on child mortality and survival. Survival analysis techniques are among the well-developed methods in statistics for analysing time-to-event data. These methods were adopted in this paper to examine factors affecting under-five child mortality rates (UMR) in Uganda using the 2011 UDHS data in R and STATA software. Results obtained by fitting the Cox proportional hazards model with frailty effects, and drawing inference using both the frequentist and Bayesian approaches at the 5% significance level, show evidence of unobserved heterogeneity at the household level, but there was not enough evidence to conclude that unobserved heterogeneity exists at the community level. Sex of the household head, sex of the child and number of births in the past one year were found to be significant. The results further suggest that over the period 1990-2015, Uganda reduced its UMR by 52%. Uganda has not achieved the MDG4 target, but the 52% reduction in the UMR is a move in the positive direction. Demographic factors (sex of the household head) and biological determinants (sex of the child and number of births in the past one year) are strongly associated with high UMR. Heterogeneity or unobserved covariates were found to be significant at the household level but insignificant at the community level.
Bayesian Propensity Score Analysis: Simulation and Case Study
ERIC Educational Resources Information Center
Kaplan, David; Chen, Cassie J. S.
2011-01-01
Propensity score analysis (PSA) has been used in a variety of settings, such as education, epidemiology, and sociology. Most typically, propensity score analysis has been implemented within the conventional frequentist perspective of statistics. This perspective, as is well known, does not account for uncertainty in either the parameters of the…
Mielke, Johanna; Schmidli, Heinz; Jones, Byron
2018-05-01
For the approval of biosimilars, it is, in most cases, necessary to conduct large Phase III clinical trials in patients to convince the regulatory authorities that the product is comparable in terms of efficacy and safety to the originator product. As the originator product has already been studied in several trials beforehand, it seems natural to include this historical information into the showing of equivalent efficacy. Since all studies for the regulatory approval of biosimilars are confirmatory studies, it is required that the statistical approach has reasonable frequentist properties, most importantly, that the Type I error rate is controlled-at least in all scenarios that are realistic in practice. However, it is well known that the incorporation of historical information can lead to an inflation of the Type I error rate in the case of a conflict between the distribution of the historical data and the distribution of the trial data. We illustrate this issue and confirm, using the Bayesian robustified meta-analytic-predictive (MAP) approach as an example, that simultaneously controlling the Type I error rate over the complete parameter space and gaining power in comparison to a standard frequentist approach that only considers the data in the new study, is not possible. We propose a hybrid Bayesian-frequentist approach for binary endpoints that controls the Type I error rate in the neighborhood of the center of the prior distribution, while improving the power. We study the properties of this approach in an extensive simulation study and provide a real-world example. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Bayesian analyses of time-interval data for environmental radiation monitoring.
Luo, Peng; Sharp, Julia L; DeVol, Timothy A
2013-01-01
Time-interval (time difference between two consecutive pulses) analysis based on the principles of Bayesian inference was investigated for online radiation monitoring. Using experimental and simulated data, Bayesian analysis of time-interval data [Bayesian (ti)] was compared with Bayesian and a conventional frequentist analysis of counts in a fixed count time [Bayesian (cnt) and single interval test (SIT), respectively]. The performances of the three methods were compared in terms of average run length (ARL) and detection probability for several simulated detection scenarios. Experimental data were acquired with a DGF-4C system in list mode. Simulated data were obtained using Monte Carlo techniques to obtain a random sampling of the Poisson distribution. All statistical algorithms were developed using the R Project for statistical computing. Bayesian analysis of time-interval information provided a detection probability similar to that of Bayesian analysis of count information, but the authors were able to make a decision with fewer pulses at relatively higher radiation levels. In addition, for cases in which the source is present only briefly (for less than the count time), time-interval information is more sensitive for detecting a change than count information, since the source contribution is diluted by the background when counts are averaged over the entire count time.
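A minimal conjugate sketch (not the authors' algorithm) of the two kinds of update compared in this abstract: a gamma prior on the count rate combined either with a Poisson count observed over a fixed count time, or with the first few exponential inter-pulse intervals. All numbers are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

true_rate = 8.0            # counts per second (e.g. background 2 cps plus a 6 cps source)
alarm_rate = 4.0           # declare detection if the rate is credibly above this threshold

# Gamma(a0, b0) prior on the count rate (shape/rate parametrisation).
a0, b0 = 1.0, 0.5

# --- Count-based update: observe N counts in a fixed count time T.
T = 2.0
N = rng.poisson(true_rate * T)
post_counts = stats.gamma(a=a0 + N, scale=1.0 / (b0 + T))

# --- Time-interval-based update: observe the first k inter-pulse intervals.
k = 10
intervals = rng.exponential(1.0 / true_rate, k)
post_intervals = stats.gamma(a=a0 + k, scale=1.0 / (b0 + intervals.sum()))

for name, post in [("counts", post_counts), ("intervals", post_intervals)]:
    print(name, "P(rate > alarm threshold) =", 1 - post.cdf(alarm_rate))
```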
The power prior: theory and applications.
Ibrahim, Joseph G; Chen, Ming-Hui; Gwon, Yeongjin; Chen, Fang
2015-12-10
The power prior has been widely used in many applications covering a large number of disciplines. The power prior is intended to be an informative prior constructed from historical data. It has been used in clinical trials, genetics, health care, psychology, environmental health, engineering, economics, and business. It has also been applied for a wide variety of models and settings, both in the experimental design and analysis contexts. In this review article, we give an A-to-Z exposition of the power prior and its applications to date. We review its theoretical properties, variations in its formulation, statistical contexts for which it has been used, applications, and its advantages over other informative priors. We review models for which it has been used, including generalized linear models, survival models, and random effects models. Statistical areas where the power prior has been used include model selection, experimental design, hierarchical modeling, and conjugate priors. Frequentist properties of power priors in posterior inference are established, and a simulation study is conducted to further examine the empirical performance of the posterior estimates with power priors. Real data analyses are given illustrating the power prior as well as the use of the power prior in the Bayesian design of clinical trials. Copyright © 2015 John Wiley & Sons, Ltd.
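As a concrete and deliberately simple illustration of the power prior idea, consider a binomial response rate with historical and current data: raising the historical likelihood to a power a0 in [0, 1] discounts it, and with a Beta initial prior everything stays conjugate. The numbers below are invented for illustration.

```python
from scipy import stats

# Historical data D0 and current data D for a response rate theta (binomial).
y0, n0 = 28, 100          # historical: 28 responders out of 100
y,  n  = 18,  50          # current trial: 18 responders out of 50
a0 = 0.5                  # power-prior discounting parameter in [0, 1]

# With an initial Beta(1, 1) prior, the power prior and the posterior remain Beta:
#   posterior ~ Beta(1 + y + a0*y0, 1 + (n - y) + a0*(n0 - y0))
post = stats.beta(1 + y + a0 * y0, 1 + (n - y) + a0 * (n0 - y0))
print("posterior mean:", post.mean())
print("95% credible interval:", post.interval(0.95))

# a0 = 0 ignores the historical data; a0 = 1 pools it fully with the current trial.
```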
A Bayesian Approach for Summarizing and Modeling Time-Series Exposure Data with Left Censoring.
Houseman, E Andres; Virji, M Abbas
2017-08-01
Direct reading instruments are valuable tools for measuring exposure as they provide real-time measurements for rapid decision making. However, their use is limited to general survey applications in part due to issues related to their performance. Moreover, statistical analysis of real-time data is complicated by autocorrelation among successive measurements, non-stationary time series, and the presence of left-censoring due to limit-of-detection (LOD). A Bayesian framework is proposed that accounts for non-stationary autocorrelation and LOD issues in exposure time-series data in order to model workplace factors that affect exposure and estimate summary statistics for tasks or other covariates of interest. A spline-based approach is used to model non-stationary autocorrelation with relatively few assumptions about autocorrelation structure. Left-censoring is addressed by integrating over the left tail of the distribution. The model is fit using Markov-Chain Monte Carlo within a Bayesian paradigm. The method can flexibly account for hierarchical relationships, random effects and fixed effects of covariates. The method is implemented using the rjags package in R, and is illustrated by applying it to real-time exposure data. Estimates for task means and covariates from the Bayesian model are compared to those from conventional frequentist models including linear regression, mixed-effects, and time-series models with different autocorrelation structures. Simulation studies are also conducted to evaluate method performance. Simulation studies with the percentage of measurements below the LOD ranging from 0 to 50% showed lowest root mean squared errors for task means and the least biased standard deviations from the Bayesian model compared to the frequentist models across all levels of LOD. In the application, task means from the Bayesian model were similar to means from the frequentist models, while the standard deviations were different. Parameter estimates for covariates were significant in some frequentist models, but in the Bayesian model their credible intervals contained zero; such discrepancies were observed in multiple datasets. Variance components from the Bayesian model reflected substantial autocorrelation, consistent with the frequentist models, except for the auto-regressive moving average model. Plots of means from the Bayesian model showed good fit to the observed data. The proposed Bayesian model provides an approach for modeling non-stationary autocorrelation in a hierarchical modeling framework to estimate task means, standard deviations, quantiles, and parameter estimates for covariates that are less biased and have better performance characteristics than some of the contemporary methods. Published by Oxford University Press on behalf of the British Occupational Hygiene Society 2017.
Network meta-analysis: application and practice using Stata
2017-01-01
This review aimed to organize the concepts of network meta-analysis (NMA) and to demonstrate the analytical process of NMA using Stata software under a frequentist framework. NMA synthesizes evidence for decision making by evaluating the comparative effectiveness of more than two alternative interventions for the same condition. Before conducting an NMA, three major assumptions—similarity, transitivity, and consistency—should be checked. The statistical analysis consists of five steps. The first step is to draw the network geometry to provide an overview of the relationships in the network. The second step checks the assumption of consistency. The third step is to produce a network forest plot or interval plot to illustrate the summary comparative effectiveness among the interventions. The fourth step calculates cumulative rankings for identifying superiority among interventions. The last step evaluates publication bias or effect modifiers so that valid inferences can be drawn from the results. The evidence synthesized through these five steps is very useful for evidence-based decision making in healthcare; NMA should therefore be used more widely to help guarantee the quality of the healthcare system. PMID: 29092392
Network meta-analysis: application and practice using Stata.
Shim, Sungryul; Yoon, Byung-Ho; Shin, In-Soo; Bae, Jong-Myon
2017-01-01
This review aimed to organize the concepts of network meta-analysis (NMA) and to demonstrate the analytical process of NMA using Stata software under a frequentist framework. NMA synthesizes evidence for decision making by evaluating the comparative effectiveness of more than two alternative interventions for the same condition. Before conducting an NMA, three major assumptions (similarity, transitivity, and consistency) should be checked. The statistical analysis consists of five steps. The first step is to draw the network geometry to provide an overview of the relationships in the network. The second step checks the assumption of consistency. The third step is to produce a network forest plot or interval plot to illustrate the summary comparative effectiveness among the interventions. The fourth step calculates cumulative rankings for identifying superiority among interventions. The last step evaluates publication bias or effect modifiers so that valid inferences can be drawn from the results. The evidence synthesized through these five steps is very useful for evidence-based decision making in healthcare; NMA should therefore be used more widely to help guarantee the quality of the healthcare system.
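As a small numerical illustration of the transitivity and consistency ideas (independent of Stata, with invented numbers), an indirect estimate for A versus B can be formed from direct A versus C and B versus C estimates and then compared with a direct A versus B estimate.

```python
import numpy as np

# Indirect comparison in a simple A-B-C network (illustrative numbers):
# direct estimates of log odds ratios and their standard errors.
d_AC, se_AC = -0.30, 0.10     # A vs C
d_BC, se_BC = -0.10, 0.12     # B vs C

# Transitivity gives the indirect A vs B estimate and its variance.
d_AB_indirect = d_AC - d_BC
se_AB_indirect = np.sqrt(se_AC**2 + se_BC**2)

# Consistency check against a (hypothetical) direct A vs B estimate.
d_AB_direct, se_AB_direct = -0.25, 0.15
z = (d_AB_direct - d_AB_indirect) / np.sqrt(se_AB_direct**2 + se_AB_indirect**2)
print("indirect A vs B:", round(d_AB_indirect, 3), "+/-", round(se_AB_indirect, 3))
print("inconsistency z-statistic:", round(z, 2))
```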
Efficient Bayesian inference for natural time series using ARFIMA processes
NASA Astrophysics Data System (ADS)
Graves, T.; Gramacy, R. B.; Franzke, C. L. E.; Watkins, N. W.
2015-11-01
Many geophysical quantities, such as atmospheric temperature, water levels in rivers, and wind speeds, have shown evidence of long memory (LM). LM implies that these quantities experience non-trivial temporal memory, which potentially not only enhances their predictability, but also hampers the detection of externally forced trends. Thus, it is important to reliably identify whether or not a system exhibits LM. In this paper we present a modern and systematic approach to the inference of LM. We use the flexible autoregressive fractional integrated moving average (ARFIMA) model, which is widely used in time series analysis, and of increasing interest in climate science. Unlike most previous work on the inference of LM, which is frequentist in nature, we provide a systematic treatment of Bayesian inference. In particular, we provide a new approximate likelihood for efficient parameter inference, and show how nuisance parameters (e.g., short-memory effects) can be integrated over in order to focus on long-memory parameters and hypothesis testing more directly. We illustrate our new methodology on the Nile water level data and the central England temperature (CET) time series, with favorable comparison to the standard estimators. For CET we also extend our method to seasonal long memory.
A Bayesian-frequentist two-stage single-arm phase II clinical trial design.
Dong, Gaohong; Shih, Weichung Joe; Moore, Dirk; Quan, Hui; Marcella, Stephen
2012-08-30
It is well known that frequentist and Bayesian clinical trial designs each have their own advantages and disadvantages. To inherit better properties from both types of design, we developed a Bayesian-frequentist two-stage single-arm phase II clinical trial design. This design allows both early acceptance and early rejection of the null hypothesis (H0). Measures of the design properties (for example, the probability of early trial termination and the expected sample size) are derived under both frequentist and Bayesian settings. Moreover, under the Bayesian setting, the upper and lower boundaries are determined using the predictive probability of a successful trial outcome. Given a beta prior and a sample size for stage I, we derived Bayesian Type I and Type II error rates based on the marginal distribution of the responses at stage I. By controlling both frequentist and Bayesian error rates, the Bayesian-frequentist two-stage design has special features compared with other two-stage designs. Copyright © 2012 John Wiley & Sons, Ltd.
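The predictive-probability calculation mentioned in the abstract can be sketched as follows for illustrative design numbers (not those of the paper): given the stage I responses and a Beta prior, the stage II responses are beta-binomial, and the predictive probability of crossing a final success boundary follows directly.

```python
import numpy as np
from scipy import stats

# Stage I of a single-arm phase II trial (all numbers illustrative).
n1, y1 = 15, 5             # 5 responses among the first 15 patients
n2 = 20                    # patients to be enrolled at stage II
r_total = 12               # final success boundary: at least 12/35 responses rejects H0
a, b = 1.0, 1.0            # Beta(1, 1) prior on the response rate

# Posterior after stage I is Beta(a + y1, b + n1 - y1); future stage II responses are then
# beta-binomial.  Predictive probability of ending in "success":
y2 = np.arange(n2 + 1)
pred_pmf = stats.betabinom.pmf(y2, n2, a + y1, b + n1 - y1)
pp_success = pred_pmf[y2 >= (r_total - y1)].sum()
print("predictive probability of trial success:", round(pp_success, 3))
```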
Bartlett, Jonathan W; Keogh, Ruth H
2018-06-01
Bayesian approaches for handling covariate measurement error are well established and yet arguably are still relatively little used by researchers. For some this is likely due to unfamiliarity or disagreement with the Bayesian inferential paradigm. For others a contributory factor is the inability of standard statistical packages to perform such Bayesian analyses. In this paper, we first give an overview of the Bayesian approach to handling covariate measurement error, and contrast it with regression calibration, arguably the most commonly adopted approach. We then argue why the Bayesian approach has a number of statistical advantages compared to regression calibration and demonstrate that implementing the Bayesian approach is usually quite feasible for the analyst. Next, we describe the closely related maximum likelihood and multiple imputation approaches and explain why we believe the Bayesian approach to generally be preferable. We then empirically compare the frequentist properties of regression calibration and the Bayesian approach through simulation studies. The flexibility of the Bayesian approach to handle both measurement error and missing data is then illustrated through an analysis of data from the Third National Health and Nutrition Examination Survey.
A crash course on data analysis in asteroseismology
NASA Astrophysics Data System (ADS)
Appourchaux, Thierry
2014-02-01
In this course, I try to provide a few basics required for performing data analysis in asteroseismology. First, I address how one can properly treat time series: the sampling, the filtering effect, the use of the Fourier transform, and the associated statistics. Second, I address how one can apply statistics for decision making and for parameter estimation, in either a frequentist or a Bayesian framework. Last, I review how these basic principles have been applied (or not) in asteroseismology.
Bayesian approach for counting experiment statistics applied to a neutrino point source analysis
NASA Astrophysics Data System (ADS)
Bose, D.; Brayeur, L.; Casier, M.; de Vries, K. D.; Golup, G.; van Eijndhoven, N.
2013-12-01
In this paper we present a model-independent analysis method following Bayesian statistics to analyse data from a generic counting experiment and apply it to the search for neutrinos from point sources. We discuss a test statistic defined following a Bayesian framework that will be used in the search for a signal. If no signal is found, we derive an upper limit without the introduction of approximations. The Bayesian approach allows us to obtain the full probability density function for both the background and the signal rate. As such, we have direct access to any signal upper limit. The upper limit derivation directly compares with a frequentist approach and is robust in the case of low-counting observations. Furthermore, it also allows one to account for previous upper limits obtained by other analyses via the concept of prior information, without the need for an ad hoc application of trial factors. To investigate the validity of the presented Bayesian approach, we have applied this method to the public IceCube 40-string configuration data for 10 nearby blazars and we have obtained a flux upper limit, which is in agreement with the upper limits determined via a frequentist approach. Furthermore, the upper limit obtained compares well with the previously published result of IceCube, using the same data set.
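A stripped-down version of this kind of calculation (a single counting bin, a background assumed perfectly known, and a flat prior on a non-negative signal) fits in a few lines; the event counts below are invented and this is not the IceCube analysis itself.

```python
import numpy as np

n_obs = 4          # observed events in the on-source region (illustrative)
b = 5.2            # expected background events, assumed perfectly known here

# Posterior for the signal expectation s with a flat prior on s >= 0:
#   p(s | n) is proportional to (s + b)^n * exp(-(s + b))
s = np.linspace(0.0, 30.0, 30001)
post = (s + b) ** n_obs * np.exp(-(s + b))
post /= post.sum()                       # normalise on the grid

# 90% credible upper limit on the signal.
cdf = np.cumsum(post)
upper_limit = s[np.searchsorted(cdf, 0.90)]
print(f"90% Bayesian upper limit on the signal: {upper_limit:.2f} events")
```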
Why Current Statistics of Complementary Alternative Medicine Clinical Trials is Invalid.
Pandolfi, Maurizio; Carreras, Giulia
2018-06-07
It is not sufficiently known that frequentist statistics cannot provide direct information on the probability that the research hypothesis tested is correct. The error resulting from this misunderstanding is compounded when the hypotheses under scrutiny have precarious scientific bases, as those of complementary alternative medicine (CAM) generally do. In such cases, it is mandatory to use inferential methods that take into account the prior probability that the hypothesis tested is true, such as Bayesian statistics. The authors show that, under such circumstances, no real statistical significance can be achieved in CAM clinical trials. In this respect, CAM trials involving human material are also hardly defensible from an ethical viewpoint.
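The core point, that a significant P value says little when the prior probability of the hypothesis is low, can be made quantitative with the standard textbook calculation below; the priors and power used are illustrative choices, not values from the paper.

```python
# Posterior probability that the tested hypothesis is true after a "significant" result,
# for a given prior probability, significance level, and power (illustrative numbers).
def prob_h1_given_significant(prior, alpha=0.05, power=0.8):
    return power * prior / (power * prior + alpha * (1 - prior))

for prior in (0.5, 0.1, 0.01):   # 0.01 might represent a scientifically implausible hypothesis
    print(prior, round(prob_h1_given_significant(prior), 3))
```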
The frequentist implications of optional stopping on Bayesian hypothesis tests.
Sanborn, Adam N; Hills, Thomas T
2014-04-01
Null hypothesis significance testing (NHST) is the most commonly used statistical methodology in psychology. The probability of achieving a value as extreme as or more extreme than the statistic obtained from the data is evaluated, and if it is low enough, the null hypothesis is rejected. However, because common experimental practice often clashes with the assumptions underlying NHST, these calculated probabilities are often incorrect. Most commonly, experimenters use tests that assume that sample sizes are fixed in advance of data collection but then use the data to determine when to stop; in the limit, experimenters can use data monitoring to guarantee that the null hypothesis will be rejected. Bayesian hypothesis testing (BHT) provides a solution to these ills because the stopping rule used is irrelevant to the calculation of a Bayes factor. In addition, there are strong mathematical guarantees on the frequentist properties of BHT that are comforting for researchers concerned that stopping rules could influence the Bayes factors produced. Here, we show that these guaranteed bounds have limited scope and often do not apply in psychological research. Specifically, we quantitatively demonstrate the impact of optional stopping on the resulting Bayes factors in two common situations: (1) when the truth is a combination of the hypotheses, such as in a heterogeneous population, and (2) when a hypothesis is composite (taking multiple parameter values), such as the alternative hypothesis in a t-test. We found that, for these situations, while the Bayesian interpretation remains correct regardless of the stopping rule used, the choice of stopping rule can, in some situations, greatly increase the chance of experimenters finding evidence in the direction they desire. We suggest ways to control these frequentist implications of stopping rules on BHT.
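A minimal simulation in the spirit of this paper (normal data with known variance, a point null against a normal prior on the effect under H1, and a Bayes factor monitored after every observation) shows how often optional stopping lets purely null data reach a chosen evidence threshold; all settings are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
sigma, tau = 1.0, 1.0            # known data SD; prior SD of the effect under H1
n_max, threshold = 500, 3.0      # monitor up to n_max observations; stop once BF10 >= 3
n_sims = 2000

def bf10(xbar, n):
    """Bayes factor for H1: mu ~ N(0, tau^2) against H0: mu = 0, given the sample mean."""
    return (stats.norm.pdf(xbar, 0.0, np.sqrt(tau**2 + sigma**2 / n)) /
            stats.norm.pdf(xbar, 0.0, np.sqrt(sigma**2 / n)))

stopped = 0
n = np.arange(1, n_max + 1)
for _ in range(n_sims):                            # data generated under H0 (true mean is 0)
    x = rng.normal(0.0, sigma, n_max)
    xbar = np.cumsum(x) / n
    bf = bf10(xbar, n)
    stopped += bool(np.any(bf[1:] >= threshold))   # allow stopping from n = 2 onwards
print("proportion of null experiments reaching BF10 >= 3:", stopped / n_sims)
```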
Bayesian Group Bridge for Bi-level Variable Selection.
Mallick, Himel; Yi, Nengjun
2017-06-01
A Bayesian bi-level variable selection method (BAGB: Bayesian Analysis of Group Bridge) is developed for regularized regression and classification. This new development is motivated by grouped data, where generic variables can be divided into multiple groups, with variables in the same group being mechanistically related or statistically correlated. As an alternative to frequentist group variable selection methods, BAGB incorporates structural information among predictors through a group-wise shrinkage prior. Posterior computation proceeds via an efficient MCMC algorithm. In addition to the usual ease-of-interpretation of hierarchical linear models, the Bayesian formulation produces valid standard errors, a feature that is notably absent in the frequentist framework. Empirical evidence of the attractiveness of the method is illustrated by extensive Monte Carlo simulations and real data analysis. Finally, several extensions of this new approach are presented, providing a unified framework for bi-level variable selection in general models with flexible penalties.
Rediscovery of Good-Turing estimators via Bayesian nonparametrics.
Favaro, Stefano; Nipoti, Bernardo; Teh, Yee Whye
2016-03-01
The problem of estimating discovery probabilities originated in the context of statistical ecology, and in recent years it has become popular due to its frequent appearance in challenging applications arising in genetics, bioinformatics, linguistics, designs of experiments, machine learning, etc. A full range of statistical approaches, parametric and nonparametric as well as frequentist and Bayesian, has been proposed for estimating discovery probabilities. In this article, we investigate the relationships between the celebrated Good-Turing approach, which is a frequentist nonparametric approach developed in the 1940s, and a Bayesian nonparametric approach recently introduced in the literature. Specifically, under the assumption of a two parameter Poisson-Dirichlet prior, we show that Bayesian nonparametric estimators of discovery probabilities are asymptotically equivalent, for a large sample size, to suitably smoothed Good-Turing estimators. As a by-product of this result, we introduce and investigate a methodology for deriving exact and asymptotic credible intervals to be associated with the Bayesian nonparametric estimators of discovery probabilities. The proposed methodology is illustrated through a comprehensive simulation study and the analysis of Expressed Sequence Tags data generated by sequencing a benchmark complementary DNA library. © 2015, The International Biometric Society.
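For readers meeting the Good-Turing estimator for the first time, its simplest instance (the estimated probability that the next observation is a species not yet seen equals the fraction of observations that are singletons) fits in a few lines; the toy sample below is invented.

```python
from collections import Counter

# Toy "library": each draw is a gene (species) label.
sample = ["g1", "g2", "g1", "g3", "g4", "g2", "g5", "g1", "g6", "g6", "g7", "g2"]

counts = Counter(sample)
n = len(sample)
m1 = sum(1 for c in counts.values() if c == 1)     # species observed exactly once

# Good-Turing estimate of the discovery probability: the probability that the next
# observation belongs to a species not yet seen.
print("Good-Turing P(new species on next draw) ~", m1 / n)
```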
Frequency, probability, and prediction: easy solutions to cognitive illusions?
Griffin, D; Buehler, R
1999-02-01
Many errors in probabilistic judgment have been attributed to people's inability to think in statistical terms when faced with information about a single case. Prior theoretical analyses and empirical results imply that the errors associated with case-specific reasoning may be reduced when people make frequentistic predictions about a set of cases. In studies of three previously identified cognitive biases, we find that frequency-based predictions are different from, but no better than, case-specific judgments of probability. First, in studies of the "planning fallacy," we compare the accuracy of aggregate frequency and case-specific probability judgments in predictions of students' real-life projects. When aggregate and single-case predictions are collected from different respondents, there is little difference between the two: Both are overly optimistic and show little predictive validity. However, in within-subject comparisons, the aggregate judgments are significantly more conservative than the single-case predictions, though still optimistically biased. Results from studies of overconfidence in general knowledge and base rate neglect in categorical prediction underline a general conclusion. Frequentistic predictions made for sets of events are no more statistically sophisticated, nor more accurate, than predictions made for individual events using subjective probability. Copyright 1999 Academic Press.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Host, Ole; Lahav, Ofer; Abdalla, Filipe B.
We present a showcase for deriving bounds on the neutrino masses from laboratory experiments and cosmological observations. We compare the frequentist and Bayesian bounds on the effective electron neutrino mass mβ which the KATRIN neutrino mass experiment is expected to obtain, using both an analytical likelihood function and Monte Carlo simulations of KATRIN. Assuming a uniform prior in mβ, we find that a null result yields an upper bound of about 0.17 eV at 90% confidence in the Bayesian analysis, to be compared with the frequentist KATRIN reference value of 0.20 eV. This is a significant difference when judged relative to the systematic and statistical uncertainties of the experiment. On the other hand, an input mβ = 0.35 eV, which is the KATRIN 5σ detection threshold, would be detected at virtually the same level. Finally, we combine the simulated KATRIN results with cosmological data in the form of present (post-WMAP) and future (simulated Planck) observations. If an input of mβ = 0.2 eV is assumed in our simulations, KATRIN alone excludes a zero neutrino mass at 2.2σ. Adding Planck data increases the probability of detection to a median 2.7σ. The analysis highlights the importance of combining cosmological and laboratory data on an equal footing.
NASA Astrophysics Data System (ADS)
Cox, M.; Shirono, K.
2017-10-01
A criticism levelled at the Guide to the Expression of Uncertainty in Measurement (GUM) is that it is based on a mixture of frequentist and Bayesian thinking. In particular, the GUM's Type A (statistical) uncertainty evaluations are frequentist, whereas the Type B evaluations, using state-of-knowledge distributions, are Bayesian. In contrast, making the GUM fully Bayesian implies, among other things, that a conventional objective Bayesian approach to Type A uncertainty evaluation for a number n of observations leads to the impractical consequence that n must be at least equal to 4, thus presenting a difficulty for many metrologists. This paper presents a Bayesian analysis of Type A uncertainty evaluation that applies for all n ≥ 2, as in the frequentist analysis in the current GUM. The analysis is based on assuming that the observations are drawn from a normal distribution (as in the conventional objective Bayesian analysis), but uses an informative prior based on lower and upper bounds for the standard deviation of the sampling distribution for the quantity under consideration. The main outcome of the analysis is a closed-form mathematical expression for the factor by which the standard deviation of the mean observation should be multiplied to calculate the required standard uncertainty. Metrological examples are used to illustrate the approach, which is straightforward to apply using a formula or look-up table.
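The paper's closed-form factor for the informative-prior case is not reproduced here, but the sketch below contrasts the conventional GUM Type A evaluation with the conventional objective-Bayes result whose breakdown at n < 4 motivates the work; the readings are invented.

```python
import numpy as np

obs = np.array([10.12, 10.07, 10.15, 10.09, 10.11])   # n repeated indications (illustrative)
n = len(obs)
s = obs.std(ddof=1)

# Conventional (frequentist) GUM Type A standard uncertainty of the mean.
u_freq = s / np.sqrt(n)

# Conventional objective-Bayes counterpart: the posterior for the mean is a scaled and
# shifted Student t with n-1 degrees of freedom, whose standard deviation exists only
# for n >= 4, hence the factor sqrt((n-1)/(n-3)).
factor = np.sqrt((n - 1) / (n - 3))
u_bayes = factor * u_freq

print("u (frequentist) =", round(u_freq, 4))
print("u (objective Bayes, defined only for n >= 4) =", round(u_bayes, 4))
```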
Error-in-variables models in calibration
NASA Astrophysics Data System (ADS)
Lira, I.; Grientschnig, D.
2017-12-01
In many calibration operations, the stimuli applied to the measuring system or instrument under test are derived from measurement standards whose values may be considered to be perfectly known. In that case, it is assumed that calibration uncertainty arises solely from inexact measurement of the responses, from imperfect control of the calibration process and from the possible inaccuracy of the calibration model. However, the premise that the stimuli are completely known is never strictly fulfilled and in some instances it may be grossly inadequate. Then, error-in-variables (EIV) regression models have to be employed. In metrology, these models have been approached mostly from the frequentist perspective. In contrast, not much guidance is available on their Bayesian analysis. In this paper, we first present a brief summary of the conventional statistical techniques that have been developed to deal with EIV models in calibration. We then proceed to discuss the alternative Bayesian framework under some simplifying assumptions. Through a detailed example about the calibration of an instrument for measuring flow rates, we provide advice on how the user of the calibration function should employ the latter framework for inferring the stimulus acting on the calibrated device when, in use, a certain response is measured.
Li, Baoyue; Lingsma, Hester F; Steyerberg, Ewout W; Lesaffre, Emmanuel
2011-05-23
Logistic random effects models are a popular tool to analyze multilevel also called hierarchical data with a binary or ordinal outcome. Here, we aim to compare different statistical software implementations of these models. We used individual patient data from 8509 patients in 231 centers with moderate and severe Traumatic Brain Injury (TBI) enrolled in eight Randomized Controlled Trials (RCTs) and three observational studies. We fitted logistic random effects regression models with the 5-point Glasgow Outcome Scale (GOS) as outcome, both dichotomized as well as ordinal, with center and/or trial as random effects, and as covariates age, motor score, pupil reactivity or trial. We then compared the implementations of frequentist and Bayesian methods to estimate the fixed and random effects. Frequentist approaches included R (lme4), Stata (GLLAMM), SAS (GLIMMIX and NLMIXED), MLwiN ([R]IGLS) and MIXOR, Bayesian approaches included WinBUGS, MLwiN (MCMC), R package MCMCglmm and SAS experimental procedure MCMC.Three data sets (the full data set and two sub-datasets) were analysed using basically two logistic random effects models with either one random effect for the center or two random effects for center and trial. For the ordinal outcome in the full data set also a proportional odds model with a random center effect was fitted. The packages gave similar parameter estimates for both the fixed and random effects and for the binary (and ordinal) models for the main study and when based on a relatively large number of level-1 (patient level) data compared to the number of level-2 (hospital level) data. However, when based on relatively sparse data set, i.e. when the numbers of level-1 and level-2 data units were about the same, the frequentist and Bayesian approaches showed somewhat different results. The software implementations differ considerably in flexibility, computation time, and usability. There are also differences in the availability of additional tools for model evaluation, such as diagnostic plots. The experimental SAS (version 9.2) procedure MCMC appeared to be inefficient. On relatively large data sets, the different software implementations of logistic random effects regression models produced similar results. Thus, for a large data set there seems to be no explicit preference (of course if there is no preference from a philosophical point of view) for either a frequentist or Bayesian approach (if based on vague priors). The choice for a particular implementation may largely depend on the desired flexibility, and the usability of the package. For small data sets the random effects variances are difficult to estimate. In the frequentist approaches the MLE of this variance was often estimated zero with a standard error that is either zero or could not be determined, while for Bayesian methods the estimates could depend on the chosen "non-informative" prior of the variance parameter. The starting value for the variance parameter may be also critical for the convergence of the Markov chain.
Efficient Bayesian inference for natural time series using ARFIMA processes
NASA Astrophysics Data System (ADS)
Graves, Timothy; Gramacy, Robert; Franzke, Christian; Watkins, Nicholas
2016-04-01
Many geophysical quantities, such as atmospheric temperature, water levels in rivers, and wind speeds, have shown evidence of long memory (LM). LM implies that these quantities experience non-trivial temporal memory, which potentially not only enhances their predictability, but also hampers the detection of externally forced trends. Thus, it is important to reliably identify whether or not a system exhibits LM. We present a modern and systematic approach to the inference of LM. We use the flexible autoregressive fractional integrated moving average (ARFIMA) model, which is widely used in time series analysis, and of increasing interest in climate science. Unlike most previous work on the inference of LM, which is frequentist in nature, we provide a systematic treatment of Bayesian inference. In particular, we provide a new approximate likelihood for efficient parameter inference, and show how nuisance parameters (e.g., short-memory effects) can be integrated over in order to focus on long-memory parameters and hypothesis testing more directly. We illustrate our new methodology on the Nile water level data and the central England temperature (CET) time series, with favorable comparison to the standard estimators [1]. In addition we show how the method can be used to perform joint inference of the stability exponent and the memory parameter when ARFIMA is extended to allow for alpha-stable innovations. Such models can be used to study systems where heavy tails and long range memory coexist. [1] Graves et al, Nonlin. Processes Geophys., 22, 679-700, 2015; doi:10.5194/npg-22-679-2015.
Planck intermediate results. XVI. Profile likelihoods for cosmological parameters
NASA Astrophysics Data System (ADS)
Planck Collaboration; Ade, P. A. R.; Aghanim, N.; Arnaud, M.; Ashdown, M.; Aumont, J.; Baccigalupi, C.; Banday, A. J.; Barreiro, R. B.; Bartlett, J. G.; Battaner, E.; Benabed, K.; Benoit-Lévy, A.; Bernard, J.-P.; Bersanelli, M.; Bielewicz, P.; Bobin, J.; Bonaldi, A.; Bond, J. R.; Bouchet, F. R.; Burigana, C.; Cardoso, J.-F.; Catalano, A.; Chamballu, A.; Chiang, H. C.; Christensen, P. R.; Clements, D. L.; Colombi, S.; Colombo, L. P. L.; Couchot, F.; Cuttaia, F.; Danese, L.; Davies, R. D.; Davis, R. J.; de Bernardis, P.; de Rosa, A.; de Zotti, G.; Delabrouille, J.; Dickinson, C.; Diego, J. M.; Dole, H.; Donzelli, S.; Doré, O.; Douspis, M.; Dupac, X.; Enßlin, T. A.; Eriksen, H. K.; Finelli, F.; Forni, O.; Frailis, M.; Franceschi, E.; Galeotta, S.; Galli, S.; Ganga, K.; Giard, M.; Giraud-Héraud, Y.; González-Nuevo, J.; Górski, K. M.; Gregorio, A.; Gruppuso, A.; Hansen, F. K.; Harrison, D. L.; Henrot-Versillé, S.; Hernández-Monteagudo, C.; Herranz, D.; Hildebrandt, S. R.; Hivon, E.; Hobson, M.; Holmes, W. A.; Hornstrup, A.; Hovest, W.; Huffenberger, K. M.; Jaffe, A. H.; Jaffe, T. R.; Jones, W. C.; Juvela, M.; Keihänen, E.; Keskitalo, R.; Kisner, T. S.; Kneissl, R.; Knoche, J.; Knox, L.; Kunz, M.; Kurki-Suonio, H.; Lagache, G.; Lähteenmäki, A.; Lamarre, J.-M.; Lasenby, A.; Lawrence, C. R.; Leonardi, R.; Liddle, A.; Liguori, M.; Lilje, P. B.; Linden-Vørnle, M.; López-Caniego, M.; Lubin, P. M.; Macías-Pérez, J. F.; Maffei, B.; Maino, D.; Mandolesi, N.; Maris, M.; Martin, P. G.; Martínez-González, E.; Masi, S.; Massardi, M.; Matarrese, S.; Mazzotta, P.; Melchiorri, A.; Mendes, L.; Mennella, A.; Migliaccio, M.; Mitra, S.; Miville-Deschênes, M.-A.; Moneti, A.; Montier, L.; Morgante, G.; Munshi, D.; Murphy, J. A.; Naselsky, P.; Nati, F.; Natoli, P.; Noviello, F.; Novikov, D.; Novikov, I.; Oxborrow, C. A.; Pagano, L.; Pajot, F.; Paoletti, D.; Pasian, F.; Perdereau, O.; Perotto, L.; Perrotta, F.; Pettorino, V.; Piacentini, F.; Piat, M.; Pierpaoli, E.; Pietrobon, D.; Plaszczynski∗, S.; Pointecouteau, E.; Polenta, G.; Popa, L.; Pratt, G. W.; Puget, J.-L.; Rachen, J. P.; Rebolo, R.; Reinecke, M.; Remazeilles, M.; Renault, C.; Ricciardi, S.; Riller, T.; Ristorcelli, I.; Rocha, G.; Rosset, C.; Roudier, G.; Rouillé d'Orfeuil, B.; Rubiño-Martín, J. A.; Rusholme, B.; Sandri, M.; Savelainen, M.; Savini, G.; Spencer, L. D.; Spinelli, M.; Starck, J.-L.; Sureau, F.; Sutton, D.; Suur-Uski, A.-S.; Sygnet, J.-F.; Tauber, J. A.; Terenzi, L.; Toffolatti, L.; Tomasi, M.; Tristram, M.; Tucci, M.; Umana, G.; Valenziano, L.; Valiviita, J.; Van Tent, B.; Vielva, P.; Villa, F.; Wade, L. A.; Wandelt, B. D.; White, M.; Yvon, D.; Zacchei, A.; Zonca, A.
2014-06-01
We explore the 2013 Planck likelihood function with a high-precision multi-dimensional minimizer (Minuit). This allows a refinement of the ΛCDM best-fit solution with respect to previously-released results, and the construction of frequentist confidence intervals using profile likelihoods. The agreement with the cosmological results from the Bayesian framework is excellent, demonstrating the robustness of the Planck results to the statistical methodology. We investigate the inclusion of neutrino masses, where more significant differences may appear due to the non-Gaussian nature of the posterior mass distribution. By applying the Feldman-Cousins prescription, we again obtain results very similar to those of the Bayesian methodology. However, the profile-likelihood analysis of the cosmic microwave background (CMB) combination (Planck+WP+highL) reveals a minimum well within the unphysical negative-mass region. We show that inclusion of the Planck CMB-lensing information regularizes this issue, and provide a robust frequentist upper limit ∑ mν ≤ 0.26 eV (95% confidence) from the CMB+lensing+BAO data combination.
Computer-Based Model Calibration and Uncertainty Analysis: Terms and Concepts
2015-07-01
Uncertainty analyses throughout the lifecycle of planning, designing, and operating Civil Works flood risk management projects as described in … value 95% of the time. In the frequentist approach to PE (parameter estimation), model parameters are regarded as having true values, and their estimate is based on the … in catchment models.
2011-01-01
Background Logistic random effects models are a popular tool to analyze multilevel also called hierarchical data with a binary or ordinal outcome. Here, we aim to compare different statistical software implementations of these models. Methods We used individual patient data from 8509 patients in 231 centers with moderate and severe Traumatic Brain Injury (TBI) enrolled in eight Randomized Controlled Trials (RCTs) and three observational studies. We fitted logistic random effects regression models with the 5-point Glasgow Outcome Scale (GOS) as outcome, both dichotomized as well as ordinal, with center and/or trial as random effects, and as covariates age, motor score, pupil reactivity or trial. We then compared the implementations of frequentist and Bayesian methods to estimate the fixed and random effects. Frequentist approaches included R (lme4), Stata (GLLAMM), SAS (GLIMMIX and NLMIXED), MLwiN ([R]IGLS) and MIXOR, Bayesian approaches included WinBUGS, MLwiN (MCMC), R package MCMCglmm and SAS experimental procedure MCMC. Three data sets (the full data set and two sub-datasets) were analysed using basically two logistic random effects models with either one random effect for the center or two random effects for center and trial. For the ordinal outcome in the full data set also a proportional odds model with a random center effect was fitted. Results The packages gave similar parameter estimates for both the fixed and random effects and for the binary (and ordinal) models for the main study and when based on a relatively large number of level-1 (patient level) data compared to the number of level-2 (hospital level) data. However, when based on relatively sparse data set, i.e. when the numbers of level-1 and level-2 data units were about the same, the frequentist and Bayesian approaches showed somewhat different results. The software implementations differ considerably in flexibility, computation time, and usability. There are also differences in the availability of additional tools for model evaluation, such as diagnostic plots. The experimental SAS (version 9.2) procedure MCMC appeared to be inefficient. Conclusions On relatively large data sets, the different software implementations of logistic random effects regression models produced similar results. Thus, for a large data set there seems to be no explicit preference (of course if there is no preference from a philosophical point of view) for either a frequentist or Bayesian approach (if based on vague priors). The choice for a particular implementation may largely depend on the desired flexibility, and the usability of the package. For small data sets the random effects variances are difficult to estimate. In the frequentist approaches the MLE of this variance was often estimated zero with a standard error that is either zero or could not be determined, while for Bayesian methods the estimates could depend on the chosen "non-informative" prior of the variance parameter. The starting value for the variance parameter may be also critical for the convergence of the Markov chain. PMID:21605357
Experimental Concepts for Testing Seismic Hazard Models
NASA Astrophysics Data System (ADS)
Marzocchi, W.; Jordan, T. H.
2015-12-01
Seismic hazard analysis is the primary interface through which useful information about earthquake rupture and wave propagation is delivered to society. To account for the randomness (aleatory variability) and limited knowledge (epistemic uncertainty) of these natural processes, seismologists must formulate and test hazard models using the concepts of probability. In this presentation, we will address the scientific objections that have been raised over the years against probabilistic seismic hazard analysis (PSHA). Owing to the paucity of observations, we must rely on expert opinion to quantify the epistemic uncertainties of PSHA models (e.g., in the weighting of individual models from logic-tree ensembles of plausible models). The main theoretical issue is a frequentist critique: subjectivity is immeasurable; ergo, PSHA models cannot be objectively tested against data; ergo, they are fundamentally unscientific. We have argued (PNAS, 111, 11973-11978) that the Bayesian subjectivity required for casting epistemic uncertainties can be bridged with the frequentist objectivity needed for pure significance testing through "experimental concepts." An experimental concept specifies collections of data, observed and not yet observed, that are judged to be exchangeable (i.e., with a joint distribution independent of the data ordering) when conditioned on a set of explanatory variables. We illustrate, through concrete examples, experimental concepts useful in the testing of PSHA models for ontological errors in the presence of aleatory variability and epistemic uncertainty. In particular, we describe experimental concepts that lead to exchangeable binary sequences that are statistically independent but not identically distributed, showing how the Bayesian concept of exchangeability generalizes the frequentist concept of experimental repeatability. We also address the issue of testing PSHA models using spatially correlated data.
Sequential Probability Ratio Test for Collision Avoidance Maneuver Decisions
NASA Technical Reports Server (NTRS)
Carpenter, J. Russell; Markley, F. Landis
2010-01-01
When facing a conjunction between space objects, decision makers must choose whether to maneuver for collision avoidance or not. We apply a well-known decision procedure, the sequential probability ratio test, to this problem. We propose two approaches to the problem solution, one based on a frequentist method, and the other on a Bayesian method. The frequentist method does not require any prior knowledge concerning the conjunction, while the Bayesian method assumes knowledge of prior probability densities. Our results show that both methods achieve desired missed detection rates, but the frequentist method's false alarm performance is inferior to the Bayesian method's.
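For orientation, the core of the decision procedure can be sketched as a Wald sequential probability ratio test; the Gaussian densities, error rates, and simulated measurements below are illustrative assumptions, not the models used in the paper.

```python
import numpy as np
from scipy.stats import norm

def sprt(observations, pdf_h0, pdf_h1, alpha=0.01, beta=0.001):
    """Wald SPRT: accumulate the log-likelihood ratio until a boundary is crossed.
    alpha = tolerated false-alarm rate, beta = tolerated missed-detection rate."""
    upper = np.log((1 - beta) / alpha)   # cross upward  -> accept H1 ("maneuver")
    lower = np.log(beta / (1 - alpha))   # cross downward -> accept H0 ("no maneuver")
    llr = 0.0
    for k, x in enumerate(observations, start=1):
        llr += np.log(pdf_h1(x)) - np.log(pdf_h0(x))
        if llr >= upper:
            return "maneuver", k
        if llr <= lower:
            return "no maneuver", k
    return "undecided", len(observations)

# Hypothetical conjunction: H0 = safe miss distance (mean 3), H1 = hazardous (mean 1).
rng = np.random.default_rng(0)
measurements = rng.normal(loc=1.0, scale=1.0, size=50)   # simulated, truth = hazardous
decision, n_used = sprt(measurements,
                        pdf_h0=lambda x: norm.pdf(x, loc=3.0, scale=1.0),
                        pdf_h1=lambda x: norm.pdf(x, loc=1.0, scale=1.0))
print(decision, "after", n_used, "observations")
```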
Asteroseismology: Data Analysis Methods and Interpretation for Space and Ground-based Facilities
NASA Astrophysics Data System (ADS)
Campante, T. L.
2012-06-01
This dissertation has been submitted to the Faculdade de Ciências da Universidade do Porto in partial fulfillment of the requirements for the PhD degree in Astronomy. The scientific results presented herein follow from the research activity performed under the supervision of Dr. Mário João Monteiro at the Centro de Astrofísica da Universidade do Porto and Dr. Hans Kjeldsen at the Institut for Fysik og Astronomi, Aarhus Universitet. The dissertation is composed of three chapters and a list of appendices. Chapter 1 serves as an unpretentious and rather general introduction to the field of asteroseismology of solar-like stars. It starts with a historical account of the field of asteroseismology followed by a general review of the basic physics and properties of stellar pulsations. Emphasis is then naturally placed on the stochastic excitation of stellar oscillations and on the potential of asteroseismic inference. The chapter closes with a discussion about observational techniques and the observational status of the field. Chapter 2 is devoted to the subject of data analysis in asteroseismology. This is an extensive subject, therefore, a compilation is presented of the relevant data analysis methods and techniques employed contemporarily in asteroseismology of solar-like stars. Special attention has been drawn to the subject of statistical inference both from the competing Bayesian and frequentist perspectives. The chapter ends with a description of the implementation of a pipeline for mode parameter analysis of Kepler data. In the course of these two first chapters, reference is made to a series of articles led by the author (or otherwise having greatly benefited from his contribution) that can be found in Appendices A to E. Chapter 3 then goes on to present a series of additional published results.
Wiggins, Paul A
2015-07-21
This article describes the application of a change-point algorithm to the analysis of stochastic signals in biological systems whose underlying state dynamics consist of transitions between discrete states. Applications of this analysis include molecular-motor stepping, fluorophore bleaching, electrophysiology, particle and cell tracking, detection of copy number variation by sequencing, tethered-particle motion, etc. We present a unified approach to the analysis of processes whose noise can be modeled by Gaussian, Wiener, or Ornstein-Uhlenbeck processes. To fit the model, we exploit explicit, closed-form algebraic expressions for maximum-likelihood estimators of model parameters and estimated information loss of the generalized noise model, which can be computed extremely efficiently. We implement change-point detection using the frequentist information criterion (which, to our knowledge, is a new information criterion). The frequentist information criterion specifies a single, information-based statistical test that is free from ad hoc parameters and requires no prior probability distribution. We demonstrate this information-based approach in the analysis of simulated and experimental tethered-particle-motion data. Copyright © 2015 Biophysical Society. Published by Elsevier Inc. All rights reserved.
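As a rough illustration of penalized-likelihood change-point detection (using a BIC-style penalty as a simple stand-in, not the frequentist information criterion introduced in the article), a single change point in piecewise-constant Gaussian data can be located as follows; the data are simulated.

```python
import numpy as np

def detect_single_change_point(y, min_seg=3):
    """Return the index of the best single change point in Gaussian data, or None
    if the penalized -2*log-likelihood (BIC-style penalty) prefers no change."""
    n = len(y)
    def cost(seg):                                      # -2 log L up to additive constants
        return len(seg) * np.log(max(np.var(seg), 1e-12))
    best_cost, best_k = cost(y) + 1 * np.log(n), None   # no-change model: one mean
    for k in range(min_seg, n - min_seg):
        c = cost(y[:k]) + cost(y[k:]) + 3 * np.log(n)   # two means + change location
        if c < best_cost:
            best_cost, best_k = c, k
    return best_k

rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(1.5, 1.0, 200)])
print(detect_single_change_point(y))   # expected: an index near 200
```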
Covariate Balance in Bayesian Propensity Score Approaches for Observational Studies
ERIC Educational Resources Information Center
Chen, Jianshen; Kaplan, David
2015-01-01
Bayesian alternatives to frequentist propensity score approaches have recently been proposed. However, few studies have investigated their covariate balancing properties. This article compares a recently developed two-step Bayesian propensity score approach to the frequentist approach with respect to covariate balance. The effects of different…
In this paper, we present methods for estimating Freundlich isotherm fitting parameters (K and N) and their joint uncertainty, which have been implemented into the freeware software platforms R and WinBUGS. These estimates were determined by both Frequentist and Bayesian analyse...
ERIC Educational Resources Information Center
Prodromou, Theodosia
2012-01-01
This article seeks to address a pedagogical theory of introducing the classicist and the frequentist approach to probability, by investigating important elements in 9th grade students' learning process while working with a "TinkerPlots2" combinatorial problem. Results from this research study indicate that, after the students had seen…
Bayes-LQAS: classifying the prevalence of global acute malnutrition
2010-01-01
Lot Quality Assurance Sampling (LQAS) applications in health have generally relied on frequentist interpretations for statistical validity. Yet health professionals often seek statements about the probability distribution of unknown parameters to answer questions of interest. The frequentist paradigm does not pretend to yield such information, although a Bayesian formulation might. This is the source of an error made in a recent paper published in this journal. Many applications lend themselves to a Bayesian treatment, and would benefit from such considerations in their design. We discuss Bayes-LQAS (B-LQAS), which allows for incorporation of prior information into the LQAS classification procedure, and thus shows how to correct the aforementioned error. Further, we pay special attention to the formulation of Bayes Operating Characteristic Curves and the use of prior information to improve survey designs. As a motivating example, we discuss the classification of Global Acute Malnutrition prevalence and draw parallels between the Bayes and classical classification schemes. We also illustrate the impact of informative and non-informative priors on the survey design. Results indicate that using a Bayesian approach allows the incorporation of expert information and/or historical data and is thus potentially a valuable tool for making accurate and precise classifications. PMID:20534159
Bayes-LQAS: classifying the prevalence of global acute malnutrition.
Olives, Casey; Pagano, Marcello
2010-06-09
Lot Quality Assurance Sampling (LQAS) applications in health have generally relied on frequentist interpretations for statistical validity. Yet health professionals often seek statements about the probability distribution of unknown parameters to answer questions of interest. The frequentist paradigm does not pretend to yield such information, although a Bayesian formulation might. This is the source of an error made in a recent paper published in this journal. Many applications lend themselves to a Bayesian treatment, and would benefit from such considerations in their design. We discuss Bayes-LQAS (B-LQAS), which allows for incorporation of prior information into the LQAS classification procedure, and thus shows how to correct the aforementioned error. Further, we pay special attention to the formulation of Bayes Operating Characteristic Curves and the use of prior information to improve survey designs. As a motivating example, we discuss the classification of Global Acute Malnutrition prevalence and draw parallels between the Bayes and classical classification schemes. We also illustrate the impact of informative and non-informative priors on the survey design. Results indicate that using a Bayesian approach allows the incorporation of expert information and/or historical data and is thus potentially a valuable tool for making accurate and precise classifications.
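The central calculation can be sketched with a conjugate Beta-binomial model: the posterior probability that prevalence exceeds an action threshold, given survey counts. The counts, threshold, and priors below are illustrative assumptions, not the designs analyzed in the paper.

```python
from scipy.stats import beta

# Hypothetical LQAS-style survey: n children sampled, x with global acute malnutrition.
n, x = 150, 17
threshold = 0.10                      # illustrative 10% GAM classification threshold

# Beta(a0, b0) prior on prevalence: (1, 1) is flat; (2, 18) sketches an informative prior.
for a0, b0 in [(1, 1), (2, 18)]:
    posterior = beta(a0 + x, b0 + n - x)
    p_exceed = posterior.sf(threshold)            # P(prevalence > threshold | data)
    print(f"prior Beta({a0},{b0}): P(prevalence > {threshold:.0%}) = {p_exceed:.3f}")
```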
Tressoldi, Patrizio E.
2011-01-01
Starting from the famous phrase “extraordinary claims require extraordinary evidence,” we will present the evidence supporting the concept that human visual perception may have non-local properties, in other words, that it may operate beyond the space and time constraints of sensory organs, in order to discuss which criteria can be used to define evidence as extraordinary. This evidence has been obtained from seven databases which are related to six different protocols used to test the reality and the functioning of non-local perception, analyzed using both a frequentist and a new Bayesian meta-analysis statistical procedure. According to a frequentist meta-analysis, the null hypothesis can be rejected for all six protocols even if the effect sizes range from 0.007 to 0.28. According to Bayesian meta-analysis, the Bayes factors provide strong evidence to support the alternative hypothesis (H1) over the null hypothesis (H0), but only for three out of the six protocols. We will discuss whether quantitative psychology can contribute to defining the criteria for the acceptance of new scientific ideas in order to avoid the inconclusive controversies between supporters and opponents. PMID:21713069
Uncertainty Estimates of Psychoacoustic Thresholds Obtained from Group Tests
NASA Technical Reports Server (NTRS)
Rathsam, Jonathan; Christian, Andrew
2016-01-01
Adaptive psychoacoustic test methods, in which the next signal level depends on the response to the previous signal, are the most efficient for determining psychoacoustic thresholds of individual subjects. In many tests conducted in the NASA psychoacoustic labs, the goal is to determine thresholds representative of the general population. To do this economically, non-adaptive testing methods are used in which three or four subjects are tested at the same time with predetermined signal levels. This approach requires us to identify techniques for assessing the uncertainty in the resulting group-average psychoacoustic thresholds. In this presentation we examine the Delta Method of frequentist statistics, the Generalized Linear Model (GLM), the Nonparametric Bootstrap, another frequentist method, and Markov Chain Monte Carlo Posterior Estimation, a Bayesian approach. Each technique is exercised on a manufactured, theoretical dataset and then on datasets from two psychoacoustics facilities at NASA. The Delta Method is the simplest to implement and accurate for the cases studied. The GLM is found to be the least robust, and the Bootstrap takes the longest to calculate. The Bayesian Posterior Estimate is the most versatile technique examined because it allows the inclusion of prior information.
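For a flavor of the bootstrap option, a minimal sketch is shown below: a logistic psychometric function is fit by least squares to simulated group data at predetermined signal levels, and the 50%-point threshold is resampled nonparametrically. The data, function form, and fitting choice are illustrative assumptions, not the NASA datasets or pipelines.

```python
import numpy as np
from scipy.optimize import curve_fit

def psychometric(level, mu, s):
    return 1.0 / (1.0 + np.exp(-(level - mu) / s))   # logistic psychometric function

def fit_threshold(levels, responses):
    popt, _ = curve_fit(psychometric, levels, responses,
                        p0=[np.mean(levels), 2.0], maxfev=5000)
    return popt[0]                                    # mu = level of 50% detection

rng = np.random.default_rng(2)
levels = np.repeat(np.arange(40.0, 71.0, 5.0), 12)    # predetermined signal levels (dB)
responses = rng.binomial(1, psychometric(levels, 55.0, 3.0)).astype(float)

boot = []
for _ in range(500):                                  # nonparametric bootstrap over trials
    idx = rng.integers(0, len(levels), len(levels))
    boot.append(fit_threshold(levels[idx], responses[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"threshold {fit_threshold(levels, responses):.1f} dB, 95% CI ({lo:.1f}, {hi:.1f})")
```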
Bayesian demography 250 years after Bayes
Bijak, Jakub; Bryant, John
2016-01-01
Bayesian statistics offers an alternative to classical (frequentist) statistics. It is distinguished by its use of probability distributions to describe uncertain quantities, which leads to elegant solutions to many difficult statistical problems. Although Bayesian demography, like Bayesian statistics more generally, is around 250 years old, only recently has it begun to flourish. The aim of this paper is to review the achievements of Bayesian demography, address some misconceptions, and make the case for wider use of Bayesian methods in population studies. We focus on three applications: demographic forecasts, limited data, and highly structured or complex models. The key advantages of Bayesian methods are the ability to integrate information from multiple sources and to describe uncertainty coherently. Bayesian methods also allow for including additional (prior) information next to the data sample. As such, Bayesian approaches are complementary to many traditional methods, which can be productively re-expressed in Bayesian terms. PMID:26902889
OPTHYLIC: An Optimised Tool for Hybrid Limits Computation
NASA Astrophysics Data System (ADS)
Busato, Emmanuel; Calvet, David; Theveneaux-Pelzer, Timothée
2018-05-01
A software tool, computing observed and expected upper limits on Poissonian process rates using a hybrid frequentist-Bayesian CLs method, is presented. This tool can be used for simple counting experiments where only signal, background and observed yields are provided or for multi-bin experiments where binned distributions of discriminating variables are provided. It allows the combination of several channels and takes into account statistical and systematic uncertainties, as well as correlations of systematic uncertainties between channels. It has been validated against other software tools and analytical calculations, for several realistic cases.
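The CLs construction itself is simple for a single counting channel with a known background and no systematic uncertainties; a minimal sketch follows (the tool's handling of systematics, channel combination, and expected limits is not reproduced, and the yields are made up).

```python
from scipy.stats import poisson
from scipy.optimize import brentq

def cls(n_obs, s, b):
    """CLs = CL_s+b / CL_b for a single counting channel with known background b."""
    return poisson.cdf(n_obs, s + b) / poisson.cdf(n_obs, b)

# Hypothetical channel: 3 events observed, 2.8 expected from background alone.
n_obs, b = 3, 2.8
upper_limit = brentq(lambda s: cls(n_obs, s, b) - 0.05, 1e-6, 50.0)
print(f"95% CLs upper limit on the signal yield: {upper_limit:.2f} events")
```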
Estimating the size of an open population using sparse capture-recapture data.
Huggins, Richard; Stoklosa, Jakub; Roach, Cameron; Yip, Paul
2018-03-01
Sparse capture-recapture data from open populations are difficult to analyze using currently available frequentist statistical methods. However, in closed capture-recapture experiments, the Chao sparse estimator (Chao, 1989, Biometrics 45, 427-438) may be used to estimate population sizes when there are few recaptures. Here, we extend the Chao (1989) closed population size estimator to the open population setting by using linear regression and extrapolation techniques. We conduct a small simulation study and apply the models to several sparse capture-recapture data sets. © 2017, The International Biometric Society.
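The closed-population building block is easy to state; a minimal sketch of the Chao (1989)-style estimator from capture frequencies is given below with made-up counts (the paper's regression-and-extrapolation extension to open populations is not reproduced).

```python
import numpy as np

def chao_estimate(capture_counts):
    """Chao (1989)-style lower-bound estimate of population size from sparse
    capture-recapture data, using singletons (f1) and doubletons (f2)."""
    counts = np.asarray(capture_counts)
    s_obs = len(counts)                     # distinct individuals ever captured
    f1 = np.sum(counts == 1)
    f2 = np.sum(counts == 2)
    if f2 == 0:                             # bias-corrected variant when f2 is zero
        return s_obs + f1 * (f1 - 1) / 2.0
    return s_obs + f1 ** 2 / (2.0 * f2)

# Hypothetical sparse data: 60 animals caught once, 8 twice, 2 three times.
counts = [1] * 60 + [2] * 8 + [3] * 2
print(f"Estimated population size: {chao_estimate(counts):.0f}")
```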
The DNA database search controversy revisited: bridging the Bayesian-frequentist gap.
Storvik, Geir; Egeland, Thore
2007-09-01
Two different quantities have been suggested for quantification of evidence in cases where a suspect is found by a search through a database of DNA profiles. The likelihood ratio, typically motivated from a Bayesian setting, is preferred by most experts in the field. The so-called np rule has been motivated by frequentist arguments and has been advocated by the American National Research Council and by Stockmarr (1999, Biometrics 55, 671-677). The two quantities differ substantially and have given rise to the DNA database search controversy. Although several authors have criticized the different approaches, a full explanation of why these differences appear is still lacking. In this article we show that a P-value in a frequentist hypothesis setting is approximately equal to the result of the np rule. We argue, however, that a more reasonable procedure in this case is to use conditional testing, in which case a P-value directly related to posterior probabilities and the likelihood ratio is obtained. This way of viewing the problem bridges the gap between the Bayesian and frequentist approaches. At the same time it indicates that the np rule should not be used to quantify evidence.
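A toy calculation makes the contrast concrete; the database size and match probability below are invented for illustration.

```python
# Toy comparison of the two quantities discussed above (numbers are illustrative).
n = 50_000          # size of the searched DNA database
p = 1e-6            # random-match probability of the matching profile

np_rule = n * p               # np rule: roughly the expected number of random hits
likelihood_ratio = 1.0 / p    # LR for the matching profile, as in a single-suspect case

print(f"np rule value:    {np_rule:.3f}")
print(f"likelihood ratio: {likelihood_ratio:,.0f}")
# The np rule grows with the database size, behaving like an unconditional P-value;
# the LR does not, and the article argues that conditional testing recovers a
# P-value on the same footing as the LR and the posterior probabilities.
```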
Combining and comparing neutrinoless double beta decay experiments using different nuclei
NASA Astrophysics Data System (ADS)
Bergström, Johannes
2013-02-01
We perform a global fit of the most relevant neutrinoless double beta decay experiments within the standard model with massive Majorana neutrinos. Using Bayesian inference makes it possible to take into account the theoretical uncertainties on the nuclear matrix elements in a fully consistent way. First, we analyze the data used to claim the observation of neutrinoless double beta decay in 76Ge, and find strong evidence (according to Jeffreys' scale) for a peak in the spectrum and moderate evidence that the peak is actually close to the energy expected for the neutrinoless decay. We also find a significantly larger statistical error than the original analysis, which we include in the comparison with other data. Then, we statistically test the consistency of this claim with recent measurements using 136Xe. We find that the two data sets are about 40 to 80 times more probable under the assumption that they are inconsistent, depending on the nuclear matrix element uncertainties and the prior on the smallest neutrino mass. Hence, there is moderate to strong evidence of incompatibility, and for equal prior probabilities the posterior probability of compatibility is between 1.3% and 2.5%. If one, despite such evidence for incompatibility, combines the two data sets, we find that the total evidence of neutrinoless double beta decay is negligible. If one ignores the claim, there is weak evidence against the existence of the decay. We also perform approximate frequentist tests of compatibility for fixed ratios of the nuclear matrix elements, as well as of the no-signal hypothesis. Generalization to other sets of experiments as well as other mechanisms mediating the decay is possible.
Bayesian estimation of differential transcript usage from RNA-seq data.
Papastamoulis, Panagiotis; Rattray, Magnus
2017-11-27
Next generation sequencing allows the identification of genes consisting of differentially expressed transcripts, a term which usually refers to changes in the overall expression level. A specific type of differential expression is differential transcript usage (DTU), which targets changes in the relative within-gene expression of a transcript. The contribution of this paper is to: (a) extend the use of cjBitSeq to the DTU context, a previously introduced Bayesian model which was originally designed for identifying changes in overall expression levels and (b) propose a Bayesian version of DRIMSeq, a frequentist model for inferring DTU. cjBitSeq is a read based model and performs fully Bayesian inference by MCMC sampling on the space of latent state of each transcript per gene. BayesDRIMSeq is a count based model and estimates the Bayes Factor of a DTU model against a null model using Laplace's approximation. The proposed models are benchmarked against the existing ones using a recent independent simulation study as well as a real RNA-seq dataset. Our results suggest that the Bayesian methods exhibit similar performance with DRIMSeq in terms of precision/recall but offer better calibration of False Discovery Rate.
Heudtlass, Peter; Guha-Sapir, Debarati; Speybroeck, Niko
2018-05-31
The crude death rate (CDR) is one of the defining indicators of humanitarian emergencies. When data from vital registration systems are not available, it is common practice to estimate the CDR from household surveys with cluster-sampling design. However, sample sizes are often too small to compare mortality estimates to emergency thresholds, at least in a frequentist framework. Several authors have proposed Bayesian methods for health surveys in humanitarian crises. Here, we develop an approach specifically for mortality data and cluster-sampling surveys. We describe a Bayesian hierarchical Poisson-Gamma mixture model with generic (weakly informative) priors that could be used as a default in the absence of any specific prior knowledge, and compare Bayesian and frequentist CDR estimates using five different mortality datasets. We provide an interpretation of the Bayesian estimates in the context of an emergency threshold and demonstrate how to interpret parameters at the cluster level and ways in which informative priors can be introduced. With the same set of weakly informative priors, Bayesian CDR estimates are equivalent to frequentist estimates, for all practical purposes. The probability that the CDR surpasses the emergency threshold can be derived directly from the posterior of the mean of the mixing distribution. All observations in the datasets contribute to the estimation of cluster-level estimates, through the hierarchical structure of the model. In a context of sparse data, Bayesian mortality assessments have advantages over frequentist ones already when using only weakly informative priors. More informative priors offer a formal and transparent way of combining new data with existing data and expert knowledge and can help to improve decision-making in humanitarian crises by complementing frequentist estimates.
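Ignoring the hierarchical cluster structure, the headline quantity can be sketched with a conjugate Gamma-Poisson model: the posterior probability that the death rate exceeds the emergency threshold of 1 death per 10,000 people per day. The counts and prior below are illustrative, not the paper's datasets or its mixture model.

```python
from scipy.stats import gamma

# Hypothetical pooled survey data: 3 deaths over 18,400 person-days of recall.
deaths, person_days = 3, 18_400
threshold = 1.0 / 10_000            # emergency threshold: 1 death / 10,000 persons / day

# Weakly informative Gamma(shape a0, rate b0) prior on the rate; conjugate update
# gives Gamma(a0 + deaths, b0 + person_days) as the posterior.
a0, b0 = 0.5, 1_000.0
posterior = gamma(a=a0 + deaths, scale=1.0 / (b0 + person_days))
print(f"posterior mean CDR = {posterior.mean() * 10_000:.2f} per 10,000 per day")
print(f"P(CDR > threshold | data) = {posterior.sf(threshold):.3f}")
```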
The choice of sample size: a mixed Bayesian / frequentist approach.
Pezeshk, Hamid; Nematollahi, Nader; Maroufy, Vahed; Gittins, John
2009-04-01
Sample size computations are largely based on frequentist or classical methods. In the Bayesian approach the prior information on the unknown parameters is taken into account. In this work we consider a fully Bayesian approach to the sample size determination problem which was introduced by Grundy et al. and developed by Lindley. This approach treats the problem as a decision problem and employs a utility function to find the optimal sample size of a trial. Furthermore, we assume that a regulatory authority, which is deciding on whether or not to grant a licence to a new treatment, uses a frequentist approach. We then find the optimal sample size for the trial by maximising the expected net benefit, which is the expected benefit of subsequent use of the new treatment minus the cost of the trial.
Baker, Robert L; Leong, Wen Fung; An, Nan; Brock, Marcus T; Rubin, Matthew J; Welch, Stephen; Weinig, Cynthia
2018-02-01
We develop Bayesian function-valued trait models that mathematically isolate genetic mechanisms underlying leaf growth trajectories by factoring out genotype-specific differences in photosynthesis. Remote sensing data can be used instead of leaf-level physiological measurements. Characterizing the genetic basis of traits that vary during ontogeny and affect plant performance is a major goal in evolutionary biology and agronomy. Describing genetic programs that specifically regulate morphological traits can be complicated by genotypic differences in physiological traits. We describe the growth trajectories of leaves using novel Bayesian function-valued trait (FVT) modeling approaches in Brassica rapa recombinant inbred lines raised in heterogeneous field settings. While frequentist approaches estimate parameter values by treating each experimental replicate discretely, Bayesian models can utilize information in the global dataset, potentially leading to more robust trait estimation. We illustrate this principle by estimating growth asymptotes in the face of missing data and comparing heritabilities of growth trajectory parameters estimated by Bayesian and frequentist approaches. Using pseudo-Bayes factors, we compare the performance of an initial Bayesian logistic growth model and a model that incorporates carbon assimilation (Amax) as a cofactor, thus statistically accounting for genotypic differences in carbon resources. We further evaluate two remotely sensed spectroradiometric indices, photochemical reflectance (pri2) and MERIS Terrestrial Chlorophyll Index (mtci), as covariates in lieu of Amax, because these two indices were genetically correlated with Amax across years and treatments yet allow much higher throughput compared to direct leaf-level gas-exchange measurements. For leaf lengths in uncrowded settings, including Amax improves model fit over the initial model. The mtci and pri2 indices also outperform direct Amax measurements. Of particular importance for evolutionary biologists and plant breeders, hierarchical Bayesian models estimating FVT parameters improve heritabilities compared to frequentist approaches.
Statistical Inference at Work: Statistical Process Control as an Example
ERIC Educational Resources Information Center
Bakker, Arthur; Kent, Phillip; Derry, Jan; Noss, Richard; Hoyles, Celia
2008-01-01
To characterise statistical inference in the workplace this paper compares a prototypical type of statistical inference at work, statistical process control (SPC), with a type of statistical inference that is better known in educational settings, hypothesis testing. Although there are some similarities between the reasoning structure involved in…
Spatial distribution of psychotic disorders in an urban area of France: an ecological study.
Pignon, Baptiste; Schürhoff, Franck; Baudin, Grégoire; Ferchiou, Aziz; Richard, Jean-Romain; Saba, Ghassen; Leboyer, Marion; Kirkbride, James B; Szöke, Andrei
2016-05-18
Previous analyses of neighbourhood variations of non-affective psychotic disorders (NAPD) have focused mainly on incidence. However, prevalence studies provide important insights on factors associated with disease evolution as well as for healthcare resource allocation. This study aimed to investigate the distribution of prevalent NAPD cases in an urban area in France. The number of cases in each neighbourhood was modelled as a function of potential confounders and ecological variables, namely: migrant density, economic deprivation and social fragmentation. This was modelled using statistical models of increasing complexity: frequentist models (using Poisson and negative binomial regressions), and several Bayesian models. For each model, the validity of the assumptions was checked, and the models were compared on how well they fitted the data, in order to test for possible spatial variation in prevalence. Data showed significant overdispersion (invalidating the Poisson regression model) and residual autocorrelation (suggesting the need to use Bayesian models). The best Bayesian model was Leroux's model (i.e. a model with both strong correlation between neighbouring areas and weaker correlation between areas further apart), with economic deprivation as an explanatory variable (OR = 1.13, 95% CI [1.02-1.25]). In comparison with frequentist methods, the Bayesian model showed a better fit. The number of cases showed non-random spatial distribution and was linked to economic deprivation.
Bayesian analysis of multiple direct detection experiments
NASA Astrophysics Data System (ADS)
Arina, Chiara
2014-12-01
Bayesian methods offer a coherent and efficient framework for implementing uncertainties into induction problems. In this article, we review how this approach applies to the analysis of dark matter direct detection experiments. In particular we discuss the exclusion limit of XENON100 and the debated hints of detection under the hypothesis of a WIMP signal. Within parameter inference, marginalizing consistently over uncertainties to extract robust posterior probability distributions, we find that the claimed tension between XENON100 and the other experiments can be partially alleviated in the isospin-violating scenario, while the elastic scattering model appears to be compatible with the frequentist statistical approach. We then move to model comparison, for which Bayesian methods are particularly well suited. Firstly, we investigate the annual modulation seen in CoGeNT data, finding that there is weak evidence for a modulation. Modulation models due to other physics compare unfavorably with the WIMP models, paying the price for their excessive complexity. Secondly, we confront several coherent scattering models to determine the current best physical scenario compatible with the experimental hints. We find that exothermic and inelastic dark matter are moderately disfavored against the elastic scenario, while the isospin-violating model has similar evidence. Lastly, the Bayes factor gives inconclusive evidence for an incompatibility between the data sets of XENON100 and the hints of detection. The same question assessed with goodness of fit would indicate a 2σ discrepancy. This suggests that more data are needed to settle this question.
Cyber-T web server: differential analysis of high-throughput data.
Kayala, Matthew A; Baldi, Pierre
2012-07-01
The Bayesian regularization method for high-throughput differential analysis, described in Baldi and Long (A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes. Bioinformatics 2001;17:509-519) and implemented in the Cyber-T web server, is one of the most widely validated. Cyber-T implements a t-test using a Bayesian framework to compute a regularized variance of the measurements associated with each probe under each condition. This regularized estimate is derived by flexibly combining the empirical measurements with a prior, or background, derived from pooling measurements associated with probes in the same neighborhood. This approach flexibly addresses problems associated with low replication levels and technology biases, not only for DNA microarrays, but also for other technologies, such as protein arrays, quantitative mass spectrometry and next-generation sequencing (RNA-seq). Here we present an update to the Cyber-T web server, incorporating several useful new additions and improvements. Several preprocessing data normalization options, including logarithmic and Variance Stabilizing Normalization (VSN) transforms, are included. To augment two-sample t-tests, a one-way analysis of variance is implemented. Several methods for multiple tests correction, including standard frequentist methods and a probabilistic mixture model treatment, are available. Diagnostic plots allow visual assessment of the results. The web server provides comprehensive documentation and example data sets. The Cyber-T web server, with R source code and data sets, is publicly available at http://cybert.ics.uci.edu/.
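The core idea, shrinking each probe's noisy variance toward a neighborhood-based background variance before forming the t-statistic, can be sketched as below. The weighting and degrees-of-freedom bookkeeping follow the commonly cited Baldi-Long form with a pseudo-count v0, and all numbers are illustrative; this is not the web server's exact implementation.

```python
import numpy as np
from scipy import stats

def regularized_ttest(x, y, bg_var_x, bg_var_y, v0=10):
    """Two-sample t-test with regularized variances: each group's variance is a
    weighted blend of its observed variance and a background variance estimated
    from probes of similar intensity; v0 is a pseudo-count of prior observations."""
    nx, ny = len(x), len(y)
    vx = (v0 * bg_var_x + (nx - 1) * np.var(x, ddof=1)) / (v0 + nx - 2)
    vy = (v0 * bg_var_y + (ny - 1) * np.var(y, ddof=1)) / (v0 + ny - 2)
    t = (np.mean(x) - np.mean(y)) / np.sqrt(vx / nx + vy / ny)
    df = nx + ny - 2 + 2 * v0            # illustrative effective degrees of freedom
    return t, 2 * stats.t.sf(abs(t), df)

# Hypothetical probe: three log-intensity replicates per condition,
# with a background variance of 0.06 from the surrounding intensity window.
x = np.array([8.1, 8.4, 8.0])
y = np.array([9.0, 9.3, 8.8])
t, p = regularized_ttest(x, y, bg_var_x=0.06, bg_var_y=0.06)
print(f"t = {t:.2f}, p = {p:.4f}")
```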
Antal, Péter; Kiszel, Petra Sz.; Gézsi, András; Hadadi, Éva; Virág, Viktor; Hajós, Gergely; Millinghoffer, András; Nagy, Adrienne; Kiss, András; Semsei, Ágnes F.; Temesi, Gergely; Melegh, Béla; Kisfali, Péter; Széll, Márta; Bikov, András; Gálffy, Gabriella; Tamási, Lilla; Falus, András; Szalai, Csaba
2012-01-01
Genetic studies indicate a high number of potential factors related to asthma. Based on earlier linkage analyses we selected the 11q13 and 14q22 asthma susceptibility regions, for which we designed a partial genome screening study using 145 SNPs in 1201 individuals (436 asthmatic children and 765 controls). The results were evaluated with traditional frequentist methods and we applied a new statistical method, called Bayesian network based Bayesian multilevel analysis of relevance (BN-BMLA). This method uses Bayesian network representation to provide detailed characterization of the relevance of factors, such as joint significance, the type of dependency, and multi-target aspects. We estimated posteriors for these relations within the Bayesian statistical framework, in order to assess whether a variable is directly relevant or whether its association is only mediated. With frequentist methods one SNP (rs3751464 in the FRMD6 gene) provided evidence for an association with asthma (OR = 1.43 (1.2–1.8); p = 3×10^-4). The possible role of the FRMD6 gene in asthma was also confirmed in an animal model and human asthmatics. In the BN-BMLA analysis altogether 5 SNPs in 4 genes were found relevant in connection with asthma phenotype: PRPF19 on chromosome 11, and FRMD6, PTGER2 and PTGDR on chromosome 14. In a subsequent step a partial dataset containing rhinitis and further clinical parameters was used, which allowed the analysis of relevance of SNPs for asthma and multiple targets. These analyses suggested that SNPs in the AHNAK and MS4A2 genes were indirectly associated with asthma. This paper indicates that BN-BMLA explores the relevant factors more comprehensively than traditional statistical methods and extends the scope of strong relevance based methods to include partial relevance, global characterization of relevance and multi-target relevance. PMID:22432035
Investigation of Statistical Inference Methodologies Through Scale Model Propagation Experiments
2015-09-30
statistical inference methodologies for ocean-acoustic problems by investigating and applying statistical methods to data collected from scale-model... to begin planning experiments for statistical inference applications. APPROACH: In the ocean acoustics community over the past two decades... solutions for waveguide parameters. With the introduction of statistical inference to the field of ocean acoustics came the desire to interpret marginal
Bayesian randomized clinical trials: From fixed to adaptive design.
Yin, Guosheng; Lam, Chi Kin; Shi, Haolun
2017-08-01
Randomized controlled studies are the gold standard for phase III clinical trials. Using α-spending functions to control the overall type I error rate, group sequential methods are well established and have been dominating phase III studies. Bayesian randomized design, on the other hand, can be viewed as a complement to, rather than a competitor of, the frequentist methods. For the fixed Bayesian design, the hypothesis testing can be cast in the posterior probability or Bayes factor framework, which has a direct link to the frequentist type I error rate. Bayesian group sequential design relies upon Bayesian decision-theoretic approaches based on backward induction, which is often computationally intensive. Compared with the frequentist approaches, Bayesian methods have several advantages. The posterior predictive probability serves as a useful and convenient tool for trial monitoring, and can be updated at any time as the data accrue during the trial. The Bayesian decision-theoretic framework possesses a direct link to decision making in the practical setting, and can be modeled more realistically to reflect the actual cost-benefit analysis during the drug development process. Other merits include the possibility of hierarchical modeling and the use of informative priors, which would lead to a more comprehensive utilization of information from both historical and longitudinal data. From fixed to adaptive design, we focus on Bayesian randomized controlled clinical trials and make extensive comparisons with frequentist counterparts through numerical studies. Copyright © 2017 Elsevier Inc. All rights reserved.
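The posterior predictive monitoring idea can be sketched for a single-arm trial with a binary endpoint: given interim data, sum over possible future outcomes to get the probability that the final posterior will clear the success criterion. The prior, success rule, and counts are illustrative assumptions.

```python
from scipy.stats import beta, betabinom

def predictive_probability(x, n, n_max, p0=0.3, theta=0.95, a0=1.0, b0=1.0):
    """Posterior predictive probability of eventual success in a single-arm trial:
    success at the final analysis means P(response rate > p0 | all data) > theta.
    x responses observed in n patients so far; n_max planned; Beta(a0, b0) prior."""
    m = n_max - n                                     # patients still to be enrolled
    pp = 0.0
    for y in range(m + 1):                            # future responses y ~ beta-binomial
        w = betabinom.pmf(y, m, a0 + x, b0 + n - x)
        final_posterior = beta(a0 + x + y, b0 + n_max - x - y)
        if final_posterior.sf(p0) > theta:            # would the final analysis succeed?
            pp += w
    return pp

# Interim look after 30 of 60 planned patients, with 12 responders observed.
print(f"predictive probability of success = {predictive_probability(12, 30, 60):.3f}")
```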
NASA Astrophysics Data System (ADS)
Stan Development Team
2018-01-01
Stan facilitates statistical inference at the frontiers of applied statistics and provides both a modeling language for specifying complex statistical models and a library of statistical algorithms for computing inferences with those models. These components are exposed through interfaces in environments such as R, Python, and the command line.
ERIC Educational Resources Information Center
Jacob, Bridgette L.
2013-01-01
The difficulties introductory statistics students have with formal statistical inference are well known in the field of statistics education. "Informal" statistical inference has been studied as a means to introduce inferential reasoning well before and without the formalities of formal statistical inference. This mixed methods study…
ERIC Educational Resources Information Center
Braham, Hana Manor; Ben-Zvi, Dani
2017-01-01
A fundamental aspect of statistical inference is representation of real-world data using statistical models. This article analyzes students' articulations of statistical models and modeling during their first steps in making informal statistical inferences. An integrated modeling approach (IMA) was designed and implemented to help students…
Taroni, F; Biedermann, A; Bozza, S
2016-02-01
Many people regard the concept of hypothesis testing as fundamental to inferential statistics. Various schools of thought, in particular frequentist and Bayesian, have promoted radically different solutions for taking a decision about the plausibility of competing hypotheses. Comprehensive philosophical comparisons of their advantages and drawbacks are widely available and continue to generate extensive debate in the literature. More recently, controversial discussion was initiated by an editorial decision of a scientific journal [1] to refuse any paper submitted for publication containing null hypothesis testing procedures. Since the large majority of papers published in forensic journals propose the evaluation of statistical evidence based on the so-called p-values, it is of interest to bring the discussion of this journal's decision to the forensic science community. This paper aims to provide forensic science researchers with a primer on the main concepts and their implications for making informed methodological choices. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Applications of Bayesian Statistics to Problems in Gamma-Ray Bursts
NASA Technical Reports Server (NTRS)
Meegan, Charles A.
1997-01-01
This presentation will describe two applications of Bayesian statistics to Gamma-Ray Bursts (GRBs). The first attempts to quantify the evidence for a cosmological versus galactic origin of GRBs using only the observations of the dipole and quadrupole moments of the angular distribution of bursts. The cosmological hypothesis predicts isotropy, while the galactic hypothesis is assumed to produce a uniform probability distribution over positive values for these moments. The observed isotropic distribution indicates that the Bayes factor for the cosmological hypothesis over the galactic hypothesis is about 300. Another application of Bayesian statistics is in the estimation of chance associations of optical counterparts with galaxies. The Bayesian approach is preferred to frequentist techniques here because the Bayesian approach easily accounts for galaxy mass distributions and because one can incorporate three disjoint hypotheses: (1) bursts come from galactic centers, (2) bursts come from galaxies in proportion to luminosity, and (3) bursts do not come from external galaxies. This technique was used in the analysis of the optical counterpart to GRB970228.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Amo Sanchez, P. del; Lees, J. P.; Poireau, V.
Using the entire sample of 467×10^6 Υ(4S) → BB̄ decays collected with the BABAR detector at the PEP-II asymmetric-energy B factory at the SLAC National Accelerator Laboratory, we perform an analysis of B^± → DK^± decays, using decay modes in which the neutral D meson decays to either CP-eigenstates or non-CP-eigenstates. We measure the partial decay rate charge asymmetries for CP-even and CP-odd D final states to be A_CP+ = 0.25 ± 0.06 ± 0.02 and A_CP- = -0.09 ± 0.07 ± 0.02, respectively, where the first error is the statistical and the second is the systematic uncertainty. The parameter A_CP+ is different from zero with a significance of 3.6 standard deviations, constituting evidence for direct CP violation. We also measure the ratios of the charge-averaged B partial decay rates in CP and non-CP decays, R_CP+ = 1.18 ± 0.09 ± 0.05 and R_CP- = 1.07 ± 0.08 ± 0.04. We infer frequentist confidence intervals for the angle γ of the unitarity triangle, for the strong phase difference δ_B, and for the amplitude ratio r_B, which are related to the B^- → DK^- decay amplitude by r_B e^{i(δ_B-γ)} = A(B^- → D̄^0 K^-)/A(B^- → D^0 K^-). Including statistical and systematic uncertainties, we obtain 0.24
Statistical analyses of Higgs- and Z-portal dark matter models
NASA Astrophysics Data System (ADS)
Ellis, John; Fowlie, Andrew; Marzola, Luca; Raidal, Martti
2018-06-01
We perform frequentist and Bayesian statistical analyses of Higgs- and Z-portal models of dark matter particles with spin 0, 1/2, and 1. Our analyses incorporate data from direct detection and indirect detection experiments, as well as LHC searches for monojet and monophoton events, and we also analyze the potential impacts of future direct detection experiments. We find acceptable regions of the parameter spaces for Higgs-portal models with real scalar, neutral vector, Majorana, or Dirac fermion dark matter particles, and Z-portal models with Majorana or Dirac fermion dark matter particles. In many of these cases, there are interesting prospects for discovering dark matter particles in Higgs or Z decays, as well as dark matter particles weighing ≳100 GeV. Negative results from planned direct detection experiments would still allow acceptable regions for Higgs- and Z-portal models with Majorana or Dirac fermion dark matter particles.
An objective Bayesian analysis of a crossover design via model selection and model averaging.
Li, Dandan; Sivaganesan, Siva
2016-11-10
Inference about the treatment effect in a crossover design has received much attention over time owing to the uncertainty in the existence of the carryover effect and its impact on the estimation of the treatment effect. Adding to this uncertainty is that the existence of the carryover effect and its size may depend on the presence of the treatment effect and its size. We consider estimation and testing hypothesis about the treatment effect in a two-period crossover design, assuming normally distributed response variable, and use an objective Bayesian approach to test the hypothesis about the treatment effect and to estimate its size when it exists while accounting for the uncertainty about the presence of the carryover effect as well as the treatment and period effects. We evaluate and compare the performance of the proposed approach with a standard frequentist approach using simulated data, and real data. Copyright © 2016 John Wiley & Sons, Ltd.
Bayesian Revision of Residual Detection Power
NASA Technical Reports Server (NTRS)
DeLoach, Richard
2013-01-01
This paper addresses some issues with quality assessment and quality assurance in response surface modeling experiments executed in wind tunnels. The role of data volume on quality assurance for response surface models is reviewed. Specific wind tunnel response surface modeling experiments are considered for which apparent discrepancies exist between fit quality expectations based on implemented quality assurance tactics, and the actual fit quality achieved in those experiments. These discrepancies are resolved by using Bayesian inference to account for certain imperfections in the assessment methodology. Estimates of the fraction of out-of-tolerance model predictions based on traditional frequentist methods are revised to account for uncertainty in the residual assessment process. The number of sites in the design space for which residuals are out of tolerance is seen to exceed the number of sites where the model actually fails to fit the data. A method is presented to estimate how much of the design space is inadequately modeled by low-order polynomial approximations to the true but unknown underlying response function.
Multivariate meta-analysis using individual participant data
Riley, R. D.; Price, M. J.; Jackson, D.; Wardle, M.; Gueyffier, F.; Wang, J.; Staessen, J. A.; White, I. R.
2016-01-01
When combining results across related studies, a multivariate meta-analysis allows the joint synthesis of correlated effect estimates from multiple outcomes. Joint synthesis can improve efficiency over separate univariate syntheses, may reduce selective outcome reporting biases, and enables joint inferences across the outcomes. A common issue is that within-study correlations needed to fit the multivariate model are unknown from published reports. However, provision of individual participant data (IPD) allows them to be calculated directly. Here, we illustrate how to use IPD to estimate within-study correlations, using a joint linear regression for multiple continuous outcomes and bootstrapping methods for binary, survival and mixed outcomes. In a meta-analysis of 10 hypertension trials, we then show how these methods enable multivariate meta-analysis to address novel clinical questions about continuous, survival and binary outcomes; treatment–covariate interactions; adjusted risk/prognostic factor effects; longitudinal data; prognostic and multiparameter models; and multiple treatment comparisons. Both frequentist and Bayesian approaches are applied, with example software code provided to derive within-study correlations and to fit the models. PMID:26099484
The Reasoning behind Informal Statistical Inference
ERIC Educational Resources Information Center
Makar, Katie; Bakker, Arthur; Ben-Zvi, Dani
2011-01-01
Informal statistical inference (ISI) has been a frequent focus of recent research in statistics education. Considering the role that context plays in developing ISI calls into question the need to be more explicit about the reasoning that underpins ISI. This paper uses educational literature on informal statistical inference and philosophical…
Quantifying Safety Margin Using the Risk-Informed Safety Margin Characterization (RISMC)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grabaskas, David; Bucknor, Matthew; Brunett, Acacia
2015-04-26
The Risk-Informed Safety Margin Characterization (RISMC), developed by Idaho National Laboratory as part of the Light-Water Reactor Sustainability Project, utilizes a probabilistic safety margin comparison between a load and capacity distribution, rather than a deterministic comparison between two values, as is usually done in best-estimate plus uncertainty analyses. The goal is to determine the failure probability, or in other words, the probability of the system load equaling or exceeding the system capacity. While this method has been used in pilot studies, there has been little work conducted investigating the statistical significance of the resulting failure probability. In particular, it is difficult to determine how many simulations are necessary to properly characterize the failure probability. This work uses classical (frequentist) statistics and confidence intervals to examine the impact in statistical accuracy when the number of simulations is varied. Two methods are proposed to establish confidence intervals related to the failure probability established using a RISMC analysis. The confidence interval provides information about the statistical accuracy of the method utilized to explore the uncertainty space, and offers a quantitative method to gauge the increase in statistical accuracy due to performing additional simulations.
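One of the simplest frequentist checks of this kind is an exact Clopper-Pearson interval around the simulated failure probability, which directly shows how the interval narrows as more simulations are run; the counts below are invented for illustration.

```python
from scipy.stats import beta

def clopper_pearson(failures, n_sims, conf=0.95):
    """Exact (Clopper-Pearson) confidence interval for a Monte Carlo failure probability."""
    alpha = 1.0 - conf
    lo = 0.0 if failures == 0 else beta.ppf(alpha / 2, failures, n_sims - failures + 1)
    hi = 1.0 if failures == n_sims else beta.ppf(1 - alpha / 2, failures + 1, n_sims - failures)
    return lo, hi

# Illustrative load-vs-capacity simulations with a true failure probability near 0.03.
for n_sims, failures in [(100, 3), (1_000, 30), (10_000, 300)]:
    lo, hi = clopper_pearson(failures, n_sims)
    print(f"n = {n_sims:>6}: p_hat = {failures / n_sims:.3f}, 95% CI = ({lo:.4f}, {hi:.4f})")
```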
NASA Astrophysics Data System (ADS)
Lundberg, J.; Conrad, J.; Rolke, W.; Lopez, A.
2010-03-01
A C++ class was written for the calculation of frequentist confidence intervals using the profile likelihood method. Seven combinations of Binomial, Gaussian and Poissonian uncertainties are implemented. The package provides routines for the calculation of upper and lower limits, sensitivity and related properties. It also supports hypothesis tests which take uncertainties into account. It can be used in compiled C++ code, in Python or interactively via the ROOT analysis framework. Program summary: Program title: TRolke version 2.0. Catalogue identifier: AEFT_v1_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEFT_v1_0.html. Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: MIT license. No. of lines in distributed program, including test data, etc.: 3431. No. of bytes in distributed program, including test data, etc.: 21 789. Distribution format: tar.gz. Programming language: ISO C++. Computer: Unix, GNU/Linux, Mac. Operating system: Linux 2.6 (Scientific Linux 4 and 5, Ubuntu 8.10), Darwin 9.0 (Mac-OS X 10.5.8). RAM: ~20 MB. Classification: 14.13. External routines: ROOT (http://root.cern.ch/drupal/). Nature of problem: The problem is to calculate a frequentist confidence interval on the parameter of a Poisson process with statistical or systematic uncertainties in signal efficiency or background. Solution method: Profile likelihood method, analytical. Running time: <10 seconds per extracted limit.
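For the simplest of the supported cases, a Poisson signal region with the background constrained by a sideband, the profile likelihood construction can be sketched as follows; the counts, the sideband ratio, and the 2.71 crossing convention for a 90% upper limit are illustrative assumptions, not TRolke's code.

```python
import numpy as np
from scipy.optimize import minimize_scalar, brentq
from scipy.stats import poisson

# Illustrative counts: x in the signal region, y in a background sideband that is
# tau times larger than the signal region.
x, y, tau = 5, 12, 4.0

def nll(s, b):                      # negative log-likelihood of (signal s, background b)
    return -(poisson.logpmf(x, s + b) + poisson.logpmf(y, tau * b))

def profile_nll(s):                 # minimize over the nuisance parameter b
    return minimize_scalar(lambda b: nll(s, b), bounds=(1e-9, 50.0), method="bounded").fun

b_hat = y / tau                     # joint MLEs (valid here since x > y/tau)
s_hat = x - b_hat
nll_min = nll(s_hat, b_hat)

# 90% upper limit: where -2 * log(profile likelihood ratio) crosses 2.71.
upper = brentq(lambda s: 2.0 * (profile_nll(s) - nll_min) - 2.71, s_hat + 1e-6, 50.0)
print(f"s_hat = {s_hat:.2f}, 90% CL upper limit = {upper:.2f}")
```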
Hoaglin, David C; Hawkins, Neil; Jansen, Jeroen P; Scott, David A; Itzler, Robbin; Cappelleri, Joseph C; Boersma, Cornelis; Thompson, David; Larholt, Kay M; Diaz, Mireya; Barrett, Annabel
2011-06-01
Evidence-based health care decision making requires comparison of all relevant competing interventions. In the absence of randomized controlled trials involving a direct comparison of all treatments of interest, indirect treatment comparisons and network meta-analysis provide useful evidence for judiciously selecting the best treatment(s). Mixed treatment comparisons, a special case of network meta-analysis, combine direct evidence and indirect evidence for particular pairwise comparisons, thereby synthesizing a greater share of the available evidence than traditional meta-analysis. This report from the International Society for Pharmacoeconomics and Outcomes Research Indirect Treatment Comparisons Good Research Practices Task Force provides guidance on technical aspects of conducting network meta-analyses (our use of this term includes most methods that involve meta-analysis in the context of a network of evidence). We start with a discussion of strategies for developing networks of evidence. Next we briefly review assumptions of network meta-analysis. Then we focus on the statistical analysis of the data: objectives, models (fixed-effects and random-effects), frequentist versus Bayesian approaches, and model validation. A checklist highlights key components of network meta-analysis, and substantial examples illustrate indirect treatment comparisons (both frequentist and Bayesian approaches) and network meta-analysis. A further section discusses eight key areas for future research. Copyright © 2011 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
Ahn, Jaeil; Mukherjee, Bhramar; Banerjee, Mousumi; Cooney, Kathleen A.
2011-01-01
Summary The stereotype regression model for categorical outcomes, proposed by Anderson (1984), is nested between the baseline category logits and adjacent category logits model with proportional odds structure. The stereotype model is more parsimonious than the ordinary baseline-category (or multinomial logistic) model due to a product representation of the log odds-ratios in terms of a common parameter corresponding to each predictor and category specific scores. The model could be used for both ordered and unordered outcomes. For ordered outcomes, the stereotype model allows more flexibility than the popular proportional odds model in capturing highly subjective ordinal scaling which does not result from categorization of a single latent variable, but is inherently multidimensional in nature. As pointed out by Greenland (1994), an additional advantage of the stereotype model is that it provides unbiased and valid inference under outcome-stratified sampling as in case-control studies. In addition, for matched case-control studies, the stereotype model is amenable to the classical conditional likelihood principle, whereas there is no reduction due to sufficiency under the proportional odds model. In spite of these attractive features, the model has been applied less, as there are issues with maximum likelihood estimation and likelihood based testing approaches due to non-linearity and lack of identifiability of the parameters. We present a comprehensive Bayesian inference and model comparison procedure for this class of models as an alternative to the classical frequentist approach. We illustrate our methodology by analyzing data from The Flint Men's Health Study, a case-control study of prostate cancer in African-American men aged 40 to 79 years. We use clinical staging of prostate cancer in terms of Tumors, Nodes and Metastasis (TNM) as the categorical response of interest. PMID:19731262
A Two-Step Bayesian Approach for Propensity Score Analysis: Simulations and Case Study.
Kaplan, David; Chen, Jianshen
2012-07-01
A two-step Bayesian propensity score approach is introduced that incorporates prior information in the propensity score equation and outcome equation without the problems associated with simultaneous Bayesian propensity score approaches. The corresponding variance estimators are also provided. The two-step Bayesian propensity score is provided for three methods of implementation: propensity score stratification, weighting, and optimal full matching. Three simulation studies and one case study are presented to elaborate the proposed two-step Bayesian propensity score approach. Results of the simulation studies reveal that greater precision in the propensity score equation yields better recovery of the frequentist-based treatment effect. A slight advantage is shown for the Bayesian approach in small samples. Results also reveal that greater precision around the wrong treatment effect can lead to seriously distorted results. However, greater precision around the correct treatment effect parameter yields quite good results, with slight improvement seen with greater precision in the propensity score equation. A comparison of coverage rates for the conventional frequentist approach and proposed Bayesian approach is also provided. The case study reveals that credible intervals are wider than frequentist confidence intervals when priors are non-informative.
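For orientation, a plain frequentist propensity-score weighting analysis on simulated data is sketched below; in the two-step Bayesian version studied in the article, the point estimates in both the propensity score equation and the outcome equation are replaced by posterior distributions, which is not reproduced here.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 2_000
x = rng.normal(size=(n, 2))                                   # observed confounders
p_treat = 1.0 / (1.0 + np.exp(-(0.5 * x[:, 0] - 0.4 * x[:, 1])))
z = rng.binomial(1, p_treat)                                  # treatment assignment
y = 1.0 * z + 0.8 * x[:, 0] + 0.5 * x[:, 1] + rng.normal(size=n)  # true effect = 1.0

# Step 1: propensity score equation (plain maximum-likelihood logistic regression).
ps = sm.Logit(z, sm.add_constant(x)).fit(disp=0).predict()

# Step 2: outcome equation with inverse-probability-of-treatment weights.
w = np.where(z == 1, 1.0 / ps, 1.0 / (1.0 - ps))
outcome = sm.WLS(y, sm.add_constant(z.astype(float)), weights=w).fit()
print(f"IPW treatment effect estimate: {outcome.params[1]:.3f}")
```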
Design-based Sample and Probability Law-Assumed Sample: Their Role in Scientific Investigation.
ERIC Educational Resources Information Center
Ojeda, Mario Miguel; Sahai, Hardeo
2002-01-01
Discusses some key statistical concepts in probabilistic and non-probabilistic sampling to provide an overview for understanding the inference process. Suggests a statistical model constituting the basis of statistical inference and provides a brief review of the finite population descriptive inference and a quota sampling inferential theory.…
The Importance of Statistical Modeling in Data Analysis and Inference
ERIC Educational Resources Information Center
Rollins, Derrick, Sr.
2017-01-01
Statistical inference simply means to draw a conclusion based on information that comes from data. Error bars are the most commonly used tool for data analysis and inference in chemical engineering data studies. This work demonstrates, using common types of data collection studies, the importance of specifying the statistical model for sound…
Enhancing pediatric clinical trial feasibility through the use of Bayesian statistics.
Huff, Robin A; Maca, Jeff D; Puri, Mala; Seltzer, Earl W
2017-11-01
Background: Pediatric clinical trials commonly experience recruitment challenges including limited number of patients and investigators, inclusion/exclusion criteria that further reduce the patient pool, and a competitive research landscape created by pediatric regulatory commitments. To overcome these challenges, innovative approaches are needed. Methods: This article explores the use of Bayesian statistics to improve pediatric trial feasibility, using pediatric Type-2 diabetes as an example. Data for six therapies approved for adults were used to perform simulations to determine the impact on pediatric trial size. Results: When the number of adult patients contributing to the simulation was assumed to be the same as the number of patients to be enrolled in the pediatric trial, the pediatric trial size was reduced by 75-78% when compared with a frequentist statistical approach, but was associated with a 34-45% false-positive rate. In subsequent simulations, greater control was exerted over the false-positive rate by decreasing the contribution of the adult data. A 30-33% reduction in trial size was achieved when false-positives were held to less than 10%. Conclusion: Reducing the trial size through the use of Bayesian statistics would facilitate completion of pediatric trials, enabling drugs to be labeled appropriately for children.
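The borrowing idea can be sketched with a power-prior-style discount of adult data in a Beta-binomial model for a binary response; the counts, the discount weights, and the decision quantity are illustrative and do not reproduce the article's Type-2 diabetes simulations or its false-positive-rate calibration.

```python
import numpy as np
from scipy.stats import beta

# Illustrative counts: (responders, n) for adult and pediatric arms.
adult_treat, adult_ctrl = (120, 300), (80, 300)
ped_treat, ped_ctrl = (14, 40), (9, 40)

def posterior(responders, n, hist_responders, hist_n, w):
    """Beta posterior with historical (adult) data down-weighted by w in [0, 1]."""
    a = 1 + responders + w * hist_responders
    b = 1 + (n - responders) + w * (hist_n - hist_responders)
    return beta(a, b)

rng = np.random.default_rng(4)
for w in [0.0, 0.3, 1.0]:                       # 0 = ignore adults, 1 = pool fully
    post_t = posterior(*ped_treat, *adult_treat, w)
    post_c = posterior(*ped_ctrl, *adult_ctrl, w)
    draws_t = post_t.rvs(100_000, random_state=rng)
    draws_c = post_c.rvs(100_000, random_state=rng)
    p_better = np.mean(draws_t > draws_c)
    print(f"discount w = {w:.1f}: P(treatment > control | data) = {p_better:.3f}")
```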
2012-01-01
Background An important question in the analysis of biochemical data is that of identifying subsets of molecular variables that may jointly influence a biological response. Statistical variable selection methods have been widely used for this purpose. In many settings, it may be important to incorporate ancillary biological information concerning the variables of interest. Pathway and network maps are one example of a source of such information. However, although ancillary information is increasingly available, it is not always clear how it should be used nor how it should be weighted in relation to primary data. Results We put forward an approach in which biological knowledge is incorporated using informative prior distributions over variable subsets, with prior information selected and weighted in an automated, objective manner using an empirical Bayes formulation. We employ continuous, linear models with interaction terms and exploit biochemically-motivated sparsity constraints to permit exact inference. We show an example of priors for pathway- and network-based information and illustrate our proposed method on both synthetic response data and by an application to cancer drug response data. Comparisons are also made to alternative Bayesian and frequentist penalised-likelihood methods for incorporating network-based information. Conclusions The empirical Bayes method proposed here can aid prior elicitation for Bayesian variable selection studies and help to guard against mis-specification of priors. Empirical Bayes, together with the proposed pathway-based priors, results in an approach with a competitive variable selection performance. In addition, the overall procedure is fast, deterministic, and has very few user-set parameters, yet is capable of capturing interplay between molecular players. The approach presented is general and readily applicable in any setting with multiple sources of biological prior knowledge. PMID:22578440
Identification and Forecasting in Mortality Models
Nielsen, Jens P.
2014-01-01
Mortality models often have inbuilt identification issues challenging the statistician. The statistician can choose to work with well-defined freely varying parameters, derived as maximal invariants in this paper, or with ad hoc identified parameters which at first glance seem more intuitive, but which can introduce a number of unnecessary challenges. In this paper we describe the methodological advantages from using the maximal invariant parameterisation and we go through the extra methodological challenges a statistician has to deal with when insisting on working with ad hoc identifications. These challenges are broadly similar in frequentist and in Bayesian setups. We also go through a number of examples from the literature where ad hoc identifications have been preferred in the statistical analyses. PMID:24987729
THESEUS: maximum likelihood superpositioning and analysis of macromolecular structures
Theobald, Douglas L.; Wuttke, Deborah S.
2008-01-01
Summary THESEUS is a command line program for performing maximum likelihood (ML) superpositions and analysis of macromolecular structures. While conventional superpositioning methods use ordinary least-squares (LS) as the optimization criterion, ML superpositions provide substantially improved accuracy by down-weighting variable structural regions and by correcting for correlations among atoms. ML superpositioning is robust and insensitive to the specific atoms included in the analysis, and thus it does not require subjective pruning of selected variable atomic coordinates. Output includes both likelihood-based and frequentist statistics for accurate evaluation of the adequacy of a superposition and for reliable analysis of structural similarities and differences. THESEUS performs principal components analysis for analyzing the complex correlations found among atoms within a structural ensemble. PMID:16777907
How alive is constrained SUSY really?
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bechtle, Philip; Desch, Klaus; Dreiner, Herbert K.
2016-05-31
Constrained supersymmetric models like the CMSSM might look less attractive nowadays because of fine tuning arguments. They also might look less probable in terms of Bayesian statistics. The question of how well the model under study describes the data, however, is answered by frequentist p-values. Thus, for the first time, we calculate a p-value for a supersymmetric model by performing dedicated global toy fits. We combine constraints from low-energy and astrophysical observables, Higgs boson mass and rate measurements as well as the non-observation of new physics in searches for supersymmetry at the LHC. Furthermore, using the framework Fittino, we perform global fits of the CMSSM to the toy data and find that this model is excluded at the 90% confidence level.
Inferring Demographic History Using Two-Locus Statistics.
Ragsdale, Aaron P; Gutenkunst, Ryan N
2017-06-01
Population demographic history may be learned from contemporary genetic variation data. Methods based on aggregating the statistics of many single loci into an allele frequency spectrum (AFS) have proven powerful, but such methods ignore potentially informative patterns of linkage disequilibrium (LD) between neighboring loci. To leverage such patterns, we developed a composite-likelihood framework for inferring demographic history from aggregated statistics of pairs of loci. Using this framework, we show that two-locus statistics are more sensitive to demographic history than single-locus statistics such as the AFS. In particular, two-locus statistics escape the notorious confounding of depth and duration of a bottleneck, and they provide a means to estimate effective population size based on the recombination rather than mutation rate. We applied our approach to a Zambian population of Drosophila melanogaster. Notably, using both single- and two-locus statistics, we inferred a substantially lower ancestral effective population size than previous works and did not infer a bottleneck history. Together, our results demonstrate the broad potential for two-locus statistics to enable powerful population genetic inference. Copyright © 2017 by the Genetics Society of America.
A new prior for bayesian anomaly detection: application to biosurveillance.
Shen, Y; Cooper, G F
2010-01-01
Bayesian anomaly detection computes posterior probabilities of anomalous events by combining prior beliefs and evidence from data. However, the specification of prior probabilities can be challenging. This paper describes a Bayesian prior in the context of disease outbreak detection. The goal is to provide a meaningful, easy-to-use prior that yields a posterior probability of an outbreak that performs at least as well as a standard frequentist approach. If this goal is achieved, the resulting posterior could be usefully incorporated into a decision analysis about how to act in light of a possible disease outbreak. This paper describes a Bayesian method for anomaly detection that combines learning from data with a semi-informative prior probability over patterns of anomalous events. A univariate version of the algorithm is presented here for ease of illustration of the essential ideas. The paper describes the algorithm in the context of disease-outbreak detection, but it is general and can be used in other anomaly detection applications. For this application, the semi-informative prior specifies that an increased count over baseline is expected for the variable being monitored, such as the number of respiratory chief complaints per day at a given emergency department. The semi-informative prior is derived based on the baseline prior, which is estimated using historical data. The evaluation reported here used semi-synthetic data to evaluate the detection performance of the proposed Bayesian method and a control chart method, which is a standard frequentist algorithm that is closest to the Bayesian method in terms of the type of data it uses. The disease-outbreak detection performance of the Bayesian method was statistically significantly better than that of the control chart method when proper baseline periods were used to estimate the baseline behavior to avoid seasonal effects. When using longer baseline periods, the Bayesian method performed as well as the control chart method. The time complexity of the Bayesian algorithm is linear in the number of observed events being monitored, due to a novel, closed-form derivation that is introduced in the paper. This paper introduces a novel prior probability for Bayesian outbreak detection that is expressive, easy-to-apply, computationally efficient, and performs as well or better than a standard frequentist method.
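The essential mechanics, a conjugate prior over the baseline rate updated by today's count, can be sketched with a Gamma-Poisson model. This is a generic illustration of the idea rather than the paper's semi-informative prior; the historical counts and the "elevated" threshold below are assumptions.

    import numpy as np
    from scipy import stats

    # Historical daily counts of respiratory chief complaints (assumed data).
    history = np.array([12, 15, 9, 14, 11, 13, 10, 16, 12, 14])

    # Baseline prior for the daily rate: conjugate Gamma fit by moment matching.
    m, v = history.mean(), history.var(ddof=1)
    shape, rate = m**2 / v, m / v            # Gamma(shape, rate) with mean m, variance v

    def prob_elevated(today, factor=1.3):
        """Posterior probability that today's rate is at least `factor` x the baseline mean."""
        post_shape, post_rate = shape + today, rate + 1.0
        return 1 - stats.gamma.cdf(factor * m, post_shape, scale=1.0 / post_rate)

    print(prob_elevated(13))   # typical day: low posterior probability of an outbreak
    print(prob_elevated(40))   # sharply elevated count: high posterior probability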
Teaching Statistical Inference for Causal Effects in Experiments and Observational Studies
ERIC Educational Resources Information Center
Rubin, Donald B.
2004-01-01
Inference for causal effects is a critical activity in many branches of science and public policy. The field of statistics is the one field most suited to address such problems, whether from designed experiments or observational studies. Consequently, it is arguably essential that departments of statistics teach courses in causal inference to both…
Reasoning about Informal Statistical Inference: One Statistician's View
ERIC Educational Resources Information Center
Rossman, Allan J.
2008-01-01
This paper identifies key concepts and issues associated with the reasoning of informal statistical inference. I focus on key ideas of inference that I think all students should learn, including at secondary level as well as tertiary. I argue that a fundamental component of inference is to go beyond the data at hand, and I propose that statistical…
Data-driven inference for the spatial scan statistic.
Almeida, Alexandre C L; Duarte, Anderson R; Duczmal, Luiz H; Oliveira, Fernando L P; Takahashi, Ricardo H C
2011-08-02
Kulldorff's spatial scan statistic for aggregated area maps searches for clusters of cases without specifying their size (number of areas) or geographic location in advance. Their statistical significance is tested while adjusting for the multiple testing inherent in such a procedure. However, as is shown in this work, this adjustment is not done in an even manner for all possible cluster sizes. A modification is proposed to the usual inference test of the spatial scan statistic, incorporating additional information about the size of the most likely cluster found. A new interpretation of the results of the spatial scan statistic is proposed, posing a modified inference question: what is the probability that the null hypothesis is rejected for the original observed cases map with a most likely cluster of size k, taking into account only those most likely clusters of size k found under the null hypothesis for comparison? This question is especially important when the p-value computed by the usual inference process is near the alpha significance level, regarding the correctness of the decision based on this inference. A practical procedure is provided to make more accurate inferences about the most likely cluster found by the spatial scan statistic.
NASA Astrophysics Data System (ADS)
Saputra, K. V. I.; Cahyadi, L.; Sembiring, U. A.
2018-01-01
In this paper, we assess our traditional elementary statistics education and introduce elementary statistics with simulation-based inference. To assess our statistics class, we adapt the well-known CAOS (Comprehensive Assessment of Outcomes in Statistics) test, which serves as an external measure of students' basic statistical literacy. This test is generally accepted as a measure of statistical literacy. We also introduce a new teaching method in the elementary statistics class. Unlike the traditional elementary statistics course, we introduce a simulation-based inference method to conduct hypothesis testing. The literature has shown that this new teaching method works very well in increasing students' understanding of statistics.
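A minimal example of the simulation-based inference taught in such a course is a randomization test for a difference in group means, as sketched below; the scores are invented for illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical scores from two groups of students.
    group_a = np.array([72, 85, 90, 68, 77, 81, 94, 73])
    group_b = np.array([65, 70, 78, 62, 74, 69, 80, 66])
    observed = group_a.mean() - group_b.mean()

    pooled = np.concatenate([group_a, group_b])
    n_a = len(group_a)

    # Randomization distribution: shuffle group labels many times.
    diffs = []
    for _ in range(10_000):
        rng.shuffle(pooled)
        diffs.append(pooled[:n_a].mean() - pooled[n_a:].mean())

    p_value = np.mean(np.abs(diffs) >= abs(observed))
    print(f"observed difference = {observed:.2f}, randomization p-value = {p_value:.4f}")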
Apes are intuitive statisticians.
Rakoczy, Hannes; Clüver, Annette; Saucke, Liane; Stoffregen, Nicole; Gräbener, Alice; Migura, Judith; Call, Josep
2014-04-01
Inductive learning and reasoning, as we use it both in everyday life and in science, is characterized by flexible inferences based on statistical information: inferences from populations to samples and vice versa. Many forms of such statistical reasoning have been found to develop late in human ontogeny, depending on formal education and language, and to be fragile even in adults. New revolutionary research, however, suggests that even preverbal human infants make use of intuitive statistics. Here, we conducted the first investigation of such intuitive statistical reasoning with non-human primates. In a series of 7 experiments, Bonobos, Chimpanzees, Gorillas and Orangutans drew flexible statistical inferences from populations to samples. These inferences, furthermore, were truly based on statistical information regarding the relative frequency distributions in a population, and not on absolute frequencies. Intuitive statistics in its most basic form is thus an evolutionarily more ancient rather than a uniquely human capacity. Copyright © 2014 Elsevier B.V. All rights reserved.
Uncertainty estimation of Intensity-Duration-Frequency relationships: A regional analysis
NASA Astrophysics Data System (ADS)
Mélèse, Victor; Blanchet, Juliette; Molinié, Gilles
2018-03-01
We propose in this article a regional study of uncertainties in IDF curves derived from point-rainfall maxima. We develop two generalized extreme value models based on the simple scaling assumption, first in the frequentist framework and second in the Bayesian framework. Within the frequentist framework, uncertainties are obtained i) from the Gaussian density stemming from the asymptotic normality theorem of the maximum likelihood and ii) with a bootstrap procedure. Within the Bayesian framework, uncertainties are obtained from the posterior densities. We confront these two frameworks on the same database covering a large region of 100,000 km2 in southern France with contrasted rainfall regimes, in order to draw conclusions that are not specific to the data. The two frameworks are applied to 405 hourly stations with data back to the 1980s, accumulated in the range 3 h-120 h. We show that i) the Bayesian framework is more robust than the frequentist one to the starting point of the estimation procedure, ii) the posterior and the bootstrap densities are able to better adjust uncertainty estimation to the data than the Gaussian density, and iii) the bootstrap density gives unreasonable confidence intervals, in particular for return levels associated with large return periods. Our recommendation therefore goes towards the use of the Bayesian framework to compute uncertainty.
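The frequentist bootstrap ingredient can be sketched in a few lines: fit a GEV to annual maxima by maximum likelihood and bootstrap a return level. The station data below are simulated, and the simple-scaling structure and Bayesian components of the article are not reproduced.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)

    # Hypothetical annual maxima of hourly rainfall (mm) at one station.
    maxima = stats.genextreme.rvs(c=-0.1, loc=30, scale=8, size=40, random_state=rng)

    def return_level(sample, period=20):
        """T-year return level from a maximum-likelihood GEV fit."""
        c, loc, scale = stats.genextreme.fit(sample)
        return stats.genextreme.ppf(1 - 1 / period, c, loc=loc, scale=scale)

    # Nonparametric bootstrap of the 20-year return level.
    boots = [return_level(rng.choice(maxima, size=len(maxima), replace=True))
             for _ in range(500)]
    lo, hi = np.percentile(boots, [2.5, 97.5])
    print(f"20-year return level: {return_level(maxima):.1f} mm, 95% CI [{lo:.1f}, {hi:.1f}]")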
Cluster mass inference via random field theory.
Zhang, Hui; Nichols, Thomas E; Johnson, Timothy D
2009-01-01
Cluster extent and voxel intensity are two widely used statistics in neuroimaging inference. Cluster extent is sensitive to spatially extended signals while voxel intensity is better for intense but focal signals. In order to leverage strength from both statistics, several nonparametric permutation methods have been proposed to combine the two methods. Simulation studies have shown that of the different cluster permutation methods, the cluster mass statistic is generally the best. However, to date, there is no parametric cluster mass inference available. In this paper, we propose a cluster mass inference method based on random field theory (RFT). We develop this method for Gaussian images, evaluate it on Gaussian and Gaussianized t-statistic images and investigate its statistical properties via simulation studies and real data. Simulation results show that the method is valid under the null hypothesis and demonstrate that it can be more powerful than the cluster extent inference method. Further, analyses with a single subject and a group fMRI dataset demonstrate better power than traditional cluster size inference, and good accuracy relative to a gold-standard permutation test.
Beating the Spin-down Limit on Gravitational Wave Emission from the Vela Pulsar
NASA Astrophysics Data System (ADS)
Abadie, J.; Abbott, B. P.; Abbott, R.; Abernathy, M.; Accadia, T.; Acernese, F.; Adams, C.; Adhikari, R.; Affeldt, C.; Allen, B.; Allen, G. S.; Amador Ceron, E.; Amariutei, D.; Amin, R. S.; Anderson, S. B.; Anderson, W. G.; Antonucci, F.; Arai, K.; Arain, M. A.; Araya, M. C.; Aston, S. M.; Astone, P.; Atkinson, D.; Aufmuth, P.; Aulbert, C.; Aylott, B. E.; Babak, S.; Baker, P.; Ballardin, G.; Ballmer, S.; Barker, D.; Barnum, S.; Barone, F.; Barr, B.; Barriga, P.; Barsotti, L.; Barsuglia, M.; Barton, M. A.; Bartos, I.; Bassiri, R.; Bastarrika, M.; Basti, A.; Bauchrowitz, J.; Bauer, Th. S.; Behnke, B.; Bejger, M.; Beker, M. G.; Bell, A. S.; Belletoile, A.; Belopolski, I.; Benacquista, M.; Bertolini, A.; Betzwieser, J.; Beveridge, N.; Beyersdorf, P. T.; Bilenko, I. A.; Billingsley, G.; Birch, J.; Birindelli, S.; Biswas, R.; Bitossi, M.; Bizouard, M. A.; Black, E.; Blackburn, J. K.; Blackburn, L.; Blair, D.; Bland, B.; Blom, M.; Bock, O.; Bodiya, T. P.; Bogan, C.; Bondarescu, R.; Bondu, F.; Bonelli, L.; Bonnand, R.; Bork, R.; Born, M.; Boschi, V.; Bose, S.; Bosi, L.; Bouhou, B.; Boyle, M.; Braccini, S.; Bradaschia, C.; Brady, P. R.; Braginsky, V. B.; Brau, J. E.; Breyer, J.; Bridges, D. O.; Brillet, A.; Brinkmann, M.; Brisson, V.; Britzger, M.; Brooks, A. F.; Brown, D. A.; Brummit, A.; Budzyński, R.; Bulik, T.; Bulten, H. J.; Buonanno, A.; Burguet-Castell, J.; Burmeister, O.; Buskulic, D.; Buy, C.; Byer, R. L.; Cadonati, L.; Cagnoli, G.; Cain, J.; Calloni, E.; Camp, J. B.; Campagna, E.; Campsie, P.; Cannizzo, J.; Cannon, K.; Canuel, B.; Cao, J.; Capano, C.; Carbognani, F.; Caride, S.; Caudill, S.; Cavaglià, M.; Cavalier, F.; Cavalieri, R.; Cella, G.; Cepeda, C.; Cesarini, E.; Chaibi, O.; Chalermsongsak, T.; Chalkley, E.; Charlton, P.; Chassande-Mottin, E.; Chelkowski, S.; Chen, Y.; Chincarini, A.; Christensen, N.; Chua, S. S. Y.; Chung, C. T. Y.; Chung, S.; Clara, F.; Clark, D.; Clark, J.; Clayton, J. H.; Cleva, F.; Coccia, E.; Colacino, C. N.; Colas, J.; Colla, A.; Colombini, M.; Conte, R.; Cook, D.; Corbitt, T. R.; Cornish, N.; Corsi, A.; Costa, C. A.; Coughlin, M.; Coulon, J.-P.; Coward, D. M.; Coyne, D. C.; Creighton, J. D. E.; Creighton, T. D.; Cruise, A. M.; Culter, R. M.; Cumming, A.; Cunningham, L.; Cuoco, E.; Dahl, K.; Danilishin, S. L.; Dannenberg, R.; D'Antonio, S.; Danzmann, K.; Das, K.; Dattilo, V.; Daudert, B.; Daveloza, H.; Davier, M.; Davies, G.; Daw, E. J.; Day, R.; Dayanga, T.; De Rosa, R.; DeBra, D.; Debreczeni, G.; Degallaix, J.; del Prete, M.; Dent, T.; Dergachev, V.; DeRosa, R.; DeSalvo, R.; Dhurandhar, S.; Di Fiore, L.; Di Lieto, A.; Di Palma, I.; Emilio, M. Di Paolo; Di Virgilio, A.; Díaz, M.; Dietz, A.; Donovan, F.; Dooley, K. L.; Dorsher, S.; Douglas, E. S. D.; Drago, M.; Drever, R. W. P.; Driggers, J. C.; Dumas, J.-C.; Dwyer, S.; Eberle, T.; Edgar, M.; Edwards, M.; Effler, A.; Ehrens, P.; Engel, R.; Etzel, T.; Evans, M.; Evans, T.; Factourovich, M.; Fafone, V.; Fairhurst, S.; Fan, Y.; Farr, B. F.; Fazi, D.; Fehrmann, H.; Feldbaum, D.; Ferrante, I.; Fidecaro, F.; Finn, L. S.; Fiori, I.; Flaminio, R.; Flanigan, M.; Foley, S.; Forsi, E.; Forte, L. A.; Fotopoulos, N.; Fournier, J.-D.; Franc, J.; Frasca, S.; Frasconi, F.; Frede, M.; Frei, M.; Frei, Z.; Freise, A.; Frey, R.; Fricke, T. T.; Friedrich, D.; Fritschel, P.; Frolov, V. V.; Fulda, P.; Fyffe, M.; Galimberti, M.; Gammaitoni, L.; Garcia, J.; Garofoli, J. A.; Garufi, F.; Gáspár, M. E.; Gemme, G.; Genin, E.; Gennai, A.; Ghosh, S.; Giaime, J. A.; Giampanis, S.; Giardina, K. 
D.; Giazotto, A.; Gill, C.; Goetz, E.; Goggin, L. M.; González, G.; Gorodetsky, M. L.; Goßler, S.; Gouaty, R.; Graef, C.; Granata, M.; Grant, A.; Gras, S.; Gray, C.; Greenhalgh, R. J. S.; Gretarsson, A. M.; Greverie, C.; Grosso, R.; Grote, H.; Grunewald, S.; Guidi, G. M.; Guido, C.; Gupta, R.; Gustafson, E. K.; Gustafson, R.; Hage, B.; Hallam, J. M.; Hammer, D.; Hammond, G.; Hanks, J.; Hanna, C.; Hanson, J.; Harms, J.; Harry, G. M.; Harry, I. W.; Harstad, E. D.; Hartman, M. T.; Haughian, K.; Hayama, K.; Hayau, J.-F.; Hayler, T.; Heefner, J.; Heitmann, H.; Hello, P.; Hendry, M. A.; Heng, I. S.; Heptonstall, A. W.; Herrera, V.; Hewitson, M.; Hild, S.; Hoak, D.; Hodge, K. A.; Holt, K.; Hong, T.; Hooper, S.; Hosken, D. J.; Hough, J.; Howell, E. J.; Huet, D.; Hughey, B.; Husa, S.; Huttner, S. H.; Ingram, D. R.; Inta, R.; Isogai, T.; Ivanov, A.; Jaranowski, P.; Johnson, W. W.; Jones, D. I.; Jones, G.; Jones, R.; Ju, L.; Kalmus, P.; Kalogera, V.; Kandhasamy, S.; Kanner, J. B.; Katsavounidis, E.; Katzman, W.; Kawabe, K.; Kawamura, S.; Kawazoe, F.; Kells, W.; Kelner, M.; Keppel, D. G.; Khalaidovski, A.; Khalili, F. Y.; Khazanov, E. A.; Kim, H.; Kim, N.; King, P. J.; Kinzel, D. L.; Kissel, J. S.; Klimenko, S.; Kondrashov, V.; Kopparapu, R.; Koranda, S.; Korth, W. Z.; Kowalska, I.; Kozak, D.; Kringel, V.; Krishnamurthy, S.; Krishnan, B.; Królak, A.; Kuehn, G.; Kumar, R.; Kwee, P.; Landry, M.; Lantz, B.; Lastzka, N.; Lazzarini, A.; Leaci, P.; Leong, J.; Leonor, I.; Leroy, N.; Letendre, N.; Li, J.; Li, T. G. F.; Liguori, N.; Lindquist, P. E.; Lockerbie, N. A.; Lodhia, D.; Lorenzini, M.; Loriette, V.; Lormand, M.; Losurdo, G.; Lu, P.; Luan, J.; Lubinski, M.; Lück, H.; Lundgren, A. P.; Macdonald, E.; Machenschalk, B.; MacInnis, M.; Mageswaran, M.; Mailand, K.; Majorana, E.; Maksimovic, I.; Man, N.; Mandel, I.; Mandic, V.; Mantovani, M.; Marandi, A.; Marchesoni, F.; Marion, F.; Márka, S.; Márka, Z.; Maros, E.; Marque, J.; Martelli, F.; Martin, I. W.; Martin, R. M.; Marx, J. N.; Mason, K.; Masserot, A.; Matichard, F.; Matone, L.; Matzner, R. A.; Mavalvala, N.; McCarthy, R.; McClelland, D. E.; McGuire, S. C.; McIntyre, G.; McKechan, D. J. A.; Meadors, G.; Mehmet, M.; Meier, T.; Melatos, A.; Melissinos, A. C.; Mendell, G.; Mercer, R. A.; Merill, L.; Meshkov, S.; Messenger, C.; Meyer, M. S.; Miao, H.; Michel, C.; Milano, L.; Miller, J.; Minenkov, Y.; Mino, Y.; Mitrofanov, V. P.; Mitselmakher, G.; Mittleman, R.; Miyakawa, O.; Moe, B.; Moesta, P.; Mohan, M.; Mohanty, S. D.; Mohapatra, S. R. P.; Moraru, D.; Moreno, G.; Morgado, N.; Morgia, A.; Mosca, S.; Moscatelli, V.; Mossavi, K.; Mours, B.; Mow-Lowry, C. M.; Mueller, G.; Mukherjee, S.; Mullavey, A.; Müller-Ebhardt, H.; Munch, J.; Murray, P. G.; Nash, T.; Nawrodt, R.; Nelson, J.; Neri, I.; Newton, G.; Nishida, E.; Nishizawa, A.; Nocera, F.; Nolting, D.; Ochsner, E.; O'Dell, J.; Ogin, G. H.; Oldenburg, R. G.; O'Reilly, B.; O'Shaughnessy, R.; Osthelder, C.; Ott, C. D.; Ottaway, D. J.; Ottens, R. S.; Overmier, H.; Owen, B. J.; Page, A.; Pagliaroli, G.; Palladino, L.; Palomba, C.; Pan, Y.; Pankow, C.; Paoletti, F.; Papa, M. A.; Parameswaran, A.; Pardi, S.; Parisi, M.; Pasqualetti, A.; Passaquieti, R.; Passuello, D.; Patel, P.; Pathak, D.; Pedraza, M.; Pekowsky, L.; Penn, S.; Peralta, C.; Perreca, A.; Persichetti, G.; Phelps, M.; Pichot, M.; Pickenpack, M.; Piergiovanni, F.; Pietka, M.; Pinard, L.; Pinto, I. M.; Pitkin, M.; Pletsch, H. J.; Plissi, M. V.; Podkaminer, J.; Poggiani, R.; Pöld, J.; Postiglione, F.; Prato, M.; Predoi, V.; Price, L. 
R.; Prijatelj, M.; Principe, M.; Privitera, S.; Prix, R.; Prodi, G. A.; Prokhorov, L.; Puncken, O.; Punturo, M.; Puppo, P.; Quetschke, V.; Raab, F. J.; Rabeling, D. S.; Rácz, I.; Radkins, H.; Raffai, P.; Rakhmanov, M.; Ramet, C. R.; Rankins, B.; Rapagnani, P.; Raymond, V.; Re, V.; Redwine, K.; Reed, C. M.; Reed, T.; Regimbau, T.; Reid, S.; Reitze, D. H.; Ricci, F.; Riesen, R.; Riles, K.; Roberts, P.; Robertson, N. A.; Robinet, F.; Robinson, C.; Robinson, E. L.; Rocchi, A.; Roddy, S.; Rolland, L.; Rollins, J.; Romano, J. D.; Romano, R.; Romie, J. H.; Rosińska, D.; Röver, C.; Rowan, S.; Rüdiger, A.; Ruggi, P.; Ryan, K.; Sakata, S.; Sakosky, M.; Salemi, F.; Salit, M.; Sammut, L.; Sancho de la Jordana, L.; Sandberg, V.; Sannibale, V.; Santamaría, L.; Santiago-Prieto, I.; Santostasi, G.; Saraf, S.; Sassolas, B.; Sathyaprakash, B. S.; Sato, S.; Satterthwaite, M.; Saulson, P. R.; Savage, R.; Schilling, R.; Schlamminger, S.; Schnabel, R.; Schofield, R. M. S.; Schulz, B.; Schutz, B. F.; Schwinberg, P.; Scott, J.; Scott, S. M.; Searle, A. C.; Seifert, F.; Sellers, D.; Sengupta, A. S.; Sentenac, D.; Sergeev, A.; Shaddock, D. A.; Shaltev, M.; Shapiro, B.; Shawhan, P.; Shihan Weerathunga, T.; Shoemaker, D. H.; Sibley, A.; Siemens, X.; Sigg, D.; Singer, A.; Singer, L.; Sintes, A. M.; Skelton, G.; Slagmolen, B. J. J.; Slutsky, J.; Smith, J. R.; Smith, M. R.; Smith, N. D.; Smith, R.; Somiya, K.; Sorazu, B.; Soto, J.; Speirits, F. C.; Sperandio, L.; Stefszky, M.; Stein, A. J.; Steinlechner, J.; Steinlechner, S.; Steplewski, S.; Stochino, A.; Stone, R.; Strain, K. A.; Strigin, S.; Stroeer, A. S.; Sturani, R.; Stuver, A. L.; Summerscales, T. Z.; Sung, M.; Susmithan, S.; Sutton, P. J.; Swinkels, B.; Szokoly, G. P.; Tacca, M.; Talukder, D.; Tanner, D. B.; Tarabrin, S. P.; Taylor, J. R.; Taylor, R.; Thomas, P.; Thorne, K. A.; Thorne, K. S.; Thrane, E.; Thüring, A.; Titsler, C.; Tokmakov, K. V.; Toncelli, A.; Tonelli, M.; Torre, O.; Torres, C.; Torrie, C. I.; Tournefier, E.; Travasso, F.; Traylor, G.; Trias, M.; Tseng, K.; Turner, L.; Ugolini, D.; Urbanek, K.; Vahlbruch, H.; Vaishnav, B.; Vajente, G.; Vallisneri, M.; van den Brand, J. F. J.; Van Den Broeck, C.; van der Putten, S.; van der Sluys, M. V.; van Veggel, A. A.; Vass, S.; Vasuth, M.; Vaulin, R.; Vavoulidis, M.; Vecchio, A.; Vedovato, G.; Veitch, J.; Veitch, P. J.; Veltkamp, C.; Verkindt, D.; Vetrano, F.; Viceré, A.; Villar, A. E.; Vinet, J.-Y.; Vocca, H.; Vorvick, C.; Vyachanin, S. P.; Waldman, S. J.; Wallace, L.; Wanner, A.; Ward, R. L.; Was, M.; Wei, P.; Weinert, M.; Weinstein, A. J.; Weiss, R.; Wen, L.; Wen, S.; Wessels, P.; West, M.; Westphal, T.; Wette, K.; Whelan, J. T.; Whitcomb, S. E.; White, D.; Whiting, B. F.; Wilkinson, C.; Willems, P. A.; Williams, H. R.; Williams, L.; Willke, B.; Winkelmann, L.; Winkler, W.; Wipf, C. C.; Wiseman, A. G.; Woan, G.; Wooley, R.; Worden, J.; Yablon, J.; Yakushin, I.; Yamamoto, H.; Yamamoto, K.; Yang, H.; Yeaton-Massey, D.; Yoshida, S.; Yu, P.; Yvert, M.; Zanolin, M.; Zhang, L.; Zhang, Z.; Zhao, C.; Zotov, N.; Zucker, M. E.; Zweizig, J.; LIGO Scientific Collaboration; Virgo Collaboration; Buchner, S.; Hotan, A.; Palfreyman, J.
2011-08-01
We present direct upper limits on continuous gravitational wave emission from the Vela pulsar using data from the Virgo detector's second science run. These upper limits have been obtained using three independent methods that assume the gravitational wave emission follows the radio timing. Two of the methods produce frequentist upper limits for an assumed known orientation of the star's spin axis and value of the wave polarization angle of, respectively, 1.9 × 10^-24 and 2.2 × 10^-24, with 95% confidence. The third method, under the same hypothesis, produces a Bayesian upper limit of 2.1 × 10^-24, with 95% degree of belief. These limits are below the indirect spin-down limit of 3.3 × 10^-24 for the Vela pulsar, defined by the energy loss rate inferred from the observed decrease in Vela's spin frequency, and correspond to a limit on the star ellipticity of ~10^-3. Slightly less stringent results, but still well below the spin-down limit, are obtained assuming the star's spin axis inclination and the wave polarization angles are unknown.
Waldmann, P; García-Gil, M R; Sillanpää, M J
2005-06-01
Comparison of the level of differentiation at neutral molecular markers (estimated as F(ST) or G(ST)) with the level of differentiation at quantitative traits (estimated as Q(ST)) has become a standard tool for inferring that there is differential selection between populations. We estimated Q(ST) of timing of bud set from a latitudinal cline of Pinus sylvestris with a Bayesian hierarchical variance component method utilizing the information on the pre-estimated population structure from neutral molecular markers. Unfortunately, the between-family variances differed substantially between populations that resulted in a bimodal posterior of Q(ST) that could not be compared in any sensible way with the unimodal posterior of the microsatellite F(ST). In order to avoid publishing studies with flawed Q(ST) estimates, we recommend that future studies should present heritability estimates for each trait and population. Moreover, to detect variance heterogeneity in frequentist methods (ANOVA and REML), it is of essential importance to check also that the residuals are normally distributed and do not follow any systematically deviating trends.
Multivariate meta-analysis using individual participant data.
Riley, R D; Price, M J; Jackson, D; Wardle, M; Gueyffier, F; Wang, J; Staessen, J A; White, I R
2015-06-01
When combining results across related studies, a multivariate meta-analysis allows the joint synthesis of correlated effect estimates from multiple outcomes. Joint synthesis can improve efficiency over separate univariate syntheses, may reduce selective outcome reporting biases, and enables joint inferences across the outcomes. A common issue is that within-study correlations needed to fit the multivariate model are unknown from published reports. However, provision of individual participant data (IPD) allows them to be calculated directly. Here, we illustrate how to use IPD to estimate within-study correlations, using a joint linear regression for multiple continuous outcomes and bootstrapping methods for binary, survival and mixed outcomes. In a meta-analysis of 10 hypertension trials, we then show how these methods enable multivariate meta-analysis to address novel clinical questions about continuous, survival and binary outcomes; treatment-covariate interactions; adjusted risk/prognostic factor effects; longitudinal data; prognostic and multiparameter models; and multiple treatment comparisons. Both frequentist and Bayesian approaches are applied, with example software code provided to derive within-study correlations and to fit the models. © 2014 The Authors. Research Synthesis Methods published by John Wiley & Sons, Ltd.
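The within-study correlation step lends itself to a small sketch: bootstrap the participants of a single trial, recompute both treatment effects on each resample, and correlate them. The simulated trial below (variables treat, sbp, event) is an assumed stand-in for IPD, not the hypertension data analysed in the paper, and the effect definitions are illustrative.

    import numpy as np

    rng = np.random.default_rng(7)

    # Hypothetical IPD from one trial: treatment arm, a continuous and a binary outcome.
    n = 200
    treat = rng.integers(0, 2, n)
    sbp   = 150 - 6 * treat + rng.normal(0, 12, n)        # continuous outcome
    event = rng.binomial(1, 0.30 - 0.08 * treat)          # binary outcome

    def effects(idx):
        """Treatment effects on both outcomes: mean difference and log odds ratio."""
        t, c = idx[treat[idx] == 1], idx[treat[idx] == 0]
        md = sbp[t].mean() - sbp[c].mean()
        p1, p0 = event[t].mean(), event[c].mean()
        lor = np.log(p1 / (1 - p1)) - np.log(p0 / (1 - p0))
        return md, lor

    # Bootstrap patients to estimate the within-study correlation of the two effects.
    boot = np.array([effects(rng.integers(0, n, n)) for _ in range(1000)])
    print("within-study correlation:", np.corrcoef(boot[:, 0], boot[:, 1])[0, 1])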
A Test by Any Other Name: P Values, Bayes Factors, and Statistical Inference.
Stern, Hal S
2016-01-01
Procedures used for statistical inference are receiving increased scrutiny as the scientific community studies the factors associated with ensuring reproducible research. This note addresses recent negative attention directed at p values, the relationship of confidence intervals and tests, and the role of Bayesian inference and Bayes factors, with an eye toward better understanding these different strategies for statistical inference. We argue that researchers and data analysts too often resort to binary decisions (e.g., whether to reject or accept the null hypothesis) in settings where this may not be required.
Using of bayesian networks to estimate the probability of "NATECH" scenario occurrence
NASA Astrophysics Data System (ADS)
Dobes, Pavel; Dlabka, Jakub; Jelšovská, Katarína; Polorecká, Mária; Baudišová, Barbora; Danihelka, Pavel
2015-04-01
In the twentieth century, Bayesian statistics and probability were little used (perhaps not a preferred approach) in natural and industrial risk analysis and management, nor in the analysis of so-called NATECH accidents (chemical accidents triggered by natural events such as earthquakes, floods, or lightning; ref. E. Krausmann, 2011, doi:10.5194/nhess-11-921-2011). From the beginning, the main role was played by so-called "classical" frequentist probability (ref. Neyman, 1937), which relies to this day mainly on the outcomes of experiments and monitoring and does not allow expert beliefs, expectations, and judgements to be taken into account (which, on the other hand, is one of the well-known pillars of the Bayesian approach to probability). Over the last 20 or 30 years, publications and conferences have shown a renaissance of Bayesian statistics in many scientific disciplines, including various branches of the geosciences. Is the need for a certain level of trust in expert judgment within risk analysis back? After several decades of development in this field, the following hypothesis (to be checked) can be proposed: probabilities of complex crisis situations and their TOP events (many NATECH events can be classified as crisis situations or emergencies) cannot be estimated by the classical frequentist approach alone, but also require a Bayesian approach (i.e., a pre-staged Bayesian network combining expert belief and expectation with classical frequentist inputs), because there is not always enough quantitative information from monitoring of historical emergencies, several dependent or independent variables may need to be considered, and, in general, every emergency situation unfolds somewhat differently. Here the authors present their proposal of a pre-staged, typified Bayesian network model for a specified NATECH scenario (heavy rainfall AND/OR melting snow OR earthquake -> landslides AND/OR floods -> major chemical accident), comparing it with a "black box" approach and with the so-called "bow-tie" approach (ref. C. A. Brebbia, Risk Analysis VIII, p. 103-111, WIT Press, 2012), a visualisation of the development of the scenario that allows frequencies to be calculated (the TOP event of the scenario, developed downwards to initiating events and upwards to end accidental events, using Fault Tree Analysis and Event Tree Analysis methods). The model can also include a possible terrorist attack on the chemical facility with the potential for a major release of chemicals into environmental compartments (water, soil, air), aimed at threatening environmental safety in the specific area. The study was supported by the project no. VG20132015128 "Increasing of the Environmental Safety & Security by the Prevention of Industrial Chemicals Misuse to the Terrorism", supported by the Ministry of the Interior of the Czech Republic through the Security Research Programme, 2013-2015.
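To make the pre-staged network idea concrete, here is a minimal sketch that evaluates a toy rain -> flood/landslide -> accident chain by brute-force enumeration. The structure and every probability in it are illustrative assumptions, not the authors' model; in a real application those numbers would combine expert-elicited and frequentist inputs as the abstract describes.

    from itertools import product

    # Illustrative conditional probabilities (all values are assumptions).
    p_rain      = 0.10                          # heavy rainfall event
    p_flood     = {True: 0.40, False: 0.02}     # P(flood | rain)
    p_landslide = {True: 0.20, False: 0.01}     # P(landslide | rain)
    # P(major chemical accident | flood, landslide)
    p_accident  = {(True, True): 0.30, (True, False): 0.10,
                   (False, True): 0.08, (False, False): 0.001}

    def joint(rain, flood, slide, accident):
        """Joint probability of one configuration of the four binary nodes."""
        p = p_rain if rain else 1 - p_rain
        p *= p_flood[rain] if flood else 1 - p_flood[rain]
        p *= p_landslide[rain] if slide else 1 - p_landslide[rain]
        pa = p_accident[(flood, slide)]
        return p * (pa if accident else 1 - pa)

    # Marginal probability of the TOP event (major accident) by enumeration.
    p_top = sum(joint(r, f, s, True) for r, f, s in product([True, False], repeat=3))
    print(f"P(major chemical accident) = {p_top:.4f}")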
A systematic review of Bayesian articles in psychology: The last 25 years.
van de Schoot, Rens; Winter, Sonja D; Ryan, Oisín; Zondervan-Zwijnenburg, Mariëlle; Depaoli, Sarah
2017-06-01
Although the statistical tools most often used by researchers in the field of psychology over the last 25 years are based on frequentist statistics, it is often claimed that the alternative Bayesian approach to statistics is gaining in popularity. In the current article, we investigated this claim by performing the very first systematic review of Bayesian psychological articles published between 1990 and 2015 (n = 1,579). We aim to provide a thorough presentation of the role Bayesian statistics plays in psychology. This historical assessment allows us to identify trends and see how Bayesian methods have been integrated into psychological research in the context of different statistical frameworks (e.g., hypothesis testing, cognitive models, IRT, SEM, etc.). We also describe take-home messages and provide "big-picture" recommendations to the field as Bayesian statistics becomes more popular. Our review indicated that Bayesian statistics is used in a variety of contexts across subfields of psychology and related disciplines. There are many different reasons why one might choose to use Bayes (e.g., the use of priors, estimating otherwise intractable models, modeling uncertainty, etc.). We found in this review that the use of Bayes has increased and broadened in the sense that this methodology can be used in a flexible manner to tackle many different forms of questions. We hope this presentation opens the door for a larger discussion regarding the current state of Bayesian statistics, as well as future trends. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Using Guided Reinvention to Develop Teachers' Understanding of Hypothesis Testing Concepts
ERIC Educational Resources Information Center
Dolor, Jason; Noll, Jennifer
2015-01-01
Statistics education reform efforts emphasize the importance of informal inference in the learning of statistics. Research suggests statistics teachers experience similar difficulties understanding statistical inference concepts as students and how teacher knowledge can impact student learning. This study investigates how teachers reinvented an…
Defining Probability in Sex Offender Risk Assessment.
Elwood, Richard W
2016-12-01
There is ongoing debate and confusion over using actuarial scales to predict individuals' risk of sexual recidivism. Much of the debate comes from not distinguishing Frequentist from Bayesian definitions of probability. Much of the confusion comes from applying Frequentist probability to individuals' risk. By definition, only Bayesian probability can be applied to the single case. The Bayesian concept of probability resolves most of the confusion and much of the debate in sex offender risk assessment. Although Bayesian probability is well accepted in risk assessment generally, it has not been widely used to assess the risk of sex offenders. I review the two concepts of probability and show how the Bayesian view alone provides a coherent scheme to conceptualize individuals' risk of sexual recidivism.
Predicting a future lifetime through Box-Cox transformation.
Yang, Z
1999-09-01
In predicting a future lifetime based on a sample of past lifetimes, the Box-Cox transformation method provides a simple and unified procedure that is shown in this article to meet or often outperform the corresponding frequentist solution in terms of coverage probability and average length of prediction intervals. Kullback-Leibler information and second-order asymptotic expansion are used to justify the Box-Cox procedure. Extensive Monte Carlo simulations are also performed to evaluate the small sample behavior of the procedure. Popular lifetime distributions, such as the Weibull, inverse Gaussian and Birnbaum-Saunders, serve as illustrative examples. One important advantage of the Box-Cox procedure lies in its easy extension to linear model predictions where the exact frequentist solutions are often not available.
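The basic recipe, transform the past lifetimes, form a normal-theory prediction interval on the transformed scale, and back-transform, can be sketched as below. The lifetimes are simulated, and the article's second-order asymptotic refinements are not included.

    import numpy as np
    from scipy import stats
    from scipy.special import inv_boxcox

    rng = np.random.default_rng(3)

    # Hypothetical past lifetimes (e.g., hours to failure), right-skewed.
    lifetimes = rng.lognormal(mean=4.0, sigma=0.6, size=30)

    # Box-Cox transform with maximum-likelihood lambda, then a normal prediction interval.
    z, lam = stats.boxcox(lifetimes)
    n = len(z)
    m, s = z.mean(), z.std(ddof=1)
    t = stats.t.ppf(0.975, df=n - 1)
    half = t * s * np.sqrt(1 + 1 / n)          # widened to account for estimating the mean

    lower, upper = inv_boxcox(m - half, lam), inv_boxcox(m + half, lam)
    print(f"lambda = {lam:.2f}, 95% prediction interval for a future lifetime: "
          f"({lower:.1f}, {upper:.1f})")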
ERIC Educational Resources Information Center
Larwin, Karen H.; Larwin, David A.
2011-01-01
Bootstrapping methods and random distribution methods are increasingly recommended as better approaches for teaching students about statistical inference in introductory-level statistics courses. The authors examined the effect of teaching undergraduate business statistics students using random distribution and bootstrapping simulations. It is the…
Application of Transformations in Parametric Inference
ERIC Educational Resources Information Center
Brownstein, Naomi; Pensky, Marianna
2008-01-01
The objective of the present paper is to provide a simple approach to statistical inference using the method of transformations of variables. We demonstrate performance of this powerful tool on examples of constructions of various estimation procedures, hypothesis testing, Bayes analysis and statistical inference for the stress-strength systems.…
Statistical inference for tumor growth inhibition T/C ratio.
Wu, Jianrong
2010-09-01
The tumor growth inhibition T/C ratio is commonly used to quantify treatment effects in drug screening tumor xenograft experiments. The T/C ratio is converted to an antitumor activity rating using an arbitrary cutoff point and often without any formal statistical inference. Here, we applied a nonparametric bootstrap method and a small sample likelihood ratio statistic to make a statistical inference of the T/C ratio, including both hypothesis testing and a confidence interval estimate. Furthermore, sample size and power are also discussed for statistical design of tumor xenograft experiments. Tumor xenograft data from an actual experiment were analyzed to illustrate the application.
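The nonparametric bootstrap part is straightforward to sketch: resample animals within each arm and recompute the ratio of mean tumor volumes. The volumes below are invented, and the paper's small-sample likelihood ratio statistic is not reproduced.

    import numpy as np

    rng = np.random.default_rng(11)

    # Hypothetical tumor volumes (mm^3) at the end of a xenograft experiment.
    treated = np.array([210, 180, 250, 160, 300, 190, 220, 170])
    control = np.array([820, 640, 900, 760, 580, 710, 850, 690])
    tc = treated.mean() / control.mean()

    # Percentile bootstrap confidence interval for the T/C ratio.
    boots = [rng.choice(treated, len(treated), replace=True).mean() /
             rng.choice(control, len(control), replace=True).mean()
             for _ in range(5000)]
    lo, hi = np.percentile(boots, [2.5, 97.5])
    print(f"T/C = {tc:.3f}, 95% bootstrap CI ({lo:.3f}, {hi:.3f})")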
Investigating Mathematics Teachers' Thoughts of Statistical Inference
ERIC Educational Resources Information Center
Yang, Kai-Lin
2012-01-01
Research on statistical cognition and application suggests that statistical inference concepts are commonly misunderstood by students and even misinterpreted by researchers. Although some research has been done on students' misunderstanding or misconceptions of confidence intervals (CIs), few studies explore either students' or mathematics…
Lessons from Inferentialism for Statistics Education
ERIC Educational Resources Information Center
Bakker, Arthur; Derry, Jan
2011-01-01
This theoretical paper relates recent interest in informal statistical inference (ISI) to the semantic theory termed inferentialism, a significant development in contemporary philosophy, which places inference at the heart of human knowing. This theory assists epistemological reflection on challenges in statistics education encountered when…
Zero-inflated spatio-temporal models for disease mapping.
Torabi, Mahmoud
2017-05-01
In this paper, our aim is to analyze geographical and temporal variability of disease incidence when spatio-temporal count data have excess zeros. To that end, we consider random effects in zero-inflated Poisson models to investigate geographical and temporal patterns of disease incidence. Spatio-temporal models that employ conditionally autoregressive smoothing across the spatial dimension and B-spline smoothing over the temporal dimension are proposed. The analysis of these complex models is computationally difficult from the frequentist perspective. On the other hand, the advent of the Markov chain Monte Carlo algorithm has made the Bayesian analysis of complex models computationally convenient. The recently developed data cloning method provides a frequentist approach to mixed models that is also computationally convenient. We propose to use data cloning, which yields maximum likelihood estimates, to conduct frequentist analysis of zero-inflated spatio-temporal modeling of disease incidence. One of the advantages of the data cloning approach is that the predictions and corresponding standard errors (or prediction intervals) of smoothed disease incidence over space and time are easily obtained. We illustrate our approach using a real dataset of monthly children asthma visits to hospital in the province of Manitoba, Canada, during the period April 2006 to March 2010. Performance of our approach is also evaluated through a simulation study. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
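Setting aside the data cloning and spatio-temporal smoothing machinery, the zero-inflated Poisson likelihood itself is easy to write down and maximize directly. The sketch below does so for assumed count data with excess zeros; the expit/exp parameterization is just a convenience to keep the parameters in range.

    import numpy as np
    from scipy.optimize import minimize
    from scipy.special import gammaln, expit

    rng = np.random.default_rng(5)

    # Hypothetical monthly visit counts with structural zeros mixed in.
    counts = np.where(rng.random(300) < 0.3, 0, rng.poisson(4.0, 300))

    def neg_loglik(params):
        """Negative log-likelihood of a zero-inflated Poisson(pi, lam)."""
        pi, lam = expit(params[0]), np.exp(params[1])      # unconstrained parameterization
        log_pois = counts * np.log(lam) - lam - gammaln(counts + 1)
        ll = np.where(counts == 0,
                      np.log(pi + (1 - pi) * np.exp(-lam)),
                      np.log(1 - pi) + log_pois)
        return -ll.sum()

    fit = minimize(neg_loglik, x0=[0.0, 0.0], method="Nelder-Mead")
    print("zero-inflation prob:", expit(fit.x[0]), " Poisson mean:", np.exp(fit.x[1]))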
Statistical Inference and Patterns of Inequality in the Global North
ERIC Educational Resources Information Center
Moran, Timothy Patrick
2006-01-01
Cross-national inequality trends have historically been a crucial field of inquiry across the social sciences, and new methodological techniques of statistical inference have recently improved the ability to analyze these trends over time. This paper applies Monte Carlo, bootstrap inference methods to the income surveys of the Luxembourg Income…
Statistical inference and Aristotle's Rhetoric.
Macdonald, Ranald R
2004-11-01
Formal logic operates in a closed system where all the information relevant to any conclusion is present, whereas this is not the case when one reasons about events and states of the world. Pollard and Richardson drew attention to the fact that the reasoning behind statistical tests does not lead to logically justifiable conclusions. In this paper statistical inferences are defended not by logic but by the standards of everyday reasoning. Aristotle invented formal logic, but argued that people mostly get at the truth with the aid of enthymemes--incomplete syllogisms which include arguing from examples, analogies and signs. It is proposed that statistical tests work in the same way--in that they are based on examples, invoke the analogy of a model and use the size of the effect under test as a sign that the chance hypothesis is unlikely. Of existing theories of statistical inference only a weak version of Fisher's takes this into account. Aristotle anticipated Fisher by producing an argument of the form that there were too many cases in which an outcome went in a particular direction for that direction to be plausibly attributed to chance. We can therefore conclude that Aristotle would have approved of statistical inference and there is a good reason for calling this form of statistical inference classical.
Barreto, Rafael E; Narváez, Javier; Sepúlveda, Natalia A; Velásquez, Fabián C; Díaz, Sandra C; López, Myriam Consuelo; Reyes, Patricia; Moncada, Ligia I
2017-09-01
Public health programs for the control of soil-transmitted helminthiases require valid diagnostic tests for surveillance and parasitic control evaluation. However, there is currently no agreement about what test should be used as a gold standard for the diagnosis of hookworm infection. Still, in the presence of concurrent data for multiple tests it is possible to use statistical models to estimate measures of test performance and prevalence. The aim of this study was to estimate the diagnostic accuracy of five parallel tests (direct microscopic examination, Kato-Katz, Harada-Mori, modified Ritchie-Frick, and culture in agar plate) to detect hookworm infections in a sample of school-aged children from a rural area in Colombia. We used both a frequentist approach and Bayesian latent class models to estimate the sensitivity and specificity of five tests for hookworm detection, and to estimate the prevalence of hookworm infection in the absence of a gold standard. The Kato-Katz and agar plate methods had an overall agreement of 95% and a kappa coefficient of 0.76. Different models estimated a sensitivity between 76% and 92% for the agar plate technique, and 52% to 87% for the Kato-Katz technique. The other tests had lower sensitivity. All tests had specificity between 95% and 98%. The prevalence estimated by the Kato-Katz and agar plate methods for different subpopulations varied between 10% and 14%, and was consistent with the prevalence estimated from the combination of all tests. The Harada-Mori, Ritchie-Frick and direct examination techniques resulted in lower and disparate prevalence estimates. Bayesian approaches assuming imperfect specificity resulted in lower prevalence estimates than the frequentist approach. Copyright © 2017 Elsevier B.V. All rights reserved.
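The agreement statistics reported for the two leading tests can be reproduced mechanically from a 2x2 cross-classification. The counts below are invented to give figures close to the ones quoted (95% agreement, kappa around 0.76) and are not the study's data.

    import numpy as np

    # Hypothetical 2x2 cross-classification of Kato-Katz (rows) vs agar plate (columns):
    #                  agar +   agar -
    table = np.array([[  22,       6],      # Kato-Katz +
                      [   9,     263]])     # Kato-Katz -

    n = table.sum()
    observed = np.trace(table) / n                         # overall agreement
    expected = (table.sum(1) @ table.sum(0)) / n**2        # agreement expected by chance
    kappa = (observed - expected) / (1 - expected)
    print(f"agreement = {observed:.2f}, Cohen's kappa = {kappa:.2f}")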
CADDIS Volume 4. Data Analysis: Biological and Environmental Data Requirements
Overview of PECBO Module, using scripts to infer environmental conditions from biological observations, statistically estimating species-environment relationships, methods for inferring environmental conditions, statistical scripts in module.
Statistical methods for the beta-binomial model in teratology.
Yamamoto, E; Yanagimoto, T
1994-01-01
The beta-binomial model is widely used for analyzing teratological data involving littermates. Recent developments in statistical analyses of teratological data are briefly reviewed with emphasis on the model. For statistical inference of the parameters in the beta-binomial distribution, separation of the likelihood introduces a likelihood inference. This reduces the biases of estimators and improves the accuracy of empirical significance levels of tests. Separate inference of the parameters can be conducted in a unified way. PMID:8187716
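As a minimal illustration of the model under discussion, the sketch below fits a beta-binomial to hypothetical litter data by maximum likelihood; the litter sizes and affected counts are assumed, and this plain MLE is not the separated-likelihood inference the review describes.

    import numpy as np
    from scipy import stats
    from scipy.optimize import minimize

    # Hypothetical litters: size n_i and number of affected pups y_i.
    n = np.array([10, 12, 8, 11, 9, 13, 10, 12, 7, 11])
    y = np.array([ 2,  5, 0,  3, 1,  6,  2,  4, 0,  3])

    def neg_loglik(params):
        """Negative log-likelihood of the beta-binomial with parameters (a, b)."""
        a, b = np.exp(params)                  # keep both parameters positive
        return -stats.betabinom.logpmf(y, n, a, b).sum()

    fit = minimize(neg_loglik, x0=[0.0, 0.0], method="Nelder-Mead")
    a, b = np.exp(fit.x)
    print(f"mean incidence = {a / (a + b):.3f}, overdispersion rho = {1 / (a + b + 1):.3f}")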
On Some Assumptions of the Null Hypothesis Statistical Testing
ERIC Educational Resources Information Center
Patriota, Alexandre Galvão
2017-01-01
Bayesian and classical statistical approaches are based on different types of logical principles. In order to avoid mistaken inferences and misguided interpretations, the practitioner must respect the inference rules embedded into each statistical method. Ignoring these principles leads to the paradoxical conclusions that the hypothesis…
Direct evidence for a dual process model of deductive inference.
Markovits, Henry; Brunet, Marie-Laurence; Thompson, Valerie; Brisson, Janie
2013-07-01
In 2 experiments, we tested a strong version of a dual process theory of conditional inference (cf. Verschueren et al., 2005a, 2005b) that assumes that most reasoners have 2 strategies available, the choice of which is determined by situational variables, cognitive capacity, and metacognitive control. The statistical strategy evaluates inferences probabilistically, accepting those with high conditional probability. The counterexample strategy rejects inferences when a counterexample shows the inference to be invalid. To discriminate strategy use, we presented reasoners with conditional statements (if p, then q) and explicit statistical information about the relative frequency of the probability of p/q (50% vs. 90%). A statistical strategy would accept the more probable inferences more frequently, whereas the counterexample one would reject both. In Experiment 1, reasoners under time pressure used the statistical strategy more, but switched to the counterexample strategy when time constraints were removed; the former took less time than the latter. These data are consistent with the hypothesis that the statistical strategy is the default heuristic. Under a free-time condition, reasoners preferred the counterexample strategy and kept it when put under time pressure. Thus, it is not simply a lack of capacity that produces a statistical strategy; instead, it seems that time pressure disrupts the ability to make good metacognitive choices. In line with this conclusion, in a 2nd experiment, we measured reasoners' confidence in their performance; those under time pressure were less confident in the statistical than the counterexample strategy and more likely to switch strategies under free-time conditions. PsycINFO Database Record (c) 2013 APA, all rights reserved.
Onisko, Agnieszka; Druzdzel, Marek J; Austin, R Marshall
2016-01-01
Classical statistics is a well-established approach in the analysis of medical data. While the medical community seems to be familiar with the concept of a statistical analysis and its interpretation, the Bayesian approach, argued by many of its proponents to be superior to the classical frequentist approach, is still not well-recognized in the analysis of medical data. The goal of this study is to encourage data analysts to use the Bayesian approach, such as modeling with graphical probabilistic networks, as an insightful alternative to classical statistical analysis of medical data. This paper offers a comparison of two approaches to analysis of medical time series data: (1) classical statistical approach, such as the Kaplan-Meier estimator and the Cox proportional hazards regression model, and (2) dynamic Bayesian network modeling. Our comparison is based on time series cervical cancer screening data collected at Magee-Womens Hospital, University of Pittsburgh Medical Center over 10 years. The main outcomes of our comparison are cervical cancer risk assessments produced by the three approaches. However, our analysis discusses also several aspects of the comparison, such as modeling assumptions, model building, dealing with incomplete data, individualized risk assessment, results interpretation, and model validation. Our study shows that the Bayesian approach is (1) much more flexible in terms of modeling effort, and (2) it offers an individualized risk assessment, which is more cumbersome for classical statistical approaches.
DAMPE squib? Significance of the 1.4 TeV DAMPE excess
NASA Astrophysics Data System (ADS)
Fowlie, Andrew
2018-05-01
We present a Bayesian and frequentist analysis of the DAMPE charged cosmic ray spectrum. The spectrum, by eye, contained a spectral break at about 1 TeV and a monochromatic excess at about 1.4 TeV. The break was supported by a Bayes factor of about 10^10 and we argue that the statistical significance was resounding. We investigated whether we should attribute the excess to dark matter annihilation into electrons in a nearby subhalo. We found a local significance of about 3.6σ and a global significance of about 2.3σ, including a two-dimensional look-elsewhere effect by simulating 1000 pseudo-experiments. The Bayes factor was sensitive to our choices of priors, but favoured the excess by about 2 for our choices. Thus, whilst intriguing, the evidence for a signal is not currently compelling.
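The global-significance calculation described here, a look-elsewhere correction from pseudo-experiments, can be sketched as follows: generate background-only toys, record the largest local significance found anywhere in each toy, and take the fraction exceeding the observed local value. The binning, expected counts, and Gaussian approximation below are assumptions for illustration, and this search is one-dimensional rather than the paper's two-dimensional scan.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(8)

    # Suppose the observed local significance of the sharpest excess is 3.6 sigma.
    z_local = 3.6
    n_bins, n_toys = 50, 1000        # energy bins scanned; pseudo-experiments (assumed numbers)

    # Background-only toys: per-bin Poisson fluctuations converted to local z-scores.
    expected = np.full(n_bins, 100.0)
    toys = rng.poisson(expected, size=(n_toys, n_bins))
    z_toys = (toys - expected) / np.sqrt(expected)     # Gaussian approximation to the Poisson
    max_z = z_toys.max(axis=1)                         # best excess anywhere in each toy

    p_global = (np.sum(max_z >= z_local) + 1) / (n_toys + 1)   # add-one avoids p = 0
    print(f"global p-value ~ {p_global:.3f} (~{stats.norm.isf(p_global):.1f} sigma)")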
NASA Astrophysics Data System (ADS)
Abbott, B. P.; Abbott, R.; Abbott, T. D.; Acernese, F.; Ackley, K.; Adams, C.; Adams, T.; Addesso, P.; Adhikari, R. X.; Adya, V. B.; Affeldt, C.; Afrough, M.; Agarwal, B.; Agatsuma, K.; Aggarwal, N.; Aguiar, O. D.; Aiello, L.; Ain, A.; Ajith, P.; Allen, B.; Allen, G.; Allocca, A.; Almoubayyed, H.; Altin, P. A.; Amato, A.; Ananyeva, A.; Anderson, S. B.; Anderson, W. G.; Antier, S.; Appert, S.; Arai, K.; Araya, M. C.; Areeda, J. S.; Arnaud, N.; Arun, K. G.; Ascenzi, S.; Ashton, G.; Ast, M.; Aston, S. M.; Astone, P.; Aufmuth, P.; Aulbert, C.; AultONeal, K.; Avila-Alvarez, A.; Babak, S.; Bacon, P.; Bader, M. K. M.; Bae, S.; Baker, P. T.; Baldaccini, F.; Ballardin, G.; Ballmer, S. W.; Banagiri, S.; Barayoga, J. C.; Barclay, S. E.; Barish, B. C.; Barker, D.; Barone, F.; Barr, B.; Barsotti, L.; Barsuglia, M.; Barta, D.; Bartlett, J.; Bartos, I.; Bassiri, R.; Basti, A.; Batch, J. C.; Baune, C.; Bawaj, M.; Bazzan, M.; Bécsy, B.; Beer, C.; Bejger, M.; Belahcene, I.; Bell, A. S.; Berger, B. K.; Bergmann, G.; Berry, C. P. L.; Bersanetti, D.; Bertolini, A.; Etienne, Z. B.; Betzwieser, J.; Bhagwat, S.; Bhandare, R.; Bilenko, I. A.; Billingsley, G.; Billman, C. R.; Birch, J.; Birney, R.; Birnholtz, O.; Biscans, S.; Bisht, A.; Bitossi, M.; Biwer, C.; Bizouard, M. A.; Blackburn, J. K.; Blackman, J.; Blair, C. D.; Blair, D. G.; Blair, R. M.; Bloemen, S.; Bock, O.; Bode, N.; Boer, M.; Bogaert, G.; Bohe, A.; Bondu, F.; Bonnand, R.; Boom, B. A.; Bork, R.; Boschi, V.; Bose, S.; Bouffanais, Y.; Bozzi, A.; Bradaschia, C.; Brady, P. R.; Braginsky, V. B.; Branchesi, M.; Brau, J. E.; Briant, T.; Brillet, A.; Brinkmann, M.; Brisson, V.; Brockill, P.; Broida, J. E.; Brooks, A. F.; Brown, D. A.; Brown, D. D.; Brown, N. M.; Brunett, S.; Buchanan, C. C.; Buikema, A.; Bulik, T.; Bulten, H. J.; Buonanno, A.; Buskulic, D.; Buy, C.; Byer, R. L.; Cabero, M.; Cadonati, L.; Cagnoli, G.; Cahillane, C.; Calderón Bustillo, J.; Callister, T. A.; Calloni, E.; Camp, J. B.; Canepa, M.; Canizares, P.; Cannon, K. C.; Cao, H.; Cao, J.; Capano, C. D.; Capocasa, E.; Carbognani, F.; Caride, S.; Carney, M. F.; Casanueva Diaz, J.; Casentini, C.; Caudill, S.; Cavaglià, M.; Cavalier, F.; Cavalieri, R.; Cella, G.; Cepeda, C. B.; Cerboni Baiardi, L.; Cerretani, G.; Cesarini, E.; Chamberlin, S. J.; Chan, M.; Chao, S.; Charlton, P.; Chassande-Mottin, E.; Chatterjee, D.; Cheeseboro, B. D.; Chen, H. Y.; Chen, Y.; Cheng, H.-P.; Chincarini, A.; Chiummo, A.; Chmiel, T.; Cho, H. S.; Cho, M.; Chow, J. H.; Christensen, N.; Chu, Q.; Chua, A. J. K.; Chua, S.; Chung, A. K. W.; Chung, S.; Ciani, G.; Ciolfi, R.; Cirelli, C. E.; Cirone, A.; Clara, F.; Clark, J. A.; Cleva, F.; Cocchieri, C.; Coccia, E.; Cohadon, P.-F.; Colla, A.; Collette, C. G.; Cominsky, L. R.; Constancio, M.; Conti, L.; Cooper, S. J.; Corban, P.; Corbitt, T. R.; Corley, K. R.; Cornish, N.; Corsi, A.; Cortese, S.; Costa, C. A.; Coughlin, M. W.; Coughlin, S. B.; Coulon, J.-P.; Countryman, S. T.; Couvares, P.; Covas, P. B.; Cowan, E. E.; Coward, D. M.; Cowart, M. J.; Coyne, D. C.; Coyne, R.; Creighton, J. D. E.; Creighton, T. D.; Cripe, J.; Crowder, S. G.; Cullen, T. J.; Cumming, A.; Cunningham, L.; Cuoco, E.; Dal Canton, T.; Danilishin, S. L.; D'Antonio, S.; Danzmann, K.; Dasgupta, A.; Da Silva Costa, C. F.; Dattilo, V.; Dave, I.; Davier, M.; Davies, G. S.; Davis, D.; Daw, E. J.; Day, B.; De, S.; DeBra, D.; Deelman, E.; Degallaix, J.; De Laurentis, M.; Deléglise, S.; Del Pozzo, W.; Denker, T.; Dent, T.; Dergachev, V.; De Rosa, R.; DeRosa, R. T.; DeSalvo, R.; Devenson, J.; Devine, R. 
C.; Dhurandhar, S.; Díaz, M. C.; Di Fiore, L.; Di Giovanni, M.; Di Girolamo, T.; Di Lieto, A.; Di Pace, S.; Di Palma, I.; Di Renzo, F.; Doctor, Z.; Dolique, V.; Donovan, F.; Dooley, K. L.; Doravari, S.; Dorrington, I.; Douglas, R.; Dovale Álvarez, M.; Downes, T. P.; Drago, M.; Drever, R. W. P.; Driggers, J. C.; Du, Z.; Ducrot, M.; Duncan, J.; Dwyer, S. E.; Edo, T. B.; Edwards, M. C.; Effler, A.; Eggenstein, H.-B.; Ehrens, P.; Eichholz, J.; Eikenberry, S. S.; Essick, R. C.; Etzel, T.; Evans, M.; Evans, T. M.; Factourovich, M.; Fafone, V.; Fair, H.; Fairhurst, S.; Fan, X.; Farinon, S.; Farr, B.; Farr, W. M.; Fauchon-Jones, E. J.; Favata, M.; Fays, M.; Fehrmann, H.; Feicht, J.; Fejer, M. M.; Fernandez-Galiana, A.; Ferrante, I.; Ferreira, E. C.; Ferrini, F.; Fidecaro, F.; Fiori, I.; Fiorucci, D.; Fisher, R. P.; Flaminio, R.; Fletcher, M.; Fong, H.; Forsyth, P. W. F.; Forsyth, S. S.; Fournier, J.-D.; Frasca, S.; Frasconi, F.; Frei, Z.; Freise, A.; Frey, R.; Frey, V.; Fries, E. M.; Fritschel, P.; Frolov, V. V.; Fulda, P.; Fyffe, M.; Gabbard, H.; Gabel, M.; Gadre, B. U.; Gaebel, S. M.; Gair, J. R.; Gammaitoni, L.; Ganija, M. R.; Gaonkar, S. G.; Garufi, F.; Gaudio, S.; Gaur, G.; Gayathri, V.; Gehrels, N.; Gemme, G.; Genin, E.; Gennai, A.; George, D.; George, J.; Gergely, L.; Germain, V.; Ghonge, S.; Ghosh, Abhirup; Ghosh, Archisman; Ghosh, S.; Giaime, J. A.; Giardina, K. D.; Giazotto, A.; Gill, K.; Glover, L.; Goetz, E.; Goetz, R.; Gomes, S.; González, G.; Gonzalez Castro, J. M.; Gopakumar, A.; Gorodetsky, M. L.; Gossan, S. E.; Gosselin, M.; Gouaty, R.; Grado, A.; Graef, C.; Granata, M.; Grant, A.; Gras, S.; Gray, C.; Greco, G.; Green, A. C.; Groot, P.; Grote, H.; Grunewald, S.; Gruning, P.; Guidi, G. M.; Guo, X.; Gupta, A.; Gupta, M. K.; Gushwa, K. E.; Gustafson, E. K.; Gustafson, R.; Hall, B. R.; Hall, E. D.; Hammond, G.; Haney, M.; Hanke, M. M.; Hanks, J.; Hanna, C.; Hannuksela, O. A.; Hanson, J.; Hardwick, T.; Harms, J.; Harry, G. M.; Harry, I. W.; Hart, M. J.; Haster, C.-J.; Haughian, K.; Healy, J.; Heidmann, A.; Heintze, M. C.; Heitmann, H.; Hello, P.; Hemming, G.; Hendry, M.; Heng, I. S.; Hennig, J.; Henry, J.; Heptonstall, A. W.; Heurs, M.; Hild, S.; Hoak, D.; Hofman, D.; Holt, K.; Holz, D. E.; Hopkins, P.; Horst, C.; Hough, J.; Houston, E. A.; Howell, E. J.; Hu, Y. M.; Huerta, E. A.; Huet, D.; Hughey, B.; Husa, S.; Huttner, S. H.; Huynh-Dinh, T.; Indik, N.; Ingram, D. R.; Inta, R.; Intini, G.; Isa, H. N.; Isac, J.-M.; Isi, M.; Iyer, B. R.; Izumi, K.; Jacqmin, T.; Jani, K.; Jaranowski, P.; Jawahar, S.; Jiménez-Forteza, F.; Johnson, W. W.; Jones, D. I.; Jones, R.; Jonker, R. J. G.; Ju, L.; Junker, J.; Kalaghatgi, C. V.; Kalogera, V.; Kandhasamy, S.; Kang, G.; Kanner, J. B.; Karki, S.; Karvinen, K. S.; Kasprzack, M.; Katolik, M.; Katsavounidis, E.; Katzman, W.; Kaufer, S.; Kawabe, K.; Kéfélian, F.; Keitel, D.; Kemball, A. J.; Kennedy, R.; Kent, C.; Key, J. S.; Khalili, F. Y.; Khan, I.; Khan, S.; Khan, Z.; Khazanov, E. A.; Kijbunchoo, N.; Kim, Chunglee; Kim, J. C.; Kim, W.; Kim, W. S.; Kim, Y.-M.; Kimbrell, S. J.; King, E. J.; King, P. J.; Kirchhoff, R.; Kissel, J. S.; Kleybolte, L.; Klimenko, S.; Koch, P.; Koehlenbeck, S. M.; Koley, S.; Kondrashov, V.; Kontos, A.; Korobko, M.; Korth, W. Z.; Kowalska, I.; Kozak, D. B.; Krämer, C.; Kringel, V.; Krishnan, B.; Królak, A.; Kuehn, G.; Kumar, P.; Kumar, R.; Kumar, S.; Kuo, L.; Kutynia, A.; Kwang, S.; Lackey, B. D.; Lai, K. H.; Landry, M.; Lang, R. N.; Lange, J.; Lantz, B.; Lanza, R. K.; Lartaux-Vollard, A.; Lasky, P. 
D.; Laxen, M.; Lazzarini, A.; Lazzaro, C.; Leaci, P.; Leavey, S.; Lee, C. H.; Lee, H. K.; Lee, H. M.; Lee, H. W.; Lee, K.; Lehmann, J.; Lenon, A.; Leonardi, M.; Leroy, N.; Letendre, N.; Levin, Y.; Li, T. G. F.; Libson, A.; Littenberg, T. B.; Liu, J.; Lockerbie, N. A.; London, L. T.; Lord, J. E.; Lorenzini, M.; Loriette, V.; Lormand, M.; Losurdo, G.; Lough, J. D.; Lovelace, G.; Lück, H.; Lumaca, D.; Lundgren, A. P.; Lynch, R.; Ma, Y.; Macfoy, S.; Machenschalk, B.; MacInnis, M.; Macleod, D. M.; Magaña Hernandez, I.; Magaña-Sandoval, F.; Magaña Zertuche, L.; Magee, R. M.; Majorana, E.; Maksimovic, I.; Man, N.; Mandic, V.; Mangano, V.; Mansell, G. L.; Manske, M.; Mantovani, M.; Marchesoni, F.; Marion, F.; Márka, S.; Márka, Z.; Markakis, C.; Markosyan, A. S.; Maros, E.; Martelli, F.; Martellini, L.; Martin, I. W.; Martynov, D. V.; Marx, J. N.; Mason, K.; Masserot, A.; Massinger, T. J.; Masso-Reid, M.; Mastrogiovanni, S.; Matas, A.; Matichard, F.; Matone, L.; Mavalvala, N.; Mayani, R.; Mazumder, N.; McCarthy, R.; McClelland, D. E.; McCormick, S.; McCuller, L.; McGuire, S. C.; McIntyre, G.; McIver, J.; McManus, D. J.; McRae, T.; McWilliams, S. T.; Meacher, D.; Meadors, G. D.; Meidam, J.; Mejuto-Villa, E.; Melatos, A.; Mendell, G.; Mercer, R. A.; Merilh, E. L.; Merzougui, M.; Meshkov, S.; Messenger, C.; Messick, C.; Metzdorff, R.; Meyers, P. M.; Mezzani, F.; Miao, H.; Michel, C.; Middleton, H.; Mikhailov, E. E.; Milano, L.; Miller, A. L.; Miller, A.; Miller, B. B.; Miller, J.; Millhouse, M.; Minazzoli, O.; Minenkov, Y.; Ming, J.; Mishra, C.; Mitra, S.; Mitrofanov, V. P.; Mitselmakher, G.; Mittleman, R.; Moggi, A.; Mohan, M.; Mohapatra, S. R. P.; Montani, M.; Moore, B. C.; Moore, C. J.; Moraru, D.; Moreno, G.; Morriss, S. R.; Mours, B.; Mow-Lowry, C. M.; Mueller, G.; Muir, A. W.; Mukherjee, Arunava; Mukherjee, D.; Mukherjee, S.; Mukund, N.; Mullavey, A.; Munch, J.; Muniz, E. A. M.; Murray, P. G.; Napier, K.; Nardecchia, I.; Naticchioni, L.; Nayak, R. K.; Nelemans, G.; Nelson, T. J. N.; Neri, M.; Nery, M.; Neunzert, A.; Newport, J. M.; Newton, G.; Ng, K. K. Y.; Nguyen, T. T.; Nichols, D.; Nielsen, A. B.; Nissanke, S.; Nitz, A.; Noack, A.; Nocera, F.; Nolting, D.; Normandin, M. E. N.; Nuttall, L. K.; Oberling, J.; Ochsner, E.; Oelker, E.; Ogin, G. H.; Oh, J. J.; Oh, S. H.; Ohme, F.; Oliver, M.; Oppermann, P.; Oram, Richard J.; O'Reilly, B.; Ormiston, R.; Ortega, L. F.; O'Shaughnessy, R.; Ottaway, D. J.; Overmier, H.; Owen, B. J.; Pace, A. E.; Page, J.; Page, M. A.; Pai, A.; Pai, S. A.; Palamos, J. R.; Palashov, O.; Palomba, C.; Pal-Singh, A.; Pan, H.; Pang, B.; Pang, P. T. H.; Pankow, C.; Pannarale, F.; Pant, B. C.; Paoletti, F.; Paoli, A.; Papa, M. A.; Paris, H. R.; Parker, W.; Pascucci, D.; Pasqualetti, A.; Passaquieti, R.; Passuello, D.; Patricelli, B.; Pearlstone, B. L.; Pedraza, M.; Pedurand, R.; Pekowsky, L.; Pele, A.; Penn, S.; Perez, C. J.; Perreca, A.; Perri, L. M.; Pfeiffer, H. P.; Phelps, M.; Piccinni, O. J.; Pichot, M.; Piergiovanni, F.; Pierro, V.; Pillant, G.; Pinard, L.; Pinto, I. M.; Pitkin, M.; Poggiani, R.; Popolizio, P.; Porter, E. K.; Post, A.; Powell, J.; Prasad, J.; Pratt, J. W. W.; Predoi, V.; Prestegard, T.; Prijatelj, M.; Principe, M.; Privitera, S.; Prix, R.; Prodi, G. A.; Prokhorov, L. G.; Puncken, O.; Punturo, M.; Puppo, P.; Pürrer, M.; Qi, H.; Qin, J.; Qiu, S.; Quetschke, V.; Quintero, E. A.; Quitzow-James, R.; Raab, F. J.; Rabeling, D. S.; Radkins, H.; Raffai, P.; Raja, S.; Rajan, C.; Rakhmanov, M.; Ramirez, K. 
E.; Rapagnani, P.; Raymond, V.; Razzano, M.; Read, J.; Regimbau, T.; Rei, L.; Reid, S.; Reitze, D. H.; Rew, H.; Reyes, S. D.; Ricci, F.; Ricker, P. M.; Rieger, S.; Riles, K.; Rizzo, M.; Robertson, N. A.; Robie, R.; Robinet, F.; Rocchi, A.; Rolland, L.; Rollins, J. G.; Roma, V. J.; Romano, R.; Romel, C. L.; Romie, J. H.; Rosińska, D.; Ross, M. P.; Rowan, S.; Rüdiger, A.; Ruggi, P.; Ryan, K.; Rynge, M.; Sachdev, S.; Sadecki, T.; Sadeghian, L.; Sakellariadou, M.; Salconi, L.; Saleem, M.; Salemi, F.; Samajdar, A.; Sammut, L.; Sampson, L. M.; Sanchez, E. J.; Sandberg, V.; Sandeen, B.; Sanders, J. R.; Sassolas, B.; Sathyaprakash, B. S.; Saulson, P. R.; Sauter, O.; Savage, R. L.; Sawadsky, A.; Schale, P.; Scheuer, J.; Schmidt, E.; Schmidt, J.; Schmidt, P.; Schnabel, R.; Schofield, R. M. S.; Schönbeck, A.; Schreiber, E.; Schuette, D.; Schulte, B. W.; Schutz, B. F.; Schwalbe, S. G.; Scott, J.; Scott, S. M.; Seidel, E.; Sellers, D.; Sengupta, A. S.; Sentenac, D.; Sequino, V.; Sergeev, A.; Shaddock, D. A.; Shaffer, T. J.; Shah, A. A.; Shahriar, M. S.; Shao, L.; Shapiro, B.; Shawhan, P.; Sheperd, A.; Shoemaker, D. H.; Shoemaker, D. M.; Siellez, K.; Siemens, X.; Sieniawska, M.; Sigg, D.; Silva, A. D.; Singer, A.; Singer, L. P.; Singh, A.; Singh, R.; Singhal, A.; Sintes, A. M.; Slagmolen, B. J. J.; Smith, B.; Smith, J. R.; Smith, R. J. E.; Son, E. J.; Sonnenberg, J. A.; Sorazu, B.; Sorrentino, F.; Souradeep, T.; Spencer, A. P.; Srivastava, A. K.; Staley, A.; Steinke, M.; Steinlechner, J.; Steinlechner, S.; Steinmeyer, D.; Stephens, B. C.; Stone, R.; Strain, K. A.; Stratta, G.; Strigin, S. E.; Sturani, R.; Stuver, A. L.; Summerscales, T. Z.; Sun, L.; Sunil, S.; Sutton, P. J.; Swinkels, B. L.; Szczepańczyk, M. J.; Tacca, M.; Talukder, D.; Tanner, D. B.; Tápai, M.; Taracchini, A.; Taylor, J. A.; Taylor, R.; Theeg, T.; Thomas, E. G.; Thomas, M.; Thomas, P.; Thorne, K. A.; Thorne, K. S.; Thrane, E.; Tiwari, S.; Tiwari, V.; Tokmakov, K. V.; Toland, K.; Tonelli, M.; Tornasi, Z.; Torrie, C. I.; Töyrä, D.; Travasso, F.; Traylor, G.; Trifirò, D.; Trinastic, J.; Tringali, M. C.; Trozzo, L.; Tsang, K. W.; Tse, M.; Tso, R.; Tuyenbayev, D.; Ueno, K.; Ugolini, D.; Unnikrishnan, C. S.; Urban, A. L.; Usman, S. A.; Vahi, K.; Vahlbruch, H.; Vajente, G.; Valdes, G.; van Bakel, N.; van Beuzekom, M.; van den Brand, J. F. J.; Van Den Broeck, C.; Vander-Hyde, D. C.; van der Schaaf, L.; van Heijningen, J. V.; van Veggel, A. A.; Vardaro, M.; Varma, V.; Vass, S.; Vasúth, M.; Vecchio, A.; Vedovato, G.; Veitch, J.; Veitch, P. J.; Venkateswara, K.; Venugopalan, G.; Verkindt, D.; Vetrano, F.; Viceré, A.; Viets, A. D.; Vinciguerra, S.; Vine, D. J.; Vinet, J.-Y.; Vitale, S.; Vo, T.; Vocca, H.; Vorvick, C.; Voss, D. V.; Vousden, W. D.; Vyatchanin, S. P.; Wade, A. R.; Wade, L. E.; Wade, M.; Walet, R.; Walker, M.; Wallace, L.; Walsh, S.; Wang, G.; Wang, H.; Wang, J. Z.; Wang, M.; Wang, Y.-F.; Wang, Y.; Ward, R. L.; Warner, J.; Was, M.; Watchi, J.; Weaver, B.; Wei, L.-W.; Weinert, M.; Weinstein, A. J.; Weiss, R.; Wen, L.; Wessel, E. K.; Weßels, P.; Westphal, T.; Wette, K.; Whelan, J. T.; Whiting, B. F.; Whittle, C.; Williams, D.; Williams, R. D.; Williamson, A. R.; Willis, J. L.; Willke, B.; Wimmer, M. H.; Winkler, W.; Wipf, C. C.; Wittel, H.; Woan, G.; Woehler, J.; Wofford, J.; Wong, K. W. K.; Worden, J.; Wright, J. L.; Wu, D. S.; Wu, G.; Yam, W.; Yamamoto, H.; Yancey, C. C.; Yap, M. 
J.; Yu, Hang; Yu, Haocun; Yvert, M.; ZadroŻny, A.; Zanolin, M.; Zelenova, T.; Zendri, J.-P.; Zevin, M.; Zhang, L.; Zhang, M.; Zhang, T.; Zhang, Y.-H.; Zhao, C.; Zhou, M.; Zhou, Z.; Zhu, X. J.; Zucker, M. E.; Zweizig, J.; Suvorova, S.; Moran, W.; Evans, R. J.; LIGO Scientific Collaboration; Virgo Collaboration
2017-06-01
Results are presented from a semicoherent search for continuous gravitational waves from the brightest low-mass X-ray binary, Scorpius X-1, using data collected during the first Advanced LIGO observing run. The search combines a frequency domain matched filter (Bessel-weighted F-statistic) with a hidden Markov model to track wandering of the neutron star spin frequency. No evidence of gravitational waves is found in the frequency range 60-650 Hz. Frequentist 95% confidence strain upper limits, h₀(95%) = 4.0 × 10⁻²⁵, 8.3 × 10⁻²⁵, and 3.0 × 10⁻²⁵ for electromagnetically restricted source orientation, unknown polarization, and circular polarization, respectively, are reported at 106 Hz. They are ≤10 times higher than the theoretical torque-balance limit at 106 Hz.
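The tracking step described above lends itself to a compact illustration. The sketch below is not the LIGO/Virgo pipeline; it assumes a made-up spectrogram of detection-statistic values and uses a Viterbi pass, with the score array standing in for the Bessel-weighted F-statistic, to follow a slowly wandering frequency bin.

```python
# Illustrative sketch only (not the LIGO/Virgo pipeline): a Viterbi pass over a
# spectrogram of detection statistics, tracking a hidden frequency that may
# drift by at most one bin between segments.  The random "scores" stand in for
# the Bessel-weighted F-statistic values computed per segment and frequency bin.
import numpy as np

def viterbi_track(scores, max_jump=1):
    """scores: (n_segments, n_bins) array; returns the best frequency-bin path."""
    n_t, n_f = scores.shape
    path_score = scores[0].copy()
    back = np.zeros((n_t, n_f), dtype=int)
    for t in range(1, n_t):
        new_score = np.empty(n_f)
        for f in range(n_f):
            lo, hi = max(0, f - max_jump), min(n_f, f + max_jump + 1)
            best = lo + int(np.argmax(path_score[lo:hi]))
            back[t, f] = best
            new_score[f] = path_score[best] + scores[t, f]
        path_score = new_score
    path = np.empty(n_t, dtype=int)
    path[-1] = int(np.argmax(path_score))
    for t in range(n_t - 1, 0, -1):      # backtrack the most probable path
        path[t - 1] = back[t, path[t]]
    return path

rng = np.random.default_rng(0)
truth = 50 + np.cumsum(rng.integers(-1, 2, size=40))   # wandering true bin
spect = rng.normal(size=(40, 100))
spect[np.arange(40), truth] += 4.0                      # inject a weak signal
path = viterbi_track(spect)
print("fraction of segments tracked correctly:", np.mean(path == truth))
```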
Harrison, Jay M; Howard, Delia; Malven, Marianne; Halls, Steven C; Culler, Angela H; Harrigan, George G; Wolfinger, Russell D
2013-07-03
Compositional studies on genetically modified (GM) and non-GM crops have consistently demonstrated that their respective levels of key nutrients and antinutrients are remarkably similar and that other factors such as germplasm and environment contribute more to compositional variability than transgenic breeding. We propose that graphical and statistical approaches that can provide meaningful evaluations of the relative impact of different factors to compositional variability may offer advantages over traditional frequentist testing. A case study on the novel application of principal variance component analysis (PVCA) in a compositional assessment of herbicide-tolerant GM cotton is presented. Results of the traditional analysis of variance approach confirmed the compositional equivalence of the GM and non-GM cotton. The multivariate approach of PVCA provided further information on the impact of location and germplasm on compositional variability relative to GM.
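As a rough illustration of the idea behind PVCA, and only the idea (the published method fits mixed models with REML variance components), the sketch below runs a PCA on a composition matrix and attributes each leading component's variance to design factors via a one-way eta-squared. All data and factor labels are invented.

```python
# Rough sketch of the idea behind principal variance component analysis (PVCA),
# not the published method (which fits REML variance components per factor):
# run PCA, then attribute each leading component's variance to design factors
# with a one-way eta-squared, weighted by the component's variance fraction.
import numpy as np

def eta_squared(scores, labels):
    grand = scores.mean()
    ss_total = ((scores - grand) ** 2).sum()
    ss_between = sum(len(scores[labels == lev]) * (scores[labels == lev].mean() - grand) ** 2
                     for lev in np.unique(labels))
    return ss_between / ss_total

def pvca_sketch(X, factors, n_components=3):
    """X: (samples, analytes) composition matrix; factors: name -> label array."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var_frac = s**2 / (s**2).sum()
    result = {}
    for name, labels in factors.items():
        shares = [eta_squared(U[:, k] * s[k], np.asarray(labels))
                  for k in range(n_components)]
        result[name] = float(np.dot(shares, var_frac[:n_components]))
    return result

rng = np.random.default_rng(1)
location = np.repeat(["A", "B", "C"], 20)               # invented design factor
X = rng.normal(size=(60, 8)) + (location == "B")[:, None] * 1.5
print(pvca_sketch(X, {"location": location}))
```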
ERIC Educational Resources Information Center
Thompson, Bruce
Web-based statistical instruction, like all statistical instruction, ought to focus on teaching the essence of the research endeavor: the exercise of reflective judgment. Using the framework of the recent report of the American Psychological Association (APA) Task Force on Statistical Inference (Wilkinson and the APA Task Force on Statistical…
Data Analysis Techniques for Physical Scientists
NASA Astrophysics Data System (ADS)
Pruneau, Claude A.
2017-10-01
Preface; How to read this book; 1. The scientific method; Part I. Foundation in Probability and Statistics: 2. Probability; 3. Probability models; 4. Classical inference I: estimators; 5. Classical inference II: optimization; 6. Classical inference III: confidence intervals and statistical tests; 7. Bayesian inference; Part II. Measurement Techniques: 8. Basic measurements; 9. Event reconstruction; 10. Correlation functions; 11. The multiple facets of correlation functions; 12. Data correction methods; Part III. Simulation Techniques: 13. Monte Carlo methods; 14. Collision and detector modeling; List of references; Index.
Overview of PECBO Module, using scripts to infer environmental conditions from biological observations, statistically estimating species-environment relationships, methods for inferring environmental conditions, statistical scripts in module.
NASA Astrophysics Data System (ADS)
Suess, Daniel; Rudnicki, Łukasz; Maciel, Thiago O.; Gross, David
2017-09-01
The outcomes of quantum mechanical measurements are inherently random. It is therefore necessary to develop stringent methods for quantifying the degree of statistical uncertainty about the results of quantum experiments. For the particularly relevant task of quantum state tomography, it has been shown that a significant reduction in uncertainty can be achieved by taking the positivity of quantum states into account. However—the large number of partial results and heuristics notwithstanding—no efficient general algorithm is known that produces an optimal uncertainty region from experimental data, while making use of the prior constraint of positivity. Here, we provide a precise formulation of this problem and show that the general case is NP-hard. Our result leaves room for the existence of efficient approximate solutions, and therefore does not in itself imply that the practical task of quantum uncertainty quantification is intractable. However, it does show that there exists a non-trivial trade-off between optimality and computational efficiency for error regions. We prove two versions of the result: one for frequentist and one for Bayesian statistics.
Statistical Inferences from Formaldehyde DNA-Protein Cross-Link Data
Physiologically-based pharmacokinetic (PBPK) modeling has reached considerable sophistication in its application in the pharmacological and environmental health areas. Yet, mature methodologies for making statistical inferences have not been routinely incorporated in these applic...
Statistics for nuclear engineers and scientists. Part 1. Basic statistical inference
DOE Office of Scientific and Technical Information (OSTI.GOV)
Beggs, W.J.
1981-02-01
This report is intended for the use of engineers and scientists working in the nuclear industry, especially at the Bettis Atomic Power Laboratory. It serves as the basis for several Bettis in-house statistics courses. The objectives of the report are to introduce the reader to the language and concepts of statistics and to provide a basic set of techniques to apply to problems of the collection and analysis of data. Part 1 covers subjects of basic inference. The subjects include: descriptive statistics; probability; simple inference for normally distributed populations, and for non-normal populations as well; comparison of two populations; the analysis of variance; quality control procedures; and linear regression analysis.
Introducing Statistical Inference to Biology Students through Bootstrapping and Randomization
ERIC Educational Resources Information Center
Lock, Robin H.; Lock, Patti Frazer
2008-01-01
Bootstrap methods and randomization tests are increasingly being used as alternatives to standard statistical procedures in biology. They also serve as an effective introduction to the key ideas of statistical inference in introductory courses for biology students. We discuss the use of such simulation based procedures in an integrated curriculum…
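A minimal sketch of the two simulation-based procedures the article advocates, using made-up measurements for two groups: a bootstrap percentile confidence interval for a difference in means and a randomization (permutation) test of the same difference.

```python
# Minimal sketch of the two simulation-based procedures, with invented data:
# a bootstrap percentile interval and a randomization test for a mean difference.
import numpy as np

rng = np.random.default_rng(42)
control = rng.normal(10.0, 2.0, size=25)
treated = rng.normal(11.5, 2.0, size=25)
observed = treated.mean() - control.mean()

# Bootstrap: resample each group with replacement for a 95% percentile CI.
boot = np.array([rng.choice(treated, treated.size).mean()
                 - rng.choice(control, control.size).mean()
                 for _ in range(10_000)])
ci = np.percentile(boot, [2.5, 97.5])

# Randomization test: shuffle group labels under the null of no difference.
pooled = np.concatenate([control, treated])
perm = []
for _ in range(10_000):
    shuffled = rng.permutation(pooled)
    perm.append(shuffled[control.size:].mean() - shuffled[:control.size].mean())
p_value = np.mean(np.abs(perm) >= abs(observed))

print(f"difference = {observed:.2f}, bootstrap 95% CI = {ci.round(2)}, permutation p = {p_value:.4f}")
```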
Developing Young Children's Emergent Inferential Practices in Statistics
ERIC Educational Resources Information Center
Makar, Katie
2016-01-01
Informal statistical inference has now been researched at all levels of schooling and initial tertiary study. Work in informal statistical inference is least understood in the early years, where children have had little if any exposure to data handling. A qualitative study in Australia was carried out through a series of teaching experiments with…
Quantifying the uncertainty in heritability.
Furlotte, Nicholas A; Heckerman, David; Lippert, Christoph
2014-05-01
The use of mixed models to determine narrow-sense heritability and related quantities such as SNP heritability has received much recent attention. Less attention has been paid to the inherent variability in these estimates. One approach for quantifying variability in estimates of heritability is a frequentist approach, in which heritability is estimated using maximum likelihood and its variance is quantified through an asymptotic normal approximation. An alternative approach is to quantify the uncertainty in heritability through its Bayesian posterior distribution. In this paper, we develop the latter approach, make it computationally efficient and compare it to the frequentist approach. We show theoretically that, for a sufficiently large sample size and intermediate values of heritability, the two approaches provide similar results. Using the Atherosclerosis Risk in Communities cohort, we show empirically that the two approaches can give different results and that the variance/uncertainty can remain large.
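A toy sketch of the contrast the paper draws, assuming the simple variance-components model y ~ N(0, sigma^2 [h^2 K + (1 - h^2) I]) with a synthetic relatedness matrix: the maximum-likelihood point estimate of heritability versus its posterior under a flat prior, both evaluated on a grid. The paper's contribution is a far more efficient exact computation than this grid.

```python
# Toy version of the comparison, assuming y ~ N(0, sigma^2 [h2*K + (1 - h2)*I])
# with a synthetic relatedness matrix K.  The scale sigma^2 is profiled out and
# h2 is evaluated on a grid, giving both the ML estimate and the posterior under
# a flat prior.  The paper's exact method is much more efficient than this grid.
import numpy as np

def profile_loglik(h2, y, K):
    n = len(y)
    V = h2 * K + (1.0 - h2) * np.eye(n)
    _, logdet = np.linalg.slogdet(V)
    quad = y @ np.linalg.solve(V, y)
    sigma2_hat = quad / n                       # profile MLE of the scale
    return -0.5 * (n * np.log(sigma2_hat) + logdet + n)

rng = np.random.default_rng(3)
n, h2_true = 200, 0.6
K = np.corrcoef(rng.normal(size=(n, 500)))      # stand-in relatedness matrix
L = np.linalg.cholesky(h2_true * K + (1 - h2_true) * np.eye(n))
y = L @ rng.normal(size=n)

grid = np.linspace(0.01, 0.99, 99)
ll = np.array([profile_loglik(h, y, K) for h in grid])
w = grid[1] - grid[0]
post = np.exp(ll - ll.max())
post /= post.sum() * w                          # normalize the flat-prior posterior
print("ML estimate of h2:", grid[ll.argmax()])
print("posterior mean:", (grid * post).sum() * w)
```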
Storm, Lance; Tressoldi, Patrizio E; Utts, Jessica
2013-01-01
Rouder, Morey, and Province (2013) stated that (a) the evidence-based case for psi in Storm, Tressoldi, and Di Risio's (2010) meta-analysis is supported only by a number of studies that used manual randomization, and (b) when these studies are excluded so that only investigations using automatic randomization are evaluated (and some additional studies previously omitted by Storm et al., 2010, are included), the evidence for psi is "unpersuasive." Rouder et al. used a Bayesian approach, and we adopted the same methodology, finding that our case is upheld. Because of recent updates and corrections, we reassessed the free-response databases of Storm et al. using a frequentist approach. We discuss and critique the assumptions and findings of Rouder et al. (PsycINFO Database Record (c) 2013 APA, all rights reserved).
Comparison of two drug safety signals in a pharmacovigilance data mining framework.
Tubert-Bitter, Pascale; Bégaud, Bernard; Ahmed, Ismaïl
2016-04-01
Since adverse drug reactions are a major public health concern, early detection of drug safety signals has become a top priority for regulatory agencies and the pharmaceutical industry. Quantitative methods for analyzing spontaneous reporting material recorded in pharmacovigilance databases through data mining have been proposed in the last decades and are increasingly used to flag potential safety problems. While automated data mining is motivated by the usually huge size of pharmacovigilance databases, it does not systematically produce relevant alerts. Moreover, each detected signal requires appropriate assessment that may involve investigation of the whole therapeutic class. The goal of this article is to provide a methodology for comparing two detected signals. It is nested within the automated surveillance framework as (1) no extra information is required and (2) no simple inference on the actual risks can be extrapolated from spontaneous reporting data. We designed our methodology on the basis of two classical methods used for automated signal detection: the Bayesian Gamma Poisson Shrinker and the frequentist Proportional Reporting Ratio. A simulation study was conducted to assess the performances of both proposed methods. The latter were used to compare cardiovascular signals for two HIV treatments from the French pharmacovigilance database. © The Author(s) 2012.
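For concreteness, the snippet below computes the frequentist disproportionality statistic named in the abstract, the Proportional Reporting Ratio, from an invented 2x2 table of spontaneous reports; the Bayesian Gamma Poisson Shrinker shrinks a similar ratio toward 1 and is not reproduced here.

```python
# Proportional Reporting Ratio (PRR) from an invented 2x2 table of reports.
# a: target drug & target event, b: target drug & other events,
# c: other drugs & target event, d: other drugs & other events.
# The Gamma Poisson Shrinker shrinks a similar ratio toward 1 (not shown).
import math

def prr(a, b, c, d):
    ratio = (a / (a + b)) / (c / (c + d))
    se_log = math.sqrt(1/a - 1/(a + b) + 1/c - 1/(c + d))
    lo, hi = (math.exp(math.log(ratio) + z * se_log) for z in (-1.96, 1.96))
    return ratio, (lo, hi)

print(prr(a=40, b=1960, c=600, d=97400))   # made-up counts
```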
Accounting for correlation in network meta-analysis with multi-arm trials.
Franchini, A J; Dias, S; Ades, A E; Jansen, J P; Welton, N J
2012-06-01
Multi-arm trials (trials with more than two arms) are particularly valuable forms of evidence for network meta-analysis (NMA). Trial results are available either as arm-level summaries, where effect measures are reported for each arm, or as contrast-level summaries, where differences in effect between arms are reported relative to the control arm chosen for the trial. We show that likelihood-based inference in both contrast-level and arm-level formats is identical if there are only two-arm trials, but that if there are multi-arm trials, results from the contrast-level format will be incorrect unless correlations are accounted for in the likelihood. We review Bayesian and frequentist software for NMA with multi-arm trials that can account for this correlation and give an illustrative example of the difference in estimates that can be introduced if the correlations are not incorporated. We discuss methods of imputing correlations when they cannot be derived from the reported results and urge trialists to report the standard error for the control arm even if only contrast-level summaries are reported. Copyright © 2012 John Wiley & Sons, Ltd.
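A small numerical sketch of the correlation at issue: in a three-arm trial summarized as two treatment-versus-control contrasts, both contrasts share the control arm's sampling error, so their covariance equals the control-arm variance. The arm-level means and standard errors below are invented.

```python
# Invented arm-level summaries for a three-arm trial (A is the common control).
# Both contrasts share arm A, so Cov(d_AB, d_AC) = Var(arm A); ignoring that
# off-diagonal term misstates the precision of any combined estimate.
import numpy as np

arm_mean = {"A": 0.00, "B": -0.35, "C": -0.50}
arm_se   = {"A": 0.10, "B": 0.12, "C": 0.11}

d = np.array([arm_mean["B"] - arm_mean["A"], arm_mean["C"] - arm_mean["A"]])
V = np.array([[arm_se["B"]**2 + arm_se["A"]**2, arm_se["A"]**2],
              [arm_se["A"]**2, arm_se["C"]**2 + arm_se["A"]**2]])
w = np.array([0.5, 0.5])                     # e.g. an average of the two contrasts
print("contrasts:", d)
print("variance with correlation:", w @ V @ w)
print("variance ignoring correlation:", w @ np.diag(np.diag(V)) @ w)
```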
Toward improved analysis of concentration data: Embracing nondetects.
Shoari, Niloofar; Dubé, Jean-Sébastien
2018-03-01
Various statistical tests on concentration data serve to support decision-making regarding characterization and monitoring of contaminated media, assessing exposure to a chemical, and quantifying the associated risks. However, the routine statistical protocols cannot be directly applied because of challenges arising from nondetects or left-censored observations, which are concentration measurements below the detection limit of measuring instruments. Despite the existence of techniques based on survival analysis that can adjust for nondetects, these are seldom taken into account properly. A comprehensive review of the literature showed that managing policies regarding analysis of censored data do not always agree and that guidance from regulatory agencies may be outdated. Therefore, researchers and practitioners commonly resort to the most convenient way of tackling the censored data problem by substituting nondetects with arbitrary constants prior to data analysis, although this is generally regarded as a bias-prone approach. Hoping to improve the interpretation of concentration data, the present article aims to familiarize researchers in different disciplines with the significance of left-censored observations and provides theoretical and computational recommendations (under both frequentist and Bayesian frameworks) for adequate analysis of censored data. In particular, the present article synthesizes key findings from previous research with respect to 3 noteworthy aspects of inferential statistics: estimation of descriptive statistics, hypothesis testing, and regression analysis. Environ Toxicol Chem 2018;37:643-656. © 2017 SETAC.
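One of the recommended alternatives to substitution can be sketched in a few lines: maximum-likelihood estimation on the log scale in which each non-detect contributes the probability mass below its detection limit. Concentrations and detection limits below are invented, and a lognormal model is assumed.

```python
# Censored maximum likelihood on the log scale: detects contribute the density,
# non-detects contribute the probability below their detection limit.
# All concentrations and detection limits are invented; a lognormal is assumed.
import numpy as np
from scipy import stats, optimize

detects = np.log([2.1, 3.4, 0.9, 5.2, 1.7, 4.4])   # measured concentrations
dl      = np.log([0.5, 0.5, 1.0, 1.0])              # detection limits of non-detects

def neg_loglik(params):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    ll = stats.norm.logpdf(detects, mu, sigma).sum()
    ll += stats.norm.logcdf(dl, mu, sigma).sum()     # P(value < detection limit)
    return -ll

res = optimize.minimize(neg_loglik, x0=[0.0, 0.0])
print("geometric mean:", np.exp(res.x[0]), "log-scale sd:", np.exp(res.x[1]))
```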
Using Alien Coins to Test Whether Simple Inference Is Bayesian
ERIC Educational Resources Information Center
Cassey, Peter; Hawkins, Guy E.; Donkin, Chris; Brown, Scott D.
2016-01-01
Reasoning and inference are well-studied aspects of basic cognition that have been explained as statistically optimal Bayesian inference. Using a simplified experimental design, we conducted quantitative comparisons between Bayesian inference and human inference at the level of individuals. In 3 experiments, with more than 13,000 participants, we…
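As a reminder of the normative benchmark used in such comparisons, the snippet below performs simple Bayesian updating for a binary outcome (a coin-like stimulus) with a uniform Beta prior; the specific priors and stimuli of the experiments are not reproduced.

```python
# Simple normative benchmark: posterior for an unknown bias after 7 "heads"
# and 3 "tails", starting from a uniform Beta(1, 1) prior.
from scipy import stats

heads, tails = 7, 3
posterior = stats.beta(1 + heads, 1 + tails)
print("posterior mean:", posterior.mean())
print("95% credible interval:", posterior.interval(0.95))
print("P(bias > 0.5):", posterior.sf(0.5))
```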
ERIC Educational Resources Information Center
Henriques, Ana; Oliveira, Hélia
2016-01-01
This paper reports on the results of a study investigating the potential to embed Informal Statistical Inference in statistical investigations, using TinkerPlots, for assisting 8th grade students' informal inferential reasoning to emerge, particularly their articulations of uncertainty. Data collection included students' written work on a…
MAFsnp: A Multi-Sample Accurate and Flexible SNP Caller Using Next-Generation Sequencing Data
Hu, Jiyuan; Li, Tengfei; Xiu, Zidi; Zhang, Hong
2015-01-01
Most existing statistical methods developed for calling single nucleotide polymorphisms (SNPs) using next-generation sequencing (NGS) data are based on Bayesian frameworks, and there does not exist any SNP caller that produces p-values for calling SNPs in a frequentist framework. To fill in this gap, we develop a new method MAFsnp, a Multiple-sample based Accurate and Flexible algorithm for calling SNPs with NGS data. MAFsnp is based on an estimated likelihood ratio test (eLRT) statistic. In practical situation, the involved parameter is very close to the boundary of the parametric space, so the standard large sample property is not suitable to evaluate the finite-sample distribution of the eLRT statistic. Observing that the distribution of the test statistic is a mixture of zero and a continuous part, we propose to model the test statistic with a novel two-parameter mixture distribution. Once the parameters in the mixture distribution are estimated, p-values can be easily calculated for detecting SNPs, and the multiple-testing corrected p-values can be used to control false discovery rate (FDR) at any pre-specified level. With simulated data, MAFsnp is shown to have much better control of FDR than the existing SNP callers. Through the application to two real datasets, MAFsnp is also shown to outperform the existing SNP callers in terms of calling accuracy. An R package “MAFsnp” implementing the new SNP caller is freely available at http://homepage.fudan.edu.cn/zhangh/softwares/. PMID:26309201
Using Bayes' theorem for free energy calculations
NASA Astrophysics Data System (ADS)
Rogers, David M.
Statistical mechanics is fundamentally based on calculating the probabilities of molecular-scale events. Although Bayes' theorem has generally been recognized as providing key guiding principles for setup and analysis of statistical experiments [83], classical frequentist models still predominate in the world of computational experimentation. As a starting point for widespread application of Bayesian methods in statistical mechanics, we investigate the central quantity of free energies from this perspective. This dissertation thus reviews the basics of Bayes' view of probability theory, and the maximum entropy formulation of statistical mechanics before providing examples of its application to several advanced research areas. We first apply Bayes' theorem to a multinomial counting problem in order to determine inner shell and hard sphere solvation free energy components of Quasi-Chemical Theory [140]. We proceed to consider the general problem of free energy calculations from samples of interaction energy distributions. From there, we turn to spline-based estimation of the potential of mean force [142], and empirical modeling of observed dynamics using integrator matching. The results of this research are expected to advance the state of the art in coarse-graining methods, as they allow a systematic connection from high-resolution (atomic) to low-resolution (coarse) structure and dynamics. In total, our work on these problems constitutes a critical starting point for further application of Bayes' theorem in all areas of statistical mechanics. It is hoped that the understanding so gained will allow for improvements in comparisons between theory and experiment.
Serang, Oliver; Noble, William Stafford
2012-01-01
The problem of identifying the proteins in a complex mixture using tandem mass spectrometry can be framed as an inference problem on a graph that connects peptides to proteins. Several existing protein identification methods make use of statistical inference methods for graphical models, including expectation maximization, Markov chain Monte Carlo, and full marginalization coupled with approximation heuristics. We show that, for this problem, the majority of the cost of inference usually comes from a few highly connected subgraphs. Furthermore, we evaluate three different statistical inference methods using a common graphical model, and we demonstrate that junction tree inference substantially improves rates of convergence compared to existing methods. The python code used for this paper is available at http://noble.gs.washington.edu/proj/fido. PMID:22331862
Using genetic data to strengthen causal inference in observational research.
Pingault, Jean-Baptiste; O'Reilly, Paul F; Schoeler, Tabea; Ploubidis, George B; Rijsdijk, Frühling; Dudbridge, Frank
2018-06-05
Causal inference is essential across the biomedical, behavioural and social sciences. By progressing from confounded statistical associations to evidence of causal relationships, causal inference can reveal complex pathways underlying traits and diseases and help to prioritize targets for intervention. Recent progress in genetic epidemiology - including statistical innovation, massive genotyped data sets and novel computational tools for deep data mining - has fostered the intense development of methods exploiting genetic data and relatedness to strengthen causal inference in observational research. In this Review, we describe how such genetically informed methods differ in their rationale, applicability and inherent limitations and outline how they should be integrated in the future to offer a rich causal inference toolbox.
Brannigan, V M; Bier, V M; Berg, C
1992-09-01
Toxic torts are product liability cases dealing with alleged injuries due to chemical or biological hazards such as radiation, thalidomide, or Agent Orange. Toxic tort cases typically rely more heavily than other product liability cases on indirect or statistical proof of injury. There have been numerous theoretical analyses of statistical proof of injury in toxic tort cases. However, there have been only a handful of actual legal decisions regarding the use of such statistical evidence, and most of those decisions have been inconclusive. Recently, a major case from the Fifth Circuit, involving allegations that Bendectin (a morning sickness drug) caused birth defects, was decided entirely on the basis of statistical inference. This paper examines both the conceptual basis of that decision, and also the relationships among statistical inference, scientific evidence, and the rules of product liability in general.
THESEUS: maximum likelihood superpositioning and analysis of macromolecular structures.
Theobald, Douglas L; Wuttke, Deborah S
2006-09-01
THESEUS is a command line program for performing maximum likelihood (ML) superpositions and analysis of macromolecular structures. While conventional superpositioning methods use ordinary least-squares (LS) as the optimization criterion, ML superpositions provide substantially improved accuracy by down-weighting variable structural regions and by correcting for correlations among atoms. ML superpositioning is robust and insensitive to the specific atoms included in the analysis, and thus it does not require subjective pruning of selected variable atomic coordinates. Output includes both likelihood-based and frequentist statistics for accurate evaluation of the adequacy of a superposition and for reliable analysis of structural similarities and differences. THESEUS performs principal components analysis for analyzing the complex correlations found among atoms within a structural ensemble. ANSI C source code and selected binaries for various computing platforms are available under the GNU open source license from http://monkshood.colorado.edu/theseus/ or http://www.theseus3d.org.
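For orientation, the sketch below implements the conventional least-squares (Kabsch) superposition that THESEUS improves upon; the ML method additionally down-weights variable regions and models correlations among atoms, which is not attempted here. Coordinates are synthetic.

```python
# Ordinary least-squares (Kabsch) superposition of two synthetic structures.
# THESEUS's ML criterion additionally down-weights variable regions and models
# inter-atom correlations; neither refinement is attempted here.
import numpy as np

def kabsch_superpose(P, Q):
    """Rotate and translate Q onto P; both are (n_atoms, 3) arrays."""
    Pc, Qc = P - P.mean(axis=0), Q - Q.mean(axis=0)
    U, S, Vt = np.linalg.svd(Qc.T @ Pc)
    d = np.sign(np.linalg.det(U @ Vt))        # guard against reflections
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    Q_fit = Qc @ R + P.mean(axis=0)
    rmsd = np.sqrt(((Q_fit - P) ** 2).sum(axis=1).mean())
    return Q_fit, rmsd

rng = np.random.default_rng(7)
P = rng.normal(size=(50, 3))
theta = 0.4
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
Q = P @ Rz + np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.05, size=P.shape)
print("RMSD after LS superposition:", kabsch_superpose(P, Q)[1])
```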
Making statistical inferences about software reliability
NASA Technical Reports Server (NTRS)
Miller, Douglas R.
1988-01-01
Failure times of software undergoing random debugging can be modelled as order statistics of independent but nonidentically distributed exponential random variables. Using this model inferences can be made about current reliability and, if debugging continues, future reliability. This model also shows the difficulty inherent in statistical verification of very highly reliable software such as that used by digital avionics in commercial aircraft.
ERIC Educational Resources Information Center
Denbleyker, John Nickolas
2012-01-01
The shortcomings of the proportion above cut (PAC) statistic used so prominently in the educational landscape render it a very problematic measure for making correct inferences with student test data. The limitations of PAC-based statistics are more pronounced with cross-test comparisons due to their dependency on cut-score locations. A better…
ERIC Educational Resources Information Center
Trumpower, David L.
2015-01-01
Making inferences about population differences based on samples of data, that is, performing intuitive analysis of variance (IANOVA), is common in everyday life. However, the intuitive reasoning of individuals when making such inferences (even following statistics instruction), often differs from the normative logic of formal statistics. The…
A frequentist approach to computer model calibration
Wong, Raymond K. W.; Storlie, Curtis Byron; Lee, Thomas C. M.
2016-05-05
The paper considers the computer model calibration problem and provides a general frequentist solution. Under the framework proposed, the data model is semiparametric with a non-parametric discrepancy function which accounts for any discrepancy between physical reality and the computer model. In an attempt to solve a fundamentally important (but often ignored) identifiability issue between the computer model parameters and the discrepancy function, the paper proposes a new and identifiable parameterization of the calibration problem. It also develops a two-step procedure for estimating all the relevant quantities under the new parameterization. This estimation procedure is shown to enjoy excellent rates of convergence and can be straightforwardly implemented with existing software. For uncertainty quantification, bootstrapping is adopted to construct confidence regions for the quantities of interest. As a result, the practical performance of the methodology is illustrated through simulation examples and an application to a computational fluid dynamics model.
A default Bayesian hypothesis test for mediation.
Nuijten, Michèle B; Wetzels, Ruud; Matzke, Dora; Dolan, Conor V; Wagenmakers, Eric-Jan
2015-03-01
In order to quantify the relationship between multiple variables, researchers often carry out a mediation analysis. In such an analysis, a mediator (e.g., knowledge of a healthy diet) transmits the effect from an independent variable (e.g., classroom instruction on a healthy diet) to a dependent variable (e.g., consumption of fruits and vegetables). Almost all mediation analyses in psychology use frequentist estimation and hypothesis-testing techniques. A recent exception is Yuan and MacKinnon (Psychological Methods, 14, 301-322, 2009), who outlined a Bayesian parameter estimation procedure for mediation analysis. Here we complete the Bayesian alternative to frequentist mediation analysis by specifying a default Bayesian hypothesis test based on the Jeffreys-Zellner-Siow approach. We further extend this default Bayesian test by allowing a comparison to directional or one-sided alternatives, using Markov chain Monte Carlo techniques implemented in JAGS. All Bayesian tests are implemented in the R package BayesMed (Nuijten, Wetzels, Matzke, Dolan, & Wagenmakers, 2014).
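A hedged sketch of the two styles of analysis discussed, on simulated data: the frequentist Sobel test for the indirect effect a*b, and a simple estimation-style Bayesian summary of a*b obtained by sampling the two path coefficients (in the spirit of Yuan and MacKinnon's estimation approach, not the Jeffreys-Zellner-Siow Bayes factor implemented in BayesMed).

```python
# Simulated data loosely following the abstract's example (instruction -> diet
# knowledge -> consumption).  The Sobel test is the standard frequentist check;
# the "posterior" for a*b here is a simplified estimation-style summary, not
# the Jeffreys-Zellner-Siow Bayes factor implemented in BayesMed.
import numpy as np
from scipy import stats

def ols(y, X):
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    return beta, np.sqrt(np.diag(sigma2 * XtX_inv))

rng = np.random.default_rng(11)
n = 200
x = rng.normal(size=n)                       # independent variable
m = 0.5 * x + rng.normal(size=n)             # mediator
y = 0.4 * m + 0.1 * x + rng.normal(size=n)   # dependent variable

ones = np.ones(n)
(_, a), (_, se_a) = ols(m, np.column_stack([ones, x]))
(_, b, _), (_, se_b, _) = ols(y, np.column_stack([ones, m, x]))

sobel_se = np.sqrt(a**2 * se_b**2 + b**2 * se_a**2)
z = a * b / sobel_se
print("Sobel z =", z, "two-sided p =", 2 * stats.norm.sf(abs(z)))

draws = rng.normal(a, se_a, 20_000) * rng.normal(b, se_b, 20_000)
print("a*b posterior mean:", draws.mean(),
      "95% interval:", np.percentile(draws, [2.5, 97.5]))
```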
Quantifying the uncertainty in heritability
Furlotte, Nicholas A; Heckerman, David; Lippert, Christoph
2014-01-01
The use of mixed models to determine narrow-sense heritability and related quantities such as SNP heritability has received much recent attention. Less attention has been paid to the inherent variability in these estimates. One approach for quantifying variability in estimates of heritability is a frequentist approach, in which heritability is estimated using maximum likelihood and its variance is quantified through an asymptotic normal approximation. An alternative approach is to quantify the uncertainty in heritability through its Bayesian posterior distribution. In this paper, we develop the latter approach, make it computationally efficient and compare it to the frequentist approach. We show theoretically that, for a sufficiently large sample size and intermediate values of heritability, the two approaches provide similar results. Using the Atherosclerosis Risk in Communities cohort, we show empirically that the two approaches can give different results and that the variance/uncertainty can remain large. PMID:24670270
Difference to Inference: teaching logical and statistical reasoning through on-line interactivity.
Malloy, T E
2001-05-01
Difference to Inference is an on-line JAVA program that simulates theory testing and falsification through research design and data collection in a game format. The program, based on cognitive and epistemological principles, is designed to support learning of the thinking skills underlying deductive and inductive logic and statistical reasoning. Difference to Inference has database connectivity so that game scores can be counted as part of course grades.
ERIC Educational Resources Information Center
Watson, Jane
2007-01-01
Inference, or decision making, is seen in curriculum documents as the final step in a statistical investigation. For a formal statistical enquiry this may be associated with sophisticated tests involving probability distributions. For young students without the mathematical background to perform such tests, it is still possible to draw informal…
A Framework for Thinking about Informal Statistical Inference
ERIC Educational Resources Information Center
Makar, Katie; Rubin, Andee
2009-01-01
Informal inferential reasoning has shown some promise in developing students' deeper understanding of statistical processes. This paper presents a framework to think about three key principles of informal inference--generalizations "beyond the data," probabilistic language, and data as evidence. The authors use primary school classroom…
Sensitivity to the Sampling Process Emerges From the Principle of Efficiency.
Jara-Ettinger, Julian; Sun, Felix; Schulz, Laura; Tenenbaum, Joshua B
2018-05-01
Humans can seamlessly infer other people's preferences, based on what they do. Broadly, two types of accounts have been proposed to explain different aspects of this ability. The first account focuses on spatial information: Agents' efficient navigation in space reveals what they like. The second account focuses on statistical information: Uncommon choices reveal stronger preferences. Together, these two lines of research suggest that we have two distinct capacities for inferring preferences. Here we propose that this is not the case, and that spatial-based and statistical-based preference inferences can be explained by the assumption that agents are efficient alone. We show that people's sensitivity to spatial and statistical information when they infer preferences is best predicted by a computational model of the principle of efficiency, and that this model outperforms dual-system models, even when the latter are fit to participant judgments. Our results suggest that, as adults, a unified understanding of agency under the principle of efficiency underlies our ability to infer preferences. Copyright © 2018 Cognitive Science Society, Inc.
Pressman, Alice R; Avins, Andrew L; Hubbard, Alan; Satariano, William A
2011-07-01
There is a paucity of literature comparing Bayesian analytic techniques with traditional approaches for analyzing clinical trials using real trial data. We compared Bayesian and frequentist group sequential methods using data from two published clinical trials. We chose two widely accepted frequentist rules, O'Brien-Fleming and Lan-DeMets, and conjugate Bayesian priors. Using the nonparametric bootstrap, we estimated a sampling distribution of stopping times for each method. Because current practice dictates the preservation of an experiment-wise false positive rate (Type I error), we approximated these error rates for our Bayesian and frequentist analyses with the posterior probability of detecting an effect in a simulated null sample. Thus for the data-generated distribution represented by these trials, we were able to compare the relative performance of these techniques. No final outcomes differed from those of the original trials. However, the timing of trial termination differed substantially by method and varied by trial. For one trial, group sequential designs of either type dictated early stopping of the study. In the other, stopping times were dependent upon the choice of spending function and prior distribution. Results indicate that trialists ought to consider Bayesian methods in addition to traditional approaches for analysis of clinical trials. Though findings from this small sample did not demonstrate either method to consistently outperform the other, they did suggest the need to replicate these comparisons using data from varied clinical trials in order to determine the conditions under which the different methods would be most efficient. Copyright © 2011 Elsevier Inc. All rights reserved.
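The bootstrap-of-stopping-times idea can be sketched as follows, with simulated binary outcomes and an approximate O'Brien-Fleming-type boundary (constant taken as roughly 2.02 for four equally spaced looks at two-sided alpha = 0.05); the Lan-DeMets spending function and the Bayesian stopping rules compared in the paper are not shown.

```python
# Simulated two-arm binary-outcome trial resampled 2000 times; at each of four
# equally spaced looks the pooled z-statistic is compared with an approximate
# O'Brien-Fleming boundary (constant ~2.02 for two-sided alpha = 0.05).  The
# Lan-DeMets spending function and Bayesian rules from the paper are not shown.
import numpy as np

rng = np.random.default_rng(5)
n_per_arm = 400
control = rng.binomial(1, 0.30, n_per_arm)
treated = rng.binomial(1, 0.22, n_per_arm)

fracs = np.array([0.25, 0.50, 0.75, 1.00])
bounds = 2.024 / np.sqrt(fracs)              # approximate O'Brien-Fleming boundary

def first_crossing(c, t):
    """Information fraction at which the boundary is first crossed (nan if never)."""
    for frac, bound in zip(fracs, bounds):
        k = int(frac * n_per_arm)
        pooled = (t[:k].sum() + c[:k].sum()) / (2 * k)
        se = np.sqrt(2 * pooled * (1 - pooled) / k)
        if se > 0 and abs(t[:k].mean() - c[:k].mean()) / se >= bound:
            return frac
    return np.nan

stops = np.array([first_crossing(control[rng.integers(0, n_per_arm, n_per_arm)],
                                 treated[rng.integers(0, n_per_arm, n_per_arm)])
                  for _ in range(2000)])
for frac in fracs:
    print(f"crossed at {frac:.0%} of enrolment in {np.mean(stops == frac):.1%} of resamples")
```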
Pressman, Alice R.; Avins, Andrew L.; Hubbard, Alan; Satariano, William A.
2014-01-01
Background There is a paucity of literature comparing Bayesian analytic techniques with traditional approaches for analyzing clinical trials using real trial data. Methods We compared Bayesian and frequentist group sequential methods using data from two published clinical trials. We chose two widely accepted frequentist rules, O'Brien–Fleming and Lan–DeMets, and conjugate Bayesian priors. Using the nonparametric bootstrap, we estimated a sampling distribution of stopping times for each method. Because current practice dictates the preservation of an experiment-wise false positive rate (Type I error), we approximated these error rates for our Bayesian and frequentist analyses with the posterior probability of detecting an effect in a simulated null sample. Thus for the data-generated distribution represented by these trials, we were able to compare the relative performance of these techniques. Results No final outcomes differed from those of the original trials. However, the timing of trial termination differed substantially by method and varied by trial. For one trial, group sequential designs of either type dictated early stopping of the study. In the other, stopping times were dependent upon the choice of spending function and prior distribution. Conclusions Results indicate that trialists ought to consider Bayesian methods in addition to traditional approaches for analysis of clinical trials. Though findings from this small sample did not demonstrate either method to consistently outperform the other, they did suggest the need to replicate these comparisons using data from varied clinical trials in order to determine the conditions under which the different methods would be most efficient. PMID:21453792
Statistical inference for extended or shortened phase II studies based on Simon's two-stage designs.
Zhao, Junjun; Yu, Menggang; Feng, Xi-Ping
2015-06-07
Simon's two-stage designs are popular choices for conducting phase II clinical trials, especially in the oncology trials to reduce the number of patients placed on ineffective experimental therapies. Recently Koyama and Chen (2008) discussed how to conduct proper inference for such studies because they found that inference procedures used with Simon's designs almost always ignore the actual sampling plan used. In particular, they proposed an inference method for studies when the actual second stage sample sizes differ from planned ones. We consider an alternative inference method based on likelihood ratio. In particular, we order permissible sample paths under Simon's two-stage designs using their corresponding conditional likelihood. In this way, we can calculate p-values using the common definition: the probability of obtaining a test statistic value at least as extreme as that observed under the null hypothesis. In addition to providing inference for a couple of scenarios where Koyama and Chen's method can be difficult to apply, the resulting estimate based on our method appears to have certain advantage in terms of inference properties in many numerical simulations. It generally led to smaller biases and narrower confidence intervals while maintaining similar coverages. We also illustrated the two methods in a real data setting. Inference procedures used with Simon's designs almost always ignore the actual sampling plan. Reported P-values, point estimates and confidence intervals for the response rate are not usually adjusted for the design's adaptiveness. Proper statistical inference procedures should be used.
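As context for the inference problem, the snippet below computes an exact p-value for a Simon two-stage design that respects the sampling plan, using the conventional ordering by total responses; the paper's proposal orders sample paths by conditional likelihood instead. Design parameters are illustrative.

```python
# Exact p-value respecting the two-stage sampling plan, using the conventional
# ordering by total responses (the paper's likelihood-ratio ordering differs).
# Design numbers (n1, r1, n2, p0) and the observed total are illustrative only.
from scipy.stats import binom

def simon_two_stage_pvalue(x_total, p0, n1, r1, n2):
    """Trial continued past stage 1 (more than r1 of n1 responded); x_total observed."""
    p = 0.0
    for x1 in range(r1 + 1, n1 + 1):          # stage-1 outcomes allowing continuation
        p += binom.pmf(x1, n1, p0) * binom.sf(x_total - x1 - 1, n2, p0)
    return p

print(simon_two_stage_pvalue(x_total=14, p0=0.20, n1=13, r1=3, n2=30))
```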
NASA Astrophysics Data System (ADS)
Alsing, Justin; Wandelt, Benjamin; Feeney, Stephen
2018-07-01
Many statistical models in cosmology can be simulated forwards but have intractable likelihood functions. Likelihood-free inference methods allow us to perform Bayesian inference from these models using only forward simulations, free from any likelihood assumptions or approximations. Likelihood-free inference generically involves simulating mock data and comparing to the observed data; this comparison in data space suffers from the curse of dimensionality and requires compression of the data to a small number of summary statistics to be tractable. In this paper, we use massive asymptotically optimal data compression to reduce the dimensionality of the data space to just one number per parameter, providing a natural and optimal framework for summary statistic choice for likelihood-free inference. Secondly, we present the first cosmological application of Density Estimation Likelihood-Free Inference (DELFI), which learns a parametrized model for the joint distribution of data and parameters, yielding both the parameter posterior and the model evidence. This approach is conceptually simple, requires less tuning than traditional Approximate Bayesian Computation approaches to likelihood-free inference and can give high-fidelity posteriors from orders of magnitude fewer forward simulations. As an additional bonus, it enables parameter inference and Bayesian model comparison simultaneously. We demonstrate DELFI with massive data compression on an analysis of the joint light-curve analysis supernova data, as a simple validation case study. We show that high-fidelity posterior inference is possible for full-scale cosmological data analyses with as few as ~10⁴ simulations, with substantial scope for further improvement, demonstrating the scalability of likelihood-free inference to large and complex cosmological data sets.
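A minimal likelihood-free inference sketch, far short of DELFI but in the same spirit: draw parameters from the prior, simulate, compress each simulated data set to one summary per parameter, and keep the draws whose summaries fall closest to the observed summary (rejection ABC). The toy model is a Gaussian mean with known variance.

```python
# Rejection ABC with a one-number summary (the sample mean) for a toy Gaussian
# model with known variance; DELFI would instead fit a density to the
# (summary, parameter) pairs, needing far fewer simulations for the same fidelity.
import numpy as np

rng = np.random.default_rng(8)
observed = rng.normal(1.3, 1.0, size=50)
s_obs = observed.mean()                        # compressed data: one summary

n_sims, keep = 100_000, 1_000
theta = rng.uniform(-5, 5, n_sims)             # draws from a flat prior
s_sim = rng.normal(theta, 1.0 / np.sqrt(50))   # summary of each simulated data set
accepted = theta[np.argsort(np.abs(s_sim - s_obs))[:keep]]
print("approximate posterior mean:", accepted.mean(), "sd:", accepted.std())
```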
A Coalitional Game for Distributed Inference in Sensor Networks With Dependent Observations
NASA Astrophysics Data System (ADS)
He, Hao; Varshney, Pramod K.
2016-04-01
We consider the problem of collaborative inference in a sensor network with heterogeneous and statistically dependent sensor observations. Each sensor aims to maximize its inference performance by forming a coalition with other sensors and sharing information within the coalition. It is proved that the inference performance is a nondecreasing function of the coalition size. However, in an energy constrained network, the energy consumption of inter-sensor communication also increases with increasing coalition size, which discourages the formation of the grand coalition (the set of all sensors). In this paper, the formation of non-overlapping coalitions with statistically dependent sensors is investigated under a specific communication constraint. We apply a game theoretical approach to fully explore and utilize the information contained in the spatial dependence among sensors to maximize individual sensor performance. Before formulating the distributed inference problem as a coalition formation game, we first quantify the gain and loss in forming a coalition by introducing the concepts of diversity gain and redundancy loss for both estimation and detection problems. These definitions, enabled by the statistical theory of copulas, allow us to characterize the influence of statistical dependence among sensor observations on inference performance. An iterative algorithm based on merge-and-split operations is proposed for the solution and the stability of the proposed algorithm is analyzed. Numerical results are provided to demonstrate the superiority of our proposed game theoretical approach.
Spurious correlations and inference in landscape genetics
Samuel A. Cushman; Erin L. Landguth
2010-01-01
Reliable interpretation of landscape genetic analyses depends on statistical methods that have high power to identify the correct process driving gene flow while rejecting incorrect alternative hypotheses. Little is known about statistical power and inference in individual-based landscape genetics. Our objective was to evaluate the power of causal modelling with partial...
The Philosophical Foundations of Prescriptive Statements and Statistical Inference
ERIC Educational Resources Information Center
Sun, Shuyan; Pan, Wei
2011-01-01
From the perspectives of the philosophy of science and statistical inference, we discuss the challenges of making prescriptive statements in quantitative research articles. We first consider the prescriptive nature of educational research and argue that prescriptive statements are a necessity in educational research. The logic of deduction,…
Inference and the Introductory Statistics Course
ERIC Educational Resources Information Center
Pfannkuch, Maxine; Regan, Matt; Wild, Chris; Budgett, Stephanie; Forbes, Sharleen; Harraway, John; Parsonage, Ross
2011-01-01
This article sets out some of the rationale and arguments for making major changes to the teaching and learning of statistical inference in introductory courses at our universities by changing from a norm-based, mathematical approach to more conceptually accessible computer-based approaches. The core problem of the inferential argument with its…
ERIC Educational Resources Information Center
Yuan, Ying; MacKinnon, David P.
2009-01-01
In this article, we propose Bayesian analysis of mediation effects. Compared with conventional frequentist mediation analysis, the Bayesian approach has several advantages. First, it allows researchers to incorporate prior information into the mediation analysis, thus potentially improving the efficiency of estimates. Second, under the Bayesian…
Using Playing Cards to Differentiate Probability Interpretations
ERIC Educational Resources Information Center
López Puga, Jorge
2014-01-01
The aprioristic (classical, naïve and symmetric) and frequentist interpretations of probability are commonly known. Bayesian or subjective interpretation of probability is receiving increasing attention. This paper describes an activity to help students differentiate between the three types of probability interpretations.
Hobbs, Brian P.; Carlin, Bradley P.; Mandrekar, Sumithra J.; Sargent, Daniel J.
2011-01-01
Bayesian clinical trial designs offer the possibility of a substantially reduced sample size, increased statistical power, and reductions in cost and ethical hazard. However, when prior and current information conflict, Bayesian methods can lead to higher than expected Type I error, as well as the possibility of a costlier and lengthier trial. This motivates an investigation of the feasibility of hierarchical Bayesian methods for incorporating historical data that are adaptively robust to prior information that reveals itself to be inconsistent with the accumulating experimental data. In this paper, we present several models that allow for the commensurability of the information in the historical and current data to determine how much historical information is used. A primary tool is elaborating the traditional power prior approach based upon a measure of commensurability for Gaussian data. We compare the frequentist performance of several methods using simulations, and close with an example of a colon cancer trial that illustrates a linear models extension of our adaptive borrowing approach. Our proposed methods produce more precise estimates of the model parameters, in particular conferring statistical significance to the observed reduction in tumor size for the experimental regimen as compared to the control regimen. PMID:21361892
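The fixed-weight power prior that the commensurate approach generalizes is easy to sketch for binary outcomes, where raising the historical likelihood to a power a0 keeps the Beta-Binomial model conjugate; the counts below are invented, and the paper's methods instead let the data determine the degree of borrowing.

```python
# Fixed-weight power prior for a control response rate with binary outcomes:
# the historical likelihood enters raised to a0, keeping the Beta posterior
# conjugate.  Counts are invented; the paper's commensurate priors instead
# adapt the amount of borrowing to the observed prior-data conflict.
from scipy import stats

hist_events, hist_n = 18, 60        # historical control data
curr_events, curr_n = 9, 40         # current-trial control arm

def power_prior_posterior(a0, alpha0=1.0, beta0=1.0):
    alpha = alpha0 + a0 * hist_events + curr_events
    beta = beta0 + a0 * (hist_n - hist_events) + (curr_n - curr_events)
    return stats.beta(alpha, beta)

for a0 in (0.0, 0.5, 1.0):
    post = power_prior_posterior(a0)
    lo, hi = post.interval(0.95)
    print(f"a0 = {a0}: mean = {post.mean():.3f}, 95% interval = ({lo:.3f}, {hi:.3f})")
```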
Multi-Agent Inference in Social Networks: A Finite Population Learning Approach.
Fan, Jianqing; Tong, Xin; Zeng, Yao
When people in a society want to make inference about some parameter, each person may want to use data collected by other people. Information (data) exchange in social networks is usually costly, so to make reliable statistical decisions, people need to trade off the benefits and costs of information acquisition. Conflicts of interests and coordination problems will arise in the process. Classical statistics does not consider people's incentives and interactions in the data collection process. To address this imperfection, this work explores multi-agent Bayesian inference problems with a game theoretic social network model. Motivated by our interest in aggregate inference at the societal level, we propose a new concept, finite population learning, to address whether with high probability, a large fraction of people in a given finite population network can make "good" inference. Serving as a foundation, this concept enables us to study the long run trend of aggregate inference quality as population grows.
Intuitive statistics by 8-month-old infants
Xu, Fei; Garcia, Vashti
2008-01-01
Human learners make inductive inferences based on small amounts of data: we generalize from samples to populations and vice versa. The academic discipline of statistics formalizes these intuitive statistical inferences. What is the origin of this ability? We report six experiments investigating whether 8-month-old infants are “intuitive statisticians.” Our results showed that, given a sample, the infants were able to make inferences about the population from which the sample had been drawn. Conversely, given information about the entire population of relatively small size, the infants were able to make predictions about the sample. Our findings provide evidence that infants possess a powerful mechanism for inductive learning, either using heuristics or basic principles of probability. This ability to make inferences based on samples or information about the population develops early and in the absence of schooling or explicit teaching. Human infants may be rational learners from very early in development. PMID:18378901
Assessing colour-dependent occupation statistics inferred from galaxy group catalogues
NASA Astrophysics Data System (ADS)
Campbell, Duncan; van den Bosch, Frank C.; Hearin, Andrew; Padmanabhan, Nikhil; Berlind, Andreas; Mo, H. J.; Tinker, Jeremy; Yang, Xiaohu
2015-09-01
We investigate the ability of current implementations of galaxy group finders to recover colour-dependent halo occupation statistics. To test the fidelity of group catalogue inferred statistics, we run three different group finders used in the literature over a mock that includes galaxy colours in a realistic manner. Overall, the resulting mock group catalogues are remarkably similar, and most colour-dependent statistics are recovered with reasonable accuracy. However, it is also clear that certain systematic errors arise as a consequence of correlated errors in group membership determination, central/satellite designation, and halo mass assignment. We introduce a new statistic, the halo transition probability (HTP), which captures the combined impact of all these errors. As a rule of thumb, errors tend to equalize the properties of distinct galaxy populations (i.e. red versus blue galaxies or centrals versus satellites), and to result in inferred occupation statistics that are more accurate for red galaxies than for blue galaxies. A statistic that is particularly poorly recovered from the group catalogues is the red fraction of central galaxies as a function of halo mass. Group finders do a good job in recovering galactic conformity, but also have a tendency to introduce weak conformity when none is present. We conclude that proper inference of colour-dependent statistics from group catalogues is best achieved using forward modelling (i.e. running group finders over mock data) or by implementing a correction scheme based on the HTP, as long as the latter is not too strongly model dependent.
Kottyan, Leah C; Zoller, Erin E; Bene, Jessica; Lu, Xiaoming; Kelly, Jennifer A; Rupert, Andrew M; Lessard, Christopher J; Vaughn, Samuel E; Marion, Miranda; Weirauch, Matthew T; Namjou, Bahram; Adler, Adam; Rasmussen, Astrid; Glenn, Stuart; Montgomery, Courtney G; Hirschfield, Gideon M; Xie, Gang; Coltescu, Catalina; Amos, Chris; Li, He; Ice, John A; Nath, Swapan K; Mariette, Xavier; Bowman, Simon; Rischmueller, Maureen; Lester, Sue; Brun, Johan G; Gøransson, Lasse G; Harboe, Erna; Omdal, Roald; Cunninghame-Graham, Deborah S; Vyse, Tim; Miceli-Richard, Corinne; Brennan, Michael T; Lessard, James A; Wahren-Herlenius, Marie; Kvarnström, Marika; Illei, Gabor G; Witte, Torsten; Jonsson, Roland; Eriksson, Per; Nordmark, Gunnel; Ng, Wan-Fai; Anaya, Juan-Manuel; Rhodus, Nelson L; Segal, Barbara M; Merrill, Joan T; James, Judith A; Guthridge, Joel M; Scofield, R Hal; Alarcon-Riquelme, Marta; Bae, Sang-Cheol; Boackle, Susan A; Criswell, Lindsey A; Gilkeson, Gary; Kamen, Diane L; Jacob, Chaim O; Kimberly, Robert; Brown, Elizabeth; Edberg, Jeffrey; Alarcón, Graciela S; Reveille, John D; Vilá, Luis M; Petri, Michelle; Ramsey-Goldman, Rosalind; Freedman, Barry I; Niewold, Timothy; Stevens, Anne M; Tsao, Betty P; Ying, Jun; Mayes, Maureen D; Gorlova, Olga Y; Wakeland, Ward; Radstake, Timothy; Martin, Ezequiel; Martin, Javier; Siminovitch, Katherine; Moser Sivils, Kathy L; Gaffney, Patrick M; Langefeld, Carl D; Harley, John B; Kaufman, Kenneth M
2015-01-15
Exploiting genotyping, DNA sequencing, imputation and trans-ancestral mapping, we used Bayesian and frequentist approaches to model the IRF5-TNPO3 locus association, now implicated in two immunotherapies and seven autoimmune diseases. Specifically, in systemic lupus erythematosus (SLE), we resolved separate associations in the IRF5 promoter (all ancestries) and with an extended European haplotype. We captured 3230 IRF5-TNPO3 high-quality, common variants across 5 ethnicities in 8395 SLE cases and 7367 controls. The genetic effect from the IRF5 promoter can be explained by any one of four variants in 5.7 kb (P-value_meta = 6 × 10⁻⁴⁹; OR = 1.38-1.97). The second genetic effect spanned an 85.5-kb, 24-variant haplotype that included the genes IRF5 and TNPO3 (P-values_EU = 10⁻²⁷-10⁻³², OR = 1.7-1.81). Many variants at the IRF5 locus with previously assigned biological function are not members of either final credible set of potential causal variants identified herein. In addition to the known biologically functional variants, we demonstrated that the risk allele of rs4728142, a variant in the promoter among the lowest frequentist probability and highest Bayesian posterior probability, was correlated with IRF5 expression and differentially binds the transcription factor ZBTB3. Our analytical strategy provides a novel framework for future studies aimed at dissecting etiological genetic effects. Finally, both SLE elements of the statistical model appear to operate in Sjögren's syndrome and systemic sclerosis whereas only the IRF5-TNPO3 gene-spanning haplotype is associated with primary biliary cirrhosis, demonstrating the nuance of similarity and difference in autoimmune disease risk mechanisms at IRF5-TNPO3. Published by Oxford University Press 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.
Kottyan, Leah C.; Zoller, Erin E.; Bene, Jessica; Lu, Xiaoming; Kelly, Jennifer A.; Rupert, Andrew M.; Lessard, Christopher J.; Vaughn, Samuel E.; Marion, Miranda; Weirauch, Matthew T.; Namjou, Bahram; Adler, Adam; Rasmussen, Astrid; Glenn, Stuart; Montgomery, Courtney G.; Hirschfield, Gideon M.; Xie, Gang; Coltescu, Catalina; Amos, Chris; Li, He; Ice, John A.; Nath, Swapan K.; Mariette, Xavier; Bowman, Simon; Rischmueller, Maureen; Lester, Sue; Brun, Johan G.; Gøransson, Lasse G.; Harboe, Erna; Omdal, Roald; Cunninghame-Graham, Deborah S.; Vyse, Tim; Miceli-Richard, Corinne; Brennan, Michael T.; Lessard, James A.; Wahren-Herlenius, Marie; Kvarnström, Marika; Illei, Gabor G.; Witte, Torsten; Jonsson, Roland; Eriksson, Per; Nordmark, Gunnel; Ng, Wan-Fai; Anaya, Juan-Manuel; Rhodus, Nelson L.; Segal, Barbara M.; Merrill, Joan T.; James, Judith A.; Guthridge, Joel M.; Hal Scofield, R.; Alarcon-Riquelme, Marta; Bae, Sang-Cheol; Boackle, Susan A.; Criswell, Lindsey A.; Gilkeson, Gary; Kamen, Diane L.; Jacob, Chaim O.; Kimberly, Robert; Brown, Elizabeth; Edberg, Jeffrey; Alarcón, Graciela S.; Reveille, John D.; Vilá, Luis M.; Petri, Michelle; Ramsey-Goldman, Rosalind; Freedman, Barry I.; Niewold, Timothy; Stevens, Anne M.; Tsao, Betty P.; Ying, Jun; Mayes, Maureen D.; Gorlova, Olga Y.; Wakeland, Ward; Radstake, Timothy; Martin, Ezequiel; Martin, Javier; Siminovitch, Katherine; Moser Sivils, Kathy L.; Gaffney, Patrick M.; Langefeld, Carl D.; Harley, John B.; Kaufman, Kenneth M.
2015-01-01
Exploiting genotyping, DNA sequencing, imputation and trans-ancestral mapping, we used Bayesian and frequentist approaches to model the IRF5–TNPO3 locus association, now implicated in two immunotherapies and seven autoimmune diseases. Specifically, in systemic lupus erythematosus (SLE), we resolved separate associations in the IRF5 promoter (all ancestries) and with an extended European haplotype. We captured 3230 IRF5–TNPO3 high-quality, common variants across 5 ethnicities in 8395 SLE cases and 7367 controls. The genetic effect from the IRF5 promoter can be explained by any one of four variants in 5.7 kb (P-valuemeta = 6 × 10−49; OR = 1.38–1.97). The second genetic effect spanned an 85.5-kb, 24-variant haplotype that included the genes IRF5 and TNPO3 (P-valuesEU = 10−27–10−32, OR = 1.7–1.81). Many variants at the IRF5 locus with previously assigned biological function are not members of either final credible set of potential causal variants identified herein. In addition to the known biologically functional variants, we demonstrated that the risk allele of rs4728142, a variant in the promoter among the lowest frequentist probability and highest Bayesian posterior probability, was correlated with IRF5 expression and differentially binds the transcription factor ZBTB3. Our analytical strategy provides a novel framework for future studies aimed at dissecting etiological genetic effects. Finally, both SLE elements of the statistical model appear to operate in Sjögren's syndrome and systemic sclerosis whereas only the IRF5–TNPO3 gene-spanning haplotype is associated with primary biliary cirrhosis, demonstrating the nuance of similarity and difference in autoimmune disease risk mechanisms at IRF5–TNPO3. PMID:25205108
Measurement of the CP-violating phase β_s^(J/ψΦ) in B_s^0 → J/ψΦ decays with the CDF II detector
Aaltonen, T.; Álvarez González, B.; Amerio, S.; ...
2012-04-23
We present a measurement of the CP-violating parameter β_s^(J/ψΦ) using approximately 6500 B_s^0 → J/ψΦ decays reconstructed with the CDF II detector in a sample of pp̄ collisions at √s = 1.96 TeV, corresponding to 5.2 fb⁻¹ of integrated luminosity produced by the Tevatron collider at Fermilab. We find the CP-violating phase to be within the range β_s^(J/ψΦ) ∈ [0.02, 0.52] ∪ [1.08, 1.55] at the 68% confidence level, where the coverage property of the quoted interval is guaranteed using a frequentist statistical analysis. This result is in agreement with the standard model expectation at the level of about one Gaussian standard deviation. We consider the inclusion of a potential S-wave contribution to the B_s^0 → J/ψK⁺K⁻ final state, which is found to be negligible over the mass interval 1.009s J/ψΦ, we find the B_s^0 decay width difference to be ΔΓ_s = 0.075 ± 0.035 (stat) ± 0.006 (syst) ps⁻¹. We also present the most precise measurements of the B_s^0 mean lifetime τ(B_s^0) = 1.529 ± 0.025 (stat) ± 0.012 (syst) ps, the polarization fractions |A_0(0)|² = 0.524 ± 0.013 (stat) ± 0.015 (syst) and |A_∥(0)|² = 0.231 ± 0.014 (stat) ± 0.015 (syst), as well as the strong phase δ_⊥ = 2.95 ± 0.64 (stat) ± 0.07 (syst) rad. In addition, we report an alternative Bayesian analysis that gives results consistent with the frequentist approach.
Statistical learning and selective inference.
Taylor, Jonathan; Tibshirani, Robert J
2015-06-23
We describe the problem of "selective inference." This addresses the following challenge: Having mined a set of data to find potential associations, how do we properly assess the strength of these associations? The fact that we have "cherry-picked"--searched for the strongest associations--means that we must set a higher bar for declaring significant the associations that we see. This challenge becomes more important in the era of big data and complex statistical modeling. The cherry tree (dataset) can be very large and the tools for cherry picking (statistical learning methods) are now very sophisticated. We describe some recent new developments in selective inference and illustrate their use in forward stepwise regression, the lasso, and principal components analysis.
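The abstract does not spell out a procedure, so the sketch below only illustrates the underlying problem and one crude remedy: under a global null, the p-value of the most strongly associated ("cherry-picked") feature is badly anti-conservative, whereas selecting on one half of the data and testing on the other restores validity. This is plain sample splitting, not the conditional selective-inference machinery the authors develop; all variable names and sizes are illustrative:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n, p = 200, 50
    X = rng.normal(size=(n, p))
    y = rng.normal(size=n)          # global null: no feature is truly associated with y

    # cherry-pick the most correlated feature on the full data, then test it naively
    r = np.array([stats.pearsonr(X[:, j], y)[0] for j in range(p)])
    best = np.argmax(np.abs(r))
    naive_p = stats.pearsonr(X[:, best], y)[1]           # anti-conservative

    # simple valid alternative: select on one half, test on the held-out half
    half = n // 2
    r_train = np.array([stats.pearsonr(X[:half, j], y[:half])[0] for j in range(p)])
    best_split = np.argmax(np.abs(r_train))
    split_p = stats.pearsonr(X[half:, best_split], y[half:])[1]  # honest p-value

    print(naive_p, split_p)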
Anatomy of the Higgs fits: A first guide to statistical treatments of the theoretical uncertainties
NASA Astrophysics Data System (ADS)
Fichet, Sylvain; Moreau, Grégory
2016-04-01
The studies of the Higgs boson couplings based on the recent and upcoming LHC data open up a new window on physics beyond the Standard Model. In this paper, we propose a statistical guide to the consistent treatment of the theoretical uncertainties entering the Higgs rate fits. Both the Bayesian and frequentist approaches are systematically analysed in a unified formalism. We present analytical expressions for the marginal likelihoods, useful to implement simultaneously the experimental and theoretical uncertainties. We review the various origins of the theoretical errors (QCD, EFT, PDF, production mode contamination…). All these individual uncertainties are thoroughly combined with the help of moment-based considerations. The theoretical correlations among Higgs detection channels appear to affect the location and size of the best-fit regions in the space of Higgs couplings. We discuss the recurrent question of the shape of the prior distributions for the individual theoretical errors and find that a nearly Gaussian prior arises from the error combinations. We also develop the bias approach, which is an alternative to marginalisation providing more conservative results. The statistical framework to apply the bias principle is introduced and two realisations of the bias are proposed. Finally, depending on the statistical treatment, the Standard Model prediction for the Higgs signal strengths is found to lie within either the 68% or 95% confidence level region obtained from the latest analyses of the 7 and 8 TeV LHC datasets.
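As a minimal sketch of the marginalisation mechanism described above (the paper's full formalism, with correlated and possibly non-Gaussian theoretical errors across channels, is richer), treating a theoretical bias δ on a measured signal strength μ̂ as a Gaussian nuisance parameter and integrating it out yields a Gaussian marginal likelihood whose experimental and theoretical variances add in quadrature; the symbols are illustrative:

    L_{\mathrm{marg}}(\mu)
        = \int \mathcal{N}\!\bigl(\hat\mu \mid \mu + \delta,\ \sigma_{\mathrm{exp}}^{2}\bigr)\,
               \mathcal{N}\!\bigl(\delta \mid 0,\ \sigma_{\mathrm{th}}^{2}\bigr)\, d\delta
        = \mathcal{N}\!\bigl(\hat\mu \mid \mu,\ \sigma_{\mathrm{exp}}^{2} + \sigma_{\mathrm{th}}^{2}\bigr)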
Variations on Bayesian Prediction and Inference
2016-05-09
There are a number of statistical inference problems that are not generally formulated via a full probability model. For the problem of inference about an unknown parameter, the Bayesian approach requires a full probability model/likelihood, which can be an obstacle.
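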
Meng, Xiang-He; Shen, Hui; Chen, Xiang-Ding; Xiao, Hong-Mei; Deng, Hong-Wen
2018-03-01
Genome-wide association studies (GWAS) have successfully identified numerous genetic variants associated with diverse complex phenotypes and diseases, and provided tremendous opportunities for further analyses using summary association statistics. Recently, Pickrell et al. developed a robust method for causal inference using independent putative causal SNPs. However, this method may fail to infer the causal relationship between two phenotypes when only a limited number of independent putative causal SNPs are identified. Here, we extended Pickrell's method to make it more applicable to general situations. We extended the causal inference method by replacing the putative causal SNPs with the lead SNPs (the set of the most significant SNPs in each independent locus) and tested the performance of our extended method using both simulation and empirical data. Simulations suggested that when the same number of genetic variants is used, our extended method had a similar distribution of the test statistic under the null model, as well as comparable power under the causal model, compared with the original method by Pickrell et al. In practice, however, our extended method would generally be more powerful because the number of independent lead SNPs was often larger than the number of independent putative causal SNPs; including more SNPs, on the other hand, would not cause more false positives. By applying our extended method to summary statistics from GWAS for blood metabolites and femoral neck bone mineral density (FN-BMD), we successfully identified ten blood metabolites that may causally influence FN-BMD. In summary, we extended a causal inference method for inferring the putative causal relationship between two phenotypes using summary statistics from GWAS, and identified a number of potential causal metabolites for FN-BMD, which may provide novel insights into the pathophysiological mechanisms underlying osteoporosis.
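The extended likelihood method itself is not detailed in the abstract; as a loose, hedged illustration of working with lead-SNP summary statistics, the sketch below merely checks whether the effects of SNPs ascertained for an exposure trait carry over proportionally to an outcome trait, the kind of asymmetry a causal model predicts. The function, the toy effect sizes, and the assumed causal effect of 0.4 are hypothetical, and this is not the authors' test statistic:

    import numpy as np
    from scipy import stats

    def effect_size_concordance(beta_exposure, beta_outcome):
        """Spearman correlation between lead-SNP effects on an exposure and on an outcome.

        A strong correlation among SNPs ascertained for the exposure, absent among SNPs
        ascertained for the outcome, is the kind of asymmetry a causal model predicts.
        """
        rho, p = stats.spearmanr(beta_exposure, beta_outcome)
        return rho, p

    # toy example with hypothetical summary statistics for lead SNPs of the exposure GWAS
    rng = np.random.default_rng(2)
    beta_exp = rng.normal(0.0, 0.05, size=100)                    # effects on the exposure
    beta_out = 0.4 * beta_exp + rng.normal(0.0, 0.01, size=100)   # assumed causal effect of 0.4
    print(effect_size_concordance(beta_exp, beta_out))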
In defence of model-based inference in phylogeography
Beaumont, Mark A.; Nielsen, Rasmus; Robert, Christian; Hey, Jody; Gaggiotti, Oscar; Knowles, Lacey; Estoup, Arnaud; Panchal, Mahesh; Corander, Jukka; Hickerson, Mike; Sisson, Scott A.; Fagundes, Nelson; Chikhi, Lounès; Beerli, Peter; Vitalis, Renaud; Cornuet, Jean-Marie; Huelsenbeck, John; Foll, Matthieu; Yang, Ziheng; Rousset, Francois; Balding, David; Excoffier, Laurent
2017-01-01
Recent papers have promoted the view that model-based methods in general, and those based on Approximate Bayesian Computation (ABC) in particular, are flawed in a number of ways, and are therefore inappropriate for the analysis of phylogeographic data. These papers further argue that Nested Clade Phylogeographic Analysis (NCPA) offers the best approach in statistical phylogeography. In order to remove the confusion and misconceptions introduced by these papers, we justify and explain the reasoning behind model-based inference. We argue that ABC is a statistically valid approach, alongside other computational statistical techniques that have been successfully used to infer parameters and compare models in population genetics. We also examine the NCPA method and highlight numerous deficiencies, whether it is used with single or multiple loci. We further show that the ages of clades are carelessly used to infer the ages of demographic events: these ages are estimated under a simple model of panmixia and population stationarity, but are then used under different and unspecified models to test hypotheses, a usage that invalidates these testing procedures. We conclude by encouraging researchers to study and use model-based inference in population genetics. PMID:29284924
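As a minimal sketch of the core accept/reject logic behind ABC (real phylogeographic applications use coalescent simulators, multiple summary statistics, and more efficient samplers), the toy below infers a binomial success probability from a single summary statistic; the tolerance and sample sizes are illustrative:

    import numpy as np

    def rejection_abc(observed_summary, prior_sampler, simulator, summary, n_draws=100_000, tol=0.05):
        """Basic rejection ABC: keep parameter draws whose simulated summaries land near the data."""
        accepted = []
        for _ in range(n_draws):
            theta = prior_sampler()
            s = summary(simulator(theta))
            if abs(s - observed_summary) < tol:
                accepted.append(theta)
        return np.array(accepted)

    # toy example: infer the success probability of 100 Bernoulli trials from the observed mean
    rng = np.random.default_rng(3)
    data = rng.binomial(1, 0.3, size=100)
    posterior = rejection_abc(
        observed_summary=data.mean(),
        prior_sampler=lambda: rng.uniform(0, 1),
        simulator=lambda p: rng.binomial(1, p, size=100),
        summary=lambda x: x.mean(),
    )
    print(posterior.mean(), np.percentile(posterior, [2.5, 97.5]))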
Statistical inference for remote sensing-based estimates of net deforestation
Ronald E. McRoberts; Brian F. Walters
2012-01-01
Statistical inference requires expression of an estimate in probabilistic terms, usually in the form of a confidence interval. An approach to constructing confidence intervals for remote sensing-based estimates of net deforestation is illustrated. The approach is based on post-classification methods using two independent forest/non-forest classifications because...
Steingroever, Helen; Pachur, Thorsten; Šmíra, Martin; Lee, Michael D
2018-06-01
The Iowa Gambling Task (IGT) is one of the most popular experimental paradigms for comparing complex decision-making across groups. Most commonly, IGT behavior is analyzed using frequentist tests to compare performance across groups, and to compare inferred parameters of cognitive models developed for the IGT. Here, we present a Bayesian alternative based on Bayesian repeated-measures ANOVA for comparing performance, and a suite of three complementary model-based methods for assessing the cognitive processes underlying IGT performance. The three model-based methods involve Bayesian hierarchical parameter estimation, Bayes factor model comparison, and Bayesian latent-mixture modeling. We illustrate these Bayesian methods by applying them to test the extent to which differences in intuitive versus deliberate decision style are associated with differences in IGT performance. The results show that intuitive and deliberate decision-makers behave similarly on the IGT, and the modeling analyses consistently suggest that both groups of decision-makers rely on similar cognitive processes. Our results challenge the notion that individual differences in intuitive and deliberate decision styles have a broad impact on decision-making. They also highlight the advantages of Bayesian methods, especially their ability to quantify evidence in favor of the null hypothesis, and that they allow model-based analyses to incorporate hierarchical and latent-mixture structures.
Software and package applicating for network meta-analysis: A usage-based comparative study.
Xu, Chang; Niu, Yuming; Wu, Junyi; Gu, Huiyun; Zhang, Chao
2017-12-21
To compare and analyze the characteristics and functions of software applications for network meta-analysis (NMA). PubMed, EMbase, The Cochrane Library, the official websites of Bayesian inference Using Gibbs Sampling (BUGS), Stata and R, and Google were searched to collect software and packages for performing NMA; software and packages published up to March 2016 were included. After collecting the software, packages, and their user guides, we used the software and packages to calculate a typical example. All characteristics, functions, and computed results were compared and analyzed. Ten types of software were included, spanning both programming and non-programming tools. They were developed mainly on the basis of Bayesian or frequentist theory. Most types of software are easy to operate and master, calculate accurately, or produce excellent graphics; however, no single program combined accurate calculation with superior graphing, which could only be achieved by combining two or more types of software. This study suggests that users should choose software according to their programming background, operational habits, and budget; a combination of BUGS and R (or Stata) can then be considered for performing the NMA. © 2017 Chinese Cochrane Center, West China Hospital of Sichuan University and John Wiley & Sons Australia, Ltd.
Harrigan, George G; Harrison, Jay M
2012-01-01
New transgenic (GM) crops are subjected to extensive safety assessments that include compositional comparisons with conventional counterparts as a cornerstone of the process. The influence of germplasm, location, environment, and agronomic treatments on compositional variability is, however, often obscured in these pair-wise comparisons. Furthermore, classical statistical significance testing can often provide an incomplete and over-simplified summary of highly responsive variables such as crop composition. In order to more clearly describe the influence of the numerous sources of compositional variation we present an introduction to two alternative but complementary approaches to data analysis and interpretation. These include i) exploratory data analysis (EDA) with its emphasis on visualization and graphics-based approaches and ii) Bayesian statistical methodology that provides easily interpretable and meaningful evaluations of data in terms of probability distributions. The EDA case-studies include analyses of herbicide-tolerant GM soybean and insect-protected GM maize and soybean. Bayesian approaches are presented in an analysis of herbicide-tolerant GM soybean. Advantages of these approaches over classical frequentist significance testing include the more direct interpretation of results in terms of probabilities pertaining to quantities of interest and no confusion over the application of corrections for multiple comparisons. It is concluded that a standardized framework for these methodologies could provide specific advantages through enhanced clarity of presentation and interpretation in comparative assessments of crop composition.
Thermodynamics of statistical inference by cells.
Lang, Alex H; Fisher, Charles K; Mora, Thierry; Mehta, Pankaj
2014-10-03
The deep connection between thermodynamics, computation, and information is now well established both theoretically and experimentally. Here, we extend these ideas to show that thermodynamics also places fundamental constraints on statistical estimation and learning. To do so, we investigate the constraints placed by (nonequilibrium) thermodynamics on the ability of biochemical signaling networks to estimate the concentration of an external signal. We show that accuracy is limited by energy consumption, suggesting that there are fundamental thermodynamic constraints on statistical inference.
Jiang, Zhehan; Skorupski, William
2017-12-12
In many behavioral research areas, multivariate generalizability theory (mG theory) has typically been used to investigate the reliability of certain multidimensional assessments. However, traditional mG-theory estimation, namely frequentist approaches, has limits that prevent researchers from taking full advantage of the information that mG theory can offer regarding the reliability of measurements. Alternatively, Bayesian methods provide more information than frequentist approaches can offer. This article presents instructional guidelines on how to implement mG-theory analyses in a Bayesian framework; in particular, BUGS code is presented to fit commonly seen designs from mG theory, including single-facet designs, two-facet crossed designs, and two-facet nested designs. In addition to concrete examples that are closely related to the selected designs and the corresponding BUGS code, a simulated dataset is provided to demonstrate the utility and advantages of the Bayesian approach. This article is intended to serve as a tutorial reference for applied researchers and methodologists conducting mG-theory studies.
ERIC Educational Resources Information Center
Noll, Jennifer; Hancock, Stacey
2015-01-01
This research investigates what students' use of statistical language can tell us about their conceptions of distribution and sampling in relation to informal inference. Prior research documents students' challenges in understanding ideas of distribution and sampling as tools for making informal statistical inferences. We know that these…
Multi-Agent Inference in Social Networks: A Finite Population Learning Approach
Tong, Xin; Zeng, Yao
2016-01-01
When people in a society want to make inference about some parameter, each person may want to use data collected by other people. Information (data) exchange in social networks is usually costly, so to make reliable statistical decisions, people need to trade off the benefits and costs of information acquisition. Conflicts of interests and coordination problems will arise in the process. Classical statistics does not consider people’s incentives and interactions in the data collection process. To address this imperfection, this work explores multi-agent Bayesian inference problems with a game theoretic social network model. Motivated by our interest in aggregate inference at the societal level, we propose a new concept, finite population learning, to address whether with high probability, a large fraction of people in a given finite population network can make “good” inference. Serving as a foundation, this concept enables us to study the long run trend of aggregate inference quality as population grows. PMID:27076691
Ganju, Jitendra; Yu, Xinxin; Ma, Guoguang Julie
2013-01-01
Formal inference in randomized clinical trials is based on controlling the type I error rate associated with a single pre-specified statistic. The deficiency of using just one method of analysis is that it depends on assumptions that may not be met. For robust inference, we propose pre-specifying multiple test statistics and relying on the minimum p-value for testing the null hypothesis of no treatment effect. The null hypothesis associated with the various test statistics is that the treatment groups are indistinguishable. The critical value for hypothesis testing comes from permutation distributions. Rejection of the null hypothesis when the smallest p-value is less than the critical value controls the type I error rate at its designated value. Even if one of the candidate test statistics has low power, the adverse effect on the power of the minimum p-value statistic is small. Its use is illustrated with examples. We conclude that it is better to rely on the minimum p-value rather than a single statistic, particularly when that single statistic is the logrank test, because of the cost and complexity of many survival trials. Copyright © 2013 John Wiley & Sons, Ltd.
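A hedged sketch of the general recipe (the observed minimum p-value referred to the permutation distribution of the minimum p-value) is given below. The paper targets survival trials with the logrank test among the candidates, whereas this toy uses a t test and a Mann-Whitney test on two-sample data; sample sizes and effect size are illustrative:

    import numpy as np
    from scipy import stats

    def min_p_permutation_test(x, y, stat_fns, n_perm=2000, seed=0):
        """Permutation test that rejects based on the minimum p-value over several statistics."""
        rng = np.random.default_rng(seed)
        pooled = np.concatenate([x, y])
        n_x = len(x)

        def min_p(a, b):
            return min(fn(a, b) for fn in stat_fns)

        observed = min_p(x, y)
        perm_stats = []
        for _ in range(n_perm):
            perm = rng.permutation(pooled)
            perm_stats.append(min_p(perm[:n_x], perm[n_x:]))
        # the permutation distribution of the minimum p-value supplies the critical value
        return np.mean(np.array(perm_stats) <= observed)

    stat_fns = [
        lambda a, b: stats.ttest_ind(a, b).pvalue,
        lambda a, b: stats.mannwhitneyu(a, b).pvalue,
    ]
    rng = np.random.default_rng(1)
    treat, control = rng.normal(0.4, 1, 60), rng.normal(0.0, 1, 60)
    print(min_p_permutation_test(treat, control, stat_fns))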
Ensemble stacking mitigates biases in inference of synaptic connectivity.
Chambers, Brendan; Levy, Maayan; Dechery, Joseph B; MacLean, Jason N
2018-01-01
A promising alternative to directly measuring the anatomical connections in a neuronal population is inferring the connections from the activity. We employ simulated spiking neuronal networks to compare and contrast commonly used inference methods that identify likely excitatory synaptic connections using statistical regularities in spike timing. We find that simple adjustments to standard algorithms improve inference accuracy: A signing procedure improves the power of unsigned mutual-information-based approaches and a correction that accounts for differences in mean and variance of background timing relationships, such as those expected to be induced by heterogeneous firing rates, increases the sensitivity of frequency-based methods. We also find that different inference methods reveal distinct subsets of the synaptic network and each method exhibits different biases in the accurate detection of reciprocity and local clustering. To correct for errors and biases specific to single inference algorithms, we combine methods into an ensemble. Ensemble predictions, generated as a linear combination of multiple inference algorithms, are more sensitive than the best individual measures alone, and are more faithful to ground-truth statistics of connectivity, mitigating biases specific to single inference methods. These weightings generalize across simulated datasets, emphasizing the potential for the broad utility of ensemble-based approaches.
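A minimal sketch of the stacking idea, assuming synthetic per-pair scores in place of the paper's actual correlation, mutual-information, and frequency measures: a linear combination of single-method scores is learned against ground-truth connectivity from simulated networks, and the fitted weights would then be reused on held-out simulations or experimental data. All names and distributions are illustrative:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(4)
    n_pairs = 5000
    truth = rng.binomial(1, 0.1, size=n_pairs)   # ground-truth synapses in a simulated network

    # hypothetical per-pair scores from three single inference methods, each noisy differently
    scores = np.column_stack([
        truth * rng.normal(1.0, 0.7, n_pairs) + rng.normal(0, 0.7, n_pairs)
        for _ in range(3)
    ])

    # stack the methods: a linear combination learned against ground truth on simulated data
    stacker = LogisticRegression().fit(scores, truth)
    ensemble_score = stacker.predict_proba(scores)[:, 1]
    print(stacker.coef_)  # these weights would be applied to held-out data in practice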
Bayesian inference of physiologically meaningful parameters from body sway measurements.
Tietäväinen, A; Gutmann, M U; Keski-Vakkuri, E; Corander, J; Hæggström, E
2017-06-19
The control of human body sway by the central nervous system, muscles, and conscious brain is of interest because body sway carries information about a person's physiological status. Several models have been proposed to describe body sway in an upright standing position; however, due to the statistical intractability of the more realistic models, no formal parameter inference has previously been conducted, and the expressive power of such models for real human subjects remains unknown. Using the latest advances in Bayesian statistical inference for intractable models, we fitted a nonlinear control model to posturographic measurements, and we showed that it can accurately predict the sway characteristics of both simulated and real subjects. Our method provides a full statistical characterization of the uncertainty related to all model parameters, as quantified by posterior probability density functions, which is useful for comparisons across subjects and test settings. The ability to infer intractable control models from sensor data opens new possibilities for monitoring and predicting body status in health applications.
Research participant compensation: A matter of statistical inference as well as ethics.
Swanson, David M; Betensky, Rebecca A
2015-11-01
The ethics of compensation of research subjects for participation in clinical trials has been debated for years. One ethical issue of concern is variation among subjects in the level of compensation for identical treatments. Surprisingly, the impact of variation on the statistical inferences made from trial results has not been examined. We seek to identify how variation in compensation may influence any existing dependent censoring in clinical trials, thereby also influencing inference about the survival curve, hazard ratio, or other measures of treatment efficacy. In simulation studies, we consider a model for how compensation structure may influence the censoring model. Under existing dependent censoring, we estimate survival curves under different compensation structures and observe how these structures induce variability in the estimates. We show through this model that if the compensation structure affects the censoring model and dependent censoring is present, then variation in that structure induces variation in the estimates and affects the accuracy of estimation and inference on treatment efficacy. From the perspectives of both ethics and statistical inference, standardization and transparency in the compensation of participants in clinical trials are warranted. Copyright © 2015 Elsevier Inc. All rights reserved.
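A minimal simulation sketch of the mechanism described above, under assumed (not the authors') parameter values: a latent frailty drives both event times and, when censoring depends on it, drop-out, so Kaplan-Meier estimates computed under two hypothetical compensation structures differ even though the true event-time distribution is identical:

    import numpy as np

    def km_survival(time, event):
        """Kaplan-Meier estimate at each distinct event time."""
        order = np.argsort(time)
        time, event = time[order], event[order]
        surv, s = [], 1.0
        for t in np.unique(time[event == 1]):
            at_risk = np.sum(time >= t)
            d = np.sum((time == t) & (event == 1))
            s *= 1.0 - d / at_risk
            surv.append((t, s))
        return np.array(surv)

    rng = np.random.default_rng(5)
    n = 2000
    frailty = rng.normal(size=n)                        # latent health status
    event_time = rng.exponential(np.exp(-0.5 * frailty))  # sicker subjects fail sooner

    def simulate(censor_dependence):
        # higher dependence: sicker subjects drop out sooner, e.g. when compensation
        # is too low to retain them
        censor_time = rng.exponential(np.exp(-censor_dependence * frailty))
        time = np.minimum(event_time, censor_time)
        event = (event_time <= censor_time).astype(int)
        return km_survival(time, event)

    dependent_cens, independent_cens = simulate(1.0), simulate(0.0)
    # the two KM curves differ although the underlying event-time distribution is identical
    print(dependent_cens[-1], independent_cens[-1])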
ERIC Educational Resources Information Center
Kazak, Sibel; Pratt, Dave
2017-01-01
This study considers probability models as tools for both making informal statistical inferences and building stronger conceptual connections between data and chance topics in teaching statistics. In this paper, we aim to explore pre-service mathematics teachers' use of probability models for a chance game, where the sum of two dice matters in…
Phylogeography Takes a Relaxed Random Walk in Continuous Space and Time
Lemey, Philippe; Rambaut, Andrew; Welch, John J.; Suchard, Marc A.
2010-01-01
Research aimed at understanding the geographic context of evolutionary histories is burgeoning across biological disciplines. Recent endeavors attempt to interpret contemporaneous genetic variation in the light of increasingly detailed geographical and environmental observations. Such interest has promoted the development of phylogeographic inference techniques that explicitly aim to integrate such heterogeneous data. One promising development involves reconstructing phylogeographic history on a continuous landscape. Here, we present a Bayesian statistical approach to infer continuous phylogeographic diffusion using random walk models while simultaneously reconstructing the evolutionary history in time from molecular sequence data. Moreover, by accommodating branch-specific variation in dispersal rates, we relax the most restrictive assumption of the standard Brownian diffusion process and demonstrate increased statistical efficiency in spatial reconstructions of overdispersed random walks by analyzing both simulated and real viral genetic data. We further illustrate how drawing inference about summary statistics from a fully specified stochastic process over both sequence evolution and spatial movement reveals important characteristics of a rabies epidemic. Together with recent advances in discrete phylogeographic inference, the continuous model developments furnish a flexible statistical framework for biogeographical reconstructions that is easily expanded upon to accommodate various landscape genetic features. PMID:20203288
Variation in reaction norms: Statistical considerations and biological interpretation.
Morrissey, Michael B; Liefting, Maartje
2016-09-01
Analysis of reaction norms, the functions by which the phenotype produced by a given genotype depends on the environment, is critical to studying many aspects of phenotypic evolution. Different techniques are available for quantifying different aspects of reaction norm variation. We examine what biological inferences can be drawn from some of the more readily applicable analyses for studying reaction norms. We adopt a strongly biologically motivated view, but draw on statistical theory to highlight strengths and drawbacks of different techniques. In particular, consideration of some formal statistical theory leads to revision of some recently, and forcefully, advocated opinions on reaction norm analysis. We clarify what simple analysis of the slope between mean phenotype in two environments can tell us about reaction norms, explore the conditions under which polynomial regression can provide robust inferences about reaction norm shape, and explore how different existing approaches may be used to draw inferences about variation in reaction norm shape. We show how mixed model-based approaches can provide more robust inferences than more commonly used multistep statistical approaches, and derive new metrics of the relative importance of variation in reaction norm intercepts, slopes, and curvatures. © 2016 The Author(s). Evolution © 2016 The Society for the Study of Evolution.
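A minimal sketch of the mixed-model (random-regression) approach for linear reaction norms, using statsmodels on simulated data; it omits the curvature terms and the new relative-importance metrics discussed in the paper, and all parameter values are illustrative:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # toy data: 40 genotypes measured across 10 environments; each genotype has its own
    # reaction-norm intercept and slope (linear norms for simplicity)
    rng = np.random.default_rng(6)
    genotypes = np.repeat(np.arange(40), 10)
    env = np.tile(np.linspace(-1, 1, 10), 40)
    intercepts = rng.normal(0.0, 1.0, 40)[genotypes]
    slopes = rng.normal(0.5, 0.3, 40)[genotypes]
    phenotype = intercepts + slopes * env + rng.normal(0.0, 0.5, env.size)
    df = pd.DataFrame({"genotype": genotypes, "env": env, "phenotype": phenotype})

    # random intercept + slope model: the estimated variance components quantify
    # variation in reaction-norm elevation and slope across genotypes
    model = smf.mixedlm("phenotype ~ env", df, groups=df["genotype"], re_formula="~env")
    fit = model.fit()
    print(fit.cov_re)   # (co)variances of intercepts and slopes among genotypes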
On the importance of avoiding shortcuts in applying cognitive models to hierarchical data.
Boehm, Udo; Marsman, Maarten; Matzke, Dora; Wagenmakers, Eric-Jan
2018-06-12
Psychological experiments often yield data that are hierarchically structured. A number of popular shortcut strategies in cognitive modeling do not properly accommodate this structure and can result in biased conclusions. To gauge the severity of these biases, we conducted a simulation study for a two-group experiment. We first considered a modeling strategy that ignores the hierarchical data structure. In line with theoretical results, our simulations showed that Bayesian and frequentist methods that rely on this strategy are biased towards the null hypothesis. Secondly, we considered a modeling strategy that takes a two-step approach by first obtaining participant-level estimates from a hierarchical cognitive model and subsequently using these estimates in a follow-up statistical test. Methods that rely on this strategy are biased towards the alternative hypothesis. Only hierarchical models of the multilevel data lead to correct conclusions. Our results are particularly relevant for the use of hierarchical Bayesian parameter estimates in cognitive modeling.
Norris, David C; Wilson, Andrew
2016-01-01
In a 2014 report on adolescent mental health outcomes in the Moving to Opportunity for Fair Housing Demonstration (MTO), Kessler et al. reported that, at 10- to 15-year follow-up, boys from households randomized to an experimental housing voucher intervention experienced 12-month prevalence of post-traumatic stress disorder (PTSD) at several times the rate of boys from control households. We reanalyze this finding here, bringing to light a PTSD outcome imputation procedure used in the original analysis, but not described in the study report. By bootstrapping with repeated draws from the frequentist sampling distribution of the imputation model used by Kessler et al., and by varying two pseudorandom number generator seeds that fed their analysis, we account for several purely statistical components of the uncertainty inherent in their imputation procedure. We also discuss other sources of uncertainty in this procedure that were not accessible to a formal reanalysis.
Frequentist Analysis of SLAC Rosenbluth Data
NASA Astrophysics Data System (ADS)
Higinbotham, Douglas; McClellan, Evan; Shamaiengar, Stephen
2017-01-01
Analysis of the SLAC NE-11 elastic electron-proton scattering data typically assumes that the 1.6 GeV spectrometer has a systematic normalization offset as compared to the well-known 8 GeV spectrometer, yet such an offset should have been observed globally. A review of doctoral theses from the period finds that analyses of high-statistics inelastic data saw no significant normalization difference. Moreover, the unique kinematics used to match the two spectrometers for normalization required the 8 GeV spectrometer to be rotated beyond its well-understood angular range. We try to quantify the confidence level for rejecting the null hypothesis, i.e. that the 1.6 GeV spectrometer normalization is correct, and will show the result of simply analyzing the cross section data as obtained. This is a critical study, as the 1.6 GeV spectrometer data drive the epsilon lever arm in Rosenbluth extractions and can therefore have a significant impact on form factor extractions at high momentum transfer.
The Role of Probability-Based Inference in an Intelligent Tutoring System.
ERIC Educational Resources Information Center
Mislevy, Robert J.; Gitomer, Drew H.
Probability-based inference in complex networks of interdependent variables is an active topic in statistical research, spurred by such diverse applications as forecasting, pedigree analysis, troubleshooting, and medical diagnosis. This paper concerns the role of Bayesian inference networks for updating student models in intelligent tutoring…
NASA Astrophysics Data System (ADS)
Albert, Carlo; Ulzega, Simone; Stoop, Ruedi
2016-04-01
Measured time-series of both precipitation and runoff are known to exhibit highly non-trivial statistical properties. For making reliable probabilistic predictions in hydrology, it is therefore desirable to have stochastic models with output distributions that share these properties. When parameters of such models have to be inferred from data, we also need to quantify the associated parametric uncertainty. For non-trivial stochastic models, however, this latter step is typically very demanding, both conceptually and numerically, and is almost never done in hydrology. Here, we demonstrate that methods developed in statistical physics make a large class of stochastic differential equation (SDE) models amenable to a full-fledged Bayesian parameter inference. For concreteness we demonstrate these methods by means of a simple yet non-trivial toy SDE model. We consider a natural catchment that can be described by a linear reservoir, at the scale of observation. All the neglected processes are assumed to happen at much shorter time-scales and are therefore modeled with a Gaussian white noise term, the standard deviation of which is assumed to scale linearly with the system state (water volume in the catchment). Even for constant input, the outputs of this simple non-linear SDE model show a wealth of desirable statistical properties, such as fat-tailed distributions and long-range correlations. Standard algorithms for Bayesian inference fail for models of this kind because their likelihood functions are extremely high-dimensional intractable integrals over all possible model realizations. The use of Kalman filters is illegitimate due to the non-linearity of the model. Particle filters could be used but become increasingly inefficient as the number of data points grows. Hamiltonian Monte Carlo algorithms allow us to translate this inference problem into the problem of simulating the dynamics of a statistical mechanics system, and give us access to the most sophisticated methods developed in the statistical physics community over the last few decades. We demonstrate that such methods, along with automatic differentiation algorithms, allow us to perform a full-fledged Bayesian inference, for a large class of SDE models, in a highly efficient and largely automated manner. Furthermore, our algorithm is highly parallelizable. For our toy model, discretized with a few hundred points, a full Bayesian inference can be performed in a matter of seconds on a standard PC.
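A minimal forward simulation of the toy model described above, via an Euler-Maruyama discretisation of a linear reservoir with state-proportional noise; the parameter values are illustrative, and the Hamiltonian Monte Carlo inference itself is far more involved than this sketch:

    import numpy as np

    # dS = (r - k*S) dt + gamma * S dW : linear reservoir with constant input r,
    # outflow rate k, and white noise whose amplitude scales with the state S
    rng = np.random.default_rng(7)
    r, k, gamma = 1.0, 0.1, 0.3       # illustrative parameter values
    dt, n_steps = 0.01, 200_000
    S = np.empty(n_steps)
    S[0] = r / k                      # start at the deterministic steady state
    for i in range(1, n_steps):
        dW = rng.normal(0.0, np.sqrt(dt))
        S[i] = S[i - 1] + (r - k * S[i - 1]) * dt + gamma * S[i - 1] * dW
        S[i] = max(S[i], 0.0)         # water volume cannot become negative

    runoff = k * S
    print(np.mean(runoff), np.percentile(runoff, 99))  # heavy right tail despite constant input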
NASA Astrophysics Data System (ADS)
Bakker, Arthur; Ben-Zvi, Dani; Makar, Katie
2017-12-01
To understand how statistical and other types of reasoning are coordinated with actions to reduce uncertainty, we conducted a case study in vocational education that involved statistical hypothesis testing. We analyzed an intern's research project in a hospital laboratory in which reducing uncertainties was crucial to make a valid statistical inference. In his project, the intern, Sam, investigated whether patients' blood could be sent through pneumatic post without influencing the measurement of particular blood components. We asked, in the process of making a statistical inference, how are reasons and actions coordinated to reduce uncertainty? For the analysis, we used the semantic theory of inferentialism, specifically, the concept of webs of reasons and actions—complexes of interconnected reasons for facts and actions; these reasons include premises and conclusions, inferential relations, implications, motives for action, and utility of tools for specific purposes in a particular context. Analysis of interviews with Sam, his supervisor and teacher as well as video data of Sam in the classroom showed that many of Sam's actions aimed to reduce variability, rule out errors, and thus reduce uncertainties so as to arrive at a valid inference. Interestingly, the decisive factor was not the outcome of a t test but of the reference change value, a clinical chemical measure of analytic and biological variability. With insights from this case study, we expect that students can be better supported in connecting statistics with context and in dealing with uncertainty.
PyClone: statistical inference of clonal population structure in cancer.
Roth, Andrew; Khattra, Jaswinder; Yap, Damian; Wan, Adrian; Laks, Emma; Biele, Justina; Ha, Gavin; Aparicio, Samuel; Bouchard-Côté, Alexandre; Shah, Sohrab P
2014-04-01
We introduce PyClone, a statistical model for inference of clonal population structures in cancers. PyClone is a Bayesian clustering method for grouping sets of deeply sequenced somatic mutations into putative clonal clusters while estimating their cellular prevalences and accounting for allelic imbalances introduced by segmental copy-number changes and normal-cell contamination. Single-cell sequencing validation demonstrates PyClone's accuracy.
Statistical Signal Models and Algorithms for Image Analysis
1984-10-25
In this report, two-dimensional stochastic linear models are used in developing algorithms for image analysis such as classification, segmentation, and object detection in images characterized by textured backgrounds. These models generate two-dimensional random processes as outputs to which statistical inference procedures can naturally be applied. A common thread throughout our algorithms is the interpretation of the inference procedures in terms of linear prediction
Nabi, Razieh; Shpitser, Ilya
2017-01-01
In this paper, we consider the problem of fair statistical inference involving outcome variables. Examples include classification and regression problems, and estimating treatment effects in randomized trials or observational data. The issue of fairness arises in such problems where some covariates or treatments are “sensitive,” in the sense of having the potential to create discrimination. We argue that the presence of discrimination can be formalized in a sensible way as the presence of an effect of a sensitive covariate on the outcome along certain causal pathways, a view that generalizes that of Pearl (2009). A fair outcome model can then be learned by solving a constrained optimization problem. We discuss a number of complications that arise in classical statistical inference due to this view and provide workarounds based on recent work in causal and semi-parametric inference.
P values are only an index to evidence: 20th- vs. 21st-century statistical science.
Burnham, K P; Anderson, D R
2014-03-01
Early statistical methods focused on pre-data probability statements (i.e., data as random variables) such as P values; these are not really inferences, nor are P values evidential. Statistical science clung to these principles throughout much of the 20th century as a wide variety of methods were developed for special cases. Looking back, it is clear that the underlying paradigm (i.e., testing and P values) was weak. As Kuhn (1970) suggests, new paradigms have taken the place of earlier ones: this is a goal of good science. New methods have been developed and older methods extended, and these allow proper measures of strength of evidence and multimodel inference. It is time to move forward with sound theory and practice for the difficult practical problems that lie ahead. Given data, the useful foundation shifts to post-data probability statements such as model probabilities (Akaike weights) or related quantities such as odds ratios and likelihood intervals. These new methods allow formal inference from multiple models in the a priori set. These quantities are properly evidential. The past century was aimed at finding the "best" model and making inferences from it. The goal in the 21st century is to base inference on all the models weighted by their model probabilities (model averaging). Estimates of precision can include model selection uncertainty, leading to variances conditional on the model set. The 21st century will be about the quantification of information, proper measures of evidence, and multi-model inference. Nelder (1999:261) concludes, "The most important task before us in developing statistical science is to demolish the P-value culture, which has taken root to a frightening extent in many areas of both pure and applied science and technology".
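A short sketch of the post-data quantities the authors advocate, under the standard definitions: Akaike weights computed from AIC differences, and a model-averaged estimate across the candidate set. The AIC values and parameter estimates below are hypothetical:

    import numpy as np

    def akaike_weights(aic_values):
        """Model probabilities (Akaike weights) from a set of AIC values."""
        aic = np.asarray(aic_values, dtype=float)
        delta = aic - aic.min()                  # AIC differences relative to the best model
        rel_likelihood = np.exp(-0.5 * delta)    # relative likelihood of each model
        return rel_likelihood / rel_likelihood.sum()

    def model_averaged_estimate(estimates, weights):
        """Model-averaged point estimate across the candidate set."""
        return np.dot(weights, estimates)

    aic = [102.3, 104.1, 110.8]                  # hypothetical AICs for three candidate models
    w = akaike_weights(aic)
    print(w, model_averaged_estimate([1.9, 2.4, 3.1], w))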
Statistical inference of the generation probability of T-cell receptors from sequence repertoires.
Murugan, Anand; Mora, Thierry; Walczak, Aleksandra M; Callan, Curtis G
2012-10-02
Stochastic rearrangement of germline V-, D-, and J-genes to create variable coding sequence for certain cell surface receptors is at the origin of immune system diversity. This process, known as "VDJ recombination", is implemented via a series of stochastic molecular events involving gene choices and random nucleotide insertions between, and deletions from, genes. We use large sequence repertoires of the variable CDR3 region of human CD4+ T-cell receptor beta chains to infer the statistical properties of these basic biochemical events. Because any given CDR3 sequence can be produced in multiple ways, the probability distribution of hidden recombination events cannot be inferred directly from the observed sequences; we therefore develop a maximum likelihood inference method to achieve this end. To separate the properties of the molecular rearrangement mechanism from the effects of selection, we focus on nonproductive CDR3 sequences in T-cell DNA. We infer the joint distribution of the various generative events that occur when a new T-cell receptor gene is created. We find a rich picture of correlation (and absence thereof), providing insight into the molecular mechanisms involved. The generative event statistics are consistent between individuals, suggesting a universal biochemical process. Our probabilistic model predicts the generation probability of any specific CDR3 sequence by the primitive recombination process, allowing us to quantify the potential diversity of the T-cell repertoire and to understand why some sequences are shared between individuals. We argue that the use of formal statistical inference methods, of the kind presented in this paper, will be essential for quantitative understanding of the generation and evolution of diversity in the adaptive immune system.
Emmert-Streib, Frank; Glazko, Galina V.; Altay, Gökmen; de Matos Simoes, Ricardo
2012-01-01
In this paper, we present a systematic and conceptual overview of methods for inferring gene regulatory networks from observational gene expression data. Further, we discuss two classic approaches to infer causal structures and compare them with contemporary methods by providing a conceptual categorization thereof. We complement the above by surveying global and local evaluation measures for assessing the performance of inference algorithms. PMID:22408642
Information Entropy Production of Maximum Entropy Markov Chains from Spike Trains
NASA Astrophysics Data System (ADS)
Cofré, Rodrigo; Maldonado, Cesar
2018-01-01
We consider the maximum entropy Markov chain inference approach to characterize the collective statistics of neuronal spike trains, focusing on the statistical properties of the inferred model. We review large deviations techniques useful in this context to describe properties of accuracy and convergence in terms of sampling size. We use these results to study the statistical fluctuation of correlations, distinguishability and irreversibility of maximum entropy Markov chains. We illustrate these applications using simple examples where the large deviation rate function is explicitly obtained for maximum entropy models of relevance in this field.
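A minimal sketch of the entropy-production computation for a stationary Markov chain, using the standard definition (it is zero exactly when detailed balance holds). In the spike-train setting the transition matrix would come from the inferred maximum entropy Markov chain, whereas here it is a small hand-picked irreversible chain:

    import numpy as np

    def stationary_distribution(P):
        """Left eigenvector of the transition matrix with eigenvalue 1, normalized."""
        vals, vecs = np.linalg.eig(P.T)
        pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
        return pi / pi.sum()

    def entropy_production_rate(P):
        """Entropy production of a stationary Markov chain: sum of pi(x)P(x,y) log of the
        ratio of forward to backward probability fluxes."""
        pi = stationary_distribution(P)
        ep = 0.0
        for x in range(P.shape[0]):
            for y in range(P.shape[0]):
                if P[x, y] > 0 and P[y, x] > 0:
                    flux = pi[x] * P[x, y]
                    ep += flux * np.log(flux / (pi[y] * P[y, x]))
        return ep

    # a small irreversible chain: the cyclic bias makes the entropy production strictly positive
    P = np.array([[0.1, 0.8, 0.1],
                  [0.1, 0.1, 0.8],
                  [0.8, 0.1, 0.1]])
    print(entropy_production_rate(P))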
Linde, Klaus; Rücker, Gerta; Schneider, Antonius; Kriston, Levente
2016-03-01
We aimed to evaluate the underlying assumptions of a network meta-analysis investigating which depression treatment works best in primary care and to highlight challenges and pitfalls of interpretation under consideration of these assumptions. We reviewed 100 randomized trials investigating pharmacologic and psychological treatments for primary care patients with depression. Network meta-analysis was carried out within a frequentist framework using response to treatment as outcome measure. Transitivity was assessed by epidemiologic judgment based on theoretical and empirical investigation of the distribution of trial characteristics across comparisons. Homogeneity and consistency were investigated by decomposing the Q statistic. There were important clinical and statistically significant differences between "pure" drug trials comparing pharmacologic substances with each other or placebo (63 trials) and trials including a psychological treatment arm (37 trials). Overall network meta-analysis produced results well comparable with separate meta-analyses of drug trials and psychological trials. Although the homogeneity and consistency assumptions were mostly met, we considered the transitivity assumption unjustifiable. An exchange of experience between reviewers and, if possible, some guidance on how reviewers addressing important clinical questions can proceed in situations where important assumptions for valid network meta-analysis are not met would be desirable. Copyright © 2016 Elsevier Inc. All rights reserved.
2012-01-01
Background A statistical analysis plan (SAP) is a critical link between how a clinical trial is conducted and the clinical study report. To secure objective study results, regulatory bodies expect that the SAP will meet requirements in pre-specifying inferential analyses and other important statistical techniques. To write a good SAP for model-based sensitivity and ancillary analyses involves non-trivial decisions on and justification of many aspects of the chosen setting. In particular, trials with longitudinal count data as primary endpoints pose challenges for model choice and model validation. In the random effects setting, frequentist strategies for model assessment and model diagnosis are complex and not easily implemented and have several limitations. Therefore, it is of interest to explore Bayesian alternatives which provide the needed decision support to finalize a SAP. Methods We focus on generalized linear mixed models (GLMMs) for the analysis of longitudinal count data. A series of distributions with over- and under-dispersion is considered. Additionally, the structure of the variance components is modified. We perform a simulation study to investigate the discriminatory power of Bayesian tools for model criticism in different scenarios derived from the model setting. We apply the findings to the data from an open clinical trial on vertigo attacks. These data are seen as pilot data for an ongoing phase III trial. To fit GLMMs we use a novel Bayesian computational approach based on integrated nested Laplace approximations (INLAs). The INLA methodology enables the direct computation of leave-one-out predictive distributions. These distributions are crucial for Bayesian model assessment. We evaluate competing GLMMs for longitudinal count data according to the deviance information criterion (DIC) or probability integral transform (PIT), and by using proper scoring rules (e.g. the logarithmic score). Results The instruments under study provide excellent tools for preparing decisions within the SAP in a transparent way when structuring the primary analysis, sensitivity or ancillary analyses, and specific analyses for secondary endpoints. The mean logarithmic score and DIC discriminate well between different model scenarios. It becomes obvious that the naive choice of a conventional random effects Poisson model is often inappropriate for real-life count data. The findings are used to specify an appropriate mixed model employed in the sensitivity analyses of an ongoing phase III trial. Conclusions The proposed Bayesian methods are not only appealing for inference but notably provide a sophisticated insight into different aspects of model performance, such as forecast verification or calibration checks, and can be applied within the model selection process. The mean of the logarithmic score is a robust tool for model ranking and is not sensitive to sample size. Therefore, these Bayesian model selection techniques offer helpful decision support for shaping sensitivity and ancillary analyses in a statistical analysis plan of a clinical trial with longitudinal count data as the primary endpoint. PMID:22962944
Adrion, Christine; Mansmann, Ulrich
2012-09-10
A statistical analysis plan (SAP) is a critical link between how a clinical trial is conducted and the clinical study report. To secure objective study results, regulatory bodies expect that the SAP will meet requirements in pre-specifying inferential analyses and other important statistical techniques. To write a good SAP for model-based sensitivity and ancillary analyses involves non-trivial decisions on and justification of many aspects of the chosen setting. In particular, trials with longitudinal count data as primary endpoints pose challenges for model choice and model validation. In the random effects setting, frequentist strategies for model assessment and model diagnosis are complex and not easily implemented and have several limitations. Therefore, it is of interest to explore Bayesian alternatives which provide the needed decision support to finalize a SAP. We focus on generalized linear mixed models (GLMMs) for the analysis of longitudinal count data. A series of distributions with over- and under-dispersion is considered. Additionally, the structure of the variance components is modified. We perform a simulation study to investigate the discriminatory power of Bayesian tools for model criticism in different scenarios derived from the model setting. We apply the findings to the data from an open clinical trial on vertigo attacks. These data are seen as pilot data for an ongoing phase III trial. To fit GLMMs we use a novel Bayesian computational approach based on integrated nested Laplace approximations (INLAs). The INLA methodology enables the direct computation of leave-one-out predictive distributions. These distributions are crucial for Bayesian model assessment. We evaluate competing GLMMs for longitudinal count data according to the deviance information criterion (DIC) or probability integral transform (PIT), and by using proper scoring rules (e.g. the logarithmic score). The instruments under study provide excellent tools for preparing decisions within the SAP in a transparent way when structuring the primary analysis, sensitivity or ancillary analyses, and specific analyses for secondary endpoints. The mean logarithmic score and DIC discriminate well between different model scenarios. It becomes obvious that the naive choice of a conventional random effects Poisson model is often inappropriate for real-life count data. The findings are used to specify an appropriate mixed model employed in the sensitivity analyses of an ongoing phase III trial. The proposed Bayesian methods are not only appealing for inference but notably provide a sophisticated insight into different aspects of model performance, such as forecast verification or calibration checks, and can be applied within the model selection process. The mean of the logarithmic score is a robust tool for model ranking and is not sensitive to sample size. Therefore, these Bayesian model selection techniques offer helpful decision support for shaping sensitivity and ancillary analyses in a statistical analysis plan of a clinical trial with longitudinal count data as the primary endpoint.
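A minimal sketch of scoring competing count models by the mean logarithmic score of leave-one-out predictive densities (defined here so that smaller is better). For brevity it uses maximum-likelihood plug-in predictions for a Poisson and a fixed-shape negative binomial rather than the INLA-based Bayesian predictive distributions and GLMMs of the study; all data and parameter choices are illustrative:

    import numpy as np
    from scipy import stats

    def mean_log_score_poisson(y):
        """Mean logarithmic score from exact leave-one-out predictive densities under a
        simple Poisson model with a single rate (refit on each leave-one-out subset)."""
        scores = []
        for i in range(len(y)):
            rate = np.delete(y, i).mean()                 # ML fit without observation i
            scores.append(-stats.poisson.logpmf(y[i], rate))
        return np.mean(scores)

    def mean_log_score_negbin(y, shape=2.0):
        """Same idea for a negative binomial with a fixed (assumed) shape parameter."""
        scores = []
        for i in range(len(y)):
            mean = np.delete(y, i).mean()
            p = shape / (shape + mean)                    # scipy's (n, p) parameterization
            scores.append(-stats.nbinom.logpmf(y[i], shape, p))
        return np.mean(scores)

    rng = np.random.default_rng(8)
    y = rng.negative_binomial(n=2, p=0.25, size=200)      # overdispersed counts
    print(mean_log_score_poisson(y), mean_log_score_negbin(y))  # lower mean log score is better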
Emura, Takeshi; Konno, Yoshihiko; Michimae, Hirofumi
2015-07-01
Doubly truncated data consist of samples whose observed values fall between the right- and left-truncation limits. With such samples, the distribution function of interest is estimated using the nonparametric maximum likelihood estimator (NPMLE), which is obtained through a self-consistency algorithm. Owing to the complicated asymptotic distribution of the NPMLE, the bootstrap method has been suggested for statistical inference. This paper proposes a closed-form estimator for the asymptotic covariance function of the NPMLE, which is a computationally attractive alternative to bootstrapping. Furthermore, we develop various statistical inference procedures, such as confidence intervals, goodness-of-fit tests, and confidence bands, to demonstrate the usefulness of the proposed covariance estimator. Simulations are performed to compare the proposed method with both the bootstrap and jackknife methods. The methods are illustrated using the childhood cancer dataset.
Structured statistical models of inductive reasoning.
Kemp, Charles; Tenenbaum, Joshua B
2009-01-01
Everyday inductive inferences are often guided by rich background knowledge. Formal models of induction should aim to incorporate this knowledge and should explain how different kinds of knowledge lead to the distinctive patterns of reasoning found in different inductive contexts. This article presents a Bayesian framework that attempts to meet both goals and describes [corrected] 4 applications of the framework: a taxonomic model, a spatial model, a threshold model, and a causal model. Each model makes probabilistic inferences about the extensions of novel properties, but the priors for the 4 models are defined over different kinds of structures that capture different relationships between the categories in a domain. The framework therefore shows how statistical inference can operate over structured background knowledge, and the authors argue that this interaction between structure and statistics is critical for explaining the power and flexibility of human reasoning.
Goyal, Ravi; De Gruttola, Victor
2018-01-30
Analysis of sexual history data intended to describe sexual networks presents many challenges arising from the fact that most surveys collect information on only a very small fraction of the population of interest. In addition, partners are rarely identified and responses are subject to reporting biases. Typically, each network statistic of interest, such as mean number of sexual partners for men or women, is estimated independently of other network statistics. There is, however, a complex relationship among network statistics, and knowledge of these relationships can aid in addressing the concerns mentioned earlier. We develop a novel method that constrains a posterior predictive distribution of a collection of network statistics in order to leverage the relationships among network statistics in making inference about network properties of interest. The method ensures that inference on network properties is compatible with an actual network. Through extensive simulation studies, we also demonstrate that use of this method can improve estimates in settings where there is uncertainty that arises both from sampling and from systematic reporting bias compared with currently available approaches to estimation. To illustrate the method, we apply it to estimate network statistics using data from the Chicago Health and Social Life Survey. Copyright © 2017 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Lee, K. David; Wiesenfeld, Eric; Gelfand, Andrew
2007-04-01
One of the greatest challenges in modern combat is maintaining a high level of timely Situational Awareness (SA). In many situations, computational complexity and accuracy considerations make the development and deployment of real-time, high-level inference tools very difficult. An innovative hybrid framework that combines Bayesian inference, in the form of Bayesian Networks, and Possibility Theory, in the form of Fuzzy Logic systems, has recently been introduced to provide a rigorous framework for high-level inference. In previous research, the theoretical basis and benefits of the hybrid approach have been developed. However, lacking is a concrete experimental comparison of the hybrid framework with traditional fusion methods, to demonstrate and quantify this benefit. The goal of this research, therefore, is to provide a statistical analysis on the comparison of the accuracy and performance of hybrid network theory, with pure Bayesian and Fuzzy systems and an inexact Bayesian system approximated using Particle Filtering. To accomplish this task, domain specific models will be developed under these different theoretical approaches and then evaluated, via Monte Carlo Simulation, in comparison to situational ground truth to measure accuracy and fidelity. Following this, a rigorous statistical analysis of the performance results will be performed, to quantify the benefit of hybrid inference to other fusion tools.
Statistically optimal perception and learning: from behavior to neural representations
Fiser, József; Berkes, Pietro; Orbán, Gergő; Lengyel, Máté
2010-01-01
Human perception has recently been characterized as statistical inference based on noisy and ambiguous sensory inputs. Moreover, suitable neural representations of uncertainty have been identified that could underlie such probabilistic computations. In this review, we argue that learning an internal model of the sensory environment is another key aspect of the same statistical inference procedure and thus perception and learning need to be treated jointly. We review evidence for statistically optimal learning in humans and animals, and reevaluate possible neural representations of uncertainty based on their potential to support statistically optimal learning. We propose that spontaneous activity can have a functional role in such representations leading to a new, sampling-based, framework of how the cortex represents information and uncertainty. PMID:20153683
On the analysis of very small samples of Gaussian repeated measurements: an alternative approach.
Westgate, Philip M; Burchett, Woodrow W
2017-03-15
The analysis of very small samples of Gaussian repeated measurements can be challenging. First, due to a very small number of independent subjects contributing outcomes over time, statistical power can be quite small. Second, nuisance covariance parameters must be appropriately accounted for in the analysis in order to maintain the nominal test size. However, available statistical strategies that ensure valid statistical inference may lack power, whereas more powerful methods may have the potential for inflated test sizes. Therefore, we explore an alternative approach to the analysis of very small samples of Gaussian repeated measurements, with the goal of maintaining valid inference while also improving statistical power relative to other valid methods. This approach uses generalized estimating equations with a bias-corrected empirical covariance matrix that accounts for all small-sample aspects of nuisance correlation parameter estimation in order to maintain valid inference. Furthermore, the approach utilizes correlation selection strategies with the goal of choosing the working structure that will result in the greatest power. In our study, we show that when accurate modeling of the nuisance correlation structure impacts the efficiency of regression parameter estimation, this method can improve power relative to existing methods that yield valid inference. Copyright © 2017 John Wiley & Sons, Ltd.
Statistical inference for noisy nonlinear ecological dynamic systems.
Wood, Simon N
2010-08-26
Chaotic ecological dynamic systems defy conventional statistical analysis. Systems with near-chaotic dynamics are little better. Such systems are almost invariably driven by endogenous dynamic processes plus demographic and environmental process noise, and are only observable with error. Their sensitivity to history means that minute changes in the driving noise realization, or the system parameters, will cause drastic changes in the system trajectory. This sensitivity is inherited and amplified by the joint probability density of the observable data and the process noise, rendering it useless as the basis for obtaining measures of statistical fit. Because the joint density is the basis for the fit measures used by all conventional statistical methods, this is a major theoretical shortcoming. The inability to make well-founded statistical inferences about biological dynamic models in the chaotic and near-chaotic regimes, other than on an ad hoc basis, leaves dynamic theory without the methods of quantitative validation that are essential tools in the rest of biological science. Here I show that this impasse can be resolved in a simple and general manner, using a method that requires only the ability to simulate the observed data on a system from the dynamic model about which inferences are required. The raw data series are reduced to phase-insensitive summary statistics, quantifying local dynamic structure and the distribution of observations. Simulation is used to obtain the mean and the covariance matrix of the statistics, given model parameters, allowing the construction of a 'synthetic likelihood' that assesses model fit. This likelihood can be explored using a straightforward Markov chain Monte Carlo sampler, but one further post-processing step returns pure likelihood-based inference. I apply the method to establish the dynamic nature of the fluctuations in Nicholson's classic blowfly experiments.
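The synthetic-likelihood recipe described above (simulate from the model, reduce each run to summary statistics, fit a Gaussian to those statistics, score the observed summaries) can be illustrated in a few lines. The population model, the summary statistics, and all parameter values below are invented for illustration; a real analysis would embed the synthetic log-likelihood in an MCMC sampler rather than a grid search.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def simulate(log_r, n_steps=100, sigma=0.3, n0=5.0):
    """Toy noisy population model (an illustrative stand-in for an ecological simulator)."""
    n = np.empty(n_steps)
    n[0] = n0
    for t in range(1, n_steps):
        n[t] = n[t - 1] * np.exp(log_r * (1 - n[t - 1] / 50.0) + sigma * rng.normal())
    return n

def summaries(series):
    """Phase-insensitive summary statistics of a trajectory."""
    return np.array([series.mean(), series.std(),
                     np.corrcoef(series[:-1], series[1:])[0, 1],
                     stats.skew(series)])

def synthetic_loglik(log_r, s_obs, n_rep=200):
    """Gaussian synthetic log-likelihood: simulate, summarize, fit mean and covariance."""
    S = np.array([summaries(simulate(log_r)) for _ in range(n_rep)])
    mu, cov = S.mean(axis=0), np.cov(S, rowvar=False)
    return stats.multivariate_normal.logpdf(s_obs, mean=mu, cov=cov, allow_singular=True)

observed = simulate(log_r=1.0)              # pretend these are the field data
s_obs = summaries(observed)
grid = np.linspace(0.5, 1.5, 11)
ll = [synthetic_loglik(r, s_obs) for r in grid]
print(grid[int(np.argmax(ll))])             # crude point estimate; MCMC would explore ll instead
```

Because the summaries deliberately ignore the phase of the trajectory, the synthetic likelihood stays well behaved even when the underlying dynamics are near-chaotic, which is the central point of the paper.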
A multilevel model of the impact of farm-level best management practices on phosphorus runoff
USDA-ARS?s Scientific Manuscript database
Multilevel or hierarchical models have been applied for a number of years in the social sciences but only relatively recently in the environmental sciences. These models can be developed in either a frequentist or Bayesian context and have similarities to other methods such as empirical Bayes analys...
ERIC Educational Resources Information Center
Gómez-Torres, Emilse; Batanero, Carmen; Díaz, Carmen; Contreras, José Miguel
2016-01-01
In this paper we describe the development of a questionnaire designed to assess the probability content knowledge of prospective primary school teachers. Three components of mathematical knowledge for teaching and three different meanings of probability (classical, frequentist and subjective) are considered. The questionnaire content is based on…
Small Sample Properties of Bayesian Multivariate Autoregressive Time Series Models
ERIC Educational Resources Information Center
Price, Larry R.
2012-01-01
The aim of this study was to compare the small sample (N = 1, 3, 5, 10, 15) performance of a Bayesian multivariate vector autoregressive (BVAR-SEM) time series model relative to frequentist power and parameter estimation bias. A multivariate autoregressive model was developed based on correlated autoregressive time series vectors of varying…
Probabilistic Graphical Model Representation in Phylogenetics
Höhna, Sebastian; Heath, Tracy A.; Boussau, Bastien; Landis, Michael J.; Ronquist, Fredrik; Huelsenbeck, John P.
2014-01-01
Recent years have seen a rapid expansion of the model space explored in statistical phylogenetics, emphasizing the need for new approaches to statistical model representation and software development. Clear communication and representation of the chosen model is crucial for: (i) reproducibility of an analysis, (ii) model development, and (iii) software design. Moreover, a unified, clear and understandable framework for model representation lowers the barrier for beginners and nonspecialists to grasp complex phylogenetic models, including their assumptions and parameter/variable dependencies. Graphical modeling is a unifying framework that has gained in popularity in the statistical literature in recent years. The core idea is to break complex models into conditionally independent distributions. The strength lies in the comprehensibility, flexibility, and adaptability of this formalism, and the large body of computational work based on it. Graphical models are well-suited to teach statistical models, to facilitate communication among phylogeneticists and in the development of generic software for simulation and statistical inference. Here, we provide an introduction to graphical models for phylogeneticists and extend the standard graphical model representation to the realm of phylogenetics. We introduce a new graphical model component, tree plates, to capture the changing structure of the subgraph corresponding to a phylogenetic tree. We describe a range of phylogenetic models using the graphical model framework and introduce modules to simplify the representation of standard components in large and complex models. Phylogenetic model graphs can be readily used in simulation, maximum likelihood inference, and Bayesian inference using, for example, Metropolis–Hastings or Gibbs sampling of the posterior distribution. [Computation; graphical models; inference; modularization; statistical phylogenetics; tree plate.] PMID:24951559
Protein and gene model inference based on statistical modeling in k-partite graphs.
Gerster, Sarah; Qeli, Ermir; Ahrens, Christian H; Bühlmann, Peter
2010-07-06
One of the major goals of proteomics is the comprehensive and accurate description of a proteome. Shotgun proteomics, the method of choice for the analysis of complex protein mixtures, requires that experimentally observed peptides are mapped back to the proteins they were derived from. This process is also known as protein inference. We present Markovian Inference of Proteins and Gene Models (MIPGEM), a statistical model based on clearly stated assumptions to address the problem of protein and gene model inference for shotgun proteomics data. In particular, we are dealing with dependencies among peptides and proteins using a Markovian assumption on k-partite graphs. We are also addressing the problems of shared peptides and ambiguous proteins by scoring the encoding gene models. Empirical results on two control datasets with synthetic mixtures of proteins and on complex protein samples of Saccharomyces cerevisiae, Drosophila melanogaster, and Arabidopsis thaliana suggest that the results with MIPGEM are competitive with existing tools for protein inference.
Meyer, Patrick E; Lafitte, Frédéric; Bontempi, Gianluca
2008-10-29
This paper presents the R/Bioconductor package minet (version 1.1.6) which provides a set of functions to infer mutual information networks from a dataset. Once fed with a microarray dataset, the package returns a network where nodes denote genes, edges model statistical dependencies between genes and the weight of an edge quantifies the statistical evidence of a specific (e.g transcriptional) gene-to-gene interaction. Four different entropy estimators are made available in the package minet (empirical, Miller-Madow, Schurmann-Grassberger and shrink) as well as four different inference methods, namely relevance networks, ARACNE, CLR and MRNET. Also, the package integrates accuracy assessment tools, like F-scores, PR-curves and ROC-curves in order to compare the inferred network with a reference one. The package minet provides a series of tools for inferring transcriptional networks from microarray data. It is freely available from the Comprehensive R Archive Network (CRAN) as well as from the Bioconductor website.
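minet is an R/Bioconductor package, but the relevance-network idea it implements (estimate a mutual information matrix, keep high-MI edges) is easy to sketch. The Python code below is a simplified stand-in, not the minet API: it uses a plug-in histogram MI estimator and a plain threshold rather than the package's estimators or the ARACNE/CLR/MRNET pruning rules, and the toy expression matrix is invented.

```python
import numpy as np

rng = np.random.default_rng(2)

def mutual_information(x, y, bins=8):
    """Plug-in (empirical) mutual information from a 2-D histogram, in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1, keepdims=True), pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

# Toy expression matrix: 100 samples x 4 genes, with gene 0 driving gene 1.
n = 100
g0 = rng.normal(size=n)
data = np.column_stack([g0, g0 + 0.3 * rng.normal(size=n),
                        rng.normal(size=n), rng.normal(size=n)])

p = data.shape[1]
mim = np.zeros((p, p))                      # mutual information matrix
for i in range(p):
    for j in range(i + 1, p):
        mim[i, j] = mim[j, i] = mutual_information(data[:, i], data[:, j])

# Relevance network: keep edges whose MI exceeds a threshold (edge weight = MI).
threshold = 0.1
edges = [(i, j, round(mim[i, j], 3)) for i in range(p) for j in range(i + 1, p) if mim[i, j] > threshold]
print(edges)
```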
Computational statistics using the Bayesian Inference Engine
NASA Astrophysics Data System (ADS)
Weinberg, Martin D.
2013-09-01
This paper introduces the Bayesian Inference Engine (BIE), a general parallel, optimized software package for parameter inference and model selection. This package is motivated by the analysis needs of modern astronomical surveys and the need to organize and reuse expensive derived data. The BIE is the first platform for computational statistics designed explicitly to enable Bayesian update and model comparison for astronomical problems. Bayesian update is based on the representation of high-dimensional posterior distributions using metric-ball-tree based kernel density estimation. Among its algorithmic offerings, the BIE emphasizes hybrid tempered Markov chain Monte Carlo schemes that robustly sample multimodal posterior distributions in high-dimensional parameter spaces. Moreover, the BIE implements a full persistence or serialization system that stores the full byte-level image of the running inference and previously characterized posterior distributions for later use. Two new algorithms to compute the marginal likelihood from the posterior distribution, developed for and implemented in the BIE, enable model comparison for complex models and data sets. Finally, the BIE was designed to be a collaborative platform for applying Bayesian methodology to astronomy. It includes an extensible object-oriented and easily extended framework that implements every aspect of the Bayesian inference. By providing a variety of statistical algorithms for all phases of the inference problem, a scientist may explore a variety of approaches with a single model and data implementation. Additional technical details and download details are available from http://www.astro.umass.edu/bie. The BIE is distributed under the GNU General Public License.
Teach a Confidence Interval for the Median in the First Statistics Course
ERIC Educational Resources Information Center
Howington, Eric B.
2017-01-01
Few introductory statistics courses consider statistical inference for the median. This article argues in favour of adding a confidence interval for the median to the first statistics course. Several methods suitable for introductory statistics students are identified and briefly reviewed.
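One method suitable for such a course is the distribution-free confidence interval for the median built from order statistics and the Binomial(n, 1/2) distribution. The article does not specify which methods it reviews, so the sketch below is simply an illustration of that standard approach.

```python
import numpy as np
from scipy import stats

def median_ci(sample, confidence=0.95):
    """Distribution-free CI for the median from order statistics (binomial method)."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    # Find the largest j (1-indexed) such that P(j <= Binom(n, 1/2) <= n - j) >= confidence.
    j = 1
    while j + 1 <= n - j and stats.binom.cdf(n - (j + 1), n, 0.5) - stats.binom.cdf(j, n, 0.5) >= confidence:
        j += 1
    # For very small n, even the widest interval (j = 1) may fall short of the nominal level.
    return x[j - 1], x[n - j]          # (X_(j), X_(n-j+1)) in order-statistic notation

data = [12, 15, 9, 22, 17, 11, 14, 19, 25, 13, 16, 18]
print(np.median(data), median_ci(data))    # for n = 12 this returns the 3rd and 10th order statistics
```

The interval requires no distributional assumptions beyond a continuous population, which is what makes it attractive for a first course.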
Doss, Hani; Tan, Aixin
2017-01-01
In the classical biased sampling problem, we have k densities π1(·), …, πk(·), each known up to a normalizing constant, i.e. for l = 1, …, k, πl(·) = νl(·)/ml, where νl(·) is a known function and ml is an unknown constant. For each l, we have an iid sample from πl, and the problem is to estimate the ratios ml/ms for all l and all s. This problem arises frequently in several situations in both frequentist and Bayesian inference. An estimate of the ratios was developed and studied by Vardi and his co-workers over two decades ago, and there has been much subsequent work on this problem from many different perspectives. In spite of this, there are no rigorous results in the literature on how to estimate the standard error of the estimate. We present a class of estimates of the ratios of normalizing constants that are appropriate for the case where the samples from the πl's are not necessarily iid sequences, but are Markov chains. We also develop an approach based on regenerative simulation for obtaining standard errors for the estimates of ratios of normalizing constants. These standard error estimates are valid for both the iid case and the Markov chain case. PMID:28706463
Doss, Hani; Tan, Aixin
2014-09-01
In the classical biased sampling problem, we have k densities π1(·), …, πk(·), each known up to a normalizing constant, i.e. for l = 1, …, k, πl(·) = νl(·)/ml, where νl(·) is a known function and ml is an unknown constant. For each l, we have an iid sample from πl, and the problem is to estimate the ratios ml/ms for all l and all s. This problem arises frequently in several situations in both frequentist and Bayesian inference. An estimate of the ratios was developed and studied by Vardi and his co-workers over two decades ago, and there has been much subsequent work on this problem from many different perspectives. In spite of this, there are no rigorous results in the literature on how to estimate the standard error of the estimate. We present a class of estimates of the ratios of normalizing constants that are appropriate for the case where the samples from the πl's are not necessarily iid sequences, but are Markov chains. We also develop an approach based on regenerative simulation for obtaining standard errors for the estimates of ratios of normalizing constants. These standard error estimates are valid for both the iid case and the Markov chain case.
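To make the biased-sampling setup concrete, the sketch below estimates a single ratio m1/m2 from iid draws of π2, using the identity that the expectation of ν1(X)/ν2(X) under π2 equals m1/m2. The densities are invented, and the naive standard error shown is valid only for iid draws; this is not the Vardi-type estimator or the regenerative-simulation standard errors developed in the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two unnormalized densities on the real line: nu1 is a N(0, 1) kernel, nu2 a N(1, 1.5^2) kernel.
nu1 = lambda x: np.exp(-0.5 * x**2)
nu2 = lambda x: np.exp(-0.5 * ((x - 1.0) / 1.5) ** 2)
m1_true = np.sqrt(2 * np.pi)            # normalizing constants, known here only for checking
m2_true = np.sqrt(2 * np.pi) * 1.5

# Draws from pi_2 (iid here; in the paper the samples may instead be Markov chains).
x2 = rng.normal(1.0, 1.5, size=50_000)

# Since E_{pi_2}[ nu1(X) / nu2(X) ] = m1 / m2, a simple ratio estimate is the sample mean:
w = nu1(x2) / nu2(x2)
ratio_hat = w.mean()
se_naive = w.std(ddof=1) / np.sqrt(w.size)   # valid for iid draws only; for Markov chain
                                             # samples the paper uses regenerative simulation
print(ratio_hat, m1_true / m2_true, se_naive)
```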
Pointwise probability reinforcements for robust statistical inference.
Frénay, Benoît; Verleysen, Michel
2014-02-01
Statistical inference using machine learning techniques may be difficult with small datasets because of abnormally frequent data (AFDs). AFDs are observations that are much more frequent in the training sample than they should be, with respect to their theoretical probability, and include, e.g., outliers. Estimates of parameters tend to be biased towards models which support such data. This paper proposes to introduce pointwise probability reinforcements (PPRs): the probability of each observation is reinforced by a PPR and a regularisation allows controlling the amount of reinforcement which compensates for AFDs. The proposed solution is very generic, since it can be used to robustify any statistical inference method which can be formulated as a likelihood maximisation. Experiments show that PPRs can be easily used to tackle regression, classification and projection: models are freed from the influence of outliers. Moreover, outliers can be filtered manually since an abnormality degree is obtained for each observation. Copyright © 2013 Elsevier Ltd. All rights reserved.
Probability, statistics, and computational science.
Beerenwinkel, Niko; Siebourg, Juliane
2012-01-01
In this chapter, we review basic concepts from probability theory and computational statistics that are fundamental to evolutionary genomics. We provide a very basic introduction to statistical modeling and discuss general principles, including maximum likelihood and Bayesian inference. Markov chains, hidden Markov models, and Bayesian network models are introduced in more detail as they occur frequently and in many variations in genomics applications. In particular, we discuss efficient inference algorithms and methods for learning these models from partially observed data. Several simple examples are given throughout the text, some of which point to models that are discussed in more detail in subsequent chapters.
A Not-So-Fundamental Limitation on Studying Complex Systems with Statistics: Comment on Rabin (2011)
NASA Astrophysics Data System (ADS)
Thomas, Drew M.
2012-12-01
Although living organisms are affected by many interrelated and unidentified variables, this complexity does not automatically impose a fundamental limitation on statistical inference. Nor need one invoke such complexity as an explanation of the "Truth Wears Off" or "decline" effect; similar "decline" effects occur with far simpler systems studied in physics. Selective reporting and publication bias, and scientists' biases in favor of reporting eye-catching results (in general) or conforming to others' results (in physics) better explain this feature of the "Truth Wears Off" effect than Rabin's suggested limitation on statistical inference.
Bayesian Inference: with ecological applications
Link, William A.; Barker, Richard J.
2010-01-01
This text provides a mathematically rigorous yet accessible and engaging introduction to Bayesian inference with relevant examples that will be of interest to biologists working in the fields of ecology, wildlife management and environmental studies as well as students in advanced undergraduate statistics. This text opens the door to Bayesian inference, taking advantage of modern computational efficiencies and easily accessible software to evaluate complex hierarchical models.
Theory-based Bayesian Models of Inductive Inference
2010-07-19
Reference fragments recovered from the report: Subjective randomness and natural scene statistics. Psychonomic Bulletin & Review. http://cocosci.berkeley.edu/tom/papers/randscenes.pdf; (in press) Exemplar models as a mechanism for performing Bayesian inference. Psychonomic Bulletin & Review. http://cocosci.berkeley.edu/tom
Differences in Performance Among Test Statistics for Assessing Phylogenomic Model Adequacy.
Duchêne, David A; Duchêne, Sebastian; Ho, Simon Y W
2018-05-18
Statistical phylogenetic analyses of genomic data depend on models of nucleotide or amino acid substitution. The adequacy of these substitution models can be assessed using a number of test statistics, allowing the model to be rejected when it is found to provide a poor description of the evolutionary process. A potentially valuable use of model-adequacy test statistics is to identify when data sets are likely to produce unreliable phylogenetic estimates, but their differences in performance are rarely explored. We performed a comprehensive simulation study to identify test statistics that are sensitive to some of the most commonly cited sources of phylogenetic estimation error. Our results show that, for many test statistics, traditional thresholds for assessing model adequacy can fail to reject the model when the phylogenetic inferences are inaccurate and imprecise. This is particularly problematic when analysing loci that have few variable informative sites. We propose new thresholds for assessing substitution model adequacy and demonstrate their effectiveness in analyses of three phylogenomic data sets. These thresholds lead to frequent rejection of the model for loci that yield topological inferences that are imprecise and are likely to be inaccurate. We also propose the use of a summary statistic that provides a practical assessment of overall model adequacy. Our approach offers a promising means of enhancing model choice in genome-scale data sets, potentially leading to improvements in the reliability of phylogenomic inference.
Inferring action structure and causal relationships in continuous sequences of human action.
Buchsbaum, Daphna; Griffiths, Thomas L; Plunkett, Dillon; Gopnik, Alison; Baldwin, Dare
2015-02-01
In the real world, causal variables do not come pre-identified or occur in isolation, but instead are embedded within a continuous temporal stream of events. A challenge faced by both human learners and machine learning algorithms is identifying subsequences that correspond to the appropriate variables for causal inference. A specific instance of this problem is action segmentation: dividing a sequence of observed behavior into meaningful actions, and determining which of those actions lead to effects in the world. Here we present a Bayesian analysis of how statistical and causal cues to segmentation should optimally be combined, as well as four experiments investigating human action segmentation and causal inference. We find that both people and our model are sensitive to statistical regularities and causal structure in continuous action, and are able to combine these sources of information in order to correctly infer both causal relationships and segmentation boundaries. Copyright © 2014. Published by Elsevier Inc.
Applications of statistics to medical science (1) Fundamental concepts.
Watanabe, Hiroshi
2011-01-01
The conceptual framework of statistical tests and statistical inference is discussed, and the epidemiological background of statistics is briefly reviewed. This study is one of a series in which we survey the basics of statistics and practical methods used in medical statistics. Arguments related to actual statistical analysis procedures will be made in subsequent papers.
The role of causal criteria in causal inferences: Bradford Hill's "aspects of association".
Ward, Andrew C
2009-06-17
As noted by Wesley Salmon and many others, causal concepts are ubiquitous in every branch of theoretical science, in the practical disciplines and in everyday life. In the theoretical and practical sciences especially, people often base claims about causal relations on applications of statistical methods to data. However, the source and type of data place important constraints on the choice of statistical methods as well as on the warrant attributed to the causal claims based on the use of such methods. For example, much of the data used by people interested in making causal claims come from non-experimental, observational studies in which random allocations to treatment and control groups are not present. Thus, one of the most important problems in the social and health sciences concerns making justified causal inferences using non-experimental, observational data. In this paper, I examine one method of justifying such inferences that is especially widespread in epidemiology and the health sciences generally: the use of causal criteria. I argue that while the use of causal criteria is not appropriate for either deductive or inductive inferences, they do have an important role to play in inferences to the best explanation. As such, causal criteria, exemplified by what Bradford Hill referred to as "aspects of [statistical] associations", have an indispensable part to play in the goal of making justified causal claims.
The role of causal criteria in causal inferences: Bradford Hill's "aspects of association"
Ward, Andrew C
2009-01-01
As noted by Wesley Salmon and many others, causal concepts are ubiquitous in every branch of theoretical science, in the practical disciplines and in everyday life. In the theoretical and practical sciences especially, people often base claims about causal relations on applications of statistical methods to data. However, the source and type of data place important constraints on the choice of statistical methods as well as on the warrant attributed to the causal claims based on the use of such methods. For example, much of the data used by people interested in making causal claims come from non-experimental, observational studies in which random allocations to treatment and control groups are not present. Thus, one of the most important problems in the social and health sciences concerns making justified causal inferences using non-experimental, observational data. In this paper, I examine one method of justifying such inferences that is especially widespread in epidemiology and the health sciences generally: the use of causal criteria. I argue that while the use of causal criteria is not appropriate for either deductive or inductive inferences, they do have an important role to play in inferences to the best explanation. As such, causal criteria, exemplified by what Bradford Hill referred to as "aspects of [statistical] associations", have an indispensable part to play in the goal of making justified causal claims. PMID:19534788
Jacquin, Hugo; Gilson, Amy; Shakhnovich, Eugene; Cocco, Simona; Monasson, Rémi
2016-05-01
Inverse statistical approaches to determine protein structure and function from Multiple Sequence Alignments (MSA) are emerging as powerful tools in computational biology. However, the underlying assumptions of the relationship between the inferred effective Potts Hamiltonian and real protein structure and energetics remain untested so far. Here we use a lattice protein (LP) model to benchmark these inverse statistical approaches. We build MSAs of highly stable sequences in target LP structures, and infer the effective pairwise Potts Hamiltonians from those MSAs. We find that inferred Potts Hamiltonians reproduce many important aspects of 'true' LP structures and energetics. Careful analysis reveals that effective pairwise couplings in inferred Potts Hamiltonians depend not only on the energetics of the native structure but also on competing folds; in particular, the coupling values reflect both positive design (stabilization of native conformation) and negative design (destabilization of competing folds). In addition to providing detailed structural information, the inferred Potts models used as protein Hamiltonians for design of new sequences are able to generate with high probability completely new sequences with the desired folds, which is not possible using independent-site models. These are remarkable results, as the effective LP Hamiltonians used to generate the MSAs are not simple pairwise models due to the competition between the folds. Our findings elucidate the reasons for the success of inverse approaches to the modelling of proteins from sequence data, and their limitations.
Logical reasoning versus information processing in the dual-strategy model of reasoning.
Markovits, Henry; Brisson, Janie; de Chantal, Pier-Luc
2017-01-01
One of the major debates concerning the nature of inferential reasoning is between counterexample-based strategies such as mental model theory and statistical strategies underlying probabilistic models. The dual-strategy model, proposed by Verschueren, Schaeken, & d'Ydewalle (2005a, 2005b), which suggests that people might have access to both kinds of strategy, has been supported by several recent studies. These have shown that statistical reasoners make inferences based on using information about premises in order to generate a likelihood estimate of conclusion probability. However, while results concerning counterexample reasoners are consistent with a counterexample detection model, these results could equally be interpreted as indicating a greater sensitivity to logical form. In order to distinguish these 2 interpretations, in Studies 1 and 2, we presented reasoners with Modus ponens (MP) inferences with statistical information about premise strength, and in Studies 3 and 4, naturalistic MP inferences with premises having many disabling conditions. Statistical reasoners accepted the MP inference more often than counterexample reasoners in Studies 1 and 2, while the opposite pattern was observed in Studies 3 and 4. Results show that these strategies must be defined in terms of information processing, with no clear relations to "logical" reasoning. These results have additional implications for the underlying debate about the nature of human reasoning. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Graffelman, Jan; Sánchez, Milagros; Cook, Samantha; Moreno, Victor
2013-01-01
In genetic association studies, tests for Hardy-Weinberg proportions are often employed as a quality control checking procedure. Missing genotypes are typically discarded prior to testing. In this paper we show that inference for Hardy-Weinberg proportions can be biased when missing values are discarded. We propose to use multiple imputation of missing values in order to improve inference for Hardy-Weinberg proportions. For imputation we employ a multinomial logit model that uses information from allele intensities and/or neighbouring markers. Analysis of an empirical data set of single nucleotide polymorphisms possibly related to colon cancer reveals that missing genotypes are not missing completely at random. Deviation from Hardy-Weinberg proportions is mostly due to a lack of heterozygotes. Inbreeding coefficients estimated by multiple imputation of the missings are typically lowered with respect to inbreeding coefficients estimated by discarding the missings. Accounting for missings by multiple imputation qualitatively changed the results of 10 to 17% of the statistical tests performed. Estimates of inbreeding coefficients obtained by multiple imputation showed high correlation with estimates obtained by single imputation using an external reference panel. Our conclusion is that imputation of missing data leads to improved statistical inference for Hardy-Weinberg proportions.
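A minimal illustration of the issue: the sketch below runs a one-degree-of-freedom chi-square test of Hardy-Weinberg proportions, first on complete genotype data simulated under HWE and then after discarding genotypes that are missing more often for heterozygotes. The missingness mechanism and all numbers are invented; the paper's multiple-imputation remedy is not implemented here.

```python
import numpy as np
from scipy import stats

def hwe_chisq(n_AA, n_AB, n_BB):
    """One-df chi-square test of Hardy-Weinberg proportions for a biallelic marker."""
    n = n_AA + n_AB + n_BB
    p = (2 * n_AA + n_AB) / (2 * n)                     # allele-A frequency
    expected = np.array([n * p**2, 2 * n * p * (1 - p), n * (1 - p)**2])
    observed = np.array([n_AA, n_AB, n_BB], dtype=float)
    chi2 = ((observed - expected) ** 2 / expected).sum()
    return chi2, stats.chi2.sf(chi2, df=1)

rng = np.random.default_rng(4)
# Truth: genotypes generated under HWE with allele frequency p = 0.6 in 1000 individuals.
genotypes = rng.choice([0, 1, 2], size=1000, p=[0.36, 0.48, 0.16])   # 0=AA, 1=AB, 2=BB

# Not missing completely at random: heterozygotes fail genotyping more often (assumed mechanism).
miss_prob = np.where(genotypes == 1, 0.25, 0.05)
observed = genotypes[rng.random(genotypes.size) > miss_prob]

full = hwe_chisq(*np.bincount(genotypes, minlength=3))
discarded = hwe_chisq(*np.bincount(observed, minlength=3))
print("complete data:", full, " after discarding missings:", discarded)  # heterozygote deficit inflates chi2
```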
[Application of statistics on chronic-diseases-relating observational research papers].
Hong, Zhi-heng; Wang, Ping; Cao, Wei-hua
2012-09-01
To study the application of statistics in observational research papers on chronic diseases recently published in Chinese Medical Association journals with an impact index above 0.5. Using a self-developed criterion, two investigators independently assessed the use of statistics in these papers; disagreements were resolved through discussion. A total of 352 papers from 6 journals (the Chinese Journal of Epidemiology, Chinese Journal of Oncology, Chinese Journal of Preventive Medicine, Chinese Journal of Cardiology, Chinese Journal of Internal Medicine, and Chinese Journal of Endocrinology and Metabolism) were reviewed. The rates of clearly stating the research objectives, target audience, sampling issues, objective inclusion criteria, and variable definitions were 99.43%, 98.57%, 95.43%, 92.86% and 96.87%, respectively. The rates of correct description of quantitative and qualitative data were 90.94% and 91.46%, respectively. The rates of correctly expressed results for statistical inference methods related to quantitative data, qualitative data, and modeling were 100%, 95.32% and 87.19%, respectively. 89.49% of the conclusions directly responded to the research objectives. However, 69.60% of the papers did not state the exact name of the study design that was used, and 11.14% lacked a statement of the exclusion criteria. Only 5.16% of the papers clearly explained the sample-size estimation, and only 24.21% clearly described the variable value assignment. The rate of describing the conduct of the statistical analysis and the database methods was only 24.15%. 18.75% of the papers did not describe the statistical inference methods sufficiently, and a quarter did not use 'standardization' appropriately. Regarding statistical inference, only 24.12% of the papers described the prerequisites of the statistical tests, while 9.94% did not employ the statistical inference methods that should have been used. The main deficiencies in the application of statistics in observational research papers on chronic diseases were as follows: lack of sample-size determination, insufficient description of variable value assignment, unclear or improper description of statistical methods, and lack of consideration of the prerequisites for statistical inference.
Royle, J. Andrew; Dorazio, Robert M.
2008-01-01
A guide to data collection, modeling and inference strategies for biological survey data using Bayesian and classical statistical methods. This book describes a general and flexible framework for modeling and inference in ecological systems based on hierarchical models, with a strict focus on the use of probability models and parametric inference. Hierarchical models represent a paradigm shift in the application of statistics to ecological inference problems because they combine explicit models of ecological system structure or dynamics with models of how ecological systems are observed. The principles of hierarchical modeling are developed and applied to problems in population, metapopulation, community, and metacommunity systems. The book provides the first synthetic treatment of many recent methodological advances in ecological modeling and unifies disparate methods and procedures. The authors apply principles of hierarchical modeling to ecological problems, including occurrence or occupancy models for estimating species distribution; abundance models based on many sampling protocols, including distance sampling; capture-recapture models with individual effects; spatial capture-recapture models based on camera trapping and related methods; population and metapopulation dynamic models; and models of biodiversity, community structure and dynamics.
Truth, models, model sets, AIC, and multimodel inference: a Bayesian perspective
Barker, Richard J.; Link, William A.
2015-01-01
Statistical inference begins with viewing data as realizations of stochastic processes. Mathematical models provide partial descriptions of these processes; inference is the process of using the data to obtain a more complete description of the stochastic processes. Wildlife and ecological scientists have become increasingly concerned with the conditional nature of model-based inference: what if the model is wrong? Over the last 2 decades, Akaike's Information Criterion (AIC) has been widely and increasingly used in wildlife statistics for 2 related purposes, first for model choice and second to quantify model uncertainty. We argue that for the second of these purposes, the Bayesian paradigm provides the natural framework for describing uncertainty associated with model choice and provides the most easily communicated basis for model weighting. Moreover, Bayesian arguments provide the sole justification for interpreting model weights (including AIC weights) as coherent (mathematically self consistent) model probabilities. This interpretation requires treating the model as an exact description of the data-generating mechanism. We discuss the implications of this assumption, and conclude that more emphasis is needed on model checking to provide confidence in the quality of inference.
Sileshi, G
2006-10-01
Researchers and regulatory agencies often make statistical inferences from insect count data using modelling approaches that assume homogeneous variance. Such models do not allow for formal appraisal of variability, which in its different forms is the subject of interest in ecology. Therefore, the objectives of this paper were to (i) compare models suitable for handling variance heterogeneity and (ii) select optimal models to ensure valid statistical inferences from insect count data. The log-normal, standard Poisson, Poisson corrected for overdispersion, zero-inflated Poisson, the negative binomial distribution and zero-inflated negative binomial models were compared using six count datasets on foliage-dwelling insects and five families of soil-dwelling insects. Akaike's and Schwarz Bayesian information criteria were used for comparing the various models. Over 50% of the counts were zeros even in locally abundant species such as Ootheca bennigseni Weise, Mesoplatys ochroptera Stål and Diaecoderus spp. The Poisson model after correction for overdispersion and the standard negative binomial distribution model provided a better description of the probability distributions of seven of the 11 insect taxa than the log-normal, standard Poisson, zero-inflated Poisson or zero-inflated negative binomial models. It is concluded that excess zeros and variance heterogeneity are common data phenomena in insect counts. If not properly modelled, these properties can invalidate the normal distribution assumptions, resulting in biased estimation of ecological effects and jeopardizing the integrity of the scientific inferences. Therefore, it is recommended that statistical models appropriate for handling these data properties be selected using objective criteria to ensure efficient statistical inference.
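The model-comparison step can be illustrated with a small maximum-likelihood exercise: fit a Poisson and a zero-inflated Poisson model to a zero-heavy count sample and compare them by AIC. The data and parameter values are invented, and only two of the six candidate models from the paper are shown.

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(5)

# Toy counts with excess zeros: 60% structural zeros, otherwise Poisson(3).
y = np.where(rng.random(300) < 0.6, 0, rng.poisson(3.0, size=300))

def nll_poisson(params):
    lam = np.exp(params[0])                                      # log link for the mean
    return -stats.poisson.logpmf(y, lam).sum()

def nll_zip(params):
    lam, pi = np.exp(params[0]), 1 / (1 + np.exp(-params[1]))    # log and logit links
    p_zero = pi + (1 - pi) * np.exp(-lam)                        # zero from either component
    p_pos = (1 - pi) * stats.poisson.pmf(y, lam)                 # counts from the Poisson part
    return -np.log(np.where(y == 0, p_zero, p_pos)).sum()

fit_p = optimize.minimize(nll_poisson, x0=[0.0])
fit_zip = optimize.minimize(nll_zip, x0=[0.0, 0.0])
aic_p = 2 * 1 + 2 * fit_p.fun
aic_zip = 2 * 2 + 2 * fit_zip.fun
print(f"AIC Poisson = {aic_p:.1f}, AIC zero-inflated Poisson = {aic_zip:.1f}")  # ZIP should win here
```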
Li, Huanjie; Nickerson, Lisa D; Nichols, Thomas E; Gao, Jia-Hong
2017-03-01
Two powerful methods for statistical inference on MRI brain images have been proposed recently, a non-stationary voxelation-corrected cluster-size test (CST) based on random field theory and threshold-free cluster enhancement (TFCE) based on calculating the level of local support for a cluster, then using permutation testing for inference. Unlike other statistical approaches, these two methods do not rest on the assumptions of a uniform and high degree of spatial smoothness of the statistic image. Thus, they are strongly recommended over other statistical methods for group-level fMRI analysis. In this work, the non-stationary voxelation-corrected CST and TFCE methods for group-level analysis were evaluated for both stationary and non-stationary images under varying smoothness levels, degrees of freedom and signal to noise ratios. Our results suggest that both methods provide adequate control for the number of voxel-wise statistical tests being performed during inference on fMRI data and they are both superior to current CSTs implemented in popular MRI data analysis software packages. However, TFCE is more sensitive and stable for group-level analysis of VBM data. Thus, the voxelation-corrected CST approach may confer some advantages by being computationally less demanding for fMRI data analysis than TFCE with permutation testing and by also being applicable for single-subject fMRI analyses, while the TFCE approach is advantageous for VBM data. Hum Brain Mapp 38:1269-1280, 2017. © 2016 Wiley Periodicals, Inc.
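For readers unfamiliar with TFCE, the sketch below applies the transform (integrating cluster extent^E times height^H over thresholds, with the usual defaults E = 0.5, H = 2) to a one-dimensional statistic map. It is purely illustrative: real implementations operate on 3-D images and obtain corrected p-values by permutation, which is omitted here.

```python
import numpy as np

def tfce_1d(stat, dh=0.1, E=0.5, H=2.0):
    """Threshold-free cluster enhancement for a 1-D statistic map (illustrative only)."""
    out = np.zeros_like(stat, dtype=float)
    for h in np.arange(dh, stat.max() + dh, dh):
        above = stat >= h
        # Label contiguous runs above the threshold, then add extent^E * h^H * dh to each member.
        labels = np.cumsum(np.diff(np.concatenate([[0], above.astype(int)])) == 1) * above
        for lab in np.unique(labels[labels > 0]):
            idx = labels == lab
            out[idx] += (idx.sum() ** E) * (h ** H) * dh
    return out

stat_map = np.array([0.2, 0.5, 2.1, 2.4, 2.2, 0.3, 0.1, 1.8, 0.2])
print(np.round(tfce_1d(stat_map), 2))
# In practice, TFCE scores are recomputed under permutations of the group labels
# to obtain a familywise-error-corrected p-value for each voxel.
```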
Wright, Marvin N; Dankowski, Theresa; Ziegler, Andreas
2017-04-15
The most popular approach for analyzing survival data is the Cox regression model. The Cox model may, however, be misspecified, and its proportionality assumption may not always be fulfilled. An alternative approach for survival prediction is random forests for survival outcomes. The standard split criterion for random survival forests is the log-rank test statistic, which favors splitting variables with many possible split points. Conditional inference forests avoid this split variable selection bias. However, linear rank statistics are utilized by default in conditional inference forests to select the optimal splitting variable, which cannot detect non-linear effects in the independent variables. An alternative is to use maximally selected rank statistics for the split point selection. As in conditional inference forests, splitting variables are compared on the p-value scale. However, instead of the conditional Monte-Carlo approach used in conditional inference forests, p-value approximations are employed. We describe several p-value approximations and the implementation of the proposed random forest approach. A simulation study demonstrates that unbiased split variable selection is possible. However, there is a trade-off between unbiased split variable selection and runtime. In benchmark studies of prediction performance on simulated and real datasets, the new method performs better than random survival forests if informative dichotomous variables are combined with uninformative variables with more categories and better than conditional inference forests if non-linear covariate effects are included. In a runtime comparison, the method proves to be computationally faster than both alternatives, if a simple p-value approximation is used. Copyright © 2017 John Wiley & Sons, Ltd.
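The split-point selection the paper builds on can be sketched as follows: for a continuous covariate, evaluate the two-sample log-rank statistic at a grid of candidate cutpoints and take the maximum. The code assumes the lifelines package for the log-rank test; the survival data are invented, and the p-value approximations for the maximally selected statistic, which are the paper's focus, are not reproduced.

```python
import numpy as np
from lifelines.statistics import logrank_test

rng = np.random.default_rng(6)

# Toy survival data in which risk increases once covariate x exceeds 0.5 (assumed setup).
n = 200
x = rng.uniform(0, 1, size=n)
time = rng.exponential(np.where(x > 0.5, 1.0, 2.0))
event = rng.random(n) < 0.8                       # roughly 20% censoring

# Maximally selected rank statistic: log-rank statistic evaluated at candidate cutpoints.
candidates = np.quantile(x, np.linspace(0.1, 0.9, 17))
stats_at_cut = []
for c in candidates:
    left = x <= c
    res = logrank_test(time[left], time[~left],
                       event_observed_A=event[left], event_observed_B=event[~left])
    stats_at_cut.append(res.test_statistic)

best = candidates[int(np.argmax(stats_at_cut))]
print(f"selected split point ~ {best:.2f}")
# Taking the maximum over cutpoints inflates the naive log-rank p-value, which is why the
# paper compares splitting variables using p-value approximations for the maximally selected statistic.
```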
Incorporating Biological Knowledge into Evaluation of Causal Regulatory Hypothesis
NASA Technical Reports Server (NTRS)
Chrisman, Lonnie; Langley, Pat; Bay, Stephen; Pohorille, Andrew; DeVincenzi, D. (Technical Monitor)
2002-01-01
Biological data can be scarce and costly to obtain. The small number of samples available typically limits statistical power and makes reliable inference of causal relations extremely difficult. However, we argue that statistical power can be increased substantially by incorporating prior knowledge and data from diverse sources. We present a Bayesian framework that combines information from different sources and we show empirically that this lets one make correct causal inferences with small sample sizes that otherwise would be impossible.
A Review of Some Aspects of Robust Inference for Time Series.
1984-09-01
Technical report (recovered from an OCR fragment): A Review of Some Aspects of Robust Inference for Time Series, R. D. Martin, Technical Report, Department of Statistics, University of Washington, Seattle, September 1984. Recoverable excerpt: "One cannot hope to have a good method for dealing with outliers in time series by using only an instantaneous nonlinear transformation of the data."
The researcher and the consultant: from testing to probability statements.
Hamra, Ghassan B; Stang, Andreas; Poole, Charles
2015-09-01
In the first instalment of this series, Stang and Poole provided an overview of Fisher significance testing (ST), Neyman-Pearson null hypothesis testing (NHT), and their unfortunate and unintended offspring, null hypothesis significance testing. In addition to elucidating the distinction between the first two and the evolution of the third, the authors alluded to alternative models of statistical inference; namely, Bayesian statistics. Bayesian inference has experienced a revival in recent decades, with many researchers advocating for its use as both a complement and an alternative to NHT and ST. This article will continue in the direction of the first instalment, providing practicing researchers with an introduction to Bayesian inference. Our work will draw on the examples and discussion of the previous dialogue.
Spectral likelihood expansions for Bayesian inference
NASA Astrophysics Data System (ADS)
Nagel, Joseph B.; Sudret, Bruno
2016-03-01
A spectral approach to Bayesian inference is presented. It pursues the emulation of the posterior probability density. The starting point is a series expansion of the likelihood function in terms of orthogonal polynomials. From this spectral likelihood expansion all statistical quantities of interest can be calculated semi-analytically. The posterior is formally represented as the product of a reference density and a linear combination of polynomial basis functions. Both the model evidence and the posterior moments are related to the expansion coefficients. This formulation avoids Markov chain Monte Carlo simulation and allows one to make use of linear least squares instead. The pros and cons of spectral Bayesian inference are discussed and demonstrated on the basis of simple applications from classical statistics and inverse modeling.
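A one-dimensional toy version of the idea: emulate the likelihood by a polynomial expansion over the support of a uniform prior, then read the evidence and posterior mean directly off the coefficients. The example uses a plain polynomial least-squares fit rather than an orthogonal-polynomial (spectral) basis, and all data and settings are invented.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(7)

# Toy inverse problem: y_i = theta + noise, with a uniform prior for theta on [0, 1].
sigma, theta_true = 0.35, 0.4
y = theta_true + sigma * rng.normal(size=8)

def likelihood(theta):
    theta = np.atleast_1d(theta)
    return np.exp(-0.5 * ((y[None, :] - theta[:, None]) / sigma) ** 2).prod(axis=1)

# "Spectral" step: emulate the likelihood by a polynomial expansion over the prior support.
nodes = np.linspace(0.0, 1.0, 400)
surrogate = Polynomial.fit(nodes, likelihood(nodes), deg=15).convert()   # plain coefficients c_k
c, k = surrogate.coef, np.arange(surrogate.coef.size)

# With a uniform prior on [0, 1], evidence and posterior mean follow from the coefficients.
evidence = np.sum(c / (k + 1))                     # integral of sum c_k * theta^k over [0, 1]
post_mean = np.sum(c / (k + 2)) / evidence         # integral of theta * likelihood, normalized

L = likelihood(nodes)
check = np.sum(nodes * L) / np.sum(L)              # simple quadrature check on the same grid
print(f"posterior mean from expansion ~ {post_mean:.3f}; quadrature check ~ {check:.3f}")
```

The appeal of the spectral formulation is precisely this last step: once the expansion coefficients are known, the evidence and posterior moments come out semi-analytically, with no MCMC.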
Statistics, Computation, and Modeling in Cosmology
NASA Astrophysics Data System (ADS)
Jewell, Jeff; Guiness, Joe; SAMSI 2016 Working Group in Cosmology
2017-01-01
Current and future ground- and space-based missions are designed to not only detect, but map out with increasing precision, details of the universe in its infancy to the present-day. As a result we are faced with the challenge of analyzing and interpreting observations from a wide variety of instruments to form a coherent view of the universe. Finding solutions to a broad range of challenging inference problems in cosmology is one of the goals of the “Statistics, Computation, and Modeling in Cosmology” working groups, formed as part of the year-long program on ‘Statistical, Mathematical, and Computational Methods for Astronomy’, hosted by the Statistical and Applied Mathematical Sciences Institute (SAMSI), a National Science Foundation funded institute. Two application areas have emerged for focused development in the cosmology working group involving advanced algorithmic implementations of exact Bayesian inference for the Cosmic Microwave Background, and statistical modeling of galaxy formation. The former includes study and development of advanced Markov Chain Monte Carlo algorithms designed to confront challenging inference problems, including inference for spatial Gaussian random fields in the presence of sources of galactic emission (an example of a source separation problem). Extending these methods to future redshift survey data probing the nonlinear regime of large scale structure formation is also included in the working group activities. In addition, the working group is also focused on the study of ‘Galacticus’, a galaxy formation model applied to dark matter-only cosmological N-body simulations operating on time-dependent halo merger trees. The working group is interested in calibrating the Galacticus model to match statistics of galaxy survey observations, specifically stellar mass functions, luminosity functions, and color-color diagrams. The group will use subsampling approaches and fractional factorial designs to statistically and computationally efficiently explore the Galacticus parameter space. The group will also use the Galacticus simulations to study the relationship between the topological and physical structure of the halo merger trees and the properties of the resulting galaxies.
ERIC Educational Resources Information Center
Christ, Theodore J.; Desjardins, Christopher David
2018-01-01
Curriculum-Based Measurement of Oral Reading (CBM-R) is often used to monitor student progress and guide educational decisions. Ordinary least squares regression (OLSR) is the most widely used method to estimate the slope, or rate of improvement (ROI), even though published research demonstrates OLSR's lack of validity and reliability, and…
Bayesian methods including nonrandomized study data increased the efficiency of postlaunch RCTs.
Schmidt, Amand F; Klugkist, Irene; Klungel, Olaf H; Nielen, Mirjam; de Boer, Anthonius; Hoes, Arno W; Groenwold, Rolf H H
2015-04-01
Findings from nonrandomized studies on safety or efficacy of treatment in patient subgroups may trigger postlaunch randomized clinical trials (RCTs). In the analysis of such RCTs, results from nonrandomized studies are typically ignored. This study explores the trade-off between bias and power of Bayesian RCT analysis incorporating information from nonrandomized studies. A simulation study was conducted to compare frequentist with Bayesian analyses using noninformative and informative priors in their ability to detect interaction effects. In simulated subgroups, the effect of a hypothetical treatment differed between subgroups (odds ratio 1.00 vs. 2.33). Simulations varied in sample size, proportions of the subgroups, and specification of the priors. As expected, the results for the informative Bayesian analyses were more biased than those from the noninformative Bayesian analysis or frequentist analysis. However, because of a reduction in posterior variance, informative Bayesian analyses were generally more powerful to detect an effect. In scenarios where the informative priors were in the opposite direction of the RCT data, type 1 error rates could be 100% and power 0%. Bayesian methods incorporating data from nonrandomized studies can meaningfully increase power of interaction tests in postlaunch RCTs. Copyright © 2015 Elsevier Inc. All rights reserved.
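A minimal beta-binomial sketch of the trade-off described above: the posterior probability that treatment lowers the event risk in a small RCT subgroup is computed once with flat priors and once with an informative prior meant to encode a hypothetical nonrandomized study. All counts and prior parameters are invented; the paper's simulation study of interaction tests is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(8)

# Hypothetical small RCT subgroup: events / patients in the control and treated arms.
events_c, n_c = 18, 60
events_t, n_t = 10, 60

def prob_benefit(a_c, b_c, a_t, b_t, draws=200_000):
    """Posterior P(risk_treated < risk_control) under independent Beta priors (Monte Carlo)."""
    p_c = rng.beta(a_c + events_c, b_c + n_c - events_c, size=draws)
    p_t = rng.beta(a_t + events_t, b_t + n_t - events_t, size=draws)
    return float(np.mean(p_t < p_c))

# Flat priors vs. informative priors encoding a hypothetical nonrandomized study that
# suggested risks near 0.30 (control) and 0.18 (treated), worth roughly 50 patients per arm.
print("flat priors:       ", prob_benefit(1, 1, 1, 1))
print("informative priors:", prob_benefit(15, 35, 9, 41))
```

As the abstract cautions, the same mechanism that sharpens the posterior here would bias it if the prior pointed in the wrong direction.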
Hybrid regulatory models: a statistically tractable approach to model regulatory network dynamics.
Ocone, Andrea; Millar, Andrew J; Sanguinetti, Guido
2013-04-01
Computational modelling of the dynamics of gene regulatory networks is a central task of systems biology. For networks of small/medium scale, the dominant paradigm is represented by systems of coupled non-linear ordinary differential equations (ODEs). ODEs afford great mechanistic detail and flexibility, but calibrating these models to data is often an extremely difficult statistical problem. Here, we develop a general statistical inference framework for stochastic transcription-translation networks. We use a coarse-grained approach, which represents the system as a network of stochastic (binary) promoter and (continuous) protein variables. We derive an exact inference algorithm and an efficient variational approximation that allows scalable inference and learning of the model parameters. We demonstrate the power of the approach on two biological case studies, showing that the method allows a high degree of flexibility and is capable of testable novel biological predictions. http://homepages.inf.ed.ac.uk/gsanguin/software.html. Supplementary data are available at Bioinformatics online.
Reading biological processes from nucleotide sequences
NASA Astrophysics Data System (ADS)
Murugan, Anand
Cellular processes have traditionally been investigated by techniques of imaging and biochemical analysis of the molecules involved. The recent rapid progress in our ability to manipulate and read nucleic acid sequences gives us direct access to the genetic information that directs and constrains biological processes. While sequence data is being used widely to investigate genotype-phenotype relationships and population structure, here we use sequencing to understand biophysical mechanisms. We present work on two different systems. First, in chapter 2, we characterize the stochastic genetic editing mechanism that produces diverse T-cell receptors in the human immune system. We do this by inferring statistical distributions of the underlying biochemical events that generate T-cell receptor coding sequences from the statistics of the observed sequences. This inferred model quantitatively describes the potential repertoire of T-cell receptors that can be produced by an individual, providing insight into its potential diversity and the probability of generation of any specific T-cell receptor. Then in chapter 3, we present work on understanding the functioning of regulatory DNA sequences in both prokaryotes and eukaryotes. Here we use experiments that measure the transcriptional activity of large libraries of mutagenized promoters and enhancers and infer models of the sequence-function relationship from this data. For the bacterial promoter, we infer a physically motivated 'thermodynamic' model of the interaction of DNA-binding proteins and RNA polymerase determining the transcription rate of the downstream gene. For the eukaryotic enhancers, we infer heuristic models of the sequence-function relationship and use these models to find synthetic enhancer sequences that optimize inducibility of expression. Both projects demonstrate the utility of sequence information in conjunction with sophisticated statistical inference techniques for dissecting underlying biophysical mechanisms.
CADDIS Volume 4. Data Analysis: PECBO Appendix - R Scripts for Non-Parametric Regressions
Script for computing nonparametric regression analysis. Overview of using scripts to infer environmental conditions from biological observations, statistically estimating species-environment relationships, statistical scripts.
ERIC Educational Resources Information Center
Mislevy, Robert J.
Educational test theory consists of statistical and methodological tools to support inferences about examinees' knowledge, skills, and accomplishments. The evolution of test theory has been shaped by the nature of users' inferences which, until recently, have been framed almost exclusively in terms of trait and behavioral psychology. Progress in…
Data-driven sensitivity inference for Thomson scattering electron density measurement systems.
Fujii, Keisuke; Yamada, Ichihiro; Hasuo, Masahiro
2017-01-01
We developed a method to infer the calibration parameters of multichannel measurement systems, such as channel variations of sensitivity and noise amplitude, from experimental data. We regard such uncertainties of the calibration parameters as dependent noise. The statistical properties of the dependent noise and that of the latent functions were modeled and implemented in the Gaussian process kernel. Based on their statistical difference, both parameters were inferred from the data. We applied this method to the electron density measurement system by Thomson scattering for the Large Helical Device plasma, which is equipped with 141 spatial channels. Based on the 210 sets of experimental data, we evaluated the correction factor of the sensitivity and noise amplitude for each channel. The correction factor varies by ≈10%, and the random noise amplitude is ≈2%, i.e., the measurement accuracy increases by a factor of 5 after this sensitivity correction. The certainty improvement in the spatial derivative inference was demonstrated.
Feder, Paul I; Ma, Zhenxu J; Bull, Richard J; Teuschler, Linda K; Rice, Glenn
2009-01-01
In chemical mixtures risk assessment, the use of dose-response data developed for one mixture to estimate risk posed by a second mixture depends on whether the two mixtures are sufficiently similar. While evaluations of similarity may be made using qualitative judgments, this article uses nonparametric statistical methods based on the "bootstrap" resampling technique to address the question of similarity among mixtures of chemical disinfectant by-products (DBP) in drinking water. The bootstrap resampling technique is a general-purpose, computer-intensive approach to statistical inference that substitutes empirical sampling for theoretically based parametric mathematical modeling. Nonparametric, bootstrap-based inference involves fewer assumptions than parametric normal theory based inference. The bootstrap procedure is appropriate, at least in an asymptotic sense, whether or not the parametric, distributional assumptions hold, even approximately. The statistical analysis procedures in this article are initially illustrated with data from 5 water treatment plants (Schenck et al., 2009), and then extended using data developed from a study of 35 drinking-water utilities (U.S. EPA/AMWA, 1989), which permits inclusion of a greater number of water constituents and increased structure in the statistical models.
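The core resampling step can be sketched generically: draw bootstrap resamples from measurements of a constituent in two mixtures and form a percentile confidence interval for the difference in a summary statistic. The concentrations below are invented and the statistic is a simple mean; the paper applies the same machinery to richer comparisons of DBP mixtures.

```python
import numpy as np

rng = np.random.default_rng(9)

# Hypothetical concentrations (ug/L) of one DBP constituent measured in two finished waters.
mixture_a = rng.lognormal(mean=2.0, sigma=0.4, size=40)
mixture_b = rng.lognormal(mean=2.2, sigma=0.5, size=35)

def bootstrap_ci(x, y, stat=np.mean, B=5000, alpha=0.05):
    """Percentile bootstrap CI for stat(y) - stat(x); no parametric assumptions."""
    diffs = np.empty(B)
    for b in range(B):
        xb = rng.choice(x, size=x.size, replace=True)
        yb = rng.choice(y, size=y.size, replace=True)
        diffs[b] = stat(yb) - stat(xb)
    return np.quantile(diffs, [alpha / 2, 1 - alpha / 2])

lo, hi = bootstrap_ci(mixture_a, mixture_b)
print(f"95% bootstrap CI for the difference in means: [{lo:.2f}, {hi:.2f}] ug/L")
# A CI that excludes 0 would be evidence against 'sufficient similarity' on this constituent.
```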
Local dependence in random graph models: characterization, properties and statistical inference
Schweinberger, Michael; Handcock, Mark S.
2015-01-01
Dependent phenomena, such as relational, spatial and temporal phenomena, tend to be characterized by local dependence in the sense that units which are close in a well-defined sense are dependent. In contrast with spatial and temporal phenomena, though, relational phenomena tend to lack a natural neighbourhood structure in the sense that it is unknown which units are close and thus dependent. Owing to the challenge of characterizing local dependence and constructing random graph models with local dependence, many conventional exponential family random graph models induce strong dependence and are not amenable to statistical inference. We take first steps to characterize local dependence in random graph models, inspired by the notion of finite neighbourhoods in spatial statistics and M-dependence in time series, and we show that local dependence endows random graph models with desirable properties which make them amenable to statistical inference. We show that random graph models with local dependence satisfy a natural domain consistency condition which every model should satisfy, but conventional exponential family random graph models do not satisfy. In addition, we establish a central limit theorem for random graph models with local dependence, which suggests that random graph models with local dependence are amenable to statistical inference. We discuss how random graph models with local dependence can be constructed by exploiting either observed or unobserved neighbourhood structure. In the absence of observed neighbourhood structure, we take a Bayesian view and express the uncertainty about the neighbourhood structure by specifying a prior on a set of suitable neighbourhood structures. We present simulation results and applications to two real world networks with ‘ground truth’. PMID:26560142
White, H; Racine, J
2001-01-01
We propose tests for individual and joint irrelevance of network inputs. Such tests can be used to determine whether an input or group of inputs "belong" in a particular model, thus permitting valid statistical inference based on estimated feedforward neural-network models. The approaches employ well-known statistical resampling techniques. We conduct a small Monte Carlo experiment showing that our tests have reasonable level and power behavior, and we apply our methods to examine whether there are predictable regularities in foreign exchange rates. We find that exchange rates do appear to contain information that is exploitable for enhanced point prediction, but the nature of the predictive relations evolves through time.
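The following is not White and Racine's test, but a crude permutation-based check in the same spirit: fit a small feedforward network, then repeatedly permute one candidate input on held-out data and record the drop in predictive performance. The data-generating process and network settings are hypothetical.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
X = rng.standard_normal((n, 3))                 # inputs 0 and 1 are relevant, input 2 is pure noise
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.standard_normal(n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0).fit(X_tr, y_tr)
base = net.score(X_te, y_te)                    # R^2 on held-out data

for j in range(X.shape[1]):
    drops = []
    for _ in range(200):                        # resample by permuting one input at a time
        Xp = X_te.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])
        drops.append(base - net.score(Xp, y_te))
    print(f"input {j}: mean drop in R^2 = {np.mean(drops):.3f}")
# Inputs whose permutation barely changes predictive performance behave as irrelevant.
```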
FUNSTAT and statistical image representations
NASA Technical Reports Server (NTRS)
Parzen, E.
1983-01-01
General ideas of functional statistical inference are outlined for one-sample and two-sample analyses, both univariate and bivariate. The ONESAM program is applied to analyze the univariate probability distributions of multi-spectral image data.
Building Intuitions about Statistical Inference Based on Resampling
ERIC Educational Resources Information Center
Watson, Jane; Chance, Beth
2012-01-01
Formal inference, which makes theoretical assumptions about distributions and applies hypothesis testing procedures with null and alternative hypotheses, is notoriously difficult for tertiary students to master. The debate about whether this content should appear in Years 11 and 12 of the "Australian Curriculum: Mathematics" has gone on…
Theory-based Bayesian models of inductive learning and reasoning.
Tenenbaum, Joshua B; Griffiths, Thomas L; Kemp, Charles
2006-07-01
Inductive inference allows humans to make powerful generalizations from sparse data when learning about word meanings, unobserved properties, causal relationships, and many other aspects of the world. Traditional accounts of induction emphasize either the power of statistical learning, or the importance of strong constraints from structured domain knowledge, intuitive theories or schemas. We argue that both components are necessary to explain the nature, use and acquisition of human knowledge, and we introduce a theory-based Bayesian framework for modeling inductive learning and reasoning as statistical inferences over structured knowledge representations.
Data free inference with processed data products
Chowdhary, K.; Najm, H. N.
2014-07-12
Here, we consider the context of probabilistic inference of model parameters given error bars or confidence intervals on model output values, when the data is unavailable. We introduce a class of algorithms in a Bayesian framework, relying on maximum entropy arguments and approximate Bayesian computation methods, to generate consistent data with the given summary statistics. Once we obtain consistent data sets, we pool the respective posteriors, to arrive at a single, averaged density on the parameters. This approach allows us to perform accurate forward uncertainty propagation consistent with the reported statistics.
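A toy sketch of the general idea, under invented assumptions: only a sample mean and 95% confidence interval are reported, pseudo-datasets consistent with that summary are generated, rejection ABC is run against each, and the resulting posteriors are pooled. The forward model (observations Normal in theta squared) and all numbers are purely illustrative, not the authors' maximum-entropy construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical reported summary: sample mean 3.2 with 95% CI (2.8, 3.6) from n = 25 observations.
# Illustrative forward model: each observation is Normal(theta**2, 1).
n, ybar = 25, 3.2
se = (3.6 - 2.8) / (2 * 1.96)                          # standard error implied by the CI

pooled = []
for _ in range(50):                                    # pseudo-datasets consistent with the summary
    pseudo_mean = (ybar + se * np.sqrt(n) * rng.standard_normal(n)).mean()
    theta = rng.uniform(0, 3, size=50_000)             # prior draws
    sim_mean = theta ** 2 + rng.standard_normal(50_000) / np.sqrt(n)   # sample mean of n model draws
    pooled.append(theta[np.abs(sim_mean - pseudo_mean) < 0.02])        # rejection ABC step

posterior = np.concatenate(pooled)                     # pooled ("averaged") posterior
print(f"posterior mean {posterior.mean():.2f}, sd {posterior.std():.2f} (sqrt(3.2) is about 1.79)")
```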
Statistics at the Chinese Universities.
1981-09-01
education in China in the postwar years is provided to give some perspective. My observations on statistics at the Chinese universities are necessarily...has been accepted as a member society of ISI. 3. Education in China. Understanding of statistics in universities in China will be enhanced through some...programming), Statistical Mathematics (inference, data analysis, industrial statistics, information theory), Mathematical Physics (differential
ERIC Educational Resources Information Center
Storm, Lance; Tressoldi, Patrizio E.; Utts, Jessica
2013-01-01
Rouder, Morey, and Province (2013) stated that (a) the evidence-based case for psi in Storm, Tressoldi, and Di Risio's (2010) meta-analysis is supported only by a number of studies that used manual randomization, and (b) when these studies are excluded so that only investigations using automatic randomization are evaluated (and some additional…
The role of familiarity in binary choice inferences.
Honda, Hidehito; Abe, Keiga; Matsuka, Toshihiko; Yamagishi, Kimihiko
2011-07-01
In research on the recognition heuristic (Goldstein & Gigerenzer, Psychological Review, 109, 75-90, 2002), knowledge of recognized objects has been categorized as "recognized" or "unrecognized" without regard to the degree of familiarity of the recognized object. In the present article, we propose a new inference model--familiarity-based inference. We hypothesize that when subjective knowledge levels (familiarity) of recognized objects differ, the degree of familiarity of recognized objects will influence inferences. Specifically, people are predicted to infer that the more familiar object in a pair of two objects has a higher criterion value on the to-be-judged dimension. In two experiments, using a binary choice task, we examined inferences about populations in a pair of two cities. Results support predictions of familiarity-based inference. Participants inferred that the more familiar city in a pair was more populous. Statistical modeling showed that individual differences in familiarity-based inference lie in the sensitivity to differences in familiarity. In addition, we found that familiarity-based inference can be generally regarded as an ecologically rational inference. Furthermore, when cue knowledge about the inference criterion was available, participants made inferences based on the cue knowledge about population instead of familiarity. Implications of the role of familiarity in psychological processes are discussed.
Bayesian bivariate meta-analysis of diagnostic test studies with interpretable priors.
Guo, Jingyi; Riebler, Andrea; Rue, Håvard
2017-08-30
In a bivariate meta-analysis, the number of diagnostic studies involved is often very low so that frequentist methods may result in problems. Using Bayesian inference is particularly attractive as informative priors that add a small amount of information can stabilise the analysis without overwhelming the data. However, Bayesian analysis is often computationally demanding and the selection of the prior for the covariance matrix of the bivariate structure is crucial with little data. The integrated nested Laplace approximations method provides an efficient solution to the computational issues by avoiding any sampling, but the important question of priors remains. We explore the penalised complexity (PC) prior framework for specifying informative priors for the variance parameters and the correlation parameter. PC priors facilitate model interpretation and hyperparameter specification as expert knowledge can be incorporated intuitively. We conduct a simulation study to compare the properties and behaviour of differently defined PC priors to currently used priors in the field. The simulation study shows that the PC prior seems beneficial for the variance parameters. The use of PC priors for the correlation parameter results in more precise estimates when specified in a sensible neighbourhood around the truth. To investigate the usage of PC priors in practice, we reanalyse a meta-analysis using the telomerase marker for the diagnosis of bladder cancer and compare the results with those obtained by other commonly used modelling approaches. Copyright © 2017 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Button, Patrick James Harold
This dissertation contains three independent papers (chapters). Chapter 1 argues that researchers that use a regression discontinuity design with parametric controls for the assignment variable face significant uncertainty as to the appropriate model. I argue that researchers should incorporate this model uncertainty into their inference through model averaging. I show in a Monte Carlo experiment that Frequentist model averaging, compared to pretesting, leads to more precise estimates and less Type I error. Chapter 2 studies the impact of state-level tax incentives for the film industry, which have exploded since the early 2000s. I study how these incentives affect filming location and the number of jobs and businesses in the film industry. I seek to answer the question: can these incentives create a local film industry? I find that while these incentives have large effects on filming location, they almost never lead to increases in jobs and employment in the film industry. Chapter 3 studies the Prudence Kay Poppink Act, effective 2001, which broadened California's disability discrimination in employment law. This expanded coverage to those with less severe impairments. I study how this expansion affected labor market outcomes for the disabled by comparing the disabled in California before and after the Act to the disabled in other states during the same time period. I augment this difference-in-difference methodology by adding a comparison to the non-disabled as well. I find that this expansion of legal protections increased employment and possibly increased wage and salary rates.
Statistical modeling of software reliability
NASA Technical Reports Server (NTRS)
Miller, Douglas R.
1992-01-01
This working paper discusses the statistical simulation part of a controlled software development experiment being conducted under the direction of the System Validation Methods Branch, Information Systems Division, NASA Langley Research Center. The experiment uses guidance and control software (GCS) aboard a fictitious planetary landing spacecraft: real-time control software operating on a transient mission. Software execution is simulated to study the statistical aspects of reliability and other failure characteristics of the software during development, testing, and random usage. Quantification of software reliability is a major goal. Various reliability concepts are discussed. Experiments are described for performing simulations and collecting appropriate simulated software performance and failure data. This data is then used to make statistical inferences about the quality of the software development and verification processes as well as inferences about the reliability of software versions and reliability growth under random testing and debugging.
Quantum-Like Representation of Non-Bayesian Inference
NASA Astrophysics Data System (ADS)
Asano, M.; Basieva, I.; Khrennikov, A.; Ohya, M.; Tanaka, Y.
2013-01-01
This research is related to the problem of "irrational decision making or inference" that has been discussed in cognitive psychology. There are some experimental studies, and these statistical data cannot be described by classical probability theory. The process of decision making generating these data cannot be reduced to the classical Bayesian inference. For this problem, a number of quantum-like cognitive models of decision making were proposed. Our previous work represented in a natural way the classical Bayesian inference in the framework of quantum mechanics. By using this representation, in this paper, we try to discuss the non-Bayesian (irrational) inference that is biased by effects like the quantum interference. Further, we describe the "psychological factor" disturbing "rationality" as an "environment" correlating with the "main system" of usual Bayesian inference.
Why environmental scientists are becoming Bayesians
James S. Clark
2005-01-01
Advances in computational statistics provide a general framework for the high dimensional models typically needed for ecological inference and prediction. Hierarchical Bayes (HB) represents a modelling structure with capacity to exploit diverse sources of information, to accommodate influences that are unknown (or unknowable), and to draw inference on large numbers of...
ERIC Educational Resources Information Center
Meiser, Thorsten; Rummel, Jan; Fleig, Hanna
2018-01-01
Pseudocontingencies are inferences about correlations in the environment that are formed on the basis of statistical regularities like skewed base rates or varying base rates across environmental contexts. Previous research has demonstrated that pseudocontingencies provide a pervasive mechanism of inductive inference in numerous social judgment…
Cross-Situational Learning of Minimal Word Pairs
ERIC Educational Resources Information Center
Escudero, Paola; Mulak, Karen E.; Vlach, Haley A.
2016-01-01
"Cross-situational statistical learning" of words involves tracking co-occurrences of auditory words and objects across time to infer word-referent mappings. Previous research has demonstrated that learners can infer referents across sets of very phonologically distinct words (e.g., WUG, DAX), but it remains unknown whether learners can…
Statistical analysis of fNIRS data: a comprehensive review.
Tak, Sungho; Ye, Jong Chul
2014-01-15
Functional near-infrared spectroscopy (fNIRS) is a non-invasive method to measure brain activities using the changes of optical absorption in the brain through the intact skull. fNIRS has many advantages over other neuroimaging modalities such as positron emission tomography (PET), functional magnetic resonance imaging (fMRI), or magnetoencephalography (MEG), since it can directly measure blood oxygenation level changes related to neural activation with high temporal resolution. However, fNIRS signals are highly corrupted by measurement noises and physiology-based systemic interference. Careful statistical analyses are therefore required to extract neuronal activity-related signals from fNIRS data. In this paper, we provide an extensive review of historical developments of statistical analyses of fNIRS signal, which include motion artifact correction, short source-detector separation correction, principal component analysis (PCA)/independent component analysis (ICA), false discovery rate (FDR), serially-correlated errors, as well as inference techniques such as the standard t-test, F-test, analysis of variance (ANOVA), and statistical parameter mapping (SPM) framework. In addition, to provide a unified view of various existing inference techniques, we explain a linear mixed effect model with restricted maximum likelihood (ReML) variance estimation, and show that most of the existing inference methods for fNIRS analysis can be derived as special cases. Some of the open issues in statistical analysis are also described. Copyright © 2013 Elsevier Inc. All rights reserved.
Accounting for measurement error: a critical but often overlooked process.
Harris, Edward F; Smith, Richard N
2009-12-01
Due to instrument imprecision and human inconsistencies, measurements are not free of error. Technical error of measurement (TEM) is the variability encountered between dimensions when the same specimens are measured at multiple sessions. A goal of a data collection regimen is to minimise TEM. The few studies that actually quantify TEM, regardless of discipline, report that it is substantial and can affect results and inferences. This paper reviews some statistical approaches for identifying and controlling TEM. Statistically, TEM is part of the residual ('unexplained') variance in a statistical test, so accounting for TEM, which requires repeated measurements, enhances the chances of finding a statistically significant difference if one exists. The aim of this paper was to review and discuss common statistical designs relating to types of error and statistical approaches to error accountability. This paper addresses issues of landmark location, validity, technical and systematic error, analysis of variance, scaled measures and correlation coefficients in order to guide the reader towards correct identification of true experimental differences. Researchers commonly infer characteristics about populations from comparatively restricted study samples. Most inferences are statistical and, aside from concerns about adequate accounting for known sources of variation with the research design, an important source of variability is measurement error. Variability in locating landmarks that define variables is obvious in odontometrics, cephalometrics and anthropometry, but the same concerns about measurement accuracy and precision extend to all disciplines. With increasing accessibility to computer-assisted methods of data collection, the ease of incorporating repeated measures into statistical designs has improved. Accounting for this technical source of variation increases the chance of finding biologically true differences when they exist.
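A small sketch of how TEM is typically computed from duplicate measurement sessions using Dahlberg's formula, together with a relative TEM and one common reliability coefficient; the duplicate measurements below are hypothetical.

```python
import numpy as np

# Hypothetical duplicate measurements (mm) of the same specimens at two sessions
session_1 = np.array([10.2, 11.5,  9.8, 12.1, 10.9, 11.2])
session_2 = np.array([10.4, 11.3, 10.1, 12.0, 11.1, 11.0])

d = session_1 - session_2
tem = np.sqrt(np.sum(d ** 2) / (2 * len(d)))             # Dahlberg's technical error of measurement
rel_tem = 100 * tem / np.concatenate([session_1, session_2]).mean()

# One common reliability coefficient: proportion of variance not attributable to measurement error
total_var = np.concatenate([session_1, session_2]).var(ddof=1)
reliability = 1 - tem ** 2 / total_var

print(f"TEM = {tem:.3f} mm, relative TEM = {rel_tem:.1f}%, R = {reliability:.3f}")
```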
Sumner, Jeremy G; Taylor, Amelia; Holland, Barbara R; Jarvis, Peter D
2017-12-01
Recently there has been renewed interest in phylogenetic inference methods based on phylogenetic invariants, alongside the related Markov invariants. Broadly speaking, both these approaches give rise to polynomial functions of sequence site patterns that, in expectation value, either vanish for particular evolutionary trees (in the case of phylogenetic invariants) or have well understood transformation properties (in the case of Markov invariants). While both approaches have been valued for their intrinsic mathematical interest, it is not clear how they relate to each other, and to what extent they can be used as practical tools for inference of phylogenetic trees. In this paper, by focusing on the special case of binary sequence data and quartets of taxa, we are able to view these two different polynomial-based approaches within a common framework. To motivate the discussion, we present three desirable statistical properties that we argue any invariant-based phylogenetic method should satisfy: (1) sensible behaviour under reordering of input sequences; (2) stability as the taxa evolve independently according to a Markov process; and (3) explicit dependence on the assumption of a continuous-time process. Motivated by these statistical properties, we develop and explore several new phylogenetic inference methods. In particular, we develop a statistically bias-corrected version of the Markov invariants approach which satisfies all three properties. We also extend previous work by showing that the phylogenetic invariants can be implemented in such a way as to satisfy property (3). A simulation study shows that, in comparison to other methods, our new proposed approach based on bias-corrected Markov invariants is extremely powerful for phylogenetic inference. The binary case is of particular theoretical interest as-in this case only-the Markov invariants can be expressed as linear combinations of the phylogenetic invariants. A wider implication of this is that, for models with more than two states-for example DNA sequence alignments with four-state models-we find that methods which rely on phylogenetic invariants are incapable of satisfying all three of the stated statistical properties. This is because in these cases the relevant Markov invariants belong to a class of polynomials independent from the phylogenetic invariants.
Imputation approaches for animal movement modeling
Scharf, Henry; Hooten, Mevin B.; Johnson, Devin S.
2017-01-01
The analysis of telemetry data is common in animal ecological studies. While the collection of telemetry data for individual animals has improved dramatically, the methods to properly account for inherent uncertainties (e.g., measurement error, dependence, barriers to movement) have lagged behind. Still, many new statistical approaches have been developed to infer unknown quantities affecting animal movement or predict movement based on telemetry data. Hierarchical statistical models are useful to account for some of the aforementioned uncertainties, as well as to provide population-level inference, but they often come with an increased computational burden. For certain types of statistical models, it is straightforward to provide inference if the latent true animal trajectory is known, but challenging otherwise. In these cases, approaches related to multiple imputation have been employed to account for the uncertainty associated with our knowledge of the latent trajectory. Despite the increasing use of imputation approaches for modeling animal movement, the general sensitivity and accuracy of these methods have not been explored in detail. We provide an introduction to animal movement modeling and describe how imputation approaches may be helpful for certain types of models. We also assess the performance of imputation approaches in two simulation studies. Our simulation studies suggest that inference for model parameters directly related to the location of an individual may be more accurate than inference for parameters associated with higher-order processes such as velocity or acceleration. Finally, we apply these methods to analyze a telemetry data set involving northern fur seals (Callorhinus ursinus) in the Bering Sea. Supplementary materials accompanying this paper appear online.
Using a Five-Step Procedure for Inferential Statistical Analyses
ERIC Educational Resources Information Center
Kamin, Lawrence F.
2010-01-01
Many statistics texts pose inferential statistical problems in a disjointed way. By using a simple five-step procedure as a template for statistical inference problems, the student can solve problems in an organized fashion. The problem and its solution will thus be a stand-by-itself organic whole and a single unit of thought and effort. The…
Design-based and model-based inference in surveys of freshwater mollusks
Dorazio, R.M.
1999-01-01
Well-known concepts in statistical inference and sampling theory are used to develop recommendations for planning and analyzing the results of quantitative surveys of freshwater mollusks. Two methods of inference commonly used in survey sampling (design-based and model-based) are described and illustrated using examples relevant in surveys of freshwater mollusks. The particular objectives of a survey and the type of information observed in each unit of sampling can be used to help select the sampling design and the method of inference. For example, the mean density of a sparsely distributed population of mollusks can be estimated with higher precision by using model-based inference or by using design-based inference with adaptive cluster sampling than by using design-based inference with conventional sampling. More experience with quantitative surveys of natural assemblages of freshwater mollusks is needed to determine the actual benefits of different sampling designs and inferential procedures.
Bayesian inference for joint modelling of longitudinal continuous, binary and ordinal events.
Li, Qiuju; Pan, Jianxin; Belcher, John
2016-12-01
In medical studies, repeated measurements of continuous, binary and ordinal outcomes are routinely collected from the same patient. Instead of modelling each outcome separately, in this study we propose to jointly model the trivariate longitudinal responses, so as to take account of the inherent association between the different outcomes and thus improve statistical inferences. This work is motivated by a large cohort study in the North West of England, involving trivariate responses from each patient: Body Mass Index, Depression (Yes/No) ascertained with cut-off score not less than 8 at the Hospital Anxiety and Depression Scale, and Pain Interference generated from the Medical Outcomes Study 36-item short-form health survey with values returned on an ordinal scale 1-5. There are some well-established methods for combined continuous and binary, or even continuous and ordinal responses, but little work was done on the joint analysis of continuous, binary and ordinal responses. We propose conditional joint random-effects models, which take into account the inherent association between the continuous, binary and ordinal outcomes. Bayesian analysis methods are used to make statistical inferences. Simulation studies show that, by jointly modelling the trivariate outcomes, standard deviations of the estimates of parameters in the models are smaller and much more stable, leading to more efficient parameter estimates and reliable statistical inferences. In the real data analysis, the proposed joint analysis yields a much smaller deviance information criterion value than the separate analysis, and shows other good statistical properties too. © The Author(s) 2014.
Cocco, Simona; Leibler, Stanislas; Monasson, Rémi
2009-01-01
Complexity of neural systems often makes impracticable explicit measurements of all interactions between their constituents. Inverse statistical physics approaches, which infer effective couplings between neurons from their spiking activity, have been so far hindered by their computational complexity. Here, we present 2 complementary, computationally efficient inverse algorithms based on the Ising and “leaky integrate-and-fire” models. We apply those algorithms to reanalyze multielectrode recordings in the salamander retina in darkness and under random visual stimulus. We find strong positive couplings between nearby ganglion cells common to both stimuli, whereas long-range couplings appear under random stimulus only. The uncertainty on the inferred couplings due to limitations in the recordings (duration, small area covered on the retina) is discussed. Our methods will allow real-time evaluation of couplings for large assemblies of neurons. PMID:19666487
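The sketch below is not the authors' Ising or integrate-and-fire algorithms; it illustrates the simpler, widely used naive mean-field inversion, in which effective couplings are read off from the inverse of the connected correlation matrix of binarized spike trains. The synthetic data and common-drive structure are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_bins = 20, 50_000

# Synthetic binarized spike trains driven by a shared input, to create correlations
drive = (rng.standard_normal(n_bins) > 1).astype(float)          # shared "stimulus" epochs
gain = rng.uniform(0, 1, n_neurons)
rate = 0.05 * (1 + 0.8 * np.outer(gain, drive))                  # per-neuron, per-bin firing prob.
spikes = (rng.random((n_neurons, n_bins)) < rate).astype(float)

s = 2 * spikes - 1                                               # map 0/1 spikes to +/-1 Ising spins
m = s.mean(axis=1)                                               # mean activities
C = np.cov(s)                                                    # connected correlation matrix
J = -np.linalg.inv(C)                                            # naive mean-field couplings
np.fill_diagonal(J, 0)                                           # self-couplings are not defined
h = np.arctanh(m) - J @ m                                        # corresponding mean-field fields

i, j = np.unravel_index(np.abs(J).argmax(), J.shape)
print(f"strongest inferred coupling: J[{i},{j}] = {J[i, j]:.3f}")
```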
Confidence crisis of results in biomechanics research.
Knudson, Duane
2017-11-01
Many biomechanics studies have small sample sizes and incorrect statistical analyses, so inaccurate inferences and inflated effect magnitudes are commonly reported in the field. This review examines these issues in biomechanics research and summarises potential solutions from research in other fields to increase the confidence in the experimental effects reported in biomechanics. Authors, reviewers and editors of biomechanics research reports are encouraged to improve sample sizes and the resulting statistical power, improve reporting transparency, improve the rigour of statistical analyses used, and increase the acceptance of replication studies to improve the validity of inferences from data in biomechanics research. The application of sports biomechanics research results would also improve if a larger percentage of unbiased effects and their uncertainty were reported in the literature.
The Empirical Nature and Statistical Treatment of Missing Data
ERIC Educational Resources Information Center
Tannenbaum, Christyn E.
2009-01-01
Introduction. Missing data is a common problem in research and can produce severely misleading analyses, including biased estimates of statistical parameters, and erroneous conclusions. In its 1999 report, the APA Task Force on Statistical Inference encouraged authors to report complications such as missing data and discouraged the use of…
Cognitive Transfer Outcomes for a Simulation-Based Introductory Statistics Curriculum
ERIC Educational Resources Information Center
Backman, Matthew D.; Delmas, Robert C.; Garfield, Joan
2017-01-01
Cognitive transfer is the ability to apply learned skills and knowledge to new applications and contexts. This investigation evaluates cognitive transfer outcomes for a tertiary-level introductory statistics course using the CATALST curriculum, which exclusively used simulation-based methods to develop foundations of statistical inference. A…
The Role of the Sampling Distribution in Understanding Statistical Inference
ERIC Educational Resources Information Center
Lipson, Kay
2003-01-01
Many statistics educators believe that few students develop the level of conceptual understanding essential for them to apply correctly the statistical techniques at their disposal and to interpret their outcomes appropriately. It is also commonly believed that the sampling distribution plays an important role in developing this understanding.…
A statistical method for lung tumor segmentation uncertainty in PET images based on user inference.
Zheng, Chaojie; Wang, Xiuying; Feng, Dagan
2015-01-01
PET has been widely accepted as an effective imaging modality for lung tumor diagnosis and treatment. However, standard criteria for delineating tumor boundaries from PET have yet to be developed, largely due to the relatively low quality of PET images, uncertain tumor boundary definition, and the variety of tumor characteristics. In this paper, we propose a statistical solution to segmentation uncertainty on the basis of user inference. We first define the uncertainty segmentation band on the basis of a segmentation probability map constructed from the Random Walks (RW) algorithm; then, based on the extracted features of the user inference, we use Principal Component Analysis (PCA) to formulate the statistical model for labeling the uncertainty band. We validated our method on 10 lung PET-CT phantom studies from the public RIDER collections [1] and 16 clinical PET studies in which tumors were manually delineated by two experienced radiologists. The methods were validated using the Dice similarity coefficient (DSC) to measure spatial volume overlap. Our method achieved an average DSC of 0.878 ± 0.078 on phantom studies and 0.835 ± 0.039 on clinical studies.
Empirical evidence for acceleration-dependent amplification factors
Borcherdt, R.D.
2002-01-01
Site-specific amplification factors, Fa and Fv, used in current U.S. building codes decrease with increasing base acceleration level as implied by the Loma Prieta earthquake at 0.1g and extrapolated using numerical models and laboratory results. The Northridge earthquake recordings of 17 January 1994 and subsequent geotechnical data permit empirical estimates of amplification at base acceleration levels up to 0.5g. Distance measures and normalization procedures used to infer amplification ratios from soil-rock pairs in predetermined azimuth-distance bins significantly influence the dependence of amplification estimates on base acceleration. Factors inferred using a hypocentral distance norm do not show a statistically significant dependence on base acceleration. Factors inferred using norms implied by the attenuation functions of Abrahamson and Silva show a statistically significant decrease with increasing base acceleration. The decrease is statistically more significant for stiff clay and sandy soil (site class D) sites than for stiffer sites underlain by gravely soils and soft rock (site class C). The decrease in amplification with increasing base acceleration is more pronounced for the short-period amplification factor, Fa, than for the midperiod factor, Fv.
Karl Pearson and eugenics: personal opinions and scientific rigor.
Delzell, Darcie A P; Poliak, Cathy D
2013-09-01
The influence of personal opinions and biases on scientific conclusions is a threat to the advancement of knowledge. Expertise and experience do not render one immune to this temptation. In this work, one of the founding fathers of statistics, Karl Pearson, is used as an illustration of how even the most talented among us can produce misleading results when inferences are made without caution or reference to potential bias and other analysis limitations. A study performed by Pearson on British Jewish schoolchildren is examined in light of ethical and professional statistical practice. The methodology used and inferences made by Pearson and his coauthor are sometimes questionable and offer insight into how Pearson's support of eugenics and his own British nationalism could have potentially influenced his often careless and far-fetched inferences. A short background into Pearson's work and beliefs is provided, along with an in-depth examination of the authors' overall experimental design and statistical practices. In addition, portions of the study regarding intelligence and tuberculosis are discussed in more detail, along with historical reactions to their work.
In silico model-based inference: a contemporary approach for hypothesis testing in network biology
Klinke, David J.
2014-01-01
Inductive inference plays a central role in the study of biological systems where one aims to increase their understanding of the system by reasoning backwards from uncertain observations to identify causal relationships among components of the system. These causal relationships are postulated from prior knowledge as a hypothesis or simply a model. Experiments are designed to test the model. Inferential statistics are used to establish a level of confidence in how well our postulated model explains the acquired data. This iterative process, commonly referred to as the scientific method, either improves our confidence in a model or suggests that we revisit our prior knowledge to develop a new model. Advances in technology impact how we use prior knowledge and data to formulate models of biological networks and how we observe cellular behavior. However, the approach for model-based inference has remained largely unchanged since Fisher, Neyman and Pearson developed the ideas in the early 1900’s that gave rise to what is now known as classical statistical hypothesis (model) testing. Here, I will summarize conventional methods for model-based inference and suggest a contemporary approach to aid in our quest to discover how cells dynamically interpret and transmit information for therapeutic aims that integrates ideas drawn from high performance computing, Bayesian statistics, and chemical kinetics. PMID:25139179
In silico model-based inference: a contemporary approach for hypothesis testing in network biology.
Klinke, David J
2014-01-01
Inductive inference plays a central role in the study of biological systems where one aims to increase their understanding of the system by reasoning backwards from uncertain observations to identify causal relationships among components of the system. These causal relationships are postulated from prior knowledge as a hypothesis or simply a model. Experiments are designed to test the model. Inferential statistics are used to establish a level of confidence in how well our postulated model explains the acquired data. This iterative process, commonly referred to as the scientific method, either improves our confidence in a model or suggests that we revisit our prior knowledge to develop a new model. Advances in technology impact how we use prior knowledge and data to formulate models of biological networks and how we observe cellular behavior. However, the approach for model-based inference has remained largely unchanged since Fisher, Neyman and Pearson developed the ideas in the early 1900s that gave rise to what is now known as classical statistical hypothesis (model) testing. Here, I will summarize conventional methods for model-based inference and suggest a contemporary approach to aid in our quest to discover how cells dynamically interpret and transmit information for therapeutic aims that integrates ideas drawn from high performance computing, Bayesian statistics, and chemical kinetics. © 2014 American Institute of Chemical Engineers.
Spatio-temporal conditional inference and hypothesis tests for neural ensemble spiking precision
Harrison, Matthew T.; Amarasingham, Asohan; Truccolo, Wilson
2014-01-01
The collective dynamics of neural ensembles create complex spike patterns with many spatial and temporal scales. Understanding the statistical structure of these patterns can help resolve fundamental questions about neural computation and neural dynamics. Spatio-temporal conditional inference (STCI) is introduced here as a semiparametric statistical framework for investigating the nature of precise spiking patterns from collections of neurons that is robust to arbitrarily complex and nonstationary coarse spiking dynamics. The main idea is to focus statistical modeling and inference, not on the full distribution of the data, but rather on families of conditional distributions of precise spiking given different types of coarse spiking. The framework is then used to develop families of hypothesis tests for probing the spatio-temporal precision of spiking patterns. Relationships among different conditional distributions are used to improve multiple hypothesis testing adjustments and to design novel Monte Carlo spike resampling algorithms. Of special note are algorithms that can locally jitter spike times while still preserving the instantaneous peri-stimulus time histogram (PSTH) or the instantaneous total spike count from a group of recorded neurons. The framework can also be used to test whether first-order maximum entropy models with possibly random and time-varying parameters can account for observed patterns of spiking. STCI provides a detailed example of the generic principle of conditional inference, which may be applicable in other areas of neurostatistical analysis. PMID:25380339
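As a hedged illustration of conditional resampling (not the authors' spike-time jitter algorithms), the sketch below shuffles which neuron fired within each time bin, so every surrogate preserves the instantaneous total spike count of the group exactly, and uses the surrogates as a null distribution for a pairwise synchrony statistic. Note that this particular scheme does not preserve individual firing rates, so it tests a different null than spike-time jitter; the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_bins = 8, 1_000
spikes = (rng.random((n_neurons, n_bins)) < 0.05).astype(int)     # synthetic binned spike trains

def resample_preserving_population_count(spikes, rng):
    """Shuffle which neuron fired within each bin; the total spike count per bin is unchanged."""
    surrogate = np.zeros_like(spikes)
    for t in range(spikes.shape[1]):
        k = int(spikes[:, t].sum())
        who = rng.choice(spikes.shape[0], size=k, replace=False)   # reassign spikes to random neurons
        surrogate[who, t] = 1
    return surrogate

observed = int((spikes[0] * spikes[1]).sum())                      # synchrony of neurons 0 and 1
null = [int((s[0] * s[1]).sum())
        for s in (resample_preserving_population_count(spikes, rng) for _ in range(500))]
p_value = (1 + sum(v >= observed for v in null)) / (1 + len(null))
print(f"observed synchrony {observed}, resampling p = {p_value:.3f}")
```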
Data mining and statistical inference in selective laser melting
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kamath, Chandrika
Selective laser melting (SLM) is an additive manufacturing process that builds a complex three-dimensional part, layer-by-layer, using a laser beam to fuse fine metal powder together. The design freedom afforded by SLM comes associated with complexity. As the physical phenomena occur over a broad range of length and time scales, the computational cost of modeling the process is high. At the same time, the large number of parameters that control the quality of a part make experiments expensive. In this paper, we describe ways in which we can use data mining and statistical inference techniques to intelligently combine simulations and experiments to build parts with desired properties. We start with a brief summary of prior work in finding process parameters for high-density parts. We then expand on this work to show how we can improve the approach by using feature selection techniques to identify important variables, data-driven surrogate models to reduce computational costs, improved sampling techniques to cover the design space adequately, and uncertainty analysis for statistical inference. Here, our results indicate that techniques from data mining and statistics can complement those from physical modeling to provide greater insight into complex processes such as selective laser melting.
Data mining and statistical inference in selective laser melting
Kamath, Chandrika
2016-01-11
Selective laser melting (SLM) is an additive manufacturing process that builds a complex three-dimensional part, layer-by-layer, using a laser beam to fuse fine metal powder together. The design freedom afforded by SLM comes associated with complexity. As the physical phenomena occur over a broad range of length and time scales, the computational cost of modeling the process is high. At the same time, the large number of parameters that control the quality of a part make experiments expensive. In this paper, we describe ways in which we can use data mining and statistical inference techniques to intelligently combine simulations and experiments to build parts with desired properties. We start with a brief summary of prior work in finding process parameters for high-density parts. We then expand on this work to show how we can improve the approach by using feature selection techniques to identify important variables, data-driven surrogate models to reduce computational costs, improved sampling techniques to cover the design space adequately, and uncertainty analysis for statistical inference. Here, our results indicate that techniques from data mining and statistics can complement those from physical modeling to provide greater insight into complex processes such as selective laser melting.
Schmidt, Paul; Schmid, Volker J; Gaser, Christian; Buck, Dorothea; Bührlen, Susanne; Förschler, Annette; Mühlau, Mark
2013-01-01
Aiming at iron-related T2-hypointensity, which is related to normal aging and neurodegenerative processes, we here present two practicable approaches, based on Bayesian inference, for preprocessing and statistical analysis of a complex set of structural MRI data. In particular, Markov Chain Monte Carlo methods were used to simulate posterior distributions. First, we rendered a segmentation algorithm that uses outlier detection based on model checking techniques within a Bayesian mixture model. Second, we rendered an analytical tool comprising a Bayesian regression model with smoothness priors (in the form of Gaussian Markov random fields) mitigating the necessity to smooth data prior to statistical analysis. For validation, we used simulated data and MRI data of 27 healthy controls (age: [Formula: see text]; range, [Formula: see text]). We first observed robust segmentation of both simulated T2-hypointensities and gray-matter regions known to be T2-hypointense. Second, simulated data and images of segmented T2-hypointensity were analyzed. We found not only robust identification of simulated effects but also a biologically plausible age-related increase of T2-hypointensity primarily within the dentate nucleus but also within the globus pallidus, substantia nigra, and red nucleus. Our results indicate that fully Bayesian inference can successfully be applied for preprocessing and statistical analysis of structural MRI data.
Drug target inference through pathway analysis of genomics data
Ma, Haisu; Zhao, Hongyu
2013-01-01
Statistical modeling coupled with bioinformatics is commonly used for drug discovery. Although there exist many approaches for single target based drug design and target inference, recent years have seen a paradigm shift to system-level pharmacological research. Pathway analysis of genomics data represents one promising direction for computational inference of drug targets. This article aims at providing a comprehensive review of the evolving issues in this field, covering methodological developments, their pros and cons, as well as future research directions. PMID:23369829
Watanabe, Hiroshi
2012-01-01
Procedures of statistical analysis are reviewed to provide an overview of applications of statistics for general use. Topics that are dealt with are inference on a population, comparison of two populations with respect to means and probabilities, and multiple comparisons. This study is the second part of a series in which we survey medical statistics. Arguments related to statistical associations and regressions will be made in subsequent papers.
Dutton, P; Love, S B; Billingham, L; Hassan, A B
2018-05-01
Trials run in either rare diseases, such as rare cancers, or rare sub-populations of common diseases are challenging in terms of identifying, recruiting and treating sufficient patients in a sensible period. Treatments for rare diseases are often designed for other disease areas and then later proposed as possible treatments for the rare disease after initial phase I testing is complete. To ensure the trial is in the best interests of the patient participants, frequent interim analyses are needed to force the trial to stop promptly if the treatment is futile or toxic. These non-definitive phase II trials should also be stopped for efficacy to accelerate research progress if the treatment proves to be particularly promising. In this paper, we review frequentist and Bayesian methods that have been adapted to incorporate two binary endpoints and frequent interim analyses. The Eurosarc Trial of Linsitinib in advanced Ewing Sarcoma (LINES) is used as a motivating example and provides a suitable platform to compare these approaches. The Bayesian approach provides greater design flexibility, but does not provide additional value over the frequentist approaches in a single trial setting when the prior is non-informative. However, Bayesian designs are able to borrow from any previous experience, using prior information to improve efficiency.
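As an illustration of the kind of frequent Bayesian interim monitoring discussed above, simplified to a single binary endpoint (the LINES design itself uses two), the sketch below updates a Beta-Binomial posterior at each look and stops for futility or efficacy when the posterior probability that the response rate exceeds an uninteresting level crosses hypothetical thresholds. All rates, thresholds, and interim counts are invented.

```python
from scipy.stats import beta

p0 = 0.20                    # response rate considered not worth pursuing (hypothetical)
a_prior, b_prior = 1, 1      # non-informative Beta(1, 1) prior
futility, efficacy = 0.05, 0.95

def interim_decision(responders, n):
    """Posterior P(p > p0) after n patients, with stop-for-futility/efficacy rules."""
    post = beta(a_prior + responders, b_prior + n - responders)
    prob = 1 - post.cdf(p0)
    if prob < futility:
        return prob, "stop for futility"
    if prob > efficacy:
        return prob, "stop for efficacy"
    return prob, "continue"

for responders, n in [(1, 10), (4, 15), (9, 25)]:      # hypothetical interim looks
    prob, decision = interim_decision(responders, n)
    print(f"{responders}/{n}: P(p > {p0}) = {prob:.2f} -> {decision}")
```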
A Bayesian sequential design using alpha spending function to control type I error.
Zhu, Han; Yu, Qingzhao
2017-10-01
We propose in this article a Bayesian sequential design using alpha spending functions to control the overall type I error in phase III clinical trials. We provide algorithms to calculate critical values, power, and sample sizes for the proposed design. Sensitivity analysis is implemented to check the effects from different prior distributions, and conservative priors are recommended. We compare the power and actual sample sizes of the proposed Bayesian sequential design with different alpha spending functions through simulations. We also compare the power of the proposed method with frequentist sequential design using the same alpha spending function. Simulations show that, at the same sample size, the proposed method provides larger power than the corresponding frequentist sequential design. It also has larger power than the traditional Bayesian sequential design, which sets equal critical values for all interim analyses. When compared with other alpha spending functions, the O'Brien-Fleming alpha spending function has the largest power and is the most conservative, in the sense that, at the same sample size, the null hypothesis is least likely to be rejected at an early stage of the trial. Finally, we show that adding a stopping rule for futility in the Bayesian sequential design can reduce the overall type I error and reduce the actual sample sizes.
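A small sketch of the O'Brien-Fleming-type (Lan-DeMets) spending function mentioned above, showing how little of the overall alpha is spent at early looks; the number and spacing of the interim analyses are hypothetical.

```python
import numpy as np
from scipy.stats import norm

alpha = 0.05

def obrien_fleming_spent(t, alpha=0.05):
    """Cumulative type I error spent at information fraction t (Lan-DeMets O'Brien-Fleming type)."""
    return 2 * (1 - norm.cdf(norm.ppf(1 - alpha / 2) / np.sqrt(t)))

fractions = np.array([0.25, 0.5, 0.75, 1.0])        # four equally spaced looks (hypothetical)
cumulative = obrien_fleming_spent(fractions, alpha)
incremental = np.diff(np.concatenate([[0.0], cumulative]))

for t, c, inc in zip(fractions, cumulative, incremental):
    print(f"t = {t:.2f}: cumulative alpha = {c:.5f}, spent at this look = {inc:.5f}")
# Almost no alpha is spent early; nearly all of it is saved for the final analysis.
```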
Status of the scalar singlet dark matter model
NASA Astrophysics Data System (ADS)
Athron, Peter; Balázs, Csaba; Bringmann, Torsten; Buckley, Andy; Chrząszcz, Marcin; Conrad, Jan; Cornell, Jonathan M.; Dal, Lars A.; Edsjö, Joakim; Farmer, Ben; Jackson, Paul; Kahlhoefer, Felix; Krislock, Abram; Kvellestad, Anders; McKay, James; Mahmoudi, Farvah; Martinez, Gregory D.; Putze, Antje; Raklev, Are; Rogan, Christopher; Saavedra, Aldo; Savage, Christopher; Scott, Pat; Serra, Nicola; Weniger, Christoph; White, Martin
2017-08-01
One of the simplest viable models for dark matter is an additional neutral scalar, stabilised by a Z_2 symmetry. Using the GAMBIT package and combining results from four independent samplers, we present Bayesian and frequentist global fits of this model. We vary the singlet mass and coupling along with 13 nuisance parameters, including nuclear uncertainties relevant for direct detection, the local dark matter density, and selected quark masses and couplings. We include the dark matter relic density measured by Planck, direct searches with LUX, PandaX, SuperCDMS and XENON100, limits on invisible Higgs decays from the Large Hadron Collider, searches for high-energy neutrinos from dark matter annihilation in the Sun with IceCube, and searches for gamma rays from annihilation in dwarf galaxies with the Fermi-LAT. Viable solutions remain at couplings of order unity, for singlet masses between the Higgs mass and about 300 GeV, and at masses above ˜ 1 TeV. Only in the latter case can the scalar singlet constitute all of dark matter. Frequentist analysis shows that the low-mass resonance region, where the singlet is about half the mass of the Higgs, can also account for all of dark matter, and remains viable. However, Bayesian considerations show this region to be rather fine-tuned.
Bayesian approach for assessing non-inferiority in a three-arm trial with pre-specified margin.
Ghosh, Samiran; Ghosh, Santu; Tiwari, Ram C
2016-02-28
Non-inferiority trials are becoming increasingly popular for comparative effectiveness research. However, inclusion of the placebo arm, whenever possible, gives rise to a three-arm trial, which rests on less burdensome assumptions than a standard two-arm non-inferiority trial. Most of the past developments in a three-arm trial consider defining a pre-specified fraction of the unknown effect size of the reference drug, that is, without directly specifying a fixed non-inferiority margin. However, in some recent developments, a more direct approach is being considered with a pre-specified fixed margin, albeit in the frequentist setup. The Bayesian paradigm provides a natural path to integrate historical and current trials' information via sequential learning. In this paper, we propose a Bayesian approach for simultaneous testing of non-inferiority and assay sensitivity in a three-arm trial with normal responses. For the experimental arm, in absence of historical information, non-informative priors are assumed under two situations, namely when (i) variance is known and (ii) variance is unknown. A Bayesian decision criterion is derived and compared with the frequentist method using simulation studies. Finally, several published clinical trial examples are reanalyzed to demonstrate the benefit of the proposed procedure. Copyright © 2015 John Wiley & Sons, Ltd.
PROBABILITY SAMPLING AND POPULATION INFERENCE IN MONITORING PROGRAMS
A fundamental difference between probability sampling and conventional statistics is that "sampling" deals with real, tangible populations, whereas "conventional statistics" usually deals with hypothetical populations that have no real-world realization. The focus here is on real ...
Statistical Inference in the Learning of Novel Phonetic Categories
ERIC Educational Resources Information Center
Zhao, Yuan
2010-01-01
Learning a phonetic category (or any linguistic category) requires integrating different sources of information. A crucial unsolved problem for phonetic learning is how this integration occurs: how can we update our previous knowledge about a phonetic category as we hear new exemplars of the category? One model of learning is Bayesian Inference,…
Conceptual Challenges in Coordinating Theoretical and Data-Centered Estimates of Probability
ERIC Educational Resources Information Center
Konold, Cliff; Madden, Sandra; Pollatsek, Alexander; Pfannkuch, Maxine; Wild, Chris; Ziedins, Ilze; Finzer, William; Horton, Nicholas J.; Kazak, Sibel
2011-01-01
A core component of informal statistical inference is the recognition that judgments based on sample data are inherently uncertain. This implies that instruction aimed at developing informal inference needs to foster basic probabilistic reasoning. In this article, we analyze and critique the now-common practice of introducing students to both…
Campbell's and Rubin's Perspectives on Causal Inference
ERIC Educational Resources Information Center
West, Stephen G.; Thoemmes, Felix
2010-01-01
Donald Campbell's approach to causal inference (D. T. Campbell, 1957; W. R. Shadish, T. D. Cook, & D. T. Campbell, 2002) is widely used in psychology and education, whereas Donald Rubin's causal model (P. W. Holland, 1986; D. B. Rubin, 1974, 2005) is widely used in economics, statistics, medicine, and public health. Campbell's approach focuses on…
Direct Evidence for a Dual Process Model of Deductive Inference
ERIC Educational Resources Information Center
Markovits, Henry; Brunet, Marie-Laurence; Thompson, Valerie; Brisson, Janie
2013-01-01
In 2 experiments, we tested a strong version of a dual process theory of conditional inference (cf. Verschueren et al., 2005a, 2005b) that assumes that most reasoners have 2 strategies available, the choice of which is determined by situational variables, cognitive capacity, and metacognitive control. The statistical strategy evaluates inferences…
The Role of Probability in Developing Learners' Models of Simulation Approaches to Inference
ERIC Educational Resources Information Center
Lee, Hollylynne S.; Doerr, Helen M.; Tran, Dung; Lovett, Jennifer N.
2016-01-01
Repeated sampling approaches to inference that rely on simulations have recently gained prominence in statistics education, and probabilistic concepts are at the core of this approach. In this approach, learners need to develop a mapping among the problem situation, a physical enactment, computer representations, and the underlying randomization…
From Blickets to Synapses: Inferring Temporal Causal Networks by Observation
ERIC Educational Resources Information Center
Fernando, Chrisantha
2013-01-01
How do human infants learn the causal dependencies between events? Evidence suggests that this remarkable feat can be achieved by observation of only a handful of examples. Many computational models have been produced to explain how infants perform causal inference without explicit teaching about statistics or the scientific method. Here, we…
It's a Girl! Random Numbers, Simulations, and the Law of Large Numbers
ERIC Educational Resources Information Center
Goodwin, Chris; Ortiz, Enrique
2015-01-01
Modeling using mathematics and making inferences about mathematical situations are becoming more prevalent in most fields of study. Descriptive statistics cannot be used to generalize about a population or make predictions of what can occur. Instead, inference must be used. Simulation and sampling are essential in building a foundation for…
Thou Shalt Not Bear False Witness against Null Hypothesis Significance Testing
ERIC Educational Resources Information Center
García-Pérez, Miguel A.
2017-01-01
Null hypothesis significance testing (NHST) has been the subject of debate for decades and alternative approaches to data analysis have been proposed. This article addresses this debate from the perspective of scientific inquiry and inference. Inference is an inverse problem and application of statistical methods cannot reveal whether effects…
NIRS-SPM: statistical parametric mapping for near infrared spectroscopy
NASA Astrophysics Data System (ADS)
Tak, Sungho; Jang, Kwang Eun; Jung, Jinwook; Jang, Jaeduck; Jeong, Yong; Ye, Jong Chul
2008-02-01
Even though there exists a powerful statistical parametric mapping (SPM) tool for fMRI, similar public domain tools are not available for near infrared spectroscopy (NIRS). In this paper, we describe a new public domain statistical toolbox called NIRS-SPM for quantitative analysis of NIRS signals. Specifically, NIRS-SPM statistically analyzes the NIRS data using the GLM and makes inference in terms of the excursion probability of a random field that is interpolated from the sparse measurements. In order to obtain correct inference, NIRS-SPM offers pre-coloring and pre-whitening methods for temporal correlation estimation. For simultaneous recording of the NIRS signal with fMRI, the spatial mapping between the fMRI image and real coordinates from a 3-D digitizer is estimated using Horn's algorithm. These powerful tools allow super-resolution localization of brain activation, which is not possible using conventional NIRS analysis tools.
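NIRS-SPM itself is a MATLAB toolbox; the sketch below only illustrates the core GLM step in Python, fitting a task regressor (a boxcar convolved with a haemodynamic-response-like kernel) to a synthetic oxy-Hb channel by ordinary least squares and computing a t-statistic for the task contrast, ignoring pre-coloring/pre-whitening and random-field inference. Sampling rate, block design, and effect size are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
fs, duration = 10.0, 300.0                        # 10 Hz sampling, 5-minute run (hypothetical)
t = np.arange(0, duration, 1 / fs)

# Boxcar task regressor convolved with a haemodynamic-response-like gamma kernel
boxcar = ((t % 60) < 30).astype(float)            # 30 s on / 30 s off blocks
hrf = stats.gamma(6).pdf(np.arange(0, 30, 1 / fs))
task = np.convolve(boxcar, hrf)[: len(t)]

X = np.column_stack([task, np.ones_like(t)])      # design matrix: task + constant
y = 0.5 * task + rng.standard_normal(len(t))      # synthetic oxy-Hb signal for one channel

beta, ss_res, *_ = np.linalg.lstsq(X, y, rcond=None)
dof = len(y) - X.shape[1]
sigma2 = ss_res[0] / dof
c = np.array([1.0, 0.0])                          # contrast: task effect
se = np.sqrt(sigma2 * c @ np.linalg.inv(X.T @ X) @ c)
t_stat = (c @ beta) / se
p = 2 * stats.t.sf(abs(t_stat), dof)
print(f"beta_task = {beta[0]:.3f}, t({dof}) = {t_stat:.1f}, p = {p:.2e}")
```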
NASA Astrophysics Data System (ADS)
Knuth, K. H.
2001-05-01
We consider the application of Bayesian inference to the study of self-organized structures in complex adaptive systems. In particular, we examine the distribution of elements, agents, or processes in systems dominated by hierarchical structure. We demonstrate that results obtained by Caianiello [1] on Hierarchical Modular Systems (HMS) can be found by applying Jaynes' Principle of Group Invariance [2] to a few key assumptions about our knowledge of hierarchical organization. Subsequent application of the Principle of Maximum Entropy allows inferences to be made about specific systems. The utility of the Bayesian method is considered by examining both successes and failures of the hierarchical model. We discuss how Caianiello's original statements suffer from the Mind Projection Fallacy [3] and we restate his assumptions thus widening the applicability of the HMS model. The relationship between inference and statistical physics, described by Jaynes [4], is reiterated with the expectation that this realization will aid the field of complex systems research by moving away from often inappropriate direct application of statistical mechanics to a more encompassing inferential methodology.
Conditional statistical inference with multistage testing designs.
Zwitser, Robert J; Maris, Gunter
2015-03-01
In this paper it is demonstrated how statistical inference from multistage test designs can be made based on the conditional likelihood. Special attention is given to parameter estimation, as well as the evaluation of model fit. Two reasons are provided why the fit of simple measurement models is expected to be better in adaptive designs, compared to linear designs: more parameters are available for the same number of observations; and undesirable response behavior, like slipping and guessing, might be avoided owing to a better match between item difficulty and examinee proficiency. The results are illustrated with simulated data, as well as with real data.
ERIC Educational Resources Information Center
Snyder, Patricia A.; Thompson, Bruce
The use of tests of statistical significance was explored, first by reviewing some criticisms of contemporary practice in the use of statistical tests as reflected in a series of articles in the "American Psychologist" and in the appointment of a "Task Force on Statistical Inference" by the American Psychological Association…
Standard deviation and standard error of the mean.
Lee, Dong Kyu; In, Junyong; Lee, Sangseok
2015-06-01
In most clinical and experimental studies, the standard deviation (SD) and the estimated standard error of the mean (SEM) are used to present the characteristics of sample data and to explain statistical analysis results. However, some authors occasionally confuse the distinct uses of the SD and SEM in the medical literature. Because the processes of calculating the SD and SEM involve different statistical inferences, each has its own meaning. SD is the dispersion of data in a normal distribution. In other words, SD indicates how accurately the mean represents the sample data. The meaning of SEM, by contrast, involves statistical inference based on the sampling distribution: SEM is the SD of the theoretical distribution of the sample means (the sampling distribution). While either SD or SEM can be applied to describe data and statistical results, one should be aware of the appropriate use of each. We aim to elucidate the distinctions between SD and SEM and to provide proper guidelines for using each to summarize data and describe statistical results.
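A minimal numerical illustration of the distinction drawn above (sample values simulated, not from any study): the SD describes the spread of the observations, while the SEM describes the precision of the sample mean and shrinks as 1/sqrt(n).

```python
import numpy as np

# Minimal illustration of the SD/SEM distinction discussed above.
rng = np.random.default_rng(0)
sample = rng.normal(loc=120.0, scale=15.0, size=25)   # e.g. 25 blood-pressure readings

n = sample.size
sd = sample.std(ddof=1)          # sample standard deviation: spread of the observations
sem = sd / np.sqrt(n)            # standard error of the mean: uncertainty of the sample mean

print(f"mean = {sample.mean():.1f}, SD = {sd:.1f}, SEM = {sem:.1f}")
# As n grows, SD stays roughly constant while SEM shrinks at a rate of 1/sqrt(n).
```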
Standard deviation and standard error of the mean
In, Junyong; Lee, Sangseok
2015-01-01
In most clinical and experimental studies, the standard deviation (SD) and the estimated standard error of the mean (SEM) are used to present the characteristics of sample data and to explain statistical analysis results. However, some authors occasionally confuse the distinct uses of the SD and SEM in the medical literature. Because the processes of calculating the SD and SEM involve different statistical inferences, each has its own meaning. SD is the dispersion of data in a normal distribution. In other words, SD indicates how accurately the mean represents the sample data. The meaning of SEM, by contrast, involves statistical inference based on the sampling distribution: SEM is the SD of the theoretical distribution of the sample means (the sampling distribution). While either SD or SEM can be applied to describe data and statistical results, one should be aware of the appropriate use of each. We aim to elucidate the distinctions between SD and SEM and to provide proper guidelines for using each to summarize data and describe statistical results. PMID:26045923
Visual shape perception as Bayesian inference of 3D object-centered shape representations.
Erdogan, Goker; Jacobs, Robert A
2017-11-01
Despite decades of research, little is known about how people visually perceive object shape. We hypothesize that a promising approach to shape perception is provided by a "visual perception as Bayesian inference" framework which augments an emphasis on visual representation with an emphasis on the idea that shape perception is a form of statistical inference. Our hypothesis claims that shape perception of unfamiliar objects can be characterized as statistical inference of 3D shape in an object-centered coordinate system. We describe a computational model based on our theoretical framework, and provide evidence for the model along two lines. First, we show that, counterintuitively, the model accounts for viewpoint-dependency of object recognition, traditionally regarded as evidence against people's use of 3D object-centered shape representations. Second, we report the results of an experiment using a shape similarity task, and present an extensive evaluation of existing models' abilities to account for the experimental data. We find that our shape inference model captures subjects' behaviors better than competing models. Taken as a whole, our experimental and computational results illustrate the promise of our approach and suggest that people's shape representations of unfamiliar objects are probabilistic, 3D, and object-centered. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Primativo, Silvia; Clark, Camilla; Yong, Keir X X; Firth, Nicholas C; Nicholas, Jennifer; Alexander, Daniel; Warren, Jason D; Rohrer, Jonathan D; Crutch, Sebastian J
2017-11-01
Eyetracking technology has had limited application in the dementia field to date, with most studies attempting to discriminate syndrome subgroups on the basis of basic oculomotor functions rather than higher-order cognitive abilities. Eyetracking-based tasks may also offer opportunities to reduce or ameliorate problems associated with standard paper-and-pencil cognitive tests, such as the complexity and linguistic demands of verbal test instructions, and the tiredness and attentional problems associated with lengthy tasks that generate few data points at a slow rate. In the present paper we adapted the Brixton spatial anticipation test to a computerized, instruction-less version in which oculomotor metrics, rather than overt verbal responses, were taken as indicators of high-level cognitive functions. Twelve patients with behavioural variant frontotemporal dementia (bvFTD; in whom spatial anticipation deficits were expected), six patients with semantic dementia (SD; in whom deficits were predicted to be less frequent) and 38 healthy controls were presented with a 10 × 7 matrix of white circles. During each trial (N = 24) a black dot moved across seven positions on the screen, following 12 different patterns. Participants' eye movements were recorded. Frequentist statistical analyses of standard eye movement metrics were complemented by a Bayesian machine learning (ML) approach in which raw eyetracking time series were examined to explore the ability to discriminate diagnostic groups not only on overall performance but also on individual trials. The original pen-and-paper Brixton test identified a spatial anticipation deficit in 7/12 (58%) of bvFTD and in 2/6 (33%) of SD patients. The eyetracking frequentist approach identified the deficit in 11/12 (92%) of bvFTD patients and in none (0%) of the SD patients. The machine learning approach had the main advantage of identifying significant differences from controls in 24/24 individual trials for bvFTD patients and in only 12/24 for SD patients. Results indicate that the fine-grained, rich datasets obtained from eyetracking metrics can inform us about high-level cognitive functions in dementia, such as spatial anticipation. The ML approach can help identify conditions in which subtle deficits are present and, potentially, contribute to test optimisation and the reduction of testing times. The absence of instructions also favoured a better distinction between different clinical groups of patients and can help provide valuable disease-specific markers. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.
2014-01-01
Background Meta-regression is becoming increasingly used to model study level covariate effects. However this type of statistical analysis presents many difficulties and challenges. Here two methods for calculating confidence intervals for the magnitude of the residual between-study variance in random effects meta-regression models are developed. A further suggestion for calculating credible intervals using informative prior distributions for the residual between-study variance is presented. Methods Two recently proposed and, under the assumptions of the random effects model, exact methods for constructing confidence intervals for the between-study variance in random effects meta-analyses are extended to the meta-regression setting. The use of Generalised Cochran heterogeneity statistics is extended to the meta-regression setting and a Newton-Raphson procedure is developed to implement the Q profile method for meta-analysis and meta-regression. WinBUGS is used to implement informative priors for the residual between-study variance in the context of Bayesian meta-regressions. Results Results are obtained for two contrasting examples, where the first example involves a binary covariate and the second involves a continuous covariate. Intervals for the residual between-study variance are wide for both examples. Conclusions Statistical methods, and R computer software, are available to compute exact confidence intervals for the residual between-study variance under the random effects model for meta-regression. These frequentist methods are almost as easily implemented as their established counterparts for meta-analysis. Bayesian meta-regressions are also easily performed by analysts who are comfortable using WinBUGS. Estimates of the residual between-study variance in random effects meta-regressions should be routinely reported and accompanied by some measure of their uncertainty. Confidence and/or credible intervals are well-suited to this purpose. PMID:25196829
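A hedged sketch of the Q profile idea mentioned above, restricted for brevity to a plain random-effects meta-analysis without covariates (the article's contribution is its extension to meta-regression). The effect estimates and within-study variances below are invented for illustration.

```python
import numpy as np
from scipy import stats, optimize

# Q-profile confidence interval for the between-study variance tau^2 in a
# random-effects meta-analysis (no covariates).  y are study effect estimates,
# v their within-study variances -- illustrative numbers only.
y = np.array([0.30, 0.10, 0.45, -0.05, 0.25, 0.60])
v = np.array([0.02, 0.03, 0.05, 0.04, 0.02, 0.06])
k = len(y)

def q_stat(tau2):
    w = 1.0 / (v + tau2)
    mu = np.sum(w * y) / np.sum(w)
    return np.sum(w * (y - mu) ** 2)

def bound(prob):
    """tau^2 at which Q equals the given chi-square quantile (Q decreases in tau^2)."""
    target = stats.chi2.ppf(prob, df=k - 1)
    if q_stat(0.0) <= target:        # bound truncated at zero
        return 0.0
    return optimize.brentq(lambda t2: q_stat(t2) - target, 0.0, 100.0)

lower, upper = bound(0.975), bound(0.025)
print(f"95% Q-profile CI for tau^2: [{lower:.3f}, {upper:.3f}]")
```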
Network meta-analysis: a technique to gather evidence from direct and indirect comparisons
2017-01-01
Systematic reviews and pairwise meta-analyses of randomized controlled trials, at the intersection of clinical medicine, epidemiology and statistics, are positioned at the top of the evidence-based practice hierarchy. These are important tools for drug approval, the formulation of clinical protocols and guidelines, and decision-making. However, this traditional technique only partially yields the information that clinicians, patients and policy-makers need to make informed decisions, since it usually compares only two interventions at a time. Regardless of the clinical condition under evaluation, many interventions are usually available on the market, and few of them have been studied in head-to-head trials. This scenario precludes conclusions from being drawn about the comparative profiles (e.g. efficacy and safety) of all interventions. The recent development and introduction of a new technique – usually referred to as network meta-analysis, indirect meta-analysis, or multiple or mixed treatment comparisons – has allowed the estimation of metrics for all possible comparisons in the same model, simultaneously gathering direct and indirect evidence. Over the last years this statistical tool has matured as a technique, with models available for all types of raw data, producing different pooled effect measures, using both Frequentist and Bayesian frameworks, with different software packages. However, the conduct, reporting and interpretation of network meta-analysis still pose multiple challenges that should be carefully considered, especially because this technique inherits all assumptions from pairwise meta-analysis but with increased complexity. Thus, we aim to provide a basic explanation of how a network meta-analysis is conducted, highlighting its risks and benefits for evidence-based practice, including information on the evolution of statistical methods, assumptions and steps for performing the analysis. PMID:28503228
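The simplest building block of the technique described above can be illustrated with an anchored (Bucher-style) indirect comparison; the log relative risks and standard errors below are invented, and a full network meta-analysis would pool many such direct and indirect contrasts in a single model under a consistency assumption.

```python
import numpy as np

# Anchored indirect comparison: treatments A and C have each been compared
# with a common comparator B, but never head-to-head.  Values are illustrative.
d_AB, se_AB = np.log(0.80), 0.10      # A vs B from direct trials (log relative risk)
d_CB, se_CB = np.log(0.95), 0.12      # C vs B from direct trials

d_AC = d_AB - d_CB                    # indirect estimate of A vs C
se_AC = np.sqrt(se_AB**2 + se_CB**2)  # variances of independent estimates add

lo, hi = d_AC - 1.96 * se_AC, d_AC + 1.96 * se_AC
print(f"indirect RR(A vs C) = {np.exp(d_AC):.2f} "
      f"(95% CI {np.exp(lo):.2f} to {np.exp(hi):.2f})")
```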
Robust inference for group sequential trials.
Ganju, Jitendra; Lin, Yunzhi; Zhou, Kefei
2017-03-01
For ethical reasons, group sequential trials were introduced to allow trials to stop early in the event of extreme results. Endpoints in such trials are usually mortality or irreversible morbidity. For a given endpoint, the norm is to use a single test statistic and to use that same statistic at each analysis. This approach is risky because the test statistic has to be specified before the study is unblinded, and there is a loss in power if the assumptions that ensure optimality for each analysis are not met. To minimize the risk of moderate to substantial loss in power due to a suboptimal choice of statistic, a robust method was developed for nonsequential trials. The concept is analogous to diversification of financial investments to minimize risk. The method is based on combining P values from multiple test statistics for formal inference while controlling the type I error rate at its designated value. This article evaluates the performance of two P value combination methods for group sequential trials. The emphasis is on time-to-event trials, although results from less complex trials are also included. The gain or loss in power with the combination method relative to a single statistic is asymmetric in its favor. Depending on the power of each individual test, the combination method can give more power than any single test or give power that is close to that of the most powerful test. The versatility of the method is that it can combine P values from different test statistics for analyses at different times. The robustness of the results suggests that inference from group sequential trials can be strengthened with the use of combined tests. Copyright © 2017 John Wiley & Sons, Ltd.
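For readers unfamiliar with P value combination, the sketch below shows Fisher's method applied to hypothetical P values from three candidate tests of the same endpoint. It deliberately ignores both the correlation between test statistics computed on the same data and the group-sequential boundaries, which the article's method must handle; it only illustrates the basic combination step.

```python
from scipy import stats

# Hypothetical P values from, e.g., a log-rank test, a Wilcoxon test, and a
# weighted log-rank test applied to the same trial data.
p_values = [0.030, 0.012, 0.085]

# Fisher's combination assumes independent P values; correlated statistics
# require a combination rule that accounts for the correlation.
statistic, combined_p = stats.combine_pvalues(p_values, method="fisher")
print(f"Fisher chi-square = {statistic:.2f}, combined P = {combined_p:.4f}")
```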
Long-term strategy for the statistical design of a forest health monitoring system
Hans T. Schreuder; Raymond L. Czaplewski
1993-01-01
A conceptual framework is given for a broad-scale survey of forest health that accomplishes three objectives: generate descriptive statistics; detect changes in such statistics; and simplify analytical inferences that identify, and possibly establish cause-effect relationships. Our paper discusses the development of sampling schemes to satisfy these three objectives,...
ERIC Educational Resources Information Center
Beeman, Jennifer Leigh Sloan
2013-01-01
Research has found that students successfully complete an introductory course in statistics without fully comprehending the underlying theory or being able to exhibit statistical reasoning. This is particularly true for the understanding about the sampling distribution of the mean, a crucial concept for statistical inference. This study…
Using Action Research to Develop a Course in Statistical Inference for Workplace-Based Adults
ERIC Educational Resources Information Center
Forbes, Sharleen
2014-01-01
Many adults who need an understanding of statistical concepts have limited mathematical skills. They need a teaching approach that includes as little mathematical context as possible. Iterative participatory qualitative research (action research) was used to develop a statistical literacy course for adult learners informed by teaching in…
Applying Statistical Process Control to Clinical Data: An Illustration.
ERIC Educational Resources Information Center
Pfadt, Al; And Others
1992-01-01
Principles of statistical process control are applied to a clinical setting through the use of control charts to detect changes, as part of treatment planning and clinical decision-making processes. The logic of control chart analysis is derived from principles of statistical inference. Sample charts offer examples of evaluating baselines and…
ERIC Educational Resources Information Center
Madhere, Serge
An analytic procedure, efficiency analysis, is proposed for improving the utility of quantitative program evaluation for decision making. The three features of the procedure are explained: (1) for statistical control, it adopts and extends the regression-discontinuity design; (2) for statistical inferences, it de-emphasizes hypothesis testing in…
Bayesian characterization of uncertainty in species interaction strengths.
Wolf, Christopher; Novak, Mark; Gitelman, Alix I
2017-06-01
Considerable effort has been devoted to the estimation of species interaction strengths. This effort has focused primarily on statistical significance testing and obtaining point estimates of parameters that contribute to interaction strength magnitudes, leaving the characterization of uncertainty associated with those estimates unconsidered. We consider a means of characterizing the uncertainty of a generalist predator's interaction strengths by formulating an observational method for estimating a predator's prey-specific per capita attack rates as a Bayesian statistical model. This formulation permits the explicit incorporation of multiple sources of uncertainty. A key insight is the informative nature of several so-called non-informative priors that have been used in modeling the sparse data typical of predator feeding surveys. We introduce to ecology a new neutral prior and provide evidence for its superior performance. We use a case study to consider the attack rates in a New Zealand intertidal whelk predator, and we illustrate not only that Bayesian point estimates can be made to correspond with those obtained by frequentist approaches, but also that estimation uncertainty as described by 95% intervals is more useful and biologically realistic using the Bayesian method. In particular, unlike in bootstrap confidence intervals, the lower bounds of the Bayesian posterior intervals for attack rates do not include zero when a predator-prey interaction is in fact observed. We conclude that the Bayesian framework provides a straightforward, probabilistic characterization of interaction strength uncertainty, enabling future considerations of both the deterministic and stochastic drivers of interaction strength and their impact on food webs.
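A toy conjugate analogue of the point about interval bounds (not the authors' observational model): with a single observed feeding event among ten surveyed predators, a Gamma-Poisson credible interval excludes zero while a naive bootstrap percentile interval typically does not. The prior parameters and counts below are invented.

```python
import numpy as np
from scipy import stats

# Suppose a feeding survey of n predators records x feeding events on one prey
# species, and the per capita rate is modelled as Poisson with a Gamma(a, b)
# prior (shape/rate parameterisation; values assumed for illustration).
rng = np.random.default_rng(1)
counts = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])   # one observed interaction
a, b = 0.5, 1.0

post = stats.gamma(a + counts.sum(), scale=1.0 / (b + counts.size))
print("Bayesian 95% interval:", post.ppf([0.025, 0.975]))  # lower bound > 0

boot = [rng.choice(counts, size=counts.size, replace=True).mean() for _ in range(2000)]
print("bootstrap 95% interval:", np.percentile(boot, [2.5, 97.5]))  # often starts at 0
```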
Clinical trial designs for testing biomarker-based personalized therapies
Lai, Tze Leung; Lavori, Philip W; Shih, Mei-Chiung I; Sikic, Branimir I
2014-01-01
Background Advances in molecular therapeutics in the past decade have opened up new possibilities for treating cancer patients with personalized therapies, using biomarkers to determine which treatments are most likely to benefit them, but there are difficulties and unresolved issues in the development and validation of biomarker-based personalized therapies. We develop a new clinical trial design to address some of these issues. The goal is to capture the strengths of the frequentist and Bayesian approaches proposed for this problem in the recent literature and to circumvent their limitations. Methods We use generalized likelihood ratio tests of the intersection null and enriched strategy null hypotheses to derive a novel clinical trial design for the problem of advancing promising biomarker-guided strategies toward eventual validation. We also investigate the usefulness of adaptive randomization (AR) and futility stopping proposed in the recent literature. Results Simulation studies demonstrate the advantages of testing both the narrowly focused enriched strategy null hypothesis, related to validating a proposed strategy, and the intersection null hypothesis, which can accommodate a potentially successful strategy. AR and early termination of ineffective treatments offer an increased probability of receiving the preferred treatment and better response rates for patients in the trial, at the expense of more complicated inference under small-to-moderate total sample sizes and some reduction in power. Limitations The binary response used in the development phase may not be a reliable indicator of treatment benefit on long-term clinical outcomes. In the proposed design, the biomarker-guided strategy (BGS) is not compared to ‘standard of care’, such as physician’s choice informed by patient characteristics. Therefore, a positive result does not imply superiority of the BGS to ‘standard of care’. The proposed design and tests are valid asymptotically; simulations are used to examine small-to-moderate sample properties. Conclusion Innovative clinical trial designs are needed to address the difficulties and issues in the development and validation of biomarker-based personalized therapies. The article shows the advantages of using likelihood inference and interim analysis to meet the challenges posed by the required sample sizes and by the constantly evolving biomarker landscape and genomic and proteomic technologies. PMID:22397801
The penumbra of learning: a statistical theory of synaptic tagging and capture.
Gershman, Samuel J
2014-01-01
Learning in humans and animals is accompanied by a penumbra: Learning one task benefits from learning an unrelated task shortly before or after. At the cellular level, the penumbra of learning appears when weak potentiation of one synapse is amplified by strong potentiation of another synapse on the same neuron during a critical time window. Weak potentiation sets a molecular tag that enables the synapse to capture plasticity-related proteins synthesized in response to strong potentiation at another synapse. This paper describes a computational model which formalizes synaptic tagging and capture in terms of statistical learning mechanisms. According to this model, synaptic strength encodes a probabilistic inference about the dynamically changing association between pre- and post-synaptic firing rates. The rate of change is itself inferred, coupling together different synapses on the same neuron. When the inputs to one synapse change rapidly, the inferred rate of change increases, amplifying learning at other synapses.
Space-Time Data fusion for Remote Sensing Applications
NASA Technical Reports Server (NTRS)
Braverman, Amy; Nguyen, H.; Cressie, N.
2011-01-01
NASA has been collecting massive amounts of remote sensing data about Earth's systems for more than a decade. Missions are selected to be complementary in quantities measured, retrieval techniques, and sampling characteristics, so these datasets are highly synergistic. To fully exploit this, a rigorous methodology for combining data with heterogeneous sampling characteristics is required. For scientific purposes, the methodology must also provide quantitative measures of uncertainty that propagate input-data uncertainty appropriately. We view this as a statistical inference problem. The true but not directly observed quantities form a vector-valued field that is continuous in space and time. Our goal is to infer those true values, or some function of them, and to provide uncertainty quantification for those inferences. We use a spatio-temporal statistical model that relates the unobserved quantities of interest at the point level to the spatially aggregated, observed data. We describe and illustrate our method using CO2 data from two NASA data sets.
Inference of missing data and chemical model parameters using experimental statistics
NASA Astrophysics Data System (ADS)
Casey, Tiernan; Najm, Habib
2017-11-01
A method for determining the joint parameter density of Arrhenius rate expressions through the inference of missing experimental data is presented. This approach proposes noisy hypothetical data sets from target experiments and accepts those which agree with the reported statistics, in the form of nominal parameter values and their associated uncertainties. The data exploration procedure is formalized using Bayesian inference, employing maximum entropy and approximate Bayesian computation methods to arrive at a joint density on data and parameters. The method is demonstrated in the context of reactions in the H2-O2 system for predictive modeling of combustion systems of interest. Work supported by the US DOE BES CSGB. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, for the US DOE NNSA under contract DE-NA-0003525.
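A schematic rejection-sampling sketch in the spirit of the approach described above (not the authors' implementation, which uses maximum entropy and approximate Bayesian computation machinery): hypothetical replicate data sets are proposed, and parameter draws are kept only when the simulated data reproduce the reported summary statistics. All numerical values, including the priors and tolerance, are placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
reported_k, reported_sd = 2.0e3, 1.5e2     # reported rate constant and its uncertainty at T
T, R = 1000.0, 8.314
n_obs, tol = 5, 0.5                        # hypothetical replicate count, acceptance tolerance

accepted = []
for _ in range(50_000):
    A = rng.uniform(1e5, 1e7)              # prior draw: pre-exponential factor
    Ea = rng.uniform(2e4, 6e4)             # prior draw: activation energy (J/mol)
    k_true = A * np.exp(-Ea / (R * T))
    data = rng.normal(k_true, reported_sd, size=n_obs)   # hypothetical noisy data set
    # accept the draw if the simulated data reproduce the reported statistics
    dist = np.hypot((data.mean() - reported_k) / reported_sd,
                    (data.std(ddof=1) - reported_sd) / reported_sd)
    if dist < tol:
        accepted.append((A, Ea))

print(f"accepted {len(accepted)} joint (A, Ea) samples")
```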
Statistical numeracy as a moderator of (pseudo)contingency effects on decision behavior.
Fleig, Hanna; Meiser, Thorsten; Ettlin, Florence; Rummel, Jan
2017-03-01
Pseudocontingencies denote contingency estimates inferred from base rates rather than from cell frequencies. We examined the role of statistical numeracy for effects of such fallible but adaptive inferences on choice behavior. In Experiment 1, we provided information on single observations as well as on base rates and tracked participants' eye movements. In Experiment 2, we manipulated the availability of information on cell frequencies and base rates between conditions. Our results demonstrate that a focus on base rates rather than cell frequencies benefits pseudocontingency effects. Learners who are more proficient in (conditional) probability calculation prefer to rely on cell frequencies in order to judge contingencies, though, as was evident from their gaze behavior. If cell frequencies are available in summarized format, they may infer the true contingency between options and outcomes. Otherwise, however, even highly numerate learners are susceptible to pseudocontingency effects. Copyright © 2017 Elsevier B.V. All rights reserved.
Feinauer, Christoph; Procaccini, Andrea; Zecchina, Riccardo; Weigt, Martin; Pagnani, Andrea
2014-01-01
In the course of evolution, proteins show a remarkable conservation of their three-dimensional structure and their biological function, leading to strong evolutionary constraints on the sequence variability between homologous proteins. Our method aims at extracting such constraints from rapidly accumulating sequence data, and thereby at inferring protein structure and function from sequence information alone. Recently, global statistical inference methods (e.g. direct-coupling analysis, sparse inverse covariance estimation) have achieved a breakthrough towards this aim, and their predictions have been successfully implemented into tertiary and quaternary protein structure prediction methods. However, due to the discrete nature of the underlying variables (amino acids), exact inference requires exponential time in the protein length, and efficient approximations are needed for practical applicability. Here we propose a very efficient multivariate Gaussian modeling approach as a variant of direct-coupling analysis: the discrete amino-acid variables are replaced by continuous Gaussian random variables. The resulting statistical inference problem is efficiently and exactly solvable. We show that the quality of inference is comparable or superior to that achieved by mean-field approximations to inference with discrete variables, as done by direct-coupling analysis. This is true for (i) the prediction of residue-residue contacts in proteins, and (ii) the identification of protein-protein interaction partners in bacterial signal transduction. An implementation of our multivariate Gaussian approach is available at the website http://areeweb.polito.it/ricerca/cmp/code. PMID:24663061
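A rough sketch of the multivariate Gaussian idea on a toy alignment (our simplification, not the published pipeline): one-hot encode the sequences, regularise and invert the empirical covariance, and score residue pairs by the Frobenius norm of the corresponding coupling blocks. The alignment and the ridge penalty are invented.

```python
import numpy as np
from itertools import combinations

AA = "ACDEFGHIKLMNPQRSTVWY"
msa = ["ACDA", "ACDC", "GCDA", "ACEA", "GCEC"]   # toy alignment: 5 sequences, 4 columns
N, L, q = len(msa), len(msa[0]), len(AA)

# One-hot encoding of the alignment: one row per sequence, q columns per position.
X = np.zeros((N, L * q))
for s, seq in enumerate(msa):
    for i, aa in enumerate(seq):
        X[s, i * q + AA.index(aa)] = 1.0

cov = np.cov(X, rowvar=False) + 0.5 * np.eye(L * q)   # ridge-style regularisation
J = -np.linalg.inv(cov)                                # couplings ~ negative inverse covariance

scores = {}
for i, j in combinations(range(L), 2):
    block = J[i * q:(i + 1) * q, j * q:(j + 1) * q]
    scores[(i, j)] = np.linalg.norm(block)             # Frobenius norm as a contact score

for pair, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(s, 3))
```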
The space of ultrametric phylogenetic trees.
Gavryushkin, Alex; Drummond, Alexei J
2016-08-21
The reliability of a phylogenetic inference method from genomic sequence data is ensured by its statistical consistency. Bayesian inference methods produce a sample of phylogenetic trees from the posterior distribution given sequence data. Hence the question of statistical consistency of such methods is equivalent to the consistency of the summary of the sample. More generally, statistical consistency is ensured by the tree space used to analyse the sample. In this paper, we consider two standard parameterisations of phylogenetic time-trees used in evolutionary models: inter-coalescent interval lengths and absolute times of divergence events. For each of these parameterisations we introduce a natural metric space on ultrametric phylogenetic trees. We compare the introduced spaces with existing models of tree space, formulate several formal requirements that a metric space on phylogenetic trees must possess in order to be a satisfactory space for statistical analysis, and justify them. We show that only a few known constructions of the space of phylogenetic trees satisfy these requirements. However, our results suggest that these basic requirements are not enough to distinguish between the two metric spaces we introduce, and that the choice between metric spaces requires additional properties to be considered. In particular, the summary tree minimising the squared distance to the trees in the sample might be different for different parameterisations. This suggests that further fundamental insight is needed into the problem of statistical consistency of phylogenetic inference methods. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
Assessing risk factors for dental caries: a statistical modeling approach.
Trottini, Mario; Bossù, Maurizio; Corridore, Denise; Ierardo, Gaetano; Luzzi, Valeria; Saccucci, Matteo; Polimeni, Antonella
2015-01-01
The problem of identifying potential determinants and predictors of dental caries is of key importance in caries research and it has received considerable attention in the scientific literature. From the methodological side, a broad range of statistical models is currently available to analyze dental caries indices (DMFT, dmfs, etc.). These models have been applied in several studies to investigate the impact of different risk factors on the cumulative severity of dental caries experience. However, in most cases (i) these studies focus on a very specific subset of risk factors; and (ii) in the statistical modeling only a few candidate models are considered and model selection is at best only marginally addressed. As a result, our understanding of the robustness of the statistical inferences with respect to the choice of the model is very limited; the richness of the set of statistical models available for analysis is only marginally exploited; and inferences could be biased due to the omission of potentially important confounding variables from the model's specification. In this paper we argue that these limitations can be overcome by considering a general class of candidate models and carefully exploring the model space using standard model selection criteria and measures of global fit and predictive performance of the candidate models. Strengths and limitations of the proposed approach are illustrated with a real data set. In our illustration the model space contains more than 2.6 million models, which require inferences to be adjusted for 'optimism'.
Inferring Characteristics of Sensorimotor Behavior by Quantifying Dynamics of Animal Locomotion
NASA Astrophysics Data System (ADS)
Leung, KaWai
Locomotion is one of the most well-studied topics in animal behavioral studies. Much fundamental and clinical research makes use of the locomotion of an animal model to explore various aspects of sensorimotor behavior. In the past, most of these studies focused on the population average of a specific trait, owing to limitations in data collection and processing power. With recent advances in computer vision and statistical modeling techniques, it is now possible to track and analyze large amounts of behavioral data. In this thesis, I present two projects that aim to infer the characteristics of sensorimotor behavior by quantifying the dynamics of locomotion of the nematode Caenorhabditis elegans and the fruit fly Drosophila melanogaster, shedding light on the statistical dependence between sensing and behavior. In the first project, I investigate the possibility of inferring noxious sensory information from the behavior of Caenorhabditis elegans. I develop a statistical model to infer the heat stimulus level perceived by individual animals from their stereotyped escape responses after stimulation by an IR laser. The model allows quantification of analgesic-like effects of chemical agents or genetic mutations in the worm. At the same time, the method is able to distinguish perturbations of locomotion behavior that go beyond effects on the sensory system. With this model I propose experimental designs that allow statistically significant identification of analgesic-like effects. In the second project, I investigate the roles of energy budget and stability of locomotion in determining the walking speed distribution of Drosophila melanogaster during aging. The locomotion stability at different ages is estimated from video recordings using Floquet theory. I calculate the power consumption at different locomotion speeds using a biomechanics model. In conclusion, power consumption, not stability, predicts the locomotion speed distribution at different ages.
de la Rúa, Nicholas M.; Bustamante, Dulce M.; Menes, Marianela; Stevens, Lori; Monroy, Carlota; Kilpatrick, William; Rizzo, Donna; Klotz, Stephen A.; Schmidt, Justin; Axen, Heather J.; Dorn, Patricia L.
2014-01-01
Phylogenetic relationships of insect vectors of parasitic diseases are important for understanding the evolution of epidemiologically relevant traits, and may be useful in vector control. The subfamily Triatominae (Hemiptera:Reduviidae) includes ~140 extant species arranged in five tribes comprised of 15 genera. The genus Triatoma is the most species-rich and contains important vectors of Trypanosoma cruzi, the causative agent of Chagas disease. Triatoma species were grouped into complexes originally by morphology and more recently with the addition of information from molecular phylogenetics (the four-complex hypothesis); however, without a strict adherence to monophyly. To date, the validity of proposed species complexes has not been tested by statistical tests of topology. The goal of this study was to clarify the systematics of 19 Triatoma species from North and Central America. We inferred their evolutionary relatedness using two independent data sets: the complete nuclear Internal Transcribed Spacer-2 ribosomal DNA (ITS-2 rDNA) and head morphometrics. In addition, we used the Shimodaira-Hasegawa statistical test of topology to assess the fit of the data to a set of competing systematic hypotheses (topologies). An unconstrained topology inferred from the ITS-2 data was compared to topologies constrained based on the four-complex hypothesis or one inferred from our morphometry results. The unconstrained topology represents a statistically significant better fit of the molecular data than either the four-complex or the morphometric topology. We propose an update to the composition of species complexes in the North and Central American Triatoma, based on a phylogeny inferred from ITS-2 as a first step towards updating the phylogeny of the complexes based on monophyly and statistical tests of topologies. PMID:24681261
Emerging Concepts of Data Integration in Pathogen Phylodynamics.
Baele, Guy; Suchard, Marc A; Rambaut, Andrew; Lemey, Philippe
2017-01-01
Phylodynamics has become an increasingly popular statistical framework to extract evolutionary and epidemiological information from pathogen genomes. By harnessing such information, epidemiologists aim to shed light on the spatio-temporal patterns of spread and to test hypotheses about the underlying interaction of evolutionary and ecological dynamics in pathogen populations. Although the field has witnessed a rich development of statistical inference tools with increasing levels of sophistication, these tools initially focused on sequences as their sole primary data source. Integrating various sources of information, however, promises to deliver more precise insights in infectious diseases and to increase opportunities for statistical hypothesis testing. Here, we review how the emerging concept of data integration is stimulating new advances in Bayesian evolutionary inference methodology which formalize a marriage of statistical thinking and evolutionary biology. These approaches include connecting sequence to trait evolution, such as for host, phenotypic and geographic sampling information, but also the incorporation of covariates of evolutionary and epidemic processes in the reconstruction procedures. We highlight how a full Bayesian approach to covariate modeling and testing can generate further insights into sequence evolution, trait evolution, and population dynamics in pathogen populations. Specific examples demonstrate how such approaches can be used to test the impact of host on rabies and HIV evolutionary rates, to identify the drivers of influenza dispersal as well as the determinants of rabies cross-species transmissions, and to quantify the evolutionary dynamics of influenza antigenicity. Finally, we briefly discuss how data integration is now also permeating through the inference of transmission dynamics, leading to novel insights into tree-generative processes and detailed reconstructions of transmission trees. [Bayesian inference; birth–death models; coalescent models; continuous trait evolution; covariates; data integration; discrete trait evolution; pathogen phylodynamics.
Emerging Concepts of Data Integration in Pathogen Phylodynamics
Baele, Guy; Suchard, Marc A.; Rambaut, Andrew; Lemey, Philippe
2017-01-01
Phylodynamics has become an increasingly popular statistical framework to extract evolutionary and epidemiological information from pathogen genomes. By harnessing such information, epidemiologists aim to shed light on the spatio-temporal patterns of spread and to test hypotheses about the underlying interaction of evolutionary and ecological dynamics in pathogen populations. Although the field has witnessed a rich development of statistical inference tools with increasing levels of sophistication, these tools initially focused on sequences as their sole primary data source. Integrating various sources of information, however, promises to deliver more precise insights in infectious diseases and to increase opportunities for statistical hypothesis testing. Here, we review how the emerging concept of data integration is stimulating new advances in Bayesian evolutionary inference methodology which formalize a marriage of statistical thinking and evolutionary biology. These approaches include connecting sequence to trait evolution, such as for host, phenotypic and geographic sampling information, but also the incorporation of covariates of evolutionary and epidemic processes in the reconstruction procedures. We highlight how a full Bayesian approach to covariate modeling and testing can generate further insights into sequence evolution, trait evolution, and population dynamics in pathogen populations. Specific examples demonstrate how such approaches can be used to test the impact of host on rabies and HIV evolutionary rates, to identify the drivers of influenza dispersal as well as the determinants of rabies cross-species transmissions, and to quantify the evolutionary dynamics of influenza antigenicity. Finally, we briefly discuss how data integration is now also permeating through the inference of transmission dynamics, leading to novel insights into tree-generative processes and detailed reconstructions of transmission trees. [Bayesian inference; birth–death models; coalescent models; continuous trait evolution; covariates; data integration; discrete trait evolution; pathogen phylodynamics. PMID:28173504
Ayres, Daniel L; Darling, Aaron; Zwickl, Derrick J; Beerli, Peter; Holder, Mark T; Lewis, Paul O; Huelsenbeck, John P; Ronquist, Fredrik; Swofford, David L; Cummings, Michael P; Rambaut, Andrew; Suchard, Marc A
2012-01-01
Phylogenetic inference is fundamental to our understanding of most aspects of the origin and evolution of life, and in recent years, there has been a concentration of interest in statistical approaches such as Bayesian inference and maximum likelihood estimation. Yet, for large data sets and realistic or interesting models of evolution, these approaches remain computationally demanding. High-throughput sequencing can yield data for thousands of taxa, but scaling to such problems using serial computing often necessitates the use of nonstatistical or approximate approaches. The recent emergence of graphics processing units (GPUs) provides an opportunity to leverage their excellent floating-point computational performance to accelerate statistical phylogenetic inference. A specialized library for phylogenetic calculation would allow existing software packages to make more effective use of available computer hardware, including GPUs. Adoption of a common library would also make it easier for other emerging computing architectures, such as field programmable gate arrays, to be used in the future. We present BEAGLE, an application programming interface (API) and library for high-performance statistical phylogenetic inference. The API provides a uniform interface for performing phylogenetic likelihood calculations on a variety of compute hardware platforms. The library includes a set of efficient implementations and can currently exploit hardware including GPUs using NVIDIA CUDA, central processing units (CPUs) with Streaming SIMD Extensions and related processor supplementary instruction sets, and multicore CPUs via OpenMP. To demonstrate the advantages of a common API, we have incorporated the library into several popular phylogenetic software packages. The BEAGLE library is free open source software licensed under the Lesser GPL and available from http://beagle-lib.googlecode.com. An example client program is available as public domain software.
Ayres, Daniel L.; Darling, Aaron; Zwickl, Derrick J.; Beerli, Peter; Holder, Mark T.; Lewis, Paul O.; Huelsenbeck, John P.; Ronquist, Fredrik; Swofford, David L.; Cummings, Michael P.; Rambaut, Andrew; Suchard, Marc A.
2012-01-01
Abstract Phylogenetic inference is fundamental to our understanding of most aspects of the origin and evolution of life, and in recent years, there has been a concentration of interest in statistical approaches such as Bayesian inference and maximum likelihood estimation. Yet, for large data sets and realistic or interesting models of evolution, these approaches remain computationally demanding. High-throughput sequencing can yield data for thousands of taxa, but scaling to such problems using serial computing often necessitates the use of nonstatistical or approximate approaches. The recent emergence of graphics processing units (GPUs) provides an opportunity to leverage their excellent floating-point computational performance to accelerate statistical phylogenetic inference. A specialized library for phylogenetic calculation would allow existing software packages to make more effective use of available computer hardware, including GPUs. Adoption of a common library would also make it easier for other emerging computing architectures, such as field programmable gate arrays, to be used in the future. We present BEAGLE, an application programming interface (API) and library for high-performance statistical phylogenetic inference. The API provides a uniform interface for performing phylogenetic likelihood calculations on a variety of compute hardware platforms. The library includes a set of efficient implementations and can currently exploit hardware including GPUs using NVIDIA CUDA, central processing units (CPUs) with Streaming SIMD Extensions and related processor supplementary instruction sets, and multicore CPUs via OpenMP. To demonstrate the advantages of a common API, we have incorporated the library into several popular phylogenetic software packages. The BEAGLE library is free open source software licensed under the Lesser GPL and available from http://beagle-lib.googlecode.com. An example client program is available as public domain software. PMID:21963610
ERIC Educational Resources Information Center
Page, Robert; Satake, Eiki
2017-01-01
While interest in Bayesian statistics has been growing in statistics education, the treatment of the topic is still inadequate in both textbooks and the classroom. Because so many fields of study lead to careers that involve a decision-making process requiring an understanding of Bayesian methods, it is becoming increasingly clear that Bayesian…
Human Inferences about Sequences: A Minimal Transition Probability Model
2016-01-01
The brain constantly infers the causes of the inputs it receives and uses these inferences to generate statistical expectations about future observations. Experimental evidence for these expectations and their violations include explicit reports, sequential effects on reaction times, and mismatch or surprise signals recorded in electrophysiology and functional MRI. Here, we explore the hypothesis that the brain acts as a near-optimal inference device that constantly attempts to infer the time-varying matrix of transition probabilities between the stimuli it receives, even when those stimuli are in fact fully unpredictable. This parsimonious Bayesian model, with a single free parameter, accounts for a broad range of findings on surprise signals, sequential effects and the perception of randomness. Notably, it explains the pervasive asymmetry between repetitions and alternations encountered in those studies. Our analysis suggests that a neural machinery for inferring transition probabilities lies at the core of human sequence knowledge. PMID:28030543
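A minimal sketch of the core computation described above (our simplification of the model): transition probabilities between two stimuli are estimated with exponentially decaying ("leaky") counts, the decay rate playing the role of the model's single free parameter, and the surprise of each stimulus is its negative log predicted probability.

```python
import numpy as np

rng = np.random.default_rng(3)
seq = rng.integers(0, 2, size=200)        # a fully random binary stimulus stream
omega = 0.1                               # leak / forgetting rate (assumed value)

counts = np.ones((2, 2))                  # Laplace-style prior pseudo-counts
surprise = []
for t in range(1, len(seq)):
    prev, cur = seq[t - 1], seq[t]
    p = counts[prev, cur] / counts[prev].sum()   # predicted P(cur | prev)
    surprise.append(-np.log2(p))
    counts *= (1.0 - omega)                      # forget old evidence
    counts[prev, cur] += 1.0                     # incorporate the new transition

print(f"mean surprise = {np.mean(surprise):.2f} bits")
```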
Exploring High School Students Beginning Reasoning about Significance Tests with Technology
ERIC Educational Resources Information Center
García, Víctor N.; Sánchez, Ernesto
2017-01-01
In the present study we analyze how students reason about or make inferences in a particular hypothesis testing problem (without having studied formal methods of statistical inference) when using Fathom. They use Fathom to create an empirical sampling distribution through computer simulation. It is found that most students' reasoning relies on…
Stan: A Probabilistic Programming Language for Bayesian Inference and Optimization
ERIC Educational Resources Information Center
Gelman, Andrew; Lee, Daniel; Guo, Jiqiang
2015-01-01
Stan is a free and open-source C++ program that performs Bayesian inference or optimization for arbitrary user-specified models. It can be called from the command line, R, Python, Matlab, or Julia, and has great promise for fitting large and complex statistical models in many areas of application. We discuss Stan from users' and developers'…
Anchoring quartet-based phylogenetic distances and applications to species tree reconstruction.
Sayyari, Erfan; Mirarab, Siavash
2016-11-11
Inferring species trees from gene trees using coalescent-based summary methods has been the subject of much attention, yet new scalable and accurate methods are needed. We introduce DISTIQUE, a new statistically consistent summary method for inferring species trees from gene trees under the coalescent model. We generalize our results to arbitrary phylogenetic inference problems; we show that two arbitrarily chosen leaves, called anchors, can be used to estimate relative distances between all other pairs of leaves by inferring relevant quartet trees. This results in a family of distance-based tree inference methods, with running times ranging from quadratic to quartic in the number of leaves. We show in simulation studies that DISTIQUE has accuracy comparable to leading coalescent-based summary methods and reduced running times.
Data Acquisition and Preprocessing in Studies on Humans: What Is Not Taught in Statistics Classes?
Zhu, Yeyi; Hernandez, Ladia M; Mueller, Peter; Dong, Yongquan; Forman, Michele R
2013-01-01
The aim of this paper is to address issues in research that may be missing from statistics classes and important for (bio-)statistics students. In the context of a case study, we discuss data acquisition and preprocessing steps that fill the gap between research questions posed by subject matter scientists and statistical methodology for formal inference. Issues include participant recruitment, data collection training and standardization, variable coding, data review and verification, data cleaning and editing, and documentation. Despite the critical importance of these details in research, most of these issues are rarely discussed in an applied statistics program. One reason for the lack of more formal training is the difficulty in addressing the many challenges that can possibly arise in the course of a study in a systematic way. This article can help to bridge this gap between research questions and formal statistical inference by using an illustrative case study for a discussion. We hope that reading and discussing this paper and practicing data preprocessing exercises will sensitize statistics students to these important issues and achieve optimal conduct, quality control, analysis, and interpretation of a study.
Automatic imitation of pro- and antisocial gestures: Is implicit social behavior censored?
Cracco, Emiel; Genschow, Oliver; Radkova, Ina; Brass, Marcel
2018-01-01
According to social reward theories, automatic imitation can be understood as a means to obtain positive social consequences. In line with this view, it has been shown that automatic imitation is modulated by contextual variables that constrain the positive outcomes of imitation. However, this work has largely neglected that many gestures have an inherent pro- or antisocial meaning. As a result of their meaning, antisocial gestures are considered taboo and should not be used in public. In three experiments, we show that automatic imitation of symbolic gestures is modulated by the social intent of these gestures. Experiment 1 (N=37) revealed reduced automatic imitation of antisocial compared with prosocial gestures. Experiment 2 (N=118) and Experiment 3 (N=118) used a social priming procedure to show that this effect was stronger in a prosocial context than in an antisocial context. These findings were supported in a within-study meta-analysis using both frequentist and Bayesian statistics. Together, our results indicate that automatic imitation is regulated by internalized social norms that act as a stop signal when inappropriate actions are triggered. Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Traversaro, Francisco; O. Redelico, Francisco
2018-04-01
In nonlinear dynamics, and to a lesser extent in other fields, a widely used measure of complexity is the Permutation Entropy. But there is still no known method to determine the accuracy of this measure: there has been little research on the statistical properties of this quantity when it is used to characterize time series. The literature describes some resampling methods for quantities used in nonlinear dynamics - such as the largest Lyapunov exponent - but these seem to fail. In this contribution, we propose a parametric bootstrap methodology using a symbolic representation of the time series to obtain the distribution of the Permutation Entropy estimator. We perform several time series simulations from well-known stochastic processes in the 1/fα noise family, and show in each case that the proposed accuracy measure is as efficient as the one obtained by the frequentist approach of repeating the experiment. The complexity of brain electrical activity, measured by the Permutation Entropy, has been extensively used in epilepsy research for detecting dynamical changes in the electroencephalogram (EEG) signal, with no consideration of the variability of this complexity measure. As an application, the parametric bootstrap methodology is used to compare normal and pre-ictal EEG signals.
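A sketch of the quantities involved: a normalised permutation entropy estimator and a simple parametric bootstrap of its sampling distribution obtained by resampling ordinal patterns from the fitted pattern distribution. This is an illustrative reading that ignores the dependence between overlapping windows, not the authors' exact resampling scheme; the signal is placeholder white noise.

```python
import numpy as np
from math import factorial
from itertools import permutations

def ordinal_patterns(x, m=3):
    # map each length-m window to the index of its ordinal (rank-order) pattern
    idx = {p: k for k, p in enumerate(permutations(range(m)))}
    return np.array([idx[tuple(np.argsort(x[i:i + m]))] for i in range(len(x) - m + 1)])

def perm_entropy(pattern_ids, m=3):
    p = np.bincount(pattern_ids, minlength=factorial(m)).astype(float)
    p = p[p > 0] / p.sum()
    return -np.sum(p * np.log(p)) / np.log(factorial(m))   # normalised to [0, 1]

rng = np.random.default_rng(4)
x = rng.normal(size=1000)                                   # placeholder signal
ids = ordinal_patterns(x)

# parametric bootstrap: resample symbol sequences from the fitted pattern distribution
probs = np.bincount(ids, minlength=factorial(3)).astype(float)
probs /= probs.sum()
boot = [perm_entropy(rng.choice(factorial(3), size=len(ids), p=probs)) for _ in range(500)]
print(f"PE = {perm_entropy(ids):.3f}, bootstrap SD of the estimator = {np.std(boot):.4f}")
```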
An Artificial Intelligence Approach to Analyzing Student Errors in Statistics.
ERIC Educational Resources Information Center
Sebrechts, Marc M.; Schooler, Lael J.
1987-01-01
Describes the development of an artificial intelligence system called GIDE that analyzes student errors in statistics problems by inferring the students' intentions. Learning strategies involved in problem solving are discussed and the inclusion of goal structures is explained. (LRW)
Network inference using informative priors.
Mukherjee, Sach; Speed, Terence P
2008-09-23
Recent years have seen much interest in the study of systems characterized by multiple interacting components. A class of statistical models called graphical models, in which graphs are used to represent probabilistic relationships between variables, provides a framework for formal inference regarding such systems. In many settings, the object of inference is the network structure itself. This problem of "network inference" is well known to be a challenging one. However, in scientific settings there is very often existing information regarding network connectivity. A natural idea then is to take account of such information during inference. This article addresses the question of incorporating prior information into network inference. We focus on directed models called Bayesian networks, and use Markov chain Monte Carlo to draw samples from posterior distributions over network structures. We introduce prior distributions on graphs capable of capturing information regarding network features including edges, classes of edges, degree distributions, and sparsity. We illustrate our approach in the context of systems biology, applying our methods to network inference in cancer signaling.
Attention - Control in the Frequentistic Processing of Multidimensional Event Streams.
1980-07-01
Craik, F. I. M. Human memory. Annual Review of Psychology, 1979, 30, 63-102. Craik, F. I. M., & Lockhart, R. S. Levels of processing: A framework for memory research... (Jacoby & Craik, 1979). Thus, the notions of memorability (or retrievability) and levels of processing are tied closely, in the sense that the... differing levels and degrees of elaborateness (Jacoby & Craik, 1979). Decisions as to which attributes receive elaborated processing and how they are...
Reaction Time in Grade 5: Data Collection within the Practice of Statistics
ERIC Educational Resources Information Center
Watson, Jane; English, Lyn
2017-01-01
This study reports on a classroom activity for Grade 5 students investigating their reaction times. The investigation was part of a 3-year research project introducing students to informal inference and giving them experience carrying out the practice of statistics. For this activity the focus within the practice of statistics was on introducing…
ERIC Educational Resources Information Center
Bakker, Arthur; Ben-Zvi, Dani; Makar, Katie
2017-01-01
To understand how statistical and other types of reasoning are coordinated with actions to reduce uncertainty, we conducted a case study in vocational education that involved statistical hypothesis testing. We analyzed an intern's research project in a hospital laboratory in which reducing uncertainties was crucial to make a valid statistical…
Causal inference in biology networks with integrated belief propagation.
Chang, Rui; Karr, Jonathan R; Schadt, Eric E
2015-01-01
Inferring causal relationships among molecular and higher-order phenotypes is a critical step in elucidating the complexity of living systems. Here we propose a novel method for inferring causality that is no longer constrained by the conditional dependency arguments that limit the ability of statistical causal inference methods to resolve causal relationships within sets of graphical models that are Markov equivalent. Our method utilizes Bayesian belief propagation to infer the responses of perturbation events on molecular traits given a hypothesized graph structure. A distance measure between the inferred response distribution and the observed data is defined to assess the 'fitness' of the hypothesized causal relationships. To test our algorithm, we infer causal relationships within equivalence classes of gene networks in which the possible functional interactions are assumed to be nonlinear, given synthetic microarray and RNA sequencing data. We also apply our method to infer causality in a real metabolic network containing a v-structure and a feedback loop. We show that our method can recapitulate the causal structure and recover the feedback loop from steady-state data alone, which conventional methods cannot.
Gene-network inference by message passing
NASA Astrophysics Data System (ADS)
Braunstein, A.; Pagnani, A.; Weigt, M.; Zecchina, R.
2008-01-01
The inference of gene-regulatory processes from gene-expression data belongs to the major challenges of computational systems biology. Here we address the problem from a statistical-physics perspective and develop a message-passing algorithm which is able to infer sparse, directed and combinatorial regulatory mechanisms. Using the replica technique, the algorithmic performance can be characterized analytically for artificially generated data. The algorithm is applied to genome-wide expression data of baker's yeast under various environmental conditions. We find clear cases of combinatorial control, and enrichment in common functional annotations of regulated genes and their regulators.
Wilkinson, Michael
2014-03-01
Decisions about support for predictions of theories in light of data are made using statistical inference. The dominant approach in sport and exercise science is the Neyman-Pearson (N-P) significance-testing approach. When applied correctly it provides a reliable procedure for making dichotomous decisions for accepting or rejecting zero-effect null hypotheses with known and controlled long-run error rates. Type I and type II error rates must be specified in advance and the latter controlled by conducting an a priori sample size calculation. The N-P approach does not provide the probability of hypotheses or indicate the strength of support for hypotheses in light of data, yet many scientists believe it does. Outcomes of analyses allow conclusions only about the existence of non-zero effects, and provide no information about the likely size of true effects or their practical/clinical value. Bayesian inference can show how much support data provide for different hypotheses, and how personal convictions should be altered in light of data, but the approach is complicated by formulating probability distributions about prior subjective estimates of population effects. A pragmatic solution is magnitude-based inference, which allows scientists to estimate the true magnitude of population effects and how likely they are to exceed an effect magnitude of practical/clinical importance, thereby integrating elements of subjective Bayesian-style thinking. While this approach is gaining acceptance, progress might be hastened if scientists appreciate the shortcomings of traditional N-P null hypothesis significance testing.
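The magnitude-based style of reasoning sketched above can be illustrated with a short calculation: given an observed effect and its standard error, a normal approximation to the sampling distribution yields the probability that the true effect is beneficial, trivial, or harmful relative to a smallest worthwhile value. The effect, standard error, and threshold below are hypothetical numbers chosen only for illustration, not a published procedure.

    # Sketch: chances that the true effect is beneficial, trivial, or harmful,
    # using a normal approximation around the observed estimate (hypothetical inputs).
    from scipy.stats import norm

    effect_hat = 0.30            # hypothetical observed standardized effect
    se = 0.18                    # hypothetical standard error
    smallest_worthwhile = 0.20   # hypothetical smallest practically important effect

    p_beneficial = 1 - norm.cdf(smallest_worthwhile, loc=effect_hat, scale=se)
    p_harmful = norm.cdf(-smallest_worthwhile, loc=effect_hat, scale=se)
    p_trivial = 1 - p_beneficial - p_harmful

    print(f"beneficial ~ {p_beneficial:.2f}, trivial ~ {p_trivial:.2f}, harmful ~ {p_harmful:.2f}")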
Bayesian multimodel inference for dose-response studies
Link, W.A.; Albers, P.H.
2007-01-01
Statistical inference in dose-response studies is model-based: The analyst posits a mathematical model of the relation between exposure and response, estimates parameters of the model, and reports conclusions conditional on the model. Such analyses rarely include any accounting for the uncertainties associated with model selection. The Bayesian inferential system provides a convenient framework for model selection and multimodel inference. In this paper we briefly describe the Bayesian paradigm and Bayesian multimodel inference. We then present a family of models for multinomial dose-response data and apply Bayesian multimodel inferential methods to the analysis of data on the reproductive success of American kestrels (Falco sparverius) exposed to various sublethal dietary concentrations of methylmercury.
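A minimal sketch of the multimodel idea follows: if each candidate dose-response model carries a posterior model probability, predictions are averaged across models instead of being conditioned on one selected model. The three candidate curves, their parameter values, and the model weights below are hypothetical placeholders, not the kestrel analysis.

    import numpy as np
    from scipy.stats import norm

    # Hypothetical posterior model probabilities for three candidate dose-response models.
    weights = {"logistic": 0.55, "probit": 0.30, "extreme_value": 0.15}

    def logistic(d, a=-2.0, b=1.2):
        return 1 / (1 + np.exp(-(a + b * d)))

    def probit(d, a=-1.2, b=0.7):
        return norm.cdf(a + b * d)

    def extreme_value(d, a=-2.5, b=1.0):
        return 1 - np.exp(-np.exp(a + b * d))

    models = {"logistic": logistic, "probit": probit, "extreme_value": extreme_value}
    doses = np.linspace(0, 5, 6)

    # Model-averaged response probability: weighted sum of the candidate models' predictions.
    averaged = sum(weights[name] * models[name](doses) for name in models)
    print(averaged)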
Statistical Inference for Big Data Problems in Molecular Biophysics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ramanathan, Arvind; Savol, Andrej; Burger, Virginia
2012-01-01
We highlight the role of statistical inference techniques in providing biological insights from analyzing long time-scale molecular simulation data. Technological and algorithmic improvements in computation have brought molecular simulations to the forefront of techniques applied to investigating the basis of living systems. While these longer simulations, increasingly complex, reaching petabyte scales presently, promise a detailed view into microscopic behavior, teasing out the important information has now become a true challenge on its own. Mining this data for important patterns is critical to automating therapeutic intervention discovery, improving protein design, and fundamentally understanding the mechanistic basis of cellular homeostasis.
Tropical geometry of statistical models.
Pachter, Lior; Sturmfels, Bernd
2004-11-16
This article presents a unified mathematical framework for inference in graphical models, building on the observation that graphical models are algebraic varieties. From this geometric viewpoint, observations generated from a model are coordinates of a point in the variety, and the sum-product algorithm is an efficient tool for evaluating specific coordinates. Here, we address the question of how the solutions to various inference problems depend on the model parameters. The proposed answer is expressed in terms of tropical algebraic geometry. The Newton polytope of a statistical model plays a key role. Our results are applied to the hidden Markov model and the general Markov model on a binary tree.
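The tropical viewpoint can be made concrete with a toy computation: replacing sums by minimization and products by addition of negative log-probabilities turns the sum-product recursion into the Viterbi recursion. The two-state transition and emission values below are made-up numbers; this is only a sketch of the semiring switch, not the paper's polytope machinery.

    import numpy as np

    # Negative log-probabilities for a toy 2-state hidden Markov model (hypothetical values).
    neg_log_trans = np.array([[0.4, 1.1],
                              [1.2, 0.3]])
    neg_log_emit = {"a": np.array([0.2, 1.6]),
                    "b": np.array([1.8, 0.3])}
    neg_log_init = np.array([0.7, 0.7])

    def tropical_viterbi(obs):
        # Tropical "matrix-vector product": minimize (cost so far + transition), then add emission.
        cost = neg_log_init + neg_log_emit[obs[0]]
        for sym in obs[1:]:
            cost = np.min(cost[:, None] + neg_log_trans, axis=0) + neg_log_emit[sym]
        return cost.min()    # negative log-probability of the single best hidden path

    print(tropical_viterbi("aabba"))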
Salehi, Sohrab; Steif, Adi; Roth, Andrew; Aparicio, Samuel; Bouchard-Côté, Alexandre; Shah, Sohrab P
2017-03-01
Next-generation sequencing (NGS) of bulk tumour tissue can identify constituent cell populations in cancers and measure their abundance. This requires computational deconvolution of allelic counts from somatic mutations, which may be incapable of fully resolving the underlying population structure. Single cell sequencing (SCS) is a more direct method, although its replacement of NGS is impeded by technical noise and sampling limitations. We propose ddClone, which analytically integrates NGS and SCS data, leveraging their complementary attributes through joint statistical inference. We show on real and simulated datasets that ddClone produces more accurate results than can be achieved by either method alone.
The Heuristic Value of p in Inductive Statistical Inference
Krueger, Joachim I.; Heck, Patrick R.
2017-01-01
Many statistical methods yield the probability of the observed data – or data more extreme – under the assumption that a particular hypothesis is true. This probability is commonly known as ‘the’ p-value. (Null Hypothesis) Significance Testing ([NH]ST) is the most prominent of these methods. The p-value has been subjected to much speculation, analysis, and criticism. We explore how well the p-value predicts what researchers presumably seek: the probability of the hypothesis being true given the evidence, and the probability of reproducing significant results. We also explore the effect of sample size on inferential accuracy, bias, and error. In a series of simulation experiments, we find that the p-value performs quite well as a heuristic cue in inductive inference, although there are identifiable limits to its usefulness. We conclude that despite its general usefulness, the p-value cannot bear the full burden of inductive inference; it is but one of several heuristic cues available to the data analyst. Depending on the inferential challenge at hand, investigators may supplement their reports with effect size estimates, Bayes factors, or other suitable statistics, to communicate what they think the data say. PMID:28649206
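The question the paper asks can be made concrete with a back-of-the-envelope calculation: among studies that reach p < .05, the proportion in which the tested effect is actually real depends on the base rate of true effects, the power of the studies, and the alpha level. The base rate and power below are arbitrary assumptions used only to show the arithmetic, not values estimated in the paper.

    # Sketch: P(effect is real | p < .05) under assumed base rate, power, and alpha.
    base_rate = 0.30   # assumed share of tested hypotheses that are true
    power = 0.60       # assumed P(p < .05 | effect is real)
    alpha = 0.05       # P(p < .05 | null is true)

    true_positives = base_rate * power
    false_positives = (1 - base_rate) * alpha
    ppv = true_positives / (true_positives + false_positives)
    print(f"P(effect is real | significant) ~ {ppv:.2f}")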
Subjective randomness as statistical inference.
Griffiths, Thomas L; Daniels, Dylan; Austerweil, Joseph L; Tenenbaum, Joshua B
2018-06-01
Some events seem more random than others. For example, when tossing a coin, a sequence of eight heads in a row does not seem very random. Where do these intuitions about randomness come from? We argue that subjective randomness can be understood as the result of a statistical inference assessing the evidence that an event provides for having been produced by a random generating process. We show how this account provides a link to previous work relating randomness to algorithmic complexity, in which random events are those that cannot be described by short computer programs. Algorithmic complexity is both incomputable and too general to capture the regularities that people can recognize, but viewing randomness as statistical inference provides two paths to addressing these problems: considering regularities generated by simpler computing machines, and restricting the set of probability distributions that characterize regularity. Building on previous work exploring these different routes to a more restricted notion of randomness, we define strong quantitative models of human randomness judgments that apply not just to binary sequences - which have been the focus of much of the previous work on subjective randomness - but also to binary matrices and spatial clustering. Copyright © 2018 Elsevier Inc. All rights reserved.
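A toy version of the statistical-inference account, under strong simplifying assumptions: score a binary sequence by the log-likelihood ratio of a fair random process against a simple 'regular' alternative, here a first-order process that repeats the previous symbol with an assumed probability. The repetition probability and example sequences are illustrative choices, not the models fitted in the paper.

    import math

    def log2_prob_random(seq):
        # Independent, equiprobable symbols.
        return len(seq) * math.log2(0.5)

    def log2_prob_regular(seq, p_repeat=0.8):
        # First symbol equiprobable; each later symbol repeats the previous one with probability p_repeat.
        logp = math.log2(0.5)
        for prev, cur in zip(seq, seq[1:]):
            logp += math.log2(p_repeat if cur == prev else 1 - p_repeat)
        return logp

    def randomness_score(seq):
        # Positive: better explained by the random process; negative: by the regular process.
        return log2_prob_random(seq) - log2_prob_regular(seq)

    print(randomness_score("HHHHHHHH"))   # eight heads in a row: strongly non-random
    print(randomness_score("HTHHTHTT"))   # mixed sequence: looks random under this score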
The Heuristic Value of p in Inductive Statistical Inference.
Krueger, Joachim I; Heck, Patrick R
2017-01-01
Many statistical methods yield the probability of the observed data - or data more extreme - under the assumption that a particular hypothesis is true. This probability is commonly known as 'the' p-value. (Null Hypothesis) Significance Testing ([NH]ST) is the most prominent of these methods. The p-value has been subjected to much speculation, analysis, and criticism. We explore how well the p-value predicts what researchers presumably seek: the probability of the hypothesis being true given the evidence, and the probability of reproducing significant results. We also explore the effect of sample size on inferential accuracy, bias, and error. In a series of simulation experiments, we find that the p-value performs quite well as a heuristic cue in inductive inference, although there are identifiable limits to its usefulness. We conclude that despite its general usefulness, the p-value cannot bear the full burden of inductive inference; it is but one of several heuristic cues available to the data analyst. Depending on the inferential challenge at hand, investigators may supplement their reports with effect size estimates, Bayes factors, or other suitable statistics, to communicate what they think the data say.
Statistical inference of seabed sound-speed structure in the Gulf of Oman Basin.
Sagers, Jason D; Knobles, David P
2014-06-01
Addressed is the statistical inference of the sound-speed depth profile of a thick soft seabed from broadband sound propagation data recorded in the Gulf of Oman Basin in 1977. The acoustic data are in the form of time series signals recorded on a sparse vertical line array and generated by explosive sources deployed along a 280 km track. The acoustic data offer a unique opportunity to study a deep-water bottom-limited thickly sedimented environment because of the large number of time series measurements, very low seabed attenuation, and auxiliary measurements. A maximum entropy method is employed to obtain a conditional posterior probability distribution (PPD) for the sound-speed ratio and the near-surface sound-speed gradient. The multiple data samples allow for a determination of the average error constraint value required to uniquely specify the PPD for each data sample. Two complicating features of the statistical inference study are addressed: (1) the need to develop an error function that can both utilize the measured multipath arrival structure and mitigate the effects of data errors and (2) the effect of small bathymetric slopes on the structure of the bottom interacting arrivals.
Experimental and environmental factors affect spurious detection of ecological thresholds
Daily, Jonathan P.; Hitt, Nathaniel P.; Smith, David; Snyder, Craig D.
2012-01-01
Threshold detection methods are increasingly popular for assessing nonlinear responses to environmental change, but their statistical performance remains poorly understood. We simulated linear change in stream benthic macroinvertebrate communities and evaluated the performance of commonly used threshold detection methods based on model fitting (piecewise quantile regression [PQR]), data partitioning (nonparametric change point analysis [NCPA]), and a hybrid approach (significant zero crossings [SiZer]). We demonstrated that false detection of ecological thresholds (type I errors) and inferences on threshold locations are influenced by sample size, rate of linear change, and frequency of observations across the environmental gradient (i.e., sample-environment distribution, SED). However, the relative importance of these factors varied among statistical methods and between inference types. False detection rates were influenced primarily by user-selected parameters for PQR (τ) and SiZer (bandwidth) and secondarily by sample size (for PQR) and SED (for SiZer). In contrast, the location of reported thresholds was influenced primarily by SED. Bootstrapped confidence intervals for NCPA threshold locations revealed strong correspondence to SED. We conclude that the choice of statistical methods for threshold detection should be matched to experimental and environmental constraints to minimize false detection rates and avoid spurious inferences regarding threshold location.
Statistical Inference for Quality-Adjusted Survival Time
2003-08-01
Fragments of the abstract concern test statistics for the survival function of quality-adjusted lifetime (QAL): test statistics are built from influence functions, using the influence function for the complete-data case together with the survival function of the censoring variable; Zhao and Tsiatis (2001) proposed one such statistic, and other forms are considered, including an influence function set to 1 everywhere until a subject's death.
Intimate Partner Violence in the United States - 2010
Contents include sections on survey administration, statistical testing and inference, additional methodological information, and the prevalence and frequency of individual...
Estimating the probability of rare events: addressing zero failure data.
Quigley, John; Revie, Matthew
2011-07-01
Traditional statistical procedures for estimating the probability of an event result in an estimate of zero when no events are realized. Alternative inferential procedures have been proposed for the situation where zero events have been realized, but often these are ad hoc, relying on selecting methods dependent on the data that have been realized. Such data-dependent inference decisions violate fundamental statistical principles, resulting in estimation procedures whose benefits are difficult to assess. In this article, we propose estimating the probability of an event occurring through minimax inference on the probability that future samples of equal size realize no more events than that in the data on which the inference is based. Although motivated by inference on rare events, the method is not restricted to zero event data and closely approximates the maximum likelihood estimate (MLE) for nonzero data. The use of the minimax procedure provides a risk-averse inferential procedure where there are no events realized. A comparison is made with the MLE and regions of the underlying probability are identified where this approach is superior. Moreover, a comparison is made with three standard approaches to supporting inference where no event data are realized, which we argue are unduly pessimistic. We show that for situations of zero events the estimator can be simply approximated with 1/(2.5n), where n is the number of trials. © 2011 Society for Risk Analysis.
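A minimal numerical sketch of the closing approximation: with n trials and zero observed events, the maximum likelihood estimate is 0, while the minimax-motivated estimate described above is roughly 1/(2.5n). The code below only reproduces that stated approximation for a few arbitrary sample sizes; it does not rederive the minimax procedure.

    def mle(events, n):
        return events / n

    def approx_minimax_zero_events(n):
        # Stated approximation for the zero-event case: about 1 / (2.5 * n).
        return 1.0 / (2.5 * n)

    for n in (10, 50, 200):
        print(n, mle(0, n), round(approx_minimax_zero_events(n), 4))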
Learning Quantitative Sequence-Function Relationships from Massively Parallel Experiments
NASA Astrophysics Data System (ADS)
Atwal, Gurinder S.; Kinney, Justin B.
2016-03-01
A fundamental aspect of biological information processing is the ubiquity of sequence-function relationships—functions that map the sequence of DNA, RNA, or protein to a biochemically relevant activity. Most sequence-function relationships in biology are quantitative, but only recently have experimental techniques for effectively measuring these relationships been developed. The advent of such "massively parallel" experiments presents an exciting opportunity for the concepts and methods of statistical physics to inform the study of biological systems. After reviewing these recent experimental advances, we focus on the problem of how to infer parametric models of sequence-function relationships from the data produced by these experiments. Specifically, we retrace and extend recent theoretical work showing that inference based on mutual information, not the standard likelihood-based approach, is often necessary for accurately learning the parameters of these models. Closely connected with this result is the emergence of "diffeomorphic modes"—directions in parameter space that are far less constrained by data than likelihood-based inference would suggest. Analogous to Goldstone modes in physics, diffeomorphic modes arise from an arbitrarily broken symmetry of the inference problem. An analytically tractable model of a massively parallel experiment is then described, providing an explicit demonstration of these fundamental aspects of statistical inference. This paper concludes with an outlook on the theoretical and computational challenges currently facing studies of quantitative sequence-function relationships.
Statistical primer: how to deal with missing data in scientific research?
Papageorgiou, Grigorios; Grant, Stuart W; Takkenberg, Johanna J M; Mokhles, Mostafa M
2018-05-10
Missing data are a common challenge encountered in research which can compromise the results of statistical inference when not handled appropriately. This paper aims to introduce basic concepts of missing data to a non-statistical audience, list and compare some of the most popular approaches for handling missing data in practice and provide guidelines and recommendations for dealing with and reporting missing data in scientific research. Complete case analysis and single imputation are simple approaches for handling missing data and are popular in practice, however, in most cases they are not guaranteed to provide valid inferences. Multiple imputation is a robust and general alternative which is appropriate for data missing at random, surpassing the disadvantages of the simpler approaches, but should always be conducted with care. The aforementioned approaches are illustrated and compared in an example application using Cox regression.
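A heavily simplified sketch of the multiple-imputation workflow discussed above, assuming a single normally distributed variable with values missing at random: create several completed datasets, estimate the quantity of interest in each, and pool with Rubin's rules. Real analyses, such as imputation models feeding a Cox regression, are far more involved; all numbers here are synthetic.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(50, 10, size=200)
    x[rng.random(200) < 0.25] = np.nan            # make about 25% of values missing at random

    obs = x[~np.isnan(x)]
    m = 20                                        # number of imputations
    estimates, variances = [], []
    for _ in range(m):
        # Simplistic imputation model: draw each missing value from the observed distribution.
        imputed = x.copy()
        n_miss = int(np.isnan(x).sum())
        imputed[np.isnan(x)] = rng.normal(obs.mean(), obs.std(ddof=1), size=n_miss)
        estimates.append(imputed.mean())
        variances.append(imputed.var(ddof=1) / len(imputed))

    # Rubin's rules: combine within- and between-imputation variability.
    q_bar = np.mean(estimates)
    total_var = np.mean(variances) + (1 + 1 / m) * np.var(estimates, ddof=1)
    print(q_bar, np.sqrt(total_var))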
ERIC Educational Resources Information Center
Lane-Getaz, Sharon
2017-01-01
In reaction to misuses and misinterpretations of p-values and confidence intervals, a social science journal editor banned p-values from its pages. This study aimed to show that education could address misuse and abuse. This study examines inference-related learning outcomes for social science students in an introductory course supplemented with…
Sandoval-Castellanos, Edson; Palkopoulou, Eleftheria; Dalén, Love
2014-01-01
Inference of population demographic history has vastly improved in recent years due to a number of technological and theoretical advances including the use of ancient DNA. Approximate Bayesian computation (ABC) stands among the most promising methods due to its simple theoretical fundament and exceptional flexibility. However, limited availability of user-friendly programs that perform ABC analysis renders it difficult to implement, and hence programming skills are frequently required. In addition, there is limited availability of programs able to deal with heterochronous data. Here we present the software BaySICS: Bayesian Statistical Inference of Coalescent Simulations. BaySICS provides an integrated and user-friendly platform that performs ABC analyses by means of coalescent simulations from DNA sequence data. It estimates historical demographic population parameters and performs hypothesis testing by means of Bayes factors obtained from model comparisons. Although providing specific features that improve inference from datasets with heterochronous data, BaySICS also has several capabilities making it a suitable tool for analysing contemporary genetic datasets. Those capabilities include joint analysis of independent tables, a graphical interface and the implementation of Markov-chain Monte Carlo without likelihoods.
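The core rejection-ABC loop that tools of this kind build on can be sketched in a few lines: draw parameters from the prior, simulate data, and keep the draws whose summary statistic lands close to the observed one. The normal-mean toy simulator, flat prior, and tolerance below are stand-ins for coalescent simulations, chosen only to show the mechanics.

    import numpy as np

    rng = np.random.default_rng(1)
    observed = rng.normal(3.0, 1.0, size=50)      # pretend this is the observed dataset
    s_obs = observed.mean()                       # summary statistic

    def simulate(theta, n=50):
        return rng.normal(theta, 1.0, size=n)     # toy simulator standing in for a coalescent model

    accepted = []
    tolerance = 0.1
    for _ in range(20000):
        theta = rng.uniform(-10, 10)              # draw from a flat prior
        if abs(simulate(theta).mean() - s_obs) < tolerance:
            accepted.append(theta)

    # Accepted draws approximate the posterior distribution of theta.
    print(len(accepted), np.mean(accepted), np.std(accepted))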
Multiple outcomes are often measured on each experimental unit in toxicology experiments. These multiple observations typically imply the existence of correlation between endpoints, and a statistical analysis that incorporates it may result in improved inference. When both disc...
The Love of Large Numbers: A Popularity Bias in Consumer Choice.
Powell, Derek; Yu, Jingqi; DeWolf, Melissa; Holyoak, Keith J
2017-10-01
Social learning-the ability to learn from observing the decisions of other people and the outcomes of those decisions-is fundamental to human evolutionary and cultural success. The Internet now provides social evidence on an unprecedented scale. However, properly utilizing this evidence requires a capacity for statistical inference. We examined how people's interpretation of online review scores is influenced by the numbers of reviews-a potential indicator both of an item's popularity and of the precision of the average review score. Our task was designed to pit statistical information against social information. We modeled the behavior of an "intuitive statistician" using empirical prior information from millions of reviews posted on Amazon.com and then compared the model's predictions with the behavior of experimental participants. Under certain conditions, people preferred a product with more reviews to one with fewer reviews even though the statistical model indicated that the latter was likely to be of higher quality than the former. Overall, participants' judgments suggested that they failed to make meaningful statistical inferences.
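One way to see why the number of reviews carries statistical information, under a simple normal-normal shrinkage model: the estimate of an item's quality pulls the observed average score toward a prior mean, and the pull is stronger when there are fewer reviews. The prior mean, prior spread, and review noise below are assumed values, not the empirical prior estimated from Amazon data in the paper.

    def posterior_quality(avg_score, n_reviews, prior_mean=4.2, prior_sd=0.4, review_sd=1.0):
        # Normal-normal shrinkage: precision-weighted blend of prior belief and observed average.
        prior_prec = 1 / prior_sd ** 2
        data_prec = n_reviews / review_sd ** 2
        return (prior_prec * prior_mean + data_prec * avg_score) / (prior_prec + data_prec)

    # A 4.9 average from 3 reviews versus a 4.6 average from 300 reviews.
    print(posterior_quality(4.9, 3))     # shrinks noticeably toward the prior mean of 4.2
    print(posterior_quality(4.6, 300))   # stays close to 4.6 and ends up the higher estimate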
Learning what to expect (in visual perception)
Seriès, Peggy; Seitz, Aaron R.
2013-01-01
Expectations are known to greatly affect our experience of the world. A growing theory in computational neuroscience is that perception can be successfully described using Bayesian inference models and that the brain is “Bayes-optimal” under some constraints. In this context, expectations are particularly interesting, because they can be viewed as prior beliefs in the statistical inference process. A number of questions remain unsolved, however, for example: How fast do priors change over time? Are there limits in the complexity of the priors that can be learned? How do an individual’s priors compare to the true scene statistics? Can we unlearn priors that are thought to correspond to natural scene statistics? Where and what are the neural substrate of priors? Focusing on the perception of visual motion, we here review recent studies from our laboratories and others addressing these issues. We discuss how these data on motion perception fit within the broader literature on perceptual Bayesian priors, perceptual expectations, and statistical and perceptual learning and review the possible neural basis of priors. PMID:24187536
Statistical Inference for Porous Materials using Persistent Homology.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moon, Chul; Heath, Jason E.; Mitchell, Scott A.
2017-12-01
We propose a porous materials analysis pipeline using persistent homology. We first compute persistent homology of binarized 3D images of sampled material subvolumes. For each image we compute sets of homology intervals, which are represented as summary graphics called persistence diagrams. We convert persistence diagrams into image vectors in order to analyze the similarity of the homology of the material images using the mature tools for image analysis. Each image is treated as a vector and we compute its principal components to extract features. We fit a statistical model using the loadings of principal components to estimate material porosity, permeability, anisotropy, and tortuosity. We also propose an adaptive version of the structural similarity index (SSIM), a similarity metric for images, as a measure to determine the statistical representative elementary volumes (sREV) for persistence homology. Thus we provide a capability for making a statistical inference of the fluid flow and transport properties of porous materials based on their geometry and connectivity.
Philosophy and the practice of Bayesian statistics
Gelman, Andrew; Shalizi, Cosma Rohilla
2015-01-01
A substantial school in the philosophy of science identifies Bayesian inference with inductive inference and even rationality as such, and seems to be strengthened by the rise and practical success of Bayesian statistics. We argue that the most successful forms of Bayesian statistics do not actually support that particular philosophy but rather accord much better with sophisticated forms of hypothetico-deductivism. We examine the actual role played by prior distributions in Bayesian models, and the crucial aspects of model checking and model revision, which fall outside the scope of Bayesian confirmation theory. We draw on the literature on the consistency of Bayesian updating and also on our experience of applied work in social science. Clarity about these matters should benefit not just philosophy of science, but also statistical practice. At best, the inductivist view has encouraged researchers to fit and compare models without checking them; at worst, theorists have actively discouraged practitioners from performing model checking because it does not fit into their framework. PMID:22364575
Philosophy and the practice of Bayesian statistics.
Gelman, Andrew; Shalizi, Cosma Rohilla
2013-02-01
A substantial school in the philosophy of science identifies Bayesian inference with inductive inference and even rationality as such, and seems to be strengthened by the rise and practical success of Bayesian statistics. We argue that the most successful forms of Bayesian statistics do not actually support that particular philosophy but rather accord much better with sophisticated forms of hypothetico-deductivism. We examine the actual role played by prior distributions in Bayesian models, and the crucial aspects of model checking and model revision, which fall outside the scope of Bayesian confirmation theory. We draw on the literature on the consistency of Bayesian updating and also on our experience of applied work in social science. Clarity about these matters should benefit not just philosophy of science, but also statistical practice. At best, the inductivist view has encouraged researchers to fit and compare models without checking them; at worst, theorists have actively discouraged practitioners from performing model checking because it does not fit into their framework. © 2012 The British Psychological Society.
Federal Register 2010, 2011, 2012, 2013, 2014
2013-04-24
...Bureau, Statistical Abstract of the United States: 2011, Table 427 (2007). The 2007 U.S. Census data... (U.S. Census Bureau, Statistical Abstract of the United States 2011, Table 428.) The criterion by... Statistical Abstract of the U.S.; that inference is further supported by the fact that in both Tables, many...
Dutton, P; Love, SB; Billingham, L; Hassan, AB
2016-01-01
Trials run in either rare diseases, such as rare cancers, or rare sub-populations of common diseases are challenging in terms of identifying, recruiting and treating sufficient patients in a sensible period. Treatments for rare diseases are often designed for other disease areas and then later proposed as possible treatments for the rare disease after initial phase I testing is complete. To ensure the trial is in the best interests of the patient participants, frequent interim analyses are needed to force the trial to stop promptly if the treatment is futile or toxic. These non-definitive phase II trials should also be stopped for efficacy to accelerate research progress if the treatment proves to be particularly promising. In this paper, we review frequentist and Bayesian methods that have been adapted to incorporate two binary endpoints and frequent interim analyses. The Eurosarc Trial of Linsitinib in advanced Ewing Sarcoma (LINES) is used as a motivating example and provides a suitable platform to compare these approaches. The Bayesian approach provides greater design flexibility, but does not provide additional value over the frequentist approaches in a single trial setting when the prior is non-informative. However, Bayesian designs are able to borrow from any previous experience, using prior information to improve efficiency. PMID:27587590
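A reduced sketch of the kind of Bayesian interim rule compared in such designs, for a single binary efficacy endpoint: after each interim look, compute the posterior probability that the response rate exceeds a target and stop early when that probability crosses pre-set efficacy or futility thresholds. The Beta prior, target rate, thresholds, and interim counts below are illustrative assumptions, not the LINES design itself.

    from scipy.stats import beta

    target_rate = 0.20                    # hypothetical minimum response rate of interest
    a0, b0 = 1, 1                         # non-informative Beta(1, 1) prior
    efficacy_cut, futility_cut = 0.95, 0.05

    # Hypothetical interim looks: (responders, patients enrolled so far).
    looks = [(1, 8), (4, 16), (8, 24)]
    for responders, n in looks:
        post = beta(a0 + responders, b0 + n - responders)
        p_above = post.sf(target_rate)    # P(true response rate > target | data so far)
        print(n, round(p_above, 3))
        if p_above > efficacy_cut:
            print("stop early for efficacy"); break
        if p_above < futility_cut:
            print("stop early for futility"); break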
Augmenting Latent Dirichlet Allocation and Rank Threshold Detection with Ontologies
2010-03-01
Probabilistic Latent Semantic Indexing (PLSI) is an automated indexing information retrieval model [20]. It is based on a statistical latent class model which... uses a statistical foundation that is more accurate in finding hidden semantic relationships [20]. The model uses factor analysis of count data... principle of statistical inference which asserts that all of the information in a sample is contained in the likelihood function [20].
Inference Control Mechanism for Statistical Database: Frequency-Imposed Data Distortions.
ERIC Educational Resources Information Center
Liew, Chong K.; And Others
1985-01-01
Introduces two data distortion methods (Frequency-Imposed Distortion, Frequency-Imposed Probability Distortion) and uses a Monte Carlo study to compare their performance with that of other distortion methods (Point Distortion, Probability Distortion). Indications that data generated by these two methods produce accurate statistics and protect…
Notes on power of normality tests of error terms in regression models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Střelec, Luboš
2015-03-10
Normality is one of the basic assumptions in applying statistical procedures. For example, in linear regression most of the inferential procedures are based on the assumption of normality, i.e. the disturbance vector is assumed to be normally distributed. Failure to assess non-normality of the error terms may lead to incorrect results of usual statistical inference techniques such as the t-test or F-test. Thus, error terms should be normally distributed in order to allow us to make exact inferences. As a consequence, normally distributed stochastic errors are necessary in order to draw inferences that are not misleading, which explains the necessity and importance of robust tests of normality. Therefore, the aim of this contribution is to discuss normality testing of error terms in regression models. In this contribution, we introduce the general RT class of robust tests for normality, and present and discuss the trade-off between power and robustness of selected classical and robust normality tests of error terms in regression models.
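As a small illustration of the practice discussed above, one can fit an ordinary least squares model and apply a normality test to its residuals; the Shapiro-Wilk test from SciPy is used here as a generic stand-in, since the RT class of robust tests introduced in the contribution is not reproduced. The simulated regression with heavy-tailed errors is arbitrary.

    import numpy as np
    from scipy.stats import shapiro

    rng = np.random.default_rng(42)
    x = rng.uniform(0, 10, size=100)
    # Simulated regression whose error terms are heavy-tailed rather than normal.
    y = 2.0 + 0.5 * x + rng.standard_t(df=2, size=100)

    # Ordinary least squares fit.
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef

    stat, p_value = shapiro(residuals)
    print(coef, stat, p_value)    # a small p-value flags non-normal error terms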
Johnson, Jason K.; Oyen, Diane Adele; Chertkov, Michael; ...
2016-12-01
Inference and learning of graphical models are both well-studied problems in statistics and machine learning that have found many applications in science and engineering. However, exact inference is intractable in general graphical models, which suggests the problem of seeking the best approximation to a collection of random variables within some tractable family of graphical models. In this paper, we focus on the class of planar Ising models, for which exact inference is tractable using techniques of statistical physics. Based on these techniques and recent methods for planarity testing and planar embedding, we propose a greedy algorithm for learning the best planar Ising model to approximate an arbitrary collection of binary random variables (possibly from sample data). Given the set of all pairwise correlations among variables, we select a planar graph and optimal planar Ising model defined on this graph to best approximate that set of correlations. Finally, we demonstrate our method in simulations and for two applications: modeling senate voting records and identifying geo-chemical depth trends from Mars rover data.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Johnson, Jason K.; Oyen, Diane Adele; Chertkov, Michael
Inference and learning of graphical models are both well-studied problems in statistics and machine learning that have found many applications in science and engineering. However, exact inference is intractable in general graphical models, which suggests the problem of seeking the best approximation to a collection of random variables within some tractable family of graphical models. In this paper, we focus on the class of planar Ising models, for which exact inference is tractable using techniques of statistical physics. Based on these techniques and recent methods for planarity testing and planar embedding, we propose a greedy algorithm for learning the best planar Ising model to approximate an arbitrary collection of binary random variables (possibly from sample data). Given the set of all pairwise correlations among variables, we select a planar graph and optimal planar Ising model defined on this graph to best approximate that set of correlations. Finally, we demonstrate our method in simulations and for two applications: modeling senate voting records and identifying geo-chemical depth trends from Mars rover data.
An argument for mechanism-based statistical inference in cancer
Ochs, Michael; Price, Nathan D.; Tomasetti, Cristian; Younes, Laurent
2015-01-01
Cancer is perhaps the prototypical systems disease, and as such has been the focus of extensive study in quantitative systems biology. However, translating these programs into personalized clinical care remains elusive and incomplete. In this perspective, we argue that realizing this agenda—in particular, predicting disease phenotypes, progression and treatment response for individuals—requires going well beyond standard computational and bioinformatics tools and algorithms. It entails designing global mathematical models over network-scale configurations of genomic states and molecular concentrations, and learning the model parameters from limited available samples of high-dimensional and integrative omics data. As such, any plausible design should accommodate: biological mechanism, necessary for both feasible learning and interpretable decision making; stochasticity, to deal with uncertainty and observed variation at many scales; and a capacity for statistical inference at the patient level. This program, which requires a close, sustained collaboration between mathematicians and biologists, is illustrated in several contexts, including learning bio-markers, metabolism, cell signaling, network inference and tumorigenesis. PMID:25381197
Network inference using informative priors
Mukherjee, Sach; Speed, Terence P.
2008-01-01
Recent years have seen much interest in the study of systems characterized by multiple interacting components. A class of statistical models called graphical models, in which graphs are used to represent probabilistic relationships between variables, provides a framework for formal inference regarding such systems. In many settings, the object of inference is the network structure itself. This problem of “network inference” is well known to be a challenging one. However, in scientific settings there is very often existing information regarding network connectivity. A natural idea then is to take account of such information during inference. This article addresses the question of incorporating prior information into network inference. We focus on directed models called Bayesian networks, and use Markov chain Monte Carlo to draw samples from posterior distributions over network structures. We introduce prior distributions on graphs capable of capturing information regarding network features including edges, classes of edges, degree distributions, and sparsity. We illustrate our approach in the context of systems biology, applying our methods to network inference in cancer signaling. PMID:18799736
Bayesian Inference for Functional Dynamics Exploring in fMRI Data.
Guo, Xuan; Liu, Bing; Chen, Le; Chen, Guantao; Pan, Yi; Zhang, Jing
2016-01-01
This paper aims to review state-of-the-art Bayesian-inference-based methods applied to functional magnetic resonance imaging (fMRI) data. Particularly, we focus on one specific long-standing challenge in the computational modeling of fMRI datasets: how to effectively explore typical functional interactions from fMRI time series and the corresponding boundaries of temporal segments. Bayesian inference is a method of statistical inference which has been shown to be a powerful tool to encode dependence relationships among the variables with uncertainty. Here we provide an introduction to a group of Bayesian-inference-based methods for fMRI data analysis, which were designed to detect magnitude or functional connectivity change points and to infer their functional interaction patterns based on corresponding temporal boundaries. We also provide a comparison of three popular Bayesian models, that is, Bayesian Magnitude Change Point Model (BMCPM), Bayesian Connectivity Change Point Model (BCCPM), and Dynamic Bayesian Variable Partition Model (DBVPM), and give a summary of their applications. We envision that more delicate Bayesian inference models will be emerging and play increasingly important roles in modeling brain functions in the years to come.
Modular Spectral Inference Framework Applied to Young Stars and Brown Dwarfs
NASA Technical Reports Server (NTRS)
Gully-Santiago, Michael A.; Marley, Mark S.
2017-01-01
In practice, synthetic spectral models are imperfect, causing inaccurate estimates of stellar parameters. Using forward modeling and statistical inference, we derive accurate stellar parameters for a given observed spectrum by emulating a grid of precomputed spectra to track uncertainties. For brown dwarfs, spectral inference draws on synthetic spectral models (Marley et al. 1996, 2014), whose newest grid spans a massive multi-dimensional parameter space; applying the framework to IGRINS spectra helps improve atmospheric models for JWST. When applied to young stars (~10 Myr) with large starspots, the spots can be measured spectroscopically, especially in the near-IR with IGRINS.
Inference of neuronal network spike dynamics and topology from calcium imaging data
Lütcke, Henry; Gerhard, Felipe; Zenke, Friedemann; Gerstner, Wulfram; Helmchen, Fritjof
2013-01-01
Two-photon calcium imaging enables functional analysis of neuronal circuits by inferring action potential (AP) occurrence (“spike trains”) from cellular fluorescence signals. It remains unclear how experimental parameters such as signal-to-noise ratio (SNR) and acquisition rate affect spike inference and whether additional information about network structure can be extracted. Here we present a simulation framework for quantitatively assessing how well spike dynamics and network topology can be inferred from noisy calcium imaging data. For simulated AP-evoked calcium transients in neocortical pyramidal cells, we analyzed the quality of spike inference as a function of SNR and data acquisition rate using a recently introduced peeling algorithm. Given experimentally attainable values of SNR and acquisition rate, neural spike trains could be reconstructed accurately and with up to millisecond precision. We then applied statistical neuronal network models to explore how remaining uncertainties in spike inference affect estimates of network connectivity and topological features of network organization. We define the experimental conditions suitable for inferring whether the network has a scale-free structure and determine how well hub neurons can be identified. Our findings provide a benchmark for future calcium imaging studies that aim to reliably infer neuronal network properties. PMID:24399936
NASA Astrophysics Data System (ADS)
Mandel, Kaisey; Kirshner, R. P.; Narayan, G.; Wood-Vasey, W. M.; Friedman, A. S.; Hicken, M.
2010-01-01
I have constructed a comprehensive statistical model for Type Ia supernova light curves spanning optical through near infrared data simultaneously. The near infrared light curves are found to be excellent standard candles (sigma(MH) = 0.11 +/- 0.03 mag) that are less vulnerable to systematic error from dust extinction, a major confounding factor for cosmological studies. A hierarchical statistical framework incorporates coherently multiple sources of randomness and uncertainty, including photometric error, intrinsic supernova light curve variations and correlations, dust extinction and reddening, peculiar velocity dispersion and distances, for probabilistic inference with Type Ia SN light curves. Inferences are drawn from the full probability density over individual supernovae and the SN Ia and dust populations, conditioned on a dataset of SN Ia light curves and redshifts. To compute probabilistic inferences with hierarchical models, I have developed BayeSN, a Markov Chain Monte Carlo algorithm based on Gibbs sampling. This code explores and samples the global probability density of parameters describing individual supernovae and the population. I have applied this hierarchical model to optical and near infrared data of over 100 nearby Type Ia SN from PAIRITEL, the CfA3 sample, and the literature. Using this statistical model, I find that SN with optical and NIR data have a smaller residual scatter in the Hubble diagram than SN with only optical data. The continued study of Type Ia SN in the near infrared will be important for improving their utility as precise and accurate cosmological distance indicators.
Statistical inference involving binomial and negative binomial parameters.
García-Pérez, Miguel A; Núñez-Antón, Vicente
2009-05-01
Statistical inference about two binomial parameters implies that they are both estimated by binomial sampling. There are occasions in which one aims at testing the equality of two binomial parameters before and after the occurrence of the first success along a sequence of Bernoulli trials. In these cases, the binomial parameter before the first success is estimated by negative binomial sampling whereas that after the first success is estimated by binomial sampling, and both estimates are related. This paper derives statistical tools to test two hypotheses, namely, that both binomial parameters equal some specified value and that both parameters are equal though unknown. Simulation studies are used to show that in small samples both tests are accurate in keeping the nominal Type-I error rates, and also to determine sample size requirements to detect large, medium, and small effects with adequate power. Additional simulations also show that the tests are sufficiently robust to certain violations of their assumptions.
Econophysical visualization of Adam Smith’s invisible hand
NASA Astrophysics Data System (ADS)
Cohen, Morrel H.; Eliazar, Iddo I.
2013-02-01
Consider a complex system whose macrostate is statistically observable, but yet whose operating mechanism is an unknown black-box. In this paper we address the problem of inferring, from the system’s macrostate statistics, the system’s intrinsic force yielding the observed statistics. The inference is established via two diametrically opposite approaches which result in the very same intrinsic force: a top-down approach based on the notion of entropy, and a bottom-up approach based on the notion of Langevin dynamics. The general results established are applied to the problem of visualizing the intrinsic socioeconomic force-Adam Smith’s invisible hand-shaping the distribution of wealth in human societies. Our analysis yields quantitative econophysical representations of figurative socioeconomic forces, quantitative definitions of “poor” and “rich”, and a quantitative characterization of the “poor-get-poorer” and the “rich-get-richer” phenomena.
Sampling and counting genome rearrangement scenarios
2015-01-01
Background: Even for moderate size inputs, there are a tremendous number of optimal rearrangement scenarios, regardless of what the model is and which specific question is to be answered. Therefore giving one optimal solution might be misleading and cannot be used for statistical inference. Statistically well-founded methods are necessary to sample uniformly from the solution space; then a small number of samples is sufficient for statistical inference. Contribution: In this paper, we give a mini-review of the state-of-the-art of sampling and counting rearrangement scenarios, focusing on the reversal, DCJ and SCJ models. Beyond that, we also give a Gibbs sampler for sampling most parsimonious labelings of evolutionary trees under the SCJ model. The method has been implemented and tested on real life data. The software package together with example data can be downloaded from http://www.renyi.hu/~miklosi/SCJ-Gibbs/ PMID:26452124
Online Updating of Statistical Inference in the Big Data Setting.
Schifano, Elizabeth D; Wu, Jing; Wang, Chun; Yan, Jun; Chen, Ming-Hui
2016-01-01
We present statistical methods for big data arising from online analytical processing, where large amounts of data arrive in streams and require fast analysis without storage/access to the historical data. In particular, we develop iterative estimating algorithms and statistical inferences for linear models and estimating equations that update as new data arrive. These algorithms are computationally efficient, minimally storage-intensive, and allow for possible rank deficiencies in the subset design matrices due to rare-event covariates. Within the linear model setting, the proposed online-updating framework leads to predictive residual tests that can be used to assess the goodness-of-fit of the hypothesized model. We also propose a new online-updating estimator under the estimating equation setting. Theoretical properties of the goodness-of-fit tests and proposed estimators are examined in detail. In simulation studies and real data applications, our estimator compares favorably with competing approaches under the estimating equation setting.
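A bare-bones version of the online-updating idea for linear models: keep running sufficient statistics X'X and X'y, fold in each new block of data as it arrives, and recover the least squares estimate at any time without revisiting the historical data. This sketch ignores the rank-deficiency handling, estimating-equation extensions, and predictive residual tests developed in the paper; the simulated stream is arbitrary.

    import numpy as np

    rng = np.random.default_rng(7)
    p = 3
    xtx = np.zeros((p, p))      # running X'X
    xty = np.zeros(p)           # running X'y

    def update(xtx, xty, X_block, y_block):
        # Accumulate sufficient statistics from a newly arrived data block.
        return xtx + X_block.T @ X_block, xty + X_block.T @ y_block

    true_beta = np.array([1.0, -2.0, 0.5])
    for _ in range(100):                          # 100 blocks arriving in a stream
        X_block = rng.normal(size=(50, p))
        y_block = X_block @ true_beta + rng.normal(scale=0.5, size=50)
        xtx, xty = update(xtx, xty, X_block, y_block)

    beta_hat = np.linalg.solve(xtx, xty)          # current estimate, no raw history stored
    print(beta_hat)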
When mechanism matters: Bayesian forecasting using models of ecological diffusion
Hefley, Trevor J.; Hooten, Mevin B.; Russell, Robin E.; Walsh, Daniel P.; Powell, James A.
2017-01-01
Ecological diffusion is a theory that can be used to understand and forecast spatio-temporal processes such as dispersal, invasion, and the spread of disease. Hierarchical Bayesian modelling provides a framework to make statistical inference and probabilistic forecasts, using mechanistic ecological models. To illustrate, we show how hierarchical Bayesian models of ecological diffusion can be implemented for large data sets that are distributed densely across space and time. The hierarchical Bayesian approach is used to understand and forecast the growth and geographic spread in the prevalence of chronic wasting disease in white-tailed deer (Odocoileus virginianus). We compare statistical inference and forecasts from our hierarchical Bayesian model to phenomenological regression-based methods that are commonly used to analyse spatial occurrence data. The mechanistic statistical model based on ecological diffusion led to important ecological insights, obviated a commonly ignored type of collinearity, and was the most accurate method for forecasting.
Probabilistic Signal Recovery and Random Matrices
2016-12-08
Applications in statistics, biomedical data analysis, quantization, dimension reduction, and network science. 1. High-dimensional inference and geometry... Low-rank approximation, with applications to community detection in networks, Annals of Statistics 44 (2016), 373–400. C. Le, E. Levina, R. Vershynin, Concentration...
The Use of a Context-Based Information Retrieval Technique
2009-07-01
Latent Semantic Analysis (LSA) is a statistical technique for inferring contextual and structural information, and previous studies... LSA, which is also known as latent semantic indexing (LSI), uses a statistical and... In contrast, natural language models apply algorithms that combine statistical information with semantic information.
An Introduction to Confidence Intervals for Both Statistical Estimates and Effect Sizes.
ERIC Educational Resources Information Center
Capraro, Mary Margaret
This paper summarizes methods of estimating confidence intervals, including classical intervals and intervals for effect sizes. The recent American Psychological Association (APA) Task Force on Statistical Inference report suggested that confidence intervals should always be reported, and the fifth edition of the APA "Publication Manual"…
Balancing Treatment and Control Groups in Quasi-Experiments: An Introduction to Propensity Scoring
ERIC Educational Resources Information Center
Connelly, Brian S.; Sackett, Paul R.; Waters, Shonna D.
2013-01-01
Organizational and applied sciences have long struggled with improving causal inference in quasi-experiments. We introduce organizational researchers to propensity scoring, a statistical technique that has become popular in other applied sciences as a means for improving internal validity. Propensity scoring statistically models how individuals in…
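A compact sketch of the basic workflow being introduced, run on simulated data: model treatment assignment from covariates with logistic regression, match each treated unit to the control with the nearest propensity score, and compare outcomes in the matched sample. Real applications add calipers, balance diagnostics, and more careful inference; everything below is synthetic.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(3)
    n = 500
    X = rng.normal(size=(n, 2))                               # covariates
    p_treat = 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 1])))    # selection depends on covariates
    treat = (rng.random(n) < p_treat).astype(int)
    y = 1.0 * treat + X[:, 0] + rng.normal(size=n)            # outcome with a true effect of 1.0

    # Step 1: estimate propensity scores.
    ps = LogisticRegression().fit(X, treat).predict_proba(X)[:, 1]

    # Step 2: nearest-neighbor matching (with replacement) of treated units to controls.
    treated = np.where(treat == 1)[0]
    controls = np.where(treat == 0)[0]
    matched = np.array([controls[np.argmin(np.abs(ps[controls] - ps[i]))] for i in treated])

    # Step 3: compare outcomes in the matched sample.
    print(y[treated].mean() - y[matched].mean())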
Modeling Cross-Situational Word-Referent Learning: Prior Questions
ERIC Educational Resources Information Center
Yu, Chen; Smith, Linda B.
2012-01-01
Both adults and young children possess powerful statistical computation capabilities--they can infer the referent of a word from highly ambiguous contexts involving many words and many referents by aggregating cross-situational statistical information across contexts. This ability has been explained by models of hypothesis testing and by models of…
Temporal and Statistical Information in Causal Structure Learning
ERIC Educational Resources Information Center
McCormack, Teresa; Frosch, Caren; Patrick, Fiona; Lagnado, David
2015-01-01
Three experiments examined children's and adults' abilities to use statistical and temporal information to distinguish between common cause and causal chain structures. In Experiment 1, participants were provided with conditional probability information and/or temporal information and asked to infer the causal structure of a 3-variable mechanical…
Secondary Analysis of National Longitudinal Transition Study 2 Data
ERIC Educational Resources Information Center
Hicks, Tyler A.; Knollman, Greg A.
2015-01-01
This review examines published secondary analyses of National Longitudinal Transition Study 2 (NLTS2) data, with a primary focus upon statistical objectives, paradigms, inferences, and methods. Its primary purpose was to determine which statistical techniques have been common in secondary analyses of NLTS2 data. The review begins with an…
Some General Goals in Teaching Statistics.
ERIC Educational Resources Information Center
Blalock, H. M.
1987-01-01
States that regardless of the content or level of a statistics course, five goals to reach are: (1) overcoming fears, resistances, and tendencies to memorize; (2) the importance of intellectual honesty and integrity; (3) understanding relationship between deductive and inductive inferences; (4) learning to play role of reasonable critic; and (5)…
Propensity Score Analysis: An Alternative Statistical Approach for HRD Researchers
ERIC Educational Resources Information Center
Keiffer, Greggory L.; Lane, Forrest C.
2016-01-01
Purpose: This paper aims to introduce matching in propensity score analysis (PSA) as an alternative statistical approach for researchers looking to make causal inferences using intact groups. Design/methodology/approach: An illustrative example demonstrated the varying results of analysis of variance, analysis of covariance and PSA on a heuristic…
Technology Focus: Using Technology to Explore Statistical Inference
ERIC Educational Resources Information Center
Garofalo, Joe; Juersivich, Nicole
2007-01-01
There is much research that documents what many teachers know, that students struggle with many concepts in probability and statistics. This article presents two sample activities the authors use to help preservice teachers develop ideas about how they can use technology to promote their students' ability to understand mathematics and connect…
ERIC Educational Resources Information Center
Conant, Darcy Lynn
2013-01-01
Stochastic understanding of probability distribution undergirds development of conceptual connections between probability and statistics and supports development of a principled understanding of statistical inference. This study investigated the impact of an instructional course intervention designed to support development of stochastic…
Basic Statistical Concepts and Methods for Earth Scientists
Olea, Ricardo A.
2008-01-01
Introduction: Statistics is the science of collecting, analyzing, interpreting, modeling, and displaying masses of numerical data primarily for the characterization and understanding of incompletely known systems. Over the years, these objectives have led to a fair amount of analytical work to achieve, substantiate, and guide descriptions and inferences.
ERIC Educational Resources Information Center
Amundson, Vickie E.; Bernstein, Ira H.
1973-01-01
Authors note that Fehrer and Biederman's two statistical tests were not of equal power and that their conclusion could be a statistical artifact of both the lesser power of the verbal report comparison and the insensitivity of their particular verbal report indicator. (Editor)
Optimism bias leads to inconclusive results - an empirical study
Djulbegovic, Benjamin; Kumar, Ambuj; Magazin, Anja; Schroen, Anneke T.; Soares, Heloisa; Hozo, Iztok; Clarke, Mike; Sargent, Daniel; Schell, Michael J.
2010-01-01
Objective: Optimism bias refers to unwarranted belief in the efficacy of new therapies. We assessed the impact of optimism bias on a proportion of trials that did not answer their research question successfully, and explored whether poor accrual or optimism bias is responsible for inconclusive results. Study Design: Systematic review. Setting: Retrospective analysis of a consecutive series of phase III randomized controlled trials (RCTs) performed under the aegis of National Cancer Institute Cooperative groups. Results: 359 trials (374 comparisons) enrolling 150,232 patients were analyzed. 70% (262/374) of the trials generated conclusive results according to the statistical criteria. Investigators made definitive statements related to the treatment preference in 73% (273/374) of studies. Investigators’ judgments and statistical inferences were concordant in 75% (279/374) of trials. Investigators consistently overestimated their expected treatment effects, but to a significantly larger extent for inconclusive trials. The median ratio of expected over observed hazard ratio or odds ratio was 1.34 (range 0.19 – 15.40) in conclusive trials compared to 1.86 (range 1.09 – 12.00) in inconclusive studies (p<0.0001). Only 17% of the trials had treatment effects that matched original researchers’ expectations. Conclusion: Formal statistical inference is sufficient to answer the research question in 75% of RCTs. The answers to the other 25% depend mostly on subjective judgments, which at times are in conflict with statistical inference. Optimism bias significantly contributes to inconclusive results. PMID:21163620
Optimism bias leads to inconclusive results-an empirical study.
Djulbegovic, Benjamin; Kumar, Ambuj; Magazin, Anja; Schroen, Anneke T; Soares, Heloisa; Hozo, Iztok; Clarke, Mike; Sargent, Daniel; Schell, Michael J
2011-06-01
Optimism bias refers to unwarranted belief in the efficacy of new therapies. We assessed the impact of optimism bias on a proportion of trials that did not answer their research question successfully and explored whether poor accrual or optimism bias is responsible for inconclusive results. Systematic review. Retrospective analysis of a consecutive-series phase III randomized controlled trials (RCTs) performed under the aegis of National Cancer Institute Cooperative groups. Three hundred fifty-nine trials (374 comparisons) enrolling 150,232 patients were analyzed. Seventy percent (262 of 374) of the trials generated conclusive results according to the statistical criteria. Investigators made definitive statements related to the treatment preference in 73% (273 of 374) of studies. Investigators' judgments and statistical inferences were concordant in 75% (279 of 374) of trials. Investigators consistently overestimated their expected treatment effects but to a significantly larger extent for inconclusive trials. The median ratio of expected and observed hazard ratio or odds ratio was 1.34 (range: 0.19-15.40) in conclusive trials compared with 1.86 (range: 1.09-12.00) in inconclusive studies (P<0.0001). Only 17% of the trials had treatment effects that matched original researchers' expectations. Formal statistical inference is sufficient to answer the research question in 75% of RCTs. The answers to the other 25% depend mostly on subjective judgments, which at times are in conflict with statistical inference. Optimism bias significantly contributes to inconclusive results. Copyright © 2011 Elsevier Inc. All rights reserved.
Forward and backward inference in spatial cognition.
Penny, Will D; Zeidman, Peter; Burgess, Neil
2013-01-01
This paper shows that the various computations underlying spatial cognition can be implemented using statistical inference in a single probabilistic model. Inference is implemented using a common set of 'lower-level' computations involving forward and backward inference over time. For example, to estimate where you are in a known environment, forward inference is used to optimally combine location estimates from path integration with those from sensory input. To decide which way to turn to reach a goal, forward inference is used to compute the likelihood of reaching that goal under each option. To work out which environment you are in, forward inference is used to compute the likelihood of sensory observations under the different hypotheses. For reaching sensory goals that require a chaining together of decisions, forward inference can be used to compute a state trajectory that will lead to that goal, and backward inference to refine the route and estimate control signals that produce the required trajectory. We propose that these computations are reflected in recent findings of pattern replay in the mammalian brain. Specifically, that theta sequences reflect decision making, theta flickering reflects model selection, and remote replay reflects route and motor planning. We also propose a mapping of the above computational processes onto lateral and medial entorhinal cortex and hippocampus.
Forward and Backward Inference in Spatial Cognition
Penny, Will D.; Zeidman, Peter; Burgess, Neil
2013-01-01
This paper shows that the various computations underlying spatial cognition can be implemented using statistical inference in a single probabilistic model. Inference is implemented using a common set of ‘lower-level’ computations involving forward and backward inference over time. For example, to estimate where you are in a known environment, forward inference is used to optimally combine location estimates from path integration with those from sensory input. To decide which way to turn to reach a goal, forward inference is used to compute the likelihood of reaching that goal under each option. To work out which environment you are in, forward inference is used to compute the likelihood of sensory observations under the different hypotheses. For reaching sensory goals that require a chaining together of decisions, forward inference can be used to compute a state trajectory that will lead to that goal, and backward inference to refine the route and estimate control signals that produce the required trajectory. We propose that these computations are reflected in recent findings of pattern replay in the mammalian brain. Specifically, that theta sequences reflect decision making, theta flickering reflects model selection, and remote replay reflects route and motor planning. We also propose a mapping of the above computational processes onto lateral and medial entorhinal cortex and hippocampus. PMID:24348230
NASA Astrophysics Data System (ADS)
Ma, Chuang; Chen, Han-Shuang; Lai, Ying-Cheng; Zhang, Hai-Feng
2018-02-01
Complex networks hosting binary-state dynamics arise in a variety of contexts. In spite of previous work, fully reconstructing the network structure from observed binary data remains challenging. We articulate a statistical inference based approach to this problem. In particular, exploiting the expectation-maximization (EM) algorithm, we develop a method to ascertain the neighbors of any node in the network based solely on binary data, thereby recovering the full topology of the network. A key ingredient of our method is the maximum-likelihood estimation of the probabilities associated with actual or nonexistent links, and we show that the EM algorithm can distinguish the two kinds of probability values without any ambiguity, insofar as the length of the available binary time series is reasonably long. Our method does not require any a priori knowledge of the detailed dynamical processes, is parameter-free, and is capable of accurate reconstruction even in the presence of noise. We demonstrate the method using combinations of distinct types of binary dynamical processes and network topologies, and provide a physical understanding of the underlying reconstruction mechanism. Our statistical inference based reconstruction method contributes an additional piece to the rapidly expanding "toolbox" of data based reverse engineering of complex networked systems.
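The record above describes the link/non-link separation step only at a high level. A minimal Python sketch of that step, using a two-component Gaussian-mixture EM to split per-candidate-edge probability estimates into "existing link" and "absent link" groups, is given below; the scores are synthetic and the full time-series likelihood of the paper is not reproduced.

```python
# Minimal sketch: use a two-component 1-D Gaussian-mixture EM to separate
# per-candidate-edge "link probability" estimates into existing vs. absent
# links, as in the separation step described above.  Data are synthetic.
import numpy as np

rng = np.random.default_rng(0)
# Synthetic per-edge scores: absent links cluster near 0.05, true links near 0.6.
scores = np.concatenate([rng.normal(0.05, 0.02, 400), rng.normal(0.60, 0.05, 60)])

mu = np.array([scores.min(), scores.max()])
sigma = np.array([scores.std(), scores.std()])
pi = np.array([0.5, 0.5])

def normal_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

for _ in range(200):
    # E-step: responsibility of each component for each edge score.
    dens = np.stack([pi[k] * normal_pdf(scores, mu[k], sigma[k]) for k in range(2)])
    resp = dens / dens.sum(axis=0)
    # M-step: update mixture weights, means and standard deviations.
    nk = resp.sum(axis=1)
    pi = nk / len(scores)
    mu = (resp * scores).sum(axis=1) / nk
    sigma = np.sqrt((resp * (scores - mu[:, None]) ** 2).sum(axis=1) / nk)

is_link = resp[1] > 0.5          # component 1 converges to the larger mean here
print(f"inferred link fraction: {is_link.mean():.3f}, component means: {mu.round(3)}")
```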
A statistical model for interpreting computerized dynamic posturography data
NASA Technical Reports Server (NTRS)
Feiveson, Alan H.; Metter, E. Jeffrey; Paloski, William H.
2002-01-01
Computerized dynamic posturography (CDP) is widely used for assessment of altered balance control. CDP trials are quantified using the equilibrium score (ES), which ranges from zero to 100, as a decreasing function of peak sway angle. The problem of how best to model and analyze ESs from a controlled study is considered. The ES often exhibits a skewed distribution in repeated trials, which can lead to incorrect inference when applying standard regression or analysis of variance models. Furthermore, CDP trials are terminated when a patient loses balance. In these situations, the ES is not observable, but is assigned the lowest possible score, zero. As a result, the response variable has a mixed discrete-continuous distribution, further compromising inference obtained by standard statistical methods. Here, we develop alternative methodology for analyzing ESs under a stochastic model extending the ES to a continuous latent random variable that always exists, but is unobserved in the event of a fall. Loss of balance occurs conditionally, with probability depending on the realized latent ES. After fitting the model by a form of quasi-maximum-likelihood, one may perform statistical inference to assess the effects of explanatory variables. An example is provided, using data from the NIH/NIA Baltimore Longitudinal Study on Aging.
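A compact sketch of the kind of mixed discrete-continuous likelihood described above follows: a latent continuous equilibrium score always exists, falls are scored as zero, and the fall probability depends on the latent score. The distributional forms, the logistic fall model and all parameter values here are illustrative assumptions, not the authors' fitted model.

```python
# Sketch of a mixed discrete-continuous likelihood for equilibrium scores (ES):
# a latent continuous ES always exists; a fall (observed score 0) occurs with a
# probability that decreases with the latent ES.  All forms and values are
# illustrative assumptions rather than the published model.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(1)

def simulate(n, mu=70.0, sd=12.0, a=4.0, b=0.08):
    latent = rng.normal(mu, sd, n)                       # latent ES
    p_fall = 1.0 / (1.0 + np.exp(-(a - b * latent)))     # falls more likely when ES is low
    fell = rng.random(n) < p_fall
    observed = np.where(fell, 0.0, np.clip(latent, 0, 100))
    return observed, fell

obs, fell = simulate(500)

def negloglik(theta):
    mu, log_sd, a, b = theta
    sd = np.exp(log_sd)
    grid = np.linspace(0.0, 100.0, 201)                  # integrate out latent ES for falls
    dens = stats.norm.pdf(grid, mu, sd)
    p_fall_grid = 1.0 / (1.0 + np.exp(-(a - b * grid)))
    marg_fall = (dens * p_fall_grid).sum() * (grid[1] - grid[0])
    ll_fall = np.log(marg_fall + 1e-12)
    p_fall_obs = 1.0 / (1.0 + np.exp(-(a - b * obs)))
    ll_cont = stats.norm.logpdf(obs, mu, sd) + np.log1p(-p_fall_obs)
    return -np.sum(np.where(fell, ll_fall, ll_cont))

fit = optimize.minimize(negloglik, x0=[60.0, np.log(10.0), 2.0, 0.05], method="Nelder-Mead")
print("estimated mu, sd, a, b:", fit.x[0], np.exp(fit.x[1]), fit.x[2], fit.x[3])
```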
Statistical inference of protein structural alignments using information and compression.
Collier, James H; Allison, Lloyd; Lesk, Arthur M; Stuckey, Peter J; Garcia de la Banda, Maria; Konagurthu, Arun S
2017-04-01
Structural molecular biology depends crucially on computational techniques that compare protein three-dimensional structures and generate structural alignments (the assignment of one-to-one correspondences between subsets of amino acids based on atomic coordinates). Despite its importance, the structural alignment problem has not been formulated, much less solved, in a consistent and reliable way. To overcome these difficulties, we present here a statistical framework for the precise inference of structural alignments, built on the Bayesian and information-theoretic principle of Minimum Message Length (MML). The quality of any alignment is measured by its explanatory power: the amount of lossless compression achieved to explain the protein coordinates using that alignment. We have implemented this approach in MMLigner, the first program able to infer statistically significant structural alignments. We also demonstrate the reliability of MMLigner's alignment results when compared with the state of the art. Importantly, MMLigner can also discover different structural alignments of comparable quality, a challenging problem for oligomers and protein complexes. Source code, binaries and an interactive web version are available at http://lcb.infotech.monash.edu.au/mmligner.
PREFACE: ELC International Meeting on Inference, Computation, and Spin Glasses (ICSG2013)
NASA Astrophysics Data System (ADS)
Kabashima, Yoshiyuki; Hukushima, Koji; Inoue, Jun-ichi; Tanaka, Toshiyuki; Watanabe, Osamu
2013-12-01
The close relationship between probability-based inference and statistical mechanics of disordered systems has been noted for some time. This relationship has provided researchers with a theoretical foundation in various fields of information processing for analytical performance evaluation and construction of efficient algorithms based on message-passing or Monte Carlo sampling schemes. The ELC International Meeting on 'Inference, Computation, and Spin Glasses (ICSG2013)', was held in Sapporo 28-30 July 2013. The meeting was organized as a satellite meeting of STATPHYS25 in order to offer a forum where concerned researchers can assemble and exchange information on the latest results and newly established methodologies, and discuss future directions of the interdisciplinary studies between statistical mechanics and information sciences. Financial support from Grant-in-Aid for Scientific Research on Innovative Areas, MEXT, Japan 'Exploring the Limits of Computation (ELC)' is gratefully acknowledged. We are pleased to publish 23 papers contributed by invited speakers of ICSG2013 in this volume of Journal of Physics: Conference Series. We hope that this volume will promote further development of this highly vigorous interdisciplinary field between statistical mechanics and information/computer science. Editors and ICSG2013 Organizing Committee: Koji Hukushima Jun-ichi Inoue (Local Chair of ICSG2013) Yoshiyuki Kabashima (Editor-in-Chief) Toshiyuki Tanaka Osamu Watanabe (General Chair of ICSG2013)
Measuring the Number of M Dwarfs per M Dwarf Using Kepler Eclipsing Binaries
NASA Astrophysics Data System (ADS)
Shan, Yutong; Johnson, John A.; Morton, Timothy D.
2015-11-01
We measure the binarity of detached M dwarfs in the Kepler field with orbital periods in the range of 1-90 days. Kepler's photometric precision and nearly continuous monitoring of stellar targets over time baselines ranging from 3 months to 4 years make its detection efficiency for eclipsing binaries nearly complete over this period range and for all radius ratios. Our investigation employs a statistical framework akin to that used for inferring planetary occurrence rates from planetary transits. The obvious simplification is that eclipsing binaries have a vastly improved detection efficiency that is limited chiefly by their geometric probabilities to eclipse. For the M-dwarf sample observed by the Kepler Mission, the fractional incidence of eclipsing binaries implies that there are 0.11 (+0.02/-0.04) close stellar companions per apparently single M dwarf. Our measured binarity is higher than previous inferences of the occurrence rate of close binaries via radial velocity techniques, at roughly the 2σ level. This study represents the first use of eclipsing binary detections from a high-quality transiting planet mission to infer binary statistics. Application of this statistical framework to the eclipsing binaries discovered by future transit surveys will establish better constraints on the short-period M+M binary rate, as well as binarity measurements for stars of other spectral types.
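A toy version of the occurrence-rate bookkeeping sketched above, weighting each detection by the inverse of its geometric eclipse probability, is shown below; the periods, masses, radii and sample size are invented, and the detection-efficiency modelling of the paper is not reproduced.

```python
# Toy occurrence-rate estimate: weight each detected eclipsing binary by
# 1/p_eclipse, where p_eclipse ~ (R1+R2)/a is the geometric eclipse probability
# for a circular orbit, then divide by the number of surveyed stars.
# All numbers below (periods, masses, radii, sample size) are made up.
import numpy as np

G = 6.674e-11
M_SUN, R_SUN = 1.989e30, 6.957e8

def eclipse_probability(period_days, m1=0.4, m2=0.3, r1=0.4, r2=0.3):
    """Geometric eclipse probability (masses/radii in solar units, SI inside)."""
    P = period_days * 86400.0
    a = (G * (m1 + m2) * M_SUN * P**2 / (4.0 * np.pi**2)) ** (1.0 / 3.0)
    return min(1.0, (r1 + r2) * R_SUN / a)

detected_periods_days = [1.5, 3.2, 7.8, 15.0, 40.0]   # hypothetical detections
n_surveyed_m_dwarfs = 2600                            # hypothetical sample size

weights = [1.0 / eclipse_probability(p) for p in detected_periods_days]
occurrence = sum(weights) / n_surveyed_m_dwarfs
print(f"close companions per M dwarf (toy numbers): {occurrence:.3f}")
```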
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, J.; Elmore, R.; Kennedy, C.
This research illustrates the use of statistical inference techniques to quantify the uncertainty surrounding reliability estimates in a step-stress accelerated degradation testing (SSADT) scenario. SSADT can be used when a researcher is faced with a resource-constrained environment, e.g., limits on chamber time or on the number of units to test. We apply the SSADT methodology to a degradation experiment involving concentrated solar power (CSP) mirrors and compare the results to a more traditional multiple accelerated testing paradigm. Specifically, our work includes: (1) designing a durability testing plan for solar mirrors (3M's new improved silvered acrylic "Solar Reflector Film (SFM) 1100") through the ultra-accelerated weathering system (UAWS), (2) defining degradation paths of optical performance based on the SSADT model which is accelerated by high UV-radiant exposure, and (3) developing service lifetime prediction models for solar mirrors using advanced statistical inference. We use the method of least squares to estimate the model parameters and this serves as the basis for the statistical inference in SSADT. Several quantities of interest can be estimated from this procedure, e.g., mean-time-to-failure (MTTF) and warranty time. The methods allow for the estimation of quantities that may be of interest to the domain scientists.
Cancer Survival Estimates Due to Non-Uniform Loss to Follow-Up and Non-Proportional Hazards
K M, Jagathnath Krishna; Mathew, Aleyamma; Sara George, Preethi
2017-06-25
Background: Cancer survival estimates depend on loss to follow-up (LFU) and non-proportional hazards (non-PH). If LFU is high, survival will be over-estimated. If the hazard is non-proportional, rank tests will provide biased inference and the Cox model will provide a biased hazard ratio. We assessed the bias due to LFU and a non-PH factor in cancer survival and provide alternate methods for unbiased inference and hazard ratios. Materials and Methods: Kaplan-Meier survival curves were plotted using a realistic breast cancer (BC) data-set with >40% 5-year LFU and compared with another BC data-set with <15% 5-year LFU to assess the bias in survival due to high LFU. Age at diagnosis of the latter data set was used to illustrate the bias due to a non-PH factor. The log-rank test was employed to assess the bias in the p-value and the Cox model was used to assess the bias in the hazard ratio for the non-PH factor. The Schoenfeld statistic was used to test the non-PH of age. For the non-PH factor, we employed the Renyi statistic for inference and a time-dependent Cox model for the hazard ratio. Results: Five-year BC survival was 69% (SE: 1.1%) vs. 90% (SE: 0.7%) for data with low vs. high LFU, respectively. Age (<45, 46-54 & >54 years) was a non-PH factor (p-value: 0.036). Survival by age was significant using the log-rank test (p-value: 0.026) but not significant using the Renyi statistic (p=0.067). The hazard ratio (HR) for age using the Cox model was 1.012 (95% CI: 1.004-1.019), whereas the time-dependent Cox model gave an estimate in the other direction (HR: 0.997; 95% CI: 0.997-0.998). Conclusion: Over-estimated survival was observed for cancer with high LFU. The log-rank statistic and Cox model provided biased results for the non-PH factor. For data with non-PH factors, the Renyi statistic and time-dependent Cox model can be used as alternate methods to obtain unbiased inference and estimates.
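A sketch of the frequentist checks discussed above, using the lifelines Python package (assumed available) on a hypothetical data file, is shown below; the Renyi statistic and the time-dependent Cox model recommended by the authors as alternatives are not implemented here, and the column names and file are placeholders.

```python
# Sketch: Kaplan-Meier curves, a log-rank test, a Cox model, and a
# Schoenfeld-residual check of the proportional-hazards (PH) assumption with
# the `lifelines` package (assumed installed).  The data frame, file name and
# column names are hypothetical; time is assumed to be in months, so t=60
# corresponds to 5-year survival.
import pandas as pd
from lifelines import KaplanMeierFitter, CoxPHFitter
from lifelines.statistics import logrank_test

# Assumed columns: "time", "event" (1=death, 0=censored/LFU), "age_group" (0/1/2).
df = pd.read_csv("breast_cancer_followup.csv")          # hypothetical file

kmf = KaplanMeierFitter()
for label, grp in df.groupby("age_group"):
    kmf.fit(grp["time"], event_observed=grp["event"], label=f"age group {label}")
    print(label, kmf.survival_function_at_times(60.0).iloc[0])   # 5-year survival

young, old = df[df["age_group"] == 0], df[df["age_group"] == 2]
lr = logrank_test(young["time"], old["time"], young["event"], old["event"])
print("log-rank p-value:", lr.p_value)                  # biased if PH is violated

cols = ["time", "event", "age_group"]
cph = CoxPHFitter().fit(df[cols], duration_col="time", event_col="event")
cph.check_assumptions(df[cols])                         # Schoenfeld-based PH check
print(cph.summary[["exp(coef)", "p"]])
```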
NASA Astrophysics Data System (ADS)
Caballero, R. N.; Lee, K. J.; Lentati, L.; Desvignes, G.; Champion, D. J.; Verbiest, J. P. W.; Janssen, G. H.; Stappers, B. W.; Kramer, M.; Lazarus, P.; Possenti, A.; Tiburzi, C.; Perrodin, D.; Osłowski, S.; Babak, S.; Bassa, C. G.; Brem, P.; Burgay, M.; Cognard, I.; Gair, J. R.; Graikou, E.; Guillemot, L.; Hessels, J. W. T.; Karuppusamy, R.; Lassus, A.; Liu, K.; McKee, J.; Mingarelli, C. M. F.; Petiteau, A.; Purver, M. B.; Rosado, P. A.; Sanidas, S.; Sesana, A.; Shaifullah, G.; Smits, R.; Taylor, S. R.; Theureau, G.; van Haasteren, R.; Vecchio, A.
2016-04-01
The sensitivity of Pulsar Timing Arrays to gravitational waves (GWs) depends on the noise present in the individual pulsar timing data. Noise may be either intrinsic or extrinsic to the pulsar. Intrinsic sources of noise will include rotational instabilities, for example. Extrinsic sources of noise include contributions from physical processes which are not sufficiently well modelled, for example, dispersion and scattering effects, analysis errors and instrumental instabilities. We present the results from a noise analysis for 42 millisecond pulsars (MSPs) observed with the European Pulsar Timing Array. For characterizing the low-frequency, stochastic and achromatic noise component, or `timing noise', we employ two methods, based on Bayesian and frequentist statistics. For 25 MSPs, we achieve statistically significant measurements of their timing noise parameters and find that the two methods give consistent results. For the remaining 17 MSPs, we place upper limits on the timing noise amplitude at the 95 per cent confidence level. We additionally place an upper limit on the contribution to the pulsar noise budget from errors in the reference terrestrial time standards (below 1 per cent), and we find evidence for a noise component which is present only in the data of one of the four used telescopes. Finally, we estimate that the timing noise of individual pulsars reduces the sensitivity of this data set to an isotropic, stochastic GW background by a factor of >9.1 and by a factor of >2.3 for continuous GWs from resolvable, inspiralling supermassive black hole binaries with circular orbits.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Marzouk, Youssef
Predictive simulation of complex physical systems increasingly rests on the interplay of experimental observations with computational models. Key inputs, parameters, or structural aspects of models may be incomplete or unknown, and must be developed from indirect and limited observations. At the same time, quantified uncertainties are needed to qualify computational predictions in the support of design and decision-making. In this context, Bayesian statistics provides a foundation for inference from noisy and limited data, but at prohibitive computational expense. This project intends to make rigorous predictive modeling feasible in complex physical systems, via accelerated and scalable tools for uncertainty quantification, Bayesian inference, and experimental design. Specific objectives are as follows: 1. Develop adaptive posterior approximations and dimensionality reduction approaches for Bayesian inference in high-dimensional nonlinear systems. 2. Extend accelerated Bayesian methodologies to large-scale sequential data assimilation, fully treating nonlinear models and non-Gaussian state and parameter distributions. 3. Devise efficient surrogate-based methods for Bayesian model selection and the learning of model structure. 4. Develop scalable simulation/optimization approaches to nonlinear Bayesian experimental design, for both parameter inference and model selection. 5. Demonstrate these inferential tools on chemical kinetic models in reacting flow, constructing and refining thermochemical and electrochemical models from limited data. Demonstrate Bayesian filtering on canonical stochastic PDEs and in the dynamic estimation of inhomogeneous subsurface properties and flow fields.
Unraveling multiple changes in complex climate time series using Bayesian inference
NASA Astrophysics Data System (ADS)
Berner, Nadine; Trauth, Martin H.; Holschneider, Matthias
2016-04-01
Change points in time series are perceived as heterogeneities in the statistical or dynamical characteristics of observations. Unraveling such transitions yields essential information for the understanding of the observed system. The precise detection and basic characterization of underlying changes is therefore of particular importance in environmental sciences. We present a kernel-based Bayesian inference approach to investigate direct as well as indirect climate observations for multiple generic transition events. In order to develop a diagnostic approach designed to capture a variety of natural processes, the basic statistical features of central tendency and dispersion are used to locally approximate a complex time series by a generic transition model. A Bayesian inversion approach is developed to robustly infer on the location and the generic patterns of such a transition. To systematically investigate time series for multiple changes occurring at different temporal scales, the Bayesian inversion is extended to a kernel-based inference approach. By introducing basic kernel measures, the kernel inference results are composed into a proxy probability to a posterior distribution of multiple transitions. Thus, based on a generic transition model a probability expression is derived that is capable to indicate multiple changes within a complex time series. We discuss the method's performance by investigating direct and indirect climate observations. The approach is applied to environmental time series (about 100 a), from the weather station in Tuscaloosa, Alabama, and confirms documented instrumentation changes. Moreover, the approach is used to investigate a set of complex terrigenous dust records from the ODP sites 659, 721/722 and 967 interpreted as climate indicators of the African region of the Plio-Pleistocene period (about 5 Ma). The detailed inference unravels multiple transitions underlying the indirect climate observations coinciding with established global climate events.
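A toy single-change-point version of the Bayesian transition inference described above (a mean shift in Gaussian noise, a uniform prior over the change location, and a profile plug-in for the segment means) is sketched below; the kernel-based extension to multiple transitions at different scales is not reproduced.

```python
# Toy single-change-point inference: a shift in the mean of a Gaussian series,
# uniform prior over the change location, known noise level, segment means
# plugged in (profile likelihood) rather than fully marginalised.
import numpy as np

rng = np.random.default_rng(2)
n, true_tau, sigma = 200, 120, 1.0
y = np.concatenate([rng.normal(0.0, sigma, true_tau),
                    rng.normal(1.5, sigma, n - true_tau)])

log_post = np.full(n, -np.inf)
for tau in range(5, n - 5):                       # candidate change locations
    left, right = y[:tau], y[tau:]
    rss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
    log_post[tau] = -rss / (2.0 * sigma ** 2)

log_post -= log_post.max()
post = np.exp(log_post)
post /= post.sum()
print("posterior mode of change point:", int(np.argmax(post)))
print("size of 95% credible set:", int((np.sort(post)[::-1].cumsum() < 0.95).sum()) + 1)
```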
Bayesian Parameter Inference and Model Selection by Population Annealing in Systems Biology
Murakami, Yohei
2014-01-01
Parameter inference and model selection are very important for mathematical modeling in systems biology. Bayesian statistics can be used to conduct both parameter inference and model selection. Especially, the framework named approximate Bayesian computation is often used for parameter inference and model selection in systems biology. However, Monte Carlo methods need to be used to compute Bayesian posterior distributions. In addition, the posterior distributions of parameters are sometimes almost uniform or very similar to their prior distributions. In such cases, it is difficult to choose one specific value of parameter with high credibility as the representative value of the distribution. To overcome the problems, we introduced one of the population Monte Carlo algorithms, population annealing. Although population annealing is usually used in statistical mechanics, we showed that population annealing can be used to compute Bayesian posterior distributions in the approximate Bayesian computation framework. To deal with un-identifiability of the representative values of parameters, we proposed to run the simulations with the parameter ensemble sampled from the posterior distribution, named “posterior parameter ensemble”. We showed that population annealing is an efficient and convenient algorithm to generate posterior parameter ensemble. We also showed that the simulations with the posterior parameter ensemble can not only reproduce the data used for parameter inference, but also capture and predict the data which was not used for parameter inference. Lastly, we introduced the marginal likelihood in the approximate Bayesian computation framework for Bayesian model selection. We showed that population annealing enables us to compute the marginal likelihood in the approximate Bayesian computation framework and conduct model selection depending on the Bayes factor. PMID:25089832
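A deliberately simplified ABC sketch in the spirit of the scheme described above follows: a population of parameters is filtered through a decreasing tolerance schedule (standing in for the annealing schedule), and the surviving draws form a posterior parameter ensemble. Importance weights and the marginal-likelihood estimate of the full method are omitted, and the toy model is an exponential decay rather than a systems-biology model.

```python
# Simplified ABC with a decreasing tolerance schedule, yielding a
# "posterior parameter ensemble".  Importance weights and the marginal
# likelihood of the published method are intentionally omitted.
import numpy as np

rng = np.random.default_rng(3)
t = np.linspace(0.0, 5.0, 20)
k_true = 0.8
data = np.exp(-k_true * t) + rng.normal(0.0, 0.05, t.size)

def simulate(k):
    return np.exp(-k * t) + rng.normal(0.0, 0.05, t.size)

def distance(sim):
    return np.sqrt(np.mean((sim - data) ** 2))

n_particles = 500
ensemble = rng.uniform(0.0, 3.0, n_particles)          # prior: k ~ U(0, 3)
for eps in [0.5, 0.2, 0.1, 0.07]:                      # "annealing" schedule
    accepted = []
    while len(accepted) < n_particles:
        k = rng.choice(ensemble) + rng.normal(0.0, 0.05)   # perturb a survivor
        if 0.0 < k < 3.0 and distance(simulate(k)) < eps:
            accepted.append(k)
    ensemble = np.array(accepted)

print(f"posterior ensemble for k: mean={ensemble.mean():.3f}, sd={ensemble.std():.3f}")
# Re-running the model with draws from `ensemble` gives ensemble predictions.
```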
Stotts, Steven A; Koch, Robert A
2017-08-01
In this paper an approach is presented to estimate the constraint required to apply maximum entropy (ME) for statistical inference with underwater acoustic data from a single track segment. Previous algorithms for estimating the ME constraint require multiple source track segments to determine the constraint. The approach is relevant for addressing model mismatch effects, i.e., inaccuracies in parameter values determined from inversions because the propagation model does not account for all acoustic processes that contribute to the measured data. One effect of model mismatch is that the lowest cost inversion solution may be well outside a relatively well-known parameter value's uncertainty interval (prior), e.g., source speed from track reconstruction or towed source levels. The approach requires, for some particular parameter value, the ME constraint to produce an inferred uncertainty interval that encompasses the prior. Motivating this approach is the hypothesis that the proposed constraint determination procedure would produce a posterior probability density that accounts for the effect of model mismatch on inferred values of other inversion parameters for which the priors might be quite broad. Applications to both measured and simulated data are presented for model mismatch that produces minimum cost solutions either inside or outside some priors.
NASA Astrophysics Data System (ADS)
Goodman, Steven N.
1989-11-01
This dissertation explores the use of a mathematical measure of statistical evidence, the log likelihood ratio, in clinical trials. The methods and thinking behind the use of an evidential measure are contrasted with traditional methods of analyzing data, which depend primarily on a p-value as an estimate of the statistical strength of an observed data pattern. It is contended that neither the behavioral dictates of Neyman-Pearson hypothesis testing methods, nor the coherency dictates of Bayesian methods are realistic models on which to base inference. The use of the likelihood alone is applied to four aspects of trial design or conduct: the calculation of sample size, the monitoring of data, testing for the equivalence of two treatments, and meta-analysis, the combining of results from different trials. Finally, a more general model of statistical inference, using belief functions, is used to see if it is possible to separate the assessment of evidence from our background knowledge. It is shown that traditional and Bayesian methods can be modeled as two ends of a continuum of structured background knowledge, with traditional methods summarizing evidence at the point of maximum likelihood assuming no structure, and Bayesian methods assuming complete knowledge. Both schools are seen to be missing a concept of ignorance: uncommitted belief. This concept provides the key to understanding the problem of sampling to a foregone conclusion and the role of frequency properties in statistical inference. The conclusion is that statistical evidence cannot be defined independently of background knowledge, and that frequency properties of an estimator are an indirect measure of uncommitted belief. Several likelihood summaries need to be used in clinical trials, with the quantitative disparity between summaries being an indirect measure of our ignorance. This conclusion is linked with parallel ideas in the philosophy of science and cognitive psychology.
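A small numerical illustration of the log likelihood ratio as an evidential summary, alongside the usual one-sided p-value for the same data, is given below; the counts and the two hypothesised response rates are invented.

```python
# Worked example: log likelihood ratio for a single-arm binomial outcome,
# reported alongside the one-sided p-value.  All numbers are invented.
import numpy as np
from scipy import stats

n, x = 50, 19              # patients, responders (invented)
p0, p1 = 0.25, 0.45        # null and alternative response rates (invented)

log_lr = stats.binom.logpmf(x, n, p1) - stats.binom.logpmf(x, n, p0)
p_value = stats.binom.sf(x - 1, n, p0)          # P(X >= x | p0), one-sided

print(f"log likelihood ratio (H1 vs H0): {log_lr:.2f}")
print(f"likelihood ratio: {np.exp(log_lr):.1f}")
print(f"one-sided p-value under H0: {p_value:.3f}")
# The likelihood ratio and the p-value summarize the same data differently,
# which is the contrast the dissertation explores.
```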
Ionospheric and Birkeland current distributions inferred from the MAGSAT magnetometer data
NASA Technical Reports Server (NTRS)
Zanetti, L. J.; Potemra, T. A.; Baumjohann, W.
1983-01-01
Ionospheric and field-aligned sheet current density distributions are presently inferred by means of MAGSAT vector magnetometer data, together with an accurate magnetic field model. By comparing Hall current densities inferred from the MAGSAT data and those inferred from simultaneously recorded ground-based data acquired by the Scandinavian magnetometer array, it is determined that the former have previously been underestimated due to high damping of magnetic variations with high spatial wave numbers between the ionosphere and the MAGSAT orbit. An important result of this study is that the Birkeland and electrojet current systems are colocated. The analyses have shown a tendency for triangular rather than constant electrojet current distributions as a function of latitude, consistent with the statistical, uniform region 1 and region 2 Birkeland current patterns.
Strelioff, Christopher C; Crutchfield, James P; Hübler, Alfred W
2007-07-01
Markov chains are a natural and well understood tool for describing one-dimensional patterns in time or space. We show how to infer kth order Markov chains, for arbitrary k, from finite data by applying Bayesian methods to both parameter estimation and model-order selection. Extending existing results for multinomial models of discrete data, we connect inference to statistical mechanics through information-theoretic (type theory) techniques. We establish a direct relationship between Bayesian evidence and the partition function which allows for straightforward calculation of the expectation and variance of the conditional relative entropy and the source entropy rate. Finally, we introduce a method that uses finite data-size scaling with model-order comparison to infer the structure of out-of-class processes.
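A compact sketch of the model-order selection described above follows: for each candidate order k, the Dirichlet-multinomial log-evidence of a binary sequence is computed with uniform Dirichlet priors on every context's transition probabilities. The data are synthetic, and the type-theoretic connections of the paper are not reproduced.

```python
# Bayesian model-order selection for Markov chains: Dirichlet-multinomial
# log-evidence for each order k, with a Dirichlet(1,...,1) prior per context,
# conditioning on the first k symbols.  Data are synthetic.
import numpy as np
from collections import Counter
from scipy.special import gammaln

rng = np.random.default_rng(4)
# Synthetic 2nd-order binary process: the next symbol depends on the last two.
p_one = {(0, 0): 0.9, (0, 1): 0.2, (1, 0): 0.6, (1, 1): 0.1}
seq = [0, 0]
for _ in range(2000):
    seq.append(int(rng.random() < p_one[tuple(seq[-2:])]))
seq = np.array(seq)

def log_evidence(seq, k, alphabet_size=2, alpha=1.0):
    counts = Counter()
    for i in range(k, len(seq)):
        counts[(tuple(seq[i - k:i]), seq[i])] += 1
    contexts = {ctx for ctx, _ in counts}
    total = 0.0
    for ctx in contexts:
        n_ctx = [counts[(ctx, s)] for s in range(alphabet_size)]
        total += (gammaln(alphabet_size * alpha)
                  - gammaln(alphabet_size * alpha + sum(n_ctx))
                  + sum(gammaln(alpha + n) - gammaln(alpha) for n in n_ctx))
    return total

for k in range(4):
    print(f"order {k}: log evidence = {log_evidence(seq, k):.1f}")
# The order with the highest evidence (k = 2 for this synthetic series) is selected.
```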
Harari, Gil
2014-01-01
Statistical significance, also known as the p-value, and the CI (confidence interval) are common statistical measures and are essential for the statistical analysis of studies in medicine and the life sciences. These measures provide complementary information about the statistical probability and conclusions regarding the clinical significance of study findings. This article is intended to describe the methodologies, compare the methods, assess their suitability for the different needs of study results analysis and explain situations in which each method should be used.
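A small illustration of reporting a p-value together with a confidence interval for the same two-group comparison is given below; the measurements are invented.

```python
# Report a p-value and a 95% confidence interval for the same comparison
# (pooled-variance two-sample t-test).  Measurements are invented.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
treatment = rng.normal(5.2, 1.0, 40)      # invented outcome values
control = rng.normal(4.7, 1.0, 40)

t_stat, p_value = stats.ttest_ind(treatment, control)   # pooled-variance test

n1, n2 = len(treatment), len(control)
diff = treatment.mean() - control.mean()
sp2 = ((n1 - 1) * treatment.var(ddof=1) + (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2)
se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
dof = n1 + n2 - 2
ci = (diff - stats.t.ppf(0.975, dof) * se, diff + stats.t.ppf(0.975, dof) * se)

print(f"p-value: {p_value:.4f}")
print(f"mean difference: {diff:.2f}, 95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
# The p-value addresses statistical significance; the CI shows the range of
# effect sizes compatible with the data, which bears on clinical relevance.
```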
Reliability of a Measure of Institutional Discrimination against Minorities
1979-12-01
A statistical measure of institutional discrimination is discussed. Two methods of dealing with the problem of reliability of the measure in small samples are presented: the first is based upon classical statistical theory, and the second derives from a series of computer-generated Monte Carlo simulations.
Comparative analysis on the selection of number of clusters in community detection
NASA Astrophysics Data System (ADS)
Kawamoto, Tatsuro; Kabashima, Yoshiyuki
2018-02-01
We conduct a comparative analysis on various estimates of the number of clusters in community detection. An exhaustive comparison requires testing of all possible combinations of frameworks, algorithms, and assessment criteria. In this paper we focus on the framework based on a stochastic block model, and investigate the performance of greedy algorithms, statistical inference, and spectral methods. For the assessment criteria, we consider modularity, map equation, Bethe free energy, prediction errors, and isolated eigenvalues. From the analysis, the tendency of the assessment criteria and algorithms to overfit or underfit becomes apparent. In addition, we propose that the alluvial diagram is a suitable tool to visualize statistical inference results and can be useful to determine the number of clusters.
Distributed Sensing and Processing for Multi-Camera Networks
NASA Astrophysics Data System (ADS)
Sankaranarayanan, Aswin C.; Chellappa, Rama; Baraniuk, Richard G.
Sensor networks with large numbers of cameras are becoming increasingly prevalent in a wide range of applications, including video conferencing, motion capture, surveillance, and clinical diagnostics. In this chapter, we identify some of the fundamental challenges in designing such systems: robust statistical inference, computational efficiency, and opportunistic and parsimonious sensing. We show that the geometric constraints induced by the imaging process are extremely useful for identifying and designing optimal estimators for object detection and tracking tasks. We also derive pipelined and parallelized implementations of popular tools used for statistical inference in non-linear systems, of which multi-camera systems are examples. Finally, we highlight the use of the emerging theory of compressive sensing in reducing the amount of data sensed and communicated by a camera network.
General intelligence does not help us understand cognitive evolution.
Shuker, David M; Barrett, Louise; Dickins, Thomas E; Scott-Phillips, Thom C; Barton, Robert A
2017-01-01
Burkart et al. conflate the domain-specificity of cognitive processes with the statistical pattern of variance in behavioural measures that partly reflect those processes. General intelligence is a statistical abstraction, not a cognitive trait, and we argue that the former does not warrant inferences about the nature or evolution of the latter.
Exploring Tree Age & Diameter to Illustrate Sample Design & Inference in Observational Ecology
ERIC Educational Resources Information Center
Casady, Grant M.
2015-01-01
Undergraduate biology labs often explore the techniques of data collection but neglect the statistical framework necessary to express findings. Students can be confused about how to use their statistical knowledge to address specific biological questions. Growth in the area of observational ecology requires that students gain experience in…
Computer-Based Instruction in Statistical Inference; Final Report. Technical Memorandum (TM Series).
ERIC Educational Resources Information Center
Rosenbaum, J.; And Others
A two-year investigation into the development of computer-assisted instruction (CAI) for the improvement of undergraduate training in statistics was undertaken. The first year was largely devoted to designing PLANIT (Programming LANguage for Interactive Teaching) which reduces, or completely eliminates, the need an author of CAI lessons would…
Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction
ERIC Educational Resources Information Center
Imbens, Guido W.; Rubin, Donald B.
2015-01-01
Most questions in social and biomedical sciences are causal in nature: what would happen to individuals, or to groups, if part of their environment were changed? In this groundbreaking text, two world-renowned experts present statistical methods for studying such questions. This book starts with the notion of potential outcomes, each corresponding…
The Co-Emergence of Aggregate and Modelling Reasoning
ERIC Educational Resources Information Center
Aridor, Keren; Ben-Zvi, Dani
2017-01-01
This article examines how two processes--reasoning with statistical modelling of a real phenomenon and aggregate reasoning--can co-emerge. We focus in this case study on the emergent reasoning of two fifth graders (aged 10) involved in statistical data analysis, informal inference, and modelling activities using TinkerPlots™. We describe nine…
ERIC Educational Resources Information Center
Schulz, Laura E.; Bonawitz, Elizabeth Baraff; Griffiths, Thomas L.
2007-01-01
Causal learning requires integrating constraints provided by domain-specific theories with domain-general statistical learning. In order to investigate the interaction between these factors, the authors presented preschoolers with stories pitting their existing theories against statistical evidence. Each child heard 2 stories in which 2 candidate…
Using Informal Inferential Reasoning to Develop Formal Concepts: Analyzing an Activity
ERIC Educational Resources Information Center
Weinberg, Aaron; Wiesner, Emilie; Pfaff, Thomas J.
2010-01-01
Inferential reasoning is a central component of statistics. Researchers have suggested that students should develop an informal understanding of the ideas that underlie inference before learning the concepts formally. This paper presents a hands-on activity that is designed to help students in an introductory statistics course draw informal…
A Modeling Approach to the Development of Students' Informal Inferential Reasoning
ERIC Educational Resources Information Center
Doerr, Helen M.; Delmas, Robert; Makar, Katie
2017-01-01
Teaching from an informal statistical inference perspective can address the challenge of teaching statistics in a coherent way. We argue that activities that promote model-based reasoning address two additional challenges: providing a coherent sequence of topics and promoting the application of knowledge to novel situations. We take a models and…
Statistical inference of selection and divergence of rice blast resistance gene Pi-ta
USDA-ARS?s Scientific Manuscript database
The resistance gene Pi-ta has been effectively used to control rice blast disease worldwide. A few recent studies have described the possible evolution of Pi-ta in cultivated and weedy rice. However, evolutionary statistics used for the studies are too limited to precisely understand selection and d...
A Scalable Approach to Probabilistic Latent Space Inference of Large-Scale Networks
Yin, Junming; Ho, Qirong; Xing, Eric P.
2014-01-01
We propose a scalable approach for making inference about latent spaces of large networks. With a succinct representation of networks as a bag of triangular motifs, a parsimonious statistical model, and an efficient stochastic variational inference algorithm, we are able to analyze real networks with over a million vertices and hundreds of latent roles on a single machine in a matter of hours, a setting that is out of reach for many existing methods. When compared to the state-of-the-art probabilistic approaches, our method is several orders of magnitude faster, with competitive or improved accuracy for latent space recovery and link prediction. PMID:25400487
The influence of social information on children's statistical and causal inferences.
Sobel, David M; Kirkham, Natasha Z
2012-01-01
Constructivist accounts of learning posit that causal inference is a child-driven process. Recent interpretations of such accounts also suggest that the process children use for causal learning is rational: Children interpret and learn from new evidence in light of their existing beliefs. We argue that such mechanisms are also driven by informative social cues and suggest ways in which such information influences both preschoolers' and infants' inferences. In doing so, we argue that a rational constructivist account should not only focus on describing the child's internal cognitive mechanisms for learning but also on how social information affects the process of learning.
IMAGINE: Interstellar MAGnetic field INference Engine
NASA Astrophysics Data System (ADS)
Steininger, Theo
2018-03-01
IMAGINE (Interstellar MAGnetic field INference Engine) performs inference on generic parametric models of the Galaxy. The modular open source framework uses highly optimized tools and technology such as the MultiNest sampler (ascl:1109.006) and the information field theory framework NIFTy (ascl:1302.013) to create an instance of the Milky Way based on a set of parameters for physical observables, using Bayesian statistics to judge the mismatch between measured data and model prediction. The flexibility of the IMAGINE framework allows for simple refitting for newly available data sets and makes state-of-the-art Bayesian methods easily accessible particularly for random components of the Galactic magnetic field.
Cocco, S; Monasson, R; Sessak, V
2011-05-01
We consider the problem of inferring the interactions between a set of N binary variables from the knowledge of their frequencies and pairwise correlations. The inference framework is based on the Hopfield model, a special case of the Ising model where the interaction matrix is defined through a set of patterns in the variable space, and is of rank much smaller than N. We show that maximum likelihood inference is deeply related to principal component analysis when the amplitude of the pattern components ξ is negligible compared to √N. Using techniques from statistical mechanics, we calculate the corrections to the patterns to the first order in ξ/√N. We stress the need to generalize the Hopfield model and include both attractive and repulsive patterns in order to correctly infer networks with sparse and strong interactions. We present a simple geometrical criterion to decide how many attractive and repulsive patterns should be considered as a function of the sampling noise. We moreover discuss how many sampled configurations are required for a good inference, as a function of the system size N and of the amplitude ξ. The inference approach is illustrated on synthetic and biological data.
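A sketch of the PCA connection noted above follows: the leading eigenvector of the connected-correlation matrix of binary samples serves as a first-order pattern estimate. The higher-order ξ/√N corrections and the repulsive patterns discussed in the paper are not implemented, and the sampling model below is a crude stand-in.

```python
# Leading principal component of the connected-correlation matrix as a
# first-order estimate of a hidden +/-1 pattern, on crude synthetic samples.
import numpy as np

rng = np.random.default_rng(6)
N, B = 50, 4000                                  # variables, samples
xi = rng.choice([-1.0, 1.0], size=N)             # one hidden +/-1 pattern

# Each sample has a fluctuating overlap m with the pattern (mean zero), so the
# connected correlations acquire a rank-one component proportional to xi xi^T.
m = rng.choice([-0.5, 0.5], size=(B, 1))
p_plus = 0.5 * (1.0 + m * xi)                    # P(s_i = +1) per sample
samples = np.where(rng.random((B, N)) < p_plus, 1.0, -1.0)

mean = samples.mean(axis=0)
corr = samples.T @ samples / B - np.outer(mean, mean)   # connected correlations

eigval, eigvec = np.linalg.eigh(corr)
pattern_est = eigvec[:, -1]                      # leading principal component
alignment = abs(pattern_est @ xi) / np.sqrt(N)
print(f"top eigenvalue: {eigval[-1]:.2f}, |overlap with true pattern|: {alignment:.2f}")
```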
Inferring explicit weighted consensus networks to represent alternative evolutionary histories
2013-01-01
Background: The advent of molecular biology techniques and constant increase in availability of genetic material have triggered the development of many phylogenetic tree inference methods. However, several reticulate evolution processes, such as horizontal gene transfer and hybridization, have been shown to blur the species evolutionary history by causing discordance among phylogenies inferred from different genes. Methods: To tackle this problem, we hereby describe a new method for inferring and representing alternative (reticulate) evolutionary histories of species as an explicit weighted consensus network which can be constructed from a collection of gene trees with or without prior knowledge of the species phylogeny. Results: We provide a way of building a weighted phylogenetic network for each of the following reticulation mechanisms: diploid hybridization, intragenic recombination and complete or partial horizontal gene transfer. We successfully tested our method on some synthetic and real datasets to infer the above-mentioned evolutionary events which may have influenced the evolution of many species. Conclusions: Our weighted consensus network inference method allows one to infer, visualize and validate statistically major conflicting signals induced by the mechanisms of reticulate evolution. The results provided by the new method can be used to represent the inferred conflicting signals by means of explicit and easy-to-interpret phylogenetic networks. PMID:24359207
Inferring epidemiological parameters from phylogenies using regression-ABC: A comparative study
Gascuel, Olivier
2017-01-01
Inferring epidemiological parameters such as the R0 from time-scaled phylogenies is a timely challenge. Most current approaches rely on likelihood functions, which raise specific issues that range from computing these functions to finding their maxima numerically. Here, we present a new regression-based Approximate Bayesian Computation (ABC) approach, which we base on a large variety of summary statistics intended to capture the information contained in the phylogeny and its corresponding lineage-through-time plot. The regression step involves the Least Absolute Shrinkage and Selection Operator (LASSO) method, which is a robust machine learning technique. It allows us to readily deal with the large number of summary statistics, while avoiding resorting to Markov Chain Monte Carlo (MCMC) techniques. To compare our approach to existing ones, we simulated target trees under a variety of epidemiological models and settings, and inferred parameters of interest using the same priors. We found that, for large phylogenies, the accuracy of our regression-ABC is comparable to that of likelihood-based approaches involving birth-death processes implemented in BEAST2. Our approach even outperformed these when inferring the host population size with a Susceptible-Infected-Removed epidemiological model. It also clearly outperformed a recent kernel-ABC approach when assuming a Susceptible-Infected epidemiological model with two host types. Lastly, by re-analyzing data from the early stages of the recent Ebola epidemic in Sierra Leone, we showed that regression-ABC provides more realistic estimates for the duration parameters (latency and infectiousness) than the likelihood-based method. Overall, ABC based on a large variety of summary statistics and a regression method able to perform variable selection and avoid overfitting is a promising approach to analyze large phylogenies. PMID:28263987
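A schematic of the regression-ABC idea using scikit-learn's LassoCV on a toy simulator (noisy exponential growth rather than phylogenies) is given below; the simulator, the summary statistics and all settings are stand-ins for the phylodynamic machinery of the paper.

```python
# Schematic regression-ABC with LASSO: simulate from the prior, compute many
# (partly redundant) summary statistics, learn the mapping summaries ->
# parameter, and apply it to the observed summaries.  Everything here is a
# toy stand-in for the phylogenetic setting of the paper.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(7)

def simulate_summaries(r):
    """Toy simulator: noisy exponential growth; returns a vector of summaries."""
    t = np.linspace(0.0, 3.0, 30)
    traj = np.exp(r * t) * np.exp(rng.normal(0.0, 0.1, t.size))
    return np.concatenate([traj, [traj.mean(), traj.std(), traj[-1] / traj[0]]])

# 1) Draw parameters from the prior and simulate summary statistics.
n_sim = 3000
r_prior = rng.uniform(0.1, 2.0, n_sim)
X = np.array([simulate_summaries(r) for r in r_prior])

# 2) Learn the mapping summaries -> parameter with LASSO; variable selection
#    over the redundant summaries happens automatically.
reg = LassoCV(cv=5).fit(X, r_prior)

# 3) Apply the fitted regression to the observed summaries.
r_true = 0.9
observed = simulate_summaries(r_true)          # stands in for the real data set
print(f"regression-ABC estimate of r: {reg.predict(observed[None, :])[0]:.3f} (truth {r_true})")
print(f"summaries retained by LASSO: {(reg.coef_ != 0).sum()} of {X.shape[1]}")
```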
Evolving building blocks of rhythm: how human cognition creates music via cultural transmission.
Ravignani, Andrea; Thompson, Bill; Grossi, Thomas; Delgado, Tania; Kirby, Simon
2018-03-06
Why does musical rhythm have the structure it does? Musical rhythm, in all its cross-cultural diversity, exhibits commonalities across world cultures. Traditionally, music research has been split into two fields. Some scientists focused on musicality, namely the human biocognitive predispositions for music, with an emphasis on cross-cultural similarities. Other scholars investigated music, seen as a cultural product, focusing on the variation in world musical cultures. Recent experiments found deep connections between music and musicality, reconciling these opposing views. Here, we address the question of how individual cognitive biases affect the process of cultural evolution of music. Data from two experiments are analyzed using two complementary techniques. In the experiments, participants hear drumming patterns and imitate them. These patterns are then given to the same or another participant to imitate. The structure of these initially random patterns is tracked along experimental "generations." Frequentist statistics show how participants' biases are amplified by cultural transmission, making drumming patterns more structured. Structure is achieved faster in transmission within rather than between participants. A Bayesian model approximates the motif structures participants learned and created. Our data and models suggest that individual biases for musicality may shape the cultural transmission of musical rhythm.
Disentangling weak and strong interactions in B → K*(→ Kπ)π Dalitz-plot analyses
NASA Astrophysics Data System (ADS)
Charles, Jérôme; Descotes-Genon, Sébastien; Ocariz, José; Pérez Pérez, Alejandro
2017-08-01
Dalitz-plot analyses of B → Kππ decays provide direct access to decay amplitudes, and thereby weak and strong phases can be disentangled by resolving the interference patterns in phase space between intermediate resonant states. A phenomenological isospin analysis of B → K*(→ Kπ)π decay amplitudes is presented exploiting available amplitude analyses performed at the BaBar, Belle and LHCb experiments. A first application consists in constraining the CKM parameters thanks to an external hadronic input. A method, proposed some time ago by two different groups and relying on a bound on the electroweak penguin contribution, is shown to lack the desired robustness and accuracy, and we propose a more alluring alternative using a bound on the annihilation contribution. A second application consists in extracting information on hadronic amplitudes assuming the values of the CKM parameters from a global fit to quark flavour data. The current data yield several solutions, which do not fully support the hierarchy of hadronic amplitudes usually expected from theoretical arguments (colour suppression, suppression of electroweak penguins), as illustrated from computations within QCD factorisation. Some prospects concerning the impact of future measurements at LHCb and Belle II are also presented. Results are obtained with the CKMfitter analysis package, featuring the frequentist statistical approach and using the Rfit scheme to handle theoretical uncertainties.
Sultan, Amira; Kaminski, Juliane; Mojzisch, Andreas
2018-01-01
In recent years, an increasing number of studies has investigated majority influence in nonhuman animals. However, due to both terminological and methodological issues, evidence for conformity in nonhuman animals is scarce and controversial. Preliminary evidence suggests that wild birds, wild monkeys, and fish show conformity, that is, forgoing personal information in order to copy the majority. By contrast, chimpanzees seem to lack this tendency. The present study is the first to examine whether dogs (Canis familiaris) show conformity. Specifically, we tested whether dogs conform to a majority of conspecifics rather than stick to what they have previously learned. After dogs had acquired a behavioral preference via training (i.e., shaping), they were confronted with counter-preferential behavior of either no, one or three conspecifics. Traditional frequentist analyses show that the dogs’ behavior did not differ significantly between the three conditions. Complementary Bayesian analyses suggest that our data provide moderate evidence for the null hypothesis. In conclusion, our results suggest that dogs stick to what they have learned rather than conform to the counter-preferential behavior of others. We discuss the possible statistical and methodological limitations of this finding. Furthermore, we take a functional perspective on conformity and discuss under which circumstances dogs might show conformity after all. PMID:29570747
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mendez Mendez, Diana Patricia
In this work we carried out the disappearance analysis of muon neutrinos produced in the Fermilab Booster Neutrino Beam, using the data released to the public by the collaborations of the MiniBooNE and SciBooNE experiments. The calculations were made with programs in C and C++, implementing the ROOT libraries. From the analysis, using both the classical Pearson method and the Feldman and Cousins frequentist corrections, we obtained the 90% C.L. limit for the oscillation parameters sin²2θ and Δm² in the region 0.1 ≤ Δm² ≤ 10 eV² using a two-neutrino model. The result presented in this work is consistent with the official one, with small deviations ascribed to round-off errors in the format of the used data, as well as statistical fluctuations in the generation of fake experiments used in the Feldman and Cousins method. Like the official result, ours is consistent with the null oscillation hypothesis. This work was carried out independently of the MiniBooNE and SciBooNE collaborations and its results are not official.
Sescousse, Guillaume; Ligneul, Romain; van Holst, Ruth J; Janssen, Lieneke K; de Boer, Femke; Janssen, Marcel; Berry, Anne S; Jagust, William J; Cools, Roshan
2018-05-01
Dopamine is central to a number of cognitive functions and brain disorders. Given the cost of neurochemical imaging in humans, behavioural proxy measures of dopamine have gained in popularity in the past decade, such as spontaneous eye blink rate (sEBR). Increased sEBR is commonly associated with increased dopamine function based on pharmacological evidence and patient studies. Yet, this hypothesis has not been validated using in vivo measures of dopamine function in humans. To fill this gap, we measured sEBR and striatal dopamine synthesis capacity using [18F]DOPA PET in 20 participants (nine healthy individuals and 11 pathological gamblers). Our results, based on frequentist and Bayesian statistics, as well as region-of-interest and voxel-wise analyses, argue against a positive relationship between sEBR and striatal dopamine synthesis capacity. They show that, if anything, the evidence is in favour of a negative relationship. These results, which complement findings from a recent study that failed to observe a relationship between sEBR and dopamine D2 receptor availability, suggest that caution and nuance are warranted when interpreting sEBR as a proxy measure of striatal dopamine.
2016-06-01
… theories of the mammalian visual system, and exploiting descriptive text that may accompany a still image for improved inference. The focus of the Brown team was on single images. Subject terms: computer vision, semantic description, street scenes, belief propagation, generative models, nonlinear filtering, sufficient statistics.
STATISTICAL RELATIONAL LEARNING AND SCRIPT INDUCTION FOR TEXTUAL INFERENCE
2017-12-01
[Report documentation page; recoverable citation fragments: Pichotta, K. and Mooney, R. J., "Using Sentence-Level LSTM Language Models for Script Inference," Austin, TX, 2016; Rajani, N. and Mooney, R. J., "Stacked Ensembles of Information Extractors for …"; Proceedings of EMNLP, Austin, TX, 2016.]
Dorazio, R.M.; Johnson, F.A.
2003-01-01
Bayesian inference and decision theory may be used in the solution of relatively complex problems of natural resource management, owing to recent advances in statistical theory and computing. In particular, Markov chain Monte Carlo algorithms provide a computational framework for fitting models of adequate complexity and for evaluating the expected consequences of alternative management actions. We illustrate these features using an example based on management of waterfowl habitat.
Improved central confidence intervals for the ratio of Poisson means
NASA Astrophysics Data System (ADS)
Cousins, R. D.
The problem of confidence intervals for the ratio of two unknown Poisson means was "solved" decades ago, but a closer examination reveals that the standard solution is far from optimal from the frequentist point of view. We construct a more powerful set of central confidence intervals, each of which is a (typically proper) subinterval of the corresponding standard interval. They also provide upper and lower confidence limits which are more restrictive than the standard limits. The construction follows Neyman's original prescription, though discreteness of the Poisson distribution and the presence of a nuisance parameter (one of the unknown means) lead to slightly conservative intervals. Philosophically, the issue of the appropriateness of the construction method is similar to the issue of conditioning on the margins in 2×2 contingency tables. From a frequentist point of view, the new set maintains (over) coverage of the unknown true value of the ratio of means at each stated confidence level, even though the new intervals are shorter than the old intervals by any measure (except for two cases where they are identical). As an example, when the number 2 is drawn from each Poisson population, the 90% CL central confidence interval on the ratio of means is (0.169, 5.196), rather than (0.108, 9.245). In the cited literature, such confidence intervals have applications in numerous branches of pure and applied science, including agriculture, wildlife studies, manufacturing, medicine, reliability theory, and elementary particle physics.
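The "standard" central interval quoted above can be reproduced by conditioning on the total count and forming a central Clopper-Pearson interval for the binomial proportion, as in the sketch below; the improved, shorter intervals constructed in the paper are not implemented.

```python
# Standard central interval for the ratio of two Poisson means: condition on
# the total (n1 + n2), form a central Clopper-Pearson interval for the binomial
# proportion p = lam1/(lam1 + lam2), and transform to the ratio p/(1-p).
# The paper's improved construction is not implemented here.
from scipy.stats import beta

def poisson_ratio_ci(n1, n2, cl=0.90):
    alpha = 1.0 - cl
    n = n1 + n2
    p_lo = beta.ppf(alpha / 2, n1, n - n1 + 1) if n1 > 0 else 0.0
    p_hi = beta.ppf(1 - alpha / 2, n1 + 1, n - n1) if n1 < n else 1.0
    lo = p_lo / (1.0 - p_lo)
    hi = p_hi / (1.0 - p_hi) if p_hi < 1.0 else float("inf")
    return lo, hi

lo, hi = poisson_ratio_ci(2, 2, cl=0.90)
print(f"90% CL standard central interval for lam1/lam2: ({lo:.3f}, {hi:.3f})")
# Prints approximately (0.108, 9.245), matching the interval quoted above; the
# paper's construction shrinks this to about (0.169, 5.196).
```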
CADDIS Volume 3. Examples and Applications: Analytical Examples
Examples illustrating the use of statistical analysis to support different types of evidence: stream temperature, temperature inferred from macroinvertebrates, macroinvertebrate responses, zinc concentrations, and observed trait characteristics.
Improving stochastic estimates with inference methods: calculating matrix diagonals.
Selig, Marco; Oppermann, Niels; Ensslin, Torsten A
2012-02-01
Estimating the diagonal entries of a matrix that is not directly accessible but only available as a linear operator in the form of a computer routine is a common necessity in many computational applications, especially in image reconstruction and statistical inference. Here, methods of statistical inference are used to improve the accuracy or the computational costs of matrix probing methods to estimate matrix diagonals. In particular, the generalized Wiener filter methodology, as developed within information field theory, is shown to significantly improve estimates based on only a few sampling probes, in cases in which some form of continuity of the solution can be assumed. The strength, length scale, and precise functional form of the exploited autocorrelation function of the matrix diagonal is determined from the probes themselves. The developed algorithm is successfully applied to mock and real world problems. These performance tests show that, in situations where a matrix diagonal has to be calculated from only a small number of computationally expensive probes, a speedup by a factor of 2 to 10 is possible with the proposed method.
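A basic probing estimator for the diagonal of an implicitly defined matrix, using Rademacher probe vectors, is sketched below; this is the raw estimate that the paper then improves with Wiener-filter smoothing, which is not shown.

```python
# Basic probing estimator for the diagonal of an implicitly defined matrix,
# using Rademacher probes: diag(A) ~= mean_k ( v_k * (A v_k) ), elementwise.
# The Wiener-filter smoothing described in the paper is not applied here.
import numpy as np

rng = np.random.default_rng(8)

# Stand-in for a matrix that is only available as a linear operator.
n = 300
hidden = rng.normal(size=(n, n))
A_hidden = hidden @ hidden.T / n
def apply_A(v):
    return A_hidden @ v                          # pretend we can only do this

n_probes = 20
estimate = np.zeros(n)
for _ in range(n_probes):
    v = rng.choice([-1.0, 1.0], size=n)          # Rademacher probe
    estimate += v * apply_A(v)
estimate /= n_probes

rel_err = np.linalg.norm(estimate - np.diag(A_hidden)) / np.linalg.norm(np.diag(A_hidden))
print(f"relative error of the raw probing estimate with {n_probes} probes: {rel_err:.2f}")
```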
2013-05-02
Statistical Relational Learning (SRL) as an Enabling Technology for Data Acquisition and Data Fusion in Video. … In particular, it is important to reason about which portions of video require expensive analysis and storage. This project aims to make these inferences using new and existing tools from Statistical Relational Learning (SRL). SRL is a recently emerging technology that enables the effective …
Statistical Inference on Memory Structure of Processes and Its Applications to Information Theory
2016-05-12
… proved. Second, a statistical method is developed to estimate the memory depth of discrete-time and continuously-valued time series from a sample. (A practical algorithm to compute the estimator is a work in progress.) Third, finitely-valued spatial processes … Subject terms: mathematical statistics; time series; Markov chains; random …