ERIC Educational Resources Information Center
Braham, Hana Manor; Ben-Zvi, Dani
2017-01-01
A fundamental aspect of statistical inference is representation of real-world data using statistical models. This article analyzes students' articulations of statistical models and modeling during their first steps in making informal statistical inferences. An integrated modeling approach (IMA) was designed and implemented to help students…
Intuitive statistics by 8-month-old infants
Xu, Fei; Garcia, Vashti
2008-01-01
Human learners make inductive inferences based on small amounts of data: we generalize from samples to populations and vice versa. The academic discipline of statistics formalizes these intuitive statistical inferences. What is the origin of this ability? We report six experiments investigating whether 8-month-old infants are “intuitive statisticians.” Our results showed that, given a sample, the infants were able to make inferences about the population from which the sample had been drawn. Conversely, given information about the entire population of relatively small size, the infants were able to make predictions about the sample. Our findings provide evidence that infants possess a powerful mechanism for inductive learning, either using heuristics or basic principles of probability. This ability to make inferences based on samples or information about the population develops early and in the absence of schooling or explicit teaching. Human infants may be rational learners from very early in development. PMID:18378901
ERIC Educational Resources Information Center
Trumpower, David L.
2015-01-01
Making inferences about population differences based on samples of data, that is, performing intuitive analysis of variance (IANOVA), is common in everyday life. However, the intuitive reasoning of individuals when making such inferences (even following statistics instruction), often differs from the normative logic of formal statistics. The…
Multi-Agent Inference in Social Networks: A Finite Population Learning Approach.
Fan, Jianqing; Tong, Xin; Zeng, Yao
When people in a society want to make inference about some parameter, each person may want to use data collected by other people. Information (data) exchange in social networks is usually costly, so to make reliable statistical decisions, people need to trade off the benefits and costs of information acquisition. Conflicts of interests and coordination problems will arise in the process. Classical statistics does not consider people's incentives and interactions in the data collection process. To address this imperfection, this work explores multi-agent Bayesian inference problems with a game theoretic social network model. Motivated by our interest in aggregate inference at the societal level, we propose a new concept, finite population learning , to address whether with high probability, a large fraction of people in a given finite population network can make "good" inference. Serving as a foundation, this concept enables us to study the long run trend of aggregate inference quality as population grows.
Statistical Inferences from Formaldehyde Dna-Protein Cross-Link Data
Physiologically-based pharmacokinetic (PBPK) modeling has reached considerable sophistication in its application in the pharmacological and environmental health areas. Yet, mature methodologies for making statistical inferences have not been routinely incorporated in these applic...
ERIC Educational Resources Information Center
Kazak, Sibel; Pratt, Dave
2017-01-01
This study considers probability models as tools for both making informal statistical inferences and building stronger conceptual connections between data and chance topics in teaching statistics. In this paper, we aim to explore pre-service mathematics teachers' use of probability models for a chance game, where the sum of two dice matters in…
Multi-Agent Inference in Social Networks: A Finite Population Learning Approach
Tong, Xin; Zeng, Yao
2016-01-01
When people in a society want to make inference about some parameter, each person may want to use data collected by other people. Information (data) exchange in social networks is usually costly, so to make reliable statistical decisions, people need to trade off the benefits and costs of information acquisition. Conflicts of interests and coordination problems will arise in the process. Classical statistics does not consider people’s incentives and interactions in the data collection process. To address this imperfection, this work explores multi-agent Bayesian inference problems with a game theoretic social network model. Motivated by our interest in aggregate inference at the societal level, we propose a new concept, finite population learning, to address whether with high probability, a large fraction of people in a given finite population network can make “good” inference. Serving as a foundation, this concept enables us to study the long run trend of aggregate inference quality as population grows. PMID:27076691
ERIC Educational Resources Information Center
Henriques, Ana; Oliveira, Hélia
2016-01-01
This paper reports on the results of a study investigating the potential to embed Informal Statistical Inference in statistical investigations, using TinkerPlots, for assisting 8th grade students' informal inferential reasoning to emerge, particularly their articulations of uncertainty. Data collection included students' written work on a…
ERIC Educational Resources Information Center
Watson, Jane
2007-01-01
Inference, or decision making, is seen in curriculum documents as the final step in a statistical investigation. For a formal statistical enquiry this may be associated with sophisticated tests involving probability distributions. For young students without the mathematical background to perform such tests, it is still possible to draw informal…
Making statistical inferences about software reliability
NASA Technical Reports Server (NTRS)
Miller, Douglas R.
1988-01-01
Failure times of software undergoing random debugging can be modelled as order statistics of independent but nonidentically distributed exponential random variables. Using this model inferences can be made about current reliability and, if debugging continues, future reliability. This model also shows the difficulty inherent in statistical verification of very highly reliable software such as that used by digital avionics in commercial aircraft.
ERIC Educational Resources Information Center
Denbleyker, John Nickolas
2012-01-01
The shortcomings of the proportion above cut (PAC) statistic used so prominently in the educational landscape renders it a very problematic measure for making correct inferences with student test data. The limitations of PAC-based statistics are more pronounced with cross-test comparisons due to their dependency on cut-score locations. A better…
Statistical inference for tumor growth inhibition T/C ratio.
Wu, Jianrong
2010-09-01
The tumor growth inhibition T/C ratio is commonly used to quantify treatment effects in drug screening tumor xenograft experiments. The T/C ratio is converted to an antitumor activity rating using an arbitrary cutoff point and often without any formal statistical inference. Here, we applied a nonparametric bootstrap method and a small sample likelihood ratio statistic to make a statistical inference of the T/C ratio, including both hypothesis testing and a confidence interval estimate. Furthermore, sample size and power are also discussed for statistical design of tumor xenograft experiments. Tumor xenograft data from an actual experiment were analyzed to illustrate the application.
The Philosophical Foundations of Prescriptive Statements and Statistical Inference
ERIC Educational Resources Information Center
Sun, Shuyan; Pan, Wei
2011-01-01
From the perspectives of the philosophy of science and statistical inference, we discuss the challenges of making prescriptive statements in quantitative research articles. We first consider the prescriptive nature of educational research and argue that prescriptive statements are a necessity in educational research. The logic of deduction,…
Inference and the Introductory Statistics Course
ERIC Educational Resources Information Center
Pfannkuch, Maxine; Regan, Matt; Wild, Chris; Budgett, Stephanie; Forbes, Sharleen; Harraway, John; Parsonage, Ross
2011-01-01
This article sets out some of the rationale and arguments for making major changes to the teaching and learning of statistical inference in introductory courses at our universities by changing from a norm-based, mathematical approach to more conceptually accessible computer-based approaches. The core problem of the inferential argument with its…
Apes are intuitive statisticians.
Rakoczy, Hannes; Clüver, Annette; Saucke, Liane; Stoffregen, Nicole; Gräbener, Alice; Migura, Judith; Call, Josep
2014-04-01
Inductive learning and reasoning, as we use it both in everyday life and in science, is characterized by flexible inferences based on statistical information: inferences from populations to samples and vice versa. Many forms of such statistical reasoning have been found to develop late in human ontogeny, depending on formal education and language, and to be fragile even in adults. New revolutionary research, however, suggests that even preverbal human infants make use of intuitive statistics. Here, we conducted the first investigation of such intuitive statistical reasoning with non-human primates. In a series of 7 experiments, Bonobos, Chimpanzees, Gorillas and Orangutans drew flexible statistical inferences from populations to samples. These inferences, furthermore, were truly based on statistical information regarding the relative frequency distributions in a population, and not on absolute frequencies. Intuitive statistics in its most basic form is thus an evolutionarily more ancient rather than a uniquely human capacity. Copyright © 2014 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Bakker, Arthur; Ben-Zvi, Dani; Makar, Katie
2017-12-01
To understand how statistical and other types of reasoning are coordinated with actions to reduce uncertainty, we conducted a case study in vocational education that involved statistical hypothesis testing. We analyzed an intern's research project in a hospital laboratory in which reducing uncertainties was crucial to make a valid statistical inference. In his project, the intern, Sam, investigated whether patients' blood could be sent through pneumatic post without influencing the measurement of particular blood components. We asked, in the process of making a statistical inference, how are reasons and actions coordinated to reduce uncertainty? For the analysis, we used the semantic theory of inferentialism, specifically, the concept of webs of reasons and actions—complexes of interconnected reasons for facts and actions; these reasons include premises and conclusions, inferential relations, implications, motives for action, and utility of tools for specific purposes in a particular context. Analysis of interviews with Sam, his supervisor and teacher as well as video data of Sam in the classroom showed that many of Sam's actions aimed to reduce variability, rule out errors, and thus reduce uncertainties so as to arrive at a valid inference. Interestingly, the decisive factor was not the outcome of a t test but of the reference change value, a clinical chemical measure of analytic and biological variability. With insights from this case study, we expect that students can be better supported in connecting statistics with context and in dealing with uncertainty.
Incorporating Biological Knowledge into Evaluation of Casual Regulatory Hypothesis
NASA Technical Reports Server (NTRS)
Chrisman, Lonnie; Langley, Pat; Bay, Stephen; Pohorille, Andrew; DeVincenzi, D. (Technical Monitor)
2002-01-01
Biological data can be scarce and costly to obtain. The small number of samples available typically limits statistical power and makes reliable inference of causal relations extremely difficult. However, we argue that statistical power can be increased substantially by incorporating prior knowledge and data from diverse sources. We present a Bayesian framework that combines information from different sources and we show empirically that this lets one make correct causal inferences with small sample sizes that otherwise would be impossible.
Quantum-Like Representation of Non-Bayesian Inference
NASA Astrophysics Data System (ADS)
Asano, M.; Basieva, I.; Khrennikov, A.; Ohya, M.; Tanaka, Y.
2013-01-01
This research is related to the problem of "irrational decision making or inference" that have been discussed in cognitive psychology. There are some experimental studies, and these statistical data cannot be described by classical probability theory. The process of decision making generating these data cannot be reduced to the classical Bayesian inference. For this problem, a number of quantum-like coginitive models of decision making was proposed. Our previous work represented in a natural way the classical Bayesian inference in the frame work of quantum mechanics. By using this representation, in this paper, we try to discuss the non-Bayesian (irrational) inference that is biased by effects like the quantum interference. Further, we describe "psychological factor" disturbing "rationality" as an "environment" correlating with the "main system" of usual Bayesian inference.
It's a Girl! Random Numbers, Simulations, and the Law of Large Numbers
ERIC Educational Resources Information Center
Goodwin, Chris; Ortiz, Enrique
2015-01-01
Modeling using mathematics and making inferences about mathematical situations are becoming more prevalent in most fields of study. Descriptive statistics cannot be used to generalize about a population or make predictions of what can occur. Instead, inference must be used. Simulation and sampling are essential in building a foundation for…
ERIC Educational Resources Information Center
Noll, Jennifer; Hancock, Stacey
2015-01-01
This research investigates what students' use of statistical language can tell us about their conceptions of distribution and sampling in relation to informal inference. Prior research documents students' challenges in understanding ideas of distribution and sampling as tools for making informal statistical inferences. We know that these…
Data-driven inference for the spatial scan statistic.
Almeida, Alexandre C L; Duarte, Anderson R; Duczmal, Luiz H; Oliveira, Fernando L P; Takahashi, Ricardo H C
2011-08-02
Kulldorff's spatial scan statistic for aggregated area maps searches for clusters of cases without specifying their size (number of areas) or geographic location in advance. Their statistical significance is tested while adjusting for the multiple testing inherent in such a procedure. However, as is shown in this work, this adjustment is not done in an even manner for all possible cluster sizes. A modification is proposed to the usual inference test of the spatial scan statistic, incorporating additional information about the size of the most likely cluster found. A new interpretation of the results of the spatial scan statistic is done, posing a modified inference question: what is the probability that the null hypothesis is rejected for the original observed cases map with a most likely cluster of size k, taking into account only those most likely clusters of size k found under null hypothesis for comparison? This question is especially important when the p-value computed by the usual inference process is near the alpha significance level, regarding the correctness of the decision based in this inference. A practical procedure is provided to make more accurate inferences about the most likely cluster found by the spatial scan statistic.
The role of causal criteria in causal inferences: Bradford Hill's "aspects of association".
Ward, Andrew C
2009-06-17
As noted by Wesley Salmon and many others, causal concepts are ubiquitous in every branch of theoretical science, in the practical disciplines and in everyday life. In the theoretical and practical sciences especially, people often base claims about causal relations on applications of statistical methods to data. However, the source and type of data place important constraints on the choice of statistical methods as well as on the warrant attributed to the causal claims based on the use of such methods. For example, much of the data used by people interested in making causal claims come from non-experimental, observational studies in which random allocations to treatment and control groups are not present. Thus, one of the most important problems in the social and health sciences concerns making justified causal inferences using non-experimental, observational data. In this paper, I examine one method of justifying such inferences that is especially widespread in epidemiology and the health sciences generally - the use of causal criteria. I argue that while the use of causal criteria is not appropriate for either deductive or inductive inferences, they do have an important role to play in inferences to the best explanation. As such, causal criteria, exemplified by what Bradford Hill referred to as "aspects of [statistical] associations", have an indispensible part to play in the goal of making justified causal claims.
The role of causal criteria in causal inferences: Bradford Hill's "aspects of association"
Ward, Andrew C
2009-01-01
As noted by Wesley Salmon and many others, causal concepts are ubiquitous in every branch of theoretical science, in the practical disciplines and in everyday life. In the theoretical and practical sciences especially, people often base claims about causal relations on applications of statistical methods to data. However, the source and type of data place important constraints on the choice of statistical methods as well as on the warrant attributed to the causal claims based on the use of such methods. For example, much of the data used by people interested in making causal claims come from non-experimental, observational studies in which random allocations to treatment and control groups are not present. Thus, one of the most important problems in the social and health sciences concerns making justified causal inferences using non-experimental, observational data. In this paper, I examine one method of justifying such inferences that is especially widespread in epidemiology and the health sciences generally – the use of causal criteria. I argue that while the use of causal criteria is not appropriate for either deductive or inductive inferences, they do have an important role to play in inferences to the best explanation. As such, causal criteria, exemplified by what Bradford Hill referred to as "aspects of [statistical] associations", have an indispensible part to play in the goal of making justified causal claims. PMID:19534788
ERIC Educational Resources Information Center
Bakker, Arthur; Ben-Zvi, Dani; Makar, Katie
2017-01-01
To understand how statistical and other types of reasoning are coordinated with actions to reduce uncertainty, we conducted a case study in vocational education that involved statistical hypothesis testing. We analyzed an intern's research project in a hospital laboratory in which reducing uncertainties was crucial to make a valid statistical…
Serang, Oliver; Noble, William Stafford
2012-01-01
The problem of identifying the proteins in a complex mixture using tandem mass spectrometry can be framed as an inference problem on a graph that connects peptides to proteins. Several existing protein identification methods make use of statistical inference methods for graphical models, including expectation maximization, Markov chain Monte Carlo, and full marginalization coupled with approximation heuristics. We show that, for this problem, the majority of the cost of inference usually comes from a few highly connected subgraphs. Furthermore, we evaluate three different statistical inference methods using a common graphical model, and we demonstrate that junction tree inference substantially improves rates of convergence compared to existing methods. The python code used for this paper is available at http://noble.gs.washington.edu/proj/fido. PMID:22331862
Building Intuitions about Statistical Inference Based on Resampling
ERIC Educational Resources Information Center
Watson, Jane; Chance, Beth
2012-01-01
Formal inference, which makes theoretical assumptions about distributions and applies hypothesis testing procedures with null and alternative hypotheses, is notoriously difficult for tertiary students to master. The debate about whether this content should appear in Years 11 and 12 of the "Australian Curriculum: Mathematics" has gone on…
Direct evidence for a dual process model of deductive inference.
Markovits, Henry; Brunet, Marie-Laurence; Thompson, Valerie; Brisson, Janie
2013-07-01
In 2 experiments, we tested a strong version of a dual process theory of conditional inference (cf. Verschueren et al., 2005a, 2005b) that assumes that most reasoners have 2 strategies available, the choice of which is determined by situational variables, cognitive capacity, and metacognitive control. The statistical strategy evaluates inferences probabilistically, accepting those with high conditional probability. The counterexample strategy rejects inferences when a counterexample shows the inference to be invalid. To discriminate strategy use, we presented reasoners with conditional statements (if p, then q) and explicit statistical information about the relative frequency of the probability of p/q (50% vs. 90%). A statistical strategy would accept the more probable inferences more frequently, whereas the counterexample one would reject both. In Experiment 1, reasoners under time pressure used the statistical strategy more, but switched to the counterexample strategy when time constraints were removed; the former took less time than the latter. These data are consistent with the hypothesis that the statistical strategy is the default heuristic. Under a free-time condition, reasoners preferred the counterexample strategy and kept it when put under time pressure. Thus, it is not simply a lack of capacity that produces a statistical strategy; instead, it seems that time pressure disrupts the ability to make good metacognitive choices. In line with this conclusion, in a 2nd experiment, we measured reasoners' confidence in their performance; those under time pressure were less confident in the statistical than the counterexample strategy and more likely to switch strategies under free-time conditions. PsycINFO Database Record (c) 2013 APA, all rights reserved.
Applying Statistical Process Control to Clinical Data: An Illustration.
ERIC Educational Resources Information Center
Pfadt, Al; And Others
1992-01-01
Principles of statistical process control are applied to a clinical setting through the use of control charts to detect changes, as part of treatment planning and clinical decision-making processes. The logic of control chart analysis is derived from principles of statistical inference. Sample charts offer examples of evaluating baselines and…
ERIC Educational Resources Information Center
Madhere, Serge
An analytic procedure, efficiency analysis, is proposed for improving the utility of quantitative program evaluation for decision making. The three features of the procedure are explained: (1) for statistical control, it adopts and extends the regression-discontinuity design; (2) for statistical inferences, it de-emphasizes hypothesis testing in…
NASA Astrophysics Data System (ADS)
Albert, Carlo; Ulzega, Simone; Stoop, Ruedi
2016-04-01
Measured time-series of both precipitation and runoff are known to exhibit highly non-trivial statistical properties. For making reliable probabilistic predictions in hydrology, it is therefore desirable to have stochastic models with output distributions that share these properties. When parameters of such models have to be inferred from data, we also need to quantify the associated parametric uncertainty. For non-trivial stochastic models, however, this latter step is typically very demanding, both conceptually and numerically, and always never done in hydrology. Here, we demonstrate that methods developed in statistical physics make a large class of stochastic differential equation (SDE) models amenable to a full-fledged Bayesian parameter inference. For concreteness we demonstrate these methods by means of a simple yet non-trivial toy SDE model. We consider a natural catchment that can be described by a linear reservoir, at the scale of observation. All the neglected processes are assumed to happen at much shorter time-scales and are therefore modeled with a Gaussian white noise term, the standard deviation of which is assumed to scale linearly with the system state (water volume in the catchment). Even for constant input, the outputs of this simple non-linear SDE model show a wealth of desirable statistical properties, such as fat-tailed distributions and long-range correlations. Standard algorithms for Bayesian inference fail, for models of this kind, because their likelihood functions are extremely high-dimensional intractable integrals over all possible model realizations. The use of Kalman filters is illegitimate due to the non-linearity of the model. Particle filters could be used but become increasingly inefficient with growing number of data points. Hamiltonian Monte Carlo algorithms allow us to translate this inference problem to the problem of simulating the dynamics of a statistical mechanics system and give us access to most sophisticated methods that have been developed in the statistical physics community over the last few decades. We demonstrate that such methods, along with automated differentiation algorithms, allow us to perform a full-fledged Bayesian inference, for a large class of SDE models, in a highly efficient and largely automatized manner. Furthermore, our algorithm is highly parallelizable. For our toy model, discretized with a few hundred points, a full Bayesian inference can be performed in a matter of seconds on a standard PC.
Structured statistical models of inductive reasoning.
Kemp, Charles; Tenenbaum, Joshua B
2009-01-01
Everyday inductive inferences are often guided by rich background knowledge. Formal models of induction should aim to incorporate this knowledge and should explain how different kinds of knowledge lead to the distinctive patterns of reasoning found in different inductive contexts. This article presents a Bayesian framework that attempts to meet both goals and describes [corrected] 4 applications of the framework: a taxonomic model, a spatial model, a threshold model, and a causal model. Each model makes probabilistic inferences about the extensions of novel properties, but the priors for the 4 models are defined over different kinds of structures that capture different relationships between the categories in a domain. The framework therefore shows how statistical inference can operate over structured background knowledge, and the authors argue that this interaction between structure and statistics is critical for explaining the power and flexibility of human reasoning.
Exploring High School Students Beginning Reasoning about Significance Tests with Technology
ERIC Educational Resources Information Center
García, Víctor N.; Sánchez, Ernesto
2017-01-01
In the present study we analyze how students reason about or make inferences given a particular hypothesis testing problem (without having studied formal methods of statistical inference) when using Fathom. They use Fathom to create an empirical sampling distribution through computer simulation. It is found that most student´s reasoning rely on…
Theory-based Bayesian models of inductive learning and reasoning.
Tenenbaum, Joshua B; Griffiths, Thomas L; Kemp, Charles
2006-07-01
Inductive inference allows humans to make powerful generalizations from sparse data when learning about word meanings, unobserved properties, causal relationships, and many other aspects of the world. Traditional accounts of induction emphasize either the power of statistical learning, or the importance of strong constraints from structured domain knowledge, intuitive theories or schemas. We argue that both components are necessary to explain the nature, use and acquisition of human knowledge, and we introduce a theory-based Bayesian framework for modeling inductive learning and reasoning as statistical inferences over structured knowledge representations.
Notes on power of normality tests of error terms in regression models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Střelec, Luboš
2015-03-10
Normality is one of the basic assumptions in applying statistical procedures. For example in linear regression most of the inferential procedures are based on the assumption of normality, i.e. the disturbance vector is assumed to be normally distributed. Failure to assess non-normality of the error terms may lead to incorrect results of usual statistical inference techniques such as t-test or F-test. Thus, error terms should be normally distributed in order to allow us to make exact inferences. As a consequence, normally distributed stochastic errors are necessary in order to make a not misleading inferences which explains a necessity and importancemore » of robust tests of normality. Therefore, the aim of this contribution is to discuss normality testing of error terms in regression models. In this contribution, we introduce the general RT class of robust tests for normality, and present and discuss the trade-off between power and robustness of selected classical and robust normality tests of error terms in regression models.« less
Equivalent statistics and data interpretation.
Francis, Gregory
2017-08-01
Recent reform efforts in psychological science have led to a plethora of choices for scientists to analyze their data. A scientist making an inference about their data must now decide whether to report a p value, summarize the data with a standardized effect size and its confidence interval, report a Bayes Factor, or use other model comparison methods. To make good choices among these options, it is necessary for researchers to understand the characteristics of the various statistics used by the different analysis frameworks. Toward that end, this paper makes two contributions. First, it shows that for the case of a two-sample t test with known sample sizes, many different summary statistics are mathematically equivalent in the sense that they are based on the very same information in the data set. When the sample sizes are known, the p value provides as much information about a data set as the confidence interval of Cohen's d or a JZS Bayes factor. Second, this equivalence means that different analysis methods differ only in their interpretation of the empirical data. At first glance, it might seem that mathematical equivalence of the statistics suggests that it does not matter much which statistic is reported, but the opposite is true because the appropriateness of a reported statistic is relative to the inference it promotes. Accordingly, scientists should choose an analysis method appropriate for their scientific investigation. A direct comparison of the different inferential frameworks provides some guidance for scientists to make good choices and improve scientific practice.
ERIC Educational Resources Information Center
Page, Robert; Satake, Eiki
2017-01-01
While interest in Bayesian statistics has been growing in statistics education, the treatment of the topic is still inadequate in both textbooks and the classroom. Because so many fields of study lead to careers that involve a decision-making process requiring an understanding of Bayesian methods, it is becoming increasingly clear that Bayesian…
Welvaert, Marijke; Caley, Peter
2016-01-01
Citizen science and crowdsourcing have been emerging as methods to collect data for surveillance and/or monitoring activities. They could be gathered under the overarching term citizen surveillance . The discipline, however, still struggles to be widely accepted in the scientific community, mainly because these activities are not embedded in a quantitative framework. This results in an ongoing discussion on how to analyze and make useful inference from these data. When considering the data collection process, we illustrate how citizen surveillance can be classified according to the nature of the underlying observation process measured in two dimensions-the degree of observer reporting intention and the control in observer detection effort. By classifying the observation process in these dimensions we distinguish between crowdsourcing, unstructured citizen science and structured citizen science. This classification helps the determine data processing and statistical treatment of these data for making inference. Using our framework, it is apparent that published studies are overwhelmingly associated with structured citizen science, and there are well developed statistical methods for the resulting data. In contrast, methods for making useful inference from purely crowd-sourced data remain under development, with the challenges of accounting for the unknown observation process considerable. Our quantitative framework for citizen surveillance calls for an integration of citizen science and crowdsourcing and provides a way forward to solve the statistical challenges inherent to citizen-sourced data.
Propensity Score Analysis: An Alternative Statistical Approach for HRD Researchers
ERIC Educational Resources Information Center
Keiffer, Greggory L.; Lane, Forrest C.
2016-01-01
Purpose: This paper aims to introduce matching in propensity score analysis (PSA) as an alternative statistical approach for researchers looking to make causal inferences using intact groups. Design/methodology/approach: An illustrative example demonstrated the varying results of analysis of variance, analysis of covariance and PSA on a heuristic…
Spectral likelihood expansions for Bayesian inference
NASA Astrophysics Data System (ADS)
Nagel, Joseph B.; Sudret, Bruno
2016-03-01
A spectral approach to Bayesian inference is presented. It pursues the emulation of the posterior probability density. The starting point is a series expansion of the likelihood function in terms of orthogonal polynomials. From this spectral likelihood expansion all statistical quantities of interest can be calculated semi-analytically. The posterior is formally represented as the product of a reference density and a linear combination of polynomial basis functions. Both the model evidence and the posterior moments are related to the expansion coefficients. This formulation avoids Markov chain Monte Carlo simulation and allows one to make use of linear least squares instead. The pros and cons of spectral Bayesian inference are discussed and demonstrated on the basis of simple applications from classical statistics and inverse modeling.
ERIC Educational Resources Information Center
Hourigan, Mairéad; Leavy, Aisling
2016-01-01
As part of Japanese Lesson study research focusing on "comparing and describing likelihoods", fifth grade elementary students used real-world data in decision-making. Sporting statistics facilitated opportunities for informal inference, where data were used to make and justify predictions.
Meng, Xiang-He; Shen, Hui; Chen, Xiang-Ding; Xiao, Hong-Mei; Deng, Hong-Wen
2018-03-01
Genome-wide association studies (GWAS) have successfully identified numerous genetic variants associated with diverse complex phenotypes and diseases, and provided tremendous opportunities for further analyses using summary association statistics. Recently, Pickrell et al. developed a robust method for causal inference using independent putative causal SNPs. However, this method may fail to infer the causal relationship between two phenotypes when only a limited number of independent putative causal SNPs identified. Here, we extended Pickrell's method to make it more applicable for the general situations. We extended the causal inference method by replacing the putative causal SNPs with the lead SNPs (the set of the most significant SNPs in each independent locus) and tested the performance of our extended method using both simulation and empirical data. Simulations suggested that when the same number of genetic variants is used, our extended method had similar distribution of test statistic under the null model as well as comparable power under the causal model compared with the original method by Pickrell et al. But in practice, our extended method would generally be more powerful because the number of independent lead SNPs was often larger than the number of independent putative causal SNPs. And including more SNPs, on the other hand, would not cause more false positives. By applying our extended method to summary statistics from GWAS for blood metabolites and femoral neck bone mineral density (FN-BMD), we successfully identified ten blood metabolites that may causally influence FN-BMD. We extended a causal inference method for inferring putative causal relationship between two phenotypes using summary statistics from GWAS, and identified a number of potential causal metabolites for FN-BMD, which may provide novel insights into the pathophysiological mechanisms underlying osteoporosis.
Ayres, Daniel L; Darling, Aaron; Zwickl, Derrick J; Beerli, Peter; Holder, Mark T; Lewis, Paul O; Huelsenbeck, John P; Ronquist, Fredrik; Swofford, David L; Cummings, Michael P; Rambaut, Andrew; Suchard, Marc A
2012-01-01
Phylogenetic inference is fundamental to our understanding of most aspects of the origin and evolution of life, and in recent years, there has been a concentration of interest in statistical approaches such as Bayesian inference and maximum likelihood estimation. Yet, for large data sets and realistic or interesting models of evolution, these approaches remain computationally demanding. High-throughput sequencing can yield data for thousands of taxa, but scaling to such problems using serial computing often necessitates the use of nonstatistical or approximate approaches. The recent emergence of graphics processing units (GPUs) provides an opportunity to leverage their excellent floating-point computational performance to accelerate statistical phylogenetic inference. A specialized library for phylogenetic calculation would allow existing software packages to make more effective use of available computer hardware, including GPUs. Adoption of a common library would also make it easier for other emerging computing architectures, such as field programmable gate arrays, to be used in the future. We present BEAGLE, an application programming interface (API) and library for high-performance statistical phylogenetic inference. The API provides a uniform interface for performing phylogenetic likelihood calculations on a variety of compute hardware platforms. The library includes a set of efficient implementations and can currently exploit hardware including GPUs using NVIDIA CUDA, central processing units (CPUs) with Streaming SIMD Extensions and related processor supplementary instruction sets, and multicore CPUs via OpenMP. To demonstrate the advantages of a common API, we have incorporated the library into several popular phylogenetic software packages. The BEAGLE library is free open source software licensed under the Lesser GPL and available from http://beagle-lib.googlecode.com. An example client program is available as public domain software.
Ayres, Daniel L.; Darling, Aaron; Zwickl, Derrick J.; Beerli, Peter; Holder, Mark T.; Lewis, Paul O.; Huelsenbeck, John P.; Ronquist, Fredrik; Swofford, David L.; Cummings, Michael P.; Rambaut, Andrew; Suchard, Marc A.
2012-01-01
Abstract Phylogenetic inference is fundamental to our understanding of most aspects of the origin and evolution of life, and in recent years, there has been a concentration of interest in statistical approaches such as Bayesian inference and maximum likelihood estimation. Yet, for large data sets and realistic or interesting models of evolution, these approaches remain computationally demanding. High-throughput sequencing can yield data for thousands of taxa, but scaling to such problems using serial computing often necessitates the use of nonstatistical or approximate approaches. The recent emergence of graphics processing units (GPUs) provides an opportunity to leverage their excellent floating-point computational performance to accelerate statistical phylogenetic inference. A specialized library for phylogenetic calculation would allow existing software packages to make more effective use of available computer hardware, including GPUs. Adoption of a common library would also make it easier for other emerging computing architectures, such as field programmable gate arrays, to be used in the future. We present BEAGLE, an application programming interface (API) and library for high-performance statistical phylogenetic inference. The API provides a uniform interface for performing phylogenetic likelihood calculations on a variety of compute hardware platforms. The library includes a set of efficient implementations and can currently exploit hardware including GPUs using NVIDIA CUDA, central processing units (CPUs) with Streaming SIMD Extensions and related processor supplementary instruction sets, and multicore CPUs via OpenMP. To demonstrate the advantages of a common API, we have incorporated the library into several popular phylogenetic software packages. The BEAGLE library is free open source software licensed under the Lesser GPL and available from http://beagle-lib.googlecode.com. An example client program is available as public domain software. PMID:21963610
Goyal, Ravi; De Gruttola, Victor
2018-01-30
Analysis of sexual history data intended to describe sexual networks presents many challenges arising from the fact that most surveys collect information on only a very small fraction of the population of interest. In addition, partners are rarely identified and responses are subject to reporting biases. Typically, each network statistic of interest, such as mean number of sexual partners for men or women, is estimated independently of other network statistics. There is, however, a complex relationship among networks statistics; and knowledge of these relationships can aid in addressing concerns mentioned earlier. We develop a novel method that constrains a posterior predictive distribution of a collection of network statistics in order to leverage the relationships among network statistics in making inference about network properties of interest. The method ensures that inference on network properties is compatible with an actual network. Through extensive simulation studies, we also demonstrate that use of this method can improve estimates in settings where there is uncertainty that arises both from sampling and from systematic reporting bias compared with currently available approaches to estimation. To illustrate the method, we apply it to estimate network statistics using data from the Chicago Health and Social Life Survey. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Lee, K. David; Wiesenfeld, Eric; Gelfand, Andrew
2007-04-01
One of the greatest challenges in modern combat is maintaining a high level of timely Situational Awareness (SA). In many situations, computational complexity and accuracy considerations make the development and deployment of real-time, high-level inference tools very difficult. An innovative hybrid framework that combines Bayesian inference, in the form of Bayesian Networks, and Possibility Theory, in the form of Fuzzy Logic systems, has recently been introduced to provide a rigorous framework for high-level inference. In previous research, the theoretical basis and benefits of the hybrid approach have been developed. However, lacking is a concrete experimental comparison of the hybrid framework with traditional fusion methods, to demonstrate and quantify this benefit. The goal of this research, therefore, is to provide a statistical analysis on the comparison of the accuracy and performance of hybrid network theory, with pure Bayesian and Fuzzy systems and an inexact Bayesian system approximated using Particle Filtering. To accomplish this task, domain specific models will be developed under these different theoretical approaches and then evaluated, via Monte Carlo Simulation, in comparison to situational ground truth to measure accuracy and fidelity. Following this, a rigorous statistical analysis of the performance results will be performed, to quantify the benefit of hybrid inference to other fusion tools.
Statistical modeling of software reliability
NASA Technical Reports Server (NTRS)
Miller, Douglas R.
1992-01-01
This working paper discusses the statistical simulation part of a controlled software development experiment being conducted under the direction of the System Validation Methods Branch, Information Systems Division, NASA Langley Research Center. The experiment uses guidance and control software (GCS) aboard a fictitious planetary landing spacecraft: real-time control software operating on a transient mission. Software execution is simulated to study the statistical aspects of reliability and other failure characteristics of the software during development, testing, and random usage. Quantification of software reliability is a major goal. Various reliability concepts are discussed. Experiments are described for performing simulations and collecting appropriate simulated software performance and failure data. This data is then used to make statistical inferences about the quality of the software development and verification processes as well as inferences about the reliability of software versions and reliability growth under random testing and debugging.
Cocco, Simona; Leibler, Stanislas; Monasson, Rémi
2009-01-01
Complexity of neural systems often makes impracticable explicit measurements of all interactions between their constituents. Inverse statistical physics approaches, which infer effective couplings between neurons from their spiking activity, have been so far hindered by their computational complexity. Here, we present 2 complementary, computationally efficient inverse algorithms based on the Ising and “leaky integrate-and-fire” models. We apply those algorithms to reanalyze multielectrode recordings in the salamander retina in darkness and under random visual stimulus. We find strong positive couplings between nearby ganglion cells common to both stimuli, whereas long-range couplings appear under random stimulus only. The uncertainty on the inferred couplings due to limitations in the recordings (duration, small area covered on the retina) is discussed. Our methods will allow real-time evaluation of couplings for large assemblies of neurons. PMID:19666487
Hansen, John P
2003-01-01
Healthcare quality improvement professionals need to understand and use inferential statistics to interpret sample data from their organizations. In quality improvement and healthcare research studies all the data from a population often are not available, so investigators take samples and make inferences about the population by using inferential statistics. This three-part series will give readers an understanding of the concepts of inferential statistics as well as the specific tools for calculating confidence intervals for samples of data. This article, Part 2, describes probability, populations, and samples. The uses of descriptive and inferential statistics are outlined. The article also discusses the properties and probability of normal distributions, including the standard normal distribution.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Marzouk, Youssef
Predictive simulation of complex physical systems increasingly rests on the interplay of experimental observations with computational models. Key inputs, parameters, or structural aspects of models may be incomplete or unknown, and must be developed from indirect and limited observations. At the same time, quantified uncertainties are needed to qualify computational predictions in the support of design and decision-making. In this context, Bayesian statistics provides a foundation for inference from noisy and limited data, but at prohibitive computional expense. This project intends to make rigorous predictive modeling *feasible* in complex physical systems, via accelerated and scalable tools for uncertainty quantification, Bayesianmore » inference, and experimental design. Specific objectives are as follows: 1. Develop adaptive posterior approximations and dimensionality reduction approaches for Bayesian inference in high-dimensional nonlinear systems. 2. Extend accelerated Bayesian methodologies to large-scale {\\em sequential} data assimilation, fully treating nonlinear models and non-Gaussian state and parameter distributions. 3. Devise efficient surrogate-based methods for Bayesian model selection and the learning of model structure. 4. Develop scalable simulation/optimization approaches to nonlinear Bayesian experimental design, for both parameter inference and model selection. 5. Demonstrate these inferential tools on chemical kinetic models in reacting flow, constructing and refining thermochemical and electrochemical models from limited data. Demonstrate Bayesian filtering on canonical stochastic PDEs and in the dynamic estimation of inhomogeneous subsurface properties and flow fields.« less
Dorazio, R.M.; Johnson, F.A.
2003-01-01
Bayesian inference and decision theory may be used in the solution of relatively complex problems of natural resource management, owing to recent advances in statistical theory and computing. In particular, Markov chain Monte Carlo algorithms provide a computational framework for fitting models of adequate complexity and for evaluating the expected consequences of alternative management actions. We illustrate these features using an example based on management of waterfowl habitat.
ERIC Educational Resources Information Center
Hoekstra, Rink; Johnson, Addie; Kiers, Henk A. L.
2012-01-01
The use of confidence intervals (CIs) as an addition or as an alternative to null hypothesis significance testing (NHST) has been promoted as a means to make researchers more aware of the uncertainty that is inherent in statistical inference. Little is known, however, about whether presenting results via CIs affects how readers judge the…
2013-05-02
REPORT Statistical Relational Learning ( SRL ) as an Enabling Technology for Data Acquisition and Data Fusion in Video 14. ABSTRACT 16. SECURITY...particular, it is important to reason about which portions of video require expensive analysis and storage. This project aims to make these...inferences using new and existing tools from Statistical Relational Learning ( SRL ). SRL is a recently emerging technology that enables the effective 1
NIRS-SPM: statistical parametric mapping for near infrared spectroscopy
NASA Astrophysics Data System (ADS)
Tak, Sungho; Jang, Kwang Eun; Jung, Jinwook; Jang, Jaeduck; Jeong, Yong; Ye, Jong Chul
2008-02-01
Even though there exists a powerful statistical parametric mapping (SPM) tool for fMRI, similar public domain tools are not available for near infrared spectroscopy (NIRS). In this paper, we describe a new public domain statistical toolbox called NIRS-SPM for quantitative analysis of NIRS signals. Specifically, NIRS-SPM statistically analyzes the NIRS data using GLM and makes inference as the excursion probability which comes from the random field that are interpolated from the sparse measurement. In order to obtain correct inference, NIRS-SPM offers the pre-coloring and pre-whitening method for temporal correlation estimation. For simultaneous recording NIRS signal with fMRI, the spatial mapping between fMRI image and real coordinate in 3-D digitizer is estimated using Horn's algorithm. These powerful tools allows us the super-resolution localization of the brain activation which is not possible using the conventional NIRS analysis tools.
Topics in inference and decision-making with partial knowledge
NASA Technical Reports Server (NTRS)
Safavian, S. Rasoul; Landgrebe, David
1990-01-01
Two essential elements needed in the process of inference and decision-making are prior probabilities and likelihood functions. When both of these components are known accurately and precisely, the Bayesian approach provides a consistent and coherent solution to the problems of inference and decision-making. In many situations, however, either one or both of the above components may not be known, or at least may not be known precisely. This problem of partial knowledge about prior probabilities and likelihood functions is addressed. There are at least two ways to cope with this lack of precise knowledge: robust methods, and interval-valued methods. First, ways of modeling imprecision and indeterminacies in prior probabilities and likelihood functions are examined; then how imprecision in the above components carries over to the posterior probabilities is examined. Finally, the problem of decision making with imprecise posterior probabilities and the consequences of such actions are addressed. Application areas where the above problems may occur are in statistical pattern recognition problems, for example, the problem of classification of high-dimensional multispectral remote sensing image data.
Carvalho, Pedro; Marques, Rui Cunha
2016-02-15
This study aims to search for economies of size and scope in the Portuguese water sector applying Bayesian and classical statistics to make inference in stochastic frontier analysis (SFA). This study proves the usefulness and advantages of the application of Bayesian statistics for making inference in SFA over traditional SFA which just uses classical statistics. The resulting Bayesian methods allow overcoming some problems that arise in the application of the traditional SFA, such as the bias in small samples and skewness of residuals. In the present case study of the water sector in Portugal, these Bayesian methods provide more plausible and acceptable results. Based on the results obtained we found that there are important economies of output density, economies of size, economies of vertical integration and economies of scope in the Portuguese water sector, pointing out to the huge advantages in undertaking mergers by joining the retail and wholesale components and by joining the drinking water and wastewater services. Copyright © 2015 Elsevier B.V. All rights reserved.
P values are only an index to evidence: 20th- vs. 21st-century statistical science.
Burnham, K P; Anderson, D R
2014-03-01
Early statistical methods focused on pre-data probability statements (i.e., data as random variables) such as P values; these are not really inferences nor are P values evidential. Statistical science clung to these principles throughout much of the 20th century as a wide variety of methods were developed for special cases. Looking back, it is clear that the underlying paradigm (i.e., testing and P values) was weak. As Kuhn (1970) suggests, new paradigms have taken the place of earlier ones: this is a goal of good science. New methods have been developed and older methods extended and these allow proper measures of strength of evidence and multimodel inference. It is time to move forward with sound theory and practice for the difficult practical problems that lie ahead. Given data the useful foundation shifts to post-data probability statements such as model probabilities (Akaike weights) or related quantities such as odds ratios and likelihood intervals. These new methods allow formal inference from multiple models in the a prior set. These quantities are properly evidential. The past century was aimed at finding the "best" model and making inferences from it. The goal in the 21st century is to base inference on all the models weighted by their model probabilities (model averaging). Estimates of precision can include model selection uncertainty leading to variances conditional on the model set. The 21st century will be about the quantification of information, proper measures of evidence, and multi-model inference. Nelder (1999:261) concludes, "The most important task before us in developing statistical science is to demolish the P-value culture, which has taken root to a frightening extent in many areas of both pure and applied science and technology".
Logical reasoning versus information processing in the dual-strategy model of reasoning.
Markovits, Henry; Brisson, Janie; de Chantal, Pier-Luc
2017-01-01
One of the major debates concerning the nature of inferential reasoning is between counterexample-based strategies such as mental model theory and statistical strategies underlying probabilistic models. The dual-strategy model, proposed by Verschueren, Schaeken, & d'Ydewalle (2005a, 2005b), which suggests that people might have access to both kinds of strategy has been supported by several recent studies. These have shown that statistical reasoners make inferences based on using information about premises in order to generate a likelihood estimate of conclusion probability. However, while results concerning counterexample reasoners are consistent with a counterexample detection model, these results could equally be interpreted as indicating a greater sensitivity to logical form. In order to distinguish these 2 interpretations, in Studies 1 and 2, we presented reasoners with Modus ponens (MP) inferences with statistical information about premise strength and in Studies 3 and 4, naturalistic MP inferences with premises having many disabling conditions. Statistical reasoners accepted the MP inference more often than counterexample reasoners in Studies 1 and 2, while the opposite pattern was observed in Studies 3 and 4. Results show that these strategies must be defined in terms of information processing, with no clear relations to "logical" reasoning. These results have additional implications for the underlying debate about the nature of human reasoning. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Statistical Inference at Work: Statistical Process Control as an Example
ERIC Educational Resources Information Center
Bakker, Arthur; Kent, Phillip; Derry, Jan; Noss, Richard; Hoyles, Celia
2008-01-01
To characterise statistical inference in the workplace this paper compares a prototypical type of statistical inference at work, statistical process control (SPC), with a type of statistical inference that is better known in educational settings, hypothesis testing. Although there are some similarities between the reasoning structure involved in…
Statistical inference for noisy nonlinear ecological dynamic systems.
Wood, Simon N
2010-08-26
Chaotic ecological dynamic systems defy conventional statistical analysis. Systems with near-chaotic dynamics are little better. Such systems are almost invariably driven by endogenous dynamic processes plus demographic and environmental process noise, and are only observable with error. Their sensitivity to history means that minute changes in the driving noise realization, or the system parameters, will cause drastic changes in the system trajectory. This sensitivity is inherited and amplified by the joint probability density of the observable data and the process noise, rendering it useless as the basis for obtaining measures of statistical fit. Because the joint density is the basis for the fit measures used by all conventional statistical methods, this is a major theoretical shortcoming. The inability to make well-founded statistical inferences about biological dynamic models in the chaotic and near-chaotic regimes, other than on an ad hoc basis, leaves dynamic theory without the methods of quantitative validation that are essential tools in the rest of biological science. Here I show that this impasse can be resolved in a simple and general manner, using a method that requires only the ability to simulate the observed data on a system from the dynamic model about which inferences are required. The raw data series are reduced to phase-insensitive summary statistics, quantifying local dynamic structure and the distribution of observations. Simulation is used to obtain the mean and the covariance matrix of the statistics, given model parameters, allowing the construction of a 'synthetic likelihood' that assesses model fit. This likelihood can be explored using a straightforward Markov chain Monte Carlo sampler, but one further post-processing step returns pure likelihood-based inference. I apply the method to establish the dynamic nature of the fluctuations in Nicholson's classic blowfly experiments.
A Scalable Approach to Probabilistic Latent Space Inference of Large-Scale Networks
Yin, Junming; Ho, Qirong; Xing, Eric P.
2014-01-01
We propose a scalable approach for making inference about latent spaces of large networks. With a succinct representation of networks as a bag of triangular motifs, a parsimonious statistical model, and an efficient stochastic variational inference algorithm, we are able to analyze real networks with over a million vertices and hundreds of latent roles on a single machine in a matter of hours, a setting that is out of reach for many existing methods. When compared to the state-of-the-art probabilistic approaches, our method is several orders of magnitude faster, with competitive or improved accuracy for latent space recovery and link prediction. PMID:25400487
IMAGINE: Interstellar MAGnetic field INference Engine
NASA Astrophysics Data System (ADS)
Steininger, Theo
2018-03-01
IMAGINE (Interstellar MAGnetic field INference Engine) performs inference on generic parametric models of the Galaxy. The modular open source framework uses highly optimized tools and technology such as the MultiNest sampler (ascl:1109.006) and the information field theory framework NIFTy (ascl:1302.013) to create an instance of the Milky Way based on a set of parameters for physical observables, using Bayesian statistics to judge the mismatch between measured data and model prediction. The flexibility of the IMAGINE framework allows for simple refitting for newly available data sets and makes state-of-the-art Bayesian methods easily accessible particularly for random components of the Galactic magnetic field.
Making inference from wildlife collision data: inferring predator absence from prey strikes
Hosack, Geoffrey R.; Barry, Simon C.
2017-01-01
Wildlife collision data are ubiquitous, though challenging for making ecological inference due to typically irreducible uncertainty relating to the sampling process. We illustrate a new approach that is useful for generating inference from predator data arising from wildlife collisions. By simply conditioning on a second prey species sampled via the same collision process, and by using a biologically realistic numerical response functions, we can produce a coherent numerical response relationship between predator and prey. This relationship can then be used to make inference on the population size of the predator species, including the probability of extinction. The statistical conditioning enables us to account for unmeasured variation in factors influencing the runway strike incidence for individual airports and to enable valid comparisons. A practical application of the approach for testing hypotheses about the distribution and abundance of a predator species is illustrated using the hypothesized red fox incursion into Tasmania, Australia. We estimate that conditional on the numerical response between fox and lagomorph runway strikes on mainland Australia, the predictive probability of observing no runway strikes of foxes in Tasmania after observing 15 lagomorph strikes is 0.001. We conclude there is enough evidence to safely reject the null hypothesis that there is a widespread red fox population in Tasmania at a population density consistent with prey availability. The method is novel and has potential wider application. PMID:28243534
Australian Curriculum Linked Lessons: Statistics
ERIC Educational Resources Information Center
Day, Lorraine
2014-01-01
Students recognise and analyse data and draw inferences. They represent, summarise and interpret data and undertake purposeful investigations involving the collection and interpretation of data… They develop an increasingly sophisticated ability to critically evaluate chance and data concepts and make reasoned judgments and decisions, as well as…
Sileshi, G
2006-10-01
Researchers and regulatory agencies often make statistical inferences from insect count data using modelling approaches that assume homogeneous variance. Such models do not allow for formal appraisal of variability which in its different forms is the subject of interest in ecology. Therefore, the objectives of this paper were to (i) compare models suitable for handling variance heterogeneity and (ii) select optimal models to ensure valid statistical inferences from insect count data. The log-normal, standard Poisson, Poisson corrected for overdispersion, zero-inflated Poisson, the negative binomial distribution and zero-inflated negative binomial models were compared using six count datasets on foliage-dwelling insects and five families of soil-dwelling insects. Akaike's and Schwarz Bayesian information criteria were used for comparing the various models. Over 50% of the counts were zeros even in locally abundant species such as Ootheca bennigseni Weise, Mesoplatys ochroptera Stål and Diaecoderus spp. The Poisson model after correction for overdispersion and the standard negative binomial distribution model provided better description of the probability distribution of seven out of the 11 insects than the log-normal, standard Poisson, zero-inflated Poisson or zero-inflated negative binomial models. It is concluded that excess zeros and variance heterogeneity are common data phenomena in insect counts. If not properly modelled, these properties can invalidate the normal distribution assumptions resulting in biased estimation of ecological effects and jeopardizing the integrity of the scientific inferences. Therefore, it is recommended that statistical models appropriate for handling these data properties be selected using objective criteria to ensure efficient statistical inference.
Testing for Granger Causality in the Frequency Domain: A Phase Resampling Method.
Liu, Siwei; Molenaar, Peter
2016-01-01
This article introduces phase resampling, an existing but rarely used surrogate data method for making statistical inferences of Granger causality in frequency domain time series analysis. Granger causality testing is essential for establishing causal relations among variables in multivariate dynamic processes. However, testing for Granger causality in the frequency domain is challenging due to the nonlinear relation between frequency domain measures (e.g., partial directed coherence, generalized partial directed coherence) and time domain data. Through a simulation study, we demonstrate that phase resampling is a general and robust method for making statistical inferences even with short time series. With Gaussian data, phase resampling yields satisfactory type I and type II error rates in all but one condition we examine: when a small effect size is combined with an insufficient number of data points. Violations of normality lead to slightly higher error rates but are mostly within acceptable ranges. We illustrate the utility of phase resampling with two empirical examples involving multivariate electroencephalography (EEG) and skin conductance data.
Albert, Carlo; Ulzega, Simone; Stoop, Ruedi
2016-04-01
Parameter inference is a fundamental problem in data-driven modeling. Given observed data that is believed to be a realization of some parameterized model, the aim is to find parameter values that are able to explain the observed data. In many situations, the dominant sources of uncertainty must be included into the model for making reliable predictions. This naturally leads to stochastic models. Stochastic models render parameter inference much harder, as the aim then is to find a distribution of likely parameter values. In Bayesian statistics, which is a consistent framework for data-driven learning, this so-called posterior distribution can be used to make probabilistic predictions. We propose a novel, exact, and very efficient approach for generating posterior parameter distributions for stochastic differential equation models calibrated to measured time series. The algorithm is inspired by reinterpreting the posterior distribution as a statistical mechanics partition function of an object akin to a polymer, where the measurements are mapped on heavier beads compared to those of the simulated data. To arrive at distribution samples, we employ a Hamiltonian Monte Carlo approach combined with a multiple time-scale integration. A separation of time scales naturally arises if either the number of measurement points or the number of simulation points becomes large. Furthermore, at least for one-dimensional problems, we can decouple the harmonic modes between measurement points and solve the fastest part of their dynamics analytically. Our approach is applicable to a wide range of inference problems and is highly parallelizable.
Map LineUps: Effects of spatial structure on graphical inference.
Beecham, Roger; Dykes, Jason; Meulemans, Wouter; Slingsby, Aidan; Turkay, Cagatay; Wood, Jo
2017-01-01
Fundamental to the effective use of visualization as an analytic and descriptive tool is the assurance that presenting data visually provides the capability of making inferences from what we see. This paper explores two related approaches to quantifying the confidence we may have in making visual inferences from mapped geospatial data. We adapt Wickham et al.'s 'Visual Line-up' method as a direct analogy with Null Hypothesis Significance Testing (NHST) and propose a new approach for generating more credible spatial null hypotheses. Rather than using as a spatial null hypothesis the unrealistic assumption of complete spatial randomness, we propose spatially autocorrelated simulations as alternative nulls. We conduct a set of crowdsourced experiments (n=361) to determine the just noticeable difference (JND) between pairs of choropleth maps of geographic units controlling for spatial autocorrelation (Moran's I statistic) and geometric configuration (variance in spatial unit area). Results indicate that people's abilities to perceive differences in spatial autocorrelation vary with baseline autocorrelation structure and the geometric configuration of geographic units. These results allow us, for the first time, to construct a visual equivalent of statistical power for geospatial data. Our JND results add to those provided in recent years by Klippel et al. (2011), Harrison et al. (2014) and Kay & Heer (2015) for correlation visualization. Importantly, they provide an empirical basis for an improved construction of visual line-ups for maps and the development of theory to inform geospatial tests of graphical inference.
Hozo, Iztok; Schell, Michael J; Djulbegovic, Benjamin
2008-07-01
The absolute truth in research is unobtainable, as no evidence or research hypothesis is ever 100% conclusive. Therefore, all data and inferences can in principle be considered as "inconclusive." Scientific inference and decision-making need to take into account errors, which are unavoidable in the research enterprise. The errors can occur at the level of conclusions that aim to discern the truthfulness of research hypothesis based on the accuracy of research evidence and hypothesis, and decisions, the goal of which is to enable optimal decision-making under present and specific circumstances. To optimize the chance of both correct conclusions and correct decisions, the synthesis of all major statistical approaches to clinical research is needed. The integration of these approaches (frequentist, Bayesian, and decision-analytic) can be accomplished through formal risk:benefit (R:B) analysis. This chapter illustrates the rational choice of a research hypothesis using R:B analysis based on decision-theoretic expected utility theory framework and the concept of "acceptable regret" to calculate the threshold probability of the "truth" above which the benefit of accepting a research hypothesis outweighs its risks.
Petersson, K M; Nichols, T E; Poline, J B; Holmes, A P
1999-01-01
Functional neuroimaging (FNI) provides experimental access to the intact living brain making it possible to study higher cognitive functions in humans. In this review and in a companion paper in this issue, we discuss some common methods used to analyse FNI data. The emphasis in both papers is on assumptions and limitations of the methods reviewed. There are several methods available to analyse FNI data indicating that none is optimal for all purposes. In order to make optimal use of the methods available it is important to know the limits of applicability. For the interpretation of FNI results it is also important to take into account the assumptions, approximations and inherent limitations of the methods used. This paper gives a brief overview over some non-inferential descriptive methods and common statistical models used in FNI. Issues relating to the complex problem of model selection are discussed. In general, proper model selection is a necessary prerequisite for the validity of the subsequent statistical inference. The non-inferential section describes methods that, combined with inspection of parameter estimates and other simple measures, can aid in the process of model selection and verification of assumptions. The section on statistical models covers approaches to global normalization and some aspects of univariate, multivariate, and Bayesian models. Finally, approaches to functional connectivity and effective connectivity are discussed. In the companion paper we review issues related to signal detection and statistical inference. PMID:10466149
When mechanism matters: Bayesian forecasting using models of ecological diffusion
Hefley, Trevor J.; Hooten, Mevin B.; Russell, Robin E.; Walsh, Daniel P.; Powell, James A.
2017-01-01
Ecological diffusion is a theory that can be used to understand and forecast spatio-temporal processes such as dispersal, invasion, and the spread of disease. Hierarchical Bayesian modelling provides a framework to make statistical inference and probabilistic forecasts, using mechanistic ecological models. To illustrate, we show how hierarchical Bayesian models of ecological diffusion can be implemented for large data sets that are distributed densely across space and time. The hierarchical Bayesian approach is used to understand and forecast the growth and geographic spread in the prevalence of chronic wasting disease in white-tailed deer (Odocoileus virginianus). We compare statistical inference and forecasts from our hierarchical Bayesian model to phenomenological regression-based methods that are commonly used to analyse spatial occurrence data. The mechanistic statistical model based on ecological diffusion led to important ecological insights, obviated a commonly ignored type of collinearity, and was the most accurate method for forecasting.
The Love of Large Numbers: A Popularity Bias in Consumer Choice.
Powell, Derek; Yu, Jingqi; DeWolf, Melissa; Holyoak, Keith J
2017-10-01
Social learning-the ability to learn from observing the decisions of other people and the outcomes of those decisions-is fundamental to human evolutionary and cultural success. The Internet now provides social evidence on an unprecedented scale. However, properly utilizing this evidence requires a capacity for statistical inference. We examined how people's interpretation of online review scores is influenced by the numbers of reviews-a potential indicator both of an item's popularity and of the precision of the average review score. Our task was designed to pit statistical information against social information. We modeled the behavior of an "intuitive statistician" using empirical prior information from millions of reviews posted on Amazon.com and then compared the model's predictions with the behavior of experimental participants. Under certain conditions, people preferred a product with more reviews to one with fewer reviews even though the statistical model indicated that the latter was likely to be of higher quality than the former. Overall, participants' judgments suggested that they failed to make meaningful statistical inferences.
Statistical Inference for Porous Materials using Persistent Homology.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moon, Chul; Heath, Jason E.; Mitchell, Scott A.
2017-12-01
We propose a porous materials analysis pipeline using persistent homology. We rst compute persistent homology of binarized 3D images of sampled material subvolumes. For each image we compute sets of homology intervals, which are represented as summary graphics called persistence diagrams. We convert persistence diagrams into image vectors in order to analyze the similarity of the homology of the material images using the mature tools for image analysis. Each image is treated as a vector and we compute its principal components to extract features. We t a statistical model using the loadings of principal components to estimate material porosity, permeability,more » anisotropy, and tortuosity. We also propose an adaptive version of the structural similarity index (SSIM), a similarity metric for images, as a measure to determine the statistical representative elementary volumes (sREV) for persistence homology. Thus we provide a capability for making a statistical inference of the uid ow and transport properties of porous materials based on their geometry and connectivity.« less
Investigation of Statistical Inference Methodologies Through Scale Model Propagation Experiments
2015-09-30
statistical inference methodologies for ocean- acoustic problems by investigating and applying statistical methods to data collected from scale-model...to begin planning experiments for statistical inference applications. APPROACH In the ocean acoustics community over the past two decades...solutions for waveguide parameters. With the introduction of statistical inference to the field of ocean acoustics came the desire to interpret marginal
Quantum-Like Bayesian Networks for Modeling Decision Making
Moreira, Catarina; Wichert, Andreas
2016-01-01
In this work, we explore an alternative quantum structure to perform quantum probabilistic inferences to accommodate the paradoxical findings of the Sure Thing Principle. We propose a Quantum-Like Bayesian Network, which consists in replacing classical probabilities by quantum probability amplitudes. However, since this approach suffers from the problem of exponential growth of quantum parameters, we also propose a similarity heuristic that automatically fits quantum parameters through vector similarities. This makes the proposed model general and predictive in contrast to the current state of the art models, which cannot be generalized for more complex decision scenarios and that only provide an explanatory nature for the observed paradoxes. In the end, the model that we propose consists in a nonparametric method for estimating inference effects from a statistical point of view. It is a statistical model that is simpler than the previous quantum dynamic and quantum-like models proposed in the literature. We tested the proposed network with several empirical data from the literature, mainly from the Prisoner's Dilemma game and the Two Stage Gambling game. The results obtained show that the proposed quantum Bayesian Network is a general method that can accommodate violations of the laws of classical probability theory and make accurate predictions regarding human decision-making in these scenarios. PMID:26858669
Data mining and statistical inference in selective laser melting
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kamath, Chandrika
Selective laser melting (SLM) is an additive manufacturing process that builds a complex three-dimensional part, layer-by-layer, using a laser beam to fuse fine metal powder together. The design freedom afforded by SLM comes associated with complexity. As the physical phenomena occur over a broad range of length and time scales, the computational cost of modeling the process is high. At the same time, the large number of parameters that control the quality of a part make experiments expensive. In this paper, we describe ways in which we can use data mining and statistical inference techniques to intelligently combine simulations andmore » experiments to build parts with desired properties. We start with a brief summary of prior work in finding process parameters for high-density parts. We then expand on this work to show how we can improve the approach by using feature selection techniques to identify important variables, data-driven surrogate models to reduce computational costs, improved sampling techniques to cover the design space adequately, and uncertainty analysis for statistical inference. Here, our results indicate that techniques from data mining and statistics can complement those from physical modeling to provide greater insight into complex processes such as selective laser melting.« less
Data mining and statistical inference in selective laser melting
Kamath, Chandrika
2016-01-11
Selective laser melting (SLM) is an additive manufacturing process that builds a complex three-dimensional part, layer-by-layer, using a laser beam to fuse fine metal powder together. The design freedom afforded by SLM comes associated with complexity. As the physical phenomena occur over a broad range of length and time scales, the computational cost of modeling the process is high. At the same time, the large number of parameters that control the quality of a part make experiments expensive. In this paper, we describe ways in which we can use data mining and statistical inference techniques to intelligently combine simulations andmore » experiments to build parts with desired properties. We start with a brief summary of prior work in finding process parameters for high-density parts. We then expand on this work to show how we can improve the approach by using feature selection techniques to identify important variables, data-driven surrogate models to reduce computational costs, improved sampling techniques to cover the design space adequately, and uncertainty analysis for statistical inference. Here, our results indicate that techniques from data mining and statistics can complement those from physical modeling to provide greater insight into complex processes such as selective laser melting.« less
An argument for mechanism-based statistical inference in cancer
Ochs, Michael; Price, Nathan D.; Tomasetti, Cristian; Younes, Laurent
2015-01-01
Cancer is perhaps the prototypical systems disease, and as such has been the focus of extensive study in quantitative systems biology. However, translating these programs into personalized clinical care remains elusive and incomplete. In this perspective, we argue that realizing this agenda—in particular, predicting disease phenotypes, progression and treatment response for individuals—requires going well beyond standard computational and bioinformatics tools and algorithms. It entails designing global mathematical models over network-scale configurations of genomic states and molecular concentrations, and learning the model parameters from limited available samples of high-dimensional and integrative omics data. As such, any plausible design should accommodate: biological mechanism, necessary for both feasible learning and interpretable decision making; stochasticity, to deal with uncertainty and observed variation at many scales; and a capacity for statistical inference at the patient level. This program, which requires a close, sustained collaboration between mathematicians and biologists, is illustrated in several contexts, including learning bio-markers, metabolism, cell signaling, network inference and tumorigenesis. PMID:25381197
Optimal Predictions in Everyday Cognition: The Wisdom of Individuals or Crowds?
ERIC Educational Resources Information Center
Mozer, Michael C.; Pashler, Harold; Homaei, Hadjar
2008-01-01
Griffiths and Tenenbaum (2006) asked individuals to make predictions about the duration or extent of everyday events (e.g., cake baking times), and reported that predictions were optimal, employing Bayesian inference based on veridical prior distributions. Although the predictions conformed strikingly to statistics of the world, they reflect…
Why We (Usually) Don't Have to Worry about Multiple Comparisons
ERIC Educational Resources Information Center
Gelman, Andrew; Hill, Jennifer; Yajima, Masanao
2012-01-01
Applied researchers often find themselves making statistical inferences in settings that would seem to require multiple comparisons adjustments. We challenge the Type I error paradigm that underlies these corrections. Moreover we posit that the problem of multiple comparisons can disappear entirely when viewed from a hierarchical Bayesian…
NASA Astrophysics Data System (ADS)
Stan Development Team
2018-01-01
Stan facilitates statistical inference at the frontiers of applied statistics and provides both a modeling language for specifying complex statistical models and a library of statistical algorithms for computing inferences with those models. These components are exposed through interfaces in environments such as R, Python, and the command line.
Local dependence in random graph models: characterization, properties and statistical inference
Schweinberger, Michael; Handcock, Mark S.
2015-01-01
Summary Dependent phenomena, such as relational, spatial and temporal phenomena, tend to be characterized by local dependence in the sense that units which are close in a well-defined sense are dependent. In contrast with spatial and temporal phenomena, though, relational phenomena tend to lack a natural neighbourhood structure in the sense that it is unknown which units are close and thus dependent. Owing to the challenge of characterizing local dependence and constructing random graph models with local dependence, many conventional exponential family random graph models induce strong dependence and are not amenable to statistical inference. We take first steps to characterize local dependence in random graph models, inspired by the notion of finite neighbourhoods in spatial statistics and M-dependence in time series, and we show that local dependence endows random graph models with desirable properties which make them amenable to statistical inference. We show that random graph models with local dependence satisfy a natural domain consistency condition which every model should satisfy, but conventional exponential family random graph models do not satisfy. In addition, we establish a central limit theorem for random graph models with local dependence, which suggests that random graph models with local dependence are amenable to statistical inference. We discuss how random graph models with local dependence can be constructed by exploiting either observed or unobserved neighbourhood structure. In the absence of observed neighbourhood structure, we take a Bayesian view and express the uncertainty about the neighbourhood structure by specifying a prior on a set of suitable neighbourhood structures. We present simulation results and applications to two real world networks with ‘ground truth’. PMID:26560142
ERIC Educational Resources Information Center
Jacob, Bridgette L.
2013-01-01
The difficulties introductory statistics students have with formal statistical inference are well known in the field of statistics education. "Informal" statistical inference has been studied as a means to introduce inferential reasoning well before and without the formalities of formal statistical inference. This mixed methods study…
Sandoval-Castellanos, Edson; Palkopoulou, Eleftheria; Dalén, Love
2014-01-01
Inference of population demographic history has vastly improved in recent years due to a number of technological and theoretical advances including the use of ancient DNA. Approximate Bayesian computation (ABC) stands among the most promising methods due to its simple theoretical fundament and exceptional flexibility. However, limited availability of user-friendly programs that perform ABC analysis renders it difficult to implement, and hence programming skills are frequently required. In addition, there is limited availability of programs able to deal with heterochronous data. Here we present the software BaySICS: Bayesian Statistical Inference of Coalescent Simulations. BaySICS provides an integrated and user-friendly platform that performs ABC analyses by means of coalescent simulations from DNA sequence data. It estimates historical demographic population parameters and performs hypothesis testing by means of Bayes factors obtained from model comparisons. Although providing specific features that improve inference from datasets with heterochronous data, BaySICS also has several capabilities making it a suitable tool for analysing contemporary genetic datasets. Those capabilities include joint analysis of independent tables, a graphical interface and the implementation of Markov-chain Monte Carlo without likelihoods.
Bayesian inference for joint modelling of longitudinal continuous, binary and ordinal events.
Li, Qiuju; Pan, Jianxin; Belcher, John
2016-12-01
In medical studies, repeated measurements of continuous, binary and ordinal outcomes are routinely collected from the same patient. Instead of modelling each outcome separately, in this study we propose to jointly model the trivariate longitudinal responses, so as to take account of the inherent association between the different outcomes and thus improve statistical inferences. This work is motivated by a large cohort study in the North West of England, involving trivariate responses from each patient: Body Mass Index, Depression (Yes/No) ascertained with cut-off score not less than 8 at the Hospital Anxiety and Depression Scale, and Pain Interference generated from the Medical Outcomes Study 36-item short-form health survey with values returned on an ordinal scale 1-5. There are some well-established methods for combined continuous and binary, or even continuous and ordinal responses, but little work was done on the joint analysis of continuous, binary and ordinal responses. We propose conditional joint random-effects models, which take into account the inherent association between the continuous, binary and ordinal outcomes. Bayesian analysis methods are used to make statistical inferences. Simulation studies show that, by jointly modelling the trivariate outcomes, standard deviations of the estimates of parameters in the models are smaller and much more stable, leading to more efficient parameter estimates and reliable statistical inferences. In the real data analysis, the proposed joint analysis yields a much smaller deviance information criterion value than the separate analysis, and shows other good statistical properties too. © The Author(s) 2014.
Explanations and Context in the Emergence of Students' Informal Inferential Reasoning
ERIC Educational Resources Information Center
Gil, Einat; Ben-Zvi, Dani
2011-01-01
Explanations are considered to be key aids to understanding the study of mathematics, science, and other complex disciplines. This paper discusses the role of students' explanations in making sense of data and learning to reason informally about statistical inference. We closely follow students' explanations in which they utilize their experiences…
What Response Rates Are Needed to Make Reliable Inferences from Student Evaluations of Teaching?
ERIC Educational Resources Information Center
Zumrawi, Abdel Azim; Bates, Simon P.; Schroeder, Marianne
2014-01-01
This paper addresses the determination of statistically desirable response rates in students' surveys, with emphasis on assessing the effect of underlying variability in the student evaluation of teaching (SET). We discuss factors affecting the determination of adequate response rates and highlight challenges caused by non-response and lack of…
Learning Probabilistic Inference through Spike-Timing-Dependent Plasticity.
Pecevski, Dejan; Maass, Wolfgang
2016-01-01
Numerous experimental data show that the brain is able to extract information from complex, uncertain, and often ambiguous experiences. Furthermore, it can use such learnt information for decision making through probabilistic inference. Several models have been proposed that aim at explaining how probabilistic inference could be performed by networks of neurons in the brain. We propose here a model that can also explain how such neural network could acquire the necessary information for that from examples. We show that spike-timing-dependent plasticity in combination with intrinsic plasticity generates in ensembles of pyramidal cells with lateral inhibition a fundamental building block for that: probabilistic associations between neurons that represent through their firing current values of random variables. Furthermore, by combining such adaptive network motifs in a recursive manner the resulting network is enabled to extract statistical information from complex input streams, and to build an internal model for the distribution p (*) that generates the examples it receives. This holds even if p (*) contains higher-order moments. The analysis of this learning process is supported by a rigorous theoretical foundation. Furthermore, we show that the network can use the learnt internal model immediately for prediction, decision making, and other types of probabilistic inference.
Learning Probabilistic Inference through Spike-Timing-Dependent Plasticity123
Pecevski, Dejan
2016-01-01
Abstract Numerous experimental data show that the brain is able to extract information from complex, uncertain, and often ambiguous experiences. Furthermore, it can use such learnt information for decision making through probabilistic inference. Several models have been proposed that aim at explaining how probabilistic inference could be performed by networks of neurons in the brain. We propose here a model that can also explain how such neural network could acquire the necessary information for that from examples. We show that spike-timing-dependent plasticity in combination with intrinsic plasticity generates in ensembles of pyramidal cells with lateral inhibition a fundamental building block for that: probabilistic associations between neurons that represent through their firing current values of random variables. Furthermore, by combining such adaptive network motifs in a recursive manner the resulting network is enabled to extract statistical information from complex input streams, and to build an internal model for the distribution p* that generates the examples it receives. This holds even if p* contains higher-order moments. The analysis of this learning process is supported by a rigorous theoretical foundation. Furthermore, we show that the network can use the learnt internal model immediately for prediction, decision making, and other types of probabilistic inference. PMID:27419214
Inference from the small scales of cosmic shear with current and future Dark Energy Survey data
MacCrann, N.; Aleksić, J.; Amara, A.; ...
2016-11-05
Cosmic shear is sensitive to fluctuations in the cosmological matter density field, including on small physical scales, where matter clustering is affected by baryonic physics in galaxies and galaxy clusters, such as star formation, supernovae feedback and AGN feedback. While muddying any cosmological information that is contained in small scale cosmic shear measurements, this does mean that cosmic shear has the potential to constrain baryonic physics and galaxy formation. We perform an analysis of the Dark Energy Survey (DES) Science Verification (SV) cosmic shear measurements, now extended to smaller scales, and using the Mead et al. 2015 halo model tomore » account for baryonic feedback. While the SV data has limited statistical power, we demonstrate using a simulated likelihood analysis that the final DES data will have the statistical power to differentiate among baryonic feedback scenarios. We also explore some of the difficulties in interpreting the small scales in cosmic shear measurements, presenting estimates of the size of several other systematic effects that make inference from small scales difficult, including uncertainty in the modelling of intrinsic alignment on nonlinear scales, `lensing bias', and shape measurement selection effects. For the latter two, we make use of novel image simulations. While future cosmic shear datasets have the statistical power to constrain baryonic feedback scenarios, there are several systematic effects that require improved treatments, in order to make robust conclusions about baryonic feedback.« less
A Bayesian Interpretation of First-Order Phase Transitions
NASA Astrophysics Data System (ADS)
Davis, Sergio; Peralta, Joaquín; Navarrete, Yasmín; González, Diego; Gutiérrez, Gonzalo
2016-03-01
In this work we review the formalism used in describing the thermodynamics of first-order phase transitions from the point of view of maximum entropy inference. We present the concepts of transition temperature, latent heat and entropy difference between phases as emergent from the more fundamental concept of internal energy, after a statistical inference analysis. We explicitly demonstrate this point of view by making inferences on a simple game, resulting in the same formalism as in thermodynamical phase transitions. We show that analogous quantities will inevitably arise in any problem of inferring the result of a yes/no question, given two different states of knowledge and information in the form of expectation values. This exposition may help to clarify the role of these thermodynamical quantities in the context of different first-order phase transitions such as the case of magnetic Hamiltonians (e.g. the Potts model).
Remembrance of inferences past: Amortization in human hypothesis generation.
Dasgupta, Ishita; Schulz, Eric; Goodman, Noah D; Gershman, Samuel J
2018-05-21
Bayesian models of cognition assume that people compute probability distributions over hypotheses. However, the required computations are frequently intractable or prohibitively expensive. Since people often encounter many closely related distributions, selective reuse of computations (amortized inference) is a computationally efficient use of the brain's limited resources. We present three experiments that provide evidence for amortization in human probabilistic reasoning. When sequentially answering two related queries about natural scenes, participants' responses to the second query systematically depend on the structure of the first query. This influence is sensitive to the content of the queries, only appearing when the queries are related. Using a cognitive load manipulation, we find evidence that people amortize summary statistics of previous inferences, rather than storing the entire distribution. These findings support the view that the brain trades off accuracy and computational cost, to make efficient use of its limited cognitive resources to approximate probabilistic inference. Copyright © 2018 Elsevier B.V. All rights reserved.
The Reasoning behind Informal Statistical Inference
ERIC Educational Resources Information Center
Makar, Katie; Bakker, Arthur; Ben-Zvi, Dani
2011-01-01
Informal statistical inference (ISI) has been a frequent focus of recent research in statistics education. Considering the role that context plays in developing ISI calls into question the need to be more explicit about the reasoning that underpins ISI. This paper uses educational literature on informal statistical inference and philosophical…
Gopnik, Alison
2012-09-28
New theoretical ideas and empirical research show that very young children's learning and thinking are strikingly similar to much learning and thinking in science. Preschoolers test hypotheses against data and make causal inferences; they learn from statistics and informal experimentation, and from watching and listening to others. The mathematical framework of probabilistic models and Bayesian inference can describe this learning in precise ways. These discoveries have implications for early childhood education and policy. In particular, they suggest both that early childhood experience is extremely important and that the trend toward more structured and academic early childhood programs is misguided.
Wilkinson, Michael
2014-03-01
Decisions about support for predictions of theories in light of data are made using statistical inference. The dominant approach in sport and exercise science is the Neyman-Pearson (N-P) significance-testing approach. When applied correctly it provides a reliable procedure for making dichotomous decisions for accepting or rejecting zero-effect null hypotheses with known and controlled long-run error rates. Type I and type II error rates must be specified in advance and the latter controlled by conducting an a priori sample size calculation. The N-P approach does not provide the probability of hypotheses or indicate the strength of support for hypotheses in light of data, yet many scientists believe it does. Outcomes of analyses allow conclusions only about the existence of non-zero effects, and provide no information about the likely size of true effects or their practical/clinical value. Bayesian inference can show how much support data provide for different hypotheses, and how personal convictions should be altered in light of data, but the approach is complicated by formulating probability distributions about prior subjective estimates of population effects. A pragmatic solution is magnitude-based inference, which allows scientists to estimate the true magnitude of population effects and how likely they are to exceed an effect magnitude of practical/clinical importance, thereby integrating elements of subjective Bayesian-style thinking. While this approach is gaining acceptance, progress might be hastened if scientists appreciate the shortcomings of traditional N-P null hypothesis significance testing.
Forward and backward inference in spatial cognition.
Penny, Will D; Zeidman, Peter; Burgess, Neil
2013-01-01
This paper shows that the various computations underlying spatial cognition can be implemented using statistical inference in a single probabilistic model. Inference is implemented using a common set of 'lower-level' computations involving forward and backward inference over time. For example, to estimate where you are in a known environment, forward inference is used to optimally combine location estimates from path integration with those from sensory input. To decide which way to turn to reach a goal, forward inference is used to compute the likelihood of reaching that goal under each option. To work out which environment you are in, forward inference is used to compute the likelihood of sensory observations under the different hypotheses. For reaching sensory goals that require a chaining together of decisions, forward inference can be used to compute a state trajectory that will lead to that goal, and backward inference to refine the route and estimate control signals that produce the required trajectory. We propose that these computations are reflected in recent findings of pattern replay in the mammalian brain. Specifically, that theta sequences reflect decision making, theta flickering reflects model selection, and remote replay reflects route and motor planning. We also propose a mapping of the above computational processes onto lateral and medial entorhinal cortex and hippocampus.
Forward and Backward Inference in Spatial Cognition
Penny, Will D.; Zeidman, Peter; Burgess, Neil
2013-01-01
This paper shows that the various computations underlying spatial cognition can be implemented using statistical inference in a single probabilistic model. Inference is implemented using a common set of ‘lower-level’ computations involving forward and backward inference over time. For example, to estimate where you are in a known environment, forward inference is used to optimally combine location estimates from path integration with those from sensory input. To decide which way to turn to reach a goal, forward inference is used to compute the likelihood of reaching that goal under each option. To work out which environment you are in, forward inference is used to compute the likelihood of sensory observations under the different hypotheses. For reaching sensory goals that require a chaining together of decisions, forward inference can be used to compute a state trajectory that will lead to that goal, and backward inference to refine the route and estimate control signals that produce the required trajectory. We propose that these computations are reflected in recent findings of pattern replay in the mammalian brain. Specifically, that theta sequences reflect decision making, theta flickering reflects model selection, and remote replay reflects route and motor planning. We also propose a mapping of the above computational processes onto lateral and medial entorhinal cortex and hippocampus. PMID:24348230
Measuring the Number of M Dwarfs per M Dwarf Using Kepler Eclipsing Binaries
NASA Astrophysics Data System (ADS)
Shan, Yutong; Johnson, John A.; Morton, Timothy D.
2015-11-01
We measure the binarity of detached M dwarfs in the Kepler field with orbital periods in the range of 1-90 days. Kepler’s photometric precision and nearly continuous monitoring of stellar targets over time baselines ranging from 3 months to 4 years make its detection efficiency for eclipsing binaries nearly complete over this period range and for all radius ratios. Our investigation employs a statistical framework akin to that used for inferring planetary occurrence rates from planetary transits. The obvious simplification is that eclipsing binaries have a vastly improved detection efficiency that is limited chiefly by their geometric probabilities to eclipse. For the M-dwarf sample observed by the Kepler Mission, the fractional incidence of eclipsing binaries implies that there are {0.11}-0.04+0.02 close stellar companions per apparently single M dwarf. Our measured binarity is higher than previous inferences of the occurrence rate of close binaries via radial velocity techniques, at roughly the 2σ level. This study represents the first use of eclipsing binary detections from a high quality transiting planet mission to infer binary statistics. Application of this statistical framework to the eclipsing binaries discovered by future transit surveys will establish better constraints on short-period M+M binary rate, as well as binarity measurements for stars of other spectral types.
Inferring Characteristics of Sensorimotor Behavior by Quantifying Dynamics of Animal Locomotion
NASA Astrophysics Data System (ADS)
Leung, KaWai
Locomotion is one of the most well-studied topics in animal behavioral studies. Many fundamental and clinical research make use of the locomotion of an animal model to explore various aspects in sensorimotor behavior. In the past, most of these studies focused on population average of a specific trait due to limitation of data collection and processing power. With recent advance in computer vision and statistical modeling techniques, it is now possible to track and analyze large amounts of behavioral data. In this thesis, I present two projects that aim to infer the characteristics of sensorimotor behavior by quantifying the dynamics of locomotion of nematode Caenorhabditis elegans and fruit fly Drosophila melanogaster, shedding light on statistical dependence between sensing and behavior. In the first project, I investigate the possibility of inferring noxious sensory information from the behavior of Caenorhabditis elegans. I develop a statistical model to infer the heat stimulus level perceived by individual animals from their stereotyped escape responses after stimulation by an IR laser. The model allows quantification of analgesic-like effects of chemical agents or genetic mutations in the worm. At the same time, the method is able to differentiate perturbations of locomotion behavior that are beyond affecting the sensory system. With this model I propose experimental designs that allows statistically significant identification of analgesic-like effects. In the second project, I investigate the relationship of energy budget and stability of locomotion in determining the walking speed distribution of Drosophila melanogaster during aging. The locomotion stability at different age groups is estimated from video recordings using Floquet theory. I calculate the power consumption of different locomotion speed using a biomechanics model. In conclusion, the power consumption, not stability, predicts the locomotion speed distribution at different ages.
ERIC Educational Resources Information Center
Xu, Zeyu; Nichols, Austin
2010-01-01
The gold standard in making causal inference on program effects is a randomized trial. Most randomization designs in education randomize classrooms or schools rather than individual students. Such "clustered randomization" designs have one principal drawback: They tend to have limited statistical power or precision. This study aims to…
Overholser, Brian R; Sowinski, Kevin M
2007-12-01
Biostatistics is the application of statistics to biologic data. The field of statistics can be broken down into 2 fundamental parts: descriptive and inferential. Descriptive statistics are commonly used to categorize, display, and summarize data. Inferential statistics can be used to make predictions based on a sample obtained from a population or some large body of information. It is these inferences that are used to test specific research hypotheses. This 2-part review will outline important features of descriptive and inferential statistics as they apply to commonly conducted research studies in the biomedical literature. Part 1 in this issue will discuss fundamental topics of statistics and data analysis. Additionally, some of the most commonly used statistical tests found in the biomedical literature will be reviewed in Part 2 in the February 2008 issue.
On Statistical Analysis of Neuroimages with Imperfect Registration
Kim, Won Hwa; Ravi, Sathya N.; Johnson, Sterling C.; Okonkwo, Ozioma C.; Singh, Vikas
2016-01-01
A variety of studies in neuroscience/neuroimaging seek to perform statistical inference on the acquired brain image scans for diagnosis as well as understanding the pathological manifestation of diseases. To do so, an important first step is to register (or co-register) all of the image data into a common coordinate system. This permits meaningful comparison of the intensities at each voxel across groups (e.g., diseased versus healthy) to evaluate the effects of the disease and/or use machine learning algorithms in a subsequent step. But errors in the underlying registration make this problematic, they either decrease the statistical power or make the follow-up inference tasks less effective/accurate. In this paper, we derive a novel algorithm which offers immunity to local errors in the underlying deformation field obtained from registration procedures. By deriving a deformation invariant representation of the image, the downstream analysis can be made more robust as if one had access to a (hypothetical) far superior registration procedure. Our algorithm is based on recent work on scattering transform. Using this as a starting point, we show how results from harmonic analysis (especially, non-Euclidean wavelets) yields strategies for designing deformation and additive noise invariant representations of large 3-D brain image volumes. We present a set of results on synthetic and real brain images where we achieve robust statistical analysis even in the presence of substantial deformation errors; here, standard analysis procedures significantly under-perform and fail to identify the true signal. PMID:27042168
Wang, Zhuo; Danziger, Samuel A; Heavner, Benjamin D; Ma, Shuyi; Smith, Jennifer J; Li, Song; Herricks, Thurston; Simeonidis, Evangelos; Baliga, Nitin S; Aitchison, John D; Price, Nathan D
2017-05-01
Gene regulatory and metabolic network models have been used successfully in many organisms, but inherent differences between them make networks difficult to integrate. Probabilistic Regulation Of Metabolism (PROM) provides a partial solution, but it does not incorporate network inference and underperforms in eukaryotes. We present an Integrated Deduced And Metabolism (IDREAM) method that combines statistically inferred Environment and Gene Regulatory Influence Network (EGRIN) models with the PROM framework to create enhanced metabolic-regulatory network models. We used IDREAM to predict phenotypes and genetic interactions between transcription factors and genes encoding metabolic activities in the eukaryote, Saccharomyces cerevisiae. IDREAM models contain many fewer interactions than PROM and yet produce significantly more accurate growth predictions. IDREAM consistently outperformed PROM using any of three popular yeast metabolic models and across three experimental growth conditions. Importantly, IDREAM's enhanced accuracy makes it possible to identify subtle synthetic growth defects. With experimental validation, these novel genetic interactions involving the pyruvate dehydrogenase complex suggested a new role for fatty acid-responsive factor Oaf1 in regulating acetyl-CoA production in glucose grown cells.
Cloern, James E.; Jassby, Alan D.; Carstensen, Jacob; Bennett, William A.; Kimmerer, Wim; Mac Nally, Ralph; Schoellhamer, David H.; Winder, Monika
2012-01-01
We comment on a nonstandard statistical treatment of time-series data first published by Breton et al. (2006) in Limnology and Oceanography and, more recently, used by Glibert (2010) in Reviews in Fisheries Science. In both papers, the authors make strong inferences about the underlying causes of population variability based on correlations between cumulative sum (CUSUM) transformations of organism abundances and environmental variables. Breton et al. (2006) reported correlations between CUSUM-transformed values of diatom biomass in Belgian coastal waters and the North Atlantic Oscillation, and between meteorological and hydrological variables. Each correlation of CUSUM-transformed variables was judged to be statistically significant. On the basis of these correlations, Breton et al. (2006) developed "the first evidence of synergy between climate and human-induced river-based nitrate inputs with respect to their effects on the magnitude of spring Phaeocystis colony blooms and their dominance over diatoms."
Bayes factors for the linear ballistic accumulator model of decision-making.
Evans, Nathan J; Brown, Scott D
2018-04-01
Evidence accumulation models of decision-making have led to advances in several different areas of psychology. These models provide a way to integrate response time and accuracy data, and to describe performance in terms of latent cognitive processes. Testing important psychological hypotheses using cognitive models requires a method to make inferences about different versions of the models which assume different parameters to cause observed effects. The task of model-based inference using noisy data is difficult, and has proven especially problematic with current model selection methods based on parameter estimation. We provide a method for computing Bayes factors through Monte-Carlo integration for the linear ballistic accumulator (LBA; Brown and Heathcote, 2008), a widely used evidence accumulation model. Bayes factors are used frequently for inference with simpler statistical models, and they do not require parameter estimation. In order to overcome the computational burden of estimating Bayes factors via brute force integration, we exploit general purpose graphical processing units; we provide free code for this. This approach allows estimation of Bayes factors via Monte-Carlo integration within a practical time frame. We demonstrate the method using both simulated and real data. We investigate the stability of the Monte-Carlo approximation, and the LBA's inferential properties, in simulation studies.
ERIC Educational Resources Information Center
Trafimow, David
2017-01-01
There has been much controversy over the null hypothesis significance testing procedure, with much of the criticism centered on the problem of inverse inference. Specifically, p gives the probability of the finding (or one more extreme) given the null hypothesis, whereas the null hypothesis significance testing procedure involves drawing a…
Assessing the fit of site-occupancy models
MacKenzie, D.I.; Bailey, L.L.
2004-01-01
Few species are likely to be so evident that they will always be detected at a site when present. Recently a model has been developed that enables estimation of the proportion of area occupied, when the target species is not detected with certainty. Here we apply this modeling approach to data collected on terrestrial salamanders in the Plethodon glutinosus complex in the Great Smoky Mountains National Park, USA, and wish to address the question 'how accurately does the fitted model represent the data?' The goodness-of-fit of the model needs to be assessed in order to make accurate inferences. This article presents a method where a simple Pearson chi-square statistic is calculated and a parametric bootstrap procedure is used to determine whether the observed statistic is unusually large. We found evidence that the most global model considered provides a poor fit to the data, hence estimated an overdispersion factor to adjust model selection procedures and inflate standard errors. Two hypothetical datasets with known assumption violations are also analyzed, illustrating that the method may be used to guide researchers to making appropriate inferences. The results of a simulation study are presented to provide a broader view of the methods properties.
Design-based Sample and Probability Law-Assumed Sample: Their Role in Scientific Investigation.
ERIC Educational Resources Information Center
Ojeda, Mario Miguel; Sahai, Hardeo
2002-01-01
Discusses some key statistical concepts in probabilistic and non-probabilistic sampling to provide an overview for understanding the inference process. Suggests a statistical model constituting the basis of statistical inference and provides a brief review of the finite population descriptive inference and a quota sampling inferential theory.…
The Importance of Statistical Modeling in Data Analysis and Inference
ERIC Educational Resources Information Center
Rollins, Derrick, Sr.
2017-01-01
Statistical inference simply means to draw a conclusion based on information that comes from data. Error bars are the most commonly used tool for data analysis and inference in chemical engineering data studies. This work demonstrates, using common types of data collection studies, the importance of specifying the statistical model for sound…
Optimally designing games for behavioural research
Rafferty, Anna N.; Zaharia, Matei; Griffiths, Thomas L.
2014-01-01
Computer games can be motivating and engaging experiences that facilitate learning, leading to their increasing use in education and behavioural experiments. For these applications, it is often important to make inferences about the knowledge and cognitive processes of players based on their behaviour. However, designing games that provide useful behavioural data are a difficult task that typically requires significant trial and error. We address this issue by creating a new formal framework that extends optimal experiment design, used in statistics, to apply to game design. In this framework, we use Markov decision processes to model players' actions within a game, and then make inferences about the parameters of a cognitive model from these actions. Using a variety of concept learning games, we show that in practice, this method can predict which games will result in better estimates of the parameters of interest. The best games require only half as many players to attain the same level of precision. PMID:25002821
Inferring Demographic History Using Two-Locus Statistics.
Ragsdale, Aaron P; Gutenkunst, Ryan N
2017-06-01
Population demographic history may be learned from contemporary genetic variation data. Methods based on aggregating the statistics of many single loci into an allele frequency spectrum (AFS) have proven powerful, but such methods ignore potentially informative patterns of linkage disequilibrium (LD) between neighboring loci. To leverage such patterns, we developed a composite-likelihood framework for inferring demographic history from aggregated statistics of pairs of loci. Using this framework, we show that two-locus statistics are more sensitive to demographic history than single-locus statistics such as the AFS. In particular, two-locus statistics escape the notorious confounding of depth and duration of a bottleneck, and they provide a means to estimate effective population size based on the recombination rather than mutation rate. We applied our approach to a Zambian population of Drosophila melanogaster Notably, using both single- and two-locus statistics, we inferred a substantially lower ancestral effective population size than previous works and did not infer a bottleneck history. Together, our results demonstrate the broad potential for two-locus statistics to enable powerful population genetic inference. Copyright © 2017 by the Genetics Society of America.
Hansen, John P
2003-01-01
Healthcare quality improvement professionals need to understand and use inferential statistics to interpret sample data from their organizations. In quality improvement and healthcare research studies all the data from a population often are not available, so investigators take samples and make inferences about the population by using inferential statistics. This three-part series will give readers an understanding of the concepts of inferential statistics as well as the specific tools for calculating confidence intervals for samples of data. This article, Part 1, presents basic information about data including a classification system that describes the four major types of variables: continuous quantitative variable, discrete quantitative variable, ordinal categorical variable (including the binomial variable), and nominal categorical variable. A histogram is a graph that displays the frequency distribution for a continuous variable. The article also demonstrates how to calculate the mean, median, standard deviation, and variance for a continuous variable.
Reiter, Andrea M F; Deserno, Lorenz; Kallert, Thomas; Heinze, Hans-Jochen; Heinz, Andreas; Schlagenhauf, Florian
2016-10-26
Addicted individuals continue substance use despite the knowledge of harmful consequences and often report having no choice but to consume. Computational psychiatry accounts have linked this clinical observation to difficulties in making flexible and goal-directed decisions in dynamic environments via consideration of potential alternative choices. To probe this in alcohol-dependent patients (n = 43) versus healthy volunteers (n = 35), human participants performed an anticorrelated decision-making task during functional neuroimaging. Via computational modeling, we investigated behavioral and neural signatures of inference regarding the alternative option. While healthy control subjects exploited the anticorrelated structure of the task to guide decision-making, alcohol-dependent patients were relatively better explained by a model-free strategy due to reduced inference on the alternative option after punishment. Whereas model-free prediction error signals were preserved, alcohol-dependent patients exhibited blunted medial prefrontal signatures of inference on the alternative option. This reduction was associated with patients' behavioral deficit in updating the alternative choice option and their obsessive-compulsive drinking habits. All results remained significant when adjusting for potential confounders (e.g., neuropsychological measures and gray matter density). A disturbed integration of alternative choice options implemented by the medial prefrontal cortex appears to be one important explanation for the puzzling question of why addicted individuals continue drug consumption despite negative consequences. In addiction, patients maintain substance use despite devastating consequences and often report having no choice but to consume. These clinical observations have been theoretically linked to disturbed mechanisms of inference, for example, to difficulties when learning statistical regularities of the environmental structure to guide decisions. Using computational modeling, we demonstrate disturbed inference on alternative choice options in alcohol addiction. Patients neglecting "what might have happened" was accompanied by blunted coding of inference regarding alternative choice options in the medial prefrontal cortex. An impaired integration of alternative choice options implemented by the medial prefrontal cortex might contribute to ongoing drug consumption in the face of evident negative consequences. Copyright © 2016 the authors 0270-6474/16/3610935-14$15.00/0.
Buchsbaum, Daphna; Seiver, Elizabeth; Bridgers, Sophie; Gopnik, Alison
2012-01-01
A major challenge children face is uncovering the causal structure of the world around them. Previous research on children's causal inference has demonstrated their ability to learn about causal relationships in the physical environment using probabilistic evidence. However, children must also learn about causal relationships in the social environment, including discovering the causes of other people's behavior, and understanding the causal relationships between others' goal-directed actions and the outcomes of those actions. In this chapter, we argue that social reasoning and causal reasoning are deeply linked, both in the real world and in children's minds. Children use both types of information together and in fact reason about both physical and social causation in fundamentally similar ways. We suggest that children jointly construct and update causal theories about their social and physical environment and that this process is best captured by probabilistic models of cognition. We first present studies showing that adults are able to jointly infer causal structure and human action structure from videos of unsegmented human motion. Next, we describe how children use social information to make inferences about physical causes. We show that the pedagogical nature of a demonstrator influences children's choices of which actions to imitate from within a causal sequence and that this social information interacts with statistical causal evidence. We then discuss how children combine evidence from an informant's testimony and expressed confidence with evidence from their own causal observations to infer the efficacy of different potential causes. We also discuss how children use these same causal observations to make inferences about the knowledge state of the social informant. Finally, we suggest that psychological causation and attribution are part of the same causal system as physical causation. We present evidence that just as children use covariation between physical causes and their effects to learn physical causal relationships, they also use covaration between people's actions and the environment to make inferences about the causes of human behavior.
ERIC Educational Resources Information Center
Daugaard, Hanne Trebbien; Cain, Kate; Elbro, Carsten
2017-01-01
We examined the relationship between inference making, vocabulary knowledge, and verbal working memory on children's reading comprehension in 62 6th graders (aged 12). The effect of vocabulary knowledge on reading comprehension was predicted to be partly mediated by inference making for two reasons: Inference making often taps the semantic…
Teaching Statistical Inference for Causal Effects in Experiments and Observational Studies
ERIC Educational Resources Information Center
Rubin, Donald B.
2004-01-01
Inference for causal effects is a critical activity in many branches of science and public policy. The field of statistics is the one field most suited to address such problems, whether from designed experiments or observational studies. Consequently, it is arguably essential that departments of statistics teach courses in causal inference to both…
[The research protocol VI: How to choose the appropriate statistical test. Inferential statistics].
Flores-Ruiz, Eric; Miranda-Novales, María Guadalupe; Villasís-Keever, Miguel Ángel
2017-01-01
The statistical analysis can be divided in two main components: descriptive analysis and inferential analysis. An inference is to elaborate conclusions from the tests performed with the data obtained from a sample of a population. Statistical tests are used in order to establish the probability that a conclusion obtained from a sample is applicable to the population from which it was obtained. However, choosing the appropriate statistical test in general poses a challenge for novice researchers. To choose the statistical test it is necessary to take into account three aspects: the research design, the number of measurements and the scale of measurement of the variables. Statistical tests are divided into two sets, parametric and nonparametric. Parametric tests can only be used if the data show a normal distribution. Choosing the right statistical test will make it easier for readers to understand and apply the results.
Training Inference Making Skills Using a Situation Model Approach Improves Reading Comprehension
Bos, Lisanne T.; De Koning, Bjorn B.; Wassenburg, Stephanie I.; van der Schoot, Menno
2016-01-01
This study aimed to enhance third and fourth graders’ text comprehension at the situation model level. Therefore, we tested a reading strategy training developed to target inference making skills, which are widely considered to be pivotal to situation model construction. The training was grounded in contemporary literature on situation model-based inference making and addressed the source (text-based versus knowledge-based), type (necessary versus unnecessary for (re-)establishing coherence), and depth of an inference (making single lexical inferences versus combining multiple lexical inferences), as well as the type of searching strategy (forward versus backward). Results indicated that, compared to a control group (n = 51), children who followed the experimental training (n = 67) improved their inference making skills supportive to situation model construction. Importantly, our training also resulted in increased levels of general reading comprehension and motivation. In sum, this study showed that a ‘level of text representation’-approach can provide a useful framework to teach inference making skills to third and fourth graders. PMID:26913014
Reasoning about Informal Statistical Inference: One Statistician's View
ERIC Educational Resources Information Center
Rossman, Allan J.
2008-01-01
This paper identifies key concepts and issues associated with the reasoning of informal statistical inference. I focus on key ideas of inference that I think all students should learn, including at secondary level as well as tertiary. I argue that a fundamental component of inference is to go beyond the data at hand, and I propose that statistical…
NASA Astrophysics Data System (ADS)
Saputra, K. V. I.; Cahyadi, L.; Sembiring, U. A.
2018-01-01
Start in this paper, we assess our traditional elementary statistics education and also we introduce elementary statistics with simulation-based inference. To assess our statistical class, we adapt the well-known CAOS (Comprehensive Assessment of Outcomes in Statistics) test that serves as an external measure to assess the student’s basic statistical literacy. This test generally represents as an accepted measure of statistical literacy. We also introduce a new teaching method on elementary statistics class. Different from the traditional elementary statistics course, we will introduce a simulation-based inference method to conduct hypothesis testing. From the literature, it has shown that this new teaching method works very well in increasing student’s understanding of statistics.
Cluster mass inference via random field theory.
Zhang, Hui; Nichols, Thomas E; Johnson, Timothy D
2009-01-01
Cluster extent and voxel intensity are two widely used statistics in neuroimaging inference. Cluster extent is sensitive to spatially extended signals while voxel intensity is better for intense but focal signals. In order to leverage strength from both statistics, several nonparametric permutation methods have been proposed to combine the two methods. Simulation studies have shown that of the different cluster permutation methods, the cluster mass statistic is generally the best. However, to date, there is no parametric cluster mass inference available. In this paper, we propose a cluster mass inference method based on random field theory (RFT). We develop this method for Gaussian images, evaluate it on Gaussian and Gaussianized t-statistic images and investigate its statistical properties via simulation studies and real data. Simulation results show that the method is valid under the null hypothesis and demonstrate that it can be more powerful than the cluster extent inference method. Further, analyses with a single subject and a group fMRI dataset demonstrate better power than traditional cluster size inference, and good accuracy relative to a gold-standard permutation test.
How accurately can other people infer your thoughts-And does culture matter?
Valanides, Constantinos; Sheppard, Elizabeth; Mitchell, Peter
2017-01-01
This research investigated how accurately people infer what others are thinking after observing a brief sample of their behaviour and whether culture/similarity is a relevant factor. Target participants (14 British and 14 Mediterraneans) were cued to think about either positive or negative events they had experienced. Subsequently, perceiver participants (16 British and 16 Mediterraneans) watched videos of the targets thinking about these things. Perceivers (both groups) were significantly accurate in judging when targets had been cued to think of something positive versus something negative, indicating notable inferential ability. Additionally, Mediterranean perceivers were better than British perceivers in making such inferences, irrespective of nationality of the targets, something that was statistically accounted for by corresponding group differences in levels of independently measured collectivism. The results point to the need for further research to investigate the possibility that being reared in a collectivist culture fosters ability in interpreting others' behaviour.
How accurately can other people infer your thoughts—And does culture matter?
Valanides, Constantinos; Sheppard, Elizabeth; Mitchell, Peter
2017-01-01
This research investigated how accurately people infer what others are thinking after observing a brief sample of their behaviour and whether culture/similarity is a relevant factor. Target participants (14 British and 14 Mediterraneans) were cued to think about either positive or negative events they had experienced. Subsequently, perceiver participants (16 British and 16 Mediterraneans) watched videos of the targets thinking about these things. Perceivers (both groups) were significantly accurate in judging when targets had been cued to think of something positive versus something negative, indicating notable inferential ability. Additionally, Mediterranean perceivers were better than British perceivers in making such inferences, irrespective of nationality of the targets, something that was statistically accounted for by corresponding group differences in levels of independently measured collectivism. The results point to the need for further research to investigate the possibility that being reared in a collectivist culture fosters ability in interpreting others’ behaviour. PMID:29112972
Fluency heuristic: a model of how the mind exploits a by-product of information retrieval.
Hertwig, Ralph; Herzog, Stefan M; Schooler, Lael J; Reimer, Torsten
2008-09-01
Boundedly rational heuristics for inference can be surprisingly accurate and frugal for several reasons. They can exploit environmental structures, co-opt complex capacities, and elude effortful search by exploiting information that automatically arrives on the mental stage. The fluency heuristic is a prime example of a heuristic that makes the most of an automatic by-product of retrieval from memory, namely, retrieval fluency. In 4 experiments, the authors show that retrieval fluency can be a proxy for real-world quantities, that people can discriminate between two objects' retrieval fluencies, and that people's inferences are in line with the fluency heuristic (in particular fast inferences) and with experimentally manipulated fluency. The authors conclude that the fluency heuristic may be one tool in the mind's repertoire of strategies that artfully probes memory for encapsulated frequency information that can veridically reflect statistical regularities in the world. (c) 2008 APA, all rights reserved.
Ohno-Machado, Lucila; Vinterbo, Staal; Dreiseitl, Stephan
2002-01-01
Protecting individual data in disclosed databases is essential. Data anonymization strategies can produce table ambiguation by suppression of selected cells. Using table ambiguation, different degrees of anonymization can be achieved, depending on the number of individuals that a particular case must become indistinguishable from. This number defines the level of anonymization. Anonymization by cell suppression does not necessarily prevent inferences from being made from the disclosed data. Preventing inferences may be important to preserve confidentiality. We show that anonymized data sets can preserve descriptive characteristics of the data, but might also be used for making inferences on particular individuals, which is a feature that may not be desirable. The degradation of predictive performance is directly proportional to the degree of anonymity. As an example, we report the effect of anonymization on the predictive performance of a model constructed to estimate the probability of disease given clinical findings.
Ohno-Machado, L.; Vinterbo, S. A.; Dreiseitl, S.
2001-01-01
Protecting individual data in disclosed databases is essential. Data anonymization strategies can produce table ambiguation by suppression of selected cells. Using table ambiguation, different degrees of anonymization can be achieved, depending on the number of individuals that a particular case must become indistinguishable from. This number defines the level of anonymization. Anonymization by cell suppression does not necessarily prevent inferences from being made from the disclosed data. Preventing inferences may be important to preserve confidentiality. We show that anonymized data sets can preserve descriptive characteristics of the data, but might also be used for making inferences on particular individuals, which is a feature that may not be desirable. The degradation of predictive performance is directly proportional to the degree of anonymity. As an example, we report the effect of anonymization on the predictive performance of a model constructed to estimate the probability of disease given clinical findings. PMID:11825239
A Test by Any Other Name: P Values, Bayes Factors, and Statistical Inference.
Stern, Hal S
2016-01-01
Procedures used for statistical inference are receiving increased scrutiny as the scientific community studies the factors associated with insuring reproducible research. This note addresses recent negative attention directed at p values, the relationship of confidence intervals and tests, and the role of Bayesian inference and Bayes factors, with an eye toward better understanding these different strategies for statistical inference. We argue that researchers and data analysts too often resort to binary decisions (e.g., whether to reject or accept the null hypothesis) in settings where this may not be required.
Children's Category-Based Inferences Affect Classification
ERIC Educational Resources Information Center
Ross, Brian H.; Gelman, Susan A.; Rosengren, Karl S.
2005-01-01
Children learn many new categories and make inferences about these categories. Much work has examined how children make inferences on the basis of category knowledge. However, inferences may also affect what is learned about a category. Four experiments examine whether category-based inferences during category learning influence category knowledge…
Using Guided Reinvention to Develop Teachers' Understanding of Hypothesis Testing Concepts
ERIC Educational Resources Information Center
Dolor, Jason; Noll, Jennifer
2015-01-01
Statistics education reform efforts emphasize the importance of informal inference in the learning of statistics. Research suggests statistics teachers experience similar difficulties understanding statistical inference concepts as students and how teacher knowledge can impact student learning. This study investigates how teachers reinvented an…
Racing to learn: statistical inference and learning in a single spiking neuron with adaptive kernels
Afshar, Saeed; George, Libin; Tapson, Jonathan; van Schaik, André; Hamilton, Tara J.
2014-01-01
This paper describes the Synapto-dendritic Kernel Adapting Neuron (SKAN), a simple spiking neuron model that performs statistical inference and unsupervised learning of spatiotemporal spike patterns. SKAN is the first proposed neuron model to investigate the effects of dynamic synapto-dendritic kernels and demonstrate their computational power even at the single neuron scale. The rule-set defining the neuron is simple: there are no complex mathematical operations such as normalization, exponentiation or even multiplication. The functionalities of SKAN emerge from the real-time interaction of simple additive and binary processes. Like a biological neuron, SKAN is robust to signal and parameter noise, and can utilize both in its operations. At the network scale neurons are locked in a race with each other with the fastest neuron to spike effectively “hiding” its learnt pattern from its neighbors. The robustness to noise, high speed, and simple building blocks not only make SKAN an interesting neuron model in computational neuroscience, but also make it ideal for implementation in digital and analog neuromorphic systems which is demonstrated through an implementation in a Field Programmable Gate Array (FPGA). Matlab, Python, and Verilog implementations of SKAN are available at: http://www.uws.edu.au/bioelectronics_neuroscience/bens/reproducible_research. PMID:25505378
Afshar, Saeed; George, Libin; Tapson, Jonathan; van Schaik, André; Hamilton, Tara J
2014-01-01
This paper describes the Synapto-dendritic Kernel Adapting Neuron (SKAN), a simple spiking neuron model that performs statistical inference and unsupervised learning of spatiotemporal spike patterns. SKAN is the first proposed neuron model to investigate the effects of dynamic synapto-dendritic kernels and demonstrate their computational power even at the single neuron scale. The rule-set defining the neuron is simple: there are no complex mathematical operations such as normalization, exponentiation or even multiplication. The functionalities of SKAN emerge from the real-time interaction of simple additive and binary processes. Like a biological neuron, SKAN is robust to signal and parameter noise, and can utilize both in its operations. At the network scale neurons are locked in a race with each other with the fastest neuron to spike effectively "hiding" its learnt pattern from its neighbors. The robustness to noise, high speed, and simple building blocks not only make SKAN an interesting neuron model in computational neuroscience, but also make it ideal for implementation in digital and analog neuromorphic systems which is demonstrated through an implementation in a Field Programmable Gate Array (FPGA). Matlab, Python, and Verilog implementations of SKAN are available at: http://www.uws.edu.au/bioelectronics_neuroscience/bens/reproducible_research.
Multiple Illuminant Colour Estimation via Statistical Inference on Factor Graphs.
Mutimbu, Lawrence; Robles-Kelly, Antonio
2016-08-31
This paper presents a method to recover a spatially varying illuminant colour estimate from scenes lit by multiple light sources. Starting with the image formation process, we formulate the illuminant recovery problem in a statistically datadriven setting. To do this, we use a factor graph defined across the scale space of the input image. In the graph, we utilise a set of illuminant prototypes computed using a data driven approach. As a result, our method delivers a pixelwise illuminant colour estimate being devoid of libraries or user input. The use of a factor graph also allows for the illuminant estimates to be recovered making use of a maximum a posteriori (MAP) inference process. Moreover, we compute the probability marginals by performing a Delaunay triangulation on our factor graph. We illustrate the utility of our method for pixelwise illuminant colour recovery on widely available datasets and compare against a number of alternatives. We also show sample colour correction results on real-world images.
Permutation inference for the general linear model
Winkler, Anderson M.; Ridgway, Gerard R.; Webster, Matthew A.; Smith, Stephen M.; Nichols, Thomas E.
2014-01-01
Permutation methods can provide exact control of false positives and allow the use of non-standard statistics, making only weak assumptions about the data. With the availability of fast and inexpensive computing, their main limitation would be some lack of flexibility to work with arbitrary experimental designs. In this paper we report on results on approximate permutation methods that are more flexible with respect to the experimental design and nuisance variables, and conduct detailed simulations to identify the best method for settings that are typical for imaging research scenarios. We present a generic framework for permutation inference for complex general linear models (glms) when the errors are exchangeable and/or have a symmetric distribution, and show that, even in the presence of nuisance effects, these permutation inferences are powerful while providing excellent control of false positives in a wide range of common and relevant imaging research scenarios. We also demonstrate how the inference on glm parameters, originally intended for independent data, can be used in certain special but useful cases in which independence is violated. Detailed examples of common neuroimaging applications are provided, as well as a complete algorithm – the “randomise” algorithm – for permutation inference with the glm. PMID:24530839
Parametric modelling of cost data in medical studies.
Nixon, R M; Thompson, S G
2004-04-30
The cost of medical resources used is often recorded for each patient in clinical studies in order to inform decision-making. Although cost data are generally skewed to the right, interest is in making inferences about the population mean cost. Common methods for non-normal data, such as data transformation, assuming asymptotic normality of the sample mean or non-parametric bootstrapping, are not ideal. This paper describes possible parametric models for analysing cost data. Four example data sets are considered, which have different sample sizes and degrees of skewness. Normal, gamma, log-normal, and log-logistic distributions are fitted, together with three-parameter versions of the latter three distributions. Maximum likelihood estimates of the population mean are found; confidence intervals are derived by a parametric BC(a) bootstrap and checked by MCMC methods. Differences between model fits and inferences are explored.Skewed parametric distributions fit cost data better than the normal distribution, and should in principle be preferred for estimating the population mean cost. However for some data sets, we find that models that fit badly can give similar inferences to those that fit well. Conversely, particularly when sample sizes are not large, different parametric models that fit the data equally well can lead to substantially different inferences. We conclude that inferences are sensitive to choice of statistical model, which itself can remain uncertain unless there is enough data to model the tail of the distribution accurately. Investigating the sensitivity of conclusions to choice of model should thus be an essential component of analysing cost data in practice. Copyright 2004 John Wiley & Sons, Ltd.
ERIC Educational Resources Information Center
Larwin, Karen H.; Larwin, David A.
2011-01-01
Bootstrapping methods and random distribution methods are increasingly recommended as better approaches for teaching students about statistical inference in introductory-level statistics courses. The authors examined the effect of teaching undergraduate business statistics students using random distribution and bootstrapping simulations. It is the…
Application of Transformations in Parametric Inference
ERIC Educational Resources Information Center
Brownstein, Naomi; Pensky, Marianna
2008-01-01
The objective of the present paper is to provide a simple approach to statistical inference using the method of transformations of variables. We demonstrate performance of this powerful tool on examples of constructions of various estimation procedures, hypothesis testing, Bayes analysis and statistical inference for the stress-strength systems.…
ERIC Educational Resources Information Center
Honda, Hidehito; Matsuka, Toshihiko; Ueda, Kazuhiro
2017-01-01
Some researchers on binary choice inference have argued that people make inferences based on simple heuristics, such as recognition, fluency, or familiarity. Others have argued that people make inferences based on available knowledge. To examine the boundary between heuristic and knowledge usage, we examine binary choice inference processes in…
Jang, Sumin; Choubey, Sandeep; Furchtgott, Leon; Zou, Ling-Nan; Doyle, Adele; Menon, Vilas; Loew, Ethan B; Krostag, Anne-Rachel; Martinez, Refugio A; Madisen, Linda; Levi, Boaz P; Ramanathan, Sharad
2017-01-01
The complexity of gene regulatory networks that lead multipotent cells to acquire different cell fates makes a quantitative understanding of differentiation challenging. Using a statistical framework to analyze single-cell transcriptomics data, we infer the gene expression dynamics of early mouse embryonic stem (mES) cell differentiation, uncovering discrete transitions across nine cell states. We validate the predicted transitions across discrete states using flow cytometry. Moreover, using live-cell microscopy, we show that individual cells undergo abrupt transitions from a naïve to primed pluripotent state. Using the inferred discrete cell states to build a probabilistic model for the underlying gene regulatory network, we further predict and experimentally verify that these states have unique response to perturbations, thus defining them functionally. Our study provides a framework to infer the dynamics of differentiation from single cell transcriptomics data and to build predictive models of the gene regulatory networks that drive the sequence of cell fate decisions during development. DOI: http://dx.doi.org/10.7554/eLife.20487.001 PMID:28296635
Fast, Cynthia D; Flesher, M Melissa; Nocera, Nathanial A; Fanselow, Michael S; Blaisdell, Aaron P
2016-06-01
Identifying statistical patterns between environmental stimuli enables organisms to respond adaptively when cues are later observed. However, stimuli are often obscured from detection, necessitating behavior under conditions of ambiguity. Considerable evidence indicates decisions under ambiguity rely on inference processes that draw on past experiences to generate predictions under novel conditions. Despite the high demand for this process and the observation that it deteriorates disproportionately with age, the underlying mechanisms remain unknown. We developed a rodent model of decision-making during ambiguity to examine features of experience that contribute to inference. Rats learned either a simple (positive patterning) or complex (negative patterning) instrumental discrimination between the illumination of one or two lights. During test, only one light was lit while the other relevant light was blocked from physical detection (covered by an opaque shield, rendering its status ambiguous). We found experience with the complex negative patterning discrimination was necessary for rats to behave sensitively to the ambiguous test situation. These rats behaved as if they inferred the presence of the hidden light, responding differently than when the light was explicitly absent (uncovered and unlit). Differential expression profiles of the immediate early gene cFos indicated hippocampal involvement in the inference process while localized microinfusions of the muscarinic antagonist, scopolamine, into the dorsal hippocampus caused rats to behave as if only one light was present. That is, blocking cholinergic modulation prevented the rat from inferring the presence of the hidden light. Collectively, these results suggest cholinergic modulation mediates recruitment of hippocampal processes related to past experiences and transfer of these processes to make decisions during ambiguous situations. Our results correspond with correlations observed between human brain function and inference abilities, suggesting our experiments may inform interventions to alleviate or prevent cognitive dysfunction. © 2015 Wiley Periodicals, Inc. © 2015 Wiley Periodicals, Inc.
Interpretation of correlations in clinical research.
Hung, Man; Bounsanga, Jerry; Voss, Maren Wright
2017-11-01
Critically analyzing research is a key skill in evidence-based practice and requires knowledge of research methods, results interpretation, and applications, all of which rely on a foundation based in statistics. Evidence-based practice makes high demands on trained medical professionals to interpret an ever-expanding array of research evidence. As clinical training emphasizes medical care rather than statistics, it is useful to review the basics of statistical methods and what they mean for interpreting clinical studies. We reviewed the basic concepts of correlational associations, violations of normality, unobserved variable bias, sample size, and alpha inflation. The foundations of causal inference were discussed and sound statistical analyses were examined. We discuss four ways in which correlational analysis is misused, including causal inference overreach, over-reliance on significance, alpha inflation, and sample size bias. Recent published studies in the medical field provide evidence of causal assertion overreach drawn from correlational findings. The findings present a primer on the assumptions and nature of correlational methods of analysis and urge clinicians to exercise appropriate caution as they critically analyze the evidence before them and evaluate evidence that supports practice. Critically analyzing new evidence requires statistical knowledge in addition to clinical knowledge. Studies can overstate relationships, expressing causal assertions when only correlational evidence is available. Failure to account for the effect of sample size in the analyses tends to overstate the importance of predictive variables. It is important not to overemphasize the statistical significance without consideration of effect size and whether differences could be considered clinically meaningful.
Thompson, Steven K
2006-12-01
A flexible class of adaptive sampling designs is introduced for sampling in network and spatial settings. In the designs, selections are made sequentially with a mixture distribution based on an active set that changes as the sampling progresses, using network or spatial relationships as well as sample values. The new designs have certain advantages compared with previously existing adaptive and link-tracing designs, including control over sample sizes and of the proportion of effort allocated to adaptive selections. Efficient inference involves averaging over sample paths consistent with the minimal sufficient statistic. A Markov chain resampling method makes the inference computationally feasible. The designs are evaluated in network and spatial settings using two empirical populations: a hidden human population at high risk for HIV/AIDS and an unevenly distributed bird population.
Whitmore, Roy W; Chen, Wenlin
2013-12-04
The ability to infer human exposure to substances from drinking water using monitoring data helps determine and/or refine potential risks associated with drinking water consumption. We describe a survey sampling approach and its application to an atrazine groundwater monitoring study to adequately characterize upper exposure centiles and associated confidence intervals with predetermined precision. Study design and data analysis included sampling frame definition, sample stratification, sample size determination, allocation to strata, analysis weights, and weighted population estimates. Sampling frame encompassed 15 840 groundwater community water systems (CWS) in 21 states throughout the U. S. Median, and 95th percentile atrazine concentrations were 0.0022 and 0.024 ppb, respectively, for all CWS. Statistical estimates agreed with historical monitoring results, suggesting that the study design was adequate and robust. This methodology makes no assumptions regarding the occurrence distribution (e.g., lognormality); thus analyses based on the design-induced distribution provide the most robust basis for making inferences from the sample to target population.
Automated Box-Cox Transformations for Improved Visual Encoding.
Maciejewski, Ross; Pattath, Avin; Ko, Sungahn; Hafen, Ryan; Cleveland, William S; Ebert, David S
2013-01-01
The concept of preconditioning data (utilizing a power transformation as an initial step) for analysis and visualization is well established within the statistical community and is employed as part of statistical modeling and analysis. Such transformations condition the data to various inherent assumptions of statistical inference procedures, as well as making the data more symmetric and easier to visualize and interpret. In this paper, we explore the use of the Box-Cox family of power transformations to semiautomatically adjust visual parameters. We focus on time-series scaling, axis transformations, and color binning for choropleth maps. We illustrate the usage of this transformation through various examples, and discuss the value and some issues in semiautomatically using these transformations for more effective data visualization.
Investigating Mathematics Teachers' Thoughts of Statistical Inference
ERIC Educational Resources Information Center
Yang, Kai-Lin
2012-01-01
Research on statistical cognition and application suggests that statistical inference concepts are commonly misunderstood by students and even misinterpreted by researchers. Although some research has been done on students' misunderstanding or misconceptions of confidence intervals (CIs), few studies explore either students' or mathematics…
Lessons from Inferentialism for Statistics Education
ERIC Educational Resources Information Center
Bakker, Arthur; Derry, Jan
2011-01-01
This theoretical paper relates recent interest in informal statistical inference (ISI) to the semantic theory termed inferentialism, a significant development in contemporary philosophy, which places inference at the heart of human knowing. This theory assists epistemological reflection on challenges in statistics education encountered when…
Statistical Inference and Patterns of Inequality in the Global North
ERIC Educational Resources Information Center
Moran, Timothy Patrick
2006-01-01
Cross-national inequality trends have historically been a crucial field of inquiry across the social sciences, and new methodological techniques of statistical inference have recently improved the ability to analyze these trends over time. This paper applies Monte Carlo, bootstrap inference methods to the income surveys of the Luxembourg Income…
The Dimensionality of Inference Making: Are Local and Global Inferences Distinguishable?
ERIC Educational Resources Information Center
Muijselaar, Marloes M. L.
2018-01-01
We investigated the dimensionality of inference making in samples of 4- to 9-year-olds (Ns = 416-783) to determine if local and global coherence inferences could be distinguished. In addition, we examined the validity of our experimenter-developed inference measure by comparing with three additional measures of listening comprehension. Multitrait,…
Statistical inference and Aristotle's Rhetoric.
Macdonald, Ranald R
2004-11-01
Formal logic operates in a closed system where all the information relevant to any conclusion is present, whereas this is not the case when one reasons about events and states of the world. Pollard and Richardson drew attention to the fact that the reasoning behind statistical tests does not lead to logically justifiable conclusions. In this paper statistical inferences are defended not by logic but by the standards of everyday reasoning. Aristotle invented formal logic, but argued that people mostly get at the truth with the aid of enthymemes--incomplete syllogisms which include arguing from examples, analogies and signs. It is proposed that statistical tests work in the same way--in that they are based on examples, invoke the analogy of a model and use the size of the effect under test as a sign that the chance hypothesis is unlikely. Of existing theories of statistical inference only a weak version of Fisher's takes this into account. Aristotle anticipated Fisher by producing an argument of the form that there were too many cases in which an outcome went in a particular direction for that direction to be plausibly attributed to chance. We can therefore conclude that Aristotle would have approved of statistical inference and there is a good reason for calling this form of statistical inference classical.
Software for Data Analysis with Graphical Models
NASA Technical Reports Server (NTRS)
Buntine, Wray L.; Roy, H. Scott
1994-01-01
Probabilistic graphical models are being used widely in artificial intelligence and statistics, for instance, in diagnosis and expert systems, as a framework for representing and reasoning with probabilities and independencies. They come with corresponding algorithms for performing statistical inference. This offers a unifying framework for prototyping and/or generating data analysis algorithms from graphical specifications. This paper illustrates the framework with an example and then presents some basic techniques for the task: problem decomposition and the calculation of exact Bayes factors. Other tools already developed, such as automatic differentiation, Gibbs sampling, and use of the EM algorithm, make this a broad basis for the generation of data analysis software.
CADDIS Volume 4. Data Analysis: Biological and Environmental Data Requirements
Overview of PECBO Module, using scripts to infer environmental conditions from biological observations, statistically estimating species-environment relationships, methods for inferring environmental conditions, statistical scripts in module.
Statistical methods for the beta-binomial model in teratology.
Yamamoto, E; Yanagimoto, T
1994-01-01
The beta-binomial model is widely used for analyzing teratological data involving littermates. Recent developments in statistical analyses of teratological data are briefly reviewed with emphasis on the model. For statistical inference of the parameters in the beta-binomial distribution, separation of the likelihood introduces an likelihood inference. This leads to reducing biases of estimators and also to improving accuracy of empirical significance levels of tests. Separate inference of the parameters can be conducted in a unified way. PMID:8187716
On Some Assumptions of the Null Hypothesis Statistical Testing
ERIC Educational Resources Information Center
Patriota, Alexandre Galvão
2017-01-01
Bayesian and classical statistical approaches are based on different types of logical principles. In order to avoid mistaken inferences and misguided interpretations, the practitioner must respect the inference rules embedded into each statistical method. Ignoring these principles leads to the paradoxical conclusions that the hypothesis…
Yu, Jihnhee; Yang, Luge; Vexler, Albert; Hutson, Alan D
2016-06-15
The receiver operating characteristic (ROC) curve is a popular technique with applications, for example, investigating an accuracy of a biomarker to delineate between disease and non-disease groups. A common measure of accuracy of a given diagnostic marker is the area under the ROC curve (AUC). In contrast with the AUC, the partial area under the ROC curve (pAUC) looks into the area with certain specificities (i.e., true negative rate) only, and it can be often clinically more relevant than examining the entire ROC curve. The pAUC is commonly estimated based on a U-statistic with the plug-in sample quantile, making the estimator a non-traditional U-statistic. In this article, we propose an accurate and easy method to obtain the variance of the nonparametric pAUC estimator. The proposed method is easy to implement for both one biomarker test and the comparison of two correlated biomarkers because it simply adapts the existing variance estimator of U-statistics. In this article, we show accuracy and other advantages of the proposed variance estimation method by broadly comparing it with previously existing methods. Further, we develop an empirical likelihood inference method based on the proposed variance estimator through a simple implementation. In an application, we demonstrate that, depending on the inferences by either the AUC or pAUC, we can make a different decision on a prognostic ability of a same set of biomarkers. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.
Suner, Aslı; Karakülah, Gökhan; Dicle, Oğuz
2014-01-01
Statistical hypothesis testing is an essential component of biological and medical studies for making inferences and estimations from the collected data in the study; however, the misuse of statistical tests is widely common. In order to prevent possible errors in convenient statistical test selection, it is currently possible to consult available test selection algorithms developed for various purposes. However, the lack of an algorithm presenting the most common statistical tests used in biomedical research in a single flowchart causes several problems such as shifting users among the algorithms, poor decision support in test selection and lack of satisfaction of potential users. Herein, we demonstrated a unified flowchart; covers mostly used statistical tests in biomedical domain, to provide decision aid to non-statistician users while choosing the appropriate statistical test for testing their hypothesis. We also discuss some of the findings while we are integrating the flowcharts into each other to develop a single but more comprehensive decision algorithm.
ERIC Educational Resources Information Center
Thompson, Bruce
Web-based statistical instruction, like all statistical instruction, ought to focus on teaching the essence of the research endeavor: the exercise of reflective judgment. Using the framework of the recent report of the American Psychological Association (APA) Task Force on Statistical Inference (Wilkinson and the APA Task Force on Statistical…
A refined method for multivariate meta-analysis and meta-regression
Jackson, Daniel; Riley, Richard D
2014-01-01
Making inferences about the average treatment effect using the random effects model for meta-analysis is problematic in the common situation where there is a small number of studies. This is because estimates of the between-study variance are not precise enough to accurately apply the conventional methods for testing and deriving a confidence interval for the average effect. We have found that a refined method for univariate meta-analysis, which applies a scaling factor to the estimated effects’ standard error, provides more accurate inference. We explain how to extend this method to the multivariate scenario and show that our proposal for refined multivariate meta-analysis and meta-regression can provide more accurate inferences than the more conventional approach. We explain how our proposed approach can be implemented using standard output from multivariate meta-analysis software packages and apply our methodology to two real examples. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd. PMID:23996351
Data Analysis Techniques for Physical Scientists
NASA Astrophysics Data System (ADS)
Pruneau, Claude A.
2017-10-01
Preface; How to read this book; 1. The scientific method; Part I. Foundation in Probability and Statistics: 2. Probability; 3. Probability models; 4. Classical inference I: estimators; 5. Classical inference II: optimization; 6. Classical inference III: confidence intervals and statistical tests; 7. Bayesian inference; Part II. Measurement Techniques: 8. Basic measurements; 9. Event reconstruction; 10. Correlation functions; 11. The multiple facets of correlation functions; 12. Data correction methods; Part III. Simulation Techniques: 13. Monte Carlo methods; 14. Collision and detector modeling; List of references; Index.
Overview of PECBO Module, using scripts to infer environmental conditions from biological observations, statistically estimating species-environment relationships, methods for inferring environmental conditions, statistical scripts in module.
Inference on the Strength of Balancing Selection for Epistatically Interacting Loci
Buzbas, Erkan Ozge; Joyce, Paul; Rosenberg, Noah A.
2011-01-01
Existing inference methods for estimating the strength of balancing selection in multi-locus genotypes rely on the assumption that there are no epistatic interactions between loci. Complex systems in which balancing selection is prevalent, such as sets of human immune system genes, are known to contain components that interact epistatically. Therefore, current methods may not produce reliable inference on the strength of selection at these loci. In this paper, we address this problem by presenting statistical methods that can account for epistatic interactions in making inference about balancing selection. A theoretical result due to Fearnhead (2006) is used to build a multi-locus Wright-Fisher model of balancing selection, allowing for epistatic interactions among loci. Antagonistic and synergistic types of interactions are examined. The joint posterior distribution of the selection and mutation parameters is sampled by Markov chain Monte Carlo methods, and the plausibility of models is assessed via Bayes factors. As a component of the inference process, an algorithm to generate multi-locus allele frequencies under balancing selection models with epistasis is also presented. Recent evidence on interactions among a set of human immune system genes is introduced as a motivating biological system for the epistatic model, and data on these genes are used to demonstrate the methods. PMID:21277883
40 CFR Appendix C to Part 60 - Determination of Emission Rate Change
Code of Federal Regulations, 2011 CFR
2011-07-01
... emission rate to the atmosphere. The method used is the Student's t test, commonly used to make inferences.... EC01JN92.294 3.5 Calculate the test statistic, t, using Equation 4. EC01JN92.295 4.Results 4.1If E b>,E a... occurred. Table 1 Degrees of freedom (n a=n b−2) t′ (95 percent confidence level) 2 2.920 3 2.353 4 2.132 5...
Dong, Qi; Elliott, Michael R; Raghunathan, Trivellore E
2014-06-01
Outside of the survey sampling literature, samples are often assumed to be generated by a simple random sampling process that produces independent and identically distributed (IID) samples. Many statistical methods are developed largely in this IID world. Application of these methods to data from complex sample surveys without making allowance for the survey design features can lead to erroneous inferences. Hence, much time and effort have been devoted to develop the statistical methods to analyze complex survey data and account for the sample design. This issue is particularly important when generating synthetic populations using finite population Bayesian inference, as is often done in missing data or disclosure risk settings, or when combining data from multiple surveys. By extending previous work in finite population Bayesian bootstrap literature, we propose a method to generate synthetic populations from a posterior predictive distribution in a fashion inverts the complex sampling design features and generates simple random samples from a superpopulation point of view, making adjustment on the complex data so that they can be analyzed as simple random samples. We consider a simulation study with a stratified, clustered unequal-probability of selection sample design, and use the proposed nonparametric method to generate synthetic populations for the 2006 National Health Interview Survey (NHIS), and the Medical Expenditure Panel Survey (MEPS), which are stratified, clustered unequal-probability of selection sample designs.
Dong, Qi; Elliott, Michael R.; Raghunathan, Trivellore E.
2017-01-01
Outside of the survey sampling literature, samples are often assumed to be generated by a simple random sampling process that produces independent and identically distributed (IID) samples. Many statistical methods are developed largely in this IID world. Application of these methods to data from complex sample surveys without making allowance for the survey design features can lead to erroneous inferences. Hence, much time and effort have been devoted to develop the statistical methods to analyze complex survey data and account for the sample design. This issue is particularly important when generating synthetic populations using finite population Bayesian inference, as is often done in missing data or disclosure risk settings, or when combining data from multiple surveys. By extending previous work in finite population Bayesian bootstrap literature, we propose a method to generate synthetic populations from a posterior predictive distribution in a fashion inverts the complex sampling design features and generates simple random samples from a superpopulation point of view, making adjustment on the complex data so that they can be analyzed as simple random samples. We consider a simulation study with a stratified, clustered unequal-probability of selection sample design, and use the proposed nonparametric method to generate synthetic populations for the 2006 National Health Interview Survey (NHIS), and the Medical Expenditure Panel Survey (MEPS), which are stratified, clustered unequal-probability of selection sample designs. PMID:29200608
[Bayesian approach for the cost-effectiveness evaluation of healthcare technologies].
Berchialla, Paola; Gregori, Dario; Brunello, Franco; Veltri, Andrea; Petrinco, Michele; Pagano, Eva
2009-01-01
The development of Bayesian statistical methods for the assessment of the cost-effectiveness of health care technologies is reviewed. Although many studies adopt a frequentist approach, several authors have advocated the use of Bayesian methods in health economics. Emphasis has been placed on the advantages of the Bayesian approach, which include: (i) the ability to make more intuitive and meaningful inferences; (ii) the ability to tackle complex problems, such as allowing for the inclusion of patients who generate no cost, thanks to the availability of powerful computational algorithms; (iii) the importance of a full use of quantitative and structural prior information to produce realistic inferences. Much literature comparing the cost-effectiveness of two treatments is based on the incremental cost-effectiveness ratio. However, new methods are arising with the purpose of decision making. These methods are based on a net benefits approach. In the present context, the cost-effectiveness acceptability curves have been pointed out to be intrinsically Bayesian in their formulation. They plot the probability of a positive net benefit against the threshold cost of a unit increase in efficacy.A case study is presented in order to illustrate the Bayesian statistics in the cost-effectiveness analysis. Emphasis is placed on the cost-effectiveness acceptability curves. Advantages and disadvantages of the method described in this paper have been compared to frequentist methods and discussed.
Bodner, Kimberly E; Engelhardt, Christopher R; Minshew, Nancy J; Williams, Diane L
2015-09-01
Studies investigating inferential reasoning in autism spectrum disorder (ASD) have focused on the ability to make socially-related inferences or inferences more generally. Important variables for intervention planning such as whether inferences depend on physical experiences or the nature of social information have received less consideration. A measure of bridging inferences of physical causation, mental states, and emotional states was administered to older children, adolescents, and adults with and without ASD. The ASD group had more difficulty making inferences, particularly related to emotional understanding. Results suggest that individuals with ASD may not have the stored experiential knowledge that specific inferences depend upon or have difficulties accessing relevant experiences due to linguistic limitations. Further research is needed to tease these elements apart.
The Effects of an Inference-Making Strategy Taught with and without Goal Setting
ERIC Educational Resources Information Center
Reed, Deborah K.; Lynn, Devon
2016-01-01
This quasi-experimental study examined the effects of a strategy for making text-dependent inferences--with and without embedded self-regulation skills--on the reading comprehension of 24 middle-grade students with disabilities. Classes were randomly assigned to receive the inference intervention only (IO), inference + individual goal setting…
GAMBIT: the global and modular beyond-the-standard-model inference tool
NASA Astrophysics Data System (ADS)
Athron, Peter; Balazs, Csaba; Bringmann, Torsten; Buckley, Andy; Chrząszcz, Marcin; Conrad, Jan; Cornell, Jonathan M.; Dal, Lars A.; Dickinson, Hugh; Edsjö, Joakim; Farmer, Ben; Gonzalo, Tomás E.; Jackson, Paul; Krislock, Abram; Kvellestad, Anders; Lundberg, Johan; McKay, James; Mahmoudi, Farvah; Martinez, Gregory D.; Putze, Antje; Raklev, Are; Ripken, Joachim; Rogan, Christopher; Saavedra, Aldo; Savage, Christopher; Scott, Pat; Seo, Seon-Hee; Serra, Nicola; Weniger, Christoph; White, Martin; Wild, Sebastian
2017-11-01
We describe the open-source global fitting package GAMBIT: the Global And Modular Beyond-the-Standard-Model Inference Tool. GAMBIT combines extensive calculations of observables and likelihoods in particle and astroparticle physics with a hierarchical model database, advanced tools for automatically building analyses of essentially any model, a flexible and powerful system for interfacing to external codes, a suite of different statistical methods and parameter scanning algorithms, and a host of other utilities designed to make scans faster, safer and more easily-extendible than in the past. Here we give a detailed description of the framework, its design and motivation, and the current models and other specific components presently implemented in GAMBIT. Accompanying papers deal with individual modules and present first GAMBIT results. GAMBIT can be downloaded from gambit.hepforge.org.
Statistics for nuclear engineers and scientists. Part 1. Basic statistical inference
DOE Office of Scientific and Technical Information (OSTI.GOV)
Beggs, W.J.
1981-02-01
This report is intended for the use of engineers and scientists working in the nuclear industry, especially at the Bettis Atomic Power Laboratory. It serves as the basis for several Bettis in-house statistics courses. The objectives of the report are to introduce the reader to the language and concepts of statistics and to provide a basic set of techniques to apply to problems of the collection and analysis of data. Part 1 covers subjects of basic inference. The subjects include: descriptive statistics; probability; simple inference for normally distributed populations, and for non-normal populations as well; comparison of two populations; themore » analysis of variance; quality control procedures; and linear regression analysis.« less
Introducing Statistical Inference to Biology Students through Bootstrapping and Randomization
ERIC Educational Resources Information Center
Lock, Robin H.; Lock, Patti Frazer
2008-01-01
Bootstrap methods and randomization tests are increasingly being used as alternatives to standard statistical procedures in biology. They also serve as an effective introduction to the key ideas of statistical inference in introductory courses for biology students. We discuss the use of such simulation based procedures in an integrated curriculum…
Developing Young Children's Emergent Inferential Practices in Statistics
ERIC Educational Resources Information Center
Makar, Katie
2016-01-01
Informal statistical inference has now been researched at all levels of schooling and initial tertiary study. Work in informal statistical inference is least understood in the early years, where children have had little if any exposure to data handling. A qualitative study in Australia was carried out through a series of teaching experiments with…
UniEnt: uniform entropy model for the dynamics of a neuronal population
NASA Astrophysics Data System (ADS)
Hernandez Lahme, Damian; Nemenman, Ilya
Sensory information and motor responses are encoded in the brain in a collective spiking activity of a large number of neurons. Understanding the neural code requires inferring statistical properties of such collective dynamics from multicellular neurophysiological recordings. Questions of whether synchronous activity or silence of multiple neurons carries information about the stimuli or the motor responses are especially interesting. Unfortunately, detection of such high order statistical interactions from data is especially challenging due to the exponentially large dimensionality of the state space of neural collectives. Here we present UniEnt, a method for the inference of strengths of multivariate neural interaction patterns. The method is based on the Bayesian prior that makes no assumptions (uniform a priori expectations) about the value of the entropy of the observed multivariate neural activity, in contrast to popular approaches that maximize this entropy. We then study previously published multi-electrode recordings data from salamander retina, exposing the relevance of higher order neural interaction patterns for information encoding in this system. This work was supported in part by Grants JSMF/220020321 and NSF/IOS/1208126.
A Predictive Approach to Network Reverse-Engineering
NASA Astrophysics Data System (ADS)
Wiggins, Chris
2005-03-01
A central challenge of systems biology is the ``reverse engineering" of transcriptional networks: inferring which genes exert regulatory control over which other genes. Attempting such inference at the genomic scale has only recently become feasible, via data-intensive biological innovations such as DNA microrrays (``DNA chips") and the sequencing of whole genomes. In this talk we present a predictive approach to network reverse-engineering, in which we integrate DNA chip data and sequence data to build a model of the transcriptional network of the yeast S. cerevisiae capable of predicting the response of genes in unseen experiments. The technique can also be used to extract ``motifs,'' sequence elements which act as binding sites for regulatory proteins. We validate by a number of approaches and present comparison of theoretical prediction vs. experimental data, along with biological interpretations of the resulting model. En route, we will illustrate some basic notions in statistical learning theory (fitting vs. over-fitting; cross- validation; assessing statistical significance), highlighting ways in which physicists can make a unique contribution in data- driven approaches to reverse engineering.
Experimental statistics for biological sciences.
Bang, Heejung; Davidian, Marie
2010-01-01
In this chapter, we cover basic and fundamental principles and methods in statistics - from "What are Data and Statistics?" to "ANOVA and linear regression," which are the basis of any statistical thinking and undertaking. Readers can easily find the selected topics in most introductory statistics textbooks, but we have tried to assemble and structure them in a succinct and reader-friendly manner in a stand-alone chapter. This text has long been used in real classroom settings for both undergraduate and graduate students who do or do not major in statistical sciences. We hope that from this chapter, readers would understand the key statistical concepts and terminologies, how to design a study (experimental or observational), how to analyze the data (e.g., describe the data and/or estimate the parameter(s) and make inference), and how to interpret the results. This text would be most useful if it is used as a supplemental material, while the readers take their own statistical courses or it would serve as a great reference text associated with a manual for any statistical software as a self-teaching guide.
A Quantum Probability Model of Causal Reasoning
Trueblood, Jennifer S.; Busemeyer, Jerome R.
2012-01-01
People can often outperform statistical methods and machine learning algorithms in situations that involve making inferences about the relationship between causes and effects. While people are remarkably good at causal reasoning in many situations, there are several instances where they deviate from expected responses. This paper examines three situations where judgments related to causal inference problems produce unexpected results and describes a quantum inference model based on the axiomatic principles of quantum probability theory that can explain these effects. Two of the three phenomena arise from the comparison of predictive judgments (i.e., the conditional probability of an effect given a cause) with diagnostic judgments (i.e., the conditional probability of a cause given an effect). The third phenomenon is a new finding examining order effects in predictive causal judgments. The quantum inference model uses the notion of incompatibility among different causes to account for all three phenomena. Psychologically, the model assumes that individuals adopt different points of view when thinking about different causes. The model provides good fits to the data and offers a coherent account for all three causal reasoning effects thus proving to be a viable new candidate for modeling human judgment. PMID:22593747
Bal, Mert; Amasyali, M Fatih; Sever, Hayri; Kose, Guven; Demirhan, Ayse
2014-01-01
The importance of the decision support systems is increasingly supporting the decision making process in cases of uncertainty and the lack of information and they are widely used in various fields like engineering, finance, medicine, and so forth, Medical decision support systems help the healthcare personnel to select optimal method during the treatment of the patients. Decision support systems are intelligent software systems that support decision makers on their decisions. The design of decision support systems consists of four main subjects called inference mechanism, knowledge-base, explanation module, and active memory. Inference mechanism constitutes the basis of decision support systems. There are various methods that can be used in these mechanisms approaches. Some of these methods are decision trees, artificial neural networks, statistical methods, rule-based methods, and so forth. In decision support systems, those methods can be used separately or a hybrid system, and also combination of those methods. In this study, synthetic data with 10, 100, 1000, and 2000 records have been produced to reflect the probabilities on the ALARM network. The accuracy of 11 machine learning methods for the inference mechanism of medical decision support system is compared on various data sets.
Liu, Fang; Eugenio, Evercita C
2018-04-01
Beta regression is an increasingly popular statistical technique in medical research for modeling of outcomes that assume values in (0, 1), such as proportions and patient reported outcomes. When outcomes take values in the intervals [0,1), (0,1], or [0,1], zero-or-one-inflated beta (zoib) regression can be used. We provide a thorough review on beta regression and zoib regression in the modeling, inferential, and computational aspects via the likelihood-based and Bayesian approaches. We demonstrate the statistical and practical importance of correctly modeling the inflation at zero/one rather than ad hoc replacing them with values close to zero/one via simulation studies; the latter approach can lead to biased estimates and invalid inferences. We show via simulation studies that the likelihood-based approach is computationally faster in general than MCMC algorithms used in the Bayesian inferences, but runs the risk of non-convergence, large biases, and sensitivity to starting values in the optimization algorithm especially with clustered/correlated data, data with sparse inflation at zero and one, and data that warrant regularization of the likelihood. The disadvantages of the regular likelihood-based approach make the Bayesian approach an attractive alternative in these cases. Software packages and tools for fitting beta and zoib regressions in both the likelihood-based and Bayesian frameworks are also reviewed.
Bayesian Reconstruction of Disease Outbreaks by Combining Epidemiologic and Genomic Data
Jombart, Thibaut; Cori, Anne; Didelot, Xavier; Cauchemez, Simon; Fraser, Christophe; Ferguson, Neil
2014-01-01
Recent years have seen progress in the development of statistically rigorous frameworks to infer outbreak transmission trees (“who infected whom”) from epidemiological and genetic data. Making use of pathogen genome sequences in such analyses remains a challenge, however, with a variety of heuristic approaches having been explored to date. We introduce a statistical method exploiting both pathogen sequences and collection dates to unravel the dynamics of densely sampled outbreaks. Our approach identifies likely transmission events and infers dates of infections, unobserved cases and separate introductions of the disease. It also proves useful for inferring numbers of secondary infections and identifying heterogeneous infectivity and super-spreaders. After testing our approach using simulations, we illustrate the method with the analysis of the beginning of the 2003 Singaporean outbreak of Severe Acute Respiratory Syndrome (SARS), providing new insights into the early stage of this epidemic. Our approach is the first tool for disease outbreak reconstruction from genetic data widely available as free software, the R package outbreaker. It is applicable to various densely sampled epidemics, and improves previous approaches by detecting unobserved and imported cases, as well as allowing multiple introductions of the pathogen. Because of its generality, we believe this method will become a tool of choice for the analysis of densely sampled disease outbreaks, and will form a rigorous framework for subsequent methodological developments. PMID:24465202
Models for inference in dynamic metacommunity systems
Dorazio, Robert M.; Kery, Marc; Royle, J. Andrew; Plattner, Matthias
2010-01-01
A variety of processes are thought to be involved in the formation and dynamics of species assemblages. For example, various metacommunity theories are based on differences in the relative contributions of dispersal of species among local communities and interactions of species within local communities. Interestingly, metacommunity theories continue to be advanced without much empirical validation. Part of the problem is that statistical models used to analyze typical survey data either fail to specify ecological processes with sufficient complexity or they fail to account for errors in detection of species during sampling. In this paper, we describe a statistical modeling framework for the analysis of metacommunity dynamics that is based on the idea of adopting a unified approach, multispecies occupancy modeling, for computing inferences about individual species, local communities of species, or the entire metacommunity of species. This approach accounts for errors in detection of species during sampling and also allows different metacommunity paradigms to be specified in terms of species- and location-specific probabilities of occurrence, extinction, and colonization: all of which are estimable. In addition, this approach can be used to address inference problems that arise in conservation ecology, such as predicting temporal and spatial changes in biodiversity for use in making conservation decisions. To illustrate, we estimate changes in species composition associated with the species-specific phenologies of flight patterns of butterflies in Switzerland for the purpose of estimating regional differences in biodiversity.
Mutual Information in Frequency and Its Application to Measure Cross-Frequency Coupling in Epilepsy
NASA Astrophysics Data System (ADS)
Malladi, Rakesh; Johnson, Don H.; Kalamangalam, Giridhar P.; Tandon, Nitin; Aazhang, Behnaam
2018-06-01
We define a metric, mutual information in frequency (MI-in-frequency), to detect and quantify the statistical dependence between different frequency components in the data, referred to as cross-frequency coupling and apply it to electrophysiological recordings from the brain to infer cross-frequency coupling. The current metrics used to quantify the cross-frequency coupling in neuroscience cannot detect if two frequency components in non-Gaussian brain recordings are statistically independent or not. Our MI-in-frequency metric, based on Shannon's mutual information between the Cramer's representation of stochastic processes, overcomes this shortcoming and can detect statistical dependence in frequency between non-Gaussian signals. We then describe two data-driven estimators of MI-in-frequency: one based on kernel density estimation and the other based on the nearest neighbor algorithm and validate their performance on simulated data. We then use MI-in-frequency to estimate mutual information between two data streams that are dependent across time, without making any parametric model assumptions. Finally, we use the MI-in- frequency metric to investigate the cross-frequency coupling in seizure onset zone from electrocorticographic recordings during seizures. The inferred cross-frequency coupling characteristics are essential to optimize the spatial and spectral parameters of electrical stimulation based treatments of epilepsy.
Using Alien Coins to Test Whether Simple Inference Is Bayesian
ERIC Educational Resources Information Center
Cassey, Peter; Hawkins, Guy E.; Donkin, Chris; Brown, Scott D.
2016-01-01
Reasoning and inference are well-studied aspects of basic cognition that have been explained as statistically optimal Bayesian inference. Using a simplified experimental design, we conducted quantitative comparisons between Bayesian inference and human inference at the level of individuals. In 3 experiments, with more than 13,000 participants, we…
Ryu, Jihye; Torres, Elizabeth B.
2018-01-01
The field of enacted/embodied cognition has emerged as a contemporary attempt to connect the mind and body in the study of cognition. However, there has been a paucity of methods that enable a multi-layered approach tapping into different levels of functionality within the nervous systems (e.g., continuously capturing in tandem multi-modal biophysical signals in naturalistic settings). The present study introduces a new theoretical and statistical framework to characterize the influences of cognitive demands on biophysical rhythmic signals harnessed from deliberate, spontaneous and autonomic activities. In this study, nine participants performed a basic pointing task to communicate a decision while they were exposed to different levels of cognitive load. Within these decision-making contexts, we examined the moment-by-moment fluctuations in the peak amplitude and timing of the biophysical time series data (e.g., continuous waveforms extracted from hand kinematics and heart signals). These spike-trains data offered high statistical power for personalized empirical statistical estimation and were well-characterized by a Gamma process. Our approach enabled the identification of different empirically estimated families of probability distributions to facilitate inference regarding the continuous physiological phenomena underlying cognitively driven decision-making. We found that the same pointing task revealed shifts in the probability distribution functions (PDFs) of the hand kinematic signals under study and were accompanied by shifts in the signatures of the heart inter-beat-interval timings. Within the time scale of an experimental session, marked changes in skewness and dispersion of the distributions were tracked on the Gamma parameter plane with 95% confidence. The results suggest that traditional theoretical assumptions of stationarity and normality in biophysical data from the nervous systems are incongruent with the true statistical nature of empirical data. This work offers a unifying platform for personalized statistical inference that goes far beyond those used in conventional studies, often assuming a “one size fits all model” on data drawn from discrete events such as mouse clicks, and observations that leave out continuously co-occurring spontaneous activity taking place largely beneath awareness. PMID:29681805
ERIC Educational Resources Information Center
Bodner, Kimberly E.; Engelhardt, Christopher R.; Minshew, Nancy J.; Williams, Diane L.
2015-01-01
Studies investigating inferential reasoning in autism spectrum disorder (ASD) have focused on the ability to make socially-related inferences or inferences more generally. Important variables for intervention planning such as whether inferences depend on physical experiences or the nature of social information have received less consideration. A…
Freed, Jenny; Cain, Kate
2017-01-01
Comprehension is critical for classroom learning and educational success. Inferences are integral to good comprehension: successful comprehension requires the listener to generate local coherence inferences, which involve integrating information between clauses, and global coherence inferences, which involve integrating textual information with background knowledge to infer motivations, themes, etc. A central priority for the diagnosis of comprehension difficulties and our understanding of why these difficulties arise is the development of valid assessment instruments. We explored typically developing children's ability to make local and global coherence inferences using a novel assessment of listening comprehension. The aims were to determine whether children were more likely to make the target inferences when these were asked during story presentation versus after presentation of the story, and whether there were any age differences between conditions. Children in Years 3 (n = 29) and 5 (n = 31) listened to short stories presented either in a segmented format, in which questions to assess local and global coherence inferences were asked at specific points during story presentation, or in a whole format, when all the questions were asked after the story had been presented. There was developmental progression between age groups for both types of inference question. Children also scored higher on the global coherence inference questions than the local coherence inference questions. There was a benefit of the segmented format for younger children, particularly for the local inference questions. The results suggest that children are more likely to make target inferences if prompted during presentation of the story, and that this format is particularly facilitative for younger children and for local coherence inferences. This has implications for the design of comprehension assessments as well as for supporting children with comprehension difficulties in the classroom. © 2016 Royal College of Speech and Language Therapists.
Hayat, Matthew J.; Powell, Amanda; Johnson, Tessa; Cadwell, Betsy L.
2017-01-01
Statistical literacy and knowledge is needed to read and understand the public health literature. The purpose of this study was to quantify basic and advanced statistical methods used in public health research. We randomly sampled 216 published articles from seven top tier general public health journals. Studies were reviewed by two readers and a standardized data collection form completed for each article. Data were analyzed with descriptive statistics and frequency distributions. Results were summarized for statistical methods used in the literature, including descriptive and inferential statistics, modeling, advanced statistical techniques, and statistical software used. Approximately 81.9% of articles reported an observational study design and 93.1% of articles were substantively focused. Descriptive statistics in table or graphical form were reported in more than 95% of the articles, and statistical inference reported in more than 76% of the studies reviewed. These results reveal the types of statistical methods currently used in the public health literature. Although this study did not obtain information on what should be taught, information on statistical methods being used is useful for curriculum development in graduate health sciences education, as well as making informed decisions about continuing education for public health professionals. PMID:28591190
Hayat, Matthew J; Powell, Amanda; Johnson, Tessa; Cadwell, Betsy L
2017-01-01
Statistical literacy and knowledge is needed to read and understand the public health literature. The purpose of this study was to quantify basic and advanced statistical methods used in public health research. We randomly sampled 216 published articles from seven top tier general public health journals. Studies were reviewed by two readers and a standardized data collection form completed for each article. Data were analyzed with descriptive statistics and frequency distributions. Results were summarized for statistical methods used in the literature, including descriptive and inferential statistics, modeling, advanced statistical techniques, and statistical software used. Approximately 81.9% of articles reported an observational study design and 93.1% of articles were substantively focused. Descriptive statistics in table or graphical form were reported in more than 95% of the articles, and statistical inference reported in more than 76% of the studies reviewed. These results reveal the types of statistical methods currently used in the public health literature. Although this study did not obtain information on what should be taught, information on statistical methods being used is useful for curriculum development in graduate health sciences education, as well as making informed decisions about continuing education for public health professionals.
A normative inference approach for optimal sample sizes in decisions from experience
Ostwald, Dirk; Starke, Ludger; Hertwig, Ralph
2015-01-01
“Decisions from experience” (DFE) refers to a body of work that emerged in research on behavioral decision making over the last decade. One of the major experimental paradigms employed to study experience-based choice is the “sampling paradigm,” which serves as a model of decision making under limited knowledge about the statistical structure of the world. In this paradigm respondents are presented with two payoff distributions, which, in contrast to standard approaches in behavioral economics, are specified not in terms of explicit outcome-probability information, but by the opportunity to sample outcomes from each distribution without economic consequences. Participants are encouraged to explore the distributions until they feel confident enough to decide from which they would prefer to draw from in a final trial involving real monetary payoffs. One commonly employed measure to characterize the behavior of participants in the sampling paradigm is the sample size, that is, the number of outcome draws which participants choose to obtain from each distribution prior to terminating sampling. A natural question that arises in this context concerns the “optimal” sample size, which could be used as a normative benchmark to evaluate human sampling behavior in DFE. In this theoretical study, we relate the DFE sampling paradigm to the classical statistical decision theoretic literature and, under a probabilistic inference assumption, evaluate optimal sample sizes for DFE. In our treatment we go beyond analytically established results by showing how the classical statistical decision theoretic framework can be used to derive optimal sample sizes under arbitrary, but numerically evaluable, constraints. Finally, we critically evaluate the value of deriving optimal sample sizes under this framework as testable predictions for the experimental study of sampling behavior in DFE. PMID:26441720
Using genetic data to strengthen causal inference in observational research.
Pingault, Jean-Baptiste; O'Reilly, Paul F; Schoeler, Tabea; Ploubidis, George B; Rijsdijk, Frühling; Dudbridge, Frank
2018-06-05
Causal inference is essential across the biomedical, behavioural and social sciences.By progressing from confounded statistical associations to evidence of causal relationships, causal inference can reveal complex pathways underlying traits and diseases and help to prioritize targets for intervention. Recent progress in genetic epidemiology - including statistical innovation, massive genotyped data sets and novel computational tools for deep data mining - has fostered the intense development of methods exploiting genetic data and relatedness to strengthen causal inference in observational research. In this Review, we describe how such genetically informed methods differ in their rationale, applicability and inherent limitations and outline how they should be integrated in the future to offer a rich causal inference toolbox.
Brannigan, V M; Bier, V M; Berg, C
1992-09-01
Toxic torts are product liability cases dealing with alleged injuries due to chemical or biological hazards such as radiation, thalidomide, or Agent Orange. Toxic tort cases typically rely more heavily than other product liability cases on indirect or statistical proof of injury. There have been numerous theoretical analyses of statistical proof of injury in toxic tort cases. However, there have been only a handful of actual legal decisions regarding the use of such statistical evidence, and most of those decisions have been inconclusive. Recently, a major case from the Fifth Circuit, involving allegations that Benedectin (a morning sickness drug) caused birth defects, was decided entirely on the basis of statistical inference. This paper examines both the conceptual basis of that decision, and also the relationships among statistical inference, scientific evidence, and the rules of product liability in general.
Deep Learning for Population Genetic Inference.
Sheehan, Sara; Song, Yun S
2016-03-01
Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme.
Deep Learning for Population Genetic Inference
Sheehan, Sara; Song, Yun S.
2016-01-01
Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme. PMID:27018908
Testing AGN unification via inference from large catalogs
NASA Astrophysics Data System (ADS)
Nikutta, Robert; Ivezic, Zeljko; Elitzur, Moshe; Nenkova, Maia
2018-01-01
Source orientation and clumpiness of the central dust are the main factors in AGN classification. Type-1 QSOs are easy to observe and large samples are available (e.g. in SDSS), but obscured type-2 AGN are dimmer and redder as our line of sight is more obscured, making it difficult to obtain a complete sample. WISE has found up to a million QSOs. With only 4 bands and a relatively small aperture the analysis of individual sources is challenging, but the large sample allows inference of bulk properties at a very significant level.CLUMPY (www.clumpy.org) is arguably the most popular database of AGN torus SEDs. We model the ensemble properties of the entire WISE AGN content using regularized linear regression, with orientation-dependent CLUMPY color-color-magnitude (CCM) tracks as basis functions. We can reproduce the observed number counts per CCM bin with percent-level accuracy, and simultaneously infer the probability distributions of all torus parameters, redshifts, additional SED components, and identify type-1/2 AGN populations through their IR properties alone. We increase the statistical power of our AGN unification tests even further, by adding other datasets as axes in the regression problem. To this end, we make use of the NOAO Data Lab (datalab.noao.edu), which hosts several high-level large datasets and provides very powerful tools for handling large data, e.g. cross-matched catalogs, fast remote queries, etc.
How social cognition can inform social decision making.
Lee, Victoria K; Harris, Lasana T
2013-12-25
Social decision-making is often complex, requiring the decision-maker to make inferences of others' mental states in addition to engaging traditional decision-making processes like valuation and reward processing. A growing body of research in neuroeconomics has examined decision-making involving social and non-social stimuli to explore activity in brain regions such as the striatum and prefrontal cortex, largely ignoring the power of the social context. Perhaps more complex processes may influence decision-making in social vs. non-social contexts. Years of social psychology and social neuroscience research have documented a multitude of processes (e.g., mental state inferences, impression formation, spontaneous trait inferences) that occur upon viewing another person. These processes rely on a network of brain regions including medial prefrontal cortex (MPFC), superior temporal sulcus (STS), temporal parietal junction, and precuneus among others. Undoubtedly, these social cognition processes affect social decision-making since mental state inferences occur spontaneously and automatically. Few studies have looked at how these social inference processes affect decision-making in a social context despite the capability of these inferences to serve as predictions that can guide future decision-making. Here we review and integrate the person perception and decision-making literatures to understand how social cognition can inform the study of social decision-making in a way that is consistent with both literatures. We identify gaps in both literatures-while behavioral economics largely ignores social processes that spontaneously occur upon viewing another person, social psychology has largely failed to talk about the implications of social cognition processes in an economic decision-making context-and examine the benefits of integrating social psychological theory with behavioral economic theory.
How social cognition can inform social decision making
Lee, Victoria K.; Harris, Lasana T.
2013-01-01
Social decision-making is often complex, requiring the decision-maker to make inferences of others' mental states in addition to engaging traditional decision-making processes like valuation and reward processing. A growing body of research in neuroeconomics has examined decision-making involving social and non-social stimuli to explore activity in brain regions such as the striatum and prefrontal cortex, largely ignoring the power of the social context. Perhaps more complex processes may influence decision-making in social vs. non-social contexts. Years of social psychology and social neuroscience research have documented a multitude of processes (e.g., mental state inferences, impression formation, spontaneous trait inferences) that occur upon viewing another person. These processes rely on a network of brain regions including medial prefrontal cortex (MPFC), superior temporal sulcus (STS), temporal parietal junction, and precuneus among others. Undoubtedly, these social cognition processes affect social decision-making since mental state inferences occur spontaneously and automatically. Few studies have looked at how these social inference processes affect decision-making in a social context despite the capability of these inferences to serve as predictions that can guide future decision-making. Here we review and integrate the person perception and decision-making literatures to understand how social cognition can inform the study of social decision-making in a way that is consistent with both literatures. We identify gaps in both literatures—while behavioral economics largely ignores social processes that spontaneously occur upon viewing another person, social psychology has largely failed to talk about the implications of social cognition processes in an economic decision-making context—and examine the benefits of integrating social psychological theory with behavioral economic theory. PMID:24399928
The Problem of Auto-Correlation in Parasitology
Pollitt, Laura C.; Reece, Sarah E.; Mideo, Nicole; Nussey, Daniel H.; Colegrave, Nick
2012-01-01
Explaining the contribution of host and pathogen factors in driving infection dynamics is a major ambition in parasitology. There is increasing recognition that analyses based on single summary measures of an infection (e.g., peak parasitaemia) do not adequately capture infection dynamics and so, the appropriate use of statistical techniques to analyse dynamics is necessary to understand infections and, ultimately, control parasites. However, the complexities of within-host environments mean that tracking and analysing pathogen dynamics within infections and among hosts poses considerable statistical challenges. Simple statistical models make assumptions that will rarely be satisfied in data collected on host and parasite parameters. In particular, model residuals (unexplained variance in the data) should not be correlated in time or space. Here we demonstrate how failure to account for such correlations can result in incorrect biological inference from statistical analysis. We then show how mixed effects models can be used as a powerful tool to analyse such repeated measures data in the hope that this will encourage better statistical practices in parasitology. PMID:22511865
Difference to Inference: teaching logical and statistical reasoning through on-line interactivity.
Malloy, T E
2001-05-01
Difference to Inference is an on-line JAVA program that simulates theory testing and falsification through research design and data collection in a game format. The program, based on cognitive and epistemological principles, is designed to support learning of the thinking skills underlying deductive and inductive logic and statistical reasoning. Difference to Inference has database connectivity so that game scores can be counted as part of course grades.
Design of experiments and data analysis challenges in calibration for forensics applications
Anderson-Cook, Christine M.; Burr, Thomas L.; Hamada, Michael S.; ...
2015-07-15
Forensic science aims to infer characteristics of source terms using measured observables. Our focus is on statistical design of experiments and data analysis challenges arising in nuclear forensics. More specifically, we focus on inferring aspects of experimental conditions (of a process to produce product Pu oxide powder), such as temperature, nitric acid concentration, and Pu concentration, using measured features of the product Pu oxide powder. The measured features, Y, include trace chemical concentrations and particle morphology such as particle size and shape of the produced Pu oxide power particles. Making inferences about the nature of inputs X that were usedmore » to create nuclear materials having particular characteristics, Y, is an inverse problem. Therefore, statistical analysis can be used to identify the best set (or sets) of Xs for a new set of observed responses Y. One can fit a model (or models) such as Υ = f(Χ) + error, for each of the responses, based on a calibration experiment and then “invert” to solve for the best set of Xs for a new set of Ys. This perspectives paper uses archived experimental data to consider aspects of data collection and experiment design for the calibration data to maximize the quality of the predicted Ys in the forward models; that is, we assume that well-estimated forward models are effective in the inverse problem. In addition, we consider how to identify a best solution for the inferred X, and evaluate the quality of the result and its robustness to a variety of initial assumptions, and different correlation structures between the responses. In addition, we also briefly review recent advances in metrology issues related to characterizing particle morphology measurements used in the response vector, Y.« less
Design of experiments and data analysis challenges in calibration for forensics applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Anderson-Cook, Christine M.; Burr, Thomas L.; Hamada, Michael S.
Forensic science aims to infer characteristics of source terms using measured observables. Our focus is on statistical design of experiments and data analysis challenges arising in nuclear forensics. More specifically, we focus on inferring aspects of experimental conditions (of a process to produce product Pu oxide powder), such as temperature, nitric acid concentration, and Pu concentration, using measured features of the product Pu oxide powder. The measured features, Y, include trace chemical concentrations and particle morphology such as particle size and shape of the produced Pu oxide power particles. Making inferences about the nature of inputs X that were usedmore » to create nuclear materials having particular characteristics, Y, is an inverse problem. Therefore, statistical analysis can be used to identify the best set (or sets) of Xs for a new set of observed responses Y. One can fit a model (or models) such as Υ = f(Χ) + error, for each of the responses, based on a calibration experiment and then “invert” to solve for the best set of Xs for a new set of Ys. This perspectives paper uses archived experimental data to consider aspects of data collection and experiment design for the calibration data to maximize the quality of the predicted Ys in the forward models; that is, we assume that well-estimated forward models are effective in the inverse problem. In addition, we consider how to identify a best solution for the inferred X, and evaluate the quality of the result and its robustness to a variety of initial assumptions, and different correlation structures between the responses. In addition, we also briefly review recent advances in metrology issues related to characterizing particle morphology measurements used in the response vector, Y.« less
Are great apes able to reason from multi-item samples to populations of food items?
Eckert, Johanna; Rakoczy, Hannes; Call, Josep
2017-10-01
Inductive learning from limited observations is a cognitive capacity of fundamental importance. In humans, it is underwritten by our intuitive statistics, the ability to draw systematic inferences from populations to randomly drawn samples and vice versa. According to recent research in cognitive development, human intuitive statistics develops early in infancy. Recent work in comparative psychology has produced first evidence for analogous cognitive capacities in great apes who flexibly drew inferences from populations to samples. In the present study, we investigated whether great apes (Pongo abelii, Pan troglodytes, Pan paniscus, Gorilla gorilla) also draw inductive inferences in the opposite direction, from samples to populations. In two experiments, apes saw an experimenter randomly drawing one multi-item sample from each of two populations of food items. The populations differed in their proportion of preferred to neutral items (24:6 vs. 6:24) but apes saw only the distribution of food items in the samples that reflected the distribution of the respective populations (e.g., 4:1 vs. 1:4). Based on this observation they were then allowed to choose between the two populations. Results show that apes seemed to make inferences from samples to populations and thus chose the population from which the more favorable (4:1) sample was drawn in Experiment 1. In this experiment, the more attractive sample not only contained proportionally but also absolutely more preferred food items than the less attractive sample. Experiment 2, however, revealed that when absolute and relative frequencies were disentangled, apes performed at chance level. Whether these limitations in apes' performance reflect true limits of cognitive competence or merely performance limitations due to accessory task demands is still an open question. © 2017 Wiley Periodicals, Inc.
Honda, Hidehito; Matsuka, Toshihiko; Ueda, Kazuhiro
2017-05-01
Some researchers on binary choice inference have argued that people make inferences based on simple heuristics, such as recognition, fluency, or familiarity. Others have argued that people make inferences based on available knowledge. To examine the boundary between heuristic and knowledge usage, we examine binary choice inference processes in terms of attribute substitution in heuristic use (Kahneman & Frederick, 2005). In this framework, it is predicted that people will rely on heuristic or knowledge-based inference depending on the subjective difficulty of the inference task. We conducted competitive tests of binary choice inference models representing simple heuristics (fluency and familiarity heuristics) and knowledge-based inference models. We found that a simple heuristic model (especially a familiarity heuristic model) explained inference patterns for subjectively difficult inference tasks, and that a knowledge-based inference model explained subjectively easy inference tasks. These results were consistent with the predictions of the attribute substitution framework. Issues on usage of simple heuristics and psychological processes are discussed. Copyright © 2016 Cognitive Science Society, Inc.
A Framework for Thinking about Informal Statistical Inference
ERIC Educational Resources Information Center
Makar, Katie; Rubin, Andee
2009-01-01
Informal inferential reasoning has shown some promise in developing students' deeper understanding of statistical processes. This paper presents a framework to think about three key principles of informal inference--generalizations "beyond the data," probabilistic language, and data as evidence. The authors use primary school classroom…
Sensitivity to the Sampling Process Emerges From the Principle of Efficiency.
Jara-Ettinger, Julian; Sun, Felix; Schulz, Laura; Tenenbaum, Joshua B
2018-05-01
Humans can seamlessly infer other people's preferences, based on what they do. Broadly, two types of accounts have been proposed to explain different aspects of this ability. The first account focuses on spatial information: Agents' efficient navigation in space reveals what they like. The second account focuses on statistical information: Uncommon choices reveal stronger preferences. Together, these two lines of research suggest that we have two distinct capacities for inferring preferences. Here we propose that this is not the case, and that spatial-based and statistical-based preference inferences can be explained by the assumption that agents are efficient alone. We show that people's sensitivity to spatial and statistical information when they infer preferences is best predicted by a computational model of the principle of efficiency, and that this model outperforms dual-system models, even when the latter are fit to participant judgments. Our results suggest that, as adults, a unified understanding of agency under the principle of efficiency underlies our ability to infer preferences. Copyright © 2018 Cognitive Science Society, Inc.
Inference with viral quasispecies diversity indices: clonal and NGS approaches.
Gregori, Josep; Salicrú, Miquel; Domingo, Esteban; Sanchez, Alex; Esteban, Juan I; Rodríguez-Frías, Francisco; Quer, Josep
2014-04-15
Given the inherent dynamics of a viral quasispecies, we are often interested in the comparison of diversity indices of sequential samples of a patient, or in the comparison of diversity indices of virus in groups of patients in a treated versus control design. It is then important to make sure that the diversity measures from each sample may be compared with no bias and within a consistent statistical framework. In the present report, we review some indices often used as measures for viral quasispecies complexity and provide means for statistical inference, applying procedures taken from the ecology field. In particular, we examine the Shannon entropy and the mutation frequency, and we discuss the appropriateness of different normalization methods of the Shannon entropy found in the literature. By taking amplicons ultra-deep pyrosequencing (UDPS) raw data as a surrogate of a real hepatitis C virus viral population, we study through in-silico sampling the statistical properties of these indices under two methods of viral quasispecies sampling, classical cloning followed by Sanger sequencing (CCSS) and next-generation sequencing (NGS) such as UDPS. We propose solutions specific to each of the two sampling methods-CCSS and NGS-to guarantee statistically conforming conclusions as free of bias as possible. josep.gregori@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
An accessible method for implementing hierarchical models with spatio-temporal abundance data
Ross, Beth E.; Hooten, Melvin B.; Koons, David N.
2012-01-01
A common goal in ecology and wildlife management is to determine the causes of variation in population dynamics over long periods of time and across large spatial scales. Many assumptions must nevertheless be overcome to make appropriate inference about spatio-temporal variation in population dynamics, such as autocorrelation among data points, excess zeros, and observation error in count data. To address these issues, many scientists and statisticians have recommended the use of Bayesian hierarchical models. Unfortunately, hierarchical statistical models remain somewhat difficult to use because of the necessary quantitative background needed to implement them, or because of the computational demands of using Markov Chain Monte Carlo algorithms to estimate parameters. Fortunately, new tools have recently been developed that make it more feasible for wildlife biologists to fit sophisticated hierarchical Bayesian models (i.e., Integrated Nested Laplace Approximation, ‘INLA’). We present a case study using two important game species in North America, the lesser and greater scaup, to demonstrate how INLA can be used to estimate the parameters in a hierarchical model that decouples observation error from process variation, and accounts for unknown sources of excess zeros as well as spatial and temporal dependence in the data. Ultimately, our goal was to make unbiased inference about spatial variation in population trends over time.
Statistical inference for extended or shortened phase II studies based on Simon's two-stage designs.
Zhao, Junjun; Yu, Menggang; Feng, Xi-Ping
2015-06-07
Simon's two-stage designs are popular choices for conducting phase II clinical trials, especially in the oncology trials to reduce the number of patients placed on ineffective experimental therapies. Recently Koyama and Chen (2008) discussed how to conduct proper inference for such studies because they found that inference procedures used with Simon's designs almost always ignore the actual sampling plan used. In particular, they proposed an inference method for studies when the actual second stage sample sizes differ from planned ones. We consider an alternative inference method based on likelihood ratio. In particular, we order permissible sample paths under Simon's two-stage designs using their corresponding conditional likelihood. In this way, we can calculate p-values using the common definition: the probability of obtaining a test statistic value at least as extreme as that observed under the null hypothesis. In addition to providing inference for a couple of scenarios where Koyama and Chen's method can be difficult to apply, the resulting estimate based on our method appears to have certain advantage in terms of inference properties in many numerical simulations. It generally led to smaller biases and narrower confidence intervals while maintaining similar coverages. We also illustrated the two methods in a real data setting. Inference procedures used with Simon's designs almost always ignore the actual sampling plan. Reported P-values, point estimates and confidence intervals for the response rate are not usually adjusted for the design's adaptiveness. Proper statistical inference procedures should be used.
NASA Astrophysics Data System (ADS)
Rajabi, Mohammad Mahdi; Ataie-Ashtiani, Behzad
2016-05-01
Bayesian inference has traditionally been conceived as the proper framework for the formal incorporation of expert knowledge in parameter estimation of groundwater models. However, conventional Bayesian inference is incapable of taking into account the imprecision essentially embedded in expert provided information. In order to solve this problem, a number of extensions to conventional Bayesian inference have been introduced in recent years. One of these extensions is 'fuzzy Bayesian inference' which is the result of integrating fuzzy techniques into Bayesian statistics. Fuzzy Bayesian inference has a number of desirable features which makes it an attractive approach for incorporating expert knowledge in the parameter estimation process of groundwater models: (1) it is well adapted to the nature of expert provided information, (2) it allows to distinguishably model both uncertainty and imprecision, and (3) it presents a framework for fusing expert provided information regarding the various inputs of the Bayesian inference algorithm. However an important obstacle in employing fuzzy Bayesian inference in groundwater numerical modeling applications is the computational burden, as the required number of numerical model simulations often becomes extremely exhaustive and often computationally infeasible. In this paper, a novel approach of accelerating the fuzzy Bayesian inference algorithm is proposed which is based on using approximate posterior distributions derived from surrogate modeling, as a screening tool in the computations. The proposed approach is first applied to a synthetic test case of seawater intrusion (SWI) in a coastal aquifer. It is shown that for this synthetic test case, the proposed approach decreases the number of required numerical simulations by an order of magnitude. Then the proposed approach is applied to a real-world test case involving three-dimensional numerical modeling of SWI in Kish Island, located in the Persian Gulf. An expert elicitation methodology is developed and applied to the real-world test case in order to provide a road map for the use of fuzzy Bayesian inference in groundwater modeling applications.
NASA Astrophysics Data System (ADS)
Alsing, Justin; Wandelt, Benjamin; Feeney, Stephen
2018-07-01
Many statistical models in cosmology can be simulated forwards but have intractable likelihood functions. Likelihood-free inference methods allow us to perform Bayesian inference from these models using only forward simulations, free from any likelihood assumptions or approximations. Likelihood-free inference generically involves simulating mock data and comparing to the observed data; this comparison in data space suffers from the curse of dimensionality and requires compression of the data to a small number of summary statistics to be tractable. In this paper, we use massive asymptotically optimal data compression to reduce the dimensionality of the data space to just one number per parameter, providing a natural and optimal framework for summary statistic choice for likelihood-free inference. Secondly, we present the first cosmological application of Density Estimation Likelihood-Free Inference (DELFI), which learns a parametrized model for joint distribution of data and parameters, yielding both the parameter posterior and the model evidence. This approach is conceptually simple, requires less tuning than traditional Approximate Bayesian Computation approaches to likelihood-free inference and can give high-fidelity posteriors from orders of magnitude fewer forward simulations. As an additional bonus, it enables parameter inference and Bayesian model comparison simultaneously. We demonstrate DELFI with massive data compression on an analysis of the joint light-curve analysis supernova data, as a simple validation case study. We show that high-fidelity posterior inference is possible for full-scale cosmological data analyses with as few as ˜104 simulations, with substantial scope for further improvement, demonstrating the scalability of likelihood-free inference to large and complex cosmological data sets.
A Coalitional Game for Distributed Inference in Sensor Networks With Dependent Observations
NASA Astrophysics Data System (ADS)
He, Hao; Varshney, Pramod K.
2016-04-01
We consider the problem of collaborative inference in a sensor network with heterogeneous and statistically dependent sensor observations. Each sensor aims to maximize its inference performance by forming a coalition with other sensors and sharing information within the coalition. It is proved that the inference performance is a nondecreasing function of the coalition size. However, in an energy constrained network, the energy consumption of inter-sensor communication also increases with increasing coalition size, which discourages the formation of the grand coalition (the set of all sensors). In this paper, the formation of non-overlapping coalitions with statistically dependent sensors is investigated under a specific communication constraint. We apply a game theoretical approach to fully explore and utilize the information contained in the spatial dependence among sensors to maximize individual sensor performance. Before formulating the distributed inference problem as a coalition formation game, we first quantify the gain and loss in forming a coalition by introducing the concepts of diversity gain and redundancy loss for both estimation and detection problems. These definitions, enabled by the statistical theory of copulas, allow us to characterize the influence of statistical dependence among sensor observations on inference performance. An iterative algorithm based on merge-and-split operations is proposed for the solution and the stability of the proposed algorithm is analyzed. Numerical results are provided to demonstrate the superiority of our proposed game theoretical approach.
Hall, L Malcolm; Collins, Catherine; Collet, Bertrand
2018-02-02
The utility of molecular response data arising from in-vivo single and repeated measure fish disease-challenge experiments is compared. An in-silico 'experiment' involving the generation of two imaginary immune-molecule quantity response profiles over time for individual animals was carried out. Daily 'observed' molecule quantities were drawn from the 'known' individual response profiles to mimic the results of single and repeated measurement. The results indicate that repeated measure experiments are required to infer individual level response profiles, and that these experiments also provide more accurate summary statistics and data more suited to inferring the dependent ordering of the molecular response. Additionally repeated measure experiments utilise fewer animals than single measure experiments. These results are described alongside a discussion of experimental methodological issues pertinent to the adoption of aquatic animal repeated measure experimental designs. We conclude that investigators need to take particular care when making inferences from single measure experiments and that serious consideration should be given to using repeated measure experiments for in-vivo fish disease-challenge investigations. Crown Copyright © 2018. Published by Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Peled, Ofra N.; Peled, Irit; Peled, Jonathan U.
2013-01-01
The phenomenon of birth of a baby is a common and familiar one, and yet college students participating in a general biology class did not possess the expected common knowledge of the equal probability of gender births. We found that these students held strikingly skewed conceptions regarding gender birth ratio, estimating the number of female births to be more than twice the number of male births. Possible sources of these beliefs were analysed, showing flaws in statistical thinking such as viewing small unplanned samples as representing the whole population and making inferences from an inappropriate population. Some educational implications are discussed and a short teaching example (using data assembly) demonstrates an instructional direction that might facilitate conceptual change.
Making Inferences in Adulthood: Falling Leaves Mean It's Fall.
ERIC Educational Resources Information Center
Zandi, Taher; Gregory, Monica E.
1988-01-01
Assessed age differences in making inferences from prose. Older adults correctly answered mean of 10 questions related to implicit information and 8 related to explicit information. Young adults answered mean of 7 implicit and 12 explicit information questions. In spite of poorer recall of factual details, older subjects made inferences to greater…
Models for inference in dynamic metacommunity systems
Dorazio, R.M.; Kery, M.; Royle, J. Andrew; Plattner, M.
2010-01-01
A variety of processes are thought to be involved in the formation and dynamics of species assemblages. For example, various metacommunity theories are based on differences in the relative contributions of dispersal of species among local communities and interactions of species within local communities. Interestingly, metacommunity theories continue to be advanced without much empirical validation. Part of the problem is that statistical models used to analyze typical survey data either fail to specify ecological processes with sufficient complexity or they fail to account for errors in detection of species during sampling. In this paper, we describe a statistical modeling framework for the analysis of metacommunity dynamics that is based on the idea of adopting a unified approach, multispecies occupancy modeling, for computing inferences about individual species, local communities of species, or the entire metacommunity of species. This approach accounts for errors in detection of species during sampling and also allows different metacommunity paradigms to be specified in terms of species-and location-specific probabilities of occurrence, extinction, and colonization: all of which are estimable. In addition, this approach can be used to address inference problems that arise in conservation ecology, such as predicting temporal and spatial changes in biodiversity for use in making conservation decisions. To illustrate, we estimate changes in species composition associated with the species-specific phenologies of flight patterns of butterflies in Switzerland for the purpose of estimating regional differences in biodiversity. ?? 2010 by the Ecological Society of America.
Spurious correlations and inference in landscape genetics
Samuel A. Cushman; Erin L. Landguth
2010-01-01
Reliable interpretation of landscape genetic analyses depends on statistical methods that have high power to identify the correct process driving gene flow while rejecting incorrect alternative hypotheses. Little is known about statistical power and inference in individual-based landscape genetics. Our objective was to evaluate the power of causalmodelling with partial...
"Magnitude-based inference": a statistical review.
Welsh, Alan H; Knight, Emma J
2015-04-01
We consider "magnitude-based inference" and its interpretation by examining in detail its use in the problem of comparing two means. We extract from the spreadsheets, which are provided to users of the analysis (http://www.sportsci.org/), a precise description of how "magnitude-based inference" is implemented. We compare the implemented version of the method with general descriptions of it and interpret the method in familiar statistical terms. We show that "magnitude-based inference" is not a progressive improvement on modern statistics. The additional probabilities introduced are not directly related to the confidence interval but, rather, are interpretable either as P values for two different nonstandard tests (for different null hypotheses) or as approximate Bayesian calculations, which also lead to a type of test. We also discuss sample size calculations associated with "magnitude-based inference" and show that the substantial reduction in sample sizes claimed for the method (30% of the sample size obtained from standard frequentist calculations) is not justifiable so the sample size calculations should not be used. Rather than using "magnitude-based inference," a better solution is to be realistic about the limitations of the data and use either confidence intervals or a fully Bayesian analysis.
Estimating the Diets of Animals Using Stable Isotopes and a Comprehensive Bayesian Mixing Model
Hopkins, John B.; Ferguson, Jake M.
2012-01-01
Using stable isotope mixing models (SIMMs) as a tool to investigate the foraging ecology of animals is gaining popularity among researchers. As a result, statistical methods are rapidly evolving and numerous models have been produced to estimate the diets of animals—each with their benefits and their limitations. Deciding which SIMM to use is contingent on factors such as the consumer of interest, its food sources, sample size, the familiarity a user has with a particular framework for statistical analysis, or the level of inference the researcher desires to make (e.g., population- or individual-level). In this paper, we provide a review of commonly used SIMM models and describe a comprehensive SIMM that includes all features commonly used in SIMM analysis and two new features. We used data collected in Yosemite National Park to demonstrate IsotopeR's ability to estimate dietary parameters. We then examined the importance of each feature in the model and compared our results to inferences from commonly used SIMMs. IsotopeR's user interface (in R) will provide researchers a user-friendly tool for SIMM analysis. The model is also applicable for use in paleontology, archaeology, and forensic studies as well as estimating pollution inputs. PMID:22235246
Quantification of downscaled precipitation uncertainties via Bayesian inference
NASA Astrophysics Data System (ADS)
Nury, A. H.; Sharma, A.; Marshall, L. A.
2017-12-01
Prediction of precipitation from global climate model (GCM) outputs remains critical to decision-making in water-stressed regions. In this regard, downscaling of GCM output has been a useful tool for analysing future hydro-climatological states. Several downscaling approaches have been developed for precipitation downscaling, including those using dynamical or statistical downscaling methods. Frequently, outputs from dynamical downscaling are not readily transferable across regions for significant methodical and computational difficulties. Statistical downscaling approaches provide a flexible and efficient alternative, providing hydro-climatological outputs across multiple temporal and spatial scales in many locations. However these approaches are subject to significant uncertainty, arising due to uncertainty in the downscaled model parameters and in the use of different reanalysis products for inferring appropriate model parameters. Consequently, these will affect the performance of simulation in catchment scale. This study develops a Bayesian framework for modelling downscaled daily precipitation from GCM outputs. This study aims to introduce uncertainties in downscaling evaluating reanalysis datasets against observational rainfall data over Australia. In this research a consistent technique for quantifying downscaling uncertainties by means of Bayesian downscaling frame work has been proposed. The results suggest that there are differences in downscaled precipitation occurrences and extremes.
Estimating pseudocounts and fold changes for digital expression measurements.
Erhard, Florian
2018-06-19
Fold changes from count based high-throughput experiments such as RNA-seq suffer from a zero-frequency problem. To circumvent division by zero, so-called pseudocounts are added to make all observed counts strictly positive. The magnitude of pseudocounts for digital expression measurements and on which stage of the analysis they are introduced remained an arbitrary choice. Moreover, in the strict sense, fold changes are not quantities that can be computed. Instead, due to the stochasticity involved in the experiments, they must be estimated by statistical inference. Here, we build on a statistical framework for fold changes, where pseudocounts correspond to the parameters of the prior distribution used for Bayesian inference of the fold change. We show that arbirary and widely used choices for applying pseudocounts can lead to biased results. As a statistical rigorous alternative, we propose and test an empirical Bayes procedure to choose appropriate pseudocounts. Moreover, we introduce the novel estimator Ψ LFC for fold changes showing favorable properties with small counts and smaller deviations from the truth in simulations and real data compared to existing methods. Our results have direct implications for entities with few reads in sequencing experiments, and indirectly also affect results for entities with many reads. Ψ LFC is available as an R package under https://github.com/erhard-lab/lfc (Apache 2.0 license); R scripts to generate all figures are available at zenodo (doi:10.5281/zenodo.1163029).
Drichoutis, Andreas C.; Lusk, Jayson L.
2014-01-01
Despite the fact that conceptual models of individual decision making under risk are deterministic, attempts to econometrically estimate risk preferences require some assumption about the stochastic nature of choice. Unfortunately, the consequences of making different assumptions are, at present, unclear. In this paper, we compare three popular error specifications (Fechner, contextual utility, and Luce error) for three different preference functionals (expected utility, rank-dependent utility, and a mixture of those two) using in- and out-of-sample selection criteria. We find drastically different inferences about structural risk preferences across the competing functionals and error specifications. Expected utility theory is least affected by the selection of the error specification. A mixture model combining the two conceptual models assuming contextual utility provides the best fit of the data both in- and out-of-sample. PMID:25029467
Drichoutis, Andreas C; Lusk, Jayson L
2014-01-01
Despite the fact that conceptual models of individual decision making under risk are deterministic, attempts to econometrically estimate risk preferences require some assumption about the stochastic nature of choice. Unfortunately, the consequences of making different assumptions are, at present, unclear. In this paper, we compare three popular error specifications (Fechner, contextual utility, and Luce error) for three different preference functionals (expected utility, rank-dependent utility, and a mixture of those two) using in- and out-of-sample selection criteria. We find drastically different inferences about structural risk preferences across the competing functionals and error specifications. Expected utility theory is least affected by the selection of the error specification. A mixture model combining the two conceptual models assuming contextual utility provides the best fit of the data both in- and out-of-sample.
Bram, Anthony D
2017-01-01
The Wechsler intelligence tests (currently Wechsler, 2008 , 2014) have traditionally been part of the multimethod test battery favored by psychodynamically oriented assessors. In this tradition, assessors have used Wechsler data to make inferences about personality that transcend cognition. Recent trends in clinical psychology, however, have deemphasized this psychodynamic way of working. In this article, I make a conceptual and clinical case for reviving and refining a psychodynamic approach to inference making about personality using the Wechsler Verbal Comprehension subtests. Specifically, I (a) describe the psychological and environmental conditions sampled by the Wechsler tests, (b) discuss the Wechsler tests conceptually in terms of assessing vulnerability to breakdowns in adaptive defensive functioning, (c) review a general framework for inference making, and (d) offer considerations for and illustrate pragmatic application of the Verbal Comprehension subtests data to make inferences that help answer referral questions and have important treatment implications.
Fu, Wenjiang J.; Stromberg, Arnold J.; Viele, Kert; Carroll, Raymond J.; Wu, Guoyao
2009-01-01
Over the past two decades, there have been revolutionary developments in life science technologies characterized by high throughput, high efficiency, and rapid computation. Nutritionists now have the advanced methodologies for the analysis of DNA, RNA, protein, low-molecular-weight metabolites, as well as access to bioinformatics databases. Statistics, which can be defined as the process of making scientific inferences from data that contain variability, has historically played an integral role in advancing nutritional sciences. Currently, in the era of systems biology, statistics has become an increasingly important tool to quantitatively analyze information about biological macromolecules. This article describes general terms used in statistical analysis of large, complex experimental data. These terms include experimental design, power analysis, sample size calculation, and experimental errors (type I and II errors) for nutritional studies at population, tissue, cellular, and molecular levels. In addition, we highlighted various sources of experimental variations in studies involving microarray gene expression, real-time polymerase chain reaction, proteomics, and other bioinformatics technologies. Moreover, we provided guidelines for nutritionists and other biomedical scientists to plan and conduct studies and to analyze the complex data. Appropriate statistical analyses are expected to make an important contribution to solving major nutrition-associated problems in humans and animals (including obesity, diabetes, cardiovascular disease, cancer, ageing, and intrauterine fetal retardation). PMID:20233650
Assessing colour-dependent occupation statistics inferred from galaxy group catalogues
NASA Astrophysics Data System (ADS)
Campbell, Duncan; van den Bosch, Frank C.; Hearin, Andrew; Padmanabhan, Nikhil; Berlind, Andreas; Mo, H. J.; Tinker, Jeremy; Yang, Xiaohu
2015-09-01
We investigate the ability of current implementations of galaxy group finders to recover colour-dependent halo occupation statistics. To test the fidelity of group catalogue inferred statistics, we run three different group finders used in the literature over a mock that includes galaxy colours in a realistic manner. Overall, the resulting mock group catalogues are remarkably similar, and most colour-dependent statistics are recovered with reasonable accuracy. However, it is also clear that certain systematic errors arise as a consequence of correlated errors in group membership determination, central/satellite designation, and halo mass assignment. We introduce a new statistic, the halo transition probability (HTP), which captures the combined impact of all these errors. As a rule of thumb, errors tend to equalize the properties of distinct galaxy populations (i.e. red versus blue galaxies or centrals versus satellites), and to result in inferred occupation statistics that are more accurate for red galaxies than for blue galaxies. A statistic that is particularly poorly recovered from the group catalogues is the red fraction of central galaxies as a function of halo mass. Group finders do a good job in recovering galactic conformity, but also have a tendency to introduce weak conformity when none is present. We conclude that proper inference of colour-dependent statistics from group catalogues is best achieved using forward modelling (i.e. running group finders over mock data) or by implementing a correction scheme based on the HTP, as long as the latter is not too strongly model dependent.
Influence of study goals on study design and execution.
Kirklin, J W; Blackstone, E H; Naftel, D C; Turner, M E
1997-12-01
From the viewpoint of a clinician who makes recommendations to patients about choosing from the multiple possible management schemes, quantitative information derived from statistical analyses of observational studies is useful. Although random assignment of therapy is optimal, appropriately performed studies in which therapy has been nonrandomly "assigned" are considered acceptable, albeit occasionally with limitations in inferences. The analyses are considered most useful when they generate multivariable equations suitable for predicting time-related outcomes in individual patients. Graphic presentations improve communication with patients and facilitate truly informed consent.
Inference in Right Hemisphere Damaged Individuals' Comprehension: The Role of Sustained Attention
ERIC Educational Resources Information Center
Saldert, Charlotta; Ahlsen, Elisabeth
2007-01-01
The ability to make inferences for the purposes of comprehension is considered an important factor in pragmatic ability. In this experimental group study with stroke patients, the ability to make inferences and its associations with sustained attention and verbal working memory were explored. A group of 14 left-hemisphere-damaged individuals had…
Statistical learning and selective inference.
Taylor, Jonathan; Tibshirani, Robert J
2015-06-23
We describe the problem of "selective inference." This addresses the following challenge: Having mined a set of data to find potential associations, how do we properly assess the strength of these associations? The fact that we have "cherry-picked"--searched for the strongest associations--means that we must set a higher bar for declaring significant the associations that we see. This challenge becomes more important in the era of big data and complex statistical modeling. The cherry tree (dataset) can be very large and the tools for cherry picking (statistical learning methods) are now very sophisticated. We describe some recent new developments in selective inference and illustrate their use in forward stepwise regression, the lasso, and principal components analysis.
The statistical analysis of circadian phase and amplitude in constant-routine core-temperature data
NASA Technical Reports Server (NTRS)
Brown, E. N.; Czeisler, C. A.
1992-01-01
Accurate estimation of the phases and amplitude of the endogenous circadian pacemaker from constant-routine core-temperature series is crucial for making inferences about the properties of the human biological clock from data collected under this protocol. This paper presents a set of statistical methods based on a harmonic-regression-plus-correlated-noise model for estimating the phases and the amplitude of the endogenous circadian pacemaker from constant-routine core-temperature data. The methods include a Bayesian Monte Carlo procedure for computing the uncertainty in these circadian functions. We illustrate the techniques with a detailed study of a single subject's core-temperature series and describe their relationship to other statistical methods for circadian data analysis. In our laboratory, these methods have been successfully used to analyze more than 300 constant routines and provide a highly reliable means of extracting phase and amplitude information from core-temperature data.
Perkovic, Sonja; Orquin, Jacob Lund
2018-01-01
Ecological rationality results from matching decision strategies to appropriate environmental structures, but how does the matching happen? We propose that people learn the statistical structure of the environment through observation and use this learned structure to guide ecologically rational behavior. We tested this hypothesis in the context of organic foods. In Study 1, we found that products from healthful food categories are more likely to be organic than products from nonhealthful food categories. In Study 2, we found that consumers' perceptions of the healthfulness and prevalence of organic products in many food categories are accurate. Finally, in Study 3, we found that people perceive organic products as more healthful than nonorganic products when the statistical structure justifies this inference. Our findings suggest that people believe organic foods are more healthful than nonorganic foods and use an organic-food cue to guide their behavior because organic foods are, on average, 30% more healthful.
OTD Observations of Continental US Ground and Cloud Flashes
NASA Technical Reports Server (NTRS)
Koshak, William
2007-01-01
Lightning optical flash parameters (e.g., radiance, area, duration, number of optical groups, and number of optical events) derived from almost five years of Optical Transient Detector (OTD) data are analyzed. Hundreds of thousands of OTD flashes occurring over the continental US are categorized according to flash type (ground or cloud flash) using US National Lightning Detection Network TM (NLDN) data. The statistics of the optical characteristics of the ground and cloud flashes are inter-compared on an overall basis, and as a function of ground flash polarity. A standard two-distribution hypothesis test is used to inter-compare the population means of a given lightning parameter for the two flash types. Given the differences in the statistics of the optical characteristics, it is suggested that statistical analyses (e.g., Bayesian Inference) of the space-based optical measurements might make it possible to successfully discriminate ground and cloud flashes a reasonable percentage of the time.
Variations on Bayesian Prediction and Inference
2016-05-09
inference 2.2.1 Background There are a number of statistical inference problems that are not generally formulated via a full probability model...problem of inference about an unknown parameter, the Bayesian approach requires a full probability 1. REPORT DATE (DD-MM-YYYY) 4. TITLE AND...the problem of inference about an unknown parameter, the Bayesian approach requires a full probability model/likelihood which can be an obstacle
[Methodological design of the National Health and Nutrition Survey 2016].
Romero-Martínez, Martín; Shamah-Levy, Teresa; Cuevas-Nasu, Lucía; Gómez-Humarán, Ignacio Méndez; Gaona-Pineda, Elsa Berenice; Gómez-Acosta, Luz María; Rivera-Dommarco, Juan Ángel; Hernández-Ávila, Mauricio
2017-01-01
Describe the design methodology of the halfway health and nutrition national survey (Ensanut-MC) 2016. The Ensanut-MC is a national probabilistic survey whose objective population are the inhabitants of private households in Mexico. The sample size was determined to make inferences on the urban and rural areas in four regions. Describes main design elements: target population, topics of study, sampling procedure, measurement procedure and logistics organization. A final sample of 9 479 completed household interviews, and a sample of 16 591 individual interviews. The response rate for households was 77.9%, and the response rate for individuals was 91.9%. The Ensanut-MC probabilistic design allows valid statistical inferences about interest parameters for Mexico´s public health and nutrition, specifically on overweight, obesity and diabetes mellitus. Updated information also supports the monitoring, updating and formulation of new policies and priority programs.
Tsiatis, Anastasios A.; Davidian, Marie; Cao, Weihua
2010-01-01
Summary A routine challenge is that of making inference on parameters in a statistical model of interest from longitudinal data subject to drop out, which are a special case of the more general setting of monotonely coarsened data. Considerable recent attention has focused on doubly robust estimators, which in this context involve positing models for both the missingness (more generally, coarsening) mechanism and aspects of the distribution of the full data, that have the appealing property of yielding consistent inferences if only one of these models is correctly specified. Doubly robust estimators have been criticized for potentially disastrous performance when both of these models are even only mildly misspecified. We propose a doubly robust estimator applicable in general monotone coarsening problems that achieves comparable or improved performance relative to existing doubly robust methods, which we demonstrate via simulation studies and by application to data from an AIDS clinical trial. PMID:20731640
Fieschi, M
2012-01-01
To review the history of AI in Medicine in the 1980's, placing the SPHINX system in the context of other research in this field. Summarize the main systems for AI in medicine developed in the 1970-1980 decade and their relationship to the development of clinical decision-making and consultation systems The approaches taken by AI in medicine research groups is compared and contrasted to those of others using statistical and logical methods for representing clinical inferences, and the different AI approaches are summarized, and related to the architecture and systems implementation of SPHINX CONCLUSION: The SPHINX system combined a number of advanced representational and inference choices from AI in designing a decision- support system for clinical consultation in the 1980s. The context within which the system was developed is outlined and related to the historical evolution of AI in medicine during that decade.
Heck, Daniel W; Hilbig, Benjamin E; Moshagen, Morten
2017-08-01
Decision strategies explain how people integrate multiple sources of information to make probabilistic inferences. In the past decade, increasingly sophisticated methods have been developed to determine which strategy explains decision behavior best. We extend these efforts to test psychologically more plausible models (i.e., strategies), including a new, probabilistic version of the take-the-best (TTB) heuristic that implements a rank order of error probabilities based on sequential processing. Within a coherent statistical framework, deterministic and probabilistic versions of TTB and other strategies can directly be compared using model selection by minimum description length or the Bayes factor. In an experiment with inferences from given information, only three of 104 participants were best described by the psychologically plausible, probabilistic version of TTB. Similar as in previous studies, most participants were classified as users of weighted-additive, a strategy that integrates all available information and approximates rational decisions. Copyright © 2017 Elsevier Inc. All rights reserved.
Reasoning in explanation-based decision making.
Pennington, N; Hastie, R
1993-01-01
A general theory of explanation-based decision making is outlined and the multiple roles of inference processes in the theory are indicated. A typology of formal and informal inference forms, originally proposed by Collins (1978a, 1978b), is introduced as an appropriate framework to represent inferences that occur in the overarching explanation-based process. Results from the analysis of verbal reports of decision processes are presented to demonstrate the centrality and systematic character of reasoning in a representative legal decision-making task.
Occupational risks for bladder cancer among men in Sweden.
Malker, H S; McLaughlin, J K; Silverman, D T; Ericsson, J L; Stone, B J; Weiner, J A; Malker, B K; Blot, W J
1987-12-15
With the use of the Swedish Cancer-Environment Registry, census data on employment in 1960 were linked with registry data on bladder cancer during 1961-79. This hypothesis-generating study revealed for the first time associations between bladder cancer and employment in pulp and fiberboard manufacturing, in rope and twine making, and work as a dental technician. Statistically significant increases in risk were also found for several occupations previously associated with bladder cancer, including barbers and beauticians, artistic painters, toolmakers and machinists, and physicians, and employment in butcher shops, industrial chemical making, apparel manufacturing, and plumbing. Etiologic inferences cannot be made from this investigation, but the findings from this large national resource provide further clues to the occupational determinants of bladder cancer.
In defence of model-based inference in phylogeography
Beaumont, Mark A.; Nielsen, Rasmus; Robert, Christian; Hey, Jody; Gaggiotti, Oscar; Knowles, Lacey; Estoup, Arnaud; Panchal, Mahesh; Corander, Jukka; Hickerson, Mike; Sisson, Scott A.; Fagundes, Nelson; Chikhi, Lounès; Beerli, Peter; Vitalis, Renaud; Cornuet, Jean-Marie; Huelsenbeck, John; Foll, Matthieu; Yang, Ziheng; Rousset, Francois; Balding, David; Excoffier, Laurent
2017-01-01
Recent papers have promoted the view that model-based methods in general, and those based on Approximate Bayesian Computation (ABC) in particular, are flawed in a number of ways, and are therefore inappropriate for the analysis of phylogeographic data. These papers further argue that Nested Clade Phylogeographic Analysis (NCPA) offers the best approach in statistical phylogeography. In order to remove the confusion and misconceptions introduced by these papers, we justify and explain the reasoning behind model-based inference. We argue that ABC is a statistically valid approach, alongside other computational statistical techniques that have been successfully used to infer parameters and compare models in population genetics. We also examine the NCPA method and highlight numerous deficiencies, either when used with single or multiple loci. We further show that the ages of clades are carelessly used to infer ages of demographic events, that these ages are estimated under a simple model of panmixia and population stationarity but are then used under different and unspecified models to test hypotheses, a usage the invalidates these testing procedures. We conclude by encouraging researchers to study and use model-based inference in population genetics. PMID:29284924
Statistical inference for remote sensing-based estimates of net deforestation
Ronald E. McRoberts; Brian F. Walters
2012-01-01
Statistical inference requires expression of an estimate in probabilistic terms, usually in the form of a confidence interval. An approach to constructing confidence intervals for remote sensing-based estimates of net deforestation is illustrated. The approach is based on post-classification methods using two independent forest/non-forest classifications because...
Statistical modelling of networked human-automation performance using working memory capacity.
Ahmed, Nisar; de Visser, Ewart; Shaw, Tyler; Mohamed-Ameen, Amira; Campbell, Mark; Parasuraman, Raja
2014-01-01
This study examines the challenging problem of modelling the interaction between individual attentional limitations and decision-making performance in networked human-automation system tasks. Analysis of real experimental data from a task involving networked supervision of multiple unmanned aerial vehicles by human participants shows that both task load and network message quality affect performance, but that these effects are modulated by individual differences in working memory (WM) capacity. These insights were used to assess three statistical approaches for modelling and making predictions with real experimental networked supervisory performance data: classical linear regression, non-parametric Gaussian processes and probabilistic Bayesian networks. It is shown that each of these approaches can help designers of networked human-automated systems cope with various uncertainties in order to accommodate future users by linking expected operating conditions and performance from real experimental data to observable cognitive traits like WM capacity. Practitioner Summary: Working memory (WM) capacity helps account for inter-individual variability in operator performance in networked unmanned aerial vehicle supervisory tasks. This is useful for reliable performance prediction near experimental conditions via linear models; robust statistical prediction beyond experimental conditions via Gaussian process models and probabilistic inference about unknown task conditions/WM capacities via Bayesian network models.
Butler, Andrew C.; Dennis, Nancy A.; Marsh, Elizabeth J.
2012-01-01
People can acquire both true and false knowledge about the world from fictional stories (Marsh & Fazio, 2007). The present study explored whether the benefits and costs of learning about the world from fictional stories extend beyond memory for directly stated pieces of information. Of interest was whether readers would use correct and incorrect story references to make deductive inferences about related information in the story, and then integrate those inferences into their knowledge bases. Subjects read stories containing correct, neutral, and misleading references to facts about the world; each reference could be combined with another reference that occurred in a later sentence to make a deductive inference. Later, they answered general knowledge questions that tested for these deductive inferences. The results showed that subjects generated and retained the deductive inferences regardless of whether the inferences were consistent or inconsistent with world knowledge, and irrespective of whether the references were placed consecutively in the text or separated by many sentences. Readers learn more than what is directly stated in stories; they use references to the real world to make both correct and incorrect inferences that are integrated into their knowledge bases. PMID:22640369
Malle, Bertram F; Holbrook, Jess
2012-04-01
People interpret behavior by making inferences about agents' intentionality, mind, and personality. Past research studied such inferences 1 at a time; in real life, people make these inferences simultaneously. The present studies therefore examined whether 4 major inferences (intentionality, desire, belief, and personality), elicited simultaneously in response to an observed behavior, might be ordered in a hierarchy of likelihood and speed. To achieve generalizability, the studies included a wide range of stimulus behaviors, presented them verbally and as dynamic videos, and assessed inferences both in a retrieval paradigm (measuring the likelihood and speed of accessing inferences immediately after they were made) and in an online processing paradigm (measuring the speed of forming inferences during behavior observation). Five studies provide evidence for a hierarchy of social inferences-from intentionality and desire to belief to personality-that is stable across verbal and visual presentations and that parallels the order found in developmental and primate research. (c) 2012 APA, all rights reserved.
Progressive statistics for studies in sports medicine and exercise science.
Hopkins, William G; Marshall, Stephen W; Batterham, Alan M; Hanin, Juri
2009-01-01
Statistical guidelines and expert statements are now available to assist in the analysis and reporting of studies in some biomedical disciplines. We present here a more progressive resource for sample-based studies, meta-analyses, and case studies in sports medicine and exercise science. We offer forthright advice on the following controversial or novel issues: using precision of estimation for inferences about population effects in preference to null-hypothesis testing, which is inadequate for assessing clinical or practical importance; justifying sample size via acceptable precision or confidence for clinical decisions rather than via adequate power for statistical significance; showing SD rather than SEM, to better communicate the magnitude of differences in means and nonuniformity of error; avoiding purely nonparametric analyses, which cannot provide inferences about magnitude and are unnecessary; using regression statistics in validity studies, in preference to the impractical and biased limits of agreement; making greater use of qualitative methods to enrich sample-based quantitative projects; and seeking ethics approval for public access to the depersonalized raw data of a study, to address the need for more scrutiny of research and better meta-analyses. Advice on less contentious issues includes the following: using covariates in linear models to adjust for confounders, to account for individual differences, and to identify potential mechanisms of an effect; using log transformation to deal with nonuniformity of effects and error; identifying and deleting outliers; presenting descriptive, effect, and inferential statistics in appropriate formats; and contending with bias arising from problems with sampling, assignment, blinding, measurement error, and researchers' prejudices. This article should advance the field by stimulating debate, promoting innovative approaches, and serving as a useful checklist for authors, reviewers, and editors.
Approximation and inference methods for stochastic biochemical kinetics—a tutorial review
NASA Astrophysics Data System (ADS)
Schnoerr, David; Sanguinetti, Guido; Grima, Ramon
2017-03-01
Stochastic fluctuations of molecule numbers are ubiquitous in biological systems. Important examples include gene expression and enzymatic processes in living cells. Such systems are typically modelled as chemical reaction networks whose dynamics are governed by the chemical master equation. Despite its simple structure, no analytic solutions to the chemical master equation are known for most systems. Moreover, stochastic simulations are computationally expensive, making systematic analysis and statistical inference a challenging task. Consequently, significant effort has been spent in recent decades on the development of efficient approximation and inference methods. This article gives an introduction to basic modelling concepts as well as an overview of state of the art methods. First, we motivate and introduce deterministic and stochastic methods for modelling chemical networks, and give an overview of simulation and exact solution methods. Next, we discuss several approximation methods, including the chemical Langevin equation, the system size expansion, moment closure approximations, time-scale separation approximations and hybrid methods. We discuss their various properties and review recent advances and remaining challenges for these methods. We present a comparison of several of these methods by means of a numerical case study and highlight some of their respective advantages and disadvantages. Finally, we discuss the problem of inference from experimental data in the Bayesian framework and review recent methods developed the literature. In summary, this review gives a self-contained introduction to modelling, approximations and inference methods for stochastic chemical kinetics.
Conceptual influences on category-based induction
Gelman, Susan A.; Davidson, Natalie S.
2013-01-01
One important function of categories is to permit rich inductive inferences. Prior work shows that children use category labels to guide their inductive inferences. However, there are competing theories to explain this phenomenon, differing in the roles attributed to conceptual information versus perceptual similarity. Seven experiments with 4- to 5-year-old children and adults (N = 344) test these theories by teaching categories for which category membership and perceptual similarity are in conflict, and varying the conceptual basis of the novel categories. Results indicate that for non-natural kind categories that have little conceptual coherence, children make inferences based on perceptual similarity, whereas adults make inferences based on category membership. In contrast, for basic- and ontological-level categories that have a principled conceptual basis, children and adults alike make use of category membership more than perceptual similarity as the basis of their inferences. These findings provide evidence in favor of the role of conceptual information in preschoolers’ inferences, and further demonstrate that labeled categories are not all equivalent; they differ in their inductive potential. PMID:23517863
Thermodynamics of statistical inference by cells.
Lang, Alex H; Fisher, Charles K; Mora, Thierry; Mehta, Pankaj
2014-10-03
The deep connection between thermodynamics, computation, and information is now well established both theoretically and experimentally. Here, we extend these ideas to show that thermodynamics also places fundamental constraints on statistical estimation and learning. To do so, we investigate the constraints placed by (nonequilibrium) thermodynamics on the ability of biochemical signaling networks to estimate the concentration of an external signal. We show that accuracy is limited by energy consumption, suggesting that there are fundamental thermodynamic constraints on statistical inference.
Inferring general relations between network characteristics from specific network ensembles.
Cardanobile, Stefano; Pernice, Volker; Deger, Moritz; Rotter, Stefan
2012-01-01
Different network models have been suggested for the topology underlying complex interactions in natural systems. These models are aimed at replicating specific statistical features encountered in real-world networks. However, it is rarely considered to which degree the results obtained for one particular network class can be extrapolated to real-world networks. We address this issue by comparing different classical and more recently developed network models with respect to their ability to generate networks with large structural variability. In particular, we consider the statistical constraints which the respective construction scheme imposes on the generated networks. After having identified the most variable networks, we address the issue of which constraints are common to all network classes and are thus suitable candidates for being generic statistical laws of complex networks. In fact, we find that generic, not model-related dependencies between different network characteristics do exist. This makes it possible to infer global features from local ones using regression models trained on networks with high generalization power. Our results confirm and extend previous findings regarding the synchronization properties of neural networks. Our method seems especially relevant for large networks, which are difficult to map completely, like the neural networks in the brain. The structure of such large networks cannot be fully sampled with the present technology. Our approach provides a method to estimate global properties of under-sampled networks in good approximation. Finally, we demonstrate on three different data sets (C. elegans neuronal network, R. prowazekii metabolic network, and a network of synonyms extracted from Roget's Thesaurus) that real-world networks have statistical relations compatible with those obtained using regression models.
Ganju, Jitendra; Yu, Xinxin; Ma, Guoguang Julie
2013-01-01
Formal inference in randomized clinical trials is based on controlling the type I error rate associated with a single pre-specified statistic. The deficiency of using just one method of analysis is that it depends on assumptions that may not be met. For robust inference, we propose pre-specifying multiple test statistics and relying on the minimum p-value for testing the null hypothesis of no treatment effect. The null hypothesis associated with the various test statistics is that the treatment groups are indistinguishable. The critical value for hypothesis testing comes from permutation distributions. Rejection of the null hypothesis when the smallest p-value is less than the critical value controls the type I error rate at its designated value. Even if one of the candidate test statistics has low power, the adverse effect on the power of the minimum p-value statistic is not much. Its use is illustrated with examples. We conclude that it is better to rely on the minimum p-value rather than a single statistic particularly when that single statistic is the logrank test, because of the cost and complexity of many survival trials. Copyright © 2013 John Wiley & Sons, Ltd.
Inductive Selectivity in Children’s Cross-classified Concepts
Nguyen, Simone P.
2012-01-01
Cross-classified items pose an interesting challenge to children’s induction since these items belong to many different categories, each of which may serve as a basis for a different type of inference. Inductive selectivity is the ability to appropriately make different types of inferences about a single cross-classifiable item based on its different category memberships. This research includes five experiments that examine the development of inductive selectivity in 3-, 4-, and 5-year-olds (N = 272). Overall, the results show that by age 4 years, children have inductive selectivity with taxonomic and script categories. That is, children use taxonomic categories to make biochemical inferences about an item whereas children use script categories to make situational inferences about an item. PMID:22803510
Ensemble stacking mitigates biases in inference of synaptic connectivity.
Chambers, Brendan; Levy, Maayan; Dechery, Joseph B; MacLean, Jason N
2018-01-01
A promising alternative to directly measuring the anatomical connections in a neuronal population is inferring the connections from the activity. We employ simulated spiking neuronal networks to compare and contrast commonly used inference methods that identify likely excitatory synaptic connections using statistical regularities in spike timing. We find that simple adjustments to standard algorithms improve inference accuracy: A signing procedure improves the power of unsigned mutual-information-based approaches and a correction that accounts for differences in mean and variance of background timing relationships, such as those expected to be induced by heterogeneous firing rates, increases the sensitivity of frequency-based methods. We also find that different inference methods reveal distinct subsets of the synaptic network and each method exhibits different biases in the accurate detection of reciprocity and local clustering. To correct for errors and biases specific to single inference algorithms, we combine methods into an ensemble. Ensemble predictions, generated as a linear combination of multiple inference algorithms, are more sensitive than the best individual measures alone, and are more faithful to ground-truth statistics of connectivity, mitigating biases specific to single inference methods. These weightings generalize across simulated datasets, emphasizing the potential for the broad utility of ensemble-based approaches.
Bayesian inference of physiologically meaningful parameters from body sway measurements.
Tietäväinen, A; Gutmann, M U; Keski-Vakkuri, E; Corander, J; Hæggström, E
2017-06-19
The control of the human body sway by the central nervous system, muscles, and conscious brain is of interest since body sway carries information about the physiological status of a person. Several models have been proposed to describe body sway in an upright standing position, however, due to the statistical intractability of the more realistic models, no formal parameter inference has previously been conducted and the expressive power of such models for real human subjects remains unknown. Using the latest advances in Bayesian statistical inference for intractable models, we fitted a nonlinear control model to posturographic measurements, and we showed that it can accurately predict the sway characteristics of both simulated and real subjects. Our method provides a full statistical characterization of the uncertainty related to all model parameters as quantified by posterior probability density functions, which is useful for comparisons across subjects and test settings. The ability to infer intractable control models from sensor data opens new possibilities for monitoring and predicting body status in health applications.
Geoscience in the Big Data Era: Are models obsolete?
NASA Astrophysics Data System (ADS)
Yuen, D. A.; Zheng, L.; Stark, P. B.; Morra, G.; Knepley, M.; Wang, X.
2016-12-01
In last few decades, the velocity, volume, and variety of geophysical data have increased, while the development of the Internet and distributed computing has led to the emergence of "data science." Fitting and running numerical models, especially based on PDEs, is the main consumer of flops in geoscience. Can large amounts of diverse data supplant modeling? Without the ability to conduct randomized, controlled experiments, causal inference requires understanding the physics. It is sometimes possible to predict well without understanding the system—if (1) the system is predictable, (2) data on "important" variables are available, and (3) the system changes slowly enough. And sometimes even a crude model can help the data "speak for themselves" much more clearly. For example, Shearer (1991) used a 1-dimensional velocity model to stack long-period seismograms, revealing upper mantle discontinuities. This was a "big data" approach: the main use of computing was in the data processing, rather than in modeling, yet the "signal" became clear. In contrast, modelers tend to use all available computing power to fit even more complex models, resulting in a cycle where uncertainty quantification (UQ) is never possible: even if realistic UQ required only 1,000 model evaluations, it is never in reach. To make more reliable inferences requires better data analysis and statistics, not more complex models. Geoscientists need to learn new skills and tools: sound software engineering practices; open programming languages suitable for big data; parallel and distributed computing; data visualization; and basic nonparametric, computationally based statistical inference, such as permutation tests. They should work reproducibly, scripting all analyses and avoiding point-and-click tools.
The precuneus may encode irrationality in human gambling.
Sacre, P; Kerr, M S D; Subramanian, S; Kahn, K; Gonzalez-Martinez, J; Johnson, M A; Sarma, S V; Gale, J T
2016-08-01
Humans often make irrational decisions, especially psychiatric patients who have dysfunctional cognitive and emotional circuitry. Understanding the neural basis of decision-making is therefore essential towards patient management, yet current studies suffer from several limitations. Functional magnetic resonance imaging (fMRI) studies in humans have dominated decision-making neuroscience, but have poor temporal resolution and the blood oxygenation level-dependent signal is only a proxy for neural activity. On the other hand, lesion studies in humans used to infer functionality in decision-making lack characterization of neural activity altogether. Using a combination of local field potential recordings in human subjects performing a financial decision-making task, spectral analyses, and non-parametric cluster statistics, we analyzed the activity in the precuneus. In nine subjects, the neural activity modulated significantly between rational and irrational trials in the precuneus (p <; 0.001). In particular, high-frequency activity (70-100 Hz) increased when irrational decisions were made. Although preliminary, these results suggest suppression of gamma rhythms via electrical stimulation in the precuneus as a therapeutic intervention for pathological decision-making.
The anatomy of choice: dopamine and decision-making
Friston, Karl; Schwartenbeck, Philipp; FitzGerald, Thomas; Moutoussis, Michael; Behrens, Timothy; Dolan, Raymond J.
2014-01-01
This paper considers goal-directed decision-making in terms of embodied or active inference. We associate bounded rationality with approximate Bayesian inference that optimizes a free energy bound on model evidence. Several constructs such as expected utility, exploration or novelty bonuses, softmax choice rules and optimism bias emerge as natural consequences of free energy minimization. Previous accounts of active inference have focused on predictive coding. In this paper, we consider variational Bayes as a scheme that the brain might use for approximate Bayesian inference. This scheme provides formal constraints on the computational anatomy of inference and action, which appear to be remarkably consistent with neuroanatomy. Active inference contextualizes optimal decision theory within embodied inference, where goals become prior beliefs. For example, expected utility theory emerges as a special case of free energy minimization, where the sensitivity or inverse temperature (associated with softmax functions and quantal response equilibria) has a unique and Bayes-optimal solution. Crucially, this sensitivity corresponds to the precision of beliefs about behaviour. The changes in precision during variational updates are remarkably reminiscent of empirical dopaminergic responses—and they may provide a new perspective on the role of dopamine in assimilating reward prediction errors to optimize decision-making. PMID:25267823
The anatomy of choice: dopamine and decision-making.
Friston, Karl; Schwartenbeck, Philipp; FitzGerald, Thomas; Moutoussis, Michael; Behrens, Timothy; Dolan, Raymond J
2014-11-05
This paper considers goal-directed decision-making in terms of embodied or active inference. We associate bounded rationality with approximate Bayesian inference that optimizes a free energy bound on model evidence. Several constructs such as expected utility, exploration or novelty bonuses, softmax choice rules and optimism bias emerge as natural consequences of free energy minimization. Previous accounts of active inference have focused on predictive coding. In this paper, we consider variational Bayes as a scheme that the brain might use for approximate Bayesian inference. This scheme provides formal constraints on the computational anatomy of inference and action, which appear to be remarkably consistent with neuroanatomy. Active inference contextualizes optimal decision theory within embodied inference, where goals become prior beliefs. For example, expected utility theory emerges as a special case of free energy minimization, where the sensitivity or inverse temperature (associated with softmax functions and quantal response equilibria) has a unique and Bayes-optimal solution. Crucially, this sensitivity corresponds to the precision of beliefs about behaviour. The changes in precision during variational updates are remarkably reminiscent of empirical dopaminergic responses-and they may provide a new perspective on the role of dopamine in assimilating reward prediction errors to optimize decision-making.
Grover, Sandeep; Del Greco M, Fabiola; Stein, Catherine M; Ziegler, Andreas
2017-01-01
Confounding and reverse causality have prevented us from drawing meaningful clinical interpretation even in well-powered observational studies. Confounding may be attributed to our inability to randomize the exposure variable in observational studies. Mendelian randomization (MR) is one approach to overcome confounding. It utilizes one or more genetic polymorphisms as a proxy for the exposure variable of interest. Polymorphisms are randomly distributed in a population, they are static throughout an individual's lifetime, and may thus help in inferring directionality in exposure-outcome associations. Genome-wide association studies (GWAS) or meta-analyses of GWAS are characterized by large sample sizes and the availability of many single nucleotide polymorphisms (SNPs), making GWAS-based MR an attractive approach. GWAS-based MR comes with specific challenges, including multiple causality. Despite shortcomings, it still remains one of the most powerful techniques for inferring causality.With MR still an evolving concept with complex statistical challenges, the literature is relatively scarce in terms of providing working examples incorporating real datasets. In this chapter, we provide a step-by-step guide for causal inference based on the principles of MR with a real dataset using both individual and summary data from unrelated individuals. We suggest best possible practices and give recommendations based on the current literature.
Discrete Inverse and State Estimation Problems
NASA Astrophysics Data System (ADS)
Wunsch, Carl
2006-06-01
The problems of making inferences about the natural world from noisy observations and imperfect theories occur in almost all scientific disciplines. This book addresses these problems using examples taken from geophysical fluid dynamics. It focuses on discrete formulations, both static and time-varying, known variously as inverse, state estimation or data assimilation problems. Starting with fundamental algebraic and statistical ideas, the book guides the reader through a range of inference tools including the singular value decomposition, Gauss-Markov and minimum variance estimates, Kalman filters and related smoothers, and adjoint (Lagrange multiplier) methods. The final chapters discuss a variety of practical applications to geophysical flow problems. Discrete Inverse and State Estimation Problems is an ideal introduction to the topic for graduate students and researchers in oceanography, meteorology, climate dynamics, and geophysical fluid dynamics. It is also accessible to a wider scientific audience; the only prerequisite is an understanding of linear algebra. Provides a comprehensive introduction to discrete methods of inference from incomplete information Based upon 25 years of practical experience using real data and models Develops sequential and whole-domain analysis methods from simple least-squares Contains many examples and problems, and web-based support through MIT opencourseware
Inverse Ising problem in continuous time: A latent variable approach
NASA Astrophysics Data System (ADS)
Donner, Christian; Opper, Manfred
2017-12-01
We consider the inverse Ising problem: the inference of network couplings from observed spin trajectories for a model with continuous time Glauber dynamics. By introducing two sets of auxiliary latent random variables we render the likelihood into a form which allows for simple iterative inference algorithms with analytical updates. The variables are (1) Poisson variables to linearize an exponential term which is typical for point process likelihoods and (2) Pólya-Gamma variables, which make the likelihood quadratic in the coupling parameters. Using the augmented likelihood, we derive an expectation-maximization (EM) algorithm to obtain the maximum likelihood estimate of network parameters. Using a third set of latent variables we extend the EM algorithm to sparse couplings via L1 regularization. Finally, we develop an efficient approximate Bayesian inference algorithm using a variational approach. We demonstrate the performance of our algorithms on data simulated from an Ising model. For data which are simulated from a more biologically plausible network with spiking neurons, we show that the Ising model captures well the low order statistics of the data and how the Ising couplings are related to the underlying synaptic structure of the simulated network.
The Bayesian group lasso for confounded spatial data
Hefley, Trevor J.; Hooten, Mevin B.; Hanks, Ephraim M.; Russell, Robin E.; Walsh, Daniel P.
2017-01-01
Generalized linear mixed models for spatial processes are widely used in applied statistics. In many applications of the spatial generalized linear mixed model (SGLMM), the goal is to obtain inference about regression coefficients while achieving optimal predictive ability. When implementing the SGLMM, multicollinearity among covariates and the spatial random effects can make computation challenging and influence inference. We present a Bayesian group lasso prior with a single tuning parameter that can be chosen to optimize predictive ability of the SGLMM and jointly regularize the regression coefficients and spatial random effect. We implement the group lasso SGLMM using efficient Markov chain Monte Carlo (MCMC) algorithms and demonstrate how multicollinearity among covariates and the spatial random effect can be monitored as a derived quantity. To test our method, we compared several parameterizations of the SGLMM using simulated data and two examples from plant ecology and disease ecology. In all examples, problematic levels multicollinearity occurred and influenced sampling efficiency and inference. We found that the group lasso prior resulted in roughly twice the effective sample size for MCMC samples of regression coefficients and can have higher and less variable predictive accuracy based on out-of-sample data when compared to the standard SGLMM.
Research participant compensation: A matter of statistical inference as well as ethics.
Swanson, David M; Betensky, Rebecca A
2015-11-01
The ethics of compensation of research subjects for participation in clinical trials has been debated for years. One ethical issue of concern is variation among subjects in the level of compensation for identical treatments. Surprisingly, the impact of variation on the statistical inferences made from trial results has not been examined. We seek to identify how variation in compensation may influence any existing dependent censoring in clinical trials, thereby also influencing inference about the survival curve, hazard ratio, or other measures of treatment efficacy. In simulation studies, we consider a model for how compensation structure may influence the censoring model. Under existing dependent censoring, we estimate survival curves under different compensation structures and observe how these structures induce variability in the estimates. We show through this model that if the compensation structure affects the censoring model and dependent censoring is present, then variation in that structure induces variation in the estimates and affects the accuracy of estimation and inference on treatment efficacy. From the perspectives of both ethics and statistical inference, standardization and transparency in the compensation of participants in clinical trials are warranted. Copyright © 2015 Elsevier Inc. All rights reserved.
RELATING ACCUMULATOR MODEL PARAMETERS AND NEURAL DYNAMICS
Purcell, Braden A.; Palmeri, Thomas J.
2016-01-01
Accumulator models explain decision-making as an accumulation of evidence to a response threshold. Specific model parameters are associated with specific model mechanisms, such as the time when accumulation begins, the average rate of evidence accumulation, and the threshold. These mechanisms determine both the within-trial dynamics of evidence accumulation and the predicted behavior. Cognitive modelers usually infer what mechanisms vary during decision-making by seeing what parameters vary when a model is fitted to observed behavior. The recent identification of neural activity with evidence accumulation suggests that it may be possible to directly infer what mechanisms vary from an analysis of how neural dynamics vary. However, evidence accumulation is often noisy, and noise complicates the relationship between accumulator dynamics and the underlying mechanisms leading to those dynamics. To understand what kinds of inferences can be made about decision-making mechanisms based on measures of neural dynamics, we measured simulated accumulator model dynamics while systematically varying model parameters. In some cases, decision- making mechanisms can be directly inferred from dynamics, allowing us to distinguish between models that make identical behavioral predictions. In other cases, however, different parameterized mechanisms produce surprisingly similar dynamics, limiting the inferences that can be made based on measuring dynamics alone. Analyzing neural dynamics can provide a powerful tool to resolve model mimicry at the behavioral level, but we caution against drawing inferences based solely on neural analyses. Instead, simultaneous modeling of behavior and neural dynamics provides the most powerful approach to understand decision-making and likely other aspects of cognition and perception. PMID:28392584
Phylogeography Takes a Relaxed Random Walk in Continuous Space and Time
Lemey, Philippe; Rambaut, Andrew; Welch, John J.; Suchard, Marc A.
2010-01-01
Research aimed at understanding the geographic context of evolutionary histories is burgeoning across biological disciplines. Recent endeavors attempt to interpret contemporaneous genetic variation in the light of increasingly detailed geographical and environmental observations. Such interest has promoted the development of phylogeographic inference techniques that explicitly aim to integrate such heterogeneous data. One promising development involves reconstructing phylogeographic history on a continuous landscape. Here, we present a Bayesian statistical approach to infer continuous phylogeographic diffusion using random walk models while simultaneously reconstructing the evolutionary history in time from molecular sequence data. Moreover, by accommodating branch-specific variation in dispersal rates, we relax the most restrictive assumption of the standard Brownian diffusion process and demonstrate increased statistical efficiency in spatial reconstructions of overdispersed random walks by analyzing both simulated and real viral genetic data. We further illustrate how drawing inference about summary statistics from a fully specified stochastic process over both sequence evolution and spatial movement reveals important characteristics of a rabies epidemic. Together with recent advances in discrete phylogeographic inference, the continuous model developments furnish a flexible statistical framework for biogeographical reconstructions that is easily expanded upon to accommodate various landscape genetic features. PMID:20203288
Variation in reaction norms: Statistical considerations and biological interpretation.
Morrissey, Michael B; Liefting, Maartje
2016-09-01
Analysis of reaction norms, the functions by which the phenotype produced by a given genotype depends on the environment, is critical to studying many aspects of phenotypic evolution. Different techniques are available for quantifying different aspects of reaction norm variation. We examine what biological inferences can be drawn from some of the more readily applicable analyses for studying reaction norms. We adopt a strongly biologically motivated view, but draw on statistical theory to highlight strengths and drawbacks of different techniques. In particular, consideration of some formal statistical theory leads to revision of some recently, and forcefully, advocated opinions on reaction norm analysis. We clarify what simple analysis of the slope between mean phenotype in two environments can tell us about reaction norms, explore the conditions under which polynomial regression can provide robust inferences about reaction norm shape, and explore how different existing approaches may be used to draw inferences about variation in reaction norm shape. We show how mixed model-based approaches can provide more robust inferences than more commonly used multistep statistical approaches, and derive new metrics of the relative importance of variation in reaction norm intercepts, slopes, and curvatures. © 2016 The Author(s). Evolution © 2016 The Society for the Study of Evolution.
Affective cognition: Exploring lay theories of emotion.
Ong, Desmond C; Zaki, Jamil; Goodman, Noah D
2015-10-01
Humans skillfully reason about others' emotions, a phenomenon we term affective cognition. Despite its importance, few formal, quantitative theories have described the mechanisms supporting this phenomenon. We propose that affective cognition involves applying domain-general reasoning processes to domain-specific content knowledge. Observers' knowledge about emotions is represented in rich and coherent lay theories, which comprise consistent relationships between situations, emotions, and behaviors. Observers utilize this knowledge in deciphering social agents' behavior and signals (e.g., facial expressions), in a manner similar to rational inference in other domains. We construct a computational model of a lay theory of emotion, drawing on tools from Bayesian statistics, and test this model across four experiments in which observers drew inferences about others' emotions in a simple gambling paradigm. This work makes two main contributions. First, the model accurately captures observers' flexible but consistent reasoning about the ways that events and others' emotional responses to those events relate to each other. Second, our work models the problem of emotional cue integration-reasoning about others' emotion from multiple emotional cues-as rational inference via Bayes' rule, and we show that this model tightly tracks human observers' empirical judgments. Our results reveal a deep structural relationship between affective cognition and other forms of inference, and suggest wide-ranging applications to basic psychological theory and psychiatry. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.
Cusimano, Natalie; Sousa, Aretuza; Renner, Susanne S.
2012-01-01
Background and Aims For 84 years, botanists have relied on calculating the highest common factor for series of haploid chromosome numbers to arrive at a so-called basic number, x. This was done without consistent (reproducible) reference to species relationships and frequencies of different numbers in a clade. Likelihood models that treat polyploidy, chromosome fusion and fission as events with particular probabilities now allow reconstruction of ancestral chromosome numbers in an explicit framework. We have used a modelling approach to reconstruct chromosome number change in the large monocot family Araceae and to test earlier hypotheses about basic numbers in the family. Methods Using a maximum likelihood approach and chromosome counts for 26 % of the 3300 species of Araceae and representative numbers for each of the other 13 families of Alismatales, polyploidization events and single chromosome changes were inferred on a genus-level phylogenetic tree for 113 of the 117 genera of Araceae. Key Results The previously inferred basic numbers x = 14 and x = 7 are rejected. Instead, maximum likelihood optimization revealed an ancestral haploid chromosome number of n = 16, Bayesian inference of n = 18. Chromosome fusion (loss) is the predominant inferred event, whereas polyploidization events occurred less frequently and mainly towards the tips of the tree. Conclusions The bias towards low basic numbers (x) introduced by the algebraic approach to inferring chromosome number changes, prevalent among botanists, may have contributed to an unrealistic picture of ancestral chromosome numbers in many plant clades. The availability of robust quantitative methods for reconstructing ancestral chromosome numbers on molecular phylogenetic trees (with or without branch length information), with confidence statistics, makes the calculation of x an obsolete approach, at least when applied to large clades. PMID:22210850
Fisher, Charles K.; Mehta, Pankaj
2014-01-01
Human associated microbial communities exert tremendous influence over human health and disease. With modern metagenomic sequencing methods it is now possible to follow the relative abundance of microbes in a community over time. These microbial communities exhibit rich ecological dynamics and an important goal of microbial ecology is to infer the ecological interactions between species directly from sequence data. Any algorithm for inferring ecological interactions must overcome three major obstacles: 1) a correlation between the abundances of two species does not imply that those species are interacting, 2) the sum constraint on the relative abundances obtained from metagenomic studies makes it difficult to infer the parameters in timeseries models, and 3) errors due to experimental uncertainty, or mis-assignment of sequencing reads into operational taxonomic units, bias inferences of species interactions due to a statistical problem called “errors-in-variables”. Here we introduce an approach, Learning Interactions from MIcrobial Time Series (LIMITS), that overcomes these obstacles. LIMITS uses sparse linear regression with boostrap aggregation to infer a discrete-time Lotka-Volterra model for microbial dynamics. We tested LIMITS on synthetic data and showed that it could reliably infer the topology of the inter-species ecological interactions. We then used LIMITS to characterize the species interactions in the gut microbiomes of two individuals and found that the interaction networks varied significantly between individuals. Furthermore, we found that the interaction networks of the two individuals are dominated by distinct “keystone species”, Bacteroides fragilis and Bacteroided stercosis, that have a disproportionate influence on the structure of the gut microbiome even though they are only found in moderate abundance. Based on our results, we hypothesize that the abundances of certain keystone species may be responsible for individuality in the human gut microbiome. PMID:25054627
Butler, Lucas P; Tomasello, Michael
2016-05-01
Young children can in principle make generic inferences (e.g., "doffels are magnetic") on the basis of their own individual experience. Recent evidence, however, shows that by 4 years of age children make strong generic inferences on the basis of a single pedagogical demonstration with an individual (e.g., an adult demonstrates for the child that a single "doffel" is magnetic). In the current experiments, we extended this to look at younger children, investigating how the mechanisms underlying this phenomenon are integrated with other aspects of inductive inference during early development. We found that both 2- and 3-year-olds used pedagogical cues to guide such generic inferences, but only so long as the "doffel" was linguistically labeled. In a follow-up study, 3-year-olds, but not 2-year-olds, continued to make this generic inference even if the word "doffel" was uttered incidentally and non-referentially in a context preceding the pedagogical demonstration, thereby simply marking the opportunity to learn about a culturally important category. By 3 years of age, then, young children show a remarkable ability to flexibly combine different sources of culturally relevant information (e.g., linguistic labeling, pedagogy) to make the kinds of generic inferences so central in human cultural learning. Copyright © 2015 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Alexandridis, Konstantinos T.
This dissertation adopts a holistic and detailed approach to modeling spatially explicit agent-based artificial intelligent systems, using the Multi Agent-based Behavioral Economic Landscape (MABEL) model. The research questions that addresses stem from the need to understand and analyze the real-world patterns and dynamics of land use change from a coupled human-environmental systems perspective. Describes the systemic, mathematical, statistical, socio-economic and spatial dynamics of the MABEL modeling framework, and provides a wide array of cross-disciplinary modeling applications within the research, decision-making and policy domains. Establishes the symbolic properties of the MABEL model as a Markov decision process, analyzes the decision-theoretic utility and optimization attributes of agents towards comprising statistically and spatially optimal policies and actions, and explores the probabilogic character of the agents' decision-making and inference mechanisms via the use of Bayesian belief and decision networks. Develops and describes a Monte Carlo methodology for experimental replications of agent's decisions regarding complex spatial parcel acquisition and learning. Recognizes the gap on spatially-explicit accuracy assessment techniques for complex spatial models, and proposes an ensemble of statistical tools designed to address this problem. Advanced information assessment techniques such as the Receiver-Operator Characteristic curve, the impurity entropy and Gini functions, and the Bayesian classification functions are proposed. The theoretical foundation for modular Bayesian inference in spatially-explicit multi-agent artificial intelligent systems, and the ensembles of cognitive and scenario assessment modular tools build for the MABEL model are provided. Emphasizes the modularity and robustness as valuable qualitative modeling attributes, and examines the role of robust intelligent modeling as a tool for improving policy-decisions related to land use change. Finally, the major contributions to the science are presented along with valuable directions for future research.
Outlier Responses Reflect Sensitivity to Statistical Structure in the Human Brain
Garrido, Marta I.
2013-01-01
We constantly look for patterns in the environment that allow us to learn its key regularities. These regularities are fundamental in enabling us to make predictions about what is likely to happen next. The physiological study of regularity extraction has focused primarily on repetitive sequence-based rules within the sensory environment, or on stimulus-outcome associations in the context of reward-based decision-making. Here we ask whether we implicitly encode non-sequential stochastic regularities, and detect violations therein. We addressed this question using a novel experimental design and both behavioural and magnetoencephalographic (MEG) metrics associated with responses to pure-tone sounds with frequencies sampled from a Gaussian distribution. We observed that sounds in the tail of the distribution evoked a larger response than those that fell at the centre. This response resembled the mismatch negativity (MMN) evoked by surprising or unlikely events in traditional oddball paradigms. Crucially, responses to physically identical outliers were greater when the distribution was narrower. These results show that humans implicitly keep track of the uncertainty induced by apparently random distributions of sensory events. Source reconstruction suggested that the statistical-context-sensitive responses arose in a temporo-parietal network, areas that have been associated with attention orientation to unexpected events. Our results demonstrate a very early neurophysiological marker of the brain's ability to implicitly encode complex statistical structure in the environment. We suggest that this sensitivity provides a computational basis for our ability to make perceptual inferences in noisy environments and to make decisions in an uncertain world. PMID:23555230
Finding the Cause: Verbal Framing Helps Children Extract Causal Evidence Embedded in a Complex Scene
ERIC Educational Resources Information Center
Butler, Lucas P.; Markman, Ellen M.
2012-01-01
In making causal inferences, children must both identify a causal problem and selectively attend to meaningful evidence. Four experiments demonstrate that verbally framing an event ("Which animals make Lion laugh?") helps 4-year-olds extract evidence from a complex scene to make accurate causal inferences. Whereas framing was unnecessary when…
Maximum entropy PDF projection: A review
NASA Astrophysics Data System (ADS)
Baggenstoss, Paul M.
2017-06-01
We review maximum entropy (MaxEnt) PDF projection, a method with wide potential applications in statistical inference. The method constructs a sampling distribution for a high-dimensional vector x based on knowing the sampling distribution p(z) of a lower-dimensional feature z = T (x). Under mild conditions, the distribution p(x) having highest possible entropy among all distributions consistent with p(z) may be readily found. Furthermore, the MaxEnt p(x) may be sampled, making the approach useful in Monte Carlo methods. We review the theorem and present a case study in model order selection and classification for handwritten character recognition.
The Role of Probability-Based Inference in an Intelligent Tutoring System.
ERIC Educational Resources Information Center
Mislevy, Robert J.; Gitomer, Drew H.
Probability-based inference in complex networks of interdependent variables is an active topic in statistical research, spurred by such diverse applications as forecasting, pedigree analysis, troubleshooting, and medical diagnosis. This paper concerns the role of Bayesian inference networks for updating student models in intelligent tutoring…
Challenges in Species Tree Estimation Under the Multispecies Coalescent Model
Xu, Bo; Yang, Ziheng
2016-01-01
The multispecies coalescent (MSC) model has emerged as a powerful framework for inferring species phylogenies while accounting for ancestral polymorphism and gene tree-species tree conflict. A number of methods have been developed in the past few years to estimate the species tree under the MSC. The full likelihood methods (including maximum likelihood and Bayesian inference) average over the unknown gene trees and accommodate their uncertainties properly but involve intensive computation. The approximate or summary coalescent methods are computationally fast and are applicable to genomic datasets with thousands of loci, but do not make an efficient use of information in the multilocus data. Most of them take the two-step approach of reconstructing the gene trees for multiple loci by phylogenetic methods and then treating the estimated gene trees as observed data, without accounting for their uncertainties appropriately. In this article we review the statistical nature of the species tree estimation problem under the MSC, and explore the conceptual issues and challenges of species tree estimation by focusing mainly on simple cases of three or four closely related species. We use mathematical analysis and computer simulation to demonstrate that large differences in statistical performance may exist between the two classes of methods. We illustrate that several counterintuitive behaviors may occur with the summary methods but they are due to inefficient use of information in the data by summary methods and vanish when the data are analyzed using full-likelihood methods. These include (i) unidentifiability of parameters in the model, (ii) inconsistency in the so-called anomaly zone, (iii) singularity on the likelihood surface, and (iv) deterioration of performance upon addition of more data. We discuss the challenges and strategies of species tree inference for distantly related species when the molecular clock is violated, and highlight the need for improving the computational efficiency and model realism of the likelihood methods as well as the statistical efficiency of the summary methods. PMID:27927902
Chimpanzees know that others make inferences
Schmelz, Martin; Call, Josep; Tomasello, Michael
2011-01-01
If chimpanzees are faced with two opaque boards on a table, in the context of searching for a single piece of food, they do not choose the board lying flat (because if food was under there it would not be lying flat) but, rather, they choose the slanted one— presumably inferring that some unperceived food underneath is causing the slant. Here we demonstrate that chimpanzees know that other chimpanzees in the same situation will make a similar inference. In a back-and-forth foraging game, when their competitor had chosen before them, chimpanzees tended to avoid the slanted board on the assumption that the competitor had already chosen it. Chimpanzees can determine the inferences that a conspecific is likely to make and then adjust their competitive strategies accordingly. PMID:21282649
Bayesian Monte Carlo and Maximum Likelihood Approach for ...
Model uncertainty estimation and risk assessment is essential to environmental management and informed decision making on pollution mitigation strategies. In this study, we apply a probabilistic methodology, which combines Bayesian Monte Carlo simulation and Maximum Likelihood estimation (BMCML) to calibrate a lake oxygen recovery model. We first derive an analytical solution of the differential equation governing lake-averaged oxygen dynamics as a function of time-variable wind speed. Statistical inferences on model parameters and predictive uncertainty are then drawn by Bayesian conditioning of the analytical solution on observed daily wind speed and oxygen concentration data obtained from an earlier study during two recovery periods on a eutrophic lake in upper state New York. The model is calibrated using oxygen recovery data for one year and statistical inferences were validated using recovery data for another year. Compared with essentially two-step, regression and optimization approach, the BMCML results are more comprehensive and performed relatively better in predicting the observed temporal dissolved oxygen levels (DO) in the lake. BMCML also produced comparable calibration and validation results with those obtained using popular Markov Chain Monte Carlo technique (MCMC) and is computationally simpler and easier to implement than the MCMC. Next, using the calibrated model, we derive an optimal relationship between liquid film-transfer coefficien
Combining statistical inference and decisions in ecology
Williams, Perry J.; Hooten, Mevin B.
2016-01-01
Statistical decision theory (SDT) is a sub-field of decision theory that formally incorporates statistical investigation into a decision-theoretic framework to account for uncertainties in a decision problem. SDT provides a unifying analysis of three types of information: statistical results from a data set, knowledge of the consequences of potential choices (i.e., loss), and prior beliefs about a system. SDT links the theoretical development of a large body of statistical methods including point estimation, hypothesis testing, and confidence interval estimation. The theory and application of SDT have mainly been developed and published in the fields of mathematics, statistics, operations research, and other decision sciences, but have had limited exposure in ecology. Thus, we provide an introduction to SDT for ecologists and describe its utility for linking the conventionally separate tasks of statistical investigation and decision making in a single framework. We describe the basic framework of both Bayesian and frequentist SDT, its traditional use in statistics, and discuss its application to decision problems that occur in ecology. We demonstrate SDT with two types of decisions: Bayesian point estimation, and an applied management problem of selecting a prescribed fire rotation for managing a grassland bird species. Central to SDT, and decision theory in general, are loss functions. Thus, we also provide basic guidance and references for constructing loss functions for an SDT problem.
ERIC Educational Resources Information Center
Freed, Jenny; Cain, Kate
2017-01-01
Background: Comprehension is critical for classroom learning and educational success. Inferences are integral to good comprehension: successful comprehension requires the listener to generate local coherence inferences, which involve integrating information between clauses, and global coherence inferences, which involve integrating textual…
PyClone: statistical inference of clonal population structure in cancer.
Roth, Andrew; Khattra, Jaswinder; Yap, Damian; Wan, Adrian; Laks, Emma; Biele, Justina; Ha, Gavin; Aparicio, Samuel; Bouchard-Côté, Alexandre; Shah, Sohrab P
2014-04-01
We introduce PyClone, a statistical model for inference of clonal population structures in cancers. PyClone is a Bayesian clustering method for grouping sets of deeply sequenced somatic mutations into putative clonal clusters while estimating their cellular prevalences and accounting for allelic imbalances introduced by segmental copy-number changes and normal-cell contamination. Single-cell sequencing validation demonstrates PyClone's accuracy.
Statistical Signal Models and Algorithms for Image Analysis
1984-10-25
In this report, two-dimensional stochastic linear models are used in developing algorithms for image analysis such as classification, segmentation, and object detection in images characterized by textured backgrounds. These models generate two-dimensional random processes as outputs to which statistical inference procedures can naturally be applied. A common thread throughout our algorithms is the interpretation of the inference procedures in terms of linear prediction
Nabi, Razieh; Shpitser, Ilya
2017-01-01
In this paper, we consider the problem of fair statistical inference involving outcome variables. Examples include classification and regression problems, and estimating treatment effects in randomized trials or observational data. The issue of fairness arises in such problems where some covariates or treatments are “sensitive,” in the sense of having potential of creating discrimination. In this paper, we argue that the presence of discrimination can be formalized in a sensible way as the presence of an effect of a sensitive covariate on the outcome along certain causal pathways, a view which generalizes (Pearl 2009). A fair outcome model can then be learned by solving a constrained optimization problem. We discuss a number of complications that arise in classical statistical inference due to this view and provide workarounds based on recent work in causal and semi-parametric inference.
Long-Branch Attraction Bias and Inconsistency in Bayesian Phylogenetics
Kolaczkowski, Bryan; Thornton, Joseph W.
2009-01-01
Bayesian inference (BI) of phylogenetic relationships uses the same probabilistic models of evolution as its precursor maximum likelihood (ML), so BI has generally been assumed to share ML's desirable statistical properties, such as largely unbiased inference of topology given an accurate model and increasingly reliable inferences as the amount of data increases. Here we show that BI, unlike ML, is biased in favor of topologies that group long branches together, even when the true model and prior distributions of evolutionary parameters over a group of phylogenies are known. Using experimental simulation studies and numerical and mathematical analyses, we show that this bias becomes more severe as more data are analyzed, causing BI to infer an incorrect tree as the maximum a posteriori phylogeny with asymptotically high support as sequence length approaches infinity. BI's long branch attraction bias is relatively weak when the true model is simple but becomes pronounced when sequence sites evolve heterogeneously, even when this complexity is incorporated in the model. This bias—which is apparent under both controlled simulation conditions and in analyses of empirical sequence data—also makes BI less efficient and less robust to the use of an incorrect evolutionary model than ML. Surprisingly, BI's bias is caused by one of the method's stated advantages—that it incorporates uncertainty about branch lengths by integrating over a distribution of possible values instead of estimating them from the data, as ML does. Our findings suggest that trees inferred using BI should be interpreted with caution and that ML may be a more reliable framework for modern phylogenetic analysis. PMID:20011052
Long-branch attraction bias and inconsistency in Bayesian phylogenetics.
Kolaczkowski, Bryan; Thornton, Joseph W
2009-12-09
Bayesian inference (BI) of phylogenetic relationships uses the same probabilistic models of evolution as its precursor maximum likelihood (ML), so BI has generally been assumed to share ML's desirable statistical properties, such as largely unbiased inference of topology given an accurate model and increasingly reliable inferences as the amount of data increases. Here we show that BI, unlike ML, is biased in favor of topologies that group long branches together, even when the true model and prior distributions of evolutionary parameters over a group of phylogenies are known. Using experimental simulation studies and numerical and mathematical analyses, we show that this bias becomes more severe as more data are analyzed, causing BI to infer an incorrect tree as the maximum a posteriori phylogeny with asymptotically high support as sequence length approaches infinity. BI's long branch attraction bias is relatively weak when the true model is simple but becomes pronounced when sequence sites evolve heterogeneously, even when this complexity is incorporated in the model. This bias--which is apparent under both controlled simulation conditions and in analyses of empirical sequence data--also makes BI less efficient and less robust to the use of an incorrect evolutionary model than ML. Surprisingly, BI's bias is caused by one of the method's stated advantages--that it incorporates uncertainty about branch lengths by integrating over a distribution of possible values instead of estimating them from the data, as ML does. Our findings suggest that trees inferred using BI should be interpreted with caution and that ML may be a more reliable framework for modern phylogenetic analysis.
Statistical inference of the generation probability of T-cell receptors from sequence repertoires.
Murugan, Anand; Mora, Thierry; Walczak, Aleksandra M; Callan, Curtis G
2012-10-02
Stochastic rearrangement of germline V-, D-, and J-genes to create variable coding sequence for certain cell surface receptors is at the origin of immune system diversity. This process, known as "VDJ recombination", is implemented via a series of stochastic molecular events involving gene choices and random nucleotide insertions between, and deletions from, genes. We use large sequence repertoires of the variable CDR3 region of human CD4+ T-cell receptor beta chains to infer the statistical properties of these basic biochemical events. Because any given CDR3 sequence can be produced in multiple ways, the probability distribution of hidden recombination events cannot be inferred directly from the observed sequences; we therefore develop a maximum likelihood inference method to achieve this end. To separate the properties of the molecular rearrangement mechanism from the effects of selection, we focus on nonproductive CDR3 sequences in T-cell DNA. We infer the joint distribution of the various generative events that occur when a new T-cell receptor gene is created. We find a rich picture of correlation (and absence thereof), providing insight into the molecular mechanisms involved. The generative event statistics are consistent between individuals, suggesting a universal biochemical process. Our probabilistic model predicts the generation probability of any specific CDR3 sequence by the primitive recombination process, allowing us to quantify the potential diversity of the T-cell repertoire and to understand why some sequences are shared between individuals. We argue that the use of formal statistical inference methods, of the kind presented in this paper, will be essential for quantitative understanding of the generation and evolution of diversity in the adaptive immune system.
Zhang, Shujun
2018-01-01
Genome-wide association studies (GWASs) have identified many disease associated loci, the majority of which have unknown biological functions. Understanding the mechanism underlying trait associations requires identifying trait-relevant tissues and investigating associations in a trait-specific fashion. Here, we extend the widely used linear mixed model to incorporate multiple SNP functional annotations from omics studies with GWAS summary statistics to facilitate the identification of trait-relevant tissues, with which to further construct powerful association tests. Specifically, we rely on a generalized estimating equation based algorithm for parameter inference, a mixture modeling framework for trait-tissue relevance classification, and a weighted sequence kernel association test constructed based on the identified trait-relevant tissues for powerful association analysis. We refer to our analytic procedure as the Scalable Multiple Annotation integration for trait-Relevant Tissue identification and usage (SMART). With extensive simulations, we show how our method can make use of multiple complementary annotations to improve the accuracy for identifying trait-relevant tissues. In addition, our procedure allows us to make use of the inferred trait-relevant tissues, for the first time, to construct more powerful SNP set tests. We apply our method for an in-depth analysis of 43 traits from 28 GWASs using tissue-specific annotations in 105 tissues derived from ENCODE and Roadmap. Our results reveal new trait-tissue relevance, pinpoint important annotations that are informative of trait-tissue relationship, and illustrate how we can use the inferred trait-relevant tissues to construct more powerful association tests in the Wellcome trust case control consortium study. PMID:29377896
Emmert-Streib, Frank; Glazko, Galina V.; Altay, Gökmen; de Matos Simoes, Ricardo
2012-01-01
In this paper, we present a systematic and conceptual overview of methods for inferring gene regulatory networks from observational gene expression data. Further, we discuss two classic approaches to infer causal structures and compare them with contemporary methods by providing a conceptual categorization thereof. We complement the above by surveying global and local evaluation measures for assessing the performance of inference algorithms. PMID:22408642
Information Entropy Production of Maximum Entropy Markov Chains from Spike Trains
NASA Astrophysics Data System (ADS)
Cofré, Rodrigo; Maldonado, Cesar
2018-01-01
We consider the maximum entropy Markov chain inference approach to characterize the collective statistics of neuronal spike trains, focusing on the statistical properties of the inferred model. We review large deviations techniques useful in this context to describe properties of accuracy and convergence in terms of sampling size. We use these results to study the statistical fluctuation of correlations, distinguishability and irreversibility of maximum entropy Markov chains. We illustrate these applications using simple examples where the large deviation rate function is explicitly obtained for maximum entropy models of relevance in this field.
Markov modulated Poisson process models incorporating covariates for rainfall intensity.
Thayakaran, R; Ramesh, N I
2013-01-01
Time series of rainfall bucket tip times at the Beaufort Park station, Bracknell, in the UK are modelled by a class of Markov modulated Poisson processes (MMPP) which may be thought of as a generalization of the Poisson process. Our main focus in this paper is to investigate the effects of including covariate information into the MMPP model framework on statistical properties. In particular, we look at three types of time-varying covariates namely temperature, sea level pressure, and relative humidity that are thought to be affecting the rainfall arrival process. Maximum likelihood estimation is used to obtain the parameter estimates, and likelihood ratio tests are employed in model comparison. Simulated data from the fitted model are used to make statistical inferences about the accumulated rainfall in the discrete time interval. Variability of the daily Poisson arrival rates is studied.
Estimating trends in the global mean temperature record
NASA Astrophysics Data System (ADS)
Poppick, Andrew; Moyer, Elisabeth J.; Stein, Michael L.
2017-06-01
Given uncertainties in physical theory and numerical climate simulations, the historical temperature record is often used as a source of empirical information about climate change. Many historical trend analyses appear to de-emphasize physical and statistical assumptions: examples include regression models that treat time rather than radiative forcing as the relevant covariate, and time series methods that account for internal variability in nonparametric rather than parametric ways. However, given a limited data record and the presence of internal variability, estimating radiatively forced temperature trends in the historical record necessarily requires some assumptions. Ostensibly empirical methods can also involve an inherent conflict in assumptions: they require data records that are short enough for naive trend models to be applicable, but long enough for long-timescale internal variability to be accounted for. In the context of global mean temperatures, empirical methods that appear to de-emphasize assumptions can therefore produce misleading inferences, because the trend over the twentieth century is complex and the scale of temporal correlation is long relative to the length of the data record. We illustrate here how a simple but physically motivated trend model can provide better-fitting and more broadly applicable trend estimates and can allow for a wider array of questions to be addressed. In particular, the model allows one to distinguish, within a single statistical framework, between uncertainties in the shorter-term vs. longer-term response to radiative forcing, with implications not only on historical trends but also on uncertainties in future projections. We also investigate the consequence on inferred uncertainties of the choice of a statistical description of internal variability. While nonparametric methods may seem to avoid making explicit assumptions, we demonstrate how even misspecified parametric statistical methods, if attuned to the important characteristics of internal variability, can result in more accurate uncertainty statements about trends.
Emura, Takeshi; Konno, Yoshihiko; Michimae, Hirofumi
2015-07-01
Doubly truncated data consist of samples whose observed values fall between the right- and left- truncation limits. With such samples, the distribution function of interest is estimated using the nonparametric maximum likelihood estimator (NPMLE) that is obtained through a self-consistency algorithm. Owing to the complicated asymptotic distribution of the NPMLE, the bootstrap method has been suggested for statistical inference. This paper proposes a closed-form estimator for the asymptotic covariance function of the NPMLE, which is computationally attractive alternative to bootstrapping. Furthermore, we develop various statistical inference procedures, such as confidence interval, goodness-of-fit tests, and confidence bands to demonstrate the usefulness of the proposed covariance estimator. Simulations are performed to compare the proposed method with both the bootstrap and jackknife methods. The methods are illustrated using the childhood cancer dataset.
NASA Technical Reports Server (NTRS)
Xu, Kuan-Man
2006-01-01
A new method is proposed to compare statistical differences between summary histograms, which are the histograms summed over a large ensemble of individual histograms. It consists of choosing a distance statistic for measuring the difference between summary histograms and using a bootstrap procedure to calculate the statistical significance level. Bootstrapping is an approach to statistical inference that makes few assumptions about the underlying probability distribution that describes the data. Three distance statistics are compared in this study. They are the Euclidean distance, the Jeffries-Matusita distance and the Kuiper distance. The data used in testing the bootstrap method are satellite measurements of cloud systems called cloud objects. Each cloud object is defined as a contiguous region/patch composed of individual footprints or fields of view. A histogram of measured values over footprints is generated for each parameter of each cloud object and then summary histograms are accumulated over all individual histograms in a given cloud-object size category. The results of statistical hypothesis tests using all three distances as test statistics are generally similar, indicating the validity of the proposed method. The Euclidean distance is determined to be most suitable after comparing the statistical tests of several parameters with distinct probability distributions among three cloud-object size categories. Impacts on the statistical significance levels resulting from differences in the total lengths of satellite footprint data between two size categories are also discussed.
Lu, Qiongshi; Hu, Yiming; Sun, Jiehuan; Cheng, Yuwei; Cheung, Kei-Hoi; Zhao, Hongyu
2015-05-27
Identifying functional regions in the human genome is a major goal in human genetics. Great efforts have been made to functionally annotate the human genome either through computational predictions, such as genomic conservation, or high-throughput experiments, such as the ENCODE project. These efforts have resulted in a rich collection of functional annotation data of diverse types that need to be jointly analyzed for integrated interpretation and annotation. Here we present GenoCanyon, a whole-genome annotation method that performs unsupervised statistical learning using 22 computational and experimental annotations thereby inferring the functional potential of each position in the human genome. With GenoCanyon, we are able to predict many of the known functional regions. The ability of predicting functional regions as well as its generalizable statistical framework makes GenoCanyon a unique and powerful tool for whole-genome annotation. The GenoCanyon web server is available at http://genocanyon.med.yale.edu.
Statistically optimal perception and learning: from behavior to neural representations
Fiser, József; Berkes, Pietro; Orbán, Gergő; Lengyel, Máté
2010-01-01
Human perception has recently been characterized as statistical inference based on noisy and ambiguous sensory inputs. Moreover, suitable neural representations of uncertainty have been identified that could underlie such probabilistic computations. In this review, we argue that learning an internal model of the sensory environment is another key aspect of the same statistical inference procedure and thus perception and learning need to be treated jointly. We review evidence for statistically optimal learning in humans and animals, and reevaluate possible neural representations of uncertainty based on their potential to support statistically optimal learning. We propose that spontaneous activity can have a functional role in such representations leading to a new, sampling-based, framework of how the cortex represents information and uncertainty. PMID:20153683
On the analysis of very small samples of Gaussian repeated measurements: an alternative approach.
Westgate, Philip M; Burchett, Woodrow W
2017-03-15
The analysis of very small samples of Gaussian repeated measurements can be challenging. First, due to a very small number of independent subjects contributing outcomes over time, statistical power can be quite small. Second, nuisance covariance parameters must be appropriately accounted for in the analysis in order to maintain the nominal test size. However, available statistical strategies that ensure valid statistical inference may lack power, whereas more powerful methods may have the potential for inflated test sizes. Therefore, we explore an alternative approach to the analysis of very small samples of Gaussian repeated measurements, with the goal of maintaining valid inference while also improving statistical power relative to other valid methods. This approach uses generalized estimating equations with a bias-corrected empirical covariance matrix that accounts for all small-sample aspects of nuisance correlation parameter estimation in order to maintain valid inference. Furthermore, the approach utilizes correlation selection strategies with the goal of choosing the working structure that will result in the greatest power. In our study, we show that when accurate modeling of the nuisance correlation structure impacts the efficiency of regression parameter estimation, this method can improve power relative to existing methods that yield valid inference. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.
Implementation of the Iterative Proportion Fitting Algorithm for Geostatistical Facies Modeling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li Yupeng, E-mail: yupeng@ualberta.ca; Deutsch, Clayton V.
2012-06-15
In geostatistics, most stochastic algorithm for simulation of categorical variables such as facies or rock types require a conditional probability distribution. The multivariate probability distribution of all the grouped locations including the unsampled location permits calculation of the conditional probability directly based on its definition. In this article, the iterative proportion fitting (IPF) algorithm is implemented to infer this multivariate probability. Using the IPF algorithm, the multivariate probability is obtained by iterative modification to an initial estimated multivariate probability using lower order bivariate probabilities as constraints. The imposed bivariate marginal probabilities are inferred from profiles along drill holes or wells.more » In the IPF process, a sparse matrix is used to calculate the marginal probabilities from the multivariate probability, which makes the iterative fitting more tractable and practical. This algorithm can be extended to higher order marginal probability constraints as used in multiple point statistics. The theoretical framework is developed and illustrated with estimation and simulation example.« less
On the effect of subliminal priming on subjective perception of images: a machine learning approach.
Kumar, Parmod; Mahmood, Faisal; Mohan, Dhanya Menoth; Wong, Ken; Agrawal, Abhishek; Elgendi, Mohamed; Shukla, Rohit; Dauwels, Justin; Chan, Alice H D
2014-01-01
The research presented in this article investigates the influence of subliminal prime words on peoples' judgment about images, through electroencephalograms (EEGs). In this cross domain priming paradigm, the participants are asked to rate how much they like the stimulus images, on a 7-point Likert scale, after being subliminally exposed to masked lexical prime words, with EEG recorded simultaneously. Statistical analysis tools are used to analyze the effect of priming on behavior, and machine learning techniques to infer the primes from EEGs. The experiment reveals strong effects of subliminal priming on the participants' explicit rating of images. The subjective judgment affected by the priming makes visible change in event-related potentials (ERPs); results show larger ERP amplitude for the negative primes compared with positive and neutral primes. In addition, Support Vector Machine (SVM) based classifiers are proposed to infer the prime types from the average ERPs, which yields a classification rate of 70%.
The logical foundations of forensic science: towards reliable knowledge
Evett, Ian
2015-01-01
The generation of observations is a technical process and the advances that have been made in forensic science techniques over the last 50 years have been staggering. But science is about reasoning—about making sense from observations. For the forensic scientist, this is the challenge of interpreting a pattern of observations within the context of a legal trial. Here too, there have been major advances over recent years and there is a broad consensus among serious thinkers, both scientific and legal, that the logical framework is furnished by Bayesian inference (Aitken et al. Fundamentals of Probability and Statistical Evidence in Criminal Proceedings). This paper shows how the paradigm has matured, centred on the notion of the balanced scientist. Progress through the courts has not been always smooth and difficulties arising from recent judgments are discussed. Nevertheless, the future holds exciting prospects, in particular the opportunities for managing and calibrating the knowledge of the forensic scientists who assign the probabilities that are at the foundation of logical inference in the courtroom. PMID:26101288
Morgan-Lopez, Antonio A.; Fals-Stewart, William
2015-01-01
Historically, difficulties in analyzing treatment outcome data from open enrollment groups have led to their avoidance in use in federally-funded treatment trials, despite the fact that 79% of treatment programs use open enrollment groups. Recently, latent class pattern mixture models (LCPMM) have shown promise as a defensible approach for making overall (and attendance class-specific) inferences from open enrollment groups with membership turnover. We present a statistical simulation study comparing LCPMMs to longitudinal growth models (LGM) to understand when both frameworks are likely to produce conflicting inferences concerning overall treatment efficacy. LCPMMs performed well under all conditions examined; meanwhile LGMs produced problematic levels of bias and Type I errors under two joint conditions: moderate-to-high dropout (30–50%) and treatment by attendance class interactions exceeding Cohen's d ≈.2. This study highlights key concerns about using LGM for open enrollment data: treatment effect overestimation and advocacy for treatments that may be ineffective in reality. PMID:18513917
Data Analysis with Graphical Models: Software Tools
NASA Technical Reports Server (NTRS)
Buntine, Wray L.
1994-01-01
Probabilistic graphical models (directed and undirected Markov fields, and combined in chain graphs) are used widely in expert systems, image processing and other areas as a framework for representing and reasoning with probabilities. They come with corresponding algorithms for performing probabilistic inference. This paper discusses an extension to these models by Spiegelhalter and Gilks, plates, used to graphically model the notion of a sample. This offers a graphical specification language for representing data analysis problems. When combined with general methods for statistical inference, this also offers a unifying framework for prototyping and/or generating data analysis algorithms from graphical specifications. This paper outlines the framework and then presents some basic tools for the task: a graphical version of the Pitman-Koopman Theorem for the exponential family, problem decomposition, and the calculation of exact Bayes factors. Other tools already developed, such as automatic differentiation, Gibbs sampling, and use of the EM algorithm, make this a broad basis for the generation of data analysis software.
An Adaptive Handover Prediction Scheme for Seamless Mobility Based Wireless Networks
Safa Sadiq, Ali; Fisal, Norsheila Binti; Ghafoor, Kayhan Zrar; Lloret, Jaime
2014-01-01
We propose an adaptive handover prediction (AHP) scheme for seamless mobility based wireless networks. That is, the AHP scheme incorporates fuzzy logic with AP prediction process in order to lend cognitive capability to handover decision making. Selection metrics, including received signal strength, mobile node relative direction towards the access points in the vicinity, and access point load, are collected and considered inputs of the fuzzy decision making system in order to select the best preferable AP around WLANs. The obtained handover decision which is based on the calculated quality cost using fuzzy inference system is also based on adaptable coefficients instead of fixed coefficients. In other words, the mean and the standard deviation of the normalized network prediction metrics of fuzzy inference system, which are collected from available WLANs are obtained adaptively. Accordingly, they are applied as statistical information to adjust or adapt the coefficients of membership functions. In addition, we propose an adjustable weight vector concept for input metrics in order to cope with the continuous, unpredictable variation in their membership degrees. Furthermore, handover decisions are performed in each MN independently after knowing RSS, direction toward APs, and AP load. Finally, performance evaluation of the proposed scheme shows its superiority compared with representatives of the prediction approaches. PMID:25574490
An adaptive handover prediction scheme for seamless mobility based wireless networks.
Sadiq, Ali Safa; Fisal, Norsheila Binti; Ghafoor, Kayhan Zrar; Lloret, Jaime
2014-01-01
We propose an adaptive handover prediction (AHP) scheme for seamless mobility based wireless networks. That is, the AHP scheme incorporates fuzzy logic with AP prediction process in order to lend cognitive capability to handover decision making. Selection metrics, including received signal strength, mobile node relative direction towards the access points in the vicinity, and access point load, are collected and considered inputs of the fuzzy decision making system in order to select the best preferable AP around WLANs. The obtained handover decision which is based on the calculated quality cost using fuzzy inference system is also based on adaptable coefficients instead of fixed coefficients. In other words, the mean and the standard deviation of the normalized network prediction metrics of fuzzy inference system, which are collected from available WLANs are obtained adaptively. Accordingly, they are applied as statistical information to adjust or adapt the coefficients of membership functions. In addition, we propose an adjustable weight vector concept for input metrics in order to cope with the continuous, unpredictable variation in their membership degrees. Furthermore, handover decisions are performed in each MN independently after knowing RSS, direction toward APs, and AP load. Finally, performance evaluation of the proposed scheme shows its superiority compared with representatives of the prediction approaches.
Probabilistic Graphical Model Representation in Phylogenetics
Höhna, Sebastian; Heath, Tracy A.; Boussau, Bastien; Landis, Michael J.; Ronquist, Fredrik; Huelsenbeck, John P.
2014-01-01
Recent years have seen a rapid expansion of the model space explored in statistical phylogenetics, emphasizing the need for new approaches to statistical model representation and software development. Clear communication and representation of the chosen model is crucial for: (i) reproducibility of an analysis, (ii) model development, and (iii) software design. Moreover, a unified, clear and understandable framework for model representation lowers the barrier for beginners and nonspecialists to grasp complex phylogenetic models, including their assumptions and parameter/variable dependencies. Graphical modeling is a unifying framework that has gained in popularity in the statistical literature in recent years. The core idea is to break complex models into conditionally independent distributions. The strength lies in the comprehensibility, flexibility, and adaptability of this formalism, and the large body of computational work based on it. Graphical models are well-suited to teach statistical models, to facilitate communication among phylogeneticists and in the development of generic software for simulation and statistical inference. Here, we provide an introduction to graphical models for phylogeneticists and extend the standard graphical model representation to the realm of phylogenetics. We introduce a new graphical model component, tree plates, to capture the changing structure of the subgraph corresponding to a phylogenetic tree. We describe a range of phylogenetic models using the graphical model framework and introduce modules to simplify the representation of standard components in large and complex models. Phylogenetic model graphs can be readily used in simulation, maximum likelihood inference, and Bayesian inference using, for example, Metropolis–Hastings or Gibbs sampling of the posterior distribution. [Computation; graphical models; inference; modularization; statistical phylogenetics; tree plate.] PMID:24951559
Protein and gene model inference based on statistical modeling in k-partite graphs.
Gerster, Sarah; Qeli, Ermir; Ahrens, Christian H; Bühlmann, Peter
2010-07-06
One of the major goals of proteomics is the comprehensive and accurate description of a proteome. Shotgun proteomics, the method of choice for the analysis of complex protein mixtures, requires that experimentally observed peptides are mapped back to the proteins they were derived from. This process is also known as protein inference. We present Markovian Inference of Proteins and Gene Models (MIPGEM), a statistical model based on clearly stated assumptions to address the problem of protein and gene model inference for shotgun proteomics data. In particular, we are dealing with dependencies among peptides and proteins using a Markovian assumption on k-partite graphs. We are also addressing the problems of shared peptides and ambiguous proteins by scoring the encoding gene models. Empirical results on two control datasets with synthetic mixtures of proteins and on complex protein samples of Saccharomyces cerevisiae, Drosophila melanogaster, and Arabidopsis thaliana suggest that the results with MIPGEM are competitive with existing tools for protein inference.
Meyer, Patrick E; Lafitte, Frédéric; Bontempi, Gianluca
2008-10-29
This paper presents the R/Bioconductor package minet (version 1.1.6) which provides a set of functions to infer mutual information networks from a dataset. Once fed with a microarray dataset, the package returns a network where nodes denote genes, edges model statistical dependencies between genes and the weight of an edge quantifies the statistical evidence of a specific (e.g transcriptional) gene-to-gene interaction. Four different entropy estimators are made available in the package minet (empirical, Miller-Madow, Schurmann-Grassberger and shrink) as well as four different inference methods, namely relevance networks, ARACNE, CLR and MRNET. Also, the package integrates accuracy assessment tools, like F-scores, PR-curves and ROC-curves in order to compare the inferred network with a reference one. The package minet provides a series of tools for inferring transcriptional networks from microarray data. It is freely available from the Comprehensive R Archive Network (CRAN) as well as from the Bioconductor website.
Computational statistics using the Bayesian Inference Engine
NASA Astrophysics Data System (ADS)
Weinberg, Martin D.
2013-09-01
This paper introduces the Bayesian Inference Engine (BIE), a general parallel, optimized software package for parameter inference and model selection. This package is motivated by the analysis needs of modern astronomical surveys and the need to organize and reuse expensive derived data. The BIE is the first platform for computational statistics designed explicitly to enable Bayesian update and model comparison for astronomical problems. Bayesian update is based on the representation of high-dimensional posterior distributions using metric-ball-tree based kernel density estimation. Among its algorithmic offerings, the BIE emphasizes hybrid tempered Markov chain Monte Carlo schemes that robustly sample multimodal posterior distributions in high-dimensional parameter spaces. Moreover, the BIE implements a full persistence or serialization system that stores the full byte-level image of the running inference and previously characterized posterior distributions for later use. Two new algorithms to compute the marginal likelihood from the posterior distribution, developed for and implemented in the BIE, enable model comparison for complex models and data sets. Finally, the BIE was designed to be a collaborative platform for applying Bayesian methodology to astronomy. It includes an extensible object-oriented and easily extended framework that implements every aspect of the Bayesian inference. By providing a variety of statistical algorithms for all phases of the inference problem, a scientist may explore a variety of approaches with a single model and data implementation. Additional technical details and download details are available from http://www.astro.umass.edu/bie. The BIE is distributed under the GNU General Public License.
A review of causal inference for biomedical informatics
Kleinberg, Samantha; Hripcsak, George
2011-01-01
Causality is an important concept throughout the health sciences and is particularly vital for informatics work such as finding adverse drug events or risk factors for disease using electronic health records. While philosophers and scientists working for centuries on formalizing what makes something a cause have not reached a consensus, new methods for inference show that we can make progress in this area in many practical cases. This article reviews core concepts in understanding and identifying causality and then reviews current computational methods for inference and explanation, focusing on inference from large-scale observational data. While the problem is not fully solved, we show that graphical models and Granger causality provide useful frameworks for inference and that a more recent approach based on temporal logic addresses some of the limitations of these methods. PMID:21782035
Teach a Confidence Interval for the Median in the First Statistics Course
ERIC Educational Resources Information Center
Howington, Eric B.
2017-01-01
Few introductory statistics courses consider statistical inference for the median. This article argues in favour of adding a confidence interval for the median to the first statistics course. Several methods suitable for introductory statistics students are identified and briefly reviewed.
Höhna, Sebastian; Landis, Michael J.
2016-01-01
Programs for Bayesian inference of phylogeny currently implement a unique and fixed suite of models. Consequently, users of these software packages are simultaneously forced to use a number of programs for a given study, while also lacking the freedom to explore models that have not been implemented by the developers of those programs. We developed a new open-source software package, RevBayes, to address these problems. RevBayes is entirely based on probabilistic graphical models, a powerful generic framework for specifying and analyzing statistical models. Phylogenetic-graphical models can be specified interactively in RevBayes, piece by piece, using a new succinct and intuitive language called Rev. Rev is similar to the R language and the BUGS model-specification language, and should be easy to learn for most users. The strength of RevBayes is the simplicity with which one can design, specify, and implement new and complex models. Fortunately, this tremendous flexibility does not come at the cost of slower computation; as we demonstrate, RevBayes outperforms competing software for several standard analyses. Compared with other programs, RevBayes has fewer black-box elements. Users need to explicitly specify each part of the model and analysis. Although this explicitness may initially be unfamiliar, we are convinced that this transparency will improve understanding of phylogenetic models in our field. Moreover, it will motivate the search for improvements to existing methods by brazenly exposing the model choices that we make to critical scrutiny. RevBayes is freely available at http://www.RevBayes.com. [Bayesian inference; Graphical models; MCMC; statistical phylogenetics.] PMID:27235697
NASA Astrophysics Data System (ADS)
Shafii, M.; Tolson, B.; Matott, L. S.
2012-04-01
Hydrologic modeling has benefited from significant developments over the past two decades. This has resulted in building of higher levels of complexity into hydrologic models, which eventually makes the model evaluation process (parameter estimation via calibration and uncertainty analysis) more challenging. In order to avoid unreasonable parameter estimates, many researchers have suggested implementation of multi-criteria calibration schemes. Furthermore, for predictive hydrologic models to be useful, proper consideration of uncertainty is essential. Consequently, recent research has emphasized comprehensive model assessment procedures in which multi-criteria parameter estimation is combined with statistically-based uncertainty analysis routines such as Bayesian inference using Markov Chain Monte Carlo (MCMC) sampling. Such a procedure relies on the use of formal likelihood functions based on statistical assumptions, and moreover, the Bayesian inference structured on MCMC samplers requires a considerably large number of simulations. Due to these issues, especially in complex non-linear hydrological models, a variety of alternative informal approaches have been proposed for uncertainty analysis in the multi-criteria context. This study aims at exploring a number of such informal uncertainty analysis techniques in multi-criteria calibration of hydrological models. The informal methods addressed in this study are (i) Pareto optimality which quantifies the parameter uncertainty using the Pareto solutions, (ii) DDS-AU which uses the weighted sum of objective functions to derive the prediction limits, and (iii) GLUE which describes the total uncertainty through identification of behavioral solutions. The main objective is to compare such methods with MCMC-based Bayesian inference with respect to factors such as computational burden, and predictive capacity, which are evaluated based on multiple comparative measures. The measures for comparison are calculated both for calibration and evaluation periods. The uncertainty analysis methodologies are applied to a simple 5-parameter rainfall-runoff model, called HYMOD.
Höhna, Sebastian; Landis, Michael J; Heath, Tracy A; Boussau, Bastien; Lartillot, Nicolas; Moore, Brian R; Huelsenbeck, John P; Ronquist, Fredrik
2016-07-01
Programs for Bayesian inference of phylogeny currently implement a unique and fixed suite of models. Consequently, users of these software packages are simultaneously forced to use a number of programs for a given study, while also lacking the freedom to explore models that have not been implemented by the developers of those programs. We developed a new open-source software package, RevBayes, to address these problems. RevBayes is entirely based on probabilistic graphical models, a powerful generic framework for specifying and analyzing statistical models. Phylogenetic-graphical models can be specified interactively in RevBayes, piece by piece, using a new succinct and intuitive language called Rev. Rev is similar to the R language and the BUGS model-specification language, and should be easy to learn for most users. The strength of RevBayes is the simplicity with which one can design, specify, and implement new and complex models. Fortunately, this tremendous flexibility does not come at the cost of slower computation; as we demonstrate, RevBayes outperforms competing software for several standard analyses. Compared with other programs, RevBayes has fewer black-box elements. Users need to explicitly specify each part of the model and analysis. Although this explicitness may initially be unfamiliar, we are convinced that this transparency will improve understanding of phylogenetic models in our field. Moreover, it will motivate the search for improvements to existing methods by brazenly exposing the model choices that we make to critical scrutiny. RevBayes is freely available at http://www.RevBayes.com [Bayesian inference; Graphical models; MCMC; statistical phylogenetics.]. © The Author(s) 2016. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.
A Comparative Study of Exact versus Propensity Matching Techniques Using Monte Carlo Simulation
ERIC Educational Resources Information Center
Itang'ata, Mukaria J. J.
2013-01-01
Often researchers face situations where comparative studies between two or more programs are necessary to make causal inferences for informed policy decision-making. Experimental designs employing randomization provide the strongest evidence for causal inferences. However, many pragmatic and ethical challenges may preclude the use of randomized…
A quantum probability account of order effects in inference.
Trueblood, Jennifer S; Busemeyer, Jerome R
2011-01-01
Order of information plays a crucial role in the process of updating beliefs across time. In fact, the presence of order effects makes a classical or Bayesian approach to inference difficult. As a result, the existing models of inference, such as the belief-adjustment model, merely provide an ad hoc explanation for these effects. We postulate a quantum inference model for order effects based on the axiomatic principles of quantum probability theory. The quantum inference model explains order effects by transforming a state vector with different sequences of operators for different orderings of information. We demonstrate this process by fitting the quantum model to data collected in a medical diagnostic task and a jury decision-making task. To further test the quantum inference model, a new jury decision-making experiment is developed. Using the results of this experiment, we compare the quantum inference model with two versions of the belief-adjustment model, the adding model and the averaging model. We show that both the quantum model and the adding model provide good fits to the data. To distinguish the quantum model from the adding model, we develop a new experiment involving extreme evidence. The results from this new experiment suggest that the adding model faces limitations when accounting for tasks involving extreme evidence, whereas the quantum inference model does not. Ultimately, we argue that the quantum model provides a more coherent account for order effects that was not possible before. Copyright © 2011 Cognitive Science Society, Inc.
Less is more: The availability heuristic in early childhood.
Geurten, Marie; Willems, Sylvie; Germain, Sophie; Meulemans, Thierry
2015-11-01
This study examined whether young children are influenced by the subjective experience associated with an easy or difficult recall when making memory decisions. Seventy-one children, aged 4, 6, and 8 years, were asked to generate either a small (easy condition) or large (hard condition) number of first names. Statistical analyses revealed that participants in the hard condition were more likely to infer that they did not know many names than participants in the easy condition, contrary to what would be expected if children based their memory judgement on the objective number of recalled items. Overall, our results support the hypothesis that children as young as 4 years old rely on the subjective experience of ease to regulate their decision-making processes. Theoretical implications of these findings are discussed. © 2015 The British Psychological Society.
Protein-driven inference of miRNA–disease associations
Mørk, Søren; Pletscher-Frankild, Sune; Palleja Caro, Albert; Gorodkin, Jan; Jensen, Lars Juhl
2014-01-01
Motivation: MicroRNAs (miRNAs) are a highly abundant class of non-coding RNA genes involved in cellular regulation and thus also diseases. Despite miRNAs being important disease factors, miRNA–disease associations remain low in number and of variable reliability. Furthermore, existing databases and prediction methods do not explicitly facilitate forming hypotheses about the possible molecular causes of the association, thereby making the path to experimental follow-up longer. Results: Here we present miRPD in which miRNA–Protein–Disease associations are explicitly inferred. Besides linking miRNAs to diseases, it directly suggests the underlying proteins involved, which can be used to form hypotheses that can be experimentally tested. The inference of miRNAs and diseases is made by coupling known and predicted miRNA–protein associations with protein–disease associations text mined from the literature. We present scoring schemes that allow us to rank miRNA–disease associations inferred from both curated and predicted miRNA targets by reliability and thereby to create high- and medium-confidence sets of associations. Analyzing these, we find statistically significant enrichment for proteins involved in pathways related to cancer and type I diabetes mellitus, suggesting either a literature bias or a genuine biological trend. We show by example how the associations can be used to extract proteins for disease hypothesis. Availability and implementation: All datasets, software and a searchable Web site are available at http://mirpd.jensenlab.org. Contact: lars.juhl.jensen@cpr.ku.dk or gorodkin@rth.dk PMID:24273243
Indexing the Environmental Quality Performance Based on A Fuzzy Inference Approach
NASA Astrophysics Data System (ADS)
Iswari, Lizda
2018-03-01
Environmental performance strongly deals with the quality of human life. In Indonesia, this performance is quantified through Environmental Quality Index (EQI) which consists of three indicators, i.e. river quality index, air quality index, and coverage of land cover. The current of this instrument data processing was done by averaging and weighting each index to represent the EQI at the provincial level. However, we found EQI interpretations that may contain some uncertainties and have a range of circumstances possibly less appropriate if processed under a common statistical approach. In this research, we aim to manage the indicators of EQI with a more intuitive computation technique and make some inferences related to the environmental performance in 33 provinces in Indonesia. Research was conducted in three stages of Mamdani Fuzzy Inference System (MAFIS), i.e. fuzzification, data inference, and defuzzification. Data input consists of 10 environmental parameters and the output is an index of Environmental Quality Performance (EQP). Research was applied to the environmental condition data set in 2015 and quantified the results into the scale of 0 to 100, i.e. 10 provinces at good performance with the EQP above 80 dominated by provinces in eastern part of Indonesia, 22 provinces with the EQP between 80 to 50, and one province in Java Island with the EQP below 20. This research shows that environmental quality performance can be quantified without eliminating the natures of the data set and simultaneously is able to show the environment behavior along with its spatial pattern distribution.
Learning abstract visual concepts via probabilistic program induction in a Language of Thought.
Overlan, Matthew C; Jacobs, Robert A; Piantadosi, Steven T
2017-11-01
The ability to learn abstract concepts is a powerful component of human cognition. It has been argued that variable binding is the key element enabling this ability, but the computational aspects of variable binding remain poorly understood. Here, we address this shortcoming by formalizing the Hierarchical Language of Thought (HLOT) model of rule learning. Given a set of data items, the model uses Bayesian inference to infer a probability distribution over stochastic programs that implement variable binding. Because the model makes use of symbolic variables as well as Bayesian inference and programs with stochastic primitives, it combines many of the advantages of both symbolic and statistical approaches to cognitive modeling. To evaluate the model, we conducted an experiment in which human subjects viewed training items and then judged which test items belong to the same concept as the training items. We found that the HLOT model provides a close match to human generalization patterns, significantly outperforming two variants of the Generalized Context Model, one variant based on string similarity and the other based on visual similarity using features from a deep convolutional neural network. Additional results suggest that variable binding happens automatically, implying that binding operations do not add complexity to peoples' hypothesized rules. Overall, this work demonstrates that a cognitive model combining symbolic variables with Bayesian inference and stochastic program primitives provides a new perspective for understanding people's patterns of generalization. Copyright © 2017 Elsevier B.V. All rights reserved.
Pointwise probability reinforcements for robust statistical inference.
Frénay, Benoît; Verleysen, Michel
2014-02-01
Statistical inference using machine learning techniques may be difficult with small datasets because of abnormally frequent data (AFDs). AFDs are observations that are much more frequent in the training sample that they should be, with respect to their theoretical probability, and include e.g. outliers. Estimates of parameters tend to be biased towards models which support such data. This paper proposes to introduce pointwise probability reinforcements (PPRs): the probability of each observation is reinforced by a PPR and a regularisation allows controlling the amount of reinforcement which compensates for AFDs. The proposed solution is very generic, since it can be used to robustify any statistical inference method which can be formulated as a likelihood maximisation. Experiments show that PPRs can be easily used to tackle regression, classification and projection: models are freed from the influence of outliers. Moreover, outliers can be filtered manually since an abnormality degree is obtained for each observation. Copyright © 2013 Elsevier Ltd. All rights reserved.
Model averaging and muddled multimodel inferences.
Cade, Brian S
2015-09-01
Three flawed practices associated with model averaging coefficients for predictor variables in regression models commonly occur when making multimodel inferences in analyses of ecological data. Model-averaged regression coefficients based on Akaike information criterion (AIC) weights have been recommended for addressing model uncertainty but they are not valid, interpretable estimates of partial effects for individual predictors when there is multicollinearity among the predictor variables. Multicollinearity implies that the scaling of units in the denominators of the regression coefficients may change across models such that neither the parameters nor their estimates have common scales, therefore averaging them makes no sense. The associated sums of AIC model weights recommended to assess relative importance of individual predictors are really a measure of relative importance of models, with little information about contributions by individual predictors compared to other measures of relative importance based on effects size or variance reduction. Sometimes the model-averaged regression coefficients for predictor variables are incorrectly used to make model-averaged predictions of the response variable when the models are not linear in the parameters. I demonstrate the issues with the first two practices using the college grade point average example extensively analyzed by Burnham and Anderson. I show how partial standard deviations of the predictor variables can be used to detect changing scales of their estimates with multicollinearity. Standardizing estimates based on partial standard deviations for their variables can be used to make the scaling of the estimates commensurate across models, a necessary but not sufficient condition for model averaging of the estimates to be sensible. A unimodal distribution of estimates and valid interpretation of individual parameters are additional requisite conditions. The standardized estimates or equivalently the t statistics on unstandardized estimates also can be used to provide more informative measures of relative importance than sums of AIC weights. Finally, I illustrate how seriously compromised statistical interpretations and predictions can be for all three of these flawed practices by critiquing their use in a recent species distribution modeling technique developed for predicting Greater Sage-Grouse (Centrocercus urophasianus) distribution in Colorado, USA. These model averaging issues are common in other ecological literature and ought to be discontinued if we are to make effective scientific contributions to ecological knowledge and conservation of natural resources.
Probability, statistics, and computational science.
Beerenwinkel, Niko; Siebourg, Juliane
2012-01-01
In this chapter, we review basic concepts from probability theory and computational statistics that are fundamental to evolutionary genomics. We provide a very basic introduction to statistical modeling and discuss general principles, including maximum likelihood and Bayesian inference. Markov chains, hidden Markov models, and Bayesian network models are introduced in more detail as they occur frequently and in many variations in genomics applications. In particular, we discuss efficient inference algorithms and methods for learning these models from partially observed data. Several simple examples are given throughout the text, some of which point to models that are discussed in more detail in subsequent chapters.
A Not-So-Fundamental Limitation on Studying Complex Systems with Statistics: Comment on Rabin (2011)
NASA Astrophysics Data System (ADS)
Thomas, Drew M.
2012-12-01
Although living organisms are affected by many interrelated and unidentified variables, this complexity does not automatically impose a fundamental limitation on statistical inference. Nor need one invoke such complexity as an explanation of the "Truth Wears Off" or "decline" effect; similar "decline" effects occur with far simpler systems studied in physics. Selective reporting and publication bias, and scientists' biases in favor of reporting eye-catching results (in general) or conforming to others' results (in physics) better explain this feature of the "Truth Wears Off" effect than Rabin's suggested limitation on statistical inference.
Analogical and Category-Based Inference: A Theoretical Integration with Bayesian Causal Models
ERIC Educational Resources Information Center
Holyoak, Keith J.; Lee, Hee Seung; Lu, Hongjing
2010-01-01
A fundamental issue for theories of human induction is to specify constraints on potential inferences. For inferences based on shared category membership, an analogy, and/or a relational schema, it appears that the basic goal of induction is to make accurate and goal-relevant inferences that are sensitive to uncertainty. People can use source…
Self-enforcing Private Inference Control
NASA Astrophysics Data System (ADS)
Yang, Yanjiang; Li, Yingjiu; Weng, Jian; Zhou, Jianying; Bao, Feng
Private inference control enables simultaneous enforcement of inference control and protection of users' query privacy. Private inference control is a useful tool for database applications, especially when users are increasingly concerned about individual privacy nowadays. However, protection of query privacy on top of inference control is a double-edged sword: without letting the database server know the content of user queries, users can easily launch DoS attacks. To assuage DoS attacks in private inference control, we propose the concept of self-enforcing private inference control, whose intuition is to force users to only make inference-free queries by enforcing inference control themselves; otherwise, penalty will inflict upon the violating users.
Effective Online Bayesian Phylogenetics via Sequential Monte Carlo with Guided Proposals
Fourment, Mathieu; Claywell, Brian C; Dinh, Vu; McCoy, Connor; Matsen IV, Frederick A; Darling, Aaron E
2018-01-01
Abstract Modern infectious disease outbreak surveillance produces continuous streams of sequence data which require phylogenetic analysis as data arrives. Current software packages for Bayesian phylogenetic inference are unable to quickly incorporate new sequences as they become available, making them less useful for dynamically unfolding evolutionary stories. This limitation can be addressed by applying a class of Bayesian statistical inference algorithms called sequential Monte Carlo (SMC) to conduct online inference, wherein new data can be continuously incorporated to update the estimate of the posterior probability distribution. In this article, we describe and evaluate several different online phylogenetic sequential Monte Carlo (OPSMC) algorithms. We show that proposing new phylogenies with a density similar to the Bayesian prior suffers from poor performance, and we develop “guided” proposals that better match the proposal density to the posterior. Furthermore, we show that the simplest guided proposals can exhibit pathological behavior in some situations, leading to poor results, and that the situation can be resolved by heating the proposal density. The results demonstrate that relative to the widely used MCMC-based algorithm implemented in MrBayes, the total time required to compute a series of phylogenetic posteriors as sequences arrive can be significantly reduced by the use of OPSMC, without incurring a significant loss in accuracy. PMID:29186587
Adaptation in pronoun resolution: Evidence from Brazilian and European Portuguese.
Fernandes, Eunice G; Luegi, Paula; Correa Soares, Eduardo; de la Fuente, Israel; Hemforth, Barbara
2018-04-26
Previous research accounting for pronoun resolution as a problem of probabilistic inference has not explored the phenomenon of adaptation, whereby the processor constantly tracks and adapts, rationally, to changes in a statistical environment. We investigate whether Brazilian (BP) and European Portuguese (EP) speakers adapt to variations in the probability of occurrence of ambiguous overt and null pronouns, in two experiments assessing resolution toward subject and object referents. For each variety (BP, EP), participants were faced with either the same number of null and overt pronouns (equal distribution), or with an environment with fewer overt (than null) pronouns (unequal distribution). We find that the preference for interpreting overt pronouns as referring back to an object referent (object-biased interpretation) is higher when there are fewer overt pronouns (i.e., in the unequal, relative to the equal distribution condition). This is especially the case for BP, a variety with higher prior frequency and smaller object-biased interpretation of overt pronouns, suggesting that participants adapted incrementally and integrated prior statistical knowledge with the knowledge obtained in the experiment. We hypothesize that comprehenders adapted rationally, with the goal of maintaining, across variations in pronoun probability, the likelihood of subject and object referents. Our findings unify insights from research in pronoun resolution and in adaptation, and add to previous studies in both topics: They provide evidence for the influence of pronoun probability in pronoun resolution, and for an adaptation process whereby the language processor not only tracks statistical information, but uses it to make interpretational inferences. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
An integrated data model to estimate spatiotemporal occupancy, abundance, and colonization dynamics
Williams, Perry J.; Hooten, Mevin B.; Womble, Jamie N.; Esslinger, George G.; Bower, Michael R.; Hefley, Trevor J.
2017-01-01
Ecological invasions and colonizations occur dynamically through space and time. Estimating the distribution and abundance of colonizing species is critical for efficient management or conservation. We describe a statistical framework for simultaneously estimating spatiotemporal occupancy and abundance dynamics of a colonizing species. Our method accounts for several issues that are common when modeling spatiotemporal ecological data including multiple levels of detection probability, multiple data sources, and computational limitations that occur when making fine-scale inference over a large spatiotemporal domain. We apply the model to estimate the colonization dynamics of sea otters (Enhydra lutris) in Glacier Bay, in southeastern Alaska.
Motor control is decision-making.
Wolpert, Daniel M; Landy, Michael S
2012-12-01
Motor behavior may be viewed as a problem of maximizing the utility of movement outcome in the face of sensory, motor and task uncertainty. Viewed in this way, and allowing for the availability of prior knowledge in the form of a probability distribution over possible states of the world, the choice of a movement plan and strategy for motor control becomes an application of statistical decision theory. This point of view has proven successful in recent years in accounting for movement under risk, inferring the loss function used in motor tasks, and explaining motor behavior in a wide variety of circumstances. Copyright © 2012 Elsevier Ltd. All rights reserved.
Schick, Robert S; Kraus, Scott D; Rolland, Rosalind M; Knowlton, Amy R; Hamilton, Philip K; Pettis, Heather M; Thomas, Len; Harwood, John; Clark, James S
2016-01-01
Right whales are vulnerable to many sources of anthropogenic disturbance including ship strikes, entanglement with fishing gear, and anthropogenic noise. The effect of these factors on individual health is unclear. A statistical model using photographic evidence of health was recently built to infer the true or hidden health of individual right whales. However, two important prior assumptions about the role of missing data and unexplained variance on the estimates were not previously assessed. Here we tested these factors by varying prior assumptions and model formulation. We found sensitivity to each assumption and used the output to make guidelines on future model formulation.
Bayesian Inference: with ecological applications
Link, William A.; Barker, Richard J.
2010-01-01
This text provides a mathematically rigorous yet accessible and engaging introduction to Bayesian inference with relevant examples that will be of interest to biologists working in the fields of ecology, wildlife management and environmental studies as well as students in advanced undergraduate statistics.. This text opens the door to Bayesian inference, taking advantage of modern computational efficiencies and easily accessible software to evaluate complex hierarchical models.
Theory-based Bayesian Models of Inductive Inference
2010-07-19
Subjective randomness and natural scene statistics. Psychonomic Bulletin & Review . http://cocosci.berkeley.edu/tom/papers/randscenes.pdf Page 1...in press). Exemplar models as a mechanism for performing Bayesian inference. Psychonomic Bulletin & Review . http://cocosci.berkeley.edu/tom
Differences in Performance Among Test Statistics for Assessing Phylogenomic Model Adequacy.
Duchêne, David A; Duchêne, Sebastian; Ho, Simon Y W
2018-05-18
Statistical phylogenetic analyses of genomic data depend on models of nucleotide or amino acid substitution. The adequacy of these substitution models can be assessed using a number of test statistics, allowing the model to be rejected when it is found to provide a poor description of the evolutionary process. A potentially valuable use of model-adequacy test statistics is to identify when data sets are likely to produce unreliable phylogenetic estimates, but their differences in performance are rarely explored. We performed a comprehensive simulation study to identify test statistics that are sensitive to some of the most commonly cited sources of phylogenetic estimation error. Our results show that, for many test statistics, traditional thresholds for assessing model adequacy can fail to reject the model when the phylogenetic inferences are inaccurate and imprecise. This is particularly problematic when analysing loci that have few variable informative sites. We propose new thresholds for assessing substitution model adequacy and demonstrate their effectiveness in analyses of three phylogenomic data sets. These thresholds lead to frequent rejection of the model for loci that yield topological inferences that are imprecise and are likely to be inaccurate. We also propose the use of a summary statistic that provides a practical assessment of overall model adequacy. Our approach offers a promising means of enhancing model choice in genome-scale data sets, potentially leading to improvements in the reliability of phylogenomic inference.
Inferring action structure and causal relationships in continuous sequences of human action.
Buchsbaum, Daphna; Griffiths, Thomas L; Plunkett, Dillon; Gopnik, Alison; Baldwin, Dare
2015-02-01
In the real world, causal variables do not come pre-identified or occur in isolation, but instead are embedded within a continuous temporal stream of events. A challenge faced by both human learners and machine learning algorithms is identifying subsequences that correspond to the appropriate variables for causal inference. A specific instance of this problem is action segmentation: dividing a sequence of observed behavior into meaningful actions, and determining which of those actions lead to effects in the world. Here we present a Bayesian analysis of how statistical and causal cues to segmentation should optimally be combined, as well as four experiments investigating human action segmentation and causal inference. We find that both people and our model are sensitive to statistical regularities and causal structure in continuous action, and are able to combine these sources of information in order to correctly infer both causal relationships and segmentation boundaries. Copyright © 2014. Published by Elsevier Inc.
Applications of statistics to medical science (1) Fundamental concepts.
Watanabe, Hiroshi
2011-01-01
The conceptual framework of statistical tests and statistical inferences are discussed, and the epidemiological background of statistics is briefly reviewed. This study is one of a series in which we survey the basics of statistics and practical methods used in medical statistics. Arguments related to actual statistical analysis procedures will be made in subsequent papers.
Inference of beliefs and emotions in patients with Alzheimer's disease.
Zaitchik, Deborah; Koff, Elissa; Brownell, Hiram; Winner, Ellen; Albert, Marilyn
2006-01-01
The present study compared 20 patients with mild to moderate Alzheimer's disease with 20 older controls (ages 69-94 years) on their ability to make inferences about emotions and beliefs in others. Six tasks tested their ability to make 1st-order and 2nd-order inferences as well as to offer explanations and moral evaluations of human action by appeal to emotions and beliefs. Results showed that the ability to infer emotions and beliefs in 1st-order tasks remains largely intact in patients with mild to moderate Alzheimer's. Patients were able to use mental states in the prediction, explanation, and moral evaluation of behavior. Impairment on 2nd-order tasks involving inference of mental states was equivalent to impairment on control tasks, suggesting that patients' difficulty is secondary to their cognitive impairments. ((c) 2006 APA, all rights reserved).
A probabilistic, distributed, recursive mechanism for decision-making in the brain
Gurney, Kevin N.
2018-01-01
Decision formation recruits many brain regions, but the procedure they jointly execute is unknown. Here we characterize its essential composition, using as a framework a novel recursive Bayesian algorithm that makes decisions based on spike-trains with the statistics of those in sensory cortex (MT). Using it to simulate the random-dot-motion task, we demonstrate it quantitatively replicates the choice behaviour of monkeys, whilst predicting losses of otherwise usable information from MT. Its architecture maps to the recurrent cortico-basal-ganglia-thalamo-cortical loops, whose components are all implicated in decision-making. We show that the dynamics of its mapped computations match those of neural activity in the sensorimotor cortex and striatum during decisions, and forecast those of basal ganglia output and thalamus. This also predicts which aspects of neural dynamics are and are not part of inference. Our single-equation algorithm is probabilistic, distributed, recursive, and parallel. Its success at capturing anatomy, behaviour, and electrophysiology suggests that the mechanism implemented by the brain has these same characteristics. PMID:29614077
Methodological Limitations of the Application of Expert Systems Methodology in Reading.
ERIC Educational Resources Information Center
Willson, Victor L.
Methodological deficiencies inherent in expert-novice reading research make it impossible to draw inferences about curriculum change. First, comparisons of intact groups are often used as a basis for making causal inferences about how observed characteristics affect behaviors. While comparing different groups is not by itself a useless activity,…
Combining statistical inference and decisions in ecology.
Williams, Perry J; Hooten, Mevin B
2016-09-01
Statistical decision theory (SDT) is a sub-field of decision theory that formally incorporates statistical investigation into a decision-theoretic framework to account for uncertainties in a decision problem. SDT provides a unifying analysis of three types of information: statistical results from a data set, knowledge of the consequences of potential choices (i.e., loss), and prior beliefs about a system. SDT links the theoretical development of a large body of statistical methods, including point estimation, hypothesis testing, and confidence interval estimation. The theory and application of SDT have mainly been developed and published in the fields of mathematics, statistics, operations research, and other decision sciences, but have had limited exposure in ecology. Thus, we provide an introduction to SDT for ecologists and describe its utility for linking the conventionally separate tasks of statistical investigation and decision making in a single framework. We describe the basic framework of both Bayesian and frequentist SDT, its traditional use in statistics, and discuss its application to decision problems that occur in ecology. We demonstrate SDT with two types of decisions: Bayesian point estimation and an applied management problem of selecting a prescribed fire rotation for managing a grassland bird species. Central to SDT, and decision theory in general, are loss functions. Thus, we also provide basic guidance and references for constructing loss functions for an SDT problem. © 2016 by the Ecological Society of America.
Jacquin, Hugo; Gilson, Amy; Shakhnovich, Eugene; Cocco, Simona; Monasson, Rémi
2016-05-01
Inverse statistical approaches to determine protein structure and function from Multiple Sequence Alignments (MSA) are emerging as powerful tools in computational biology. However the underlying assumptions of the relationship between the inferred effective Potts Hamiltonian and real protein structure and energetics remain untested so far. Here we use lattice protein model (LP) to benchmark those inverse statistical approaches. We build MSA of highly stable sequences in target LP structures, and infer the effective pairwise Potts Hamiltonians from those MSA. We find that inferred Potts Hamiltonians reproduce many important aspects of 'true' LP structures and energetics. Careful analysis reveals that effective pairwise couplings in inferred Potts Hamiltonians depend not only on the energetics of the native structure but also on competing folds; in particular, the coupling values reflect both positive design (stabilization of native conformation) and negative design (destabilization of competing folds). In addition to providing detailed structural information, the inferred Potts models used as protein Hamiltonian for design of new sequences are able to generate with high probability completely new sequences with the desired folds, which is not possible using independent-site models. Those are remarkable results as the effective LP Hamiltonians used to generate MSA are not simple pairwise models due to the competition between the folds. Our findings elucidate the reasons for the success of inverse approaches to the modelling of proteins from sequence data, and their limitations.
Localized Smart-Interpretation
NASA Astrophysics Data System (ADS)
Lundh Gulbrandsen, Mats; Mejer Hansen, Thomas; Bach, Torben; Pallesen, Tom
2014-05-01
The complex task of setting up a geological model consists not only of combining available geological information into a conceptual plausible model, but also requires consistency with availably data, e.g. geophysical data. However, in many cases the direct geological information, e.g borehole samples, are very sparse, so in order to create a geological model, the geologist needs to rely on the geophysical data. The problem is however, that the amount of geophysical data in many cases are so vast that it is practically impossible to integrate all of them in the manual interpretation process. This means that a lot of the information available from the geophysical surveys are unexploited, which is a problem, due to the fact that the resulting geological model does not fulfill its full potential and hence are less trustworthy. We suggest an approach to geological modeling that 1. allow all geophysical data to be considered when building the geological model 2. is fast 3. allow quantification of geological modeling. The method is constructed to build a statistical model, f(d,m), describing the relation between what the geologists interpret, d, and what the geologist knows, m. The para- meter m reflects any available information that can be quantified, such as geophysical data, the result of a geophysical inversion, elevation maps, etc... The parameter d reflects an actual interpretation, such as for example the depth to the base of a ground water reservoir. First we infer a statistical model f(d,m), by examining sets of actual interpretations made by a geological expert, [d1, d2, ...], and the information used to perform the interpretation; [m1, m2, ...]. This makes it possible to quantify how the geological expert performs interpolation through f(d,m). As the geological expert proceeds interpreting, the number of interpreted datapoints from which the statistical model is inferred increases, and therefore the accuracy of the statistical model increases. When a model f(d,m) successfully has been inferred, we are able to simulate how the geological expert would perform an interpretation given some external information m, through f(d|m). We will demonstrate this method applied on geological interpretation and densely sampled airborne electromagnetic data. In short, our goal is to build a statistical model describing how a geological expert performs geological interpretation given some geophysical data. We then wish to use this statistical model to perform semi automatic interpretation, everywhere where such geophysical data exist, in a manner consistent with the choices made by a geological expert. Benefits of such a statistical model are that 1. it provides a quantification of how a geological expert performs interpretation based on available diverse data 2. all available geophysical information can be used 3. it allows much faster interpretation of large data sets.
Graffelman, Jan; Sánchez, Milagros; Cook, Samantha; Moreno, Victor
2013-01-01
In genetic association studies, tests for Hardy-Weinberg proportions are often employed as a quality control checking procedure. Missing genotypes are typically discarded prior to testing. In this paper we show that inference for Hardy-Weinberg proportions can be biased when missing values are discarded. We propose to use multiple imputation of missing values in order to improve inference for Hardy-Weinberg proportions. For imputation we employ a multinomial logit model that uses information from allele intensities and/or neighbouring markers. Analysis of an empirical data set of single nucleotide polymorphisms possibly related to colon cancer reveals that missing genotypes are not missing completely at random. Deviation from Hardy-Weinberg proportions is mostly due to a lack of heterozygotes. Inbreeding coefficients estimated by multiple imputation of the missings are typically lowered with respect to inbreeding coefficients estimated by discarding the missings. Accounting for missings by multiple imputation qualitatively changed the results of 10 to 17% of the statistical tests performed. Estimates of inbreeding coefficients obtained by multiple imputation showed high correlation with estimates obtained by single imputation using an external reference panel. Our conclusion is that imputation of missing data leads to improved statistical inference for Hardy-Weinberg proportions.
Ghosh, Sujit K
2010-01-01
Bayesian methods are rapidly becoming popular tools for making statistical inference in various fields of science including biology, engineering, finance, and genetics. One of the key aspects of Bayesian inferential method is its logical foundation that provides a coherent framework to utilize not only empirical but also scientific information available to a researcher. Prior knowledge arising from scientific background, expert judgment, or previously collected data is used to build a prior distribution which is then combined with current data via the likelihood function to characterize the current state of knowledge using the so-called posterior distribution. Bayesian methods allow the use of models of complex physical phenomena that were previously too difficult to estimate (e.g., using asymptotic approximations). Bayesian methods offer a means of more fully understanding issues that are central to many practical problems by allowing researchers to build integrated models based on hierarchical conditional distributions that can be estimated even with limited amounts of data. Furthermore, advances in numerical integration methods, particularly those based on Monte Carlo methods, have made it possible to compute the optimal Bayes estimators. However, there is a reasonably wide gap between the background of the empirically trained scientists and the full weight of Bayesian statistical inference. Hence, one of the goals of this chapter is to bridge the gap by offering elementary to advanced concepts that emphasize linkages between standard approaches and full probability modeling via Bayesian methods.
[Application of statistics on chronic-diseases-relating observational research papers].
Hong, Zhi-heng; Wang, Ping; Cao, Wei-hua
2012-09-01
To study the application of statistics on Chronic-diseases-relating observational research papers which were recently published in the Chinese Medical Association Magazines, with influential index above 0.5. Using a self-developed criterion, two investigators individually participated in assessing the application of statistics on Chinese Medical Association Magazines, with influential index above 0.5. Different opinions reached an agreement through discussion. A total number of 352 papers from 6 magazines, including the Chinese Journal of Epidemiology, Chinese Journal of Oncology, Chinese Journal of Preventive Medicine, Chinese Journal of Cardiology, Chinese Journal of Internal Medicine and Chinese Journal of Endocrinology and Metabolism, were reviewed. The rate of clear statement on the following contents as: research objectives, t target audience, sample issues, objective inclusion criteria and variable definitions were 99.43%, 98.57%, 95.43%, 92.86% and 96.87%. The correct rates of description on quantitative and qualitative data were 90.94% and 91.46%, respectively. The rates on correctly expressing the results, on statistical inference methods related to quantitative, qualitative data and modeling were 100%, 95.32% and 87.19%, respectively. 89.49% of the conclusions could directly response to the research objectives. However, 69.60% of the papers did not mention the exact names of the study design, statistically, that the papers were using. 11.14% of the papers were in lack of further statement on the exclusion criteria. Percentage of the papers that could clearly explain the sample size estimation only taking up as 5.16%. Only 24.21% of the papers clearly described the variable value assignment. Regarding the introduction on statistical conduction and on database methods, the rate was only 24.15%. 18.75% of the papers did not express the statistical inference methods sufficiently. A quarter of the papers did not use 'standardization' appropriately. As for the aspect of statistical inference, the rate of description on statistical testing prerequisite was only 24.12% while 9.94% papers did not even employ the statistical inferential method that should be used. The main deficiencies on the application of Statistics used in papers related to Chronic-diseases-related observational research were as follows: lack of sample-size determination, variable value assignment description not sufficient, methods on statistics were not introduced clearly or properly, lack of consideration for pre-requisition regarding the use of statistical inferences.
Generative Inferences Based on Learned Relations
ERIC Educational Resources Information Center
Chen, Dawn; Lu, Hongjing; Holyoak, Keith J.
2017-01-01
A key property of relational representations is their "generativity": From partial descriptions of relations between entities, additional inferences can be drawn about other entities. A major theoretical challenge is to demonstrate how the capacity to make generative inferences could arise as a result of learning relations from…
Royle, J. Andrew; Dorazio, Robert M.
2008-01-01
A guide to data collection, modeling and inference strategies for biological survey data using Bayesian and classical statistical methods. This book describes a general and flexible framework for modeling and inference in ecological systems based on hierarchical models, with a strict focus on the use of probability models and parametric inference. Hierarchical models represent a paradigm shift in the application of statistics to ecological inference problems because they combine explicit models of ecological system structure or dynamics with models of how ecological systems are observed. The principles of hierarchical modeling are developed and applied to problems in population, metapopulation, community, and metacommunity systems. The book provides the first synthetic treatment of many recent methodological advances in ecological modeling and unifies disparate methods and procedures. The authors apply principles of hierarchical modeling to ecological problems, including * occurrence or occupancy models for estimating species distribution * abundance models based on many sampling protocols, including distance sampling * capture-recapture models with individual effects * spatial capture-recapture models based on camera trapping and related methods * population and metapopulation dynamic models * models of biodiversity, community structure and dynamics.
Truth, models, model sets, AIC, and multimodel inference: a Bayesian perspective
Barker, Richard J.; Link, William A.
2015-01-01
Statistical inference begins with viewing data as realizations of stochastic processes. Mathematical models provide partial descriptions of these processes; inference is the process of using the data to obtain a more complete description of the stochastic processes. Wildlife and ecological scientists have become increasingly concerned with the conditional nature of model-based inference: what if the model is wrong? Over the last 2 decades, Akaike's Information Criterion (AIC) has been widely and increasingly used in wildlife statistics for 2 related purposes, first for model choice and second to quantify model uncertainty. We argue that for the second of these purposes, the Bayesian paradigm provides the natural framework for describing uncertainty associated with model choice and provides the most easily communicated basis for model weighting. Moreover, Bayesian arguments provide the sole justification for interpreting model weights (including AIC weights) as coherent (mathematically self consistent) model probabilities. This interpretation requires treating the model as an exact description of the data-generating mechanism. We discuss the implications of this assumption, and conclude that more emphasis is needed on model checking to provide confidence in the quality of inference.
Li, Huanjie; Nickerson, Lisa D; Nichols, Thomas E; Gao, Jia-Hong
2017-03-01
Two powerful methods for statistical inference on MRI brain images have been proposed recently, a non-stationary voxelation-corrected cluster-size test (CST) based on random field theory and threshold-free cluster enhancement (TFCE) based on calculating the level of local support for a cluster, then using permutation testing for inference. Unlike other statistical approaches, these two methods do not rest on the assumptions of a uniform and high degree of spatial smoothness of the statistic image. Thus, they are strongly recommended for group-level fMRI analysis compared to other statistical methods. In this work, the non-stationary voxelation-corrected CST and TFCE methods for group-level analysis were evaluated for both stationary and non-stationary images under varying smoothness levels, degrees of freedom and signal to noise ratios. Our results suggest that, both methods provide adequate control for the number of voxel-wise statistical tests being performed during inference on fMRI data and they are both superior to current CSTs implemented in popular MRI data analysis software packages. However, TFCE is more sensitive and stable for group-level analysis of VBM data. Thus, the voxelation-corrected CST approach may confer some advantages by being computationally less demanding for fMRI data analysis than TFCE with permutation testing and by also being applicable for single-subject fMRI analyses, while the TFCE approach is advantageous for VBM data. Hum Brain Mapp 38:1269-1280, 2017. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
Wright, Marvin N; Dankowski, Theresa; Ziegler, Andreas
2017-04-15
The most popular approach for analyzing survival data is the Cox regression model. The Cox model may, however, be misspecified, and its proportionality assumption may not always be fulfilled. An alternative approach for survival prediction is random forests for survival outcomes. The standard split criterion for random survival forests is the log-rank test statistic, which favors splitting variables with many possible split points. Conditional inference forests avoid this split variable selection bias. However, linear rank statistics are utilized by default in conditional inference forests to select the optimal splitting variable, which cannot detect non-linear effects in the independent variables. An alternative is to use maximally selected rank statistics for the split point selection. As in conditional inference forests, splitting variables are compared on the p-value scale. However, instead of the conditional Monte-Carlo approach used in conditional inference forests, p-value approximations are employed. We describe several p-value approximations and the implementation of the proposed random forest approach. A simulation study demonstrates that unbiased split variable selection is possible. However, there is a trade-off between unbiased split variable selection and runtime. In benchmark studies of prediction performance on simulated and real datasets, the new method performs better than random survival forests if informative dichotomous variables are combined with uninformative variables with more categories and better than conditional inference forests if non-linear covariate effects are included. In a runtime comparison, the method proves to be computationally faster than both alternatives, if a simple p-value approximation is used. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.
ERIC Educational Resources Information Center
Jang, Eunice Eunhee
2010-01-01
This article presents the author's response to the commentary essays by Charles Alderson and Fred Davidson on "Demystifying a Q-matrix for making diagnostic inferences about L2 reading skills" (Jang, 2009). The author stresses that their commentaries provides vital perspectives on the challenges to pushing the boundaries of current testing…
Children's Evaluation of the Certainty of Another Person's Inductive Inferences and Guesses
ERIC Educational Resources Information Center
Pillow, Bradford H.; Pearson, RaeAnne M.
2012-01-01
In three studies, 5-10-year-old children and an adult comparison group judged another's certainty in making inductive inferences and guesses. Participants observed a puppet make strong inductions, weak inductions, and guesses. Participants either had no information about the correctness of the puppet's conclusion, knew that the puppet was correct,…
Detective Questions: A Strategy for Improving Inference-Making in Children With Mild Disabilities
ERIC Educational Resources Information Center
Jiménez-Fernández, Gracia
2015-01-01
One of the most frequent problems in reading comprehension is the difficulty in making inferences from the text, especially for students with mild disabilities (i.e., children with learning disabilities or with high-functioning autism). It is essential, therefore, that educators include the teaching of reading strategies to improve their students'…
Statistics in biomedical laboratory and clinical science: applications, issues and pitfalls.
Ludbrook, John
2008-01-01
This review is directed at biomedical scientists who want to gain a better understanding of statistics: what tests to use, when, and why. In my view, even during the planning stage of a study it is very important to seek the advice of a qualified biostatistician. When designing and analyzing a study, it is important to construct and test global hypotheses, rather than to make multiple tests on the data. If the latter cannot be avoided, it is essential to control the risk of making false-positive inferences by applying multiple comparison procedures. For comparing two means or two proportions, it is best to use exact permutation tests rather then the better known, classical, ones. For comparing many means, analysis of variance, often of a complex type, is the most powerful approach. The correlation coefficient should never be used to compare the performances of two methods of measurement, or two measures, because it does not detect bias. Instead the Altman-Bland method of differences or least-products linear regression analysis should be preferred. Finally, the educational value to investigators of interaction with a biostatistician, before, during and after a study, cannot be overemphasized. (c) 2007 S. Karger AG, Basel.
Teaching machines to find mantle composition
NASA Astrophysics Data System (ADS)
Atkins, Suzanne; Tackley, Paul; Trampert, Jeannot; Valentine, Andrew
2017-04-01
The composition of the mantle affects many geodynamical processes by altering factors such as the density, the location of phase changes, and melting temperature. The inferences we make about mantle composition also determine how we interpret the changes in velocity, reflections, attenuation and scattering seen by seismologists. However, the bulk composition of the mantle is very poorly constrained. Inferences are made from meteorite samples, rock samples from the Earth and inferences made from geophysical data. All of these approaches require significant assumptions and the inferences made are subject to large uncertainties. Here we present a new method for inferring mantle composition, based on pattern recognition machine learning, which uses large scale in situ observations of the mantle to make fully probabilistic inferences of composition for convection simulations. Our method has an advantage over other petrological approaches because we use large scale geophysical observations. This means that we average over much greater length scales and do not need to rely on extrapolating from localised samples of the mantle or planetary disk. Another major advantage of our method is that it is fully probabilistic. This allows us to include all of the uncertainties inherent in the inference process, giving us far more information about the reliability of the result than other methods. Finally our method includes the impact of composition on mantle convection. This allows us to make much more precise inferences from geophysical data than other geophysical approaches, which attempt to invert one observation with no consideration of the relationship between convection and composition. We use a sampling based inversion method, using hundreds of convection simulations run using StagYY with self consistent mineral physics properties calculated using the PerpleX package. The observations from these simulations are used to train a neural network to make a probabilistic inference for major element oxide composition of the mantle. We find we can constrain bulk mantle FeO molar percent, FeO/MgO and FeO/SiO2 using observations of the temperature and density structure of the mantle in convection simulations.
A Review of Some Aspects of Robust Inference for Time Series.
1984-09-01
REVIEW OF SOME ASPECTSOF ROBUST INFERNCE FOR TIME SERIES by Ad . Dougla Main TE "iAL REPOW No. 63 Septermber 1984 Department of Statistics University of ...clear. One cannot hope to have a good method for dealing with outliers in time series by using only an instantaneous nonlinear transformation of the data...AI.49 716 A REVIEWd OF SOME ASPECTS OF ROBUST INFERENCE FOR TIME 1/1 SERIES(U) WASHINGTON UNIV SEATTLE DEPT OF STATISTICS R D MARTIN SEP 84 TR-53
The researcher and the consultant: from testing to probability statements.
Hamra, Ghassan B; Stang, Andreas; Poole, Charles
2015-09-01
In the first instalment of this series, Stang and Poole provided an overview of Fisher significance testing (ST), Neyman-Pearson null hypothesis testing (NHT), and their unfortunate and unintended offspring, null hypothesis significance testing. In addition to elucidating the distinction between the first two and the evolution of the third, the authors alluded to alternative models of statistical inference; namely, Bayesian statistics. Bayesian inference has experienced a revival in recent decades, with many researchers advocating for its use as both a complement and an alternative to NHT and ST. This article will continue in the direction of the first instalment, providing practicing researchers with an introduction to Bayesian inference. Our work will draw on the examples and discussion of the previous dialogue.
Inference generation and story comprehension among children with ADHD.
Van Neste, Jessica; Hayden, Angela; Lorch, Elizabeth P; Milich, Richard
2015-02-01
Academic difficulties are well-documented among children with ADHD. Exploring these difficulties through story comprehension research has revealed deficits among children with ADHD in making causal connections between events and in using causal structure and thematic importance to guide recall of stories. Important to theories of story comprehension and implied in these deficits is the ability to make inferences. Often, characters' goals are implicit and explanations of events must be inferred. The purpose of the present study was to compare the inferences generated during story comprehension by 23 7- to 11-year-old children with ADHD (16 males) and 35 comparison peers (19 males). Children watched two televised stories, each paused at five points. In the experimental condition, at each pause children told what they were thinking about the story, whereas in the control condition no responses were made during pauses. After viewing, children recalled the story. Several types of inferences and inference plausibility were coded. Children with ADHD generated fewer of the most essential inferences, plausible explanatory inferences, than did comparison children, both during story processing and during story recall. The groups did not differ on production of other types of inferences. Group differences in generating inferences during the think-aloud task significantly mediated group differences in patterns of recall. Both groups recalled more of the most important story information after completing the think-aloud task. Generating fewer explanatory inferences has important implications for story comprehension deficits in children with ADHD.
Influence of valproate on language functions in children with epilepsy.
Doo, Jin Woong; Kim, Soon Chul; Kim, Sun Jun
2018-01-01
The aim of the current study was to assess the influences of valproate (VPA) on the language functions in newly diagnosed pediatric patients with epilepsy. We reviewed medical records of 53 newly diagnosed patients with epilepsy, who were being treated with VPA monotherapy (n=53; 22 male patients and 31 female patients). The subjects underwent standardized language tests, at least twice, before and after the initiation of VPA. The standardized language tests used were The Test of Language Problem Solving Abilities, a Korean version of The Expressive/Receptive Language Function Test, and the Urimal Test of Articulation and Phonology. Since all the patients analyzed spoke Korean as their first language, we used Korean language tests to reduce the bias within the data. All the language parameters of the Test of Language Problem Solving Abilities slightly improved after the initiation of VPA in the 53 pediatric patients with epilepsy (mean age: 11.6±3.2years), but only "prediction" was statistically significant (determining cause, 14.9±5.1 to 15.5±4.3; making inference, 16.1±5.8 to 16.9±5.6; prediction, 11.1±4.9 to 11.9±4.2; total score of TOPS, 42.0±14.4 to 44.2±12.5). The patients treated with VPA also exhibited a small extension in mean length of utterance in words (MLU-w) when responding, but this was not statistically significant (determining cause, 5.4±2.0 to 5.7±1.6; making inference, 5.8±2.2 to 6.0±1.8; prediction, 5.9±2.5 to 5.9±2.1; total, 5.7±2.1 to 5.9±1.7). The administration of VPA led to a slight, but not statistically significant, improvement in the receptive language function (range: 144.7±41.1 to 148.2±39.7). Finally, there were no statistically significant changes in the percentage of articulation performance after taking VPA. Therefore, our data suggested that VPA did not have negative impact on the language function, but rather slightly improved problem-solving abilities. Copyright © 2017 Elsevier Inc. All rights reserved.
Enhancing Transparency and Control When Drawing Data-Driven Inferences About Individuals.
Chen, Daizhuo; Fraiberger, Samuel P; Moakler, Robert; Provost, Foster
2017-09-01
Recent studies show the remarkable power of fine-grained information disclosed by users on social network sites to infer users' personal characteristics via predictive modeling. Similar fine-grained data are being used successfully in other commercial applications. In response, attention is turning increasingly to the transparency that organizations provide to users as to what inferences are drawn and why, as well as to what sort of control users can be given over inferences that are drawn about them. In this article, we focus on inferences about personal characteristics based on information disclosed by users' online actions. As a use case, we explore personal inferences that are made possible from "Likes" on Facebook. We first present a means for providing transparency into the information responsible for inferences drawn by data-driven models. We then introduce the "cloaking device"-a mechanism for users to inhibit the use of particular pieces of information in inference. Using these analytical tools we ask two main questions: (1) How much information must users cloak to significantly affect inferences about their personal traits? We find that usually users must cloak only a small portion of their actions to inhibit inference. We also find that, encouragingly, false-positive inferences are significantly easier to cloak than true-positive inferences. (2) Can firms change their modeling behavior to make cloaking more difficult? The answer is a definitive yes. We demonstrate a simple modeling change that requires users to cloak substantially more information to affect the inferences drawn. The upshot is that organizations can provide transparency and control even into complicated, predictive model-driven inferences, but they also can make control easier or harder for their users.
Enhancing Transparency and Control When Drawing Data-Driven Inferences About Individuals
Chen, Daizhuo; Fraiberger, Samuel P.; Moakler, Robert; Provost, Foster
2017-01-01
Abstract Recent studies show the remarkable power of fine-grained information disclosed by users on social network sites to infer users' personal characteristics via predictive modeling. Similar fine-grained data are being used successfully in other commercial applications. In response, attention is turning increasingly to the transparency that organizations provide to users as to what inferences are drawn and why, as well as to what sort of control users can be given over inferences that are drawn about them. In this article, we focus on inferences about personal characteristics based on information disclosed by users' online actions. As a use case, we explore personal inferences that are made possible from “Likes” on Facebook. We first present a means for providing transparency into the information responsible for inferences drawn by data-driven models. We then introduce the “cloaking device”—a mechanism for users to inhibit the use of particular pieces of information in inference. Using these analytical tools we ask two main questions: (1) How much information must users cloak to significantly affect inferences about their personal traits? We find that usually users must cloak only a small portion of their actions to inhibit inference. We also find that, encouragingly, false-positive inferences are significantly easier to cloak than true-positive inferences. (2) Can firms change their modeling behavior to make cloaking more difficult? The answer is a definitive yes. We demonstrate a simple modeling change that requires users to cloak substantially more information to affect the inferences drawn. The upshot is that organizations can provide transparency and control even into complicated, predictive model-driven inferences, but they also can make control easier or harder for their users. PMID:28933942
Zeng, Irene Sui Lan; Lumley, Thomas
2018-01-01
Integrated omics is becoming a new channel for investigating the complex molecular system in modern biological science and sets a foundation for systematic learning for precision medicine. The statistical/machine learning methods that have emerged in the past decade for integrated omics are not only innovative but also multidisciplinary with integrated knowledge in biology, medicine, statistics, machine learning, and artificial intelligence. Here, we review the nontrivial classes of learning methods from the statistical aspects and streamline these learning methods within the statistical learning framework. The intriguing findings from the review are that the methods used are generalizable to other disciplines with complex systematic structure, and the integrated omics is part of an integrated information science which has collated and integrated different types of information for inferences and decision making. We review the statistical learning methods of exploratory and supervised learning from 42 publications. We also discuss the strengths and limitations of the extended principal component analysis, cluster analysis, network analysis, and regression methods. Statistical techniques such as penalization for sparsity induction when there are fewer observations than the number of features and using Bayesian approach when there are prior knowledge to be integrated are also included in the commentary. For the completeness of the review, a table of currently available software and packages from 23 publications for omics are summarized in the appendix.
What You Learn is What You See: Using Eye Movements to Study Infant Cross-Situational Word Learning
Smith, Linda
2016-01-01
Recent studies show that both adults and young children possess powerful statistical learning capabilities to solve the word-to-world mapping problem. However, the underlying mechanisms that make statistical learning possible and powerful are not yet known. With the goal of providing new insights into this issue, the research reported in this paper used an eye tracker to record the moment-by-moment eye movement data of 14-month-old babies in statistical learning tasks. Various measures are applied to such fine-grained temporal data, such as looking duration and shift rate (the number of shifts in gaze from one visual object to the other) trial by trial, showing different eye movement patterns between strong and weak statistical learners. Moreover, an information-theoretic measure is developed and applied to gaze data to quantify the degree of learning uncertainty trial by trial. Next, a simple associative statistical learning model is applied to eye movement data and these simulation results are compared with empirical results from young children, showing strong correlations between these two. This suggests that an associative learning mechanism with selective attention can provide a cognitively plausible model of cross-situational statistical learning. The work represents the first steps to use eye movement data to infer underlying real-time processes in statistical word learning. PMID:22213894
Statistics, Computation, and Modeling in Cosmology
NASA Astrophysics Data System (ADS)
Jewell, Jeff; Guiness, Joe; SAMSI 2016 Working Group in Cosmology
2017-01-01
Current and future ground and space based missions are designed to not only detect, but map out with increasing precision, details of the universe in its infancy to the present-day. As a result we are faced with the challenge of analyzing and interpreting observations from a wide variety of instruments to form a coherent view of the universe. Finding solutions to a broad range of challenging inference problems in cosmology is one of the goals of the “Statistics, Computation, and Modeling in Cosmology” workings groups, formed as part of the year long program on ‘Statistical, Mathematical, and Computational Methods for Astronomy’, hosted by the Statistical and Applied Mathematical Sciences Institute (SAMSI), a National Science Foundation funded institute. Two application areas have emerged for focused development in the cosmology working group involving advanced algorithmic implementations of exact Bayesian inference for the Cosmic Microwave Background, and statistical modeling of galaxy formation. The former includes study and development of advanced Markov Chain Monte Carlo algorithms designed to confront challenging inference problems including inference for spatial Gaussian random fields in the presence of sources of galactic emission (an example of a source separation problem). Extending these methods to future redshift survey data probing the nonlinear regime of large scale structure formation is also included in the working group activities. In addition, the working group is also focused on the study of ‘Galacticus’, a galaxy formation model applied to dark matter-only cosmological N-body simulations operating on time-dependent halo merger trees. The working group is interested in calibrating the Galacticus model to match statistics of galaxy survey observations; specifically stellar mass functions, luminosity functions, and color-color diagrams. The group will use subsampling approaches and fractional factorial designs to statistically and computationally efficiently explore the Galacticus parameter space. The group will also use the Galacticus simulations to study the relationship between the topological and physical structure of the halo merger trees and the properties of the resulting galaxies.
Inferring Learners' Knowledge from Their Actions
ERIC Educational Resources Information Center
Rafferty, Anna N.; LaMar, Michelle M.; Griffiths, Thomas L.
2015-01-01
Watching another person take actions to complete a goal and making inferences about that person's knowledge is a relatively natural task for people. This ability can be especially important in educational settings, where the inferences can be used for assessment, diagnosing misconceptions, and providing informative feedback. In this paper, we…
Activation of Background Knowledge for Inference Making: Effects on Reading Comprehension
ERIC Educational Resources Information Center
Elbro, Carsten; Buch-Iversen, Ida
2013-01-01
Failure to "activate" relevant, existing background knowledge may be a cause of poor reading comprehension. This failure may cause particular problems with inferences that depend heavily on prior knowledge. Conversely, teaching how to use background knowledge in the context of gap-filling inferences could improve reading comprehension in…
Observers Exploit Stochastic Models of Sensory Change to Help Judge the Passage of Time
Ahrens, Misha B.; Sahani, Maneesh
2011-01-01
Summary Sensory stimulation can systematically bias the perceived passage of time [1–5], but why and how this happens is mysterious. In this report, we provide evidence that such biases may ultimately derive from an innate and adaptive use of stochastically evolving dynamic stimuli to help refine estimates derived from internal timekeeping mechanisms [6–15]. A simplified statistical model based on probabilistic expectations of stimulus change derived from the second-order temporal statistics of the natural environment [16, 17] makes three predictions. First, random noise-like stimuli whose statistics violate natural expectations should induce timing bias. Second, a previously unexplored obverse of this effect is that similar noise stimuli with natural statistics should reduce the variability of timing estimates. Finally, this reduction in variability should scale with the interval being timed, so as to preserve the overall Weber law of interval timing. All three predictions are borne out experimentally. Thus, in the context of our novel theoretical framework, these results suggest that observers routinely rely on sensory input to augment their sense of the passage of time, through a process of Bayesian inference based on expectations of change in the natural environment. PMID:21256018
Vyas, Shreena A; Desai, Sukumar P
2015-05-01
Sir Ronald Aylmer Fisher and William Sealy Gosset were responsible for laying the foundations of statistical inference. Tests that bear their names are used by students and researchers in a wide variety of scientific disciplines. Similar and different in many respects, their lives and careers are the subject of this essay. They were not teacher and pupil; in fact the student was 14 years older than the professor. Their careers did not require them to interact with one another much but they were aware of one another's work. Although Sir Ronald is assigned the role of the professor, his success as a teacher was impaired by his inability to understand the limitations of his students. Meanwhile Gosset was forced to publish his work under the pseudonym 'Student' in order to make contributions to the field of mathematical statistics. Both men are undisputed giants in the field of statistics and we celebrate their achievements as much as we try to understand their struggles. © The Author(s) 2015 Reprints and permissions: sagepub.co.uk/journalsPermissions.nav.
Huang, Shi; MacKinnon, David P.; Perrino, Tatiana; Gallo, Carlos; Cruden, Gracelyn; Brown, C Hendricks
2016-01-01
Mediation analysis often requires larger sample sizes than main effect analysis to achieve the same statistical power. Combining results across similar trials may be the only practical option for increasing statistical power for mediation analysis in some situations. In this paper, we propose a method to estimate: 1) marginal means for mediation path a, the relation of the independent variable to the mediator; 2) marginal means for path b, the relation of the mediator to the outcome, across multiple trials; and 3) the between-trial level variance-covariance matrix based on a bivariate normal distribution. We present the statistical theory and an R computer program to combine regression coefficients from multiple trials to estimate a combined mediated effect and confidence interval under a random effects model. Values of coefficients a and b, along with their standard errors from each trial are the input for the method. This marginal likelihood based approach with Monte Carlo confidence intervals provides more accurate inference than the standard meta-analytic approach. We discuss computational issues, apply the method to two real-data examples and make recommendations for the use of the method in different settings. PMID:28239330
A stochastic and dynamical view of pluripotency in mouse embryonic stem cells
Lee, Esther J.
2018-01-01
Pluripotent embryonic stem cells are of paramount importance for biomedical sciences because of their innate ability for self-renewal and differentiation into all major cell lines. The fateful decision to exit or remain in the pluripotent state is regulated by complex genetic regulatory networks. The rapid growth of single-cell sequencing data has greatly stimulated applications of statistical and machine learning methods for inferring topologies of pluripotency regulating genetic networks. The inferred network topologies, however, often only encode Boolean information while remaining silent about the roles of dynamics and molecular stochasticity inherent in gene expression. Herein we develop a framework for systematically extending Boolean-level network topologies into higher resolution models of networks which explicitly account for the promoter architectures and gene state switching dynamics. We show the framework to be useful for disentangling the various contributions that gene switching, external signaling, and network topology make to the global heterogeneity and dynamics of transcription factor populations. We find the pluripotent state of the network to be a steady state which is robust to global variations of gene switching rates which we argue are a good proxy for epigenetic states of individual promoters. The temporal dynamics of exiting the pluripotent state, on the other hand, is significantly influenced by the rates of genetic switching which makes cells more responsive to changes in extracellular signals. PMID:29451874
Montero-Dorta, Antonio D.; Bolton, Adam S.; Brownstein, Joel R.; ...
2016-06-09
The history of the expanding universe is encoded in the large-scale distribution of galaxies throughout space. By mapping out the three-dimensional locations of millions of galaxies with powerful telescopes, we can directly measure this expansion history. When interpreted using Einstein's theory of gravity, this expansion history lets us infer the contents of the universe, including the amount and nature of "dark energy", an as-yet unexplained energy density associated with the empty vacuum of space. However, to make these measurements and inferences accurately, we must understand and control for a large number of experimental effects. This paper develops a novel methodmore » for large cosmological galaxy surveys, and applies it to data from the "BOSS" experiment of the Third Sloan Digital Sky Survey. This method enables an accurate statistical characterization of the "completeness" of the BOSS experiment: the probability that a given galaxy at a given place in the universe is actually detected and successfully measured. It also enables the accurate determination of the underlying demographics of the galaxy population being studied by the experiment. These two ingredients can then be used to make a more accurate comparison between the results of the experiment and the theoretical models that predict the observable effects of dark energy.« less
Hybrid regulatory models: a statistically tractable approach to model regulatory network dynamics.
Ocone, Andrea; Millar, Andrew J; Sanguinetti, Guido
2013-04-01
Computational modelling of the dynamics of gene regulatory networks is a central task of systems biology. For networks of small/medium scale, the dominant paradigm is represented by systems of coupled non-linear ordinary differential equations (ODEs). ODEs afford great mechanistic detail and flexibility, but calibrating these models to data is often an extremely difficult statistical problem. Here, we develop a general statistical inference framework for stochastic transcription-translation networks. We use a coarse-grained approach, which represents the system as a network of stochastic (binary) promoter and (continuous) protein variables. We derive an exact inference algorithm and an efficient variational approximation that allows scalable inference and learning of the model parameters. We demonstrate the power of the approach on two biological case studies, showing that the method allows a high degree of flexibility and is capable of testable novel biological predictions. http://homepages.inf.ed.ac.uk/gsanguin/software.html. Supplementary data are available at Bioinformatics online.
Intelligent Machines in the 21st Century: Automating the Processes of Inference and Inquiry
NASA Technical Reports Server (NTRS)
Knuth, Kevin H.
2003-01-01
The last century saw the application of Boolean algebra toward the construction of computing machines, which work by applying logical transformations to information contained in their memory. The development of information theory and the generalization of Boolean algebra to Bayesian inference have enabled these computing machines. in the last quarter of the twentieth century, to be endowed with the ability to learn by making inferences from data. This revolution is just beginning as new computational techniques continue to make difficult problems more accessible. However, modern intelligent machines work by inferring knowledge using only their pre-programmed prior knowledge and the data provided. They lack the ability to ask questions, or request data that would aid their inferences. Recent advances in understanding the foundations of probability theory have revealed implications for areas other than logic. Of relevance to intelligent machines, we identified the algebra of questions as the free distributive algebra, which now allows us to work with questions in a way analogous to that which Boolean algebra enables us to work with logical statements. In this paper we describe this logic of inference and inquiry using the mathematics of partially ordered sets and the scaffolding of lattice theory, discuss the far-reaching implications of the methodology, and demonstrate its application with current examples in machine learning. Automation of both inference and inquiry promises to allow robots to perform science in the far reaches of our solar system and in other star systems by enabling them to not only make inferences from data, but also decide which question to ask, experiment to perform, or measurement to take given what they have learned and what they are designed to understand.
The Emergence of Probabilistic Reasoning in Very Young Infants: Evidence from 4.5- and 6-Month-Olds
ERIC Educational Resources Information Center
Denison, Stephanie; Reed, Christie; Xu, Fei
2013-01-01
How do people make rich inferences from such sparse data? Recent research has explored this inferential ability by investigating probabilistic reasoning in infancy. For example, 8- and 11-month-old infants can make inferences from samples to populations and vice versa (Denison & Xu, 2010a; Xu & Denison, 2009; Xu & Garcia, 2008a). The…
Reading biological processes from nucleotide sequences
NASA Astrophysics Data System (ADS)
Murugan, Anand
Cellular processes have traditionally been investigated by techniques of imaging and biochemical analysis of the molecules involved. The recent rapid progress in our ability to manipulate and read nucleic acid sequences gives us direct access to the genetic information that directs and constrains biological processes. While sequence data is being used widely to investigate genotype-phenotype relationships and population structure, here we use sequencing to understand biophysical mechanisms. We present work on two different systems. First, in chapter 2, we characterize the stochastic genetic editing mechanism that produces diverse T-cell receptors in the human immune system. We do this by inferring statistical distributions of the underlying biochemical events that generate T-cell receptor coding sequences from the statistics of the observed sequences. This inferred model quantitatively describes the potential repertoire of T-cell receptors that can be produced by an individual, providing insight into its potential diversity and the probability of generation of any specific T-cell receptor. Then in chapter 3, we present work on understanding the functioning of regulatory DNA sequences in both prokaryotes and eukaryotes. Here we use experiments that measure the transcriptional activity of large libraries of mutagenized promoters and enhancers and infer models of the sequence-function relationship from this data. For the bacterial promoter, we infer a physically motivated 'thermodynamic' model of the interaction of DNA-binding proteins and RNA polymerase determining the transcription rate of the downstream gene. For the eukaryotic enhancers, we infer heuristic models of the sequence-function relationship and use these models to find synthetic enhancer sequences that optimize inducibility of expression. Both projects demonstrate the utility of sequence information in conjunction with sophisticated statistical inference techniques for dissecting underlying biophysical mechanisms.
Exclusion probabilities and likelihood ratios with applications to kinship problems.
Slooten, Klaas-Jan; Egeland, Thore
2014-05-01
In forensic genetics, DNA profiles are compared in order to make inferences, paternity cases being a standard example. The statistical evidence can be summarized and reported in several ways. For example, in a paternity case, the likelihood ratio (LR) and the probability of not excluding a random man as father (RMNE) are two common summary statistics. There has been a long debate on the merits of the two statistics, also in the context of DNA mixture interpretation, and no general consensus has been reached. In this paper, we show that the RMNE is a certain weighted average of inverse likelihood ratios. This is true in any forensic context. We show that the likelihood ratio in favor of the correct hypothesis is, in expectation, bigger than the reciprocal of the RMNE probability. However, with the exception of pathological cases, it is also possible to obtain smaller likelihood ratios. We illustrate this result for paternity cases. Moreover, some theoretical properties of the likelihood ratio for a large class of general pairwise kinship cases, including expected value and variance, are derived. The practical implications of the findings are discussed and exemplified.
Skelly, Daniel A.; Johansson, Marnie; Madeoy, Jennifer; Wakefield, Jon; Akey, Joshua M.
2011-01-01
Variation in gene expression is thought to make a significant contribution to phenotypic diversity among individuals within populations. Although high-throughput cDNA sequencing offers a unique opportunity to delineate the genome-wide architecture of regulatory variation, new statistical methods need to be developed to capitalize on the wealth of information contained in RNA-seq data sets. To this end, we developed a powerful and flexible hierarchical Bayesian model that combines information across loci to allow both global and locus-specific inferences about allele-specific expression (ASE). We applied our methodology to a large RNA-seq data set obtained in a diploid hybrid of two diverse Saccharomyces cerevisiae strains, as well as to RNA-seq data from an individual human genome. Our statistical framework accurately quantifies levels of ASE with specified false-discovery rates, achieving high reproducibility between independent sequencing platforms. We pinpoint loci that show unusual and biologically interesting patterns of ASE, including allele-specific alternative splicing and transcription termination sites. Our methodology provides a rigorous, quantitative, and high-resolution tool for profiling ASE across whole genomes. PMID:21873452
Wolfson, Julian; Henn, Lisa
2014-01-01
In many areas of clinical investigation there is great interest in identifying and validating surrogate endpoints, biomarkers that can be measured a relatively short time after a treatment has been administered and that can reliably predict the effect of treatment on the clinical outcome of interest. However, despite dramatic advances in the ability to measure biomarkers, the recent history of clinical research is littered with failed surrogates. In this paper, we present a statistical perspective on why identifying surrogate endpoints is so difficult. We view the problem from the framework of causal inference, with a particular focus on the technique of principal stratification (PS), an approach which is appealing because the resulting estimands are not biased by unmeasured confounding. In many settings, PS estimands are not statistically identifiable and their degree of non-identifiability can be thought of as representing the statistical difficulty of assessing the surrogate value of a biomarker. In this work, we examine the identifiability issue and present key simplifying assumptions and enhanced study designs that enable the partial or full identification of PS estimands. We also present example situations where these assumptions and designs may or may not be feasible, providing insight into the problem characteristics which make the statistical evaluation of surrogate endpoints so challenging.
2014-01-01
In many areas of clinical investigation there is great interest in identifying and validating surrogate endpoints, biomarkers that can be measured a relatively short time after a treatment has been administered and that can reliably predict the effect of treatment on the clinical outcome of interest. However, despite dramatic advances in the ability to measure biomarkers, the recent history of clinical research is littered with failed surrogates. In this paper, we present a statistical perspective on why identifying surrogate endpoints is so difficult. We view the problem from the framework of causal inference, with a particular focus on the technique of principal stratification (PS), an approach which is appealing because the resulting estimands are not biased by unmeasured confounding. In many settings, PS estimands are not statistically identifiable and their degree of non-identifiability can be thought of as representing the statistical difficulty of assessing the surrogate value of a biomarker. In this work, we examine the identifiability issue and present key simplifying assumptions and enhanced study designs that enable the partial or full identification of PS estimands. We also present example situations where these assumptions and designs may or may not be feasible, providing insight into the problem characteristics which make the statistical evaluation of surrogate endpoints so challenging. PMID:25342953
Daniel Goodman’s empirical approach to Bayesian statistics
Gerrodette, Tim; Ward, Eric; Taylor, Rebecca L.; Schwarz, Lisa K.; Eguchi, Tomoharu; Wade, Paul; Himes Boor, Gina
2016-01-01
Bayesian statistics, in contrast to classical statistics, uses probability to represent uncertainty about the state of knowledge. Bayesian statistics has often been associated with the idea that knowledge is subjective and that a probability distribution represents a personal degree of belief. Dr. Daniel Goodman considered this viewpoint problematic for issues of public policy. He sought to ground his Bayesian approach in data, and advocated the construction of a prior as an empirical histogram of “similar” cases. In this way, the posterior distribution that results from a Bayesian analysis combined comparable previous data with case-specific current data, using Bayes’ formula. Goodman championed such a data-based approach, but he acknowledged that it was difficult in practice. If based on a true representation of our knowledge and uncertainty, Goodman argued that risk assessment and decision-making could be an exact science, despite the uncertainties. In his view, Bayesian statistics is a critical component of this science because a Bayesian analysis produces the probabilities of future outcomes. Indeed, Goodman maintained that the Bayesian machinery, following the rules of conditional probability, offered the best legitimate inference from available data. We give an example of an informative prior in a recent study of Steller sea lion spatial use patterns in Alaska.
NASA Technical Reports Server (NTRS)
Matney, M.; Barker, E.; Seitzer, P.; Abercromby, K. J.; Rodriquez, H. M.
2006-01-01
NASA's Orbital Debris measurements program has a goal to characterize the small debris environment in the geosynchronous Earth-orbit (GEO) region using optical telescopes ("small" refers to objects too small to catalog and track with current systems). Traditionally, observations of GEO and near-GEO objects involve following the object with the telescope long enough to obtain an orbit suitable for tracking purposes. Telescopes operating in survey mode, however, randomly observe objects that pass through their field of view. Typically, these short-arc observation are inadequate to obtain detailed orbits, but can be used to estimate approximate circular orbit elements (semimajor axis, inclination, and ascending node). From this information, it should be possible to make statistical inferences about the orbital distributions of the GEO population bright enough to be observed by the system. The Michigan Orbital Debris Survey Telescope (MODEST) has been making such statistical surveys of the GEO region for four years. During that time, the telescope has made enough observations in enough areas of the GEO belt to have had nearly complete coverage. That means that almost all objects in all possible orbits in the GEO and near- GEO region had a non-zero chance of being observed. Some regions (such as those near zero inclination) have had good coverage, while others are poorly covered. Nevertheless, it is possible to remove these statistical biases and reconstruct the orbit populations within the limits of sampling error. In this paper, these statistical techniques and assumptions are described, and the techniques are applied to the current MODEST data set to arrive at our best estimate of the GEO orbit population distribution.
Reasoning about Causal Relationships: Inferences on Causal Networks
Rottman, Benjamin Margolin; Hastie, Reid
2013-01-01
Over the last decade, a normative framework for making causal inferences, Bayesian Probabilistic Causal Networks, has come to dominate psychological studies of inference based on causal relationships. The following causal networks—[X→Y→Z, X←Y→Z, X→Y←Z]—supply answers for questions like, “Suppose both X and Y occur, what is the probability Z occurs?” or “Suppose you intervene and make Y occur, what is the probability Z occurs?” In this review, we provide a tutorial for how normatively to calculate these inferences. Then, we systematically detail the results of behavioral studies comparing human qualitative and quantitative judgments to the normative calculations for many network structures and for several types of inferences on those networks. Overall, when the normative calculations imply that an inference should increase, judgments usually go up; when calculations imply a decrease, judgments usually go down. However, two systematic deviations appear. First, people’s inferences violate the Markov assumption. For example, when inferring Z from the structure X→Y→Z, people think that X is relevant even when Y completely mediates the relationship between X and Z. Second, even when people’s inferences are directionally consistent with the normative calculations, they are often not as sensitive to the parameters and the structure of the network as they should be. We conclude with a discussion of productive directions for future research. PMID:23544658
Hofmann, B
2008-06-01
Are there similarities between scientific and moral inference? This is the key question in this article. It takes as its point of departure an instance of one person's story in the media changing both Norwegian public opinion and a brand-new Norwegian law prohibiting the use of saviour siblings. The case appears to falsify existing norms and to establish new ones. The analysis of this case reveals similarities in the modes of inference in science and morals, inasmuch as (a) a single case functions as a counter-example to an existing rule; (b) there is a common presupposition of stability, similarity and order, which makes it possible to reason from a few cases to a general rule; and (c) this makes it possible to hold things together and retain order. In science, these modes of inference are referred to as falsification, induction and consistency. In morals, they have a variety of other names. Hence, even without abandoning the fact-value divide, there appear to be similarities between inference in science and inference in morals, which may encourage communication across the boundaries between "the two cultures" and which are relevant to medical humanities.
CADDIS Volume 4. Data Analysis: PECBO Appendix - R Scripts for Non-Parametric Regressions
Script for computing nonparametric regression analysis. Overview of using scripts to infer environmental conditions from biological observations, statistically estimating species-environment relationships, statistical scripts.
Global Quantitative Modeling of Chromatin Factor Interactions
Zhou, Jian; Troyanskaya, Olga G.
2014-01-01
Chromatin is the driver of gene regulation, yet understanding the molecular interactions underlying chromatin factor combinatorial patterns (or the “chromatin codes”) remains a fundamental challenge in chromatin biology. Here we developed a global modeling framework that leverages chromatin profiling data to produce a systems-level view of the macromolecular complex of chromatin. Our model ultilizes maximum entropy modeling with regularization-based structure learning to statistically dissect dependencies between chromatin factors and produce an accurate probability distribution of chromatin code. Our unsupervised quantitative model, trained on genome-wide chromatin profiles of 73 histone marks and chromatin proteins from modENCODE, enabled making various data-driven inferences about chromatin profiles and interactions. We provided a highly accurate predictor of chromatin factor pairwise interactions validated by known experimental evidence, and for the first time enabled higher-order interaction prediction. Our predictions can thus help guide future experimental studies. The model can also serve as an inference engine for predicting unknown chromatin profiles — we demonstrated that with this approach we can leverage data from well-characterized cell types to help understand less-studied cell type or conditions. PMID:24675896
p-Curve and p-Hacking in Observational Research.
Bruns, Stephan B; Ioannidis, John P A
2016-01-01
The p-curve, the distribution of statistically significant p-values of published studies, has been used to make inferences on the proportion of true effects and on the presence of p-hacking in the published literature. We analyze the p-curve for observational research in the presence of p-hacking. We show by means of simulations that even with minimal omitted-variable bias (e.g., unaccounted confounding) p-curves based on true effects and p-curves based on null-effects with p-hacking cannot be reliably distinguished. We also demonstrate this problem using as practical example the evaluation of the effect of malaria prevalence on economic growth between 1960 and 1996. These findings call recent studies into question that use the p-curve to infer that most published research findings are based on true effects in the medical literature and in a wide range of disciplines. p-values in observational research may need to be empirically calibrated to be interpretable with respect to the commonly used significance threshold of 0.05. Violations of randomization in experimental studies may also result in situations where the use of p-curves is similarly unreliable.
The logical foundations of forensic science: towards reliable knowledge.
Evett, Ian
2015-08-05
The generation of observations is a technical process and the advances that have been made in forensic science techniques over the last 50 years have been staggering. But science is about reasoning-about making sense from observations. For the forensic scientist, this is the challenge of interpreting a pattern of observations within the context of a legal trial. Here too, there have been major advances over recent years and there is a broad consensus among serious thinkers, both scientific and legal, that the logical framework is furnished by Bayesian inference (Aitken et al. Fundamentals of Probability and Statistical Evidence in Criminal Proceedings). This paper shows how the paradigm has matured, centred on the notion of the balanced scientist. Progress through the courts has not been always smooth and difficulties arising from recent judgments are discussed. Nevertheless, the future holds exciting prospects, in particular the opportunities for managing and calibrating the knowledge of the forensic scientists who assign the probabilities that are at the foundation of logical inference in the courtroom. © 2015 The Author(s) Published by the Royal Society. All rights reserved.
Alphabetic letter identification: Effects of perceivability, similarity, and bias☆
Mueller, Shane T.; Weidemann, Christoph T.
2012-01-01
The legibility of the letters in the Latin alphabet has been measured numerous times since the beginning of experimental psychology. To identify the theoretical mechanisms attributed to letter identification, we report a comprehensive review of literature, spanning more than a century. This review revealed that identification accuracy has frequently been attributed to a subset of three common sources: perceivability, bias, and similarity. However, simultaneous estimates of these values have rarely (if ever) been performed. We present the results of two new experiments which allow for the simultaneous estimation of these factors, and examine how the shape of a visual mask impacts each of them, as inferred through a new statistical model. Results showed that the shape and identity of the mask impacted the inferred perceivability, bias, and similarity space of a letter set, but that there were aspects of similarity that were robust to the choice of mask. The results illustrate how the psychological concepts of perceivability, bias, and similarity can be estimated simultaneously, and how each make powerful contributions to visual letter identification. PMID:22036587
Kwak, Sehyun; Svensson, J; Brix, M; Ghim, Y-C
2016-02-01
A Bayesian model of the emission spectrum of the JET lithium beam has been developed to infer the intensity of the Li I (2p-2s) line radiation and associated uncertainties. The detected spectrum for each channel of the lithium beam emission spectroscopy system is here modelled by a single Li line modified by an instrumental function, Bremsstrahlung background, instrumental offset, and interference filter curve. Both the instrumental function and the interference filter curve are modelled with non-parametric Gaussian processes. All free parameters of the model, the intensities of the Li line, Bremsstrahlung background, and instrumental offset, are inferred using Bayesian probability theory with a Gaussian likelihood for photon statistics and electronic background noise. The prior distributions of the free parameters are chosen as Gaussians. Given these assumptions, the intensity of the Li line and corresponding uncertainties are analytically available using a Bayesian linear inversion technique. The proposed approach makes it possible to extract the intensity of Li line without doing a separate background subtraction through modulation of the Li beam.
The perception of rational, goal-directed action in nonhuman primates.
Wood, Justin N; Glynn, David D; Phillips, Brenda C; Hauser, Marc D
2007-09-07
Humans are capable of making inferences about other individuals' intentions and goals by evaluating their actions in relation to the constraints imposed by the environment. This capacity enables humans to go beyond the surface appearance of behavior to draw inferences about an individual's mental states. Presently unclear is whether this capacity is uniquely human or is shared with other animals. We show that cotton-top tamarins, rhesus macaques, and chimpanzees all make spontaneous inferences about a human experimenter's goal by attending to the environmental constraints that guide rational action. These findings rule out simple associative accounts of action perception and show that our capacity to infer rational, goal-directed action likely arose at least as far back as the New World monkeys, some 40 million years ago.
ERIC Educational Resources Information Center
Mislevy, Robert J.
Educational test theory consists of statistical and methodological tools to support inferences about examinees' knowledge, skills, and accomplishments. The evolution of test theory has been shaped by the nature of users' inferences which, until recently, have been framed almost exclusively in terms of trait and behavioral psychology. Progress in…
Data-driven sensitivity inference for Thomson scattering electron density measurement systems.
Fujii, Keisuke; Yamada, Ichihiro; Hasuo, Masahiro
2017-01-01
We developed a method to infer the calibration parameters of multichannel measurement systems, such as channel variations of sensitivity and noise amplitude, from experimental data. We regard such uncertainties of the calibration parameters as dependent noise. The statistical properties of the dependent noise and that of the latent functions were modeled and implemented in the Gaussian process kernel. Based on their statistical difference, both parameters were inferred from the data. We applied this method to the electron density measurement system by Thomson scattering for the Large Helical Device plasma, which is equipped with 141 spatial channels. Based on the 210 sets of experimental data, we evaluated the correction factor of the sensitivity and noise amplitude for each channel. The correction factor varies by ≈10%, and the random noise amplitude is ≈2%, i.e., the measurement accuracy increases by a factor of 5 after this sensitivity correction. The certainty improvement in the spatial derivative inference was demonstrated.
An integrated data model to estimate spatiotemporal occupancy, abundance, and colonization dynamics.
Williams, Perry J; Hooten, Mevin B; Womble, Jamie N; Esslinger, George G; Bower, Michael R; Hefley, Trevor J
2017-02-01
Ecological invasions and colonizations occur dynamically through space and time. Estimating the distribution and abundance of colonizing species is critical for efficient management or conservation. We describe a statistical framework for simultaneously estimating spatiotemporal occupancy and abundance dynamics of a colonizing species. Our method accounts for several issues that are common when modeling spatiotemporal ecological data including multiple levels of detection probability, multiple data sources, and computational limitations that occur when making fine-scale inference over a large spatiotemporal domain. We apply the model to estimate the colonization dynamics of sea otters (Enhydra lutris) in Glacier Bay, in southeastern Alaska. © 2016 by the Ecological Society of America.
Mukaka, Mavuto; White, Sarah A; Terlouw, Dianne J; Mwapasa, Victor; Kalilani-Phiri, Linda; Faragher, E Brian
2016-07-22
Missing outcomes can seriously impair the ability to make correct inferences from randomized controlled trials (RCTs). Complete case (CC) analysis is commonly used, but it reduces sample size and is perceived to lead to reduced statistical efficiency of estimates while increasing the potential for bias. As multiple imputation (MI) methods preserve sample size, they are generally viewed as the preferred analytical approach. We examined this assumption, comparing the performance of CC and MI methods to determine risk difference (RD) estimates in the presence of missing binary outcomes. We conducted simulation studies of 5000 simulated data sets with 50 imputations of RCTs with one primary follow-up endpoint at different underlying levels of RD (3-25 %) and missing outcomes (5-30 %). For missing at random (MAR) or missing completely at random (MCAR) outcomes, CC method estimates generally remained unbiased and achieved precision similar to or better than MI methods, and high statistical coverage. Missing not at random (MNAR) scenarios yielded invalid inferences with both methods. Effect size estimate bias was reduced in MI methods by always including group membership even if this was unrelated to missingness. Surprisingly, under MAR and MCAR conditions in the assessed scenarios, MI offered no statistical advantage over CC methods. While MI must inherently accompany CC methods for intention-to-treat analyses, these findings endorse CC methods for per protocol risk difference analyses in these conditions. These findings provide an argument for the use of the CC approach to always complement MI analyses, with the usual caveat that the validity of the mechanism for missingness be thoroughly discussed. More importantly, researchers should strive to collect as much data as possible.
Wen, Shihua; Zhang, Lanju; Yang, Bo
2014-07-01
The Problem formulation, Objectives, Alternatives, Consequences, Trade-offs, Uncertainties, Risk attitude, and Linked decisions (PrOACT-URL) framework and multiple criteria decision analysis (MCDA) have been recommended by the European Medicines Agency for structured benefit-risk assessment of medicinal products undergoing regulatory review. The objective of this article was to provide solutions to incorporate the uncertainty from clinical data into the MCDA model when evaluating the overall benefit-risk profiles among different treatment options. Two statistical approaches, the δ-method approach and the Monte-Carlo approach, were proposed to construct the confidence interval of the overall benefit-risk score from the MCDA model as well as other probabilistic measures for comparing the benefit-risk profiles between treatment options. Both approaches can incorporate the correlation structure between clinical parameters (criteria) in the MCDA model and are straightforward to implement. The two proposed approaches were applied to a case study to evaluate the benefit-risk profile of an add-on therapy for rheumatoid arthritis (drug X) relative to placebo. It demonstrated a straightforward way to quantify the impact of the uncertainty from clinical data to the benefit-risk assessment and enabled statistical inference on evaluating the overall benefit-risk profiles among different treatment options. The δ-method approach provides a closed form to quantify the variability of the overall benefit-risk score in the MCDA model, whereas the Monte-Carlo approach is more computationally intensive but can yield its true sampling distribution for statistical inference. The obtained confidence intervals and other probabilistic measures from the two approaches enhance the benefit-risk decision making of medicinal products. Copyright © 2014 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
Feder, Paul I; Ma, Zhenxu J; Bull, Richard J; Teuschler, Linda K; Rice, Glenn
2009-01-01
In chemical mixtures risk assessment, the use of dose-response data developed for one mixture to estimate risk posed by a second mixture depends on whether the two mixtures are sufficiently similar. While evaluations of similarity may be made using qualitative judgments, this article uses nonparametric statistical methods based on the "bootstrap" resampling technique to address the question of similarity among mixtures of chemical disinfectant by-products (DBP) in drinking water. The bootstrap resampling technique is a general-purpose, computer-intensive approach to statistical inference that substitutes empirical sampling for theoretically based parametric mathematical modeling. Nonparametric, bootstrap-based inference involves fewer assumptions than parametric normal theory based inference. The bootstrap procedure is appropriate, at least in an asymptotic sense, whether or not the parametric, distributional assumptions hold, even approximately. The statistical analysis procedures in this article are initially illustrated with data from 5 water treatment plants (Schenck et al., 2009), and then extended using data developed from a study of 35 drinking-water utilities (U.S. EPA/AMWA, 1989), which permits inclusion of a greater number of water constituents and increased structure in the statistical models.
White, H; Racine, J
2001-01-01
We propose tests for individual and joint irrelevance of network inputs. Such tests can be used to determine whether an input or group of inputs "belong" in a particular model, thus permitting valid statistical inference based on estimated feedforward neural-network models. The approaches employ well-known statistical resampling techniques. We conduct a small Monte Carlo experiment showing that our tests have reasonable level and power behavior, and we apply our methods to examine whether there are predictable regularities in foreign exchange rates. We find that exchange rates do appear to contain information that is exploitable for enhanced point prediction, but the nature of the predictive relations evolves through time.
Evaluating the Impact of a Multistrategy Inference Intervention for Middle-Grade Struggling Readers
ERIC Educational Resources Information Center
Barth, Amy E.; Elleman, Amy
2017-01-01
Purpose: We examined the effectiveness of a multistrategy inference intervention designed to increase inference making and reading comprehension for middle-grade struggling readers. Method: A total of 66 middle-grade struggling readers were randomized to treatment (n = 33) and comparison (n = 33) conditions. Students in the treatment group…
ERIC Educational Resources Information Center
Hall, Colby; Barnes, Marcia A.
2017-01-01
Making inferences during reading is a critical standards-based skill and is important for reading comprehension. This article supports the improvement of reading comprehension for students with learning disabilities (LD) in upper elementary grades by reviewing what is currently known about inference instruction for students with LD and providing…
FUNSTAT and statistical image representations
NASA Technical Reports Server (NTRS)
Parzen, E.
1983-01-01
General ideas of functional statistical inference analysis of one sample and two samples, univariate and bivariate are outlined. ONESAM program is applied to analyze the univariate probability distributions of multi-spectral image data.
Preparing for the first meeting with a statistician.
De Muth, James E
2008-12-15
Practical statistical issues that should be considered when performing data collection and analysis are reviewed. The meeting with a statistician should take place early in the research development before any study data are collected. The process of statistical analysis involves establishing the research question, formulating a hypothesis, selecting an appropriate test, sampling correctly, collecting data, performing tests, and making decisions. Once the objectives are established, the researcher can determine the characteristics or demographics of the individuals required for the study, how to recruit volunteers, what type of data are needed to answer the research question(s), and the best methods for collecting the required information. There are two general types of statistics: descriptive and inferential. Presenting data in a more palatable format for the reader is called descriptive statistics. Inferential statistics involve making an inference or decision about a population based on results obtained from a sample of that population. In order for the results of a statistical test to be valid, the sample should be representative of the population from which it is drawn. When collecting information about volunteers, researchers should only collect information that is directly related to the study objectives. Important information that a statistician will require first is an understanding of the type of variables involved in the study and which variables can be controlled by researchers and which are beyond their control. Data can be presented in one of four different measurement scales: nominal, ordinal, interval, or ratio. Hypothesis testing involves two mutually exclusive and exhaustive statements related to the research question. Statisticians should not be replaced by computer software, and they should be consulted before any research data are collected. When preparing to meet with a statistician, the pharmacist researcher should be familiar with the steps of statistical analysis and consider several questions related to the study to be conducted.
Intelligent machines in the twenty-first century: foundations of inference and inquiry.
Knuth, Kevin H
2003-12-15
The last century saw the application of Boolean algebra to the construction of computing machines, which work by applying logical transformations to information contained in their memory. The development of information theory and the generalization of Boolean algebra to Bayesian inference have enabled these computing machines, in the last quarter of the twentieth century, to be endowed with the ability to learn by making inferences from data. This revolution is just beginning as new computational techniques continue to make difficult problems more accessible. Recent advances in our understanding of the foundations of probability theory have revealed implications for areas other than logic. Of relevance to intelligent machines, we recently identified the algebra of questions as the free distributive algebra, which will now allow us to work with questions in a way analogous to that which Boolean algebra enables us to work with logical statements. In this paper, we examine the foundations of inference and inquiry. We begin with a history of inferential reasoning, highlighting key concepts that have led to the automation of inference in modern machine-learning systems. We then discuss the foundations of inference in more detail using a modern viewpoint that relies on the mathematics of partially ordered sets and the scaffolding of lattice theory. This new viewpoint allows us to develop the logic of inquiry and introduce a measure describing the relevance of a proposed question to an unresolved issue. Last, we will demonstrate the automation of inference, and discuss how this new logic of inquiry will enable intelligent machines to ask questions. Automation of both inference and inquiry promises to allow robots to perform science in the far reaches of our solar system and in other star systems by enabling them not only to make inferences from data, but also to decide which question to ask, which experiment to perform, or which measurement to take given what they have learned and what they are designed to understand.
Intelligent machines in the twenty-first century: foundations of inference and inquiry
NASA Technical Reports Server (NTRS)
Knuth, Kevin H.
2003-01-01
The last century saw the application of Boolean algebra to the construction of computing machines, which work by applying logical transformations to information contained in their memory. The development of information theory and the generalization of Boolean algebra to Bayesian inference have enabled these computing machines, in the last quarter of the twentieth century, to be endowed with the ability to learn by making inferences from data. This revolution is just beginning as new computational techniques continue to make difficult problems more accessible. Recent advances in our understanding of the foundations of probability theory have revealed implications for areas other than logic. Of relevance to intelligent machines, we recently identified the algebra of questions as the free distributive algebra, which will now allow us to work with questions in a way analogous to that which Boolean algebra enables us to work with logical statements. In this paper, we examine the foundations of inference and inquiry. We begin with a history of inferential reasoning, highlighting key concepts that have led to the automation of inference in modern machine-learning systems. We then discuss the foundations of inference in more detail using a modern viewpoint that relies on the mathematics of partially ordered sets and the scaffolding of lattice theory. This new viewpoint allows us to develop the logic of inquiry and introduce a measure describing the relevance of a proposed question to an unresolved issue. Last, we will demonstrate the automation of inference, and discuss how this new logic of inquiry will enable intelligent machines to ask questions. Automation of both inference and inquiry promises to allow robots to perform science in the far reaches of our solar system and in other star systems by enabling them not only to make inferences from data, but also to decide which question to ask, which experiment to perform, or which measurement to take given what they have learned and what they are designed to understand.
Data free inference with processed data products
Chowdhary, K.; Najm, H. N.
2014-07-12
Here, we consider the context of probabilistic inference of model parameters given error bars or confidence intervals on model output values, when the data is unavailable. We introduce a class of algorithms in a Bayesian framework, relying on maximum entropy arguments and approximate Bayesian computation methods, to generate consistent data with the given summary statistics. Once we obtain consistent data sets, we pool the respective posteriors, to arrive at a single, averaged density on the parameters. This approach allows us to perform accurate forward uncertainty propagation consistent with the reported statistics.
Targeted estimation of nuisance parameters to obtain valid statistical inference.
van der Laan, Mark J
2014-01-01
In order to obtain concrete results, we focus on estimation of the treatment specific mean, controlling for all measured baseline covariates, based on observing independent and identically distributed copies of a random variable consisting of baseline covariates, a subsequently assigned binary treatment, and a final outcome. The statistical model only assumes possible restrictions on the conditional distribution of treatment, given the covariates, the so-called propensity score. Estimators of the treatment specific mean involve estimation of the propensity score and/or estimation of the conditional mean of the outcome, given the treatment and covariates. In order to make these estimators asymptotically unbiased at any data distribution in the statistical model, it is essential to use data-adaptive estimators of these nuisance parameters such as ensemble learning, and specifically super-learning. Because such estimators involve optimal trade-off of bias and variance w.r.t. the infinite dimensional nuisance parameter itself, they result in a sub-optimal bias/variance trade-off for the resulting real-valued estimator of the estimand. We demonstrate that additional targeting of the estimators of these nuisance parameters guarantees that this bias for the estimand is second order and thereby allows us to prove theorems that establish asymptotic linearity of the estimator of the treatment specific mean under regularity conditions. These insights result in novel targeted minimum loss-based estimators (TMLEs) that use ensemble learning with additional targeted bias reduction to construct estimators of the nuisance parameters. In particular, we construct collaborative TMLEs (C-TMLEs) with known influence curve allowing for statistical inference, even though these C-TMLEs involve variable selection for the propensity score based on a criterion that measures how effective the resulting fit of the propensity score is in removing bias for the estimand. As a particular special case, we also demonstrate the required targeting of the propensity score for the inverse probability of treatment weighted estimator using super-learning to fit the propensity score.
Statistics at the Chinese Universities.
1981-09-01
education in China in the postwar years is pro- vided to give some perspective. My observa- tions on statistics at the Chinese universities are necessarily...has been accepted as a member society of ISI. 3. Education in China Understanding of statistics in universities in China will be enhanced through some...programaming), Statistical Mathematics (infer- ence, data analysis, industrial statistics , information theory), tiathematical Physics (dif- ferential
The role of familiarity in binary choice inferences.
Honda, Hidehito; Abe, Keiga; Matsuka, Toshihiko; Yamagishi, Kimihiko
2011-07-01
In research on the recognition heuristic (Goldstein & Gigerenzer, Psychological Review, 109, 75-90, 2002), knowledge of recognized objects has been categorized as "recognized" or "unrecognized" without regard to the degree of familiarity of the recognized object. In the present article, we propose a new inference model--familiarity-based inference. We hypothesize that when subjective knowledge levels (familiarity) of recognized objects differ, the degree of familiarity of recognized objects will influence inferences. Specifically, people are predicted to infer that the more familiar object in a pair of two objects has a higher criterion value on the to-be-judged dimension. In two experiments, using a binary choice task, we examined inferences about populations in a pair of two cities. Results support predictions of familiarity-based inference. Participants inferred that the more familiar city in a pair was more populous. Statistical modeling showed that individual differences in familiarity-based inference lie in the sensitivity to differences in familiarity. In addition, we found that familiarity-based inference can be generally regarded as an ecologically rational inference. Furthermore, when cue knowledge about the inference criterion was available, participants made inferences based on the cue knowledge about population instead of familiarity. Implications of the role of familiarity in psychological processes are discussed.
Unconscious relational inference recruits the hippocampus.
Reber, Thomas P; Luechinger, Roger; Boesiger, Peter; Henke, Katharina
2012-05-02
Relational inference denotes the capacity to encode, flexibly retrieve, and integrate multiple memories to combine past experiences to update knowledge and improve decision-making in new situations. Although relational inference is thought to depend on the hippocampus and consciousness, we now show in young, healthy men that it may occur outside consciousness but still recruits the hippocampus. In temporally distinct and unique subliminal episodes, we presented word pairs that either overlapped ("winter-red", "red-computer") or not. Effects of unconscious relational inference emerged in reaction times recorded during unconscious encoding and in the outcome of decisions made 1 min later at test, when participants judged the semantic relatedness of two supraliminal words. These words were either episodically related through a common word ("winter-computer" related through "red") or unrelated. Hippocampal activity increased during the unconscious encoding of overlapping versus nonoverlapping word pairs and during the unconscious retrieval of episodically related versus unrelated words. Furthermore, hippocampal activity during unconscious encoding predicted the outcome of decisions made at test. Hence, unconscious inference may influence decision-making in new situations.
Win-Stay, Lose-Sample: a simple sequential algorithm for approximating Bayesian inference.
Bonawitz, Elizabeth; Denison, Stephanie; Gopnik, Alison; Griffiths, Thomas L
2014-11-01
People can behave in a way that is consistent with Bayesian models of cognition, despite the fact that performing exact Bayesian inference is computationally challenging. What algorithms could people be using to make this possible? We show that a simple sequential algorithm "Win-Stay, Lose-Sample", inspired by the Win-Stay, Lose-Shift (WSLS) principle, can be used to approximate Bayesian inference. We investigate the behavior of adults and preschoolers on two causal learning tasks to test whether people might use a similar algorithm. These studies use a "mini-microgenetic method", investigating how people sequentially update their beliefs as they encounter new evidence. Experiment 1 investigates a deterministic causal learning scenario and Experiments 2 and 3 examine how people make inferences in a stochastic scenario. The behavior of adults and preschoolers in these experiments is consistent with our Bayesian version of the WSLS principle. This algorithm provides both a practical method for performing Bayesian inference and a new way to understand people's judgments. Copyright © 2014 Elsevier Inc. All rights reserved.
Inferences of Recent and Ancient Human Population History Using Genetic and Non-Genetic Data
ERIC Educational Resources Information Center
Kitchen, Andrew
2008-01-01
I have adopted complementary approaches to inferring human demographic history utilizing human and non-human genetic data as well as cultural data. These complementary approaches form an interdisciplinary perspective that allows one to make inferences of human history at varying timescales, from the events that occurred tens of thousands of years…
ERIC Educational Resources Information Center
Drummond, Dian
2016-01-01
The purpose of this qualitative phenomenological study was to ascertain the perceptions of second- through fourth-grade teachers' knowledge of inference, strategies used by teachers for developing inference, and the instructional tools that teachers use to teach their students to make inferences. The participants in this study were eight…
Theory of Mind: An Overview and Behavioral Perspective
ERIC Educational Resources Information Center
Schlinger, Henry D., Jr.
2009-01-01
Theory of mind (ToM) refers to the ability of an individual to make inferences about what others may be thinking or feeling and to predict what they may do in a given situation based on those inferences. Discussions of ToM focus almost exclusively on inferred cognitive structures and processes and shed little light on the actual behaviors…
Optimal Contrast: Competition between Two Referents Improves Word Learning
ERIC Educational Resources Information Center
Zosh, Jennifer M.; Brinster, Meredith; Halberda, Justin
2013-01-01
Does making an inference lead to better learning than being instructed directly? Two experiments evaluated preschoolers' ability to learn new words, comparing their memory for words learned via inference or instruction. On Inference trials, one familiar and one novel object was presented and children were asked to "Point at the [object name (i.e.,…
Why environmental scientists are becoming Bayesians
James S. Clark
2005-01-01
Advances in computational statistics provide a general framework for the high dimensional models typically needed for ecological inference and prediction. Hierarchical Bayes (HB) represents a modelling structure with capacity to exploit diverse sources of information, to accommodate influences that are unknown (or unknowable), and to draw inference on large numbers of...
ERIC Educational Resources Information Center
Meiser, Thorsten; Rummel, Jan; Fleig, Hanna
2018-01-01
Pseudocontingencies are inferences about correlations in the environment that are formed on the basis of statistical regularities like skewed base rates or varying base rates across environmental contexts. Previous research has demonstrated that pseudocontingencies provide a pervasive mechanism of inductive inference in numerous social judgment…
Cross-Situational Learning of Minimal Word Pairs
ERIC Educational Resources Information Center
Escudero, Paola; Mulak, Karen E.; Vlach, Haley A.
2016-01-01
"Cross-situational statistical learning" of words involves tracking co-occurrences of auditory words and objects across time to infer word-referent mappings. Previous research has demonstrated that learners can infer referents across sets of very phonologically distinct words (e.g., WUG, DAX), but it remains unknown whether learners can…
Dexter, Franklin; O'Neill, Liam; Xin, Lei; Ledolter, Johannes
2008-12-01
We use resampling of data to explore the basic statistical properties of super-efficient data envelopment analysis (DEA) when used as a benchmarking tool by the manager of a single decision-making unit. Our focus is the gaps in the outputs (i.e., slacks adjusted for upward bias), as they reveal which outputs can be increased. The numerical experiments show that the estimates of the gaps fail to exhibit asymptotic consistency, a property expected for standard statistical inference. Specifically, increased sample sizes were not always associated with more accurate forecasts of the output gaps. The baseline DEA's gaps equaled the mode of the jackknife and the mode of resampling with/without replacement from any subset of the population; usually, the baseline DEA's gaps also equaled the median. The quartile deviations of gaps were close to zero when few decision-making units were excluded from the sample and the study unit happened to have few other units contributing to its benchmark. The results for the quartile deviations can be explained in terms of the effective combinations of decision-making units that contribute to the DEA solution. The jackknife can provide all the combinations contributing to the quartile deviation and only needs to be performed for those units that are part of the benchmark set. These results show that there is a strong rationale for examining DEA results with a sensitivity analysis that excludes one benchmark hospital at a time. This analysis enhances the quality of decision support using DEA estimates for the potential ofa decision-making unit to grow one or more of its outputs.
Caley, Peter; Ramsey, David S L; Barry, Simon C
2015-01-01
A recent study has inferred that the red fox (Vulpes vulpes) is now widespread in Tasmania as of 2010, based on the extraction of fox DNA from predator scats. Heuristically, this inference appears at first glance to be at odds with the lack of recent confirmed discoveries of either road-killed foxes--the last of which occurred in 2006, or hunter killed foxes--the most recent in 2001. This paper demonstrates a method to codify this heuristic analysis and produce inferences consistent with assumptions and data. It does this by formalising the analysis in a transparent and repeatable manner to make inference on the past, present and future distribution of an invasive species. It utilizes Approximate Bayesian Computation to make inferences. Importantly, the method is able to inform management of invasive species within realistic time frames, and can be applied widely. We illustrate the technique using the Tasmanian fox data. Based on the pattern of carcass discoveries of foxes in Tasmania, we infer that the population of foxes in Tasmania is most likely extinct, or restricted in distribution and demographically weak as of 2013. It is possible, though unlikely, that that population is widespread and/or demographically robust. This inference is largely at odds with the inference from the predator scat survey data. Our results suggest the chances of successfully eradicating the introduced red fox population in Tasmania may be significantly higher than previously thought.
Caley, Peter; Ramsey, David S. L.; Barry, Simon C.
2015-01-01
A recent study has inferred that the red fox (Vulpes vulpes) is now widespread in Tasmania as of 2010, based on the extraction of fox DNA from predator scats. Heuristically, this inference appears at first glance to be at odds with the lack of recent confirmed discoveries of either road-killed foxes—the last of which occurred in 2006, or hunter killed foxes—the most recent in 2001. This paper demonstrates a method to codify this heuristic analysis and produce inferences consistent with assumptions and data. It does this by formalising the analysis in a transparent and repeatable manner to make inference on the past, present and future distribution of an invasive species. It utilizes Approximate Bayesian Computation to make inferences. Importantly, the method is able to inform management of invasive species within realistic time frames, and can be applied widely. We illustrate the technique using the Tasmanian fox data. Based on the pattern of carcass discoveries of foxes in Tasmania, we infer that the population of foxes in Tasmania is most likely extinct, or restricted in distribution and demographically weak as of 2013. It is possible, though unlikely, that that population is widespread and/or demographically robust. This inference is largely at odds with the inference from the predator scat survey data. Our results suggest the chances of successfully eradicating the introduced red fox population in Tasmania may be significantly higher than previously thought. PMID:25602618
Csilléry, Katalin; Kunstler, Georges; Courbaud, Benoît; Allard, Denis; Lassègues, Pierre; Haslinger, Klaus; Gardiner, Barry
2017-12-01
Damage due to wind-storms and droughts is increasing in many temperate forests, yet little is known about the long-term roles of these key climatic factors in forest dynamics and in the carbon budget. The objective of this study was to estimate individual and coupled effects of droughts and wind-storms on adult tree mortality across a 31-year period in 115 managed, mixed coniferous forest stands from the Western Alps and the Jura mountains. For each stand, yearly mortality was inferred from management records, yearly drought from interpolated fields of monthly temperature, precipitation and soil water holding capacity, and wind-storms from interpolated fields of daily maximum wind speed. We performed a thorough model selection based on a leave-one-out cross-validation of the time series. We compared different critical wind speeds (CWSs) for damage, wind-storm, and stand variables and statistical models. We found that a model including stand characteristics, drought, and storm strength using a CWS of 25 ms -1 performed the best across most stands. Using this best model, we found that drought increased damage risk only in the most southerly forests, and its effect is generally maintained for up to 2 years. Storm strength increased damage risk in all forests in a relatively uniform way. In some stands, we found positive interaction between drought and storm strength most likely because drought weakens trees, and they became more prone to stem breakage under wind-loading. In other stands, we found negative interaction between drought and storm strength, where excessive rain likely leads to soil water saturation making trees more susceptible to overturning in a wind-storm. Our results stress that temporal data are essential to make valid inferences about ecological impacts of disturbance events, and that making inferences about disturbance agents separately can be of limited validity. Under projected future climatic conditions, the direction and strength of these ecological interactions could also change. © 2017 John Wiley & Sons Ltd.
Statistical analysis of fNIRS data: a comprehensive review.
Tak, Sungho; Ye, Jong Chul
2014-01-15
Functional near-infrared spectroscopy (fNIRS) is a non-invasive method to measure brain activities using the changes of optical absorption in the brain through the intact skull. fNIRS has many advantages over other neuroimaging modalities such as positron emission tomography (PET), functional magnetic resonance imaging (fMRI), or magnetoencephalography (MEG), since it can directly measure blood oxygenation level changes related to neural activation with high temporal resolution. However, fNIRS signals are highly corrupted by measurement noises and physiology-based systemic interference. Careful statistical analyses are therefore required to extract neuronal activity-related signals from fNIRS data. In this paper, we provide an extensive review of historical developments of statistical analyses of fNIRS signal, which include motion artifact correction, short source-detector separation correction, principal component analysis (PCA)/independent component analysis (ICA), false discovery rate (FDR), serially-correlated errors, as well as inference techniques such as the standard t-test, F-test, analysis of variance (ANOVA), and statistical parameter mapping (SPM) framework. In addition, to provide a unified view of various existing inference techniques, we explain a linear mixed effect model with restricted maximum likelihood (ReML) variance estimation, and show that most of the existing inference methods for fNIRS analysis can be derived as special cases. Some of the open issues in statistical analysis are also described. Copyright © 2013 Elsevier Inc. All rights reserved.
Accounting for measurement error: a critical but often overlooked process.
Harris, Edward F; Smith, Richard N
2009-12-01
Due to instrument imprecision and human inconsistencies, measurements are not free of error. Technical error of measurement (TEM) is the variability encountered between dimensions when the same specimens are measured at multiple sessions. A goal of a data collection regimen is to minimise TEM. The few studies that actually quantify TEM, regardless of discipline, report that it is substantial and can affect results and inferences. This paper reviews some statistical approaches for identifying and controlling TEM. Statistically, TEM is part of the residual ('unexplained') variance in a statistical test, so accounting for TEM, which requires repeated measurements, enhances the chances of finding a statistically significant difference if one exists. The aim of this paper was to review and discuss common statistical designs relating to types of error and statistical approaches to error accountability. This paper addresses issues of landmark location, validity, technical and systematic error, analysis of variance, scaled measures and correlation coefficients in order to guide the reader towards correct identification of true experimental differences. Researchers commonly infer characteristics about populations from comparatively restricted study samples. Most inferences are statistical and, aside from concerns about adequate accounting for known sources of variation with the research design, an important source of variability is measurement error. Variability in locating landmarks that define variables is obvious in odontometrics, cephalometrics and anthropometry, but the same concerns about measurement accuracy and precision extend to all disciplines. With increasing accessibility to computer-assisted methods of data collection, the ease of incorporating repeated measures into statistical designs has improved. Accounting for this technical source of variation increases the chance of finding biologically true differences when they exist.
Sumner, Jeremy G; Taylor, Amelia; Holland, Barbara R; Jarvis, Peter D
2017-12-01
Recently there has been renewed interest in phylogenetic inference methods based on phylogenetic invariants, alongside the related Markov invariants. Broadly speaking, both these approaches give rise to polynomial functions of sequence site patterns that, in expectation value, either vanish for particular evolutionary trees (in the case of phylogenetic invariants) or have well understood transformation properties (in the case of Markov invariants). While both approaches have been valued for their intrinsic mathematical interest, it is not clear how they relate to each other, and to what extent they can be used as practical tools for inference of phylogenetic trees. In this paper, by focusing on the special case of binary sequence data and quartets of taxa, we are able to view these two different polynomial-based approaches within a common framework. To motivate the discussion, we present three desirable statistical properties that we argue any invariant-based phylogenetic method should satisfy: (1) sensible behaviour under reordering of input sequences; (2) stability as the taxa evolve independently according to a Markov process; and (3) explicit dependence on the assumption of a continuous-time process. Motivated by these statistical properties, we develop and explore several new phylogenetic inference methods. In particular, we develop a statistically bias-corrected version of the Markov invariants approach which satisfies all three properties. We also extend previous work by showing that the phylogenetic invariants can be implemented in such a way as to satisfy property (3). A simulation study shows that, in comparison to other methods, our new proposed approach based on bias-corrected Markov invariants is extremely powerful for phylogenetic inference. The binary case is of particular theoretical interest as-in this case only-the Markov invariants can be expressed as linear combinations of the phylogenetic invariants. A wider implication of this is that, for models with more than two states-for example DNA sequence alignments with four-state models-we find that methods which rely on phylogenetic invariants are incapable of satisfying all three of the stated statistical properties. This is because in these cases the relevant Markov invariants belong to a class of polynomials independent from the phylogenetic invariants.
Imputation approaches for animal movement modeling
Scharf, Henry; Hooten, Mevin B.; Johnson, Devin S.
2017-01-01
The analysis of telemetry data is common in animal ecological studies. While the collection of telemetry data for individual animals has improved dramatically, the methods to properly account for inherent uncertainties (e.g., measurement error, dependence, barriers to movement) have lagged behind. Still, many new statistical approaches have been developed to infer unknown quantities affecting animal movement or predict movement based on telemetry data. Hierarchical statistical models are useful to account for some of the aforementioned uncertainties, as well as provide population-level inference, but they often come with an increased computational burden. For certain types of statistical models, it is straightforward to provide inference if the latent true animal trajectory is known, but challenging otherwise. In these cases, approaches related to multiple imputation have been employed to account for the uncertainty associated with our knowledge of the latent trajectory. Despite the increasing use of imputation approaches for modeling animal movement, the general sensitivity and accuracy of these methods have not been explored in detail. We provide an introduction to animal movement modeling and describe how imputation approaches may be helpful for certain types of models. We also assess the performance of imputation approaches in two simulation studies. Our simulation studies suggests that inference for model parameters directly related to the location of an individual may be more accurate than inference for parameters associated with higher-order processes such as velocity or acceleration. Finally, we apply these methods to analyze a telemetry data set involving northern fur seals (Callorhinus ursinus) in the Bering Sea. Supplementary materials accompanying this paper appear online.
Using a Five-Step Procedure for Inferential Statistical Analyses
ERIC Educational Resources Information Center
Kamin, Lawrence F.
2010-01-01
Many statistics texts pose inferential statistical problems in a disjointed way. By using a simple five-step procedure as a template for statistical inference problems, the student can solve problems in an organized fashion. The problem and its solution will thus be a stand-by-itself organic whole and a single unit of thought and effort. The…
Bayesian outcome-based strategy classification.
Lee, Michael D
2016-03-01
Hilbig and Moshagen (Psychonomic Bulletin & Review, 21, 1431-1443, 2014) recently developed a method for making inferences about the decision processes people use in multi-attribute forced choice tasks. Their paper makes a number of worthwhile theoretical and methodological contributions. Theoretically, they provide an insightful psychological motivation for a probabilistic extension of the widely-used "weighted additive" (WADD) model, and show how this model, as well as other important models like "take-the-best" (TTB), can and should be expressed in terms of meaningful priors. Methodologically, they develop an inference approach based on the Minimum Description Length (MDL) principles that balances both the goodness-of-fit and complexity of the decision models they consider. This paper aims to preserve these useful contributions, but provide a complementary Bayesian approach with some theoretical and methodological advantages. We develop a simple graphical model, implemented in JAGS, that allows for fully Bayesian inferences about which models people use to make decisions. To demonstrate the Bayesian approach, we apply it to the models and data considered by Hilbig and Moshagen (Psychonomic Bulletin & Review, 21, 1431-1443, 2014), showing how a prior predictive analysis of the models, and posterior inferences about which models people use and the parameter settings at which they use them, can contribute to our understanding of human decision making.
Design-based and model-based inference in surveys of freshwater mollusks
Dorazio, R.M.
1999-01-01
Well-known concepts in statistical inference and sampling theory are used to develop recommendations for planning and analyzing the results of quantitative surveys of freshwater mollusks. Two methods of inference commonly used in survey sampling (design-based and model-based) are described and illustrated using examples relevant in surveys of freshwater mollusks. The particular objectives of a survey and the type of information observed in each unit of sampling can be used to help select the sampling design and the method of inference. For example, the mean density of a sparsely distributed population of mollusks can be estimated with higher precision by using model-based inference or by using design-based inference with adaptive cluster sampling than by using design-based inference with conventional sampling. More experience with quantitative surveys of natural assemblages of freshwater mollusks is needed to determine the actual benefits of different sampling designs and inferential procedures.
Network meta-analysis: application and practice using Stata
2017-01-01
This review aimed to arrange the concepts of a network meta-analysis (NMA) and to demonstrate the analytical process of NMA using Stata software under frequentist framework. The NMA tries to synthesize evidences for a decision making by evaluating the comparative effectiveness of more than two alternative interventions for the same condition. Before conducting a NMA, 3 major assumptions—similarity, transitivity, and consistency—should be checked. The statistical analysis consists of 5 steps. The first step is to draw a network geometry to provide an overview of the network relationship. The second step checks the assumption of consistency. The third step is to make the network forest plot or interval plot in order to illustrate the summary size of comparative effectiveness among various interventions. The fourth step calculates cumulative rankings for identifying superiority among interventions. The last step evaluates publication bias or effect modifiers for a valid inference from results. The synthesized evidences through five steps would be very useful to evidence-based decision-making in healthcare. Thus, NMA should be activated in order to guarantee the quality of healthcare system. PMID:29092392
Network meta-analysis: application and practice using Stata.
Shim, Sungryul; Yoon, Byung-Ho; Shin, In-Soo; Bae, Jong-Myon
2017-01-01
This review aimed to arrange the concepts of a network meta-analysis (NMA) and to demonstrate the analytical process of NMA using Stata software under frequentist framework. The NMA tries to synthesize evidences for a decision making by evaluating the comparative effectiveness of more than two alternative interventions for the same condition. Before conducting a NMA, 3 major assumptions-similarity, transitivity, and consistency-should be checked. The statistical analysis consists of 5 steps. The first step is to draw a network geometry to provide an overview of the network relationship. The second step checks the assumption of consistency. The third step is to make the network forest plot or interval plot in order to illustrate the summary size of comparative effectiveness among various interventions. The fourth step calculates cumulative rankings for identifying superiority among interventions. The last step evaluates publication bias or effect modifiers for a valid inference from results. The synthesized evidences through five steps would be very useful to evidence-based decision-making in healthcare. Thus, NMA should be activated in order to guarantee the quality of healthcare system.
Confidence crisis of results in biomechanics research.
Knudson, Duane
2017-11-01
Many biomechanics studies have small sample sizes and incorrect statistical analyses, so reporting of inaccurate inferences and inflated magnitude of effects are common in the field. This review examines these issues in biomechanics research and summarises potential solutions from research in other fields to increase the confidence in the experimental effects reported in biomechanics. Authors, reviewers and editors of biomechanics research reports are encouraged to improve sample sizes and the resulting statistical power, improve reporting transparency, improve the rigour of statistical analyses used, and increase the acceptance of replication studies to improve the validity of inferences from data in biomechanics research. The application of sports biomechanics research results would also improve if a larger percentage of unbiased effects and their uncertainty were reported in the literature.
The Empirical Nature and Statistical Treatment of Missing Data
ERIC Educational Resources Information Center
Tannenbaum, Christyn E.
2009-01-01
Introduction. Missing data is a common problem in research and can produce severely misleading analyses, including biased estimates of statistical parameters, and erroneous conclusions. In its 1999 report, the APA Task Force on Statistical Inference encouraged authors to report complications such as missing data and discouraged the use of…
Cognitive Transfer Outcomes for a Simulation-Based Introductory Statistics Curriculum
ERIC Educational Resources Information Center
Backman, Matthew D.; Delmas, Robert C.; Garfield, Joan
2017-01-01
Cognitive transfer is the ability to apply learned skills and knowledge to new applications and contexts. This investigation evaluates cognitive transfer outcomes for a tertiary-level introductory statistics course using the CATALST curriculum, which exclusively used simulation-based methods to develop foundations of statistical inference. A…
The Role of the Sampling Distribution in Understanding Statistical Inference
ERIC Educational Resources Information Center
Lipson, Kay
2003-01-01
Many statistics educators believe that few students develop the level of conceptual understanding essential for them to apply correctly the statistical techniques at their disposal and to interpret their outcomes appropriately. It is also commonly believed that the sampling distribution plays an important role in developing this understanding.…
Gaussian copula as a likelihood function for environmental models
NASA Astrophysics Data System (ADS)
Wani, O.; Espadas, G.; Cecinati, F.; Rieckermann, J.
2017-12-01
Parameter estimation of environmental models always comes with uncertainty. To formally quantify this parametric uncertainty, a likelihood function needs to be formulated, which is defined as the probability of observations given fixed values of the parameter set. A likelihood function allows us to infer parameter values from observations using Bayes' theorem. The challenge is to formulate a likelihood function that reliably describes the error generating processes which lead to the observed monitoring data, such as rainfall and runoff. If the likelihood function is not representative of the error statistics, the parameter inference will give biased parameter values. Several uncertainty estimation methods that are currently being used employ Gaussian processes as a likelihood function, because of their favourable analytical properties. Box-Cox transformation is suggested to deal with non-symmetric and heteroscedastic errors e.g. for flow data which are typically more uncertain in high flows than in periods with low flows. Problem with transformations is that the results are conditional on hyper-parameters, for which it is difficult to formulate the analyst's belief a priori. In an attempt to address this problem, in this research work we suggest learning the nature of the error distribution from the errors made by the model in the "past" forecasts. We use a Gaussian copula to generate semiparametric error distributions . 1) We show that this copula can be then used as a likelihood function to infer parameters, breaking away from the practice of using multivariate normal distributions. Based on the results from a didactical example of predicting rainfall runoff, 2) we demonstrate that the copula captures the predictive uncertainty of the model. 3) Finally, we find that the properties of autocorrelation and heteroscedasticity of errors are captured well by the copula, eliminating the need to use transforms. In summary, our findings suggest that copulas are an interesting departure from the usage of fully parametric distributions as likelihood functions - and they could help us to better capture the statistical properties of errors and make more reliable predictions.
A statistical method for lung tumor segmentation uncertainty in PET images based on user inference.
Zheng, Chaojie; Wang, Xiuying; Feng, Dagan
2015-01-01
PET has been widely accepted as an effective imaging modality for lung tumor diagnosis and treatment. However, standard criteria for delineating tumor boundary from PET are yet to develop largely due to relatively low quality of PET images, uncertain tumor boundary definition, and variety of tumor characteristics. In this paper, we propose a statistical solution to segmentation uncertainty on the basis of user inference. We firstly define the uncertainty segmentation band on the basis of segmentation probability map constructed from Random Walks (RW) algorithm; and then based on the extracted features of the user inference, we use Principle Component Analysis (PCA) to formulate the statistical model for labeling the uncertainty band. We validated our method on 10 lung PET-CT phantom studies from the public RIDER collections [1] and 16 clinical PET studies where tumors were manually delineated by two experienced radiologists. The methods were validated using Dice similarity coefficient (DSC) to measure the spatial volume overlap. Our method achieved an average DSC of 0.878 ± 0.078 on phantom studies and 0.835 ± 0.039 on clinical studies.
Empirical evidence for acceleration-dependent amplification factors
Borcherdt, R.D.
2002-01-01
Site-specific amplification factors, Fa and Fv, used in current U.S. building codes decrease with increasing base acceleration level as implied by the Loma Prieta earthquake at 0.1g and extrapolated using numerical models and laboratory results. The Northridge earthquake recordings of 17 January 1994 and subsequent geotechnical data permit empirical estimates of amplification at base acceleration levels up to 0.5g. Distance measures and normalization procedures used to infer amplification ratios from soil-rock pairs in predetermined azimuth-distance bins significantly influence the dependence of amplification estimates on base acceleration. Factors inferred using a hypocentral distance norm do not show a statistically significant dependence on base acceleration. Factors inferred using norms implied by the attenuation functions of Abrahamson and Silva show a statistically significant decrease with increasing base acceleration. The decrease is statistically more significant for stiff clay and sandy soil (site class D) sites than for stiffer sites underlain by gravely soils and soft rock (site class C). The decrease in amplification with increasing base acceleration is more pronounced for the short-period amplification factor, Fa, than for the midperiod factor, Fv.
Karl Pearson and eugenics: personal opinions and scientific rigor.
Delzell, Darcie A P; Poliak, Cathy D
2013-09-01
The influence of personal opinions and biases on scientific conclusions is a threat to the advancement of knowledge. Expertise and experience does not render one immune to this temptation. In this work, one of the founding fathers of statistics, Karl Pearson, is used as an illustration of how even the most talented among us can produce misleading results when inferences are made without caution or reference to potential bias and other analysis limitations. A study performed by Pearson on British Jewish schoolchildren is examined in light of ethical and professional statistical practice. The methodology used and inferences made by Pearson and his coauthor are sometimes questionable and offer insight into how Pearson's support of eugenics and his own British nationalism could have potentially influenced his often careless and far-fetched inferences. A short background into Pearson's work and beliefs is provided, along with an in-depth examination of the authors' overall experimental design and statistical practices. In addition, portions of the study regarding intelligence and tuberculosis are discussed in more detail, along with historical reactions to their work.
ERIC Educational Resources Information Center
Gustafsson, Jan-Eric
2013-01-01
In educational effectiveness research, it frequently has proven difficult to make credible inferences about cause and effect relations. The article first identifies the main categories of threats to valid causal inference from observational data, and discusses designs and analytic approaches which protect against them. With the use of data from 22…
Aging and Predicting Inferences: A Diffusion Model Analysis
ERIC Educational Resources Information Center
McKoon, Gail; Ratcliff, Roger
2013-01-01
In the domain of discourse processing, it has been claimed that older adults (60-0-year-olds) are less likely to encode and remember some kinds of information from texts than young adults. The experiment described here shows that they do make a particular kind of inference to the same extent that college-age adults do. The inferences examined were…
First Possession, History, and Young Children's Ownership Judgments
ERIC Educational Resources Information Center
Friedman, Ori; Vondervoort, Julia W.; Defeyter, Margaret A.; Neary, Karen R.
2013-01-01
It is impossible to perceive who owns an object; this must be inferred. One way that children make such inferences is through a first possession bias--when two agents each use an object, children judge the object belongs to the one who used it first. Two experiments show that this bias does not result from children directly inferring ownership…
Adults with Poor Reading Skills and the Inferences They Make during Reading
ERIC Educational Resources Information Center
McKoon, Gail; Ratcliff, Roger
2017-01-01
Millions of U.S. adults lack the literacy skills needed for most living-wage jobs. We investigated one particular comprehension process for these adults: generating predictive inferences. If a sentence says that someone falls from a 14th-story roof, a reader should infer almost certain death. On any test of comprehension, there are two dependent…
Acoustic emissions diagnosis of rotor-stator rubs using the KS statistic
NASA Astrophysics Data System (ADS)
Hall, L. D.; Mba, D.
2004-07-01
Acoustic emission (AE) measurement at the bearings of rotating machinery has become a useful tool for diagnosing incipient fault conditions. In particular, AE can be used to detect unwanted intermittent or partial rubbing between a rotating central shaft and surrounding stationary components. This is a particular problem encountered in turbines used for power generation. For successful fault diagnosis, it is important to adopt AE signal analysis techniques capable of distinguishing between various types of rub mechanisms. It is also useful to develop techniques for inferring information such as the severity of rubbing or the type of seal material making contact on the shaft. It is proposed that modelling the cumulative distribution function of rub-induced AE signals with respect to appropriate theoretical distributions, and quantifying the goodness of fit with the Kolmogorov-Smirnov (KS) statistic, offers a suitable signal feature for diagnosis. This paper demonstrates the successful use of the KS feature for discriminating different classes of shaft-seal rubbing.
An Overview of data science uses in bioimage informatics.
Chessel, Anatole
2017-02-15
This review aims at providing a practical overview of the use of statistical features and associated data science methods in bioimage informatics. To achieve a quantitative link between images and biological concepts, one typically replaces an object coming from an image (a segmented cell or intracellular object, a pattern of expression or localisation, even a whole image) by a vector of numbers. They range from carefully crafted biologically relevant measurements to features learnt through deep neural networks. This replacement allows for the use of practical algorithms for visualisation, comparison and inference, such as the ones from machine learning or multivariate statistics. While originating mainly, for biology, in high content screening, those methods are integral to the use of data science for the quantitative analysis of microscopy images to gain biological insight, and they are sure to gather more interest as the need to make sense of the increasing amount of acquired imaging data grows more pressing. Copyright © 2017 Elsevier Inc. All rights reserved.
Density-dependence at sea for coho salmon (Oncorhynchus kisutch)
Emlen, J.M.; Reisenbichler, R.R.; McGie, A.M.; Nickelson, T.E.
1990-01-01
The success of expanded salmon hatchery programs will depend strongly on the degree of density-induced diminishing returns per smolt released. Several authors have addressed the question of density-dependent mortality at sea in coho salmon (Oncorhynchus kisutch), but have come to conflicting conclusions. We believe there are compelling reasons to reinvestigate the data, and have done so for public hatchery fish, using a variety of approaches. The results provide evidence that survival of these public hatchery fish is negatively affected, directly by the number of public hatchery smolts and indirectly by the number of private hatchery smolts. These results are weak, statistically, and should be considered primarily as a caution to those who, on the basis of other published work, believe that density-dependence does not exist. The results reported here also re-emphasize the often overlooked point that inferences drawn from data are strongly biased by investigators' views of how the systems of interest work and by the statistical assumptions they make preparatory to the analysis of those data.
Why Bayesian Psychologists Should Change the Way They Use the Bayes Factor.
Hoijtink, Herbert; van Kooten, Pascal; Hulsker, Koenraad
2016-01-01
The discussion following Bem's ( 2011 ) psi research highlights that applications of the Bayes factor in psychological research are not without problems. The first problem is the omission to translate subjective prior knowledge into subjective prior distributions. In the words of Savage ( 1961 ): "they make the Bayesian omelet without breaking the Bayesian egg." The second problem occurs if the Bayesian egg is not broken: the omission to choose default prior distributions such that the ensuing inferences are well calibrated. The third problem is the adherence to inadequate rules for the interpretation of the size of the Bayes factor. The current paper will elaborate these problems and show how to avoid them using the basic hypotheses and statistical model used in the first experiment described in Bem ( 2011 ). It will be argued that a thorough investigation of these problems in the context of more encompassing hypotheses and statistical models is called for if Bayesian psychologists want to add a well-founded Bayes factor to the tool kit of psychological researchers.
Bayes factors and multimodel inference
Link, W.A.; Barker, R.J.; Thomson, David L.; Cooch, Evan G.; Conroy, Michael J.
2009-01-01
Multimodel inference has two main themes: model selection, and model averaging. Model averaging is a means of making inference conditional on a model set, rather than on a selected model, allowing formal recognition of the uncertainty associated with model choice. The Bayesian paradigm provides a natural framework for model averaging, and provides a context for evaluation of the commonly used AIC weights. We review Bayesian multimodel inference, noting the importance of Bayes factors. Noting the sensitivity of Bayes factors to the choice of priors on parameters, we define and propose nonpreferential priors as offering a reasonable standard for objective multimodel inference.
In silico model-based inference: a contemporary approach for hypothesis testing in network biology
Klinke, David J.
2014-01-01
Inductive inference plays a central role in the study of biological systems where one aims to increase their understanding of the system by reasoning backwards from uncertain observations to identify causal relationships among components of the system. These causal relationships are postulated from prior knowledge as a hypothesis or simply a model. Experiments are designed to test the model. Inferential statistics are used to establish a level of confidence in how well our postulated model explains the acquired data. This iterative process, commonly referred to as the scientific method, either improves our confidence in a model or suggests that we revisit our prior knowledge to develop a new model. Advances in technology impact how we use prior knowledge and data to formulate models of biological networks and how we observe cellular behavior. However, the approach for model-based inference has remained largely unchanged since Fisher, Neyman and Pearson developed the ideas in the early 1900’s that gave rise to what is now known as classical statistical hypothesis (model) testing. Here, I will summarize conventional methods for model-based inference and suggest a contemporary approach to aid in our quest to discover how cells dynamically interpret and transmit information for therapeutic aims that integrates ideas drawn from high performance computing, Bayesian statistics, and chemical kinetics. PMID:25139179
In silico model-based inference: a contemporary approach for hypothesis testing in network biology.
Klinke, David J
2014-01-01
Inductive inference plays a central role in the study of biological systems where one aims to increase their understanding of the system by reasoning backwards from uncertain observations to identify causal relationships among components of the system. These causal relationships are postulated from prior knowledge as a hypothesis or simply a model. Experiments are designed to test the model. Inferential statistics are used to establish a level of confidence in how well our postulated model explains the acquired data. This iterative process, commonly referred to as the scientific method, either improves our confidence in a model or suggests that we revisit our prior knowledge to develop a new model. Advances in technology impact how we use prior knowledge and data to formulate models of biological networks and how we observe cellular behavior. However, the approach for model-based inference has remained largely unchanged since Fisher, Neyman and Pearson developed the ideas in the early 1900s that gave rise to what is now known as classical statistical hypothesis (model) testing. Here, I will summarize conventional methods for model-based inference and suggest a contemporary approach to aid in our quest to discover how cells dynamically interpret and transmit information for therapeutic aims that integrates ideas drawn from high performance computing, Bayesian statistics, and chemical kinetics. © 2014 American Institute of Chemical Engineers.
Spatio-temporal conditional inference and hypothesis tests for neural ensemble spiking precision
Harrison, Matthew T.; Amarasingham, Asohan; Truccolo, Wilson
2014-01-01
The collective dynamics of neural ensembles create complex spike patterns with many spatial and temporal scales. Understanding the statistical structure of these patterns can help resolve fundamental questions about neural computation and neural dynamics. Spatio-temporal conditional inference (STCI) is introduced here as a semiparametric statistical framework for investigating the nature of precise spiking patterns from collections of neurons that is robust to arbitrarily complex and nonstationary coarse spiking dynamics. The main idea is to focus statistical modeling and inference, not on the full distribution of the data, but rather on families of conditional distributions of precise spiking given different types of coarse spiking. The framework is then used to develop families of hypothesis tests for probing the spatio-temporal precision of spiking patterns. Relationships among different conditional distributions are used to improve multiple hypothesis testing adjustments and to design novel Monte Carlo spike resampling algorithms. Of special note are algorithms that can locally jitter spike times while still preserving the instantaneous peri-stimulus time histogram (PSTH) or the instantaneous total spike count from a group of recorded neurons. The framework can also be used to test whether first-order maximum entropy models with possibly random and time-varying parameters can account for observed patterns of spiking. STCI provides a detailed example of the generic principle of conditional inference, which may be applicable in other areas of neurostatistical analysis. PMID:25380339
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hou Fengji; Hogg, David W.; Goodman, Jonathan
Markov chain Monte Carlo (MCMC) proves to be powerful for Bayesian inference and in particular for exoplanet radial velocity fitting because MCMC provides more statistical information and makes better use of data than common approaches like chi-square fitting. However, the nonlinear density functions encountered in these problems can make MCMC time-consuming. In this paper, we apply an ensemble sampler respecting affine invariance to orbital parameter extraction from radial velocity data. This new sampler has only one free parameter, and does not require much tuning for good performance, which is important for automatization. The autocorrelation time of this sampler is approximatelymore » the same for all parameters and far smaller than Metropolis-Hastings, which means it requires many fewer function calls to produce the same number of independent samples. The affine-invariant sampler speeds up MCMC by hundreds of times compared with Metropolis-Hastings in the same computing situation. This novel sampler would be ideal for projects involving large data sets such as statistical investigations of planet distribution. The biggest obstacle to ensemble samplers is the existence of multiple local optima; we present a clustering technique to deal with local optima by clustering based on the likelihood of the walkers in the ensemble. We demonstrate the effectiveness of the sampler on real radial velocity data.« less
Schmidt, Paul; Schmid, Volker J; Gaser, Christian; Buck, Dorothea; Bührlen, Susanne; Förschler, Annette; Mühlau, Mark
2013-01-01
Aiming at iron-related T2-hypointensity, which is related to normal aging and neurodegenerative processes, we here present two practicable approaches, based on Bayesian inference, for preprocessing and statistical analysis of a complex set of structural MRI data. In particular, Markov Chain Monte Carlo methods were used to simulate posterior distributions. First, we rendered a segmentation algorithm that uses outlier detection based on model checking techniques within a Bayesian mixture model. Second, we rendered an analytical tool comprising a Bayesian regression model with smoothness priors (in the form of Gaussian Markov random fields) mitigating the necessity to smooth data prior to statistical analysis. For validation, we used simulated data and MRI data of 27 healthy controls (age: [Formula: see text]; range, [Formula: see text]). We first observed robust segmentation of both simulated T2-hypointensities and gray-matter regions known to be T2-hypointense. Second, simulated data and images of segmented T2-hypointensity were analyzed. We found not only robust identification of simulated effects but also a biologically plausible age-related increase of T2-hypointensity primarily within the dentate nucleus but also within the globus pallidus, substantia nigra, and red nucleus. Our results indicate that fully Bayesian inference can successfully be applied for preprocessing and statistical analysis of structural MRI data.
Drug target inference through pathway analysis of genomics data
Ma, Haisu; Zhao, Hongyu
2013-01-01
Statistical modeling coupled with bioinformatics is commonly used for drug discovery. Although there exist many approaches for single target based drug design and target inference, recent years have seen a paradigm shift to system-level pharmacological research. Pathway analysis of genomics data represents one promising direction for computational inference of drug targets. This article aims at providing a comprehensive review on the evolving issues is this field, covering methodological developments, their pros and cons, as well as future research directions. PMID:23369829
Watanabe, Hiroshi
2012-01-01
Procedures of statistical analysis are reviewed to provide an overview of applications of statistics for general use. Topics that are dealt with are inference on a population, comparison of two populations with respect to means and probabilities, and multiple comparisons. This study is the second part of series in which we survey medical statistics. Arguments related to statistical associations and regressions will be made in subsequent papers.
Species tree inference by minimizing deep coalescences.
Than, Cuong; Nakhleh, Luay
2009-09-01
In a 1997 seminal paper, W. Maddison proposed minimizing deep coalescences, or MDC, as an optimization criterion for inferring the species tree from a set of incongruent gene trees, assuming the incongruence is exclusively due to lineage sorting. In a subsequent paper, Maddison and Knowles provided and implemented a search heuristic for optimizing the MDC criterion, given a set of gene trees. However, the heuristic is not guaranteed to compute optimal solutions, and its hill-climbing search makes it slow in practice. In this paper, we provide two exact solutions to the problem of inferring the species tree from a set of gene trees under the MDC criterion. In other words, our solutions are guaranteed to find the tree that minimizes the total number of deep coalescences from a set of gene trees. One solution is based on a novel integer linear programming (ILP) formulation, and another is based on a simple dynamic programming (DP) approach. Powerful ILP solvers, such as CPLEX, make the first solution appealing, particularly for very large-scale instances of the problem, whereas the DP-based solution eliminates dependence on proprietary tools, and its simplicity makes it easy to integrate with other genomic events that may cause gene tree incongruence. Using the exact solutions, we analyze a data set of 106 loci from eight yeast species, a data set of 268 loci from eight Apicomplexan species, and several simulated data sets. We show that the MDC criterion provides very accurate estimates of the species tree topologies, and that our solutions are very fast, thus allowing for the accurate analysis of genome-scale data sets. Further, the efficiency of the solutions allow for quick exploration of sub-optimal solutions, which is important for a parsimony-based criterion such as MDC, as we show. We show that searching for the species tree in the compatibility graph of the clusters induced by the gene trees may be sufficient in practice, a finding that helps ameliorate the computational requirements of optimization solutions. Further, we study the statistical consistency and convergence rate of the MDC criterion, as well as its optimality in inferring the species tree. Finally, we show how our solutions can be used to identify potential horizontal gene transfer events that may have caused some of the incongruence in the data, thus augmenting Maddison's original framework. We have implemented our solutions in the PhyloNet software package, which is freely available at: http://bioinfo.cs.rice.edu/phylonet.
PROBABILITY SAMPLING AND POPULATION INFERENCE IN MONITORING PROGRAMS
A fundamental difference between probability sampling and conventional statistics is that "sampling" deals with real, tangible populations, whereas "conventional statistics" usually deals with hypothetical populations that have no real-world realization. he focus here is on real ...
Statistical Inference in the Learning of Novel Phonetic Categories
ERIC Educational Resources Information Center
Zhao, Yuan
2010-01-01
Learning a phonetic category (or any linguistic category) requires integrating different sources of information. A crucial unsolved problem for phonetic learning is how this integration occurs: how can we update our previous knowledge about a phonetic category as we hear new exemplars of the category? One model of learning is Bayesian Inference,…
Conceptual Challenges in Coordinating Theoretical and Data-Centered Estimates of Probability
ERIC Educational Resources Information Center
Konold, Cliff; Madden, Sandra; Pollatsek, Alexander; Pfannkuch, Maxine; Wild, Chris; Ziedins, Ilze; Finzer, William; Horton, Nicholas J.; Kazak, Sibel
2011-01-01
A core component of informal statistical inference is the recognition that judgments based on sample data are inherently uncertain. This implies that instruction aimed at developing informal inference needs to foster basic probabilistic reasoning. In this article, we analyze and critique the now-common practice of introducing students to both…
Campbell's and Rubin's Perspectives on Causal Inference
ERIC Educational Resources Information Center
West, Stephen G.; Thoemmes, Felix
2010-01-01
Donald Campbell's approach to causal inference (D. T. Campbell, 1957; W. R. Shadish, T. D. Cook, & D. T. Campbell, 2002) is widely used in psychology and education, whereas Donald Rubin's causal model (P. W. Holland, 1986; D. B. Rubin, 1974, 2005) is widely used in economics, statistics, medicine, and public health. Campbell's approach focuses on…
Direct Evidence for a Dual Process Model of Deductive Inference
ERIC Educational Resources Information Center
Markovits, Henry; Brunet, Marie-Laurence; Thompson, Valerie; Brisson, Janie
2013-01-01
In 2 experiments, we tested a strong version of a dual process theory of conditional inference (cf. Verschueren et al., 2005a, 2005b) that assumes that most reasoners have 2 strategies available, the choice of which is determined by situational variables, cognitive capacity, and metacognitive control. The statistical strategy evaluates inferences…
The Role of Probability in Developing Learners' Models of Simulation Approaches to Inference
ERIC Educational Resources Information Center
Lee, Hollylynne S.; Doerr, Helen M.; Tran, Dung; Lovett, Jennifer N.
2016-01-01
Repeated sampling approaches to inference that rely on simulations have recently gained prominence in statistics education, and probabilistic concepts are at the core of this approach. In this approach, learners need to develop a mapping among the problem situation, a physical enactment, computer representations, and the underlying randomization…
From Blickets to Synapses: Inferring Temporal Causal Networks by Observation
ERIC Educational Resources Information Center
Fernando, Chrisantha
2013-01-01
How do human infants learn the causal dependencies between events? Evidence suggests that this remarkable feat can be achieved by observation of only a handful of examples. Many computational models have been produced to explain how infants perform causal inference without explicit teaching about statistics or the scientific method. Here, we…
Thou Shalt Not Bear False Witness against Null Hypothesis Significance Testing
ERIC Educational Resources Information Center
García-Pérez, Miguel A.
2017-01-01
Null hypothesis significance testing (NHST) has been the subject of debate for decades and alternative approaches to data analysis have been proposed. This article addresses this debate from the perspective of scientific inquiry and inference. Inference is an inverse problem and application of statistical methods cannot reveal whether effects…
Krefeld-Schwalb, Antonia; Witte, Erich H.; Zenker, Frank
2018-01-01
In psychology as elsewhere, the main statistical inference strategy to establish empirical effects is null-hypothesis significance testing (NHST). The recent failure to replicate allegedly well-established NHST-results, however, implies that such results lack sufficient statistical power, and thus feature unacceptably high error-rates. Using data-simulation to estimate the error-rates of NHST-results, we advocate the research program strategy (RPS) as a superior methodology. RPS integrates Frequentist with Bayesian inference elements, and leads from a preliminary discovery against a (random) H0-hypothesis to a statistical H1-verification. Not only do RPS-results feature significantly lower error-rates than NHST-results, RPS also addresses key-deficits of a “pure” Frequentist and a standard Bayesian approach. In particular, RPS aggregates underpowered results safely. RPS therefore provides a tool to regain the trust the discipline had lost during the ongoing replicability-crisis. PMID:29740363
Krefeld-Schwalb, Antonia; Witte, Erich H; Zenker, Frank
2018-01-01
In psychology as elsewhere, the main statistical inference strategy to establish empirical effects is null-hypothesis significance testing (NHST). The recent failure to replicate allegedly well-established NHST-results, however, implies that such results lack sufficient statistical power, and thus feature unacceptably high error-rates. Using data-simulation to estimate the error-rates of NHST-results, we advocate the research program strategy (RPS) as a superior methodology. RPS integrates Frequentist with Bayesian inference elements, and leads from a preliminary discovery against a (random) H 0 -hypothesis to a statistical H 1 -verification. Not only do RPS-results feature significantly lower error-rates than NHST-results, RPS also addresses key-deficits of a "pure" Frequentist and a standard Bayesian approach. In particular, RPS aggregates underpowered results safely. RPS therefore provides a tool to regain the trust the discipline had lost during the ongoing replicability-crisis.
NASA Astrophysics Data System (ADS)
Knuth, K. H.
2001-05-01
We consider the application of Bayesian inference to the study of self-organized structures in complex adaptive systems. In particular, we examine the distribution of elements, agents, or processes in systems dominated by hierarchical structure. We demonstrate that results obtained by Caianiello [1] on Hierarchical Modular Systems (HMS) can be found by applying Jaynes' Principle of Group Invariance [2] to a few key assumptions about our knowledge of hierarchical organization. Subsequent application of the Principle of Maximum Entropy allows inferences to be made about specific systems. The utility of the Bayesian method is considered by examining both successes and failures of the hierarchical model. We discuss how Caianiello's original statements suffer from the Mind Projection Fallacy [3] and we restate his assumptions thus widening the applicability of the HMS model. The relationship between inference and statistical physics, described by Jaynes [4], is reiterated with the expectation that this realization will aid the field of complex systems research by moving away from often inappropriate direct application of statistical mechanics to a more encompassing inferential methodology.
ERIC Educational Resources Information Center
Seiver, Elizabeth; Gopnik, Alison; Goodman, Noah D.
2013-01-01
Children rely on both evidence and prior knowledge to make physical causal inferences; this study explores whether they make attributions about others' behavior in the same manner. A total of one hundred and fifty-nine 4- and 6-year-olds saw 2 dolls interacting with 2 activities, and explained the dolls' actions. In the person condition, each doll…
Conditional statistical inference with multistage testing designs.
Zwitser, Robert J; Maris, Gunter
2015-03-01
In this paper it is demonstrated how statistical inference from multistage test designs can be made based on the conditional likelihood. Special attention is given to parameter estimation, as well as the evaluation of model fit. Two reasons are provided why the fit of simple measurement models is expected to be better in adaptive designs, compared to linear designs: more parameters are available for the same number of observations; and undesirable response behavior, like slipping and guessing, might be avoided owing to a better match between item difficulty and examinee proficiency. The results are illustrated with simulated data, as well as with real data.
ERIC Educational Resources Information Center
Snyder, Patricia A.; Thompson, Bruce
The use of tests of statistical significance was explored, first by reviewing some criticisms of contemporary practice in the use of statistical tests as reflected in a series of articles in the "American Psychologist" and in the appointment of a "Task Force on Statistical Inference" by the American Psychological Association…
Standard deviation and standard error of the mean.
Lee, Dong Kyu; In, Junyong; Lee, Sangseok
2015-06-01
In most clinical and experimental studies, the standard deviation (SD) and the estimated standard error of the mean (SEM) are used to present the characteristics of sample data and to explain statistical analysis results. However, some authors occasionally muddle the distinctive usage between the SD and SEM in medical literature. Because the process of calculating the SD and SEM includes different statistical inferences, each of them has its own meaning. SD is the dispersion of data in a normal distribution. In other words, SD indicates how accurately the mean represents sample data. However the meaning of SEM includes statistical inference based on the sampling distribution. SEM is the SD of the theoretical distribution of the sample means (the sampling distribution). While either SD or SEM can be applied to describe data and statistical results, one should be aware of reasonable methods with which to use SD and SEM. We aim to elucidate the distinctions between SD and SEM and to provide proper usage guidelines for both, which summarize data and describe statistical results.
Standard deviation and standard error of the mean
In, Junyong; Lee, Sangseok
2015-01-01
In most clinical and experimental studies, the standard deviation (SD) and the estimated standard error of the mean (SEM) are used to present the characteristics of sample data and to explain statistical analysis results. However, some authors occasionally muddle the distinctive usage between the SD and SEM in medical literature. Because the process of calculating the SD and SEM includes different statistical inferences, each of them has its own meaning. SD is the dispersion of data in a normal distribution. In other words, SD indicates how accurately the mean represents sample data. However the meaning of SEM includes statistical inference based on the sampling distribution. SEM is the SD of the theoretical distribution of the sample means (the sampling distribution). While either SD or SEM can be applied to describe data and statistical results, one should be aware of reasonable methods with which to use SD and SEM. We aim to elucidate the distinctions between SD and SEM and to provide proper usage guidelines for both, which summarize data and describe statistical results. PMID:26045923
Ratmann, Oliver; Andrieu, Christophe; Wiuf, Carsten; Richardson, Sylvia
2009-06-30
Mathematical models are an important tool to explain and comprehend complex phenomena, and unparalleled computational advances enable us to easily explore them without any or little understanding of their global properties. In fact, the likelihood of the data under complex stochastic models is often analytically or numerically intractable in many areas of sciences. This makes it even more important to simultaneously investigate the adequacy of these models-in absolute terms, against the data, rather than relative to the performance of other models-but no such procedure has been formally discussed when the likelihood is intractable. We provide a statistical interpretation to current developments in likelihood-free Bayesian inference that explicitly accounts for discrepancies between the model and the data, termed Approximate Bayesian Computation under model uncertainty (ABCmicro). We augment the likelihood of the data with unknown error terms that correspond to freely chosen checking functions, and provide Monte Carlo strategies for sampling from the associated joint posterior distribution without the need of evaluating the likelihood. We discuss the benefit of incorporating model diagnostics within an ABC framework, and demonstrate how this method diagnoses model mismatch and guides model refinement by contrasting three qualitative models of protein network evolution to the protein interaction datasets of Helicobacter pylori and Treponema pallidum. Our results make a number of model deficiencies explicit, and suggest that the T. pallidum network topology is inconsistent with evolution dominated by link turnover or lateral gene transfer alone.
Tsunami Catalogues for the Eastern Mediterranean - Revisited.
NASA Astrophysics Data System (ADS)
Ambraseys, N.; Synolakis, C. E.
2008-12-01
We critically examine examine tsunami catalogues of tsunamis in the Eastern Mediterranean published in the last decade, by reference to the original sources, see Ambraseys (2008). Such catalogues have been widely used in the aftermath of the 2004 Boxing Day tsunami for probabilistic hazard analysis, even to make projections for a ten year time frame. On occasion, such predictions have caused panic and have reduced the credibility of the scientific community in making hazard assessments. We correct classification and other spurious errors in earlier catalogues and posit a new list. We conclude that for some historic events, any assignment of magnitude, even on a six point intensity scale is inappropriate due to lack of information. Further we assert that any tsunami catalogue, including ours, can only be used in conjunction with sedimentologic evidence to quantitatively infer the return period of larger events. Statistical analyses correlating numbers of tsunami events derived solely from catalogues with their inferred or imagined intensities are meaningless, at least when focusing on specific locales where only a handful of tsunamis are known to have been historically reported. Quantitative hazard assessments based on scenario events of historic tsunamis for which -at best- only the size and approximate location of the parent earthquake is known should be undertaken with extreme caution and only with benefit of geologic studies to enhance the understanding of the local tectonics. Ambraseys N. (2008) Earthquakes in the Eastern Mediterranean and the Middle East: multidisciplinary study of 2000 years of seimicity, Cambridge Univ. Press, Cambridge (ISBN 9780521872928).
Thomas, Elaine
2005-01-01
This article is the second in a series of three that will give health care professionals (HCPs) a sound introduction to medical statistics (Thomas, 2004). The objective of research is to find out about the population at large. However, it is generally not possible to study the whole of the population and research questions are addressed in an appropriate study sample. The next crucial step is then to use the information from the sample of individuals to make statements about the wider population of like individuals. This procedure of drawing conclusions about the population, based on study data, is known as inferential statistics. The findings from the study give us the best estimate of what is true for the relevant population, given the sample is representative of the population. It is important to consider how accurate this best estimate is, based on a single sample, when compared to the unknown population figure. Any difference between the observed sample result and the population characteristic is termed the sampling error. This article will cover the two main forms of statistical inference (hypothesis tests and estimation) along with issues that need to be addressed when considering the implications of the study results. Copyright (c) 2005 Whurr Publishers Ltd.
Selecting the optimum plot size for a California design-based stream and wetland mapping program.
Lackey, Leila G; Stein, Eric D
2014-04-01
Accurate estimates of the extent and distribution of wetlands and streams are the foundation of wetland monitoring, management, restoration, and regulatory programs. Traditionally, these estimates have relied on comprehensive mapping. However, this approach is prohibitively resource-intensive over large areas, making it both impractical and statistically unreliable. Probabilistic (design-based) approaches to evaluating status and trends provide a more cost-effective alternative because, compared with comprehensive mapping, overall extent is inferred from mapping a statistically representative, randomly selected subset of the target area. In this type of design, the size of sample plots has a significant impact on program costs and on statistical precision and accuracy; however, no consensus exists on the appropriate plot size for remote monitoring of stream and wetland extent. This study utilized simulated sampling to assess the performance of four plot sizes (1, 4, 9, and 16 km(2)) for three geographic regions of California. Simulation results showed smaller plot sizes (1 and 4 km(2)) were most efficient for achieving desired levels of statistical accuracy and precision. However, larger plot sizes were more likely to contain rare and spatially limited wetland subtypes. Balancing these considerations led to selection of 4 km(2) for the California status and trends program.
Visual shape perception as Bayesian inference of 3D object-centered shape representations.
Erdogan, Goker; Jacobs, Robert A
2017-11-01
Despite decades of research, little is known about how people visually perceive object shape. We hypothesize that a promising approach to shape perception is provided by a "visual perception as Bayesian inference" framework which augments an emphasis on visual representation with an emphasis on the idea that shape perception is a form of statistical inference. Our hypothesis claims that shape perception of unfamiliar objects can be characterized as statistical inference of 3D shape in an object-centered coordinate system. We describe a computational model based on our theoretical framework, and provide evidence for the model along two lines. First, we show that, counterintuitively, the model accounts for viewpoint-dependency of object recognition, traditionally regarded as evidence against people's use of 3D object-centered shape representations. Second, we report the results of an experiment using a shape similarity task, and present an extensive evaluation of existing models' abilities to account for the experimental data. We find that our shape inference model captures subjects' behaviors better than competing models. Taken as a whole, our experimental and computational results illustrate the promise of our approach and suggest that people's shape representations of unfamiliar objects are probabilistic, 3D, and object-centered. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Statistical Manual. Methods of Making Experimental Inferences
1951-06-01
34 10 plus or minus 2." i.e., the range from 8 to 12. 33 ■ - r -i iftjMjfltotf—■-•~—’"liritüMirr *i mm ■ f-j»... ■-., ■ c...for each sample. 9. Add all the results of ( 8 ); i.e., calcu- k late L (DFi In SJ2). 10 . Subtract (9) from (5). Let M = (5) - (9). That is, in sum...101 4.62 64.7 0.071 6 11 10 361 36 3.58 35.8 0.100 7 31 30 1230 41 3.71 111.3 0.033 8 15 14 1066 76 4.33 60.6 0.071 9 3 2 129 64 4.16 8.3 0.500 10
Revealing degree distribution of bursting neuron networks.
Shen, Yu; Hou, Zhonghuai; Xin, Houwen
2010-03-01
We present a method to infer the degree distribution of a bursting neuron network from its dynamics. Burst synchronization (BS) of coupled Morris-Lecar neurons has been studied under the weak coupling condition. In the BS state, all the neurons start and end bursting almost simultaneously, while the spikes inside the burst are incoherent among the neurons. Interestingly, we find that the spike amplitude of a given neuron shows an excellent linear relationship with its degree, which makes it possible to estimate the degree distribution of the network by simple statistics of the spike amplitudes. We demonstrate the validity of this scheme on scale-free as well as small-world networks. The underlying mechanism of such a method is also briefly discussed.
Inferring molecular interactions pathways from eQTL data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rashid, Imran; McDermott, Jason E.; Samudrala, Ram
Analysis of expression quantitative trait loci (eQTL) helps elucidate the connection between genotype, gene expression levels, and phenotype. However, standard statistical genetics can only attribute changes in expression levels to loci on the genome, not specific genes. Each locus can contain many genes, making it very difficult to discover which gene is controlling the expression levels of other genes. Furthermore, it is even more difficult to find a pathway of molecular interactions responsible for controlling the expression levels. Here we describe a series of techniques for finding explanatory pathways by exploring graphs of molecular interactions. We show several simple methodsmore » can find complete pathways the explain the mechanism of differential expression in eQTL data.« less
An application of an optimal statistic for characterizing relative orientations
NASA Astrophysics Data System (ADS)
Jow, Dylan L.; Hill, Ryley; Scott, Douglas; Soler, J. D.; Martin, P. G.; Devlin, M. J.; Fissel, L. M.; Poidevin, F.
2018-02-01
We present the projected Rayleigh statistic (PRS), a modification of the classic Rayleigh statistic, as a test for non-uniform relative orientation between two pseudo-vector fields. In the application here, this gives an effective way of investigating whether polarization pseudo-vectors (spin-2 quantities) are preferentially parallel or perpendicular to filaments in the interstellar medium. For example, there are other potential applications in astrophysics, e.g. when comparing small-scale orientations with larger scale shear patterns. We compare the efficiency of the PRS against histogram binning methods that have previously been used for characterizing the relative orientations of gas column density structures with the magnetic field projected on the plane of the sky. We examine data for the Vela C molecular cloud, where the column density is inferred from Herschel submillimetre observations, and the magnetic field from observations by the Balloon-borne Large-Aperture Submillimetre Telescope in the 250-, 350- and 500-μm wavelength bands. We find that the PRS has greater statistical power than approaches that bin the relative orientation angles, as it makes more efficient use of the information contained in the data. In particular, the use of the PRS to test for preferential alignment results in a higher statistical significance, in each of the four Vela C regions, with the greatest increase being by a factor 1.3 in the South-Nest region in the 250 - μ m band.
Neural Activity Reveals Preferences Without Choices
Smith, Alec; Bernheim, B. Douglas; Camerer, Colin
2014-01-01
We investigate the feasibility of inferring the choices people would make (if given the opportunity) based on their neural responses to the pertinent prospects when they are not engaged in actual decision making. The ability to make such inferences is of potential value when choice data are unavailable, or limited in ways that render standard methods of estimating choice mappings problematic. We formulate prediction models relating choices to “non-choice” neural responses and use them to predict out-of-sample choices for new items and for new groups of individuals. The predictions are sufficiently accurate to establish the feasibility of our approach. PMID:25729468
Ford, Janet A; Milosky, Linda M
2008-04-01
This study examined whether young children with typical language development (TL) and children with language impairment (LI) make emotion inferences online during the process of discourse comprehension, identified variables that predict emotion inferencing, and explored the relationship of these variables to social competence. Preschool children (16 TL and 16 LI) watched narrated videos designed to activate knowledge about a particular emotional state. Following each story, children named a facial expression that either matched or did not match the anticipated emotion. Several experimental tasks examined linguistic and nonlinguistic abilities. Finally, each child's teacher completed a measure of social competence. Children with TL named expressions significantly more slowly in the mismatched condition than in the matched condition, whereas children with LI did not differ in response times between the conditions. Language and vocal response time measures were related to emotion inferencing ability, and this ability predicted social competence scores. The findings suggest that children with TL are inferring emotions during the comprehension process, whereas children with LI often fail to make these inferences. Making emotion inferences is related to discourse comprehension and to social competence in children. The current findings provide evidence that language and vocal response time measures predicted inferencing ability and suggest that additional factors may influence discourse inferencing and social competence.
Robust inference for group sequential trials.
Ganju, Jitendra; Lin, Yunzhi; Zhou, Kefei
2017-03-01
For ethical reasons, group sequential trials were introduced to allow trials to stop early in the event of extreme results. Endpoints in such trials are usually mortality or irreversible morbidity. For a given endpoint, the norm is to use a single test statistic and to use that same statistic for each analysis. This approach is risky because the test statistic has to be specified before the study is unblinded, and there is loss in power if the assumptions that ensure optimality for each analysis are not met. To minimize the risk of moderate to substantial loss in power due to a suboptimal choice of a statistic, a robust method was developed for nonsequential trials. The concept is analogous to diversification of financial investments to minimize risk. The method is based on combining P values from multiple test statistics for formal inference while controlling the type I error rate at its designated value.This article evaluates the performance of 2 P value combining methods for group sequential trials. The emphasis is on time to event trials although results from less complex trials are also included. The gain or loss in power with the combination method relative to a single statistic is asymmetric in its favor. Depending on the power of each individual test, the combination method can give more power than any single test or give power that is closer to the test with the most power. The versatility of the method is that it can combine P values from different test statistics for analysis at different times. The robustness of results suggests that inference from group sequential trials can be strengthened with the use of combined tests. Copyright © 2017 John Wiley & Sons, Ltd.
Long-term strategy for the statistical design of a forest health monitoring system
Hans T. Schreuder; Raymond L. Czaplewski
1993-01-01
A conceptual framework is given for a broad-scale survey of forest health that accomplishes three objectives: generate descriptive statistics; detect changes in such statistics; and simplify analytical inferences that identify, and possibly establish cause-effect relationships. Our paper discusses the development of sampling schemes to satisfy these three objectives,...
ERIC Educational Resources Information Center
Beeman, Jennifer Leigh Sloan
2013-01-01
Research has found that students successfully complete an introductory course in statistics without fully comprehending the underlying theory or being able to exhibit statistical reasoning. This is particularly true for the understanding about the sampling distribution of the mean, a crucial concept for statistical inference. This study…
Using Action Research to Develop a Course in Statistical Inference for Workplace-Based Adults
ERIC Educational Resources Information Center
Forbes, Sharleen
2014-01-01
Many adults who need an understanding of statistical concepts have limited mathematical skills. They need a teaching approach that includes as little mathematical context as possible. Iterative participatory qualitative research (action research) was used to develop a statistical literacy course for adult learners informed by teaching in…
The role of working memory in inferential sentence comprehension.
Pérez, Ana Isabel; Paolieri, Daniela; Macizo, Pedro; Bajo, Teresa
2014-08-01
Existing literature on inference making is large and varied. Trabasso and Magliano (Discourse Process 21(3):255-287, 1996) proposed the existence of three types of inferences: explicative, associative and predictive. In addition, the authors suggested that these inferences were related to working memory (WM). In the present experiment, we investigated whether WM capacity plays a role in our ability to answer comprehension sentences that require text information based on these types of inferences. Participants with high and low WM span read two narratives with four paragraphs each. After each paragraph was read, they were presented with four true/false comprehension sentences. One required verbatim information and the other three implied explicative, associative and predictive inferential information. Results demonstrated that only the explicative and predictive comprehension sentences required WM: participants with high verbal WM were more accurate in giving explanations and also faster at making predictions relative to participants with low verbal WM span; in contrast, no WM differences were found in the associative comprehension sentences. These results are interpreted in terms of the causal nature underlying these types of inferences.
The penumbra of learning: a statistical theory of synaptic tagging and capture.
Gershman, Samuel J
2014-01-01
Learning in humans and animals is accompanied by a penumbra: Learning one task benefits from learning an unrelated task shortly before or after. At the cellular level, the penumbra of learning appears when weak potentiation of one synapse is amplified by strong potentiation of another synapse on the same neuron during a critical time window. Weak potentiation sets a molecular tag that enables the synapse to capture plasticity-related proteins synthesized in response to strong potentiation at another synapse. This paper describes a computational model which formalizes synaptic tagging and capture in terms of statistical learning mechanisms. According to this model, synaptic strength encodes a probabilistic inference about the dynamically changing association between pre- and post-synaptic firing rates. The rate of change is itself inferred, coupling together different synapses on the same neuron. When the inputs to one synapse change rapidly, the inferred rate of change increases, amplifying learning at other synapses.
Space-Time Data fusion for Remote Sensing Applications
NASA Technical Reports Server (NTRS)
Braverman, Amy; Nguyen, H.; Cressie, N.
2011-01-01
NASA has been collecting massive amounts of remote sensing data about Earth's systems for more than a decade. Missions are selected to be complementary in quantities measured, retrieval techniques, and sampling characteristics, so these datasets are highly synergistic. To fully exploit this, a rigorous methodology for combining data with heterogeneous sampling characteristics is required. For scientific purposes, the methodology must also provide quantitative measures of uncertainty that propagate input-data uncertainty appropriately. We view this as a statistical inference problem. The true but notdirectly- observed quantities form a vector-valued field continuous in space and time. Our goal is to infer those true values or some function of them, and provide to uncertainty quantification for those inferences. We use a spatiotemporal statistical model that relates the unobserved quantities of interest at point-level to the spatially aggregated, observed data. We describe and illustrate our method using CO2 data from two NASA data sets.
Inference of missing data and chemical model parameters using experimental statistics
NASA Astrophysics Data System (ADS)
Casey, Tiernan; Najm, Habib
2017-11-01
A method for determining the joint parameter density of Arrhenius rate expressions through the inference of missing experimental data is presented. This approach proposes noisy hypothetical data sets from target experiments and accepts those which agree with the reported statistics, in the form of nominal parameter values and their associated uncertainties. The data exploration procedure is formalized using Bayesian inference, employing maximum entropy and approximate Bayesian computation methods to arrive at a joint density on data and parameters. The method is demonstrated in the context of reactions in the H2-O2 system for predictive modeling of combustion systems of interest. Work supported by the US DOE BES CSGB. Sandia National Labs is a multimission lab managed and operated by Nat. Technology and Eng'g Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell Intl, for the US DOE NCSA under contract DE-NA-0003525.
Statistical numeracy as a moderator of (pseudo)contingency effects on decision behavior.
Fleig, Hanna; Meiser, Thorsten; Ettlin, Florence; Rummel, Jan
2017-03-01
Pseudocontingencies denote contingency estimates inferred from base rates rather than from cell frequencies. We examined the role of statistical numeracy for effects of such fallible but adaptive inferences on choice behavior. In Experiment 1, we provided information on single observations as well as on base rates and tracked participants' eye movements. In Experiment 2, we manipulated the availability of information on cell frequencies and base rates between conditions. Our results demonstrate that a focus on base rates rather than cell frequencies benefits pseudocontingency effects. Learners who are more proficient in (conditional) probability calculation prefer to rely on cell frequencies in order to judge contingencies, though, as was evident from their gaze behavior. If cell frequencies are available in summarized format, they may infer the true contingency between options and outcomes. Otherwise, however, even highly numerate learners are susceptible to pseudocontingency effects. Copyright © 2017 Elsevier B.V. All rights reserved.
Feinauer, Christoph; Procaccini, Andrea; Zecchina, Riccardo; Weigt, Martin; Pagnani, Andrea
2014-01-01
In the course of evolution, proteins show a remarkable conservation of their three-dimensional structure and their biological function, leading to strong evolutionary constraints on the sequence variability between homologous proteins. Our method aims at extracting such constraints from rapidly accumulating sequence data, and thereby at inferring protein structure and function from sequence information alone. Recently, global statistical inference methods (e.g. direct-coupling analysis, sparse inverse covariance estimation) have achieved a breakthrough towards this aim, and their predictions have been successfully implemented into tertiary and quaternary protein structure prediction methods. However, due to the discrete nature of the underlying variable (amino-acids), exact inference requires exponential time in the protein length, and efficient approximations are needed for practical applicability. Here we propose a very efficient multivariate Gaussian modeling approach as a variant of direct-coupling analysis: the discrete amino-acid variables are replaced by continuous Gaussian random variables. The resulting statistical inference problem is efficiently and exactly solvable. We show that the quality of inference is comparable or superior to the one achieved by mean-field approximations to inference with discrete variables, as done by direct-coupling analysis. This is true for (i) the prediction of residue-residue contacts in proteins, and (ii) the identification of protein-protein interaction partner in bacterial signal transduction. An implementation of our multivariate Gaussian approach is available at the website http://areeweb.polito.it/ricerca/cmp/code. PMID:24663061
The space of ultrametric phylogenetic trees.
Gavryushkin, Alex; Drummond, Alexei J
2016-08-21
The reliability of a phylogenetic inference method from genomic sequence data is ensured by its statistical consistency. Bayesian inference methods produce a sample of phylogenetic trees from the posterior distribution given sequence data. Hence the question of statistical consistency of such methods is equivalent to the consistency of the summary of the sample. More generally, statistical consistency is ensured by the tree space used to analyse the sample. In this paper, we consider two standard parameterisations of phylogenetic time-trees used in evolutionary models: inter-coalescent interval lengths and absolute times of divergence events. For each of these parameterisations we introduce a natural metric space on ultrametric phylogenetic trees. We compare the introduced spaces with existing models of tree space and formulate several formal requirements that a metric space on phylogenetic trees must possess in order to be a satisfactory space for statistical analysis, and justify them. We show that only a few known constructions of the space of phylogenetic trees satisfy these requirements. However, our results suggest that these basic requirements are not enough to distinguish between the two metric spaces we introduce and that the choice between metric spaces requires additional properties to be considered. Particularly, that the summary tree minimising the square distance to the trees from the sample might be different for different parameterisations. This suggests that further fundamental insight is needed into the problem of statistical consistency of phylogenetic inference methods. Copyright © 2016 The Authors. Published by Elsevier Ltd.. All rights reserved.
Assessing risk factors for dental caries: a statistical modeling approach.
Trottini, Mario; Bossù, Maurizio; Corridore, Denise; Ierardo, Gaetano; Luzzi, Valeria; Saccucci, Matteo; Polimeni, Antonella
2015-01-01
The problem of identifying potential determinants and predictors of dental caries is of key importance in caries research and it has received considerable attention in the scientific literature. From the methodological side, a broad range of statistical models is currently available to analyze dental caries indices (DMFT, dmfs, etc.). These models have been applied in several studies to investigate the impact of different risk factors on the cumulative severity of dental caries experience. However, in most of the cases (i) these studies focus on a very specific subset of risk factors; and (ii) in the statistical modeling only few candidate models are considered and model selection is at best only marginally addressed. As a result, our understanding of the robustness of the statistical inferences with respect to the choice of the model is very limited; the richness of the set of statistical models available for analysis in only marginally exploited; and inferences could be biased due the omission of potentially important confounding variables in the model's specification. In this paper we argue that these limitations can be overcome considering a general class of candidate models and carefully exploring the model space using standard model selection criteria and measures of global fit and predictive performance of the candidate models. Strengths and limitations of the proposed approach are illustrated with a real data set. In our illustration the model space contains more than 2.6 million models, which require inferences to be adjusted for 'optimism'.
Identifying biologically relevant differences between metagenomic communities.
Parks, Donovan H; Beiko, Robert G
2010-03-15
Metagenomics is the study of genetic material recovered directly from environmental samples. Taxonomic and functional differences between metagenomic samples can highlight the influence of ecological factors on patterns of microbial life in a wide range of habitats. Statistical hypothesis tests can help us distinguish ecological influences from sampling artifacts, but knowledge of only the P-value from a statistical hypothesis test is insufficient to make inferences about biological relevance. Current reporting practices for pairwise comparative metagenomics are inadequate, and better tools are needed for comparative metagenomic analysis. We have developed a new software package, STAMP, for comparative metagenomics that supports best practices in analysis and reporting. Examination of a pair of iron mine metagenomes demonstrates that deeper biological insights can be gained using statistical techniques available in our software. An analysis of the functional potential of 'Candidatus Accumulibacter phosphatis' in two enhanced biological phosphorus removal metagenomes identified several subsystems that differ between the A.phosphatis stains in these related communities, including phosphate metabolism, secretion and metal transport. Python source code and binaries are freely available from our website at http://kiwi.cs.dal.ca/Software/STAMP CONTACT: beiko@cs.dal.ca Supplementary data are available at Bioinformatics online.
The epistemological status of general circulation models
NASA Astrophysics Data System (ADS)
Loehle, Craig
2018-03-01
Forecasts of both likely anthropogenic effects on climate and consequent effects on nature and society are based on large, complex software tools called general circulation models (GCMs). Forecasts generated by GCMs have been used extensively in policy decisions related to climate change. However, the relation between underlying physical theories and results produced by GCMs is unclear. In the case of GCMs, many discretizations and approximations are made, and simulating Earth system processes is far from simple and currently leads to some results with unknown energy balance implications. Statistical testing of GCM forecasts for degree of agreement with data would facilitate assessment of fitness for use. If model results need to be put on an anomaly basis due to model bias, then both visual and quantitative measures of model fit depend strongly on the reference period used for normalization, making testing problematic. Epistemology is here applied to problems of statistical inference during testing, the relationship between the underlying physics and the models, the epistemic meaning of ensemble statistics, problems of spatial and temporal scale, the existence or not of an unforced null for climate fluctuations, the meaning of existing uncertainty estimates, and other issues. Rigorous reasoning entails carefully quantifying levels of uncertainty.
de la Rúa, Nicholas M.; Bustamante, Dulce M.; Menes, Marianela; Stevens, Lori; Monroy, Carlota; Kilpatrick, William; Rizzo, Donna; Klotz, Stephen A.; Schmidt, Justin; Axen, Heather J.; Dorn, Patricia L.
2014-01-01
Phylogenetic relationships of insect vectors of parasitic diseases are important for understanding the evolution of epidemiologically relevant traits, and may be useful in vector control. The subfamily Triatominae (Hemiptera:Reduviidae) includes ~140 extant species arranged in five tribes comprised of 15 genera. The genus Triatoma is the most species-rich and contains important vectors of Trypanosoma cruzi, the causative agent of Chagas disease. Triatoma species were grouped into complexes originally by morphology and more recently with the addition of information from molecular phylogenetics (the four-complex hypothesis); however, without a strict adherence to monophyly. To date, the validity of proposed species complexes has not been tested by statistical tests of topology. The goal of this study was to clarify the systematics of 19 Triatoma species from North and Central America. We inferred their evolutionary relatedness using two independent data sets: the complete nuclear Internal Transcribed Spacer-2 ribosomal DNA (ITS-2 rDNA) and head morphometrics. In addition, we used the Shimodaira-Hasegawa statistical test of topology to assess the fit of the data to a set of competing systematic hypotheses (topologies). An unconstrained topology inferred from the ITS-2 data was compared to topologies constrained based on the four-complex hypothesis or one inferred from our morphometry results. The unconstrained topology represents a statistically significant better fit of the molecular data than either the four-complex or the morphometric topology. We propose an update to the composition of species complexes in the North and Central American Triatoma, based on a phylogeny inferred from ITS-2 as a first step towards updating the phylogeny of the complexes based on monophyly and statistical tests of topologies. PMID:24681261
Emerging Concepts of Data Integration in Pathogen Phylodynamics.
Baele, Guy; Suchard, Marc A; Rambaut, Andrew; Lemey, Philippe
2017-01-01
Phylodynamics has become an increasingly popular statistical framework to extract evolutionary and epidemiological information from pathogen genomes. By harnessing such information, epidemiologists aim to shed light on the spatio-temporal patterns of spread and to test hypotheses about the underlying interaction of evolutionary and ecological dynamics in pathogen populations. Although the field has witnessed a rich development of statistical inference tools with increasing levels of sophistication, these tools initially focused on sequences as their sole primary data source. Integrating various sources of information, however, promises to deliver more precise insights in infectious diseases and to increase opportunities for statistical hypothesis testing. Here, we review how the emerging concept of data integration is stimulating new advances in Bayesian evolutionary inference methodology which formalize a marriage of statistical thinking and evolutionary biology. These approaches include connecting sequence to trait evolution, such as for host, phenotypic and geographic sampling information, but also the incorporation of covariates of evolutionary and epidemic processes in the reconstruction procedures. We highlight how a full Bayesian approach to covariate modeling and testing can generate further insights into sequence evolution, trait evolution, and population dynamics in pathogen populations. Specific examples demonstrate how such approaches can be used to test the impact of host on rabies and HIV evolutionary rates, to identify the drivers of influenza dispersal as well as the determinants of rabies cross-species transmissions, and to quantify the evolutionary dynamics of influenza antigenicity. Finally, we briefly discuss how data integration is now also permeating through the inference of transmission dynamics, leading to novel insights into tree-generative processes and detailed reconstructions of transmission trees. [Bayesian inference; birth–death models; coalescent models; continuous trait evolution; covariates; data integration; discrete trait evolution; pathogen phylodynamics.
Emerging Concepts of Data Integration in Pathogen Phylodynamics
Baele, Guy; Suchard, Marc A.; Rambaut, Andrew; Lemey, Philippe
2017-01-01
Phylodynamics has become an increasingly popular statistical framework to extract evolutionary and epidemiological information from pathogen genomes. By harnessing such information, epidemiologists aim to shed light on the spatio-temporal patterns of spread and to test hypotheses about the underlying interaction of evolutionary and ecological dynamics in pathogen populations. Although the field has witnessed a rich development of statistical inference tools with increasing levels of sophistication, these tools initially focused on sequences as their sole primary data source. Integrating various sources of information, however, promises to deliver more precise insights in infectious diseases and to increase opportunities for statistical hypothesis testing. Here, we review how the emerging concept of data integration is stimulating new advances in Bayesian evolutionary inference methodology which formalize a marriage of statistical thinking and evolutionary biology. These approaches include connecting sequence to trait evolution, such as for host, phenotypic and geographic sampling information, but also the incorporation of covariates of evolutionary and epidemic processes in the reconstruction procedures. We highlight how a full Bayesian approach to covariate modeling and testing can generate further insights into sequence evolution, trait evolution, and population dynamics in pathogen populations. Specific examples demonstrate how such approaches can be used to test the impact of host on rabies and HIV evolutionary rates, to identify the drivers of influenza dispersal as well as the determinants of rabies cross-species transmissions, and to quantify the evolutionary dynamics of influenza antigenicity. Finally, we briefly discuss how data integration is now also permeating through the inference of transmission dynamics, leading to novel insights into tree-generative processes and detailed reconstructions of transmission trees. [Bayesian inference; birth–death models; coalescent models; continuous trait evolution; covariates; data integration; discrete trait evolution; pathogen phylodynamics. PMID:28173504
Liao, Pei-Hung; Hsu, Pei-Ti; Chu, William; Chu, Woei-Chyn
2015-06-01
This study applied artificial intelligence to help nurses address problems and receive instructions through information technology. Nurses make diagnoses according to professional knowledge, clinical experience, and even instinct. Without comprehensive knowledge and thinking, diagnostic accuracy can be compromised and decisions may be delayed. We used a back-propagation neural network and other tools for data mining and statistical analysis. We further compared the prediction accuracy of the previous methods with an adaptive-network-based fuzzy inference system and the back-propagation neural network, identifying differences in the questions and in nurse satisfaction levels before and after using the nursing information system. This study investigated the use of artificial intelligence to generate nursing diagnoses. The percentage of agreement between diagnoses suggested by the information system and those made by nurses was as much as 87 percent. When patients are hospitalized, we can calculate the probability of various nursing diagnoses based on certain characteristics. © The Author(s) 2013.
Children use partial resource sharing as a cue to friendship.
Liberman, Zoe; Shaw, Alex
2017-07-01
Resource sharing is an important aspect of human society, and how resources are distributed can provide people with crucial information about social structure. Indeed, a recent partiality account of resource distribution suggested that people may use unequal partial resource distributions to make inferences about a distributor's social affiliations. To empirically test this suggestion derived from the theoretical argument of the partiality account, we presented 4- to 9-year-old children with distributors who gave out resources unequally using either a partial procedure (intentionally choosing which recipient would get more) or an impartial procedure (rolling a die to determine which recipient would get more) and asked children to make judgments about whom the distributor was better friends with. At each age tested, children expected a distributor who gave partially to be better friends with the favored recipient (Studies 1-3). Interestingly, younger children (4- to 6-year-olds) inferred friendship between the distributor and the favored recipient even in cases where the distributor used an impartial procedure, whereas older children (7- to 9-year-olds) did not infer friendship based on impartial distributions (Study 1). These studies demonstrate that children use third-party resource distributions to make important predictions about the social world and add to our knowledge about the developmental trajectory of understanding the importance of partiality in addition to inequity when making social inferences. Copyright © 2017 Elsevier Inc. All rights reserved.
Hsieh, PingHsun; Woerner, August E; Wall, Jeffrey D; Lachance, Joseph; Tishkoff, Sarah A; Gutenkunst, Ryan N; Hammer, Michael F
2016-03-01
Comparisons of whole-genome sequences from ancient and contemporary samples have pointed to several instances of archaic admixture through interbreeding between the ancestors of modern non-Africans and now extinct hominids such as Neanderthals and Denisovans. One implication of these findings is that some adaptive features in contemporary humans may have entered the population via gene flow with archaic forms in Eurasia. Within Africa, fossil evidence suggests that anatomically modern humans (AMH) and various archaic forms coexisted for much of the last 200,000 yr; however, the absence of ancient DNA in Africa has limited our ability to make a direct comparison between archaic and modern human genomes. Here, we use statistical inference based on high coverage whole-genome data (greater than 60×) from contemporary African Pygmy hunter-gatherers as an alternative means to study the evolutionary history of the genus Homo. Using whole-genome simulations that consider demographic histories that include both isolation and gene flow with neighboring farming populations, our inference method rejects the hypothesis that the ancestors of AMH were genetically isolated in Africa, thus providing the first whole genome-level evidence of African archaic admixture. Our inferences also suggest a complex human evolutionary history in Africa, which involves at least a single admixture event from an unknown archaic population into the ancestors of AMH, likely within the last 30,000 yr. © 2016 Hsieh et al.; Published by Cold Spring Harbor Laboratory Press.
Bayesian microsaccade detection
Mihali, Andra; van Opheusden, Bas; Ma, Wei Ji
2017-01-01
Microsaccades are high-velocity fixational eye movements, with special roles in perception and cognition. The default microsaccade detection method is to determine when the smoothed eye velocity exceeds a threshold. We have developed a new method, Bayesian microsaccade detection (BMD), which performs inference based on a simple statistical model of eye positions. In this model, a hidden state variable changes between drift and microsaccade states at random times. The eye position is a biased random walk with different velocity distributions for each state. BMD generates samples from the posterior probability distribution over the eye state time series given the eye position time series. Applied to simulated data, BMD recovers the “true” microsaccades with fewer errors than alternative algorithms, especially at high noise. Applied to EyeLink eye tracker data, BMD detects almost all the microsaccades detected by the default method, but also apparent microsaccades embedded in high noise—although these can also be interpreted as false positives. Next we apply the algorithms to data collected with a Dual Purkinje Image eye tracker, whose higher precision justifies defining the inferred microsaccades as ground truth. When we add artificial measurement noise, the inferences of all algorithms degrade; however, at noise levels comparable to EyeLink data, BMD recovers the “true” microsaccades with 54% fewer errors than the default algorithm. Though unsuitable for online detection, BMD has other advantages: It returns probabilities rather than binary judgments, and it can be straightforwardly adapted as the generative model is refined. We make our algorithm available as a software package. PMID:28114483
Nonlinear Inference in Partially Observed Physical Systems and Deep Neural Networks
NASA Astrophysics Data System (ADS)
Rozdeba, Paul J.
The problem of model state and parameter estimation is a significant challenge in nonlinear systems. Due to practical considerations of experimental design, it is often the case that physical systems are partially observed, meaning that data is only available for a subset of the degrees of freedom required to fully model the observed system's behaviors and, ultimately, predict future observations. Estimation in this context is highly complicated by the presence of chaos, stochasticity, and measurement noise in dynamical systems. One of the aims of this dissertation is to simultaneously analyze state and parameter estimation in as a regularized inverse problem, where the introduction of a model makes it possible to reverse the forward problem of partial, noisy observation; and as a statistical inference problem using data assimilation to transfer information from measurements to the model states and parameters. Ultimately these two formulations achieve the same goal. Similar aspects that appear in both are highlighted as a means for better understanding the structure of the nonlinear inference problem. An alternative approach to data assimilation that uses model reduction is then examined as a way to eliminate unresolved nonlinear gating variables from neuron models. In this formulation, only measured variables enter into the model, and the resulting errors are themselves modeled by nonlinear stochastic processes with memory. Finally, variational annealing, a data assimilation method previously applied to dynamical systems, is introduced as a potentially useful tool for understanding deep neural network training in machine learning by exploiting similarities between the two problems.
Su, Chun-Lung; Gardner, Ian A; Johnson, Wesley O
2004-07-30
The two-test two-population model, originally formulated by Hui and Walter, for estimation of test accuracy and prevalence estimation assumes conditionally independent tests, constant accuracy across populations and binomial sampling. The binomial assumption is incorrect if all individuals in a population e.g. child-care centre, village in Africa, or a cattle herd are sampled or if the sample size is large relative to population size. In this paper, we develop statistical methods for evaluating diagnostic test accuracy and prevalence estimation based on finite sample data in the absence of a gold standard. Moreover, two tests are often applied simultaneously for the purpose of obtaining a 'joint' testing strategy that has either higher overall sensitivity or specificity than either of the two tests considered singly. Sequential versions of such strategies are often applied in order to reduce the cost of testing. We thus discuss joint (simultaneous and sequential) testing strategies and inference for them. Using the developed methods, we analyse two real and one simulated data sets, and we compare 'hypergeometric' and 'binomial-based' inferences. Our findings indicate that the posterior standard deviations for prevalence (but not sensitivity and specificity) based on finite population sampling tend to be smaller than their counterparts for infinite population sampling. Finally, we make recommendations about how small the sample size should be relative to the population size to warrant use of the binomial model for prevalence estimation. Copyright 2004 John Wiley & Sons, Ltd.
Erguler, Kamil; Stumpf, Michael P H
2011-05-01
The size and complexity of cellular systems make building predictive models an extremely difficult task. In principle dynamical time-course data can be used to elucidate the structure of the underlying molecular mechanisms, but a central and recurring problem is that many and very different models can be fitted to experimental data, especially when the latter are limited and subject to noise. Even given a model, estimating its parameters remains challenging in real-world systems. Here we present a comprehensive analysis of 180 systems biology models, which allows us to classify the parameters with respect to their contribution to the overall dynamical behaviour of the different systems. Our results reveal candidate elements of control in biochemical pathways that differentially contribute to dynamics. We introduce sensitivity profiles that concisely characterize parameter sensitivity and demonstrate how this can be connected to variability in data. Systematically linking data and model sloppiness allows us to extract features of dynamical systems that determine how well parameters can be estimated from time-course measurements, and associates the extent of data required for parameter inference with the model structure, and also with the global dynamical state of the system. The comprehensive analysis of so many systems biology models reaffirms the inability to estimate precisely most model or kinetic parameters as a generic feature of dynamical systems, and provides safe guidelines for performing better inferences and model predictions in the context of reverse engineering of mathematical models for biological systems.
Human Inferences about Sequences: A Minimal Transition Probability Model
2016-01-01
The brain constantly infers the causes of the inputs it receives and uses these inferences to generate statistical expectations about future observations. Experimental evidence for these expectations and their violations include explicit reports, sequential effects on reaction times, and mismatch or surprise signals recorded in electrophysiology and functional MRI. Here, we explore the hypothesis that the brain acts as a near-optimal inference device that constantly attempts to infer the time-varying matrix of transition probabilities between the stimuli it receives, even when those stimuli are in fact fully unpredictable. This parsimonious Bayesian model, with a single free parameter, accounts for a broad range of findings on surprise signals, sequential effects and the perception of randomness. Notably, it explains the pervasive asymmetry between repetitions and alternations encountered in those studies. Our analysis suggests that a neural machinery for inferring transition probabilities lies at the core of human sequence knowledge. PMID:28030543
Stan: A Probabilistic Programming Language for Bayesian Inference and Optimization
ERIC Educational Resources Information Center
Gelman, Andrew; Lee, Daniel; Guo, Jiqiang
2015-01-01
Stan is a free and open-source C++ program that performs Bayesian inference or optimization for arbitrary user-specified models and can be called from the command line, R, Python, Matlab, or Julia and has great promise for fitting large and complex statistical models in many areas of application. We discuss Stan from users' and developers'…
IMNN: Information Maximizing Neural Networks
NASA Astrophysics Data System (ADS)
Charnock, Tom; Lavaux, Guilhem; Wandelt, Benjamin D.
2018-04-01
This software trains artificial neural networks to find non-linear functionals of data that maximize Fisher information: information maximizing neural networks (IMNNs). As compressing large data sets vastly simplifies both frequentist and Bayesian inference, important information may be inadvertently missed. Likelihood-free inference based on automatically derived IMNN summaries produces summaries that are good approximations to sufficient statistics. IMNNs are robustly capable of automatically finding optimal, non-linear summaries of the data even in cases where linear compression fails: inferring the variance of Gaussian signal in the presence of noise, inferring cosmological parameters from mock simulations of the Lyman-α forest in quasar spectra, and inferring frequency-domain parameters from LISA-like detections of gravitational waveforms. In this final case, the IMNN summary outperforms linear data compression by avoiding the introduction of spurious likelihood maxima.
Anchoring quartet-based phylogenetic distances and applications to species tree reconstruction.
Sayyari, Erfan; Mirarab, Siavash
2016-11-11
Inferring species trees from gene trees using the coalescent-based summary methods has been the subject of much attention, yet new scalable and accurate methods are needed. We introduce DISTIQUE, a new statistically consistent summary method for inferring species trees from gene trees under the coalescent model. We generalize our results to arbitrary phylogenetic inference problems; we show that two arbitrarily chosen leaves, called anchors, can be used to estimate relative distances between all other pairs of leaves by inferring relevant quartet trees. This results in a family of distance-based tree inference methods, with running times ranging between quadratic to quartic in the number of leaves. We show in simulated studies that DISTIQUE has comparable accuracy to leading coalescent-based summary methods and reduced running times.
Improving Explanatory Inferences from Assessments
ERIC Educational Resources Information Center
Diakow, Ronli Phyllis
2013-01-01
This dissertation comprises three papers that propose, discuss, and illustrate models to make improved inferences about research questions regarding student achievement in education. Addressing the types of questions common in educational research today requires three different "extensions" to traditional educational assessment: (1)…