Data Analysis Techniques for Physical Scientists
NASA Astrophysics Data System (ADS)
Pruneau, Claude A.
2017-10-01
Preface; How to read this book; 1. The scientific method; Part I. Foundation in Probability and Statistics: 2. Probability; 3. Probability models; 4. Classical inference I: estimators; 5. Classical inference II: optimization; 6. Classical inference III: confidence intervals and statistical tests; 7. Bayesian inference; Part II. Measurement Techniques: 8. Basic measurements; 9. Event reconstruction; 10. Correlation functions; 11. The multiple facets of correlation functions; 12. Data correction methods; Part III. Simulation Techniques: 13. Monte Carlo methods; 14. Collision and detector modeling; List of references; Index.
Statistical Inference and Patterns of Inequality in the Global North
ERIC Educational Resources Information Center
Moran, Timothy Patrick
2006-01-01
Cross-national inequality trends have historically been a crucial field of inquiry across the social sciences, and new methodological techniques of statistical inference have recently improved the ability to analyze these trends over time. This paper applies Monte Carlo bootstrap inference methods to the income surveys of the Luxembourg Income…
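The bootstrap step described above can be sketched in a few lines of Python; the lognormal incomes, sample size, and choice of the Gini coefficient as the statistic below are illustrative assumptions, not the Luxembourg Income Study data.

    import numpy as np

    rng = np.random.default_rng(0)

    def gini(x):
        # Gini coefficient computed from the sorted incomes.
        x = np.sort(np.asarray(x, dtype=float))
        n = x.size
        cum = np.cumsum(x)
        return (n + 1 - 2 * np.sum(cum) / cum[-1]) / n

    incomes = rng.lognormal(mean=10.0, sigma=0.8, size=500)   # synthetic survey incomes

    # Nonparametric bootstrap: resample with replacement and recompute the statistic.
    boot = np.array([gini(rng.choice(incomes, size=incomes.size, replace=True))
                     for _ in range(2000)])
    lo, hi = np.percentile(boot, [2.5, 97.5])
    print(f"Gini = {gini(incomes):.3f}, 95% bootstrap CI = ({lo:.3f}, {hi:.3f})")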
Statistics for nuclear engineers and scientists. Part 1. Basic statistical inference
DOE Office of Scientific and Technical Information (OSTI.GOV)
Beggs, W.J.
1981-02-01
This report is intended for the use of engineers and scientists working in the nuclear industry, especially at the Bettis Atomic Power Laboratory. It serves as the basis for several Bettis in-house statistics courses. The objectives of the report are to introduce the reader to the language and concepts of statistics and to provide a basic set of techniques to apply to problems of the collection and analysis of data. Part 1 covers subjects of basic inference. The subjects include: descriptive statistics; probability; simple inference for normally distributed populations, and for non-normal populations as well; comparison of two populations; the analysis of variance; quality control procedures; and linear regression analysis.
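A minimal Python sketch of the basic procedures listed for Part 1 (comparison of two populations, analysis of variance, and linear regression), using simulated data rather than any data set from the report:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    a = rng.normal(10.0, 2.0, size=30)   # sample from population A
    b = rng.normal(11.0, 2.0, size=30)   # sample from population B
    c = rng.normal(10.5, 2.0, size=30)   # a third group for the ANOVA

    # Comparison of two normally distributed populations (two-sample t-test).
    t_stat, p_t = stats.ttest_ind(a, b)

    # Non-normal alternative: rank-based Mann-Whitney U test.
    u_stat, p_u = stats.mannwhitneyu(a, b)

    # One-way analysis of variance across the three groups.
    f_stat, p_f = stats.f_oneway(a, b, c)

    # Simple linear regression of a response on a predictor.
    x = rng.uniform(0.0, 10.0, size=50)
    y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=50)
    fit = stats.linregress(x, y)
    print(p_t, p_u, p_f, fit.slope, fit.pvalue)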
Data mining and statistical inference in selective laser melting
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kamath, Chandrika
2016-01-11
Selective laser melting (SLM) is an additive manufacturing process that builds a complex three-dimensional part, layer-by-layer, using a laser beam to fuse fine metal powder together. The design freedom afforded by SLM comes associated with complexity. As the physical phenomena occur over a broad range of length and time scales, the computational cost of modeling the process is high. At the same time, the large number of parameters that control the quality of a part make experiments expensive. In this paper, we describe ways in which we can use data mining and statistical inference techniques to intelligently combine simulations and experiments to build parts with desired properties. We start with a brief summary of prior work in finding process parameters for high-density parts. We then expand on this work to show how we can improve the approach by using feature selection techniques to identify important variables, data-driven surrogate models to reduce computational costs, improved sampling techniques to cover the design space adequately, and uncertainty analysis for statistical inference. Here, our results indicate that techniques from data mining and statistics can complement those from physical modeling to provide greater insight into complex processes such as selective laser melting.
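A hedged sketch of how space-filling sampling, feature selection, and a data-driven surrogate fit together; the process variables, their ranges, and the stand-in simulation function are invented for illustration and are not the SLM models used in the paper.

    import numpy as np
    from scipy.stats import qmc
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.gaussian_process import GaussianProcessRegressor

    rng = np.random.default_rng(2)

    # Space-filling (Latin hypercube) design over laser power, scan speed, and beam size.
    sampler = qmc.LatinHypercube(d=3, seed=0)
    X = qmc.scale(sampler.random(n=60), [100.0, 500.0, 50.0], [400.0, 2500.0, 120.0])

    def simulate(x):
        # Stand-in for an expensive melt-pool simulation returning a depth-like response.
        power, speed, beam = x
        return power / (speed ** 0.5 * beam ** 0.3) + rng.normal(0.0, 0.01)

    y = np.array([simulate(x) for x in X])

    # Feature selection: rank the input variables by importance.
    rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
    print("importances:", rf.feature_importances_)

    # Data-driven surrogate: a cheap emulator with built-in uncertainty estimates.
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    pred, sd = gp.predict(X[:3], return_std=True)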
Variation in reaction norms: Statistical considerations and biological interpretation.
Morrissey, Michael B; Liefting, Maartje
2016-09-01
Analysis of reaction norms, the functions by which the phenotype produced by a given genotype depends on the environment, is critical to studying many aspects of phenotypic evolution. Different techniques are available for quantifying different aspects of reaction norm variation. We examine what biological inferences can be drawn from some of the more readily applicable analyses for studying reaction norms. We adopt a strongly biologically motivated view, but draw on statistical theory to highlight strengths and drawbacks of different techniques. In particular, consideration of some formal statistical theory leads to revision of some recently, and forcefully, advocated opinions on reaction norm analysis. We clarify what simple analysis of the slope between mean phenotype in two environments can tell us about reaction norms, explore the conditions under which polynomial regression can provide robust inferences about reaction norm shape, and explore how different existing approaches may be used to draw inferences about variation in reaction norm shape. We show how mixed model-based approaches can provide more robust inferences than more commonly used multistep statistical approaches, and derive new metrics of the relative importance of variation in reaction norm intercepts, slopes, and curvatures. © 2016 The Author(s). Evolution © 2016 The Society for the Study of Evolution.
Simulation and statistics: Like rhythm and song
NASA Astrophysics Data System (ADS)
Othman, Abdul Rahman
2013-04-01
Simulation has been introduced to solve problems in the form of systems. By using this technique, two kinds of problems can be overcome. First, a problem that has an analytical solution but for which the cost of running an experiment is high in terms of money and lives. Second, a problem that exists but has no analytical solution. In the field of statistical inference the second problem is often encountered. With the advent of high-speed computing devices, a statistician can now use resampling techniques such as the bootstrap and permutation methods to form a pseudo sampling distribution that leads to the solution of a problem that cannot be solved analytically. This paper discusses how Monte Carlo simulation was, and still is, used to verify analytical solutions in inference. This paper also discusses resampling techniques as simulation techniques. The misunderstandings about these two techniques are examined. The successful usages of both techniques are also explained.
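A minimal sketch of the first use described above, namely Monte Carlo simulation verifying an analytical result in inference, here checking the nominal 95% coverage of the t-interval (the sample size and replication count are arbitrary):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    n, reps, mu = 15, 10000, 5.0
    hits = 0
    for _ in range(reps):
        x = rng.normal(mu, 3.0, size=n)                 # one pseudo-sample
        half = stats.t.ppf(0.975, df=n - 1) * x.std(ddof=1) / np.sqrt(n)
        hits += (x.mean() - half) <= mu <= (x.mean() + half)
    print("empirical coverage:", hits / reps)           # should be close to the analytical 0.95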
The Use of a Context-Based Information Retrieval Technique
2009-07-01
provided in context. Latent Semantic Analysis (LSA) is a statistical technique for inferring contextual and structural information, and previous studies...WAIS). 1.4.4 Latent Semantic Analysis LSA, which is also known as latent semantic indexing (LSI), uses a statistical and...1.4.6 Language Models In contrast, natural language models apply algorithms that combine statistical information with semantic information. Semantic
Information Entropy Production of Maximum Entropy Markov Chains from Spike Trains
NASA Astrophysics Data System (ADS)
Cofré, Rodrigo; Maldonado, Cesar
2018-01-01
We consider the maximum entropy Markov chain inference approach to characterize the collective statistics of neuronal spike trains, focusing on the statistical properties of the inferred model. We review large deviations techniques useful in this context to describe properties of accuracy and convergence in terms of sampling size. We use these results to study the statistical fluctuation of correlations, distinguishability and irreversibility of maximum entropy Markov chains. We illustrate these applications using simple examples where the large deviation rate function is explicitly obtained for maximum entropy models of relevance in this field.
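For orientation only, the basic quantities involved (an empirical first-order Markov chain fitted to a binarized spike train, its entropy rate, and its information entropy production) can be computed as below; this sketch uses a surrogate spike train and none of the large-deviation machinery of the paper.

    import numpy as np

    rng = np.random.default_rng(4)
    spikes = (rng.random(20000) < 0.2).astype(int)      # surrogate binary spike train

    # Empirical transition matrix P[i, j] = Pr(next = j | current = i).
    counts = np.zeros((2, 2))
    for a, b in zip(spikes[:-1], spikes[1:]):
        counts[a, b] += 1
    P = counts / counts.sum(axis=1, keepdims=True)

    # Stationary distribution: left eigenvector of P for eigenvalue 1.
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmax(np.real(w))])
    pi = pi / pi.sum()

    # Entropy rate and information entropy production of the inferred chain.
    h = -sum(pi[i] * P[i, j] * np.log(P[i, j])
             for i in range(2) for j in range(2) if P[i, j] > 0)
    sigma = sum(pi[i] * P[i, j] * np.log(pi[i] * P[i, j] / (pi[j] * P[j, i]))
                for i in range(2) for j in range(2) if P[i, j] > 0 and P[j, i] > 0)
    print("entropy rate:", h, "entropy production:", sigma)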
Statistical Inference for Big Data Problems in Molecular Biophysics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ramanathan, Arvind; Savol, Andrej; Burger, Virginia
2012-01-01
We highlight the role of statistical inference techniques in providing biological insights from analyzing long time-scale molecular simulation data. Technological and algorithmic improvements in computation have brought molecular simulations to the forefront of techniques applied to investigating the basis of living systems. While these longer, increasingly complex simulations, presently reaching petabyte scales, promise a detailed view into microscopic behavior, teasing out the important information has now become a true challenge on its own. Mining this data for important patterns is critical to automating therapeutic intervention discovery, improving protein design, and fundamentally understanding the mechanistic basis of cellular homeostasis.
Feder, Paul I; Ma, Zhenxu J; Bull, Richard J; Teuschler, Linda K; Rice, Glenn
2009-01-01
In chemical mixtures risk assessment, the use of dose-response data developed for one mixture to estimate risk posed by a second mixture depends on whether the two mixtures are sufficiently similar. While evaluations of similarity may be made using qualitative judgments, this article uses nonparametric statistical methods based on the "bootstrap" resampling technique to address the question of similarity among mixtures of chemical disinfectant by-products (DBP) in drinking water. The bootstrap resampling technique is a general-purpose, computer-intensive approach to statistical inference that substitutes empirical sampling for theoretically based parametric mathematical modeling. Nonparametric, bootstrap-based inference involves fewer assumptions than parametric normal theory based inference. The bootstrap procedure is appropriate, at least in an asymptotic sense, whether or not the parametric, distributional assumptions hold, even approximately. The statistical analysis procedures in this article are initially illustrated with data from 5 water treatment plants (Schenck et al., 2009), and then extended using data developed from a study of 35 drinking-water utilities (U.S. EPA/AMWA, 1989), which permits inclusion of a greater number of water constituents and increased structure in the statistical models.
The Role of the Sampling Distribution in Understanding Statistical Inference
ERIC Educational Resources Information Center
Lipson, Kay
2003-01-01
Many statistics educators believe that few students develop the level of conceptual understanding essential for them to apply correctly the statistical techniques at their disposal and to interpret their outcomes appropriately. It is also commonly believed that the sampling distribution plays an important role in developing this understanding.…
In defence of model-based inference in phylogeography
Beaumont, Mark A.; Nielsen, Rasmus; Robert, Christian; Hey, Jody; Gaggiotti, Oscar; Knowles, Lacey; Estoup, Arnaud; Panchal, Mahesh; Corander, Jukka; Hickerson, Mike; Sisson, Scott A.; Fagundes, Nelson; Chikhi, Lounès; Beerli, Peter; Vitalis, Renaud; Cornuet, Jean-Marie; Huelsenbeck, John; Foll, Matthieu; Yang, Ziheng; Rousset, Francois; Balding, David; Excoffier, Laurent
2017-01-01
Recent papers have promoted the view that model-based methods in general, and those based on Approximate Bayesian Computation (ABC) in particular, are flawed in a number of ways, and are therefore inappropriate for the analysis of phylogeographic data. These papers further argue that Nested Clade Phylogeographic Analysis (NCPA) offers the best approach in statistical phylogeography. In order to remove the confusion and misconceptions introduced by these papers, we justify and explain the reasoning behind model-based inference. We argue that ABC is a statistically valid approach, alongside other computational statistical techniques that have been successfully used to infer parameters and compare models in population genetics. We also examine the NCPA method and highlight numerous deficiencies, whether it is used with single or multiple loci. We further show that the ages of clades are carelessly used to infer the ages of demographic events, and that these ages are estimated under a simple model of panmixia and population stationarity but are then used under different and unspecified models to test hypotheses, a usage that invalidates these testing procedures. We conclude by encouraging researchers to study and use model-based inference in population genetics. PMID:29284924
Statistical analysis of fNIRS data: a comprehensive review.
Tak, Sungho; Ye, Jong Chul
2014-01-15
Functional near-infrared spectroscopy (fNIRS) is a non-invasive method to measure brain activities using the changes of optical absorption in the brain through the intact skull. fNIRS has many advantages over other neuroimaging modalities such as positron emission tomography (PET), functional magnetic resonance imaging (fMRI), or magnetoencephalography (MEG), since it can directly measure blood oxygenation level changes related to neural activation with high temporal resolution. However, fNIRS signals are highly corrupted by measurement noises and physiology-based systemic interference. Careful statistical analyses are therefore required to extract neuronal activity-related signals from fNIRS data. In this paper, we provide an extensive review of historical developments of statistical analyses of fNIRS signal, which include motion artifact correction, short source-detector separation correction, principal component analysis (PCA)/independent component analysis (ICA), false discovery rate (FDR), serially-correlated errors, as well as inference techniques such as the standard t-test, F-test, analysis of variance (ANOVA), and statistical parameter mapping (SPM) framework. In addition, to provide a unified view of various existing inference techniques, we explain a linear mixed effect model with restricted maximum likelihood (ReML) variance estimation, and show that most of the existing inference methods for fNIRS analysis can be derived as special cases. Some of the open issues in statistical analysis are also described. Copyright © 2013 Elsevier Inc. All rights reserved.
Assessing dynamics, spatial scale, and uncertainty in task-related brain network analyses
Stephen, Emily P.; Lepage, Kyle Q.; Eden, Uri T.; Brunner, Peter; Schalk, Gerwin; Brumberg, Jonathan S.; Guenther, Frank H.; Kramer, Mark A.
2014-01-01
The brain is a complex network of interconnected elements, whose interactions evolve dynamically in time to cooperatively perform specific functions. A common technique to probe these interactions involves multi-sensor recordings of brain activity during a repeated task. Many techniques exist to characterize the resulting task-related activity, including establishing functional networks, which represent the statistical associations between brain areas. Although functional network inference is commonly employed to analyze neural time series data, techniques to assess the uncertainty—both in the functional network edges and the corresponding aggregate measures of network topology—are lacking. To address this, we describe a statistically principled approach for computing uncertainty in functional networks and aggregate network measures in task-related data. The approach is based on a resampling procedure that utilizes the trial structure common in experimental recordings. We show in simulations that this approach successfully identifies functional networks and associated measures of confidence emergent during a task in a variety of scenarios, including dynamically evolving networks. In addition, we describe a principled technique for establishing functional networks based on predetermined regions of interest using canonical correlation. Doing so provides additional robustness to the functional network inference. Finally, we illustrate the use of these methods on example invasive brain voltage recordings collected during an overt speech task. The general strategy described here—appropriate for static and dynamic network inference and different statistical measures of coupling—permits the evaluation of confidence in network measures in a variety of settings common to neuroscience. PMID:24678295
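A stripped-down version of the trial-resampling idea, quantifying uncertainty in a correlation-based functional network and in an aggregate topology measure; the surrogate data, threshold, and density measure are assumptions rather than the authors' exact procedure.

    import numpy as np

    rng = np.random.default_rng(5)
    n_trials, n_sensors, n_time = 60, 8, 200
    data = rng.normal(size=(n_trials, n_sensors, n_time))    # surrogate task-related recordings

    def network(trials):
        # Functional network: correlations between sensors of the trial-averaged signal.
        return np.corrcoef(trials.mean(axis=0))

    def density(c, thresh=0.2):
        # Aggregate measure: fraction of off-diagonal edges exceeding the threshold.
        mask = ~np.eye(n_sensors, dtype=bool)
        return np.mean(np.abs(c[mask]) > thresh)

    # Resample trials with replacement to get confidence bounds on the network measure.
    boot = [density(network(data[rng.integers(0, n_trials, n_trials)]))
            for _ in range(500)]
    print("network density 95% interval:", np.percentile(boot, [2.5, 97.5]))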
Reading biological processes from nucleotide sequences
NASA Astrophysics Data System (ADS)
Murugan, Anand
Cellular processes have traditionally been investigated by techniques of imaging and biochemical analysis of the molecules involved. The recent rapid progress in our ability to manipulate and read nucleic acid sequences gives us direct access to the genetic information that directs and constrains biological processes. While sequence data is being used widely to investigate genotype-phenotype relationships and population structure, here we use sequencing to understand biophysical mechanisms. We present work on two different systems. First, in chapter 2, we characterize the stochastic genetic editing mechanism that produces diverse T-cell receptors in the human immune system. We do this by inferring statistical distributions of the underlying biochemical events that generate T-cell receptor coding sequences from the statistics of the observed sequences. This inferred model quantitatively describes the potential repertoire of T-cell receptors that can be produced by an individual, providing insight into its potential diversity and the probability of generation of any specific T-cell receptor. Then in chapter 3, we present work on understanding the functioning of regulatory DNA sequences in both prokaryotes and eukaryotes. Here we use experiments that measure the transcriptional activity of large libraries of mutagenized promoters and enhancers and infer models of the sequence-function relationship from this data. For the bacterial promoter, we infer a physically motivated 'thermodynamic' model of the interaction of DNA-binding proteins and RNA polymerase determining the transcription rate of the downstream gene. For the eukaryotic enhancers, we infer heuristic models of the sequence-function relationship and use these models to find synthetic enhancer sequences that optimize inducibility of expression. Both projects demonstrate the utility of sequence information in conjunction with sophisticated statistical inference techniques for dissecting underlying biophysical mechanisms.
Johnson, Jason K.; Oyen, Diane Adele; Chertkov, Michael; ...
2016-12-01
Inference and learning of graphical models are both well-studied problems in statistics and machine learning that have found many applications in science and engineering. However, exact inference is intractable in general graphical models, which suggests the problem of seeking the best approximation to a collection of random variables within some tractable family of graphical models. In this paper, we focus on the class of planar Ising models, for which exact inference is tractable using techniques of statistical physics. Based on these techniques and recent methods for planarity testing and planar embedding, we propose a greedy algorithm for learning the best planar Ising model to approximate an arbitrary collection of binary random variables (possibly from sample data). Given the set of all pairwise correlations among variables, we select a planar graph and optimal planar Ising model defined on this graph to best approximate that set of correlations. Finally, we demonstrate our method in simulations and for two applications: modeling senate voting records and identifying geo-chemical depth trends from Mars rover data.
On statistical inference in time series analysis of the evolution of road safety.
Commandeur, Jacques J F; Bijleveld, Frits D; Bergel-Hayat, Ruth; Antoniou, Constantinos; Yannis, George; Papadimitriou, Eleonora
2013-11-01
Data collected for building a road safety observatory usually include observations made sequentially through time. Examples of such data, called time series data, include annual (or monthly) number of road traffic accidents, traffic fatalities or vehicle kilometers driven in a country, as well as the corresponding values of safety performance indicators (e.g., data on speeding, seat belt use, alcohol use, etc.). Some commonly used statistical techniques imply assumptions that are often violated by the special properties of time series data, namely serial dependency among disturbances associated with the observations. The first objective of this paper is to demonstrate the impact of such violations to the applicability of standard methods of statistical inference, which leads to an under or overestimation of the standard error and consequently may produce erroneous inferences. Moreover, having established the adverse consequences of ignoring serial dependency issues, the paper aims to describe rigorous statistical techniques used to overcome them. In particular, appropriate time series analysis techniques of varying complexity are employed to describe the development over time, relating the accident-occurrences to explanatory factors such as exposure measures or safety performance indicators, and forecasting the development into the near future. Traditional regression models (whether they are linear, generalized linear or nonlinear) are shown not to naturally capture the inherent dependencies in time series data. Dedicated time series analysis techniques, such as the ARMA-type and DRAG approaches are discussed next, followed by structural time series models, which are a subclass of state space methods. The paper concludes with general recommendations and practice guidelines for the use of time series models in road safety research. Copyright © 2012 Elsevier Ltd. All rights reserved.
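The core point, that serial dependence in the disturbances distorts naive regression inference while a dedicated time-series model accounts for it, can be illustrated with statsmodels on simulated monthly counts (the trend, AR coefficient, and noise level below are arbitrary):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.stattools import durbin_watson
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(6)
    t = np.arange(120)                                  # e.g. 120 monthly observations
    e = np.zeros(120)
    for i in range(1, 120):                             # AR(1) disturbances: serial dependence
        e[i] = 0.7 * e[i - 1] + rng.normal(0.0, 5.0)
    y = 300.0 - 0.8 * t + e                             # declining accident counts

    # Naive OLS trend fit: the Durbin-Watson statistic flags the autocorrelation.
    ols = sm.OLS(y, sm.add_constant(t)).fit()
    print("OLS slope s.e.:", ols.bse[1], "Durbin-Watson:", durbin_watson(ols.resid))

    # Regression with AR(1) errors: standard errors account for the dependence.
    arma = ARIMA(y, exog=t, order=(1, 0, 0)).fit()
    print("AR(1)-error model standard errors:", arma.bse)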
Pointwise probability reinforcements for robust statistical inference.
Frénay, Benoît; Verleysen, Michel
2014-02-01
Statistical inference using machine learning techniques may be difficult with small datasets because of abnormally frequent data (AFDs). AFDs are observations that are much more frequent in the training sample than they should be, with respect to their theoretical probability, and include, e.g., outliers. Estimates of parameters tend to be biased towards models which support such data. This paper proposes to introduce pointwise probability reinforcements (PPRs): the probability of each observation is reinforced by a PPR, and a regularisation allows controlling the amount of reinforcement, which compensates for AFDs. The proposed solution is very generic, since it can be used to robustify any statistical inference method which can be formulated as a likelihood maximisation. Experiments show that PPRs can be easily used to tackle regression, classification and projection: models are freed from the influence of outliers. Moreover, outliers can be filtered manually since an abnormality degree is obtained for each observation. Copyright © 2013 Elsevier Ltd. All rights reserved.
Balancing Treatment and Control Groups in Quasi-Experiments: An Introduction to Propensity Scoring
ERIC Educational Resources Information Center
Connelly, Brian S.; Sackett, Paul R.; Waters, Shonna D.
2013-01-01
Organizational and applied sciences have long struggled with improving causal inference in quasi-experiments. We introduce organizational researchers to propensity scoring, a statistical technique that has become popular in other applied sciences as a means for improving internal validity. Propensity scoring statistically models how individuals in…
Secondary Analysis of National Longitudinal Transition Study 2 Data
ERIC Educational Resources Information Center
Hicks, Tyler A.; Knollman, Greg A.
2015-01-01
This review examines published secondary analyses of National Longitudinal Transition Study 2 (NLTS2) data, with a primary focus upon statistical objectives, paradigms, inferences, and methods. Its primary purpose was to determine which statistical techniques have been common in secondary analyses of NLTS2 data. The review begins with an…
Phylogeography Takes a Relaxed Random Walk in Continuous Space and Time
Lemey, Philippe; Rambaut, Andrew; Welch, John J.; Suchard, Marc A.
2010-01-01
Research aimed at understanding the geographic context of evolutionary histories is burgeoning across biological disciplines. Recent endeavors attempt to interpret contemporaneous genetic variation in the light of increasingly detailed geographical and environmental observations. Such interest has promoted the development of phylogeographic inference techniques that explicitly aim to integrate such heterogeneous data. One promising development involves reconstructing phylogeographic history on a continuous landscape. Here, we present a Bayesian statistical approach to infer continuous phylogeographic diffusion using random walk models while simultaneously reconstructing the evolutionary history in time from molecular sequence data. Moreover, by accommodating branch-specific variation in dispersal rates, we relax the most restrictive assumption of the standard Brownian diffusion process and demonstrate increased statistical efficiency in spatial reconstructions of overdispersed random walks by analyzing both simulated and real viral genetic data. We further illustrate how drawing inference about summary statistics from a fully specified stochastic process over both sequence evolution and spatial movement reveals important characteristics of a rabies epidemic. Together with recent advances in discrete phylogeographic inference, the continuous model developments furnish a flexible statistical framework for biogeographical reconstructions that is easily expanded upon to accommodate various landscape genetic features. PMID:20203288
White, H; Racine, J
2001-01-01
We propose tests for individual and joint irrelevance of network inputs. Such tests can be used to determine whether an input or group of inputs "belong" in a particular model, thus permitting valid statistical inference based on estimated feedforward neural-network models. The approaches employ well-known statistical resampling techniques. We conduct a small Monte Carlo experiment showing that our tests have reasonable level and power behavior, and we apply our methods to examine whether there are predictable regularities in foreign exchange rates. We find that exchange rates do appear to contain information that is exploitable for enhanced point prediction, but the nature of the predictive relations evolves through time.
ERIC Educational Resources Information Center
Harder, Valerie S.; Stuart, Elizabeth A.; Anthony, James C.
2010-01-01
There is considerable interest in using propensity score (PS) statistical techniques to address questions of causal inference in psychological research. Many PS techniques exist, yet few guidelines are available to aid applied researchers in their understanding, use, and evaluation. In this study, the authors give an overview of available…
Exploring Tree Age & Diameter to Illustrate Sample Design & Inference in Observational Ecology
ERIC Educational Resources Information Center
Casady, Grant M.
2015-01-01
Undergraduate biology labs often explore the techniques of data collection but neglect the statistical framework necessary to express findings. Students can be confused about how to use their statistical knowledge to address specific biological questions. Growth in the area of observational ecology requires that students gain experience in…
Gene-network inference by message passing
NASA Astrophysics Data System (ADS)
Braunstein, A.; Pagnani, A.; Weigt, M.; Zecchina, R.
2008-01-01
The inference of gene-regulatory processes from gene-expression data belongs to the major challenges of computational systems biology. Here we address the problem from a statistical-physics perspective and develop a message-passing algorithm which is able to infer sparse, directed and combinatorial regulatory mechanisms. Using the replica technique, the algorithmic performance can be characterized analytically for artificially generated data. The algorithm is applied to genome-wide expression data of baker's yeast under various environmental conditions. We find clear cases of combinatorial control, and enrichment in common functional annotations of regulated genes and their regulators.
Statistical Inference at Work: Statistical Process Control as an Example
ERIC Educational Resources Information Center
Bakker, Arthur; Kent, Phillip; Derry, Jan; Noss, Richard; Hoyles, Celia
2008-01-01
To characterise statistical inference in the workplace this paper compares a prototypical type of statistical inference at work, statistical process control (SPC), with a type of statistical inference that is better known in educational settings, hypothesis testing. Although there are some similarities between the reasoning structure involved in…
Space-Time Data fusion for Remote Sensing Applications
NASA Technical Reports Server (NTRS)
Braverman, Amy; Nguyen, H.; Cressie, N.
2011-01-01
NASA has been collecting massive amounts of remote sensing data about Earth's systems for more than a decade. Missions are selected to be complementary in quantities measured, retrieval techniques, and sampling characteristics, so these datasets are highly synergistic. To fully exploit this, a rigorous methodology for combining data with heterogeneous sampling characteristics is required. For scientific purposes, the methodology must also provide quantitative measures of uncertainty that propagate input-data uncertainty appropriately. We view this as a statistical inference problem. The true but not directly observed quantities form a vector-valued field continuous in space and time. Our goal is to infer those true values or some function of them, and to provide uncertainty quantification for those inferences. We use a spatiotemporal statistical model that relates the unobserved quantities of interest at point-level to the spatially aggregated, observed data. We describe and illustrate our method using CO2 data from two NASA data sets.
Weigh-in-Motion Sensor and Controller Operation and Performance Comparison
DOT National Transportation Integrated Search
2018-01-01
This research project utilized statistical inference and comparison techniques to compare the performance of different Weigh-in-Motion (WIM) sensors. First, we analyzed test-vehicle data to perform an accuracy check of the results reported by the sen...
Inferring epidemiological parameters from phylogenies using regression-ABC: A comparative study
Gascuel, Olivier
2017-01-01
Inferring epidemiological parameters such as the R0 from time-scaled phylogenies is a timely challenge. Most current approaches rely on likelihood functions, which raise specific issues that range from computing these functions to finding their maxima numerically. Here, we present a new regression-based Approximate Bayesian Computation (ABC) approach, which we base on a large variety of summary statistics intended to capture the information contained in the phylogeny and its corresponding lineage-through-time plot. The regression step involves the Least Absolute Shrinkage and Selection Operator (LASSO) method, which is a robust machine learning technique. It allows us to readily deal with the large number of summary statistics, while avoiding resorting to Markov Chain Monte Carlo (MCMC) techniques. To compare our approach to existing ones, we simulated target trees under a variety of epidemiological models and settings, and inferred parameters of interest using the same priors. We found that, for large phylogenies, the accuracy of our regression-ABC is comparable to that of likelihood-based approaches involving birth-death processes implemented in BEAST2. Our approach even outperformed these when inferring the host population size with a Susceptible-Infected-Removed epidemiological model. It also clearly outperformed a recent kernel-ABC approach when assuming a Susceptible-Infected epidemiological model with two host types. Lastly, by re-analyzing data from the early stages of the recent Ebola epidemic in Sierra Leone, we showed that regression-ABC provides more realistic estimates for the duration parameters (latency and infectiousness) than the likelihood-based method. Overall, ABC based on a large variety of summary statistics and a regression method able to perform variable selection and avoid overfitting is a promising approach to analyze large phylogenies. PMID:28263987
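Stripped of the phylodynamic specifics, the regression-ABC recipe (simulate from the prior, compute summary statistics, regress parameters on summaries, apply the fit to the observed summaries) can be sketched with a toy one-parameter model; the model, summaries, and prior below are illustrative assumptions.

    import numpy as np
    from sklearn.linear_model import LassoCV

    rng = np.random.default_rng(7)

    def simulate(theta):
        # Toy stochastic model standing in for the epidemic/phylogeny simulator.
        x = rng.exponential(scale=1.0 / theta, size=200)
        return np.array([x.mean(), x.std(), np.median(x), np.percentile(x, 90)])

    # 1. Draw parameters from the prior and simulate their summary statistics.
    thetas = rng.uniform(0.5, 5.0, size=3000)
    S = np.array([simulate(t) for t in thetas])

    # 2. Regression step: learn theta as a (sparse) function of the summaries.
    reg = LassoCV(cv=5).fit(S, thetas)

    # 3. Apply the learned map to the observed summaries to estimate the parameter.
    observed = simulate(2.0)       # pretend these summaries came from the real data
    print("estimated theta:", reg.predict(observed.reshape(1, -1))[0])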
Investigation of Statistical Inference Methodologies Through Scale Model Propagation Experiments
2015-09-30
statistical inference methodologies for ocean-acoustic problems by investigating and applying statistical methods to data collected from scale-model...to begin planning experiments for statistical inference applications. APPROACH In the ocean acoustics community over the past two decades...solutions for waveguide parameters. With the introduction of statistical inference to the field of ocean acoustics came the desire to interpret marginal
duVerle, David A; Yotsukura, Sohiya; Nomura, Seitaro; Aburatani, Hiroyuki; Tsuda, Koji
2016-09-13
Single-cell RNA sequencing is fast becoming one of the standard methods for gene expression measurement, providing unique insights into cellular processes. A number of methods, based on general dimensionality reduction techniques, have been suggested to help infer and visualise the underlying structure of cell populations from single-cell expression levels, yet their models generally lack proper biological grounding and struggle at identifying complex differentiation paths. Here we introduce cellTree: an R/Bioconductor package that uses a novel statistical approach, based on document analysis techniques, to produce tree structures outlining the hierarchical relationship between single-cell samples, while identifying latent groups of genes that can provide biological insights. With cellTree, we provide experimentalists with an easy-to-use tool, based on statistically and biologically sound algorithms, to efficiently explore and visualise single-cell RNA data. The cellTree package is publicly available in the online Bioconductor repository at: http://bioconductor.org/packages/cellTree/ .
Statistical Inference of a RANS closure for a Jet-in-Crossflow simulation
NASA Astrophysics Data System (ADS)
Heyse, Jan; Edeling, Wouter; Iaccarino, Gianluca
2016-11-01
The jet-in-crossflow is found in several engineering applications, such as discrete film cooling for turbine blades, where a coolant injected through holes in the blade's surface protects the component from the hot gases leaving the combustion chamber. Experimental measurements using MRI techniques have been completed for a single-hole injection into a turbulent crossflow, providing a full 3D averaged velocity field. For such flows of engineering interest, Reynolds-Averaged Navier-Stokes (RANS) turbulence closure models are often the only viable computational option. However, RANS models are known to provide poor predictions in the region close to the injection point. Since these models are calibrated on simple canonical flow problems, the obtained closure coefficient estimates are unlikely to extrapolate well to more complex flows. We will therefore calibrate the parameters of a RANS model using statistical inference techniques informed by the experimental jet-in-crossflow data. The obtained probabilistic parameter estimates can in turn be used to compute flow fields with quantified uncertainty. Stanford Graduate Fellowship in Science and Engineering.
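The calibration step can be sketched generically with a random-walk Metropolis sampler; the algebraic model, prior bounds, and noise level below are placeholders, not the actual RANS closure or the MRI data.

    import numpy as np

    rng = np.random.default_rng(8)

    def model(c, x):
        # Placeholder for a RANS prediction as a function of a closure coefficient c.
        return c * x / (1.0 + x)

    x_obs = np.linspace(0.1, 5.0, 30)
    y_obs = model(1.8, x_obs) + rng.normal(0.0, 0.05, size=x_obs.size)   # synthetic data

    def log_post(c, sigma=0.05):
        if not 0.5 < c < 3.0:                      # uniform prior bounds (assumed)
            return -np.inf
        r = y_obs - model(c, x_obs)
        return -0.5 * np.sum((r / sigma) ** 2)

    # Random-walk Metropolis: posterior samples of the coefficient, hence quantified uncertainty.
    samples, c, lp = [], 1.0, log_post(1.0)
    for _ in range(20000):
        prop = c + rng.normal(0.0, 0.05)
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:
            c, lp = prop, lp_prop
        samples.append(c)
    post = np.array(samples[5000:])
    print("posterior mean:", post.mean(), "95% interval:", np.percentile(post, [2.5, 97.5]))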
Basis function models for animal movement
Hooten, Mevin B.; Johnson, Devin S.
2017-01-01
Advances in satellite-based data collection techniques have served as a catalyst for new statistical methodology to analyze these data. In wildlife ecological studies, satellite-based data and methodology have provided a wealth of information about animal space use and the investigation of individual-based animal–environment relationships. With the technology for data collection improving dramatically over time, we are left with massive archives of historical animal telemetry data of varying quality. While many contemporary statistical approaches for inferring movement behavior are specified in discrete time, we develop a flexible continuous-time stochastic integral equation framework that is amenable to reduced-rank second-order covariance parameterizations. We demonstrate how the associated first-order basis functions can be constructed to mimic behavioral characteristics in realistic trajectory processes using telemetry data from mule deer and mountain lion individuals in western North America. Our approach is parallelizable and provides inference for heterogeneous trajectories using nonstationary spatial modeling techniques that are feasible for large telemetry datasets. Supplementary materials for this article are available online.
Strelioff, Christopher C; Crutchfield, James P; Hübler, Alfred W
2007-07-01
Markov chains are a natural and well understood tool for describing one-dimensional patterns in time or space. We show how to infer kth order Markov chains, for arbitrary k, from finite data by applying Bayesian methods to both parameter estimation and model-order selection. Extending existing results for multinomial models of discrete data, we connect inference to statistical mechanics through information-theoretic (type theory) techniques. We establish a direct relationship between Bayesian evidence and the partition function which allows for straightforward calculation of the expectation and variance of the conditional relative entropy and the source entropy rate. Finally, we introduce a method that uses finite data-size scaling with model-order comparison to infer the structure of out-of-class processes.
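A compact version of Bayesian model-order selection for Markov chains, scoring each order k by its Dirichlet-multinomial evidence, is sketched below on a surrogate binary sequence; the prior strength and the sequence itself are assumptions for illustration.

    import numpy as np
    from collections import Counter
    from scipy.special import gammaln

    rng = np.random.default_rng(9)
    seq = [0, 1]
    for _ in range(5000):                       # surrogate sequence with second-order structure
        p = 0.9 if seq[-2] != seq[-1] else 0.2
        seq.append(int(rng.random() < p))
    seq = np.array(seq)

    def log_evidence(seq, k, alpha=0.5, m=2):
        # Dirichlet-multinomial marginal likelihood of an order-k Markov chain.
        counts = Counter((tuple(seq[i:i + k]), seq[i + k]) for i in range(len(seq) - k))
        contexts = {}
        for (ctx, sym), n in counts.items():
            contexts.setdefault(ctx, np.zeros(m))[sym] = n
        total = 0.0
        for n in contexts.values():
            total += (gammaln(m * alpha) - gammaln(m * alpha + n.sum())
                      + np.sum(gammaln(alpha + n) - gammaln(alpha)))
        return total

    for k in range(4):                          # the evidence should peak at the true order k = 2
        print("order", k, "log evidence:", round(log_evidence(seq, k), 1))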
Schmidt, Paul; Schmid, Volker J; Gaser, Christian; Buck, Dorothea; Bührlen, Susanne; Förschler, Annette; Mühlau, Mark
2013-01-01
Aiming at iron-related T2-hypointensity, which is related to normal aging and neurodegenerative processes, we here present two practicable approaches, based on Bayesian inference, for preprocessing and statistical analysis of a complex set of structural MRI data. In particular, Markov Chain Monte Carlo methods were used to simulate posterior distributions. First, we rendered a segmentation algorithm that uses outlier detection based on model checking techniques within a Bayesian mixture model. Second, we rendered an analytical tool comprising a Bayesian regression model with smoothness priors (in the form of Gaussian Markov random fields) mitigating the necessity to smooth data prior to statistical analysis. For validation, we used simulated data and MRI data of 27 healthy controls (age: [Formula: see text]; range, [Formula: see text]). We first observed robust segmentation of both simulated T2-hypointensities and gray-matter regions known to be T2-hypointense. Second, simulated data and images of segmented T2-hypointensity were analyzed. We found not only robust identification of simulated effects but also a biologically plausible age-related increase of T2-hypointensity primarily within the dentate nucleus but also within the globus pallidus, substantia nigra, and red nucleus. Our results indicate that fully Bayesian inference can successfully be applied for preprocessing and statistical analysis of structural MRI data.
NASA Astrophysics Data System (ADS)
Stan Development Team
2018-01-01
Stan facilitates statistical inference at the frontiers of applied statistics and provides both a modeling language for specifying complex statistical models and a library of statistical algorithms for computing inferences with those models. These components are exposed through interfaces in environments such as R, Python, and the command line.
ERIC Educational Resources Information Center
Jacob, Bridgette L.
2013-01-01
The difficulties introductory statistics students have with formal statistical inference are well known in the field of statistics education. "Informal" statistical inference has been studied as a means to introduce inferential reasoning well before and without the formalities of formal statistical inference. This mixed methods study…
ERIC Educational Resources Information Center
Braham, Hana Manor; Ben-Zvi, Dani
2017-01-01
A fundamental aspect of statistical inference is representation of real-world data using statistical models. This article analyzes students' articulations of statistical models and modeling during their first steps in making informal statistical inferences. An integrated modeling approach (IMA) was designed and implemented to help students…
Notes on power of normality tests of error terms in regression models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Střelec, Luboš
2015-03-10
Normality is one of the basic assumptions in applying statistical procedures. For example, in linear regression most of the inferential procedures are based on the assumption of normality, i.e. the disturbance vector is assumed to be normally distributed. Failure to assess non-normality of the error terms may lead to incorrect results of usual statistical inference techniques such as the t-test or F-test. Thus, error terms should be normally distributed in order to allow us to make exact inferences. As a consequence, normally distributed stochastic errors are necessary in order to make inferences that are not misleading, which explains the necessity and importance of robust tests of normality. Therefore, the aim of this contribution is to discuss normality testing of error terms in regression models. In this contribution, we introduce the general RT class of robust tests for normality, and present and discuss the trade-off between power and robustness of selected classical and robust normality tests of error terms in regression models.
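In practice such tests are applied to the estimated residuals of the fitted regression; a minimal sketch with heavy-tailed, hence non-normal, simulated errors, using standard tests rather than the RT class introduced in the contribution:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(10)
    x = rng.uniform(0.0, 10.0, size=200)
    y = 1.0 + 2.0 * x + rng.standard_t(df=3, size=200)   # heavy-tailed, non-normal errors

    # Fit the regression and test the residuals (not the raw response) for normality.
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (intercept + slope * x)

    print("Shapiro-Wilk p-value:", stats.shapiro(resid).pvalue)
    print("Jarque-Bera p-value:", stats.jarque_bera(resid).pvalue)
    print("Anderson-Darling statistic:", stats.anderson(resid, dist="norm").statistic)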
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morton, April M; Piburn, Jesse O; McManamay, Ryan A
2017-01-01
Monte Carlo simulation is a popular numerical experimentation technique used in a range of scientific fields to obtain the statistics of unknown random output variables. Despite its widespread applicability, it can be difficult to infer required input probability distributions when they are related to population counts unknown at desired spatial resolutions. To overcome this challenge, we propose a framework that uses a dasymetric model to infer the probability distributions needed for a specific class of Monte Carlo simulations which depend on population counts.
The Reasoning behind Informal Statistical Inference
ERIC Educational Resources Information Center
Makar, Katie; Bakker, Arthur; Ben-Zvi, Dani
2011-01-01
Informal statistical inference (ISI) has been a frequent focus of recent research in statistics education. Considering the role that context plays in developing ISI calls into question the need to be more explicit about the reasoning that underpins ISI. This paper uses educational literature on informal statistical inference and philosophical…
Targeted numerical simulations of binary black holes for GW170104
NASA Astrophysics Data System (ADS)
Healy, J.; Lange, J.; O'Shaughnessy, R.; Lousto, C. O.; Campanelli, M.; Williamson, A. R.; Zlochower, Y.; Calderón Bustillo, J.; Clark, J. A.; Evans, C.; Ferguson, D.; Ghonge, S.; Jani, K.; Khamesra, B.; Laguna, P.; Shoemaker, D. M.; Boyle, M.; García, A.; Hemberger, D. A.; Kidder, L. E.; Kumar, P.; Lovelace, G.; Pfeiffer, H. P.; Scheel, M. A.; Teukolsky, S. A.
2018-03-01
In response to LIGO's observation of GW170104, we performed a series of full numerical simulations of binary black holes, each designed to replicate likely realizations of its dynamics and radiation. These simulations have been performed at multiple resolutions and with two independent techniques to solve Einstein's equations. For the nonprecessing and precessing simulations, we demonstrate the two techniques agree mode by mode, at a precision substantially in excess of statistical uncertainties in current LIGO's observations. Conversely, we demonstrate our full numerical solutions contain information which is not accurately captured with the approximate phenomenological models commonly used to infer compact binary parameters. To quantify the impact of these differences on parameter inference for GW170104 specifically, we compare the predictions of our simulations and these approximate models to LIGO's observations of GW170104.
Learning Quantitative Sequence-Function Relationships from Massively Parallel Experiments
NASA Astrophysics Data System (ADS)
Atwal, Gurinder S.; Kinney, Justin B.
2016-03-01
A fundamental aspect of biological information processing is the ubiquity of sequence-function relationships—functions that map the sequence of DNA, RNA, or protein to a biochemically relevant activity. Most sequence-function relationships in biology are quantitative, but only recently have experimental techniques for effectively measuring these relationships been developed. The advent of such "massively parallel" experiments presents an exciting opportunity for the concepts and methods of statistical physics to inform the study of biological systems. After reviewing these recent experimental advances, we focus on the problem of how to infer parametric models of sequence-function relationships from the data produced by these experiments. Specifically, we retrace and extend recent theoretical work showing that inference based on mutual information, not the standard likelihood-based approach, is often necessary for accurately learning the parameters of these models. Closely connected with this result is the emergence of "diffeomorphic modes"—directions in parameter space that are far less constrained by data than likelihood-based inference would suggest. Analogous to Goldstone modes in physics, diffeomorphic modes arise from an arbitrarily broken symmetry of the inference problem. An analytically tractable model of a massively parallel experiment is then described, providing an explicit demonstration of these fundamental aspects of statistical inference. This paper concludes with an outlook on the theoretical and computational challenges currently facing studies of quantitative sequence-function relationships.
Kernel canonical-correlation Granger causality for multiple time series
NASA Astrophysics Data System (ADS)
Wu, Guorong; Duan, Xujun; Liao, Wei; Gao, Qing; Chen, Huafu
2011-04-01
Canonical-correlation analysis as a multivariate statistical technique has been applied to multivariate Granger causality analysis to infer information flow in complex systems. It shows unique appeal and great superiority over the traditional vector autoregressive method, due to the simplified procedure that detects causal interaction between multiple time series, and the avoidance of potential model estimation problems. However, it is limited to the linear case. Here, we extend the framework of canonical correlation to include the estimation of multivariate nonlinear Granger causality for drawing inference about directed interaction. Its feasibility and effectiveness are verified on simulated data.
From inverse problems to learning: a Statistical Mechanics approach
NASA Astrophysics Data System (ADS)
Baldassi, Carlo; Gerace, Federica; Saglietti, Luca; Zecchina, Riccardo
2018-01-01
We present a brief introduction to the statistical mechanics approaches for the study of inverse problems in data science. We then provide concrete new results on inferring couplings from sampled configurations in systems characterized by an extensive number of stable attractors in the low temperature regime. We also show how these results are connected to the problem of learning with realistic weak signals in computational neuroscience. Our techniques and algorithms rely on advanced mean-field methods developed in the context of disordered systems.
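As a concrete, deliberately naive instance of inferring couplings from sampled configurations, the mean-field inverse-Ising estimate reads the couplings off the inverse correlation matrix; the toy chain, coupling strength, and crude Gibbs sampling below are assumptions, not the advanced mean-field methods of the paper.

    import numpy as np

    rng = np.random.default_rng(11)
    N, M = 10, 20000

    # Toy model: a chain of +/-1 spins with nearest-neighbour couplings of 0.3.
    J_true = np.zeros((N, N))
    for i in range(N - 1):
        J_true[i, i + 1] = J_true[i + 1, i] = 0.3

    # Draw configurations with a few crude Gibbs sweeps (enough for illustration).
    s = np.where(rng.random((M, N)) < 0.5, -1, 1)
    for _ in range(5):
        for i in range(N):
            h = s @ J_true[:, i]
            s[:, i] = np.where(rng.random(M) < 1.0 / (1.0 + np.exp(-2.0 * h)), 1, -1)

    # Naive mean-field inverse Ising: couplings from the inverse correlation matrix.
    C = np.cov(s.T)
    J_mf = -np.linalg.inv(C)
    np.fill_diagonal(J_mf, 0.0)
    print("true J[0, 1] = 0.3, inferred:", round(J_mf[0, 1], 2))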
Statistical inference of protein structural alignments using information and compression.
Collier, James H; Allison, Lloyd; Lesk, Arthur M; Stuckey, Peter J; Garcia de la Banda, Maria; Konagurthu, Arun S
2017-04-01
Structural molecular biology depends crucially on computational techniques that compare protein three-dimensional structures and generate structural alignments (the assignment of one-to-one correspondences between subsets of amino acids based on atomic coordinates). Despite its importance, the structural alignment problem has not been formulated, much less solved, in a consistent and reliable way. To overcome these difficulties, we present here a statistical framework for the precise inference of structural alignments, built on the Bayesian and information-theoretic principle of Minimum Message Length (MML). The quality of any alignment is measured by its explanatory power: the amount of lossless compression achieved to explain the protein coordinates using that alignment. We have implemented this approach in MMLigner, the first program able to infer statistically significant structural alignments. We also demonstrate the reliability of MMLigner's alignment results when compared with the state of the art. Importantly, MMLigner can also discover different structural alignments of comparable quality, a challenging problem for oligomers and protein complexes. Source code, binaries and an interactive web version are available at http://lcb.infotech.monash.edu.au/mmligner. arun.konagurthu@monash.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Measuring the Number of M Dwarfs per M Dwarf Using Kepler Eclipsing Binaries
NASA Astrophysics Data System (ADS)
Shan, Yutong; Johnson, John A.; Morton, Timothy D.
2015-11-01
We measure the binarity of detached M dwarfs in the Kepler field with orbital periods in the range of 1-90 days. Kepler's photometric precision and nearly continuous monitoring of stellar targets over time baselines ranging from 3 months to 4 years make its detection efficiency for eclipsing binaries nearly complete over this period range and for all radius ratios. Our investigation employs a statistical framework akin to that used for inferring planetary occurrence rates from planetary transits. The obvious simplification is that eclipsing binaries have a vastly improved detection efficiency that is limited chiefly by their geometric probabilities to eclipse. For the M-dwarf sample observed by the Kepler Mission, the fractional incidence of eclipsing binaries implies that there are $0.11^{+0.02}_{-0.04}$ close stellar companions per apparently single M dwarf. Our measured binarity is higher than previous inferences of the occurrence rate of close binaries via radial velocity techniques, at roughly the 2σ level. This study represents the first use of eclipsing binary detections from a high quality transiting planet mission to infer binary statistics. Application of this statistical framework to the eclipsing binaries discovered by future transit surveys will establish better constraints on the short-period M+M binary rate, as well as binarity measurements for stars of other spectral types.
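The occurrence-rate arithmetic behind this kind of inference, weighting each detected eclipsing binary by the inverse of its geometric eclipse probability, is sketched below with entirely hypothetical detections and sample size.

    import numpy as np

    # Hypothetical detections: summed radii (R1 + R2) and semi-major axes, in solar radii.
    r_sum = np.array([0.8, 0.9, 1.0, 0.7, 0.85])
    a = np.array([12.0, 30.0, 55.0, 20.0, 80.0])
    n_stars = 2600                               # size of the searched M-dwarf sample (assumed)

    # Geometric eclipse probability for a circular orbit: p = (R1 + R2) / a.
    p_eclipse = r_sum / a

    # Each detection stands in for 1/p systems; occurrence = corrected count per star surveyed.
    occurrence = np.sum(1.0 / p_eclipse) / n_stars
    print("close companions per M dwarf ~", round(occurrence, 3))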
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, J.; Elmore, R.; Kennedy, C.
This research illustrates the use of statistical inference techniques to quantify the uncertainty surrounding reliability estimates in a step-stress accelerated degradation testing (SSADT) scenario. SSADT can be used when a researcher is faced with a resource-constrained environment, e.g., limits on chamber time or on the number of units to test. We apply the SSADT methodology to a degradation experiment involving concentrated solar power (CSP) mirrors and compare the results to a more traditional multiple accelerated testing paradigm. Specifically, our work includes: (1) designing a durability testing plan for solar mirrors (3M's new improved silvered acrylic "Solar Reflector Film (SFM) 1100") through the ultra-accelerated weathering system (UAWS), (2) defining degradation paths of optical performance based on the SSADT model which is accelerated by high UV-radiant exposure, and (3) developing service lifetime prediction models for solar mirrors using advanced statistical inference. We use the method of least squares to estimate the model parameters and this serves as the basis for the statistical inference in SSADT. Several quantities of interest can be estimated from this procedure, e.g., mean-time-to-failure (MTTF) and warranty time. The methods allow for the estimation of quantities that may be of interest to the domain scientists.
Inferring Characteristics of Sensorimotor Behavior by Quantifying Dynamics of Animal Locomotion
NASA Astrophysics Data System (ADS)
Leung, KaWai
Locomotion is one of the most well-studied topics in animal behavioral studies. Much fundamental and clinical research makes use of the locomotion of an animal model to explore various aspects of sensorimotor behavior. In the past, most of these studies focused on the population average of a specific trait because of limitations in data collection and processing power. With recent advances in computer vision and statistical modeling techniques, it is now possible to track and analyze large amounts of behavioral data. In this thesis, I present two projects that aim to infer the characteristics of sensorimotor behavior by quantifying the dynamics of locomotion of the nematode Caenorhabditis elegans and the fruit fly Drosophila melanogaster, shedding light on the statistical dependence between sensing and behavior. In the first project, I investigate the possibility of inferring noxious sensory information from the behavior of Caenorhabditis elegans. I develop a statistical model to infer the heat stimulus level perceived by individual animals from their stereotyped escape responses after stimulation by an IR laser. The model allows quantification of analgesic-like effects of chemical agents or genetic mutations in the worm. At the same time, the method is able to differentiate perturbations of locomotion behavior that go beyond effects on the sensory system. With this model I propose experimental designs that allow statistically significant identification of analgesic-like effects. In the second project, I investigate the relationship between energy budget and stability of locomotion in determining the walking speed distribution of Drosophila melanogaster during aging. The locomotion stability at different age groups is estimated from video recordings using Floquet theory. I calculate the power consumption at different locomotion speeds using a biomechanics model. In conclusion, power consumption, not stability, predicts the locomotion speed distribution at different ages.
Cocco, S; Monasson, R; Sessak, V
2011-05-01
We consider the problem of inferring the interactions between a set of N binary variables from the knowledge of their frequencies and pairwise correlations. The inference framework is based on the Hopfield model, a special case of the Ising model where the interaction matrix is defined through a set of patterns in the variable space, and is of rank much smaller than N. We show that maximum likelihood inference is deeply related to principal component analysis when the amplitude of the pattern components ξ is negligible compared to √N. Using techniques from statistical mechanics, we calculate the corrections to the patterns to the first order in ξ/√N. We stress the need to generalize the Hopfield model and include both attractive and repulsive patterns in order to correctly infer networks with sparse and strong interactions. We present a simple geometrical criterion to decide how many attractive and repulsive patterns should be considered as a function of the sampling noise. We moreover discuss how many sampled configurations are required for a good inference, as a function of the system size N and of the amplitude ξ. The inference approach is illustrated on synthetic and biological data.
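To illustrate the PCA connection the abstract describes, the sketch below recovers a single planted Hopfield-like pattern from the leading eigenvector of the sample correlation matrix of binary data. The pattern, bias strength and sample size are all synthetic choices, and this is only the zeroth-order step, not the ξ/√N corrections derived in the paper.

```python
import numpy as np

# Approximate one Hopfield-like pattern by the top principal component of the
# correlation matrix of N binary variables sampled B times.
rng = np.random.default_rng(1)
N, B = 40, 5000
pattern = rng.choice([-1.0, 1.0], size=N)
samples = np.sign(0.3 * pattern + rng.normal(size=(B, N)))   # weak bias along the pattern

C = np.corrcoef(samples, rowvar=False)      # pairwise correlations of the N variables
eigvals, eigvecs = np.linalg.eigh(C)        # eigenvalues in ascending order
leading = eigvecs[:, -1]                    # leading principal component
overlap = abs(leading @ pattern) / (np.linalg.norm(leading) * np.sqrt(N))
print(f"overlap between leading PC and planted pattern: {overlap:.2f}")
```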
A simulation-based evaluation of methods for inferring linear barriers to gene flow
Christopher Blair; Dana E. Weigel; Matthew Balazik; Annika T. H. Keeley; Faith M. Walker; Erin Landguth; Sam Cushman; Melanie Murphy; Lisette Waits; Niko Balkenhol
2012-01-01
Different analytical techniques used on the same data set may lead to different conclusions about the existence and strength of genetic structure. Therefore, reliable interpretation of the results from different methods depends on the efficacy and reliability of different statistical methods. In this paper, we evaluated the performance of multiple analytical methods to...
Walker, Martin; Basáñez, María-Gloria; Ouédraogo, André Lin; Hermsen, Cornelus; Bousema, Teun; Churcher, Thomas S
2015-01-16
Quantitative molecular methods (QMMs) such as quantitative real-time polymerase chain reaction (q-PCR), reverse-transcriptase PCR (qRT-PCR) and quantitative nucleic acid sequence-based amplification (QT-NASBA) are increasingly used to estimate pathogen density in a variety of clinical and epidemiological contexts. These methods are often classified as semi-quantitative, yet estimates of reliability or sensitivity are seldom reported. Here, a statistical framework is developed for assessing the reliability (uncertainty) of pathogen densities estimated using QMMs and the associated diagnostic sensitivity. The method is illustrated with quantification of Plasmodium falciparum gametocytaemia by QT-NASBA. The reliability of pathogen (e.g. gametocyte) densities, and the accompanying diagnostic sensitivity, estimated by two contrasting statistical calibration techniques, are compared: a traditional method and a mixed model Bayesian approach. The latter accounts for statistical dependence of QMM assays run under identical laboratory protocols and permits structural modelling of experimental measurements, allowing precision to vary with pathogen density. Traditional calibration cannot account for inter-assay variability arising from imperfect QMMs and generates estimates of pathogen density that have poor reliability, are variable among assays and inaccurately reflect diagnostic sensitivity. The Bayesian mixed model approach assimilates information from replica QMM assays, improving reliability and inter-assay homogeneity, providing an accurate appraisal of quantitative and diagnostic performance. Bayesian mixed model statistical calibration supersedes traditional techniques in the context of QMM-derived estimates of pathogen density, offering the potential to improve substantially the depth and quality of clinical and epidemiological inference for a wide variety of pathogens.
From Weakly Chaotic Dynamics to Deterministic Subdiffusion via Copula Modeling
NASA Astrophysics Data System (ADS)
Nazé, Pierre
2018-03-01
Copula modeling consists of finding a probability distribution, called a copula, whose coupling with the marginal distributions of a set of random variables produces their joint distribution. The present work aims to use this technique to connect the statistical distributions of weakly chaotic dynamics and deterministic subdiffusion. More precisely, we decompose the jump distribution of the Geisel-Thomae map into a bivariate one and determine the marginal and copula distributions, respectively, by infinite ergodic theory and statistical inference techniques. We thereby verify that the characteristic tail distribution of subdiffusion is an extreme-value copula coupling Mittag-Leffler distributions. We also present a method to calculate the exact copula and joint distributions in the case where the statistical distributions of weakly chaotic dynamics and deterministic subdiffusion are already known. Numerical simulations and consistency with the dynamical aspects of the map support our results.
Inferring explicit weighted consensus networks to represent alternative evolutionary histories
2013-01-01
Background The advent of molecular biology techniques and constant increase in availability of genetic material have triggered the development of many phylogenetic tree inference methods. However, several reticulate evolution processes, such as horizontal gene transfer and hybridization, have been shown to blur the species evolutionary history by causing discordance among phylogenies inferred from different genes. Methods To tackle this problem, we hereby describe a new method for inferring and representing alternative (reticulate) evolutionary histories of species as an explicit weighted consensus network which can be constructed from a collection of gene trees with or without prior knowledge of the species phylogeny. Results We provide a way of building a weighted phylogenetic network for each of the following reticulation mechanisms: diploid hybridization, intragenic recombination and complete or partial horizontal gene transfer. We successfully tested our method on some synthetic and real datasets to infer the above-mentioned evolutionary events which may have influenced the evolution of many species. Conclusions Our weighted consensus network inference method allows one to infer, visualize and validate statistically major conflicting signals induced by the mechanisms of reticulate evolution. The results provided by the new method can be used to represent the inferred conflicting signals by means of explicit and easy-to-interpret phylogenetic networks. PMID:24359207
Design-based Sample and Probability Law-Assumed Sample: Their Role in Scientific Investigation.
ERIC Educational Resources Information Center
Ojeda, Mario Miguel; Sahai, Hardeo
2002-01-01
Discusses some key statistical concepts in probabilistic and non-probabilistic sampling to provide an overview for understanding the inference process. Suggests a statistical model constituting the basis of statistical inference and provides a brief review of the finite population descriptive inference and a quota sampling inferential theory.…
The Importance of Statistical Modeling in Data Analysis and Inference
ERIC Educational Resources Information Center
Rollins, Derrick, Sr.
2017-01-01
Statistical inference simply means to draw a conclusion based on information that comes from data. Error bars are the most commonly used tool for data analysis and inference in chemical engineering data studies. This work demonstrates, using common types of data collection studies, the importance of specifying the statistical model for sound…
NASA Astrophysics Data System (ADS)
Schmidt, S.; Heyns, P. S.; de Villiers, J. P.
2018-02-01
In this paper, a fault diagnostic methodology is developed which is able to detect, locate and trend gear faults under fluctuating operating conditions when only vibration data from a single transducer, measured on a healthy gearbox, are available. A two-phase feature extraction and modelling process is proposed to infer the operating condition and, based on the operating condition, to detect changes in the machine condition. Information from optimised machine and operating condition hidden Markov models is statistically combined to generate a discrepancy signal, which is post-processed to infer the condition of the gearbox. The discrepancy signal is processed and combined with statistical methods for automatic fault detection and localisation and to perform fault trending over time. The proposed methodology is validated on experimental data, and a tacholess order tracking methodology is used to enhance the cost-effectiveness of the diagnostic methodology.
Inferring Demographic History Using Two-Locus Statistics.
Ragsdale, Aaron P; Gutenkunst, Ryan N
2017-06-01
Population demographic history may be learned from contemporary genetic variation data. Methods based on aggregating the statistics of many single loci into an allele frequency spectrum (AFS) have proven powerful, but such methods ignore potentially informative patterns of linkage disequilibrium (LD) between neighboring loci. To leverage such patterns, we developed a composite-likelihood framework for inferring demographic history from aggregated statistics of pairs of loci. Using this framework, we show that two-locus statistics are more sensitive to demographic history than single-locus statistics such as the AFS. In particular, two-locus statistics escape the notorious confounding of depth and duration of a bottleneck, and they provide a means to estimate effective population size based on the recombination rather than the mutation rate. We applied our approach to a Zambian population of Drosophila melanogaster. Notably, using both single- and two-locus statistics, we inferred a substantially lower ancestral effective population size than previous works and did not infer a bottleneck history. Together, our results demonstrate the broad potential for two-locus statistics to enable powerful population genetic inference. Copyright © 2017 by the Genetics Society of America.
Correlation techniques and measurements of wave-height statistics
NASA Technical Reports Server (NTRS)
Guthart, H.; Taylor, W. C.; Graf, K. A.; Douglas, D. G.
1972-01-01
Statistical measurements of wave-height fluctuations have been made in a wind wave tank. The power spectral density function of temporal wave-height fluctuations evidenced second-harmonic components and an f^-5 power-law decay beyond the second harmonic. The observations of second-harmonic effects agreed very well with a theoretical prediction. From the wave statistics, surface drift currents were inferred and compared to experimental measurements with satisfactory agreement. Measurements were made of the two-dimensional correlation coefficient at 15 deg increments in angle with respect to the wind vector. An estimate of the two-dimensional spatial power spectral density function was also made.
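A sketch of the spectral estimation step: Welch's power spectral density of a wave-height record, followed by a log-log slope fit over the spectral tail. The record here is synthetic (fundamental, second harmonic, white noise), so the fitted slope will not reproduce the reported f^-5 decay; the point is the estimation procedure, and the sampling rate and fit band are assumptions.

```python
import numpy as np
from scipy import signal

fs = 50.0                                    # assumed sampling rate, Hz
t = np.arange(0.0, 600.0, 1.0 / fs)
rng = np.random.default_rng(2)
eta = (np.sin(2 * np.pi * 1.0 * t)           # fundamental wave component
       + 0.2 * np.sin(2 * np.pi * 2.0 * t)   # second harmonic
       + 0.1 * rng.normal(size=t.size))      # measurement noise

freq, pxx = signal.welch(eta, fs=fs, nperseg=4096)   # Welch PSD estimate
tail = (freq > 3.0) & (freq < 10.0)                  # fit only the spectral tail
slope = np.polyfit(np.log(freq[tail]), np.log(pxx[tail]), 1)[0]
print(f"fitted spectral tail slope: {slope:.2f}")
```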
Sojoudi, Alireza; Goodyear, Bradley G
2016-12-01
Spontaneous fluctuations of blood-oxygenation level-dependent functional magnetic resonance imaging (BOLD fMRI) signals are highly synchronous between brain regions that serve similar functions. This provides a means to investigate functional networks; however, most analysis techniques assume functional connections are constant over time. This may be problematic in the case of neurological disease, where functional connections may be highly variable. Recently, several methods have been proposed to determine moment-to-moment changes in the strength of functional connections over an imaging session (so-called dynamic connectivity). Here a novel analysis framework based on a hierarchical observation modeling approach is proposed to permit statistical inference of the presence of dynamic connectivity. A two-level linear model composed of overlapping sliding windows of fMRI signals, incorporating the fact that overlapping windows are not independent, is described. To test this approach, datasets were synthesized whereby functional connectivity was either constant (significant or insignificant) or modulated by an external input. The method successfully determines the statistical significance of a functional connection in phase with the modulation, and it exhibits greater sensitivity and specificity in detecting regions with variable connectivity, when compared with sliding-window correlation analysis. For real data, this technique possesses greater reproducibility and provides a more discriminative estimate of dynamic connectivity than sliding-window correlation analysis. Hum Brain Mapp 37:4566-4580, 2016. © 2016 Wiley Periodicals, Inc.
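For orientation, the baseline technique the framework is compared against, sliding-window correlation, is easy to sketch. The signals, window length and coupling change-point below are synthetic; this is not the authors' hierarchical model.

```python
import numpy as np

# Sliding-window correlation between two BOLD-like time series whose coupling
# switches off halfway through the scan.
rng = np.random.default_rng(3)
T, win = 400, 40
shared = rng.normal(size=T)
coupling = np.where(np.arange(T) < 200, 1.0, 0.0)
x = shared + 0.5 * rng.normal(size=T)
y = coupling * shared + 0.5 * rng.normal(size=T)

dyn_corr = np.array([np.corrcoef(x[i:i + win], y[i:i + win])[0, 1]
                     for i in range(T - win + 1)])
print("early windows:", np.round(dyn_corr[:3], 2),
      "late windows:", np.round(dyn_corr[-3:], 2))
```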
ERIC Educational Resources Information Center
Berenson, Mark L.
2013-01-01
There is consensus in the statistical literature that severe departures from its assumptions invalidate the use of regression modeling for purposes of inference. The assumptions of regression modeling are usually evaluated subjectively through visual, graphic displays in a residual analysis but such an approach, taken alone, may be insufficient…
Robust biological parametric mapping: an improved technique for multimodal brain image analysis
NASA Astrophysics Data System (ADS)
Yang, Xue; Beason-Held, Lori; Resnick, Susan M.; Landman, Bennett A.
2011-03-01
Mapping the quantitative relationship between structure and function in the human brain is an important and challenging problem. Numerous volumetric, surface, region of interest and voxelwise image processing techniques have been developed to statistically assess potential correlations between imaging and non-imaging metrics. Recently, biological parametric mapping has extended the widely popular statistical parametric approach to enable application of the general linear model to multiple image modalities (both for regressors and regressands) along with scalar valued observations. This approach offers great promise for direct, voxelwise assessment of structural and functional relationships with multiple imaging modalities. However, as presented, the biological parametric mapping approach is not robust to outliers and may lead to invalid inferences (e.g., artifactual low p-values) due to slight mis-registration or variation in anatomy between subjects. To enable widespread application of this approach, we introduce robust regression and robust inference in the neuroimaging context of application of the general linear model. Through simulation and empirical studies, we demonstrate that our robust approach reduces sensitivity to outliers without substantial degradation in power. The robust approach and associated software package provides a reliable way to quantitatively assess voxelwise correlations between structural and functional neuroimaging modalities.
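The central contrast is robust M-estimation versus ordinary least squares in the presence of outliers. The sketch below shows that contrast on a single simulated regression; it is a generic stand-in for the voxelwise robust GLM, not the authors' implementation, and the data are invented.

```python
import numpy as np
import statsmodels.api as sm

# Robust (Huber) regression versus OLS when a few observations are gross outliers.
rng = np.random.default_rng(4)
x = np.linspace(0.0, 10.0, 60)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)
y[:3] += 25.0                                  # inject outliers

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()
rlm = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()
print("OLS slope:", round(ols.params[1], 2), "robust slope:", round(rlm.params[1], 2))
```

The robust slope stays near the true value of 2 while the OLS slope is pulled by the contaminated points, which is exactly the failure mode the paper guards against at the voxel level.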
Teaching Statistical Inference for Causal Effects in Experiments and Observational Studies
ERIC Educational Resources Information Center
Rubin, Donald B.
2004-01-01
Inference for causal effects is a critical activity in many branches of science and public policy. The field of statistics is the one field most suited to address such problems, whether from designed experiments or observational studies. Consequently, it is arguably essential that departments of statistics teach courses in causal inference to both…
Redshift data and statistical inference
NASA Technical Reports Server (NTRS)
Newman, William I.; Haynes, Martha P.; Terzian, Yervant
1994-01-01
Frequency histograms and the 'power spectrum analysis' (PSA) method, the latter developed by Yu & Peebles (1969), have been widely employed as techniques for establishing the existence of periodicities. We provide a formal analysis of these two classes of methods, including controlled numerical experiments, to better understand their proper use and application. In particular, we note that typical published applications of frequency histograms commonly employ far greater numbers of class intervals or bins than is advisable by statistical theory, sometimes giving rise to the appearance of spurious patterns. The PSA method generates a sequence of random numbers from observational data which, it is claimed, is exponentially distributed with unit mean and variance, essentially independent of the distribution of the original data. We show that the derived random process is nonstationary and produces a small but systematic bias in the usual estimate of the mean and variance. Although the derived variable may be reasonably described by an exponential distribution, the tail of the distribution is far removed from that of an exponential, thereby rendering statistical inference and confidence testing based on the tail of the distribution completely unreliable. Finally, we examine a number of astronomical examples wherein these methods have been used, giving rise to widespread acceptance of statistically unconfirmed conclusions.
Reasoning about Informal Statistical Inference: One Statistician's View
ERIC Educational Resources Information Center
Rossman, Allan J.
2008-01-01
This paper identifies key concepts and issues associated with the reasoning of informal statistical inference. I focus on key ideas of inference that I think all students should learn, including at secondary level as well as tertiary. I argue that a fundamental component of inference is to go beyond the data at hand, and I propose that statistical…
Data-driven inference for the spatial scan statistic.
Almeida, Alexandre C L; Duarte, Anderson R; Duczmal, Luiz H; Oliveira, Fernando L P; Takahashi, Ricardo H C
2011-08-02
Kulldorff's spatial scan statistic for aggregated area maps searches for clusters of cases without specifying their size (number of areas) or geographic location in advance. Their statistical significance is tested while adjusting for the multiple testing inherent in such a procedure. However, as is shown in this work, this adjustment is not done in an even manner for all possible cluster sizes. A modification is proposed to the usual inference test of the spatial scan statistic, incorporating additional information about the size of the most likely cluster found. A new interpretation of the results of the spatial scan statistic is proposed, posing a modified inference question: what is the probability that the null hypothesis is rejected for the original observed cases map with a most likely cluster of size k, taking into account only those most likely clusters of size k found under the null hypothesis for comparison? This question is especially important when the p-value computed by the usual inference process is near the alpha significance level, regarding the correctness of the decision based on this inference. A practical procedure is provided to make more accurate inferences about the most likely cluster found by the spatial scan statistic.
NASA Astrophysics Data System (ADS)
Saputra, K. V. I.; Cahyadi, L.; Sembiring, U. A.
2018-01-01
Starting with this paper, we assess our traditional elementary statistics education and introduce elementary statistics with simulation-based inference. To assess our statistics class, we adapt the well-known CAOS (Comprehensive Assessment of Outcomes in Statistics) test, which serves as an external measure of students' basic statistical literacy. This test is generally accepted as a measure of statistical literacy. We also introduce a new teaching method in the elementary statistics class. Different from the traditional elementary statistics course, we introduce a simulation-based inference method to conduct hypothesis testing. The literature has shown that this new teaching method works very well in increasing students' understanding of statistics.
Apes are intuitive statisticians.
Rakoczy, Hannes; Clüver, Annette; Saucke, Liane; Stoffregen, Nicole; Gräbener, Alice; Migura, Judith; Call, Josep
2014-04-01
Inductive learning and reasoning, as we use it both in everyday life and in science, is characterized by flexible inferences based on statistical information: inferences from populations to samples and vice versa. Many forms of such statistical reasoning have been found to develop late in human ontogeny, depending on formal education and language, and to be fragile even in adults. New revolutionary research, however, suggests that even preverbal human infants make use of intuitive statistics. Here, we conducted the first investigation of such intuitive statistical reasoning with non-human primates. In a series of 7 experiments, Bonobos, Chimpanzees, Gorillas and Orangutans drew flexible statistical inferences from populations to samples. These inferences, furthermore, were truly based on statistical information regarding the relative frequency distributions in a population, and not on absolute frequencies. Intuitive statistics in its most basic form is thus an evolutionarily more ancient rather than a uniquely human capacity. Copyright © 2014 Elsevier B.V. All rights reserved.
SIG-VISA: Signal-based Vertically Integrated Seismic Monitoring
NASA Astrophysics Data System (ADS)
Moore, D.; Mayeda, K. M.; Myers, S. C.; Russell, S.
2013-12-01
Traditional seismic monitoring systems rely on discrete detections produced by station processing software; however, while such detections may constitute a useful summary of station activity, they discard large amounts of information present in the original recorded signal. We present SIG-VISA (Signal-based Vertically Integrated Seismic Analysis), a system for seismic monitoring through Bayesian inference on seismic signals. By directly modeling the recorded signal, our approach incorporates additional information unavailable to detection-based methods, enabling higher sensitivity and more accurate localization using techniques such as waveform matching. SIG-VISA's Bayesian forward model of seismic signal envelopes includes physically-derived models of travel times and source characteristics as well as Gaussian process (kriging) statistical models of signal properties that combine interpolation of historical data with extrapolation of learned physical trends. Applying Bayesian inference, we evaluate the model on earthquakes as well as the 2009 DPRK test event, demonstrating a waveform matching effect as part of the probabilistic inference, along with results on event localization and sensitivity. In particular, we demonstrate increased sensitivity from signal-based modeling, in which the SIGVISA signal model finds statistical evidence for arrivals even at stations for which the IMS station processing failed to register any detection.
Cluster mass inference via random field theory.
Zhang, Hui; Nichols, Thomas E; Johnson, Timothy D
2009-01-01
Cluster extent and voxel intensity are two widely used statistics in neuroimaging inference. Cluster extent is sensitive to spatially extended signals while voxel intensity is better for intense but focal signals. In order to leverage strength from both statistics, several nonparametric permutation methods have been proposed to combine the two methods. Simulation studies have shown that of the different cluster permutation methods, the cluster mass statistic is generally the best. However, to date, there is no parametric cluster mass inference available. In this paper, we propose a cluster mass inference method based on random field theory (RFT). We develop this method for Gaussian images, evaluate it on Gaussian and Gaussianized t-statistic images and investigate its statistical properties via simulation studies and real data. Simulation results show that the method is valid under the null hypothesis and demonstrate that it can be more powerful than the cluster extent inference method. Further, analyses with a single subject and a group fMRI dataset demonstrate better power than traditional cluster size inference, and good accuracy relative to a gold-standard permutation test.
A Test by Any Other Name: P Values, Bayes Factors, and Statistical Inference.
Stern, Hal S
2016-01-01
Procedures used for statistical inference are receiving increased scrutiny as the scientific community studies the factors associated with ensuring reproducible research. This note addresses recent negative attention directed at p values, the relationship of confidence intervals and tests, and the role of Bayesian inference and Bayes factors, with an eye toward better understanding these different strategies for statistical inference. We argue that researchers and data analysts too often resort to binary decisions (e.g., whether to reject or accept the null hypothesis) in settings where this may not be required.
Using Guided Reinvention to Develop Teachers' Understanding of Hypothesis Testing Concepts
ERIC Educational Resources Information Center
Dolor, Jason; Noll, Jennifer
2015-01-01
Statistics education reform efforts emphasize the importance of informal inference in the learning of statistics. Research suggests statistics teachers experience similar difficulties understanding statistical inference concepts as students and how teacher knowledge can impact student learning. This study investigates how teachers reinvented an…
Christensen, Hilary B.
2014-01-01
Low-magnification microwear techniques have been used effectively to infer diets within many unrelated mammalian orders, but the extent to which patterns are comparable among such different groups, including long extinct mammal lineages, is unknown. Microwear patterns between ecologically equivalent placental and marsupial mammals are found to be statistically indistinguishable, indicating that microwear can be used to infer diet across the mammals. Microwear data were compared to body size and molar shearing crest length in order to develop a system to distinguish the diet of mammals. Insectivores and carnivores were difficult to distinguish from herbivores using microwear alone, but combining microwear data with body size estimates and tooth morphology provides robust dietary inferences. This approach is a powerful tool for dietary assessment of fossils from extinct lineages and from museum specimens of living species where field study would be difficult owing to the animal’s behavior, habitat, or conservation status. PMID:25099537
Karakaya, Jale; Karabulut, Erdem; Yucel, Recai M.
2015-01-01
Modern statistical methods using incomplete data have been increasingly applied in a wide variety of substantive problems. Similarly, receiver operating characteristic (ROC) analysis, a method used in evaluating diagnostic tests or biomarkers in medical research, has also become increasingly popular in both its development and application. While missing-data methods have been applied in ROC analysis, the impact of model mis-specification and/or assumptions (e.g. missing at random) underlying the missing data has not been thoroughly studied. In this work, we study the performance of multiple imputation (MI) inference in ROC analysis. Particularly, we investigate parametric and non-parametric techniques for MI inference under common missingness mechanisms. Depending on the coherency of the imputation model with the underlying data generation mechanism, our results show that MI generally leads to well-calibrated inferences under ignorable missingness mechanisms. PMID:26379316
Bayesian Inference on Proportional Elections
Brunello, Gabriel Hideki Vatanabe; Nakano, Eduardo Yoshio
2015-01-01
Polls for majoritarian voting systems usually show estimates of the percentage of votes for each candidate. However, proportional vote systems do not necessarily guarantee the candidate with the most percentage of votes will be elected. Thus, traditional methods used in majoritarian elections cannot be applied on proportional elections. In this context, the purpose of this paper was to perform a Bayesian inference on proportional elections considering the Brazilian system of seats distribution. More specifically, a methodology to answer the probability that a given party will have representation on the chamber of deputies was developed. Inferences were made on a Bayesian scenario using the Monte Carlo simulation technique, and the developed methodology was applied on data from the Brazilian elections for Members of the Legislative Assembly and Federal Chamber of Deputies in 2010. A performance rate was also presented to evaluate the efficiency of the methodology. Calculations and simulations were carried out using the free R statistical software. PMID:25786259
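A hedged sketch of the general recipe: draw vote shares from a Dirichlet posterior built on poll counts, simulate an election, apply a highest-averages (D'Hondt-style) seat allocation, and estimate the probability that a given party wins at least one seat. The poll counts, seat numbers and allocation rule below are illustrative, not the Brazilian rules analysed in the paper.

```python
import numpy as np

rng = np.random.default_rng(5)

def dhondt(votes, n_seats):
    """Allocate seats by repeatedly awarding the largest quotient votes/(seats+1)."""
    seats = np.zeros(votes.size, dtype=int)
    for _ in range(n_seats):
        quotients = votes / (seats + 1)
        seats[np.argmax(quotients)] += 1
    return seats

poll_counts = np.array([420, 310, 150, 80, 40])      # hypothetical poll tallies per party
n_seats, n_sims, electorate = 10, 5000, 1_000_000
wins = 0
for _ in range(n_sims):
    shares = rng.dirichlet(poll_counts + 1)          # posterior draw of vote shares
    votes = rng.multinomial(electorate, shares).astype(float)
    if dhondt(votes, n_seats)[-1] >= 1:              # does the smallest party get a seat?
        wins += 1
print(f"P(smallest party wins a seat) ~ {wins / n_sims:.3f}")
```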
ERIC Educational Resources Information Center
Larwin, Karen H.; Larwin, David A.
2011-01-01
Bootstrapping methods and random distribution methods are increasingly recommended as better approaches for teaching students about statistical inference in introductory-level statistics courses. The authors examined the effect of teaching undergraduate business statistics students using random distribution and bootstrapping simulations. It is the…
Xu, Jason; Guttorp, Peter; Kato-Maeda, Midori; Minin, Vladimir N
2015-12-01
Continuous-time birth-death-shift (BDS) processes are frequently used in stochastic modeling, with many applications in ecology and epidemiology. In particular, such processes can model the evolutionary dynamics of transposable elements, which are important genetic markers in molecular epidemiology. Estimation of the effects of individual covariates on the birth, death, and shift rates of the process can be accomplished by analyzing patient data, but inferring these rates in a discretely and unevenly observed setting presents computational challenges. We propose a multi-type branching process approximation to BDS processes and develop a corresponding expectation maximization algorithm, where we use spectral techniques to reduce calculation of expected sufficient statistics to low-dimensional integration. These techniques yield an efficient and robust optimization routine for inferring the rates of the BDS process, and apply broadly to multi-type branching processes whose rates can depend on many covariates. After rigorously testing our methodology in simulation studies, we apply our method to study the intrapatient time evolution of the IS6110 transposable element, a genetic marker frequently used during estimation of epidemiological clusters of Mycobacterium tuberculosis infections. © 2015, The International Biometric Society.
METEOR - an artificial intelligence system for convective storm forecasting
DOE Office of Scientific and Technical Information (OSTI.GOV)
Elio, R.; De haan, J.; Strong, G.S.
1987-03-01
An AI system called METEOR, which uses the meteorologist's heuristics, strategies, and statistical tools to forecast severe hailstorms in Alberta, is described, emphasizing the information and knowledge that METEOR uses to mimic the forecasting procedure of an expert meteorologist. METEOR is then discussed as an AI system, emphasizing the ways in which it is qualitatively different from algorithmic or statistical approaches to prediction. Some features of METEOR's design and the AI techniques for representing meteorological knowledge and for reasoning and inference are presented. Finally, some observations on designing and implementing intelligent consultants for meteorological applications are made. 7 references.
Application of Transformations in Parametric Inference
ERIC Educational Resources Information Center
Brownstein, Naomi; Pensky, Marianna
2008-01-01
The objective of the present paper is to provide a simple approach to statistical inference using the method of transformations of variables. We demonstrate performance of this powerful tool on examples of constructions of various estimation procedures, hypothesis testing, Bayes analysis and statistical inference for the stress-strength systems.…
Automatic physical inference with information maximizing neural networks
NASA Astrophysics Data System (ADS)
Charnock, Tom; Lavaux, Guilhem; Wandelt, Benjamin D.
2018-04-01
Compressing large data sets to a manageable number of summaries that are informative about the underlying parameters vastly simplifies both frequentist and Bayesian inference. When only simulations are available, these summaries are typically chosen heuristically, so they may inadvertently miss important information. We introduce a simulation-based machine learning technique that trains artificial neural networks to find nonlinear functionals of data that maximize Fisher information: information maximizing neural networks (IMNNs). In test cases where the posterior can be derived exactly, likelihood-free inference based on automatically derived IMNN summaries produces nearly exact posteriors, showing that these summaries are good approximations to sufficient statistics. In a series of numerical examples of increasing complexity and astrophysical relevance we show that IMNNs are robustly capable of automatically finding optimal, nonlinear summaries of the data even in cases where linear compression fails: inferring the variance of Gaussian signal in the presence of noise, inferring cosmological parameters from mock simulations of the Lyman-α forest in quasar spectra, and inferring frequency-domain parameters from LISA-like detections of gravitational waveforms. In this final case, the IMNN summary outperforms linear data compression by avoiding the introduction of spurious likelihood maxima. We anticipate that the automatic physical inference method described in this paper will be essential to obtain both accurate and precise cosmological parameter estimates from complex and large astronomical data sets, including those from LSST and Euclid.
Statistical inference for tumor growth inhibition T/C ratio.
Wu, Jianrong
2010-09-01
The tumor growth inhibition T/C ratio is commonly used to quantify treatment effects in drug screening tumor xenograft experiments. The T/C ratio is converted to an antitumor activity rating using an arbitrary cutoff point and often without any formal statistical inference. Here, we applied a nonparametric bootstrap method and a small sample likelihood ratio statistic to make a statistical inference of the T/C ratio, including both hypothesis testing and a confidence interval estimate. Furthermore, sample size and power are also discussed for statistical design of tumor xenograft experiments. Tumor xenograft data from an actual experiment were analyzed to illustrate the application.
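A minimal version of the nonparametric bootstrap mentioned above, applied to a T/C ratio (mean treated tumor volume over mean control volume) with a percentile confidence interval. The volumes are invented, and the paper's likelihood-ratio interval and power calculations are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(6)
treated = np.array([0.42, 0.55, 0.38, 0.61, 0.47, 0.50, 0.44, 0.58])  # hypothetical volumes
control = np.array([1.10, 0.95, 1.22, 1.05, 0.99, 1.18, 1.07, 1.12])

tc_observed = treated.mean() / control.mean()
boot = np.array([
    rng.choice(treated, treated.size, replace=True).mean()
    / rng.choice(control, control.size, replace=True).mean()
    for _ in range(10_000)                       # resample both arms independently
])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"T/C = {tc_observed:.2f}, 95% bootstrap CI ({lo:.2f}, {hi:.2f})")
```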
Investigating Mathematics Teachers' Thoughts of Statistical Inference
ERIC Educational Resources Information Center
Yang, Kai-Lin
2012-01-01
Research on statistical cognition and application suggests that statistical inference concepts are commonly misunderstood by students and even misinterpreted by researchers. Although some research has been done on students' misunderstanding or misconceptions of confidence intervals (CIs), few studies explore either students' or mathematics…
Lessons from Inferentialism for Statistics Education
ERIC Educational Resources Information Center
Bakker, Arthur; Derry, Jan
2011-01-01
This theoretical paper relates recent interest in informal statistical inference (ISI) to the semantic theory termed inferentialism, a significant development in contemporary philosophy, which places inference at the heart of human knowing. This theory assists epistemological reflection on challenges in statistics education encountered when…
Statistical inference and Aristotle's Rhetoric.
Macdonald, Ranald R
2004-11-01
Formal logic operates in a closed system where all the information relevant to any conclusion is present, whereas this is not the case when one reasons about events and states of the world. Pollard and Richardson drew attention to the fact that the reasoning behind statistical tests does not lead to logically justifiable conclusions. In this paper statistical inferences are defended not by logic but by the standards of everyday reasoning. Aristotle invented formal logic, but argued that people mostly get at the truth with the aid of enthymemes, incomplete syllogisms which include arguing from examples, analogies and signs. It is proposed that statistical tests work in the same way, in that they are based on examples, invoke the analogy of a model and use the size of the effect under test as a sign that the chance hypothesis is unlikely. Of existing theories of statistical inference, only a weak version of Fisher's takes this into account. Aristotle anticipated Fisher by producing an argument of the form that there were too many cases in which an outcome went in a particular direction for that direction to be plausibly attributed to chance. We can therefore conclude that Aristotle would have approved of statistical inference, and there is a good reason for calling this form of statistical inference classical.
Computational approaches to protein inference in shotgun proteomics
2012-01-01
Shotgun proteomics has recently emerged as a powerful approach to characterizing proteomes in biological samples. Its overall objective is to identify the form and quantity of each protein in a high-throughput manner by coupling liquid chromatography with tandem mass spectrometry. As a consequence of its high-throughput nature, shotgun proteomics faces challenges with respect to the analysis and interpretation of experimental data. Among such challenges, the identification of proteins present in a sample has been recognized as an important computational task. This task generally consists of (1) assigning experimental tandem mass spectra to peptides derived from a protein database, and (2) mapping assigned peptides to proteins and quantifying the confidence of identified proteins. Protein identification is fundamentally a statistical inference problem with a number of methods proposed to address its challenges. In this review we categorize current approaches into rule-based, combinatorial optimization and probabilistic inference techniques, and present them using integer programming and Bayesian inference frameworks. We also discuss the main challenges of protein identification and propose potential solutions with the goal of spurring innovative research in this area. PMID:23176300
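To give a concrete flavour of the combinatorial-optimization category, the toy sketch below applies a greedy "set cover" heuristic: pick the smallest set of proteins that explains all identified peptides. The peptide-to-protein map is made up, and this is only one simple heuristic among the families the review surveys.

```python
# Greedy parsimony sketch for peptide-to-protein mapping.
peptide_to_proteins = {
    "PEP1": {"P1", "P2"},
    "PEP2": {"P1"},
    "PEP3": {"P2", "P3"},
    "PEP4": {"P3"},
}

# Invert the map: protein -> peptides it can explain.
protein_to_peptides = {}
for pep, prots in peptide_to_proteins.items():
    for prot in prots:
        protein_to_peptides.setdefault(prot, set()).add(pep)

uncovered = set(peptide_to_proteins)
selected = []
while uncovered:
    # Repeatedly choose the protein explaining the most still-unexplained peptides.
    best = max(protein_to_peptides, key=lambda p: len(protein_to_peptides[p] & uncovered))
    selected.append(best)
    uncovered -= protein_to_peptides[best]
print("parsimonious protein set:", selected)
```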
Confidence set inference with a prior quadratic bound
NASA Technical Reports Server (NTRS)
Backus, George E.
1989-01-01
In the uniqueness part of a geophysical inverse problem, the observer wants to predict all likely values of P unknown numerical properties z = (z_1, ..., z_P) of the earth from measurement of D other numerical properties y^0 = (y_1^0, ..., y_D^0), using full or partial knowledge of the statistical distribution of the random errors in y^0. The data space Y containing y^0 is D-dimensional, so when the model space X is infinite-dimensional the linear uniqueness problem usually is insoluble without prior information about the correct earth model x. If that information is a quadratic bound on x, Bayesian inference (BI) and stochastic inversion (SI) inject spurious structure into x, implied by neither the data nor the quadratic bound. Confidence set inference (CSI) provides an alternative inversion technique free of this objection. Confidence set inference is illustrated in the problem of estimating the geomagnetic field B at the core-mantle boundary (CMB) from components of B measured on or above the earth's surface.
Application of AI techniques to infer vegetation characteristics from directional reflectance(s)
NASA Technical Reports Server (NTRS)
Kimes, D. S.; Smith, J. A.; Harrison, P. A.; Harrison, P. R.
1994-01-01
Traditionally, the remote sensing community has relied totally on spectral knowledge to extract vegetation characteristics. However, there are other knowledge bases (KB's) that can be used to significantly improve the accuracy and robustness of inference techniques. Using AI (artificial intelligence) techniques a KB system (VEG) was developed that integrates input spectral measurements with diverse KB's. These KB's consist of data sets of directional reflectance measurements, knowledge from literature, and knowledge from experts which are combined into an intelligent and efficient system for making vegetation inferences. VEG accepts spectral data of an unknown target as input, determines the best techniques for inferring the desired vegetation characteristic(s), applies the techniques to the target data, and provides a rigorous estimate of the accuracy of the inference. VEG was developed to: infer spectral hemispherical reflectance from any combination of nadir and/or off-nadir view angles; infer percent ground cover from any combination of nadir and/or off-nadir view angles; infer unknown view angle(s) from known view angle(s) (known as view angle extension); and discriminate between user defined vegetation classes using spectral and directional reflectance relationships developed from an automated learning algorithm. The errors for these techniques were generally very good ranging between 2 to 15% (proportional root mean square). The system is designed to aid scientists in developing, testing, and applying new inference techniques using directional reflectance data.
Duchesne, Thierry; Fortin, Daniel; Rivest, Louis-Paul
2015-01-01
Animal movement has a fundamental impact on population and community structure and dynamics. Biased correlated random walks (BCRW) and step selection functions (SSF) are commonly used to study movements. Because no studies have contrasted the parameters and the statistical properties of their estimators for models constructed under these two Lagrangian approaches, it remains unclear whether or not they allow for similar inference. First, we used the Weak Law of Large Numbers to demonstrate that the log-likelihood function for estimating the parameters of BCRW models can be approximated by the log-likelihood of SSFs. Second, we illustrated the link between the two approaches by fitting BCRW with maximum likelihood and with SSF to simulated movement data in virtual environments and to the trajectory of bison (Bison bison L.) trails in natural landscapes. Using simulated and empirical data, we found that the parameters of a BCRW estimated directly from maximum likelihood and by fitting an SSF were remarkably similar. Movement analysis is increasingly used as a tool for understanding the influence of landscape properties on animal distribution. In the rapidly developing field of movement ecology, management and conservation biologists must decide which method they should implement to accurately assess the determinants of animal movement. We showed that BCRW and SSF can provide similar insights into the environmental features influencing animal movements. Both techniques have advantages. BCRW has already been extended to allow for multi-state modeling. Unlike BCRW, however, SSF can be estimated using most statistical packages, it can simultaneously evaluate habitat selection and movement biases, and can easily integrate a large number of movement taxes at multiple scales. SSF thus offers a simple, yet effective, statistical technique to identify movement taxis.
CADDIS Volume 4. Data Analysis: Biological and Environmental Data Requirements
Overview of PECBO Module, using scripts to infer environmental conditions from biological observations, statistically estimating species-environment relationships, methods for inferring environmental conditions, statistical scripts in module.
Approximation of epidemic models by diffusion processes and their statistical inference.
Guy, Romain; Larédo, Catherine; Vergu, Elisabeta
2015-02-01
Multidimensional continuous-time Markov jump processes form a usual set-up for modeling SIR-like epidemics. However, when facing incomplete epidemic data, inference based on these jump processes is not easy to achieve. Here, we start building a new framework for the estimation of key parameters of epidemic models based on statistics of diffusion processes approximating the original jump process. First, previous results on the approximation of density-dependent SIR-like models by diffusion processes with small diffusion coefficient of order 1/√N, where N is the population size, are generalized to non-autonomous systems. Second, our previous inference results on discretely observed diffusion processes with small diffusion coefficient are extended to time-dependent diffusions. Consistent and asymptotically Gaussian estimates are obtained for a fixed number n of observations, which corresponds to the epidemic context, and for N tending to infinity. A correction term, which yields better estimates non-asymptotically, is also included. Finally, performances and robustness of our estimators with respect to various parameters such as R0 (the basic reproduction number), N and n are investigated on simulations. Two models, SIR and SIRS, corresponding to single and recurrent outbreaks, respectively, are used to simulate data. The findings indicate that our estimators have good asymptotic properties and behave noticeably well for realistic numbers of observations and population sizes. This study lays the foundations of a generic inference method currently under extension to incompletely observed epidemic data. Indeed, contrary to the majority of current inference techniques for partially observed processes, which necessitate computer-intensive simulations, our method, being mostly an analytical approach, requires only the classical optimization steps.
Statistical methods for the beta-binomial model in teratology.
Yamamoto, E; Yanagimoto, T
1994-01-01
The beta-binomial model is widely used for analyzing teratological data involving littermates. Recent developments in statistical analyses of teratological data are briefly reviewed, with emphasis on this model. For statistical inference on the parameters of the beta-binomial distribution, separation of the likelihood leads to a likelihood-based inference that reduces the biases of estimators and improves the accuracy of the empirical significance levels of tests. Separate inference on the parameters can be conducted in a unified way. PMID:8187716
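For context, a plain maximum-likelihood beta-binomial fit to litter data (affected pups out of litter size) can be sketched in a few lines; the counts and starting values below are invented, SciPy >= 1.4 is assumed for stats.betabinom, and this is the standard joint fit rather than the separated-likelihood approach the paper reviews.

```python
import numpy as np
from scipy import stats, optimize

litter_size = np.array([10, 12, 9, 11, 10, 13, 8, 12, 11, 10])
affected = np.array([1, 4, 0, 2, 1, 6, 0, 3, 2, 1])

def neg_loglik(log_ab):
    a, b = np.exp(log_ab)                       # keep both shape parameters positive
    return -np.sum(stats.betabinom.logpmf(affected, litter_size, a, b))

fit = optimize.minimize(neg_loglik, x0=np.log([1.0, 5.0]), method="Nelder-Mead")
a_hat, b_hat = np.exp(fit.x)
print(f"a = {a_hat:.2f}, b = {b_hat:.2f}, mean response rate = {a_hat / (a_hat + b_hat):.2f}")
```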
LANDSAT menhaden and thread herring resources investigation. [Gulf of Mexico
NASA Technical Reports Server (NTRS)
Kemmerer, A. J. (Principal Investigator); Brucks, J. T.; Butler, J. A.; Faller, K. H.; Holley, H. J.; Leming, T. D.; Savastano, K. J.; Vanselous, T. M.
1977-01-01
The author has identified the following significant results. The relationship between the distribution of menhaden and selected oceanographic parameters (water color, turbidity, and possibly chlorophyll concentrations) was established. Similar relationships for thread herring were not established nor were relationships relating to the abundance of either species. Use of aircraft and LANDSAT remote sensing instruments to measure or infer a set of basic oceanographic parameters was evaluated. Parameters which could be accurately inferred included surface water temperature, salinity, and color. Water turbidity (Secchi disk) was evaluated as marginally inferrable from the LANDSAT MSS data and chlorophyll-a concentrations as less than marginal. These evaluations considered the parameters only as experienced in the two test areas using available sensors and statistical techniques.
On Some Assumptions of the Null Hypothesis Statistical Testing
ERIC Educational Resources Information Center
Patriota, Alexandre Galvão
2017-01-01
Bayesian and classical statistical approaches are based on different types of logical principles. In order to avoid mistaken inferences and misguided interpretations, the practitioner must respect the inference rules embedded into each statistical method. Ignoring these principles leads to the paradoxical conclusions that the hypothesis…
NASA Technical Reports Server (NTRS)
Hailperin, Max
1993-01-01
This thesis provides the design and analysis of techniques for global load balancing on ensemble architectures running soft-real-time object-oriented applications with statistically periodic loads. It focuses on estimating the instantaneous average load over all the processing elements. The major contribution is the use of explicit stochastic process models for both the loading and the averaging itself. These models are exploited via statistical time-series analysis and Bayesian inference to provide improved average load estimates, and thus to facilitate global load balancing. This thesis explains the distributed algorithms used and provides some optimality results. It also describes the algorithms' implementation and gives performance results from simulation. These results show that our techniques allow more accurate estimation of the global system loading, resulting in fewer object migrations than local methods. Our method is shown to provide superior performance, relative not only to static load-balancing schemes but also to many adaptive methods.
NASA Astrophysics Data System (ADS)
Dunstan, D. J.; Hodgson, D. J.
2014-06-01
Many gardeners and horticulturalists seek non-chemical methods to control populations of snails. It has frequently been reported that snails that are marked and removed from a garden are later found in the garden again. This phenomenon is often cited as evidence for a homing instinct. We report a systematic study of the snail population in a small suburban garden, in which large numbers of snails were marked and removed over a period of about 6 months. While many returned, inferring a homing instinct from this evidence requires statistical modelling. Monte Carlo techniques demonstrate that movements of snails are better explained by drift under the influence of a homing instinct than by random diffusion. Maximum likelihood techniques infer the existence of two groups of snails in the garden: members of a larger population that show little affinity to the garden itself, and core members of a local garden population that regularly return to their home if removed. The data are strongly suggestive of a homing instinct, but also reveal that snail-throwing can work as a pest management strategy.
Direct evidence for a dual process model of deductive inference.
Markovits, Henry; Brunet, Marie-Laurence; Thompson, Valerie; Brisson, Janie
2013-07-01
In 2 experiments, we tested a strong version of a dual process theory of conditional inference (cf. Verschueren et al., 2005a, 2005b) that assumes that most reasoners have 2 strategies available, the choice of which is determined by situational variables, cognitive capacity, and metacognitive control. The statistical strategy evaluates inferences probabilistically, accepting those with high conditional probability. The counterexample strategy rejects inferences when a counterexample shows the inference to be invalid. To discriminate strategy use, we presented reasoners with conditional statements (if p, then q) and explicit statistical information about the relative frequency of the probability of p/q (50% vs. 90%). A statistical strategy would accept the more probable inferences more frequently, whereas the counterexample one would reject both. In Experiment 1, reasoners under time pressure used the statistical strategy more, but switched to the counterexample strategy when time constraints were removed; the former took less time than the latter. These data are consistent with the hypothesis that the statistical strategy is the default heuristic. Under a free-time condition, reasoners preferred the counterexample strategy and kept it when put under time pressure. Thus, it is not simply a lack of capacity that produces a statistical strategy; instead, it seems that time pressure disrupts the ability to make good metacognitive choices. In line with this conclusion, in a 2nd experiment, we measured reasoners' confidence in their performance; those under time pressure were less confident in the statistical than the counterexample strategy and more likely to switch strategies under free-time conditions. PsycINFO Database Record (c) 2013 APA, all rights reserved.
ERIC Educational Resources Information Center
Thompson, Bruce
Web-based statistical instruction, like all statistical instruction, ought to focus on teaching the essence of the research endeavor: the exercise of reflective judgment. Using the framework of the recent report of the American Psychological Association (APA) Task Force on Statistical Inference (Wilkinson and the APA Task Force on Statistical…
Measurement of the relationship between perceived and computed color differences
NASA Astrophysics Data System (ADS)
García, Pedro A.; Huertas, Rafael; Melgosa, Manuel; Cui, Guihua
2007-07-01
Using simulated data sets, we have analyzed some mathematical properties of different statistical measurements that have been employed in previous literature to test the performance of different color-difference formulas. Specifically, the properties of the combined index PF/3 (performance factor obtained as average of three terms), widely employed in current literature, have been considered. A new index named standardized residual sum of squares (STRESS), employed in multidimensional scaling techniques, is recommended. The main difference between PF/3 and STRESS is that the latter is simpler and allows inferences on the statistical significance of two color-difference formulas with respect to a given set of visual data.
Software for Data Analysis with Graphical Models
NASA Technical Reports Server (NTRS)
Buntine, Wray L.; Roy, H. Scott
1994-01-01
Probabilistic graphical models are being used widely in artificial intelligence and statistics, for instance, in diagnosis and expert systems, as a framework for representing and reasoning with probabilities and independencies. They come with corresponding algorithms for performing statistical inference. This offers a unifying framework for prototyping and/or generating data analysis algorithms from graphical specifications. This paper illustrates the framework with an example and then presents some basic techniques for the task: problem decomposition and the calculation of exact Bayes factors. Other tools already developed, such as automatic differentiation, Gibbs sampling, and use of the EM algorithm, make this a broad basis for the generation of data analysis software.
Diagnosis of Misalignment in Overhung Rotor using the K-S Statistic and A2 Test
NASA Astrophysics Data System (ADS)
Garikapati, Diwakar; Pacharu, RaviKumar; Munukurthi, Rama Satya Satyanarayana
2018-02-01
Vibration measurement at the bearings of rotating machinery has become a useful technique for diagnosing incipient fault conditions. In particular, vibration measurement can be used to detect unbalance in rotor, bearing failure, gear problems or misalignment between a motor shaft and coupled shaft. This is a particular problem encountered in turbines, ID fans and FD fans used for power generation. For successful fault diagnosis, it is important to adopt motor current signature analysis (MCSA) techniques capable of identifying the faults. It is also useful to develop techniques for inferring information such as the severity of fault. It is proposed that modeling the cumulative distribution function of motor current signals with respect to appropriate theoretical distributions, and quantifying the goodness of fit with the Kolmogorov-Smirnov (KS) statistic and A2 test offers a suitable signal feature for diagnosis. This paper demonstrates the successful comparison of the K-S feature and A2 test for discriminating the misalignment fault from normal function.
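As a minimal sketch of the goodness-of-fit features mentioned above, and assuming that "A2 test" denotes the Anderson-Darling A-squared statistic, the snippet below computes both statistics with SciPy for two synthetic stand-ins for motor current signals (a nominally healthy one and one with an added modulation). The signals and threshold-free comparison are illustrative only.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic stand-ins for motor current samples under healthy and faulty conditions
healthy = rng.normal(0.0, 1.0, 2000)
faulty = rng.normal(0.0, 1.0, 2000) + 0.4 * np.sin(np.linspace(0, 60, 2000))

for label, x in [("healthy", healthy), ("faulty", faulty)]:
    z = (x - x.mean()) / x.std(ddof=1)       # standardize before testing the fit
    ks = stats.kstest(z, "norm")             # Kolmogorov-Smirnov statistic
    ad = stats.anderson(z, dist="norm")      # Anderson-Darling A^2 statistic
    print(label, "KS =", round(ks.statistic, 4), "A^2 =", round(ad.statistic, 4))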
Statistical Inferences from Formaldehyde Dna-Protein Cross-Link Data
Physiologically-based pharmacokinetic (PBPK) modeling has reached considerable sophistication in its application in the pharmacological and environmental health areas. Yet, mature methodologies for making statistical inferences have not been routinely incorporated in these applic...
Introducing Statistical Inference to Biology Students through Bootstrapping and Randomization
ERIC Educational Resources Information Center
Lock, Robin H.; Lock, Patti Frazer
2008-01-01
Bootstrap methods and randomization tests are increasingly being used as alternatives to standard statistical procedures in biology. They also serve as an effective introduction to the key ideas of statistical inference in introductory courses for biology students. We discuss the use of such simulation based procedures in an integrated curriculum…
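A short sketch of the two simulation-based procedures named above, using hypothetical leaf-length measurements for two plant groups; the data and group labels are invented for illustration.

import numpy as np

rng = np.random.default_rng(1)
# Hypothetical leaf lengths (cm) for two plant groups
a = np.array([5.1, 4.8, 5.6, 5.0, 4.7, 5.3, 5.5])
b = np.array([4.4, 4.9, 4.2, 4.6, 4.1, 4.8])

# Bootstrap 95% confidence interval for the difference in means
boot = [rng.choice(a, a.size).mean() - rng.choice(b, b.size).mean()
        for _ in range(10_000)]
print("bootstrap 95% CI:", np.percentile(boot, [2.5, 97.5]))

# Randomization test: reshuffle group labels to build the null distribution
observed = a.mean() - b.mean()
pooled = np.concatenate([a, b])
null = []
for _ in range(10_000):
    rng.shuffle(pooled)
    null.append(pooled[:a.size].mean() - pooled[a.size:].mean())
print("randomization p-value:", np.mean(np.abs(null) >= abs(observed)))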
Developing Young Children's Emergent Inferential Practices in Statistics
ERIC Educational Resources Information Center
Makar, Katie
2016-01-01
Informal statistical inference has now been researched at all levels of schooling and initial tertiary study. Work in informal statistical inference is least understood in the early years, where children have had little if any exposure to data handling. A qualitative study in Australia was carried out through a series of teaching experiments with…
Furchtgott, Leon A; Melton, Samuel; Menon, Vilas; Ramanathan, Sharad
2017-01-01
Computational analysis of gene expression to determine both the sequence of lineage choices made by multipotent cells and the genes influencing these decisions is challenging. Here we discover a pattern in the expression levels of a sparse subset of genes among cell types in B- and T-cell developmental lineages that correlates with developmental topologies. We develop a statistical framework using this pattern to simultaneously infer lineage transitions and the genes that determine these relationships. We use this technique to reconstruct the early hematopoietic and intestinal developmental trees. We extend this framework to analyze single-cell RNA-seq data from early human cortical development, inferring a neocortical-hindbrain split in early progenitor cells and the key genes that could control this lineage decision. Our work allows us to simultaneously infer both the identity and lineage of cell types as well as a small set of key genes whose expression patterns reflect these relationships. DOI: http://dx.doi.org/10.7554/eLife.20488.001 PMID:28296636
Sigma: Strain-level inference of genomes from metagenomic analysis for biosurveillance
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ahn, Tae-Hyuk; Chai, Juanjuan; Pan, Chongle
Motivation: Metagenomic sequencing of clinical samples provides a promising technique for direct pathogen detection and characterization in biosurveillance. Taxonomic analysis at the strain level can be used to resolve serotypes of a pathogen in biosurveillance. Sigma was developed for strain-level identification and quantification of pathogens using their reference genomes based on metagenomic analysis. Results: Sigma provides not only accurate strain-level inferences, but also three unique capabilities: (i) Sigma quantifies the statistical uncertainty of its inferences, which includes hypothesis testing of identified genomes and confidence interval estimation of their relative abundances; (ii) Sigma enables strain variant calling by assigning metagenomic reads to their most likely reference genomes; and (iii) Sigma supports parallel computing for fast analysis of large datasets. In conclusion, the algorithm performance was evaluated using simulated mock communities and fecal samples with spike-in pathogen strains. Availability and Implementation: Sigma was implemented in C++ with source codes and binaries freely available at http://sigma.omicsbio.org.
Sigma: Strain-level inference of genomes from metagenomic analysis for biosurveillance
Ahn, Tae-Hyuk; Chai, Juanjuan; Pan, Chongle
2014-09-29
Motivation: Metagenomic sequencing of clinical samples provides a promising technique for direct pathogen detection and characterization in biosurveillance. Taxonomic analysis at the strain level can be used to resolve serotypes of a pathogen in biosurveillance. Sigma was developed for strain-level identification and quantification of pathogens using their reference genomes based on metagenomic analysis. Results: Sigma provides not only accurate strain-level inferences, but also three unique capabilities: (i) Sigma quantifies the statistical uncertainty of its inferences, which includes hypothesis testing of identified genomes and confidence interval estimation of their relative abundances; (ii) Sigma enables strain variant calling by assigning metagenomic reads to their most likely reference genomes; and (iii) Sigma supports parallel computing for fast analysis of large datasets. In conclusion, the algorithm performance was evaluated using simulated mock communities and fecal samples with spike-in pathogen strains. Availability and Implementation: Sigma was implemented in C++ with source codes and binaries freely available at http://sigma.omicsbio.org.
Correlation and simple linear regression.
Eberly, Lynn E
2007-01-01
This chapter highlights important steps in using correlation and simple linear regression to address scientific questions about the association of two continuous variables with each other. These steps include estimation and inference, assessing model fit, the connection between regression and ANOVA, and study design. Examples in microbiology are used throughout. This chapter provides a framework that is helpful in understanding more complex statistical techniques, such as multiple linear regression, linear mixed effects models, logistic regression, and proportional hazards regression.
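A compact illustration of the estimation and inference steps described above, using SciPy on hypothetical microbiology-flavored data (incubation temperature versus log colony count); the variables and values are invented.

import numpy as np
from scipy import stats

# Hypothetical data: incubation temperature (x) vs. log colony count (y)
x = np.array([20, 22, 24, 26, 28, 30, 32, 34], dtype=float)
y = np.array([2.1, 2.4, 2.8, 3.1, 3.5, 3.6, 4.0, 4.3])

r, p_corr = stats.pearsonr(x, y)          # correlation and its p-value
fit = stats.linregress(x, y)              # slope, intercept, r, p, standard error
print(f"r = {r:.3f} (p = {p_corr:.4f})")
print(f"y = {fit.intercept:.2f} + {fit.slope:.2f} x, slope p-value = {fit.pvalue:.4f}")

# Residuals as a quick check of model fit
residuals = y - (fit.intercept + fit.slope * x)
print("residual SD:", residuals.std(ddof=2).round(3))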
[Inferring landmark displacements from changes in cephalometric angles].
Xu, T; Baumrind, S
2001-07-01
To investigate the appropriateness of using changes in angular measurements to reflect actual profile changes. The sample consisted of 48 growing malocclusion patients (24 Class I and 24 Class II subjects) treated by an experienced orthodontist using the Edgewise technique. Landmark and superimpositional data were extracted from the previously prepared numerical database. Three pairs of angular and linear measures were computed with the Craniofacial Software Package. Although the associations between all three angular measures and their corresponding linear measures are statistically significant at the 0.001 level, the disagreement between these three pairs of measures is 10.4%, 22.9% and 37.5%, respectively, in this sample. The direction of displacement of anterior facial landmarks during growth and treatment cannot reliably be inferred merely from changes in cephalometric angles.
Using Alien Coins to Test Whether Simple Inference Is Bayesian
ERIC Educational Resources Information Center
Cassey, Peter; Hawkins, Guy E.; Donkin, Chris; Brown, Scott D.
2016-01-01
Reasoning and inference are well-studied aspects of basic cognition that have been explained as statistically optimal Bayesian inference. Using a simplified experimental design, we conducted quantitative comparisons between Bayesian inference and human inference at the level of individuals. In 3 experiments, with more than 13,000 participants, we…
ERIC Educational Resources Information Center
Henriques, Ana; Oliveira, Hélia
2016-01-01
This paper reports on the results of a study investigating the potential to embed Informal Statistical Inference in statistical investigations, using TinkerPlots, for assisting 8th grade students' informal inferential reasoning to emerge, particularly their articulations of uncertainty. Data collection included students' written work on a…
A Predictive Approach to Network Reverse-Engineering
NASA Astrophysics Data System (ADS)
Wiggins, Chris
2005-03-01
A central challenge of systems biology is the "reverse engineering" of transcriptional networks: inferring which genes exert regulatory control over which other genes. Attempting such inference at the genomic scale has only recently become feasible, via data-intensive biological innovations such as DNA microarrays ("DNA chips") and the sequencing of whole genomes. In this talk we present a predictive approach to network reverse-engineering, in which we integrate DNA chip data and sequence data to build a model of the transcriptional network of the yeast S. cerevisiae capable of predicting the response of genes in unseen experiments. The technique can also be used to extract "motifs," sequence elements which act as binding sites for regulatory proteins. We validate by a number of approaches and present comparisons of theoretical predictions vs. experimental data, along with biological interpretations of the resulting model. En route, we will illustrate some basic notions in statistical learning theory (fitting vs. over-fitting; cross-validation; assessing statistical significance), highlighting ways in which physicists can make a unique contribution in data-driven approaches to reverse engineering.
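The "fitting vs. over-fitting" and cross-validation notions mentioned above can be demonstrated generically; the sketch below is not the talk's yeast analysis but a toy scikit-learn example in which increasing model flexibility keeps lowering training error while cross-validated error does not.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(2)
# Toy data: 60 samples with a noisy cubic response
X = rng.uniform(-2, 2, (60, 1))
y = X[:, 0] ** 3 - X[:, 0] + rng.normal(0, 0.5, 60)

for degree in (1, 3, 9):
    model = make_pipeline(PolynomialFeatures(degree), Ridge(alpha=1e-3))
    cv_mse = -cross_val_score(model, X, y, cv=5,
                              scoring="neg_mean_squared_error").mean()
    train_mse = np.mean((model.fit(X, y).predict(X) - y) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, CV MSE {cv_mse:.3f}")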
Inferring the Mode of Selection from the Transient Response to Demographic Perturbations
NASA Astrophysics Data System (ADS)
Balick, Daniel; Do, Ron; Reich, David; Sunyaev, Shamil
2014-03-01
Despite substantial recent progress in theoretical population genetics, most models work under the assumption of a constant population size. Deviations from fixed population sizes are ubiquitous in natural populations, many of which experience population bottlenecks and re-expansions. The non-equilibrium dynamics introduced by a large perturbation in population size are generally viewed as a confounding factor. In the present work, we take advantage of the transient response to a population bottleneck to infer features of the mode of selection and the distribution of selective effects. We develop an analytic framework and a corresponding statistical test that qualitatively differentiates between alleles under additive and those under recessive or more general epistatic selection. This statistic can be used to bound the joint distribution of selective effects and dominance effects in any diploid sexual organism. We apply this technique to human population genetic data, and severely restrict the space of allowed selective coefficients in humans. Additionally, one can test a set of functionally or medically relevant alleles for the primary mode of selection, or determine the local regional variation in dominance coefficients along the genome.
NASA Technical Reports Server (NTRS)
Harrison, P. Ann
1992-01-01
The NASA VEGetation Workbench (VEG) is a knowledge based system that infers vegetation characteristics from reflectance data. The VEG subgoal PROPORTION.GROUND.COVER has been completed and a number of additional techniques that infer the proportion ground cover of a sample have been implemented. Some techniques operate on sample data at a single wavelength. The techniques previously incorporated in VEG for other subgoals operated on data at a single wavelength so implementing the additional single wavelength techniques required no changes to the structure of VEG. Two techniques which use data at multiple wavelengths to infer proportion ground cover were also implemented. This work involved modifying the structure of VEG so that multiple wavelength techniques could be incorporated. All the new techniques were tested using both the VEG 'Research Mode' and the 'Automatic Mode.'
Liu, Fang; Eugenio, Evercita C
2018-04-01
Beta regression is an increasingly popular statistical technique in medical research for modeling of outcomes that assume values in (0, 1), such as proportions and patient reported outcomes. When outcomes take values in the intervals [0,1), (0,1], or [0,1], zero-or-one-inflated beta (zoib) regression can be used. We provide a thorough review on beta regression and zoib regression in the modeling, inferential, and computational aspects via the likelihood-based and Bayesian approaches. We demonstrate the statistical and practical importance of correctly modeling the inflation at zero/one rather than ad hoc replacing them with values close to zero/one via simulation studies; the latter approach can lead to biased estimates and invalid inferences. We show via simulation studies that the likelihood-based approach is computationally faster in general than MCMC algorithms used in the Bayesian inferences, but runs the risk of non-convergence, large biases, and sensitivity to starting values in the optimization algorithm especially with clustered/correlated data, data with sparse inflation at zero and one, and data that warrant regularization of the likelihood. The disadvantages of the regular likelihood-based approach make the Bayesian approach an attractive alternative in these cases. Software packages and tools for fitting beta and zoib regressions in both the likelihood-based and Bayesian frameworks are also reviewed.
Serang, Oliver; Noble, William Stafford
2012-01-01
The problem of identifying the proteins in a complex mixture using tandem mass spectrometry can be framed as an inference problem on a graph that connects peptides to proteins. Several existing protein identification methods make use of statistical inference methods for graphical models, including expectation maximization, Markov chain Monte Carlo, and full marginalization coupled with approximation heuristics. We show that, for this problem, the majority of the cost of inference usually comes from a few highly connected subgraphs. Furthermore, we evaluate three different statistical inference methods using a common graphical model, and we demonstrate that junction tree inference substantially improves rates of convergence compared to existing methods. The python code used for this paper is available at http://noble.gs.washington.edu/proj/fido. PMID:22331862
Using genetic data to strengthen causal inference in observational research.
Pingault, Jean-Baptiste; O'Reilly, Paul F; Schoeler, Tabea; Ploubidis, George B; Rijsdijk, Frühling; Dudbridge, Frank
2018-06-05
Causal inference is essential across the biomedical, behavioural and social sciences. By progressing from confounded statistical associations to evidence of causal relationships, causal inference can reveal complex pathways underlying traits and diseases and help to prioritize targets for intervention. Recent progress in genetic epidemiology - including statistical innovation, massive genotyped data sets and novel computational tools for deep data mining - has fostered the intense development of methods exploiting genetic data and relatedness to strengthen causal inference in observational research. In this Review, we describe how such genetically informed methods differ in their rationale, applicability and inherent limitations and outline how they should be integrated in the future to offer a rich causal inference toolbox.
Brannigan, V M; Bier, V M; Berg, C
1992-09-01
Toxic torts are product liability cases dealing with alleged injuries due to chemical or biological hazards such as radiation, thalidomide, or Agent Orange. Toxic tort cases typically rely more heavily than other product liability cases on indirect or statistical proof of injury. There have been numerous theoretical analyses of statistical proof of injury in toxic tort cases. However, there have been only a handful of actual legal decisions regarding the use of such statistical evidence, and most of those decisions have been inconclusive. Recently, a major case from the Fifth Circuit, involving allegations that Bendectin (a morning sickness drug) caused birth defects, was decided entirely on the basis of statistical inference. This paper examines both the conceptual basis of that decision and the relationships among statistical inference, scientific evidence, and the rules of product liability in general.
Visualizing statistical significance of disease clusters using cartograms.
Kronenfeld, Barry J; Wong, David W S
2017-05-15
Health officials and epidemiological researchers often use maps of disease rates to identify potential disease clusters. Because these maps exaggerate the prominence of low-density districts and hide potential clusters in urban (high-density) areas, many researchers have used density-equalizing maps (cartograms) as a basis for epidemiological mapping. However, we do not have existing guidelines for visual assessment of statistical uncertainty. To address this shortcoming, we develop techniques for visual determination of statistical significance of clusters spanning one or more districts on a cartogram. We developed the techniques within a geovisual analytics framework that does not rely on automated significance testing, and can therefore facilitate visual analysis to detect clusters that automated techniques might miss. On a cartogram of the at-risk population, the statistical significance of a disease cluster is determinate from the rate, area and shape of the cluster under standard hypothesis testing scenarios. We develop formulae to determine, for a given rate, the area required for statistical significance of a priori and a posteriori designated regions under certain test assumptions. Uniquely, our approach enables dynamic inference of aggregate regions formed by combining individual districts. The method is implemented in interactive tools that provide choropleth mapping, automated legend construction and dynamic search tools to facilitate cluster detection and assessment of the validity of tested assumptions. A case study of leukemia incidence analysis in California demonstrates the ability to visually distinguish between statistically significant and insignificant regions. The proposed geovisual analytics approach enables intuitive visual assessment of statistical significance of arbitrarily defined regions on a cartogram. Our research prompts a broader discussion of the role of geovisual exploratory analyses in disease mapping and the appropriate framework for visually assessing the statistical significance of spatial clusters.
Making statistical inferences about software reliability
NASA Technical Reports Server (NTRS)
Miller, Douglas R.
1988-01-01
Failure times of software undergoing random debugging can be modelled as order statistics of independent but nonidentically distributed exponential random variables. Using this model inferences can be made about current reliability and, if debugging continues, future reliability. This model also shows the difficulty inherent in statistical verification of very highly reliable software such as that used by digital avionics in commercial aircraft.
ERIC Educational Resources Information Center
Denbleyker, John Nickolas
2012-01-01
The shortcomings of the proportion above cut (PAC) statistic used so prominently in the educational landscape renders it a very problematic measure for making correct inferences with student test data. The limitations of PAC-based statistics are more pronounced with cross-test comparisons due to their dependency on cut-score locations. A better…
ERIC Educational Resources Information Center
Trumpower, David L.
2015-01-01
Making inferences about population differences based on samples of data, that is, performing intuitive analysis of variance (IANOVA), is common in everyday life. However, the intuitive reasoning of individuals when making such inferences (even following statistics instruction), often differs from the normative logic of formal statistics. The…
Difference to Inference: teaching logical and statistical reasoning through on-line interactivity.
Malloy, T E
2001-05-01
Difference to Inference is an on-line JAVA program that simulates theory testing and falsification through research design and data collection in a game format. The program, based on cognitive and epistemological principles, is designed to support learning of the thinking skills underlying deductive and inductive logic and statistical reasoning. Difference to Inference has database connectivity so that game scores can be counted as part of course grades.
1987-06-01
number of series among the 63 which were identified as a particular ARIMA form and were "best" modeled by a particular technique. Figure 1 illustrates a...th time from xe's. The integrated autoregressive moving average model, denoted by ARIMA(p,d,q), is a result of combining d-th differencing process...Experiments, (4) Data Analysis and Modeling, (5) Theory and Probabilistic Inference, (6) Fuzzy Statistics, (7) Forecasting and Prediction, (8) Small Sample
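For readers unfamiliar with ARIMA(p,d,q) identification, the sketch below fits a few candidate orders to a synthetic once-integrated ARMA series with statsmodels and keeps the order with the lowest AIC; the series, candidate orders, and selection criterion are illustrative, not those of the report.

import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(3)
# Synthetic stand-in: an ARMA(1,1) process integrated once -> ARIMA(1,1,1)
n = 120
noise = rng.normal(0, 1, n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.6 * x[t - 1] + noise[t] + 0.3 * noise[t - 1]
series = pd.Series(np.cumsum(x))          # d = 1: integrate once

# Compare candidate (p, d, q) orders by AIC and keep the "best" fit
fits = {order: ARIMA(series, order=order).fit()
        for order in [(1, 1, 0), (0, 1, 1), (1, 1, 1)]}
best = min(fits, key=lambda o: fits[o].aic)
print({o: round(f.aic, 1) for o, f in fits.items()}, "-> best:", best)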
Trends in modeling Biomedical Complex Systems
Milanesi, Luciano; Romano, Paolo; Castellani, Gastone; Remondini, Daniel; Liò, Petro
2009-01-01
In this paper we provide an introduction to the techniques for multi-scale complex biological systems, from the single bio-molecule to the cell, combining theoretical modeling, experiments, informatics tools and technologies suitable for biological and biomedical research, which are becoming increasingly multidisciplinary, multidimensional and information-driven. The most important concepts on mathematical modeling methodologies and statistical inference, bioinformatics and standards tools to investigate complex biomedical systems are discussed and the prominent literature useful to both the practitioner and the theoretician are presented. PMID:19828068
NASA Astrophysics Data System (ADS)
Cunningham, Laura; Holmes, Naomi; Bigler, Christian; Dadal, Anna; Bergman, Jonas; Eriksson, Lars; Brooks, Stephen; Langdon, Pete; Caseldine, Chris
2010-05-01
Over the past two decades considerable effort has been devoted to quantitatively reconstructing temperatures from biological proxies preserved in lake sediments, via transfer functions. Such transfer functions typically consist of modern sediment samples, collected over a broad environmental gradient. Correlations between the biological communities and environmental parameters observed over these broad gradients are assumed to be equally valid temporally. The predictive ability of such spatially based transfer functions has traditionally been assessed by comparisons of measured and inferred temperatures within the calibration sets, with little validation against historical data. Although statistical techniques such as bootstrapping may improve error estimation, this approach remains partly a circular argument. This raises the question of how reliable such reconstructions are for inferring past changes in temperature? In order to address this question, we used transfer functions to reconstruct July temperatures from diatoms and chironomids from several locations across northern Europe. The transfer functions used showed good internal calibration statistics (r2 = 0.66 - 0.91). The diatom and chironomid inferred July air temperatures were compared to local observational records. As the sediment records were non-annual, all data were first smoothed using a 15 yr moving average filter. None of the five biologically-inferred temperature records were correlated with the local meteorological records. Furthermore, diatom inferred temperatures did not agree with chironomid inferred temperatures from the same cores from the same sites. In an attempt to understand this poor performance the biological proxy data was compressed using principal component analysis (PCA), and the PCA axes compared to the local meteorological data. These analyses clearly demonstrated that July temperatures were not correlated with the biological data at these locations. Some correlations were observed between the biological proxies and autumn and spring temperatures, although this varied slightly between sites and proxies. For example, chironomid data from Iceland was most strongly correlated with temperatures in February, March and April whilst in northern Sweden, the chironomid data was most strongly correlated with temperatures in March, April and May. It is suggested that the biological data at these sites may be responding to changes in the length of the ice-free period or hydrological regimes (including snow melt), rather than temperature per se. Our findings demonstrate the need to validate inferred temperatures against local meteorological data. Where such validation cannot be undertaken, inferred temperature reconstructions should be treated cautiously.
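The validation step described above (15-year smoothing followed by comparison with observational records) can be sketched as follows; the "observed" and "inferred" July temperature series below are synthetic stand-ins, and the autocorrelation introduced by smoothing should be kept in mind when judging any resulting correlation.

import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
yrs = np.arange(1900, 2000)

# Hypothetical stand-ins: meteorological July temperature and a proxy-inferred series
observed = pd.Series(10 + 0.01 * (yrs - 1900) + rng.normal(0, 0.8, yrs.size), index=yrs)
inferred = pd.Series(10 + rng.normal(0, 0.8, yrs.size), index=yrs)

# 15-year moving average, as applied to the non-annual sediment records
obs_smooth = observed.rolling(15, center=True).mean()
inf_smooth = inferred.rolling(15, center=True).mean()

# Correlation between the smoothed series is the validation statistic;
# smoothing inflates autocorrelation, so nominal p-values would be optimistic.
print("r =", round(obs_smooth.corr(inf_smooth), 3))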
Hayat, Matthew J.; Powell, Amanda; Johnson, Tessa; Cadwell, Betsy L.
2017-01-01
Statistical literacy and knowledge is needed to read and understand the public health literature. The purpose of this study was to quantify basic and advanced statistical methods used in public health research. We randomly sampled 216 published articles from seven top tier general public health journals. Studies were reviewed by two readers and a standardized data collection form completed for each article. Data were analyzed with descriptive statistics and frequency distributions. Results were summarized for statistical methods used in the literature, including descriptive and inferential statistics, modeling, advanced statistical techniques, and statistical software used. Approximately 81.9% of articles reported an observational study design and 93.1% of articles were substantively focused. Descriptive statistics in table or graphical form were reported in more than 95% of the articles, and statistical inference reported in more than 76% of the studies reviewed. These results reveal the types of statistical methods currently used in the public health literature. Although this study did not obtain information on what should be taught, information on statistical methods being used is useful for curriculum development in graduate health sciences education, as well as making informed decisions about continuing education for public health professionals. PMID:28591190
Hayat, Matthew J; Powell, Amanda; Johnson, Tessa; Cadwell, Betsy L
2017-01-01
Statistical literacy and knowledge is needed to read and understand the public health literature. The purpose of this study was to quantify basic and advanced statistical methods used in public health research. We randomly sampled 216 published articles from seven top tier general public health journals. Studies were reviewed by two readers and a standardized data collection form completed for each article. Data were analyzed with descriptive statistics and frequency distributions. Results were summarized for statistical methods used in the literature, including descriptive and inferential statistics, modeling, advanced statistical techniques, and statistical software used. Approximately 81.9% of articles reported an observational study design and 93.1% of articles were substantively focused. Descriptive statistics in table or graphical form were reported in more than 95% of the articles, and statistical inference reported in more than 76% of the studies reviewed. These results reveal the types of statistical methods currently used in the public health literature. Although this study did not obtain information on what should be taught, information on statistical methods being used is useful for curriculum development in graduate health sciences education, as well as making informed decisions about continuing education for public health professionals.
NASA Technical Reports Server (NTRS)
Hailperin, M.
1993-01-01
This thesis provides design and analysis of techniques for global load balancing on ensemble architectures running soft-real-time object-oriented applications with statistically periodic loads. It focuses on estimating the instantaneous average load over all the processing elements. The major contribution is the use of explicit stochastic process models for both the loading and the averaging itself. These models are exploited via statistical time-series analysis and Bayesian inference to provide improved average load estimates, and thus to facilitate global load balancing. This thesis explains the distributed algorithms used and provides some optimality results. It also describes the algorithms' implementation and gives performance results from simulation. These results show that the authors' techniques allow more accurate estimation of the global system loading, resulting in fewer object migrations than local methods. The authors' method is shown to provide superior performance, relative not only to static load-balancing schemes but also to many adaptive load-balancing methods. Results from a preliminary analysis of another system and from simulation with a synthetic load provide some evidence of more general applicability.
Modeling of a Robust Confidence Band for the Power Curve of a Wind Turbine.
Hernandez, Wilmar; Méndez, Alfredo; Maldonado-Correa, Jorge L; Balleteros, Francisco
2016-12-07
Having an accurate model of the power curve of a wind turbine allows us to better monitor its operation and plan storage capacity. Since wind speed and direction are of a highly stochastic nature, the forecasting of the power generated by the wind turbine is of the same nature as well. In this paper, a method for obtaining a robust confidence band containing the power curve of a wind turbine under test conditions is presented. Here, the confidence band is bounded by two curves which are estimated using parametric statistical inference techniques. However, the observations that are used for carrying out the statistical analysis are obtained by using the binning method, and in each bin, the outliers are eliminated by using a censorship process based on robust statistical techniques. Then, the observations that are not outliers are divided into observation sets. Finally, both the power curve of the wind turbine and the two curves that define the robust confidence band are estimated using each of the previously mentioned observation sets.
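A minimal sketch of the binning-plus-robust-censoring idea described above: synthetic wind speed/power observations are grouped into 0.5 m/s bins, interquartile-range censoring removes outliers in each bin, and per-bin statistics trace a power curve with a band. The paper estimates its band parametrically; here empirical percentiles are used purely for illustration, and all data are invented.

import numpy as np

rng = np.random.default_rng(5)
# Hypothetical SCADA-like observations: wind speed (m/s) and power (kW)
speed = rng.uniform(3, 20, 5000)
power = 2000 / (1 + np.exp(-(speed - 10))) + rng.normal(0, 60, speed.size)
power[rng.choice(speed.size, 100, replace=False)] *= 0.3      # inject outliers

edges = np.arange(3, 20.5, 0.5)                                # binning method
centers, lower, median, upper = [], [], [], []
for lo, hi in zip(edges[:-1], edges[1:]):
    p = power[(speed >= lo) & (speed < hi)]
    if p.size < 10:
        continue
    q1, q3 = np.percentile(p, [25, 75])
    keep = p[(p >= q1 - 1.5 * (q3 - q1)) & (p <= q3 + 1.5 * (q3 - q1))]  # censor outliers
    centers.append((lo + hi) / 2)
    lower.append(np.percentile(keep, 2.5))      # lower band curve
    median.append(np.median(keep))              # estimated power curve
    upper.append(np.percentile(keep, 97.5))     # upper band curve

for c, lo_, m, hi_ in list(zip(centers, lower, median, upper))[::6]:
    print(f"{c:5.2f} m/s: {lo_:7.1f} / {m:7.1f} / {hi_:7.1f} kW")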
Modeling of a Robust Confidence Band for the Power Curve of a Wind Turbine
Hernandez, Wilmar; Méndez, Alfredo; Maldonado-Correa, Jorge L.; Balleteros, Francisco
2016-01-01
Having an accurate model of the power curve of a wind turbine allows us to better monitor its operation and plan storage capacity. Since wind speed and direction are of a highly stochastic nature, the forecasting of the power generated by the wind turbine is of the same nature as well. In this paper, a method for obtaining a robust confidence band containing the power curve of a wind turbine under test conditions is presented. Here, the confidence band is bounded by two curves which are estimated using parametric statistical inference techniques. However, the observations that are used for carrying out the statistical analysis are obtained by using the binning method, and in each bin, the outliers are eliminated by using a censorship process based on robust statistical techniques. Then, the observations that are not outliers are divided into observation sets. Finally, both the power curve of the wind turbine and the two curves that define the robust confidence band are estimated using each of the previously mentioned observation sets. PMID:27941604
ERIC Educational Resources Information Center
Watson, Jane
2007-01-01
Inference, or decision making, is seen in curriculum documents as the final step in a statistical investigation. For a formal statistical enquiry this may be associated with sophisticated tests involving probability distributions. For young students without the mathematical background to perform such tests, it is still possible to draw informal…
A Framework for Thinking about Informal Statistical Inference
ERIC Educational Resources Information Center
Makar, Katie; Rubin, Andee
2009-01-01
Informal inferential reasoning has shown some promise in developing students' deeper understanding of statistical processes. This paper presents a framework to think about three key principles of informal inference--generalizations "beyond the data," probabilistic language, and data as evidence. The authors use primary school classroom…
Sensitivity to the Sampling Process Emerges From the Principle of Efficiency.
Jara-Ettinger, Julian; Sun, Felix; Schulz, Laura; Tenenbaum, Joshua B
2018-05-01
Humans can seamlessly infer other people's preferences, based on what they do. Broadly, two types of accounts have been proposed to explain different aspects of this ability. The first account focuses on spatial information: Agents' efficient navigation in space reveals what they like. The second account focuses on statistical information: Uncommon choices reveal stronger preferences. Together, these two lines of research suggest that we have two distinct capacities for inferring preferences. Here we propose that this is not the case, and that spatial-based and statistical-based preference inferences can be explained by the assumption that agents are efficient alone. We show that people's sensitivity to spatial and statistical information when they infer preferences is best predicted by a computational model of the principle of efficiency, and that this model outperforms dual-system models, even when the latter are fit to participant judgments. Our results suggest that, as adults, a unified understanding of agency under the principle of efficiency underlies our ability to infer preferences. Copyright © 2018 Cognitive Science Society, Inc.
Deep Learning for Population Genetic Inference.
Sheehan, Sara; Song, Yun S
2016-03-01
Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme.
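The mapping from summary statistics to parameters described above can be imitated on a very small scale; the sketch below is a scaled-down stand-in (a toy simulator and a small scikit-learn network), not the deep architecture or the Drosophila analysis of the paper, and all quantities are hypothetical.

import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(6)

def summaries(theta):
    """Toy simulator: map a population parameter theta to noisy summary
    statistics (a crude stand-in for e.g. a site frequency spectrum)."""
    sfs = rng.poisson(theta * 10 / np.arange(1, 11))
    return np.concatenate([sfs, [sfs.mean(), sfs.var()]])

theta = rng.uniform(0.5, 5.0, 3000)                 # simulated "true" parameters
X = np.array([summaries(t) for t in theta])         # simulated summary statistics
X_tr, X_te, y_tr, y_te = train_test_split(X, theta, random_state=0)

# Learn the summary-statistics -> parameter mapping, then score on held-out simulations
net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
net.fit(X_tr, y_tr)
print("held-out R^2:", round(net.score(X_te, y_te), 3))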
Deep Learning for Population Genetic Inference
Sheehan, Sara; Song, Yun S.
2016-01-01
Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme. PMID:27018908
Predicting the binding preference of transcription factors to individual DNA k-mers.
Alleyne, Trevis M; Peña-Castillo, Lourdes; Badis, Gwenael; Talukder, Shaheynoor; Berger, Michael F; Gehrke, Andrew R; Philippakis, Anthony A; Bulyk, Martha L; Morris, Quaid D; Hughes, Timothy R
2009-04-15
Recognition of specific DNA sequences is a central mechanism by which transcription factors (TFs) control gene expression. Many TF-binding preferences, however, are unknown or poorly characterized, in part due to the difficulty associated with determining their specificity experimentally, and an incomplete understanding of the mechanisms governing sequence specificity. New techniques that estimate the affinity of TFs to all possible k-mers provide a new opportunity to study DNA-protein interaction mechanisms, and may facilitate inference of binding preferences for members of a given TF family when such information is available for other family members. We employed a new dataset consisting of the relative preferences of mouse homeodomains for all eight-base DNA sequences in order to ask how well we can predict the binding profiles of homeodomains when only their protein sequences are given. We evaluated a panel of standard statistical inference techniques, as well as variations of the protein features considered. Nearest neighbour among functionally important residues emerged among the most effective methods. Our results underscore the complexity of TF-DNA recognition, and suggest a rational approach for future analyses of TF families.
Statistical inference for extended or shortened phase II studies based on Simon's two-stage designs.
Zhao, Junjun; Yu, Menggang; Feng, Xi-Ping
2015-06-07
Simon's two-stage designs are popular choices for conducting phase II clinical trials, especially in the oncology trials to reduce the number of patients placed on ineffective experimental therapies. Recently Koyama and Chen (2008) discussed how to conduct proper inference for such studies because they found that inference procedures used with Simon's designs almost always ignore the actual sampling plan used. In particular, they proposed an inference method for studies when the actual second stage sample sizes differ from planned ones. We consider an alternative inference method based on likelihood ratio. In particular, we order permissible sample paths under Simon's two-stage designs using their corresponding conditional likelihood. In this way, we can calculate p-values using the common definition: the probability of obtaining a test statistic value at least as extreme as that observed under the null hypothesis. In addition to providing inference for a couple of scenarios where Koyama and Chen's method can be difficult to apply, the resulting estimate based on our method appears to have certain advantage in terms of inference properties in many numerical simulations. It generally led to smaller biases and narrower confidence intervals while maintaining similar coverages. We also illustrated the two methods in a real data setting. Inference procedures used with Simon's designs almost always ignore the actual sampling plan. Reported P-values, point estimates and confidence intervals for the response rate are not usually adjusted for the design's adaptiveness. Proper statistical inference procedures should be used.
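For readers unfamiliar with inference under Simon's designs, the sketch below computes a standard stagewise p-value that respects the two-stage sampling plan: the null probability of continuing past stage 1 and observing at least the reported total number of responses. This is an illustrative computation in the spirit of the adjustments discussed above, not the authors' conditional-likelihood ordering, and the design parameters in the example are hypothetical.

from scipy.stats import binom

def simon_p_value(total, n1, r1, n2, p0):
    """P(continue past stage 1 and observe >= `total` responses) under H0: p = p0,
    for a design with stage sizes n1, n2 that stops after stage 1 if x1 <= r1."""
    p = 0.0
    for x1 in range(r1 + 1, n1 + 1):
        need = total - x1                          # responses still needed in stage 2
        tail = 1.0 if need <= 0 else binom.sf(need - 1, n2, p0)
        p += binom.pmf(x1, n1, p0) * tail
    return p

# Example with hypothetical design parameters: n1=10, r1=1, n2=19, null rate p0=0.10
print(round(simon_p_value(total=6, n1=10, r1=1, n2=19, p0=0.10), 4))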
NASA Astrophysics Data System (ADS)
Alsing, Justin; Wandelt, Benjamin; Feeney, Stephen
2018-07-01
Many statistical models in cosmology can be simulated forwards but have intractable likelihood functions. Likelihood-free inference methods allow us to perform Bayesian inference from these models using only forward simulations, free from any likelihood assumptions or approximations. Likelihood-free inference generically involves simulating mock data and comparing to the observed data; this comparison in data space suffers from the curse of dimensionality and requires compression of the data to a small number of summary statistics to be tractable. In this paper, we use massive asymptotically optimal data compression to reduce the dimensionality of the data space to just one number per parameter, providing a natural and optimal framework for summary statistic choice for likelihood-free inference. Secondly, we present the first cosmological application of Density Estimation Likelihood-Free Inference (DELFI), which learns a parametrized model for the joint distribution of data and parameters, yielding both the parameter posterior and the model evidence. This approach is conceptually simple, requires less tuning than traditional Approximate Bayesian Computation approaches to likelihood-free inference and can give high-fidelity posteriors from orders of magnitude fewer forward simulations. As an additional bonus, it enables parameter inference and Bayesian model comparison simultaneously. We demonstrate DELFI with massive data compression on an analysis of the joint light-curve analysis supernova data, as a simple validation case study. We show that high-fidelity posterior inference is possible for full-scale cosmological data analyses with as few as ~10^4 simulations, with substantial scope for further improvement, demonstrating the scalability of likelihood-free inference to large and complex cosmological data sets.
A Coalitional Game for Distributed Inference in Sensor Networks With Dependent Observations
NASA Astrophysics Data System (ADS)
He, Hao; Varshney, Pramod K.
2016-04-01
We consider the problem of collaborative inference in a sensor network with heterogeneous and statistically dependent sensor observations. Each sensor aims to maximize its inference performance by forming a coalition with other sensors and sharing information within the coalition. It is proved that the inference performance is a nondecreasing function of the coalition size. However, in an energy constrained network, the energy consumption of inter-sensor communication also increases with increasing coalition size, which discourages the formation of the grand coalition (the set of all sensors). In this paper, the formation of non-overlapping coalitions with statistically dependent sensors is investigated under a specific communication constraint. We apply a game theoretical approach to fully explore and utilize the information contained in the spatial dependence among sensors to maximize individual sensor performance. Before formulating the distributed inference problem as a coalition formation game, we first quantify the gain and loss in forming a coalition by introducing the concepts of diversity gain and redundancy loss for both estimation and detection problems. These definitions, enabled by the statistical theory of copulas, allow us to characterize the influence of statistical dependence among sensor observations on inference performance. An iterative algorithm based on merge-and-split operations is proposed for the solution and the stability of the proposed algorithm is analyzed. Numerical results are provided to demonstrate the superiority of our proposed game theoretical approach.
The Problem of Auto-Correlation in Parasitology
Pollitt, Laura C.; Reece, Sarah E.; Mideo, Nicole; Nussey, Daniel H.; Colegrave, Nick
2012-01-01
Explaining the contribution of host and pathogen factors in driving infection dynamics is a major ambition in parasitology. There is increasing recognition that analyses based on single summary measures of an infection (e.g., peak parasitaemia) do not adequately capture infection dynamics and so, the appropriate use of statistical techniques to analyse dynamics is necessary to understand infections and, ultimately, control parasites. However, the complexities of within-host environments mean that tracking and analysing pathogen dynamics within infections and among hosts poses considerable statistical challenges. Simple statistical models make assumptions that will rarely be satisfied in data collected on host and parasite parameters. In particular, model residuals (unexplained variance in the data) should not be correlated in time or space. Here we demonstrate how failure to account for such correlations can result in incorrect biological inference from statistical analysis. We then show how mixed effects models can be used as a powerful tool to analyse such repeated measures data in the hope that this will encourage better statistical practices in parasitology. PMID:22511865
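A minimal sketch of the mixed-effects approach advocated above, using statsmodels on hypothetical repeated measures of parasitaemia within hosts; the variable names, effect sizes, and data are invented, and the random intercept per host is what accounts for the within-host correlation that a simple regression would ignore.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
hosts, days = 20, 10
frame = pd.DataFrame({
    "host": np.repeat(np.arange(hosts), days),
    "day": np.tile(np.arange(days), hosts),
})
host_effect = rng.normal(0, 1.5, hosts)[frame["host"]]     # host-level variation
frame["parasitaemia"] = (5 + 0.4 * frame["day"] + host_effect
                         + rng.normal(0, 1, len(frame)))

# Random intercept per host models the non-independence of repeated measures
model = smf.mixedlm("parasitaemia ~ day", frame, groups=frame["host"]).fit()
print(model.summary())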
Statistical inference for classification of RRIM clone series using near IR reflectance properties
NASA Astrophysics Data System (ADS)
Ismail, Faridatul Aima; Madzhi, Nina Korlina; Hashim, Hadzli; Abdullah, Noor Ezan; Khairuzzaman, Noor Aishah; Azmi, Azrie Faris Mohd; Sampian, Ahmad Faiz Mohd; Harun, Muhammad Hafiz
2015-08-01
RRIM clones are rubber breeding series produced by the RRIM (Rubber Research Institute of Malaysia) through its rubber breeding program to improve latex yield and produce clones attractive to farmers. The objective of this work is to analyse measurements from an optical sensing device on latex of selected clone series. The device transmits NIR, and the reflectance is converted into a voltage. The reflectance index values obtained via voltage were analyzed using statistical techniques in order to find out the discrimination among the clones. From the statistical results using error plots and a one-way ANOVA test, there is overwhelming evidence showing discrimination of the RRIM 2002, RRIM 2007 and RRIM 3001 clone series, with p value = 0.000. RRIM 2008 cannot be discriminated from RRIM 2014; however, both of these groups are distinct from the other clones.
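The error-plot and one-way ANOVA analysis described above can be sketched as follows; the reflectance-index voltages are synthetic stand-ins with invented group means (chosen so that RRIM 2008 and RRIM 2014 overlap, echoing the abstract), not measurements from the study.

import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
# Synthetic NIR reflectance-index voltages (V) for five clone series
clones = {
    "RRIM 2002": rng.normal(2.10, 0.05, 30),
    "RRIM 2007": rng.normal(2.35, 0.05, 30),
    "RRIM 3001": rng.normal(2.60, 0.05, 30),
    "RRIM 2008": rng.normal(2.45, 0.05, 30),
    "RRIM 2014": rng.normal(2.46, 0.05, 30),   # overlaps RRIM 2008
}

f_stat, p_value = stats.f_oneway(*clones.values())
print(f"one-way ANOVA: F = {f_stat:.1f}, p = {p_value:.3g}")

# Error-plot style summary: mean +/- 1 standard error per clone
for name, v in clones.items():
    print(f"{name}: {v.mean():.3f} +/- {v.std(ddof=1) / np.sqrt(v.size):.3f} V")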
Inverse statistical physics of protein sequences: a key issues review.
Cocco, Simona; Feinauer, Christoph; Figliuzzi, Matteo; Monasson, Rémi; Weigt, Martin
2018-03-01
In the course of evolution, proteins undergo important changes in their amino acid sequences, while their three-dimensional folded structure and their biological function remain remarkably conserved. Thanks to modern sequencing techniques, sequence data accumulate at an unprecedented pace. This provides large sets of so-called homologous, i.e. evolutionarily related protein sequences, to which methods of inverse statistical physics can be applied. Using sequence data as the basis for the inference of Boltzmann distributions from samples of microscopic configurations or observables, it is possible to extract information about evolutionary constraints and thus protein function and structure. Here we give an overview of some biologically important questions, and how statistical-mechanics inspired modeling approaches can help to answer them. Finally, we discuss some open questions, which we expect to be addressed over the next years.
Inverse statistical physics of protein sequences: a key issues review
NASA Astrophysics Data System (ADS)
Cocco, Simona; Feinauer, Christoph; Figliuzzi, Matteo; Monasson, Rémi; Weigt, Martin
2018-03-01
In the course of evolution, proteins undergo important changes in their amino acid sequences, while their three-dimensional folded structure and their biological function remain remarkably conserved. Thanks to modern sequencing techniques, sequence data accumulate at an unprecedented pace. This provides large sets of so-called homologous, i.e. evolutionarily related protein sequences, to which methods of inverse statistical physics can be applied. Using sequence data as the basis for the inference of Boltzmann distributions from samples of microscopic configurations or observables, it is possible to extract information about evolutionary constraints and thus protein function and structure. Here we give an overview of some biologically important questions, and how statistical-mechanics inspired modeling approaches can help to answer them. Finally, we discuss some open questions, which we expect to be addressed over the next years.
NASA Astrophysics Data System (ADS)
Moura, Ricardo; Sinha, Bimal; Coelho, Carlos A.
2017-06-01
The recent popularity of the use of synthetic data as a Statistical Disclosure Control technique has enabled the development of several methods of generating and analyzing such data, but these almost always rely on asymptotic distributions and are consequently not adequate for small sample datasets. Thus, a likelihood-based exact inference procedure is derived for the matrix of regression coefficients of the multivariate regression model, for multiply imputed synthetic data generated via Posterior Predictive Sampling. Since it is based on exact distributions, this procedure may even be used in small sample datasets. Simulation studies compare the results obtained from the proposed exact inferential procedure with the results obtained from an adaptation of Reiter's combination rule to multiply imputed synthetic datasets, and an application to the 2000 Current Population Survey is discussed.
Near Earth Asteroid Characteristics for Asteroid Threat Assessment
NASA Technical Reports Server (NTRS)
Dotson, Jessie
2015-01-01
Information about the physical characteristics of Near Earth Asteroids (NEAs) is needed to model behavior during atmospheric entry, to assess the risk of an impact, and to model possible mitigation techniques. The intrinsic properties of interest to entry and mitigation modelers, however, are rarely directly measurable. Instead we measure other properties and infer the intrinsic physical properties, so determining the complete set of characteristics of interest is far from straightforward. In addition, for the majority of NEAs only basic measurements exist, so properties must often be inferred from statistics of the population of more completely characterized objects. We will provide an assessment of the current state of knowledge about the physical characteristics of importance to asteroid threat assessment. In addition, an ongoing effort to collate NEA characteristics into a readily accessible database for use by the planetary defense community will be discussed.
Inference of reaction rate parameters based on summary statistics from experiments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Khalil, Mohammad; Chowdhary, Kamaljit Singh; Safta, Cosmin
Here, we present the results of an application of Bayesian inference and maximum entropy methods for the estimation of the joint probability density for the Arrhenius rate parameters of the rate coefficient of the H2/O2-mechanism chain branching reaction H + O2 → OH + O. Available published data is in the form of summary statistics in terms of nominal values and error bars of the rate coefficient of this reaction at a number of temperature values obtained from shock-tube experiments. Our approach relies on generating data, in this case OH concentration profiles, consistent with the given summary statistics, using Approximate Bayesian Computation methods and a Markov Chain Monte Carlo procedure. The approach permits the forward propagation of parametric uncertainty through the computational model in a manner that is consistent with the published statistics. A consensus joint posterior on the parameters is obtained by pooling the posterior parameter densities given each consistent data set. To expedite this process, we construct efficient surrogates for the OH concentration using a combination of Padé and polynomial approximants. These surrogate models adequately represent forward model observables and their dependence on input parameters and are computationally efficient to allow their use in the Bayesian inference procedure. We also utilize Gauss-Hermite quadrature with Gaussian proposal probability density functions for moment computation, resulting in orders of magnitude speedup in data likelihood evaluation. Despite the strong non-linearity in the model, the consistent data sets all result in nearly Gaussian conditional parameter probability density functions. The technique also accounts for nuisance parameters in the form of Arrhenius parameters of other rate coefficients with prescribed uncertainty. The resulting pooled parameter probability density function is propagated through stoichiometric hydrogen-air auto-ignition computations to illustrate the need to account for correlation among the Arrhenius rate parameters of one reaction and across rate parameters of different reactions.
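The Gauss-Hermite moment computation mentioned above can be sketched in a few lines of NumPy: expectations under a Gaussian proposal are evaluated by the change of variables x = mu + sqrt(2)*sigma*t. The function and parameter values below are illustrative stand-ins, not quantities from the study.

import numpy as np

def gauss_hermite_expectation(f, mu, sigma, n=20):
    """E[f(X)] for X ~ N(mu, sigma^2) via Gauss-Hermite quadrature,
    using the change of variables x = mu + sqrt(2)*sigma*t."""
    t, w = np.polynomial.hermite.hermgauss(n)
    return np.sum(w * f(mu + np.sqrt(2.0) * sigma * t)) / np.sqrt(np.pi)

# Example: moments of exp(X) under a Gaussian proposal for a log rate parameter
mu, sigma = np.log(1.0e4), 0.3
mean = gauss_hermite_expectation(np.exp, mu, sigma)
second = gauss_hermite_expectation(lambda x: np.exp(2 * x), mu, sigma)
print(mean, np.exp(mu + sigma ** 2 / 2))   # quadrature vs. closed form
print(second - mean ** 2)                  # variance of exp(X)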
Inference of reaction rate parameters based on summary statistics from experiments
Khalil, Mohammad; Chowdhary, Kamaljit Singh; Safta, Cosmin; ...
2016-10-15
Here, we present the results of an application of Bayesian inference and maximum entropy methods for the estimation of the joint probability density for the Arrhenius rate parameters of the rate coefficient of the H2/O2-mechanism chain branching reaction H + O2 → OH + O. Available published data is in the form of summary statistics in terms of nominal values and error bars of the rate coefficient of this reaction at a number of temperature values obtained from shock-tube experiments. Our approach relies on generating data, in this case OH concentration profiles, consistent with the given summary statistics, using Approximate Bayesian Computation methods and a Markov Chain Monte Carlo procedure. The approach permits the forward propagation of parametric uncertainty through the computational model in a manner that is consistent with the published statistics. A consensus joint posterior on the parameters is obtained by pooling the posterior parameter densities given each consistent data set. To expedite this process, we construct efficient surrogates for the OH concentration using a combination of Padé and polynomial approximants. These surrogate models adequately represent forward model observables and their dependence on input parameters and are computationally efficient to allow their use in the Bayesian inference procedure. We also utilize Gauss-Hermite quadrature with Gaussian proposal probability density functions for moment computation, resulting in orders of magnitude speedup in data likelihood evaluation. Despite the strong non-linearity in the model, the consistent data sets all result in nearly Gaussian conditional parameter probability density functions. The technique also accounts for nuisance parameters in the form of Arrhenius parameters of other rate coefficients with prescribed uncertainty. The resulting pooled parameter probability density function is propagated through stoichiometric hydrogen-air auto-ignition computations to illustrate the need to account for correlation among the Arrhenius rate parameters of one reaction and across rate parameters of different reactions.
NASA Astrophysics Data System (ADS)
Rajabi, Mohammad Mahdi; Ataie-Ashtiani, Behzad
2016-05-01
Bayesian inference has traditionally been conceived as the proper framework for the formal incorporation of expert knowledge in parameter estimation of groundwater models. However, conventional Bayesian inference is incapable of taking into account the imprecision essentially embedded in expert-provided information. In order to solve this problem, a number of extensions to conventional Bayesian inference have been introduced in recent years. One of these extensions is 'fuzzy Bayesian inference', which is the result of integrating fuzzy techniques into Bayesian statistics. Fuzzy Bayesian inference has a number of desirable features which make it an attractive approach for incorporating expert knowledge in the parameter estimation process of groundwater models: (1) it is well adapted to the nature of expert-provided information, (2) it allows uncertainty and imprecision to be modeled as distinct quantities, and (3) it presents a framework for fusing expert-provided information regarding the various inputs of the Bayesian inference algorithm. However, an important obstacle to employing fuzzy Bayesian inference in groundwater numerical modeling applications is the computational burden, as the required number of numerical model simulations often becomes extremely large and computationally infeasible. In this paper, a novel approach to accelerating the fuzzy Bayesian inference algorithm is proposed, based on using approximate posterior distributions derived from surrogate modeling as a screening tool in the computations. The proposed approach is first applied to a synthetic test case of seawater intrusion (SWI) in a coastal aquifer. It is shown that for this synthetic test case, the proposed approach decreases the number of required numerical simulations by an order of magnitude. The proposed approach is then applied to a real-world test case involving three-dimensional numerical modeling of SWI in Kish Island, located in the Persian Gulf. An expert elicitation methodology is developed and applied to the real-world test case in order to provide a road map for the use of fuzzy Bayesian inference in groundwater modeling applications.
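The acceleration idea above (use an approximate posterior derived from a surrogate as a screening tool before running the expensive simulator) is close in spirit to delayed-acceptance Metropolis sampling. The sketch below shows only that generic idea with placeholder densities (cheap_log_post, full_log_post are invented stand-ins); it is not the fuzzy Bayesian algorithm of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def cheap_log_post(theta):        # surrogate approximation (placeholder)
    return -0.5 * (theta - 1.0) ** 2

def full_log_post(theta):         # stands in for an expensive numerical model run
    return -0.5 * (theta - 1.0) ** 2 - 0.1 * theta ** 4

def delayed_acceptance_mh(n_iter=5000, step=1.0):
    theta, chain, full_runs = 0.0, [], 0
    for _ in range(n_iter):
        prop = theta + step * rng.normal()
        # Stage 1: screen the proposal with the cheap surrogate only.
        if np.log(rng.uniform()) >= cheap_log_post(prop) - cheap_log_post(theta):
            chain.append(theta)
            continue
        # Stage 2: correct the screening decision with the expensive model.
        full_runs += 1
        log_alpha = (full_log_post(prop) - full_log_post(theta)
                     + cheap_log_post(theta) - cheap_log_post(prop))
        if np.log(rng.uniform()) < log_alpha:
            theta = prop
        chain.append(theta)
    return np.array(chain), full_runs

samples, runs = delayed_acceptance_mh()
print(f"{runs} expensive evaluations for {len(samples)} iterations")
```

Proposals that the surrogate already deems poor never trigger a run of the expensive model, which is where savings of the kind reported in the abstract come from.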
Spurious correlations and inference in landscape genetics
Samuel A. Cushman; Erin L. Landguth
2010-01-01
Reliable interpretation of landscape genetic analyses depends on statistical methods that have high power to identify the correct process driving gene flow while rejecting incorrect alternative hypotheses. Little is known about statistical power and inference in individual-based landscape genetics. Our objective was to evaluate the power of causal modelling with partial...
The Philosophical Foundations of Prescriptive Statements and Statistical Inference
ERIC Educational Resources Information Center
Sun, Shuyan; Pan, Wei
2011-01-01
From the perspectives of the philosophy of science and statistical inference, we discuss the challenges of making prescriptive statements in quantitative research articles. We first consider the prescriptive nature of educational research and argue that prescriptive statements are a necessity in educational research. The logic of deduction,…
Inference and the Introductory Statistics Course
ERIC Educational Resources Information Center
Pfannkuch, Maxine; Regan, Matt; Wild, Chris; Budgett, Stephanie; Forbes, Sharleen; Harraway, John; Parsonage, Ross
2011-01-01
This article sets out some of the rationale and arguments for making major changes to the teaching and learning of statistical inference in introductory courses at our universities by changing from a norm-based, mathematical approach to more conceptually accessible computer-based approaches. The core problem of the inferential argument with its…
"Magnitude-based inference": a statistical review.
Welsh, Alan H; Knight, Emma J
2015-04-01
We consider "magnitude-based inference" and its interpretation by examining in detail its use in the problem of comparing two means. We extract from the spreadsheets, which are provided to users of the analysis (http://www.sportsci.org/), a precise description of how "magnitude-based inference" is implemented. We compare the implemented version of the method with general descriptions of it and interpret the method in familiar statistical terms. We show that "magnitude-based inference" is not a progressive improvement on modern statistics. The additional probabilities introduced are not directly related to the confidence interval but, rather, are interpretable either as P values for two different nonstandard tests (for different null hypotheses) or as approximate Bayesian calculations, which also lead to a type of test. We also discuss sample size calculations associated with "magnitude-based inference" and show that the substantial reduction in sample sizes claimed for the method (30% of the sample size obtained from standard frequentist calculations) is not justifiable so the sample size calculations should not be used. Rather than using "magnitude-based inference," a better solution is to be realistic about the limitations of the data and use either confidence intervals or a fully Bayesian analysis.
Multi-Agent Inference in Social Networks: A Finite Population Learning Approach.
Fan, Jianqing; Tong, Xin; Zeng, Yao
When people in a society want to make inference about some parameter, each person may want to use data collected by other people. Information (data) exchange in social networks is usually costly, so to make reliable statistical decisions, people need to trade off the benefits and costs of information acquisition. Conflicts of interests and coordination problems will arise in the process. Classical statistics does not consider people's incentives and interactions in the data collection process. To address this imperfection, this work explores multi-agent Bayesian inference problems with a game theoretic social network model. Motivated by our interest in aggregate inference at the societal level, we propose a new concept, finite population learning , to address whether with high probability, a large fraction of people in a given finite population network can make "good" inference. Serving as a foundation, this concept enables us to study the long run trend of aggregate inference quality as population grows.
Applying a multiobjective metaheuristic inspired by honey bees to phylogenetic inference.
Santander-Jiménez, Sergio; Vega-Rodríguez, Miguel A
2013-10-01
The development of increasingly popular multiobjective metaheuristics has allowed bioinformaticians to deal with optimization problems in computational biology where multiple objective functions must be taken into account. One of the most relevant research topics that can benefit from these techniques is phylogenetic inference. Throughout the years, different researchers have proposed their own view about the reconstruction of ancestral evolutionary relationships among species. As a result, biologists often report different phylogenetic trees from a same dataset when considering distinct optimality principles. In this work, we detail a multiobjective swarm intelligence approach based on the novel Artificial Bee Colony algorithm for inferring phylogenies. The aim of this paper is to propose a complementary view of phylogenetics according to the maximum parsimony and maximum likelihood criteria, in order to generate a set of phylogenetic trees that represent a compromise between these principles. Experimental results on a variety of nucleotide data sets and statistical studies highlight the relevance of the proposal with regard to other multiobjective algorithms and state-of-the-art biological methods. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Intuitive statistics by 8-month-old infants
Xu, Fei; Garcia, Vashti
2008-01-01
Human learners make inductive inferences based on small amounts of data: we generalize from samples to populations and vice versa. The academic discipline of statistics formalizes these intuitive statistical inferences. What is the origin of this ability? We report six experiments investigating whether 8-month-old infants are “intuitive statisticians.” Our results showed that, given a sample, the infants were able to make inferences about the population from which the sample had been drawn. Conversely, given information about the entire population of relatively small size, the infants were able to make predictions about the sample. Our findings provide evidence that infants possess a powerful mechanism for inductive learning, either using heuristics or basic principles of probability. This ability to make inferences based on samples or information about the population develops early and in the absence of schooling or explicit teaching. Human infants may be rational learners from very early in development. PMID:18378901
Assessing colour-dependent occupation statistics inferred from galaxy group catalogues
NASA Astrophysics Data System (ADS)
Campbell, Duncan; van den Bosch, Frank C.; Hearin, Andrew; Padmanabhan, Nikhil; Berlind, Andreas; Mo, H. J.; Tinker, Jeremy; Yang, Xiaohu
2015-09-01
We investigate the ability of current implementations of galaxy group finders to recover colour-dependent halo occupation statistics. To test the fidelity of group catalogue inferred statistics, we run three different group finders used in the literature over a mock that includes galaxy colours in a realistic manner. Overall, the resulting mock group catalogues are remarkably similar, and most colour-dependent statistics are recovered with reasonable accuracy. However, it is also clear that certain systematic errors arise as a consequence of correlated errors in group membership determination, central/satellite designation, and halo mass assignment. We introduce a new statistic, the halo transition probability (HTP), which captures the combined impact of all these errors. As a rule of thumb, errors tend to equalize the properties of distinct galaxy populations (i.e. red versus blue galaxies or centrals versus satellites), and to result in inferred occupation statistics that are more accurate for red galaxies than for blue galaxies. A statistic that is particularly poorly recovered from the group catalogues is the red fraction of central galaxies as a function of halo mass. Group finders do a good job in recovering galactic conformity, but also have a tendency to introduce weak conformity when none is present. We conclude that proper inference of colour-dependent statistics from group catalogues is best achieved using forward modelling (i.e. running group finders over mock data) or by implementing a correction scheme based on the HTP, as long as the latter is not too strongly model dependent.
Statistical learning and selective inference.
Taylor, Jonathan; Tibshirani, Robert J
2015-06-23
We describe the problem of "selective inference." This addresses the following challenge: Having mined a set of data to find potential associations, how do we properly assess the strength of these associations? The fact that we have "cherry-picked"--searched for the strongest associations--means that we must set a higher bar for declaring significant the associations that we see. This challenge becomes more important in the era of big data and complex statistical modeling. The cherry tree (dataset) can be very large and the tools for cherry picking (statistical learning methods) are now very sophisticated. We describe some recent new developments in selective inference and illustrate their use in forward stepwise regression, the lasso, and principal components analysis.
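A toy simulation makes the cherry-picking issue concrete: with 50 purely null predictors, the naive p-value of the strongest observed association is anti-conservative, while calibrating against the permutation distribution of the maximum correlation is not. This is only an illustration of the problem, not the selective-inference machinery (e.g., for forward stepwise or the lasso) that the authors develop.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 100, 50                         # 50 candidate predictors, all null
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

r = np.array([stats.pearsonr(X[:, j], y)[0] for j in range(p)])
best = int(np.argmax(np.abs(r)))
naive_p = stats.pearsonr(X[:, best], y)[1]   # ignores the selection step

# Null distribution of the *maximum* |correlation|, obtained by permuting y.
max_null = []
for _ in range(200):
    yp = rng.permutation(y)
    max_null.append(max(abs(stats.pearsonr(X[:, j], yp)[0]) for j in range(p)))
adjusted_p = np.mean(np.array(max_null) >= abs(r[best]))

print(f"naive p = {naive_p:.3f}, selection-adjusted p = {adjusted_p:.2f}")
```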
Variations on Bayesian Prediction and Inference
2016-05-09
There are a number of statistical inference problems that are not generally formulated via a full probability model. [...] For the problem of inference about an unknown parameter, the Bayesian approach requires a full probability model/likelihood, which can be an obstacle.
Frequentist Model Averaging in Structural Equation Modelling.
Jin, Shaobo; Ankargren, Sebastian
2018-06-04
Model selection from a set of candidate models plays an important role in many structural equation modelling applications. However, traditional model selection methods introduce extra randomness that is not accounted for by post-model selection inference. In the current study, we propose a model averaging technique within the frequentist statistical framework. Instead of selecting an optimal model, the contributions of all candidate models are acknowledged. Valid confidence intervals and a [Formula: see text] test statistic are proposed. A simulation study shows that the proposed method is able to produce a robust mean-squared error, a better coverage probability, and a better goodness-of-fit test compared to model selection. It is an interesting compromise between model selection and the full model.
Meng, Xiang-He; Shen, Hui; Chen, Xiang-Ding; Xiao, Hong-Mei; Deng, Hong-Wen
2018-03-01
Genome-wide association studies (GWAS) have successfully identified numerous genetic variants associated with diverse complex phenotypes and diseases, and provided tremendous opportunities for further analyses using summary association statistics. Recently, Pickrell et al. developed a robust method for causal inference using independent putative causal SNPs. However, this method may fail to infer the causal relationship between two phenotypes when only a limited number of independent putative causal SNPs identified. Here, we extended Pickrell's method to make it more applicable for the general situations. We extended the causal inference method by replacing the putative causal SNPs with the lead SNPs (the set of the most significant SNPs in each independent locus) and tested the performance of our extended method using both simulation and empirical data. Simulations suggested that when the same number of genetic variants is used, our extended method had similar distribution of test statistic under the null model as well as comparable power under the causal model compared with the original method by Pickrell et al. But in practice, our extended method would generally be more powerful because the number of independent lead SNPs was often larger than the number of independent putative causal SNPs. And including more SNPs, on the other hand, would not cause more false positives. By applying our extended method to summary statistics from GWAS for blood metabolites and femoral neck bone mineral density (FN-BMD), we successfully identified ten blood metabolites that may causally influence FN-BMD. We extended a causal inference method for inferring putative causal relationship between two phenotypes using summary statistics from GWAS, and identified a number of potential causal metabolites for FN-BMD, which may provide novel insights into the pathophysiological mechanisms underlying osteoporosis.
Statistical inference for remote sensing-based estimates of net deforestation
Ronald E. McRoberts; Brian F. Walters
2012-01-01
Statistical inference requires expression of an estimate in probabilistic terms, usually in the form of a confidence interval. An approach to constructing confidence intervals for remote sensing-based estimates of net deforestation is illustrated. The approach is based on post-classification methods using two independent forest/non-forest classifications because...
Kernel methods and flexible inference for complex stochastic dynamics
NASA Astrophysics Data System (ADS)
Capobianco, Enrico
2008-07-01
Approximation theory suggests that series expansions and projections represent standard tools for random process applications from both numerical and statistical standpoints. Such instruments emphasize the role of both sparsity and smoothness for compression purposes, the decorrelation power achieved in the expansion coefficients space compared to the signal space, and the reproducing kernel property when some special conditions are met. We consider these three aspects central to the discussion in this paper, and attempt to analyze the characteristics of some known approximation instruments employed in a complex application domain such as financial market time series. Volatility models are often built ad hoc, parametrically and through very sophisticated methodologies. But they can hardly deal with stochastic processes with regard to non-Gaussianity, covariance non-stationarity or complex dependence without paying a big price in terms of either model mis-specification or computational efficiency. It is thus a good idea to look at other more flexible inference tools; hence the strategy of combining greedy approximation and space dimensionality reduction techniques, which are less dependent on distributional assumptions and more targeted to achieve computationally efficient performances. Advantages and limitations of their use will be evaluated by looking at algorithmic and model building strategies, and by reporting statistical diagnostics.
Quantification of downscaled precipitation uncertainties via Bayesian inference
NASA Astrophysics Data System (ADS)
Nury, A. H.; Sharma, A.; Marshall, L. A.
2017-12-01
Prediction of precipitation from global climate model (GCM) outputs remains critical to decision-making in water-stressed regions. In this regard, downscaling of GCM output has been a useful tool for analysing future hydro-climatological states. Several downscaling approaches have been developed for precipitation, including dynamical and statistical downscaling methods. Frequently, outputs from dynamical downscaling are not readily transferable across regions because of significant methodological and computational difficulties. Statistical downscaling approaches provide a flexible and efficient alternative, providing hydro-climatological outputs across multiple temporal and spatial scales in many locations. However, these approaches are subject to significant uncertainty, arising from uncertainty in the downscaled model parameters and from the use of different reanalysis products for inferring appropriate model parameters. Consequently, these uncertainties affect the performance of simulations at the catchment scale. This study develops a Bayesian framework for modelling downscaled daily precipitation from GCM outputs and quantifies the associated downscaling uncertainties by evaluating reanalysis datasets against observational rainfall data over Australia. In this research a consistent technique for quantifying downscaling uncertainties by means of a Bayesian downscaling framework has been proposed. The results suggest that there are differences in downscaled precipitation occurrences and extremes.
Acoustic emissions diagnosis of rotor-stator rubs using the KS statistic
NASA Astrophysics Data System (ADS)
Hall, L. D.; Mba, D.
2004-07-01
Acoustic emission (AE) measurement at the bearings of rotating machinery has become a useful tool for diagnosing incipient fault conditions. In particular, AE can be used to detect unwanted intermittent or partial rubbing between a rotating central shaft and surrounding stationary components. This is a particular problem encountered in turbines used for power generation. For successful fault diagnosis, it is important to adopt AE signal analysis techniques capable of distinguishing between various types of rub mechanisms. It is also useful to develop techniques for inferring information such as the severity of rubbing or the type of seal material making contact on the shaft. It is proposed that modelling the cumulative distribution function of rub-induced AE signals with respect to appropriate theoretical distributions, and quantifying the goodness of fit with the Kolmogorov-Smirnov (KS) statistic, offers a suitable signal feature for diagnosis. This paper demonstrates the successful use of the KS feature for discriminating different classes of shaft-seal rubbing.
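The KS feature described above reduces to fitting a reference distribution to an AE signal segment and recording the Kolmogorov-Smirnov distance. The sketch below uses a Gaussian reference and synthetic signals as placeholders for the theoretical distributions and AE data of the paper; only the D statistic is kept as a feature (the test's p-value would not be valid with parameters fitted from the same data).

```python
import numpy as np
from scipy import stats

def ks_feature(signal):
    """KS distance between a signal's empirical CDF and a normal distribution
    fitted to it; larger values indicate a poorer fit to the reference."""
    mu, sigma = np.mean(signal), np.std(signal, ddof=1)
    result = stats.kstest(signal, "norm", args=(mu, sigma))
    return result.statistic

rng = np.random.default_rng(3)
baseline = rng.normal(0.0, 1.0, size=2000)                    # smooth background AE
rub = np.concatenate([baseline, rng.laplace(0.0, 3.0, 500)])  # added bursty transients
print(ks_feature(baseline), ks_feature(rub))
```

The bursty, heavy-tailed segment departs from the fitted reference far more than the baseline does, which is the property exploited for discriminating rub classes.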
Social networks help to infer causality in the tumor microenvironment.
Crespo, Isaac; Doucey, Marie-Agnès; Xenarios, Ioannis
2016-03-15
Networks have become a popular way to conceptualize a system of interacting elements, such as electronic circuits, social communication, metabolism or gene regulation. Network inference, analysis, and modeling techniques have been developed in different areas of science and technology, such as computer science, mathematics, physics, and biology, with an active interdisciplinary exchange of concepts and approaches. However, some concepts seem to belong to a specific field without a clear transferability to other domains. At the same time, it is increasingly recognized that within some biological systems--such as the tumor microenvironment--where different types of resident and infiltrating cells interact to carry out their functions, the complexity of the system demands a theoretical framework, such as statistical inference, graph analysis and dynamical models, in order to asses and study the information derived from high-throughput experimental technologies. In this article we propose to adopt and adapt the concepts of influence and investment from the world of social network analysis to biological problems, and in particular to apply this approach to infer causality in the tumor microenvironment. We showed that constructing a bidirectional network of influence between cell and cell communication molecules allowed us to determine the direction of inferred regulations at the expression level and correctly recapitulate cause-effect relationships described in literature. This work constitutes an example of a transfer of knowledge and concepts from the world of social network analysis to biomedical research, in particular to infer network causality in biological networks. This causality elucidation is essential to model the homeostatic response of biological systems to internal and external factors, such as environmental conditions, pathogens or treatments.
Foster, Katherine T; Beltz, Adriene M
2018-08-01
Ambulatory assessment (AA) methodologies have the potential to increase understanding and treatment of addictive behavior in seemingly unprecedented ways, due in part, to their emphasis on intensive repeated assessments of an individual's addictive behavior in context. But, many analytic techniques traditionally applied to AA data - techniques that average across people and time - do not fully leverage this potential. In an effort to take advantage of the individualized, temporal nature of AA data on addictive behavior, the current paper considers three underutilized person-oriented analytic techniques: multilevel modeling, p-technique, and group iterative multiple model estimation. After reviewing prevailing analytic techniques, each person-oriented technique is presented, AA data specifications are mentioned, an example analysis using generated data is provided, and advantages and limitations are discussed; the paper closes with a brief comparison across techniques. Increasing use of person-oriented techniques will substantially enhance inferences that can be drawn from AA data on addictive behavior and has implications for the development of individualized interventions. Copyright © 2017. Published by Elsevier Ltd.
Statistics and Informatics in Space Astrophysics
NASA Astrophysics Data System (ADS)
Feigelson, E.
2017-12-01
The interest in statistical and computational methodology has seen rapid growth in space-based astrophysics, parallel to the growth seen in Earth remote sensing. There is widespread agreement that scientific interpretation of the cosmic microwave background, discovery of exoplanets, and classifying multiwavelength surveys is too complex to be accomplished with traditional techniques. NASA operates several well-functioning Science Archive Research Centers providing 0.5 PBy datasets to the research community. These databases are integrated with full-text journal articles in the NASA Astrophysics Data System (200K pageviews/day). Data products use interoperable formats and protocols established by the International Virtual Observatory Alliance. NASA supercomputers also support complex astrophysical models of systems such as accretion disks and planet formation. Academic researcher interest in methodology has significantly grown in areas such as Bayesian inference and machine learning, and statistical research is underway to treat problems such as irregularly spaced time series and astrophysical model uncertainties. Several scholarly societies have created interest groups in astrostatistics and astroinformatics. Improvements are needed on several fronts. Community education in advanced methodology is not sufficiently rapid to meet the research needs. Statistical procedures within NASA science analysis software are sometimes not optimal, and pipeline development may not use modern software engineering techniques. NASA offers few grant opportunities supporting research in astroinformatics and astrostatistics.
Exploring the Connection Between Sampling Problems in Bayesian Inference and Statistical Mechanics
NASA Technical Reports Server (NTRS)
Pohorille, Andrew
2006-01-01
The Bayesian and statistical mechanical communities often share the same objective in their work - estimating and integrating probability distribution functions (pdfs) describing stochastic systems, models or processes. Frequently, these pdfs are complex functions of random variables exhibiting multiple, well separated local minima. Conventional strategies for sampling such pdfs are inefficient, sometimes leading to an apparent non-ergodic behavior. Several recently developed techniques for handling this problem have been successfully applied in statistical mechanics. In the multicanonical and Wang-Landau Monte Carlo (MC) methods, the correct pdfs are recovered from uniform sampling of the parameter space by iteratively establishing proper weighting factors connecting these distributions. Trivial generalizations allow for sampling from any chosen pdf. The closely related transition matrix method relies on estimating transition probabilities between different states. All these methods proved to generate estimates of pdfs with high statistical accuracy. In another MC technique, parallel tempering, several random walks, each corresponding to a different value of a parameter (e.g. "temperature"), are generated and occasionally exchanged using the Metropolis criterion. This method can be considered as a statistically correct version of simulated annealing. An alternative approach is to represent the set of independent variables as a Hamiltonian system. Considerable progress has been made in understanding how to ensure that the system obeys the equipartition theorem or, equivalently, that coupling between the variables is correctly described. Then a host of techniques developed for dynamical systems can be used. Among them, probably the most powerful is the Adaptive Biasing Force method, in which thermodynamic integration and biased sampling are combined to yield very efficient estimates of pdfs. The third class of methods deals with transitions between states described by rate constants. These problems are isomorphic with chemical kinetics problems. Recently, several efficient techniques for this purpose have been developed based on the approach originally proposed by Gillespie. Although the utility of the techniques mentioned above for Bayesian problems has not been determined, further research along these lines is warranted.
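Of the techniques listed above, parallel tempering is the easiest to sketch: Metropolis chains run at several temperatures and occasionally propose swaps, which are also accepted with the Metropolis criterion. The bimodal target and settings below are illustrative, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(7)

def log_pdf(x):  # bimodal target with well-separated modes (illustrative)
    return np.logaddexp(-0.5 * ((x + 4.0) / 0.5) ** 2, -0.5 * ((x - 4.0) / 0.5) ** 2)

def parallel_tempering(n_iter=20000, temps=(1.0, 8.0), step=1.0):
    x = np.zeros(len(temps))            # one walker per temperature
    cold = []
    for it in range(n_iter):
        # Metropolis update of each walker against pi(x)^(1/T)
        for k, T in enumerate(temps):
            prop = x[k] + step * np.sqrt(T) * rng.normal()
            if np.log(rng.uniform()) < (log_pdf(prop) - log_pdf(x[k])) / T:
                x[k] = prop
        # Occasionally propose swapping neighbouring walkers
        if it % 10 == 0:
            k = int(rng.integers(len(temps) - 1))
            log_a = (log_pdf(x[k + 1]) - log_pdf(x[k])) * (1.0 / temps[k] - 1.0 / temps[k + 1])
            if np.log(rng.uniform()) < log_a:
                x[k], x[k + 1] = x[k + 1], x[k]
        cold.append(x[0])
    return np.array(cold)

samples = parallel_tempering()
print("fraction of cold-chain samples in each mode:",
      float(np.mean(samples < 0)), float(np.mean(samples > 0)))
```

A single Metropolis chain at T = 1 with the same step size would typically stay stuck in one of the two modes; the hot chain crosses the barrier and feeds the cold chain through the swap moves.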
NASA Technical Reports Server (NTRS)
Wilson, Robert M.
2007-01-01
Statistical aspects of the North Atlantic basin tropical cyclones for the interval 1945-2005 are examined, including the variation of the yearly frequency of occurrence for various subgroups of storms (all tropical cyclones, hurricanes, major hurricanes, U.S. landfalling hurricanes, and category 4/5 hurricanes); the yearly variation of the mean latitude and longitude (genesis location) of all tropical cyclones and hurricanes; and the yearly variation of the mean peak wind speeds, lowest pressures, and durations for all tropical cyclones, hurricanes, and major hurricanes. Also examined is the relationship between inferred trends found in the North Atlantic basin tropical cyclonic activity and natural variability and global warming, the latter described using surface air temperatures from the Armagh Observatory, Armagh, Northern Ireland. Lastly, a simple statistical technique is employed to ascertain the expected level of North Atlantic basin tropical cyclonic activity for the upcoming 2007 season.
The statistical analysis of circadian phase and amplitude in constant-routine core-temperature data
NASA Technical Reports Server (NTRS)
Brown, E. N.; Czeisler, C. A.
1992-01-01
Accurate estimation of the phases and amplitude of the endogenous circadian pacemaker from constant-routine core-temperature series is crucial for making inferences about the properties of the human biological clock from data collected under this protocol. This paper presents a set of statistical methods based on a harmonic-regression-plus-correlated-noise model for estimating the phases and the amplitude of the endogenous circadian pacemaker from constant-routine core-temperature data. The methods include a Bayesian Monte Carlo procedure for computing the uncertainty in these circadian functions. We illustrate the techniques with a detailed study of a single subject's core-temperature series and describe their relationship to other statistical methods for circadian data analysis. In our laboratory, these methods have been successfully used to analyze more than 300 constant routines and provide a highly reliable means of extracting phase and amplitude information from core-temperature data.
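The harmonic-regression core of such an analysis can be sketched with ordinary least squares: fit mesor, amplitude, and phase for a fixed circadian period and read the phase off the fitted cosine. The period, sampling scheme, and data below are made up, and the correlated-noise model and Bayesian Monte Carlo uncertainty computation of the paper are omitted.

```python
import numpy as np

rng = np.random.default_rng(5)
period = 24.2                                    # assumed circadian period (h)
t = np.arange(0.0, 40.0, 0.5)                    # sampling times (h)
truth = 37.0 + 0.4 * np.cos(2 * np.pi * (t - 5.0) / period)
y = truth + 0.1 * rng.normal(size=t.size)        # synthetic core-temperature series

# Model: y = m + A*cos(w t) + B*sin(w t); amplitude = sqrt(A^2 + B^2),
# time of the fitted maximum = atan2(B, A) / w.
w = 2 * np.pi / period
X = np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t)])
m, A, B = np.linalg.lstsq(X, y, rcond=None)[0]
amplitude = np.hypot(A, B)
phase_hours = np.arctan2(B, A) / w
print(f"mesor = {m:.2f} C, amplitude = {amplitude:.2f} C, phase = {phase_hours:.1f} h")
```

With the simulated series the recovered phase lands near the 5 h used to generate it; in the paper's setting the same regression is embedded in a model with correlated noise so that the uncertainty of phase and amplitude is not understated.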
NASA Technical Reports Server (NTRS)
Lee, Mun Wai
2015-01-01
Crew exercise is important during long-duration space flight not only for maintaining health and fitness but also for preventing adverse health problems, such as losses in muscle strength and bone density. Monitoring crew exercise via motion capture and kinematic analysis aids understanding of the effects of microgravity on exercise and helps ensure that exercise prescriptions are effective. Intelligent Automation, Inc., has developed ESPRIT to monitor exercise activities, detect body markers, extract image features, and recover three-dimensional (3D) kinematic body poses. The system relies on prior knowledge and modeling of the human body and on advanced statistical inference techniques to achieve robust and accurate motion capture. In Phase I, the company demonstrated motion capture of several exercises, including walking, curling, and dead lifting. Phase II efforts focused on enhancing algorithms and delivering an ESPRIT prototype for testing and demonstration.
Valeri, Linda; VanderWeele, Tyler J.
2012-01-01
Mediation analysis is a useful and widely employed approach to studies in the field of psychology and in the social and biomedical sciences. The contributions of this paper are several-fold. First we seek to bring the developments in mediation analysis for non linear models within the counterfactual framework to the psychology audience in an accessible format and compare the sorts of inferences about mediation that are possible in the presence of exposure-mediator interaction when using a counterfactual versus the standard statistical approach. Second, the work by VanderWeele and Vansteelandt (2009, 2010) is extended here to allow for dichotomous mediators and count outcomes. Third, we provide SAS and SPSS macros to implement all of these mediation analysis techniques automatically and we compare the types of inferences about mediation that are allowed by a variety of software macros. PMID:23379553
Valeri, Linda; Vanderweele, Tyler J
2013-06-01
Mediation analysis is a useful and widely employed approach to studies in the field of psychology and in the social and biomedical sciences. The contributions of this article are several-fold. First we seek to bring the developments in mediation analysis for nonlinear models within the counterfactual framework to the psychology audience in an accessible format and compare the sorts of inferences about mediation that are possible in the presence of exposure-mediator interaction when using a counterfactual versus the standard statistical approach. Second, the work by VanderWeele and Vansteelandt (2009, 2010) is extended here to allow for dichotomous mediators and count outcomes. Third, we provide SAS and SPSS macros to implement all of these mediation analysis techniques automatically, and we compare the types of inferences about mediation that are allowed by a variety of software macros. (PsycINFO Database Record (c) 2013 APA, all rights reserved).
2dFLenS and KiDS: determining source redshift distributions with cross-correlations
NASA Astrophysics Data System (ADS)
Johnson, Andrew; Blake, Chris; Amon, Alexandra; Erben, Thomas; Glazebrook, Karl; Harnois-Deraps, Joachim; Heymans, Catherine; Hildebrandt, Hendrik; Joudaki, Shahab; Klaes, Dominik; Kuijken, Konrad; Lidman, Chris; Marin, Felipe A.; McFarland, John; Morrison, Christopher B.; Parkinson, David; Poole, Gregory B.; Radovich, Mario; Wolf, Christian
2017-03-01
We develop a statistical estimator to infer the redshift probability distribution of a photometric sample of galaxies from its angular cross-correlation in redshift bins with an overlapping spectroscopic sample. This estimator is a minimum-variance weighted quadratic function of the data: a quadratic estimator. This extends and modifies the methodology presented by McQuinn & White. The derived source redshift distribution is degenerate with the source galaxy bias, which must be constrained via additional assumptions. We apply this estimator to constrain source galaxy redshift distributions in the Kilo-Degree imaging survey through cross-correlation with the spectroscopic 2-degree Field Lensing Survey, presenting results first as a binned step-wise distribution in the range z < 0.8, and then building a continuous distribution using a Gaussian process model. We demonstrate the robustness of our methodology using mock catalogues constructed from N-body simulations, and comparisons with other techniques for inferring the redshift distribution.
Thermodynamics of statistical inference by cells.
Lang, Alex H; Fisher, Charles K; Mora, Thierry; Mehta, Pankaj
2014-10-03
The deep connection between thermodynamics, computation, and information is now well established both theoretically and experimentally. Here, we extend these ideas to show that thermodynamics also places fundamental constraints on statistical estimation and learning. To do so, we investigate the constraints placed by (nonequilibrium) thermodynamics on the ability of biochemical signaling networks to estimate the concentration of an external signal. We show that accuracy is limited by energy consumption, suggesting that there are fundamental thermodynamic constraints on statistical inference.
NASA Astrophysics Data System (ADS)
El-Sebakhy, Emad A.
2009-09-01
Pressure-volume-temperature (PVT) properties are very important in reservoir engineering computations. There are many empirical approaches for predicting various PVT properties based on empirical correlations and statistical regression models. Over the last decade, researchers have utilized neural networks to develop more accurate PVT correlations. These achievements of neural networks have opened the door for data mining techniques to play a major role in the oil and gas industry. Unfortunately, the developed neural network correlations are often limited, and global correlations are usually less accurate compared to local correlations. Recently, adaptive neuro-fuzzy inference systems have been proposed as a new intelligence framework for both prediction and classification based on fuzzy clustering optimization criteria and ranking. This paper proposes neuro-fuzzy inference systems for estimating PVT properties of crude oil systems. This new framework is an efficient hybrid intelligence machine learning scheme for modeling the kind of uncertainty associated with vagueness and imprecision. We briefly describe the learning steps and the use of the Takagi-Sugeno-Kang model and the Gustafson-Kessel clustering algorithm with K detected clusters from the given database. This approach has featured in a wide range of medical, power control system, and business journals, often with promising results. A comparative study is carried out to compare the performance of this new framework with the most popular modeling techniques, such as neural networks, nonlinear regression, and empirical correlation algorithms. The results show that neuro-fuzzy systems are accurate and reliable, and outperform most of the existing forecasting techniques. Future work can be achieved by using neuro-fuzzy systems for clustering 3D seismic data, identification of lithofacies types, and other reservoir characterization.
ERIC Educational Resources Information Center
Noll, Jennifer; Hancock, Stacey
2015-01-01
This research investigates what students' use of statistical language can tell us about their conceptions of distribution and sampling in relation to informal inference. Prior research documents students' challenges in understanding ideas of distribution and sampling as tools for making informal statistical inferences. We know that these…
Multiple hot-deck imputation for network inference from RNA sequencing data.
Imbert, Alyssa; Valsesia, Armand; Le Gall, Caroline; Armenise, Claudia; Lefebvre, Gregory; Gourraud, Pierre-Antoine; Viguerie, Nathalie; Villa-Vialaneix, Nathalie
2018-05-15
Network inference provides a global view of the relations existing between gene expression in a given transcriptomic experiment (often only for a restricted list of chosen genes). However, it is still a challenging problem: even though the cost of sequencing techniques has decreased over recent years, the number of samples in a given experiment is still (very) small compared to the number of genes. We propose a method to increase the reliability of the inference when RNA-seq expression data have been measured together with an auxiliary dataset that can provide external information on gene expression similarity between samples. Our statistical approach, hd-MI, is based on imputation for samples without available RNA-seq data that are considered as missing data but are observed on the secondary dataset. hd-MI can improve the reliability of the inference for missing rates up to 30% and provides more stable networks with a smaller number of false positive edges. From a biological point of view, hd-MI was also found relevant to infer networks from RNA-seq data acquired in adipose tissue during a nutritional intervention in obese individuals. In these networks, novel links between genes were highlighted, as well as an improved comparability between the two steps of the nutritional intervention. Software and sample data are available as an R package, RNAseqNet, that can be downloaded from the Comprehensive R Archive Network (CRAN). alyssa.imbert@inra.fr or nathalie.villa-vialaneix@inra.fr. Supplementary data are available at Bioinformatics online.
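The hot-deck step can be illustrated generically: for a sample whose RNA-seq profile is missing, borrow the observed profile of a donor drawn from the nearest samples in the auxiliary data, and repeat the draw to obtain multiple imputations. Everything below (data shapes, Euclidean distance, k) is a placeholder, not the hd-MI implementation in RNAseqNet.

```python
import numpy as np

rng = np.random.default_rng(11)
n_samples, n_aux, n_genes = 30, 12, 200
aux = rng.normal(size=(n_samples, n_aux))            # auxiliary data, observed for everyone
rnaseq = rng.poisson(20.0, size=(n_samples, n_genes)).astype(float)
missing = rng.choice(n_samples, size=8, replace=False)
rnaseq[missing] = np.nan                             # samples without RNA-seq

def hot_deck_impute(expr, aux, missing, k=3):
    """Fill each missing expression profile with the profile of a donor drawn
    at random among the k nearest observed samples in the auxiliary space."""
    observed = np.setdiff1d(np.arange(aux.shape[0]), missing)
    filled = expr.copy()
    for i in missing:
        d = np.linalg.norm(aux[observed] - aux[i], axis=1)
        donors = observed[np.argsort(d)[:k]]
        filled[i] = expr[rng.choice(donors)]
    return filled

# Multiple imputation = repeat the random donor draw several times
imputations = [hot_deck_impute(rnaseq, aux, missing) for _ in range(5)]
print(sum(np.isnan(m).sum() for m in imputations), "missing values remain")
```

Running the downstream network inference on each completed dataset and aggregating the resulting edges is what gives the stability gain described in the abstract.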
Multi-Agent Inference in Social Networks: A Finite Population Learning Approach
Tong, Xin; Zeng, Yao
2016-01-01
When people in a society want to make inference about some parameter, each person may want to use data collected by other people. Information (data) exchange in social networks is usually costly, so to make reliable statistical decisions, people need to trade off the benefits and costs of information acquisition. Conflicts of interests and coordination problems will arise in the process. Classical statistics does not consider people’s incentives and interactions in the data collection process. To address this imperfection, this work explores multi-agent Bayesian inference problems with a game theoretic social network model. Motivated by our interest in aggregate inference at the societal level, we propose a new concept, finite population learning, to address whether with high probability, a large fraction of people in a given finite population network can make “good” inference. Serving as a foundation, this concept enables us to study the long run trend of aggregate inference quality as population grows. PMID:27076691
Ganju, Jitendra; Yu, Xinxin; Ma, Guoguang Julie
2013-01-01
Formal inference in randomized clinical trials is based on controlling the type I error rate associated with a single pre-specified statistic. The deficiency of using just one method of analysis is that it depends on assumptions that may not be met. For robust inference, we propose pre-specifying multiple test statistics and relying on the minimum p-value for testing the null hypothesis of no treatment effect. The null hypothesis associated with the various test statistics is that the treatment groups are indistinguishable. The critical value for hypothesis testing comes from permutation distributions. Rejection of the null hypothesis when the smallest p-value is less than the critical value controls the type I error rate at its designated value. Even if one of the candidate test statistics has low power, the adverse effect on the power of the minimum p-value statistic is modest. Its use is illustrated with examples. We conclude that it is better to rely on the minimum p-value rather than a single statistic, particularly when that single statistic is the logrank test, because of the cost and complexity of many survival trials. Copyright © 2013 John Wiley & Sons, Ltd.
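A minimal version of the proposed procedure: pre-specify several test statistics, take the smallest p-value, and calibrate it against the permutation distribution of that minimum. The two statistics used below (Welch t and Mann-Whitney) and the simulated two-arm data are illustrative stand-ins; a survival trial would instead use, for example, logrank-type statistics.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)
x = rng.normal(0.0, 1.0, 40)           # control arm (simulated)
y = rng.normal(0.5, 1.5, 40)           # treatment arm (simulated)

def min_p(x, y):
    p_t = stats.ttest_ind(x, y, equal_var=False).pvalue
    p_w = stats.mannwhitneyu(x, y, alternative="two-sided").pvalue
    return min(p_t, p_w)

observed = min_p(x, y)

# Reference the observed min-p against its own permutation distribution,
# so the "take the best of several tests" step is accounted for.
pooled = np.concatenate([x, y])
perm = []
for _ in range(1000):
    rng.shuffle(pooled)
    perm.append(min_p(pooled[:x.size], pooled[x.size:]))
p_value = float(np.mean(np.array(perm) <= observed))
print(f"min-p statistic = {observed:.4f}, permutation p-value = {p_value:.3f}")
```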
Ensemble stacking mitigates biases in inference of synaptic connectivity.
Chambers, Brendan; Levy, Maayan; Dechery, Joseph B; MacLean, Jason N
2018-01-01
A promising alternative to directly measuring the anatomical connections in a neuronal population is inferring the connections from the activity. We employ simulated spiking neuronal networks to compare and contrast commonly used inference methods that identify likely excitatory synaptic connections using statistical regularities in spike timing. We find that simple adjustments to standard algorithms improve inference accuracy: A signing procedure improves the power of unsigned mutual-information-based approaches and a correction that accounts for differences in mean and variance of background timing relationships, such as those expected to be induced by heterogeneous firing rates, increases the sensitivity of frequency-based methods. We also find that different inference methods reveal distinct subsets of the synaptic network and each method exhibits different biases in the accurate detection of reciprocity and local clustering. To correct for errors and biases specific to single inference algorithms, we combine methods into an ensemble. Ensemble predictions, generated as a linear combination of multiple inference algorithms, are more sensitive than the best individual measures alone, and are more faithful to ground-truth statistics of connectivity, mitigating biases specific to single inference methods. These weightings generalize across simulated datasets, emphasizing the potential for the broad utility of ensemble-based approaches.
Statistical atlas based extrapolation of CT data
NASA Astrophysics Data System (ADS)
Chintalapani, Gouthami; Murphy, Ryan; Armiger, Robert S.; Lepisto, Jyri; Otake, Yoshito; Sugano, Nobuhiko; Taylor, Russell H.; Armand, Mehran
2010-02-01
We present a framework to estimate the missing anatomical details from a partial CT scan with the help of statistical shape models. The motivating application is periacetabular osteotomy (PAO), a technique for treating developmental hip dysplasia, an abnormal condition of the hip socket that, if untreated, may lead to osteoarthritis. The common goals of PAO are to reduce pain, joint subluxation and improve contact pressure distribution by increasing the coverage of the femoral head by the hip socket. While current diagnosis and planning is based on radiological measurements, because of significant structural variations in dysplastic hips, a computer-assisted geometrical and biomechanical planning based on CT data is desirable to help the surgeon achieve optimal joint realignments. Most of the patients undergoing PAO are young females, hence it is usually desirable to minimize the radiation dose by scanning only the joint portion of the hip anatomy. These partial scans, however, do not provide enough information for biomechanical analysis due to missing iliac region. A statistical shape model of full pelvis anatomy is constructed from a database of CT scans. The partial volume is first aligned with the statistical atlas using an iterative affine registration, followed by a deformable registration step and the missing information is inferred from the atlas. The atlas inferences are further enhanced by the use of X-ray images of the patient, which are very common in an osteotomy procedure. The proposed method is validated with a leave-one-out analysis method. Osteotomy cuts are simulated and the effect of atlas predicted models on the actual procedure is evaluated.
Bayesian inference of physiologically meaningful parameters from body sway measurements.
Tietäväinen, A; Gutmann, M U; Keski-Vakkuri, E; Corander, J; Hæggström, E
2017-06-19
The control of the human body sway by the central nervous system, muscles, and conscious brain is of interest since body sway carries information about the physiological status of a person. Several models have been proposed to describe body sway in an upright standing position; however, due to the statistical intractability of the more realistic models, no formal parameter inference has previously been conducted, and the expressive power of such models for real human subjects remains unknown. Using the latest advances in Bayesian statistical inference for intractable models, we fitted a nonlinear control model to posturographic measurements, and we showed that it can accurately predict the sway characteristics of both simulated and real subjects. Our method provides a full statistical characterization of the uncertainty related to all model parameters as quantified by posterior probability density functions, which is useful for comparisons across subjects and test settings. The ability to infer intractable control models from sensor data opens new possibilities for monitoring and predicting body status in health applications.
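The "latest advances" referred to are likelihood-free (simulator-based) methods; the simplest member of that family, rejection ABC, is easy to sketch. The AR(1) "sway simulator", summary statistics, prior, and tolerance below are toy placeholders, not the authors' posturographic control model or inference algorithm.

```python
import numpy as np

rng = np.random.default_rng(13)

def simulate_sway(stiffness, n=400):
    """Toy AR(1) stand-in for a sway simulator: higher stiffness, faster decay."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = (1.0 - stiffness) * x[t - 1] + 0.1 * rng.normal()
    return x

def summary(x):  # summary statistics of a sway trace
    return np.array([np.std(x), np.corrcoef(x[:-1], x[1:])[0, 1]])

obs = summary(simulate_sway(0.3))       # pretend this came from a real subject

# Rejection ABC: keep the prior draws whose simulated summaries land closest to the data.
draws = rng.uniform(0.01, 0.9, size=3000)
dist = np.array([np.linalg.norm(summary(simulate_sway(th)) - obs) for th in draws])
posterior = draws[dist <= np.quantile(dist, 0.01)]
print(f"posterior mean stiffness ~ {posterior.mean():.2f} +/- {posterior.std():.2f}")
```

The retained draws form an approximate posterior whose spread quantifies parameter uncertainty, which is the kind of subject-by-subject characterization the paper aims for with far more sophisticated machinery.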
Automatic evaluation of skin histopathological images for melanocytic features
NASA Astrophysics Data System (ADS)
Koosha, Mohaddeseh; Hoseini Alinodehi, S. Pourya; Nicolescu, Mircea; Safaei Naraghi, Zahra
2017-03-01
Successfully detecting melanocyte cells in the skin epidermis has great significance in skin histopathology. Because of the existence of cells with similar appearance to melanocytes in hematoxylin and eosin (HE) images of the epidermis, detecting melanocytes becomes a challenging task. This paper proposes a novel technique for the detection of melanocytes in HE images of the epidermis, based on the melanocyte color features, in the HSI color domain. Initially, an effective soft morphological filter is applied to the HE images in the HSI color domain to remove noise. Then a novel threshold-based technique is applied to distinguish the candidate melanocytes' nuclei. Similarly, the method is applied to find the candidate surrounding halos of the melanocytes. The candidate nuclei are associated with their surrounding halos using the suggested logical and statistical inferences. Finally, a fuzzy inference system is proposed, based on the HSI color information of a typical melanocyte in the epidermis, to calculate the similarity ratio of each candidate cell to a melanocyte. As our review on the literature shows, this is the first method evaluating epidermis cells for melanocyte similarity ratio. Experimental results on various images with different zooming factors show that the proposed method improves the results of previous works.
2018-01-01
This paper measures the adhesion/cohesion force among asphalt molecules at nanoscale level using an Atomic Force Microscopy (AFM) and models the moisture damage by applying state-of-the-art Computational Intelligence (CI) techniques (e.g., artificial neural network (ANN), support vector regression (SVR), and an Adaptive Neuro Fuzzy Inference System (ANFIS)). Various combinations of lime and chemicals as well as dry and wet environments are used to produce different asphalt samples. The parameters that were varied to generate different asphalt samples and measure the corresponding adhesion/cohesion forces are percentage of antistripping agents (e.g., Lime and Unichem), AFM tips K values, and AFM tip types. The CI methods are trained to model the adhesion/cohesion forces given the variation in values of the above parameters. To achieve enhanced performance, the statistical methods such as average, weighted average, and regression of the outputs generated by the CI techniques are used. The experimental results show that, of the three individual CI methods, ANN can model moisture damage to lime- and chemically modified asphalt better than the other two CI techniques for both wet and dry conditions. Moreover, the ensemble of CI along with statistical measurement provides better accuracy than any of the individual CI techniques. PMID:29849551
Bayesian inference for the spatio-temporal invasion of alien species.
Cook, Alex; Marion, Glenn; Butler, Adam; Gibson, Gavin
2007-08-01
In this paper we develop a Bayesian approach to parameter estimation in a stochastic spatio-temporal model of the spread of invasive species across a landscape. To date, statistical techniques, such as logistic and autologistic regression, have outstripped stochastic spatio-temporal models in their ability to handle large numbers of covariates. Here we seek to address this problem by making use of a range of covariates describing the bio-geographical features of the landscape. Relative to regression techniques, stochastic spatio-temporal models are more transparent in their representation of biological processes. They also explicitly model temporal change, and therefore do not require the assumption that the species' distribution (or other spatial pattern) has already reached equilibrium as is often the case with standard statistical approaches. In order to illustrate the use of such techniques we apply them to the analysis of data detailing the spread of an invasive plant, Heracleum mantegazzianum, across Britain in the 20th Century using geo-referenced covariate information describing local temperature, elevation and habitat type. The use of Markov chain Monte Carlo sampling within a Bayesian framework facilitates statistical assessments of differences in the suitability of different habitat classes for H. mantegazzianum, and enables predictions of future spread to account for parametric uncertainty and system variability. Our results show that ignoring such covariate information may lead to biased estimates of key processes and implausible predictions of future distributions.
Research participant compensation: A matter of statistical inference as well as ethics.
Swanson, David M; Betensky, Rebecca A
2015-11-01
The ethics of compensation of research subjects for participation in clinical trials has been debated for years. One ethical issue of concern is variation among subjects in the level of compensation for identical treatments. Surprisingly, the impact of variation on the statistical inferences made from trial results has not been examined. We seek to identify how variation in compensation may influence any existing dependent censoring in clinical trials, thereby also influencing inference about the survival curve, hazard ratio, or other measures of treatment efficacy. In simulation studies, we consider a model for how compensation structure may influence the censoring model. Under existing dependent censoring, we estimate survival curves under different compensation structures and observe how these structures induce variability in the estimates. We show through this model that if the compensation structure affects the censoring model and dependent censoring is present, then variation in that structure induces variation in the estimates and affects the accuracy of estimation and inference on treatment efficacy. From the perspectives of both ethics and statistical inference, standardization and transparency in the compensation of participants in clinical trials are warranted. Copyright © 2015 Elsevier Inc. All rights reserved.
ERIC Educational Resources Information Center
Kazak, Sibel; Pratt, Dave
2017-01-01
This study considers probability models as tools for both making informal statistical inferences and building stronger conceptual connections between data and chance topics in teaching statistics. In this paper, we aim to explore pre-service mathematics teachers' use of probability models for a chance game, where the sum of two dice matters in…
Fujioka, Kouki; Shimizu, Nobuo; Manome, Yoshinobu; Ikeda, Keiichi; Yamamoto, Kenji; Tomizawa, Yasuko
2013-01-01
Electronic noses have the benefit of obtaining smell information in a simple and objective manner; therefore, many applications have been developed for broad analysis areas such as food, drinks, cosmetics, medicine, and agriculture. However, measurement values from electronic noses have a tendency to vary under humidity or alcohol exposure conditions, since several types of sensors in the devices are affected by such variables. Consequently, we show three techniques for reducing the variation of sensor values: (1) using a trapping system to reduce the interfering components; (2) performing statistical standardization (calculation of z-scores); and (3) selecting suitable sensors. With these techniques, we discriminated the volatiles of four types of fresh mushrooms: golden needle (Flammulina velutipes), white mushroom (Agaricus bisporus), shiitake (Lentinus edodes), and eryngii (Pleurotus eryngii) among six fresh mushrooms (hen of the woods (Grifola frondosa), shimeji (Hypsizygus marmoreus) plus the above mushrooms). Additionally, we succeeded in discriminating white mushroom when compared only against artificial mushroom flavors, such as champignon flavor and truffle flavor. In conclusion, our techniques will expand the options available for reducing variation in sensor values. PMID:24233028
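Technique (2) is an ordinary per-sensor z-score; a minimal sketch with made-up electronic-nose readings follows.

```python
import numpy as np

# rows = measurements, columns = sensors (made-up electronic-nose readings)
readings = np.array([[0.82, 1.10, 0.33],
                     [0.79, 1.40, 0.31],
                     [0.95, 1.25, 0.40],
                     [0.88, 1.05, 0.36]])

# Standardize each sensor so that differences in scale and drift (e.g. humidity
# sensitivity) do not dominate the comparison between samples.
z = (readings - readings.mean(axis=0)) / readings.std(axis=0, ddof=1)
print(np.round(z, 2))
```

After standardization every sensor contributes on the same scale, so downstream discrimination reflects the smell pattern rather than the sensor with the largest raw variation.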
The Role of Probability-Based Inference in an Intelligent Tutoring System.
ERIC Educational Resources Information Center
Mislevy, Robert J.; Gitomer, Drew H.
Probability-based inference in complex networks of interdependent variables is an active topic in statistical research, spurred by such diverse applications as forecasting, pedigree analysis, troubleshooting, and medical diagnosis. This paper concerns the role of Bayesian inference networks for updating student models in intelligent tutoring…
NASA Astrophysics Data System (ADS)
Albert, Carlo; Ulzega, Simone; Stoop, Ruedi
2016-04-01
Measured time-series of both precipitation and runoff are known to exhibit highly non-trivial statistical properties. For making reliable probabilistic predictions in hydrology, it is therefore desirable to have stochastic models with output distributions that share these properties. When parameters of such models have to be inferred from data, we also need to quantify the associated parametric uncertainty. For non-trivial stochastic models, however, this latter step is typically very demanding, both conceptually and numerically, and is almost never done in hydrology. Here, we demonstrate that methods developed in statistical physics make a large class of stochastic differential equation (SDE) models amenable to a full-fledged Bayesian parameter inference. For concreteness we demonstrate these methods by means of a simple yet non-trivial toy SDE model. We consider a natural catchment that can be described by a linear reservoir, at the scale of observation. All the neglected processes are assumed to happen at much shorter time-scales and are therefore modeled with a Gaussian white noise term, the standard deviation of which is assumed to scale linearly with the system state (water volume in the catchment). Even for constant input, the outputs of this simple non-linear SDE model show a wealth of desirable statistical properties, such as fat-tailed distributions and long-range correlations. Standard algorithms for Bayesian inference fail, for models of this kind, because their likelihood functions are extremely high-dimensional intractable integrals over all possible model realizations. The use of Kalman filters is illegitimate due to the non-linearity of the model. Particle filters could be used but become increasingly inefficient with a growing number of data points. Hamiltonian Monte Carlo algorithms allow us to translate this inference problem into the problem of simulating the dynamics of a statistical mechanics system and give us access to the most sophisticated methods that have been developed in the statistical physics community over the last few decades. We demonstrate that such methods, along with automated differentiation algorithms, allow us to perform a full-fledged Bayesian inference, for a large class of SDE models, in a highly efficient and largely automated manner. Furthermore, our algorithm is highly parallelizable. For our toy model, discretized with a few hundred points, a full Bayesian inference can be performed in a matter of seconds on a standard PC.
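The toy model described above (a linear reservoir whose noise scales with storage) can be simulated with a simple Euler-Maruyama step; the parameter values below are arbitrary, and the Hamiltonian Monte Carlo inference itself is beyond a short sketch.

```python
import numpy as np

rng = np.random.default_rng(21)

def simulate_reservoir(rain, k=10.0, sigma=0.2, dt=0.1, s0=1.0):
    """Euler-Maruyama for dS = (rain - S/k) dt + sigma * S dW: a linear reservoir
    with state-proportional noise; runoff is taken as S/k (storage clipped at
    zero as a crude guard against unphysical values)."""
    s = np.empty(rain.size + 1)
    s[0] = s0
    for t in range(rain.size):
        dw = np.sqrt(dt) * rng.normal()
        s[t + 1] = max(s[t] + (rain[t] - s[t] / k) * dt + sigma * s[t] * dw, 0.0)
    return s[1:] / k

runoff = simulate_reservoir(rain=np.full(1000, 0.5))   # constant input
print(f"mean runoff = {runoff.mean():.2f}, "
      f"coefficient of variation = {runoff.std() / runoff.mean():.2f}")
```

Even with constant rainfall the multiplicative noise produces skewed, bursty runoff, which is the behaviour the abstract points to as realistic.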
NASA Astrophysics Data System (ADS)
Bakker, Arthur; Ben-Zvi, Dani; Makar, Katie
2017-12-01
To understand how statistical and other types of reasoning are coordinated with actions to reduce uncertainty, we conducted a case study in vocational education that involved statistical hypothesis testing. We analyzed an intern's research project in a hospital laboratory in which reducing uncertainties was crucial to make a valid statistical inference. In his project, the intern, Sam, investigated whether patients' blood could be sent through pneumatic post without influencing the measurement of particular blood components. We asked, in the process of making a statistical inference, how are reasons and actions coordinated to reduce uncertainty? For the analysis, we used the semantic theory of inferentialism, specifically, the concept of webs of reasons and actions—complexes of interconnected reasons for facts and actions; these reasons include premises and conclusions, inferential relations, implications, motives for action, and utility of tools for specific purposes in a particular context. Analysis of interviews with Sam, his supervisor and teacher as well as video data of Sam in the classroom showed that many of Sam's actions aimed to reduce variability, rule out errors, and thus reduce uncertainties so as to arrive at a valid inference. Interestingly, the decisive factor was not the outcome of a t test but of the reference change value, a clinical chemical measure of analytic and biological variability. With insights from this case study, we expect that students can be better supported in connecting statistics with context and in dealing with uncertainty.
Cornuet, Jean-Marie; Santos, Filipe; Beaumont, Mark A; Robert, Christian P; Marin, Jean-Michel; Balding, David J; Guillemaud, Thomas; Estoup, Arnaud
2008-12-01
Genetic data obtained on population samples convey information about their evolutionary history. Inference methods can extract part of this information but they require sophisticated statistical techniques that have been made available to the biologist community (through computer programs) only for simple and standard situations typically involving a small number of samples. We propose here a computer program (DIY ABC) for inference based on approximate Bayesian computation (ABC), in which scenarios can be customized by the user to fit many complex situations involving any number of populations and samples. Such scenarios involve any combination of population divergences, admixtures and population size changes. DIY ABC can be used to compare competing scenarios, estimate parameters for one or more scenarios and compute bias and precision measures for a given scenario and known values of parameters (the current version applies to unlinked microsatellite data). This article describes key methods used in the program and provides its main features. The analysis of one simulated and one real dataset, both with complex evolutionary scenarios, illustrates the main possibilities of DIY ABC. The software DIY ABC is freely available at http://www.montpellier.inra.fr/CBGP/diyabc.
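DIY ABC implements considerably more elaborate machinery (scenario comparison, regression adjustment, bias and precision measures), but the core rejection step of approximate Bayesian computation can be sketched in a few lines. In the sketch below, prior_sampler, simulator and summarize are hypothetical user-supplied callables, and the toy usage infers the mean of a normal distribution with known standard deviation.

```python
import numpy as np

def abc_rejection(observed_stats, prior_sampler, simulator, summarize,
                  n_draws=20_000, tolerance=0.2):
    """Basic ABC rejection sampler.

    prior_sampler() -> a parameter value drawn from the prior
    simulator(theta) -> a synthetic dataset generated under theta
    summarize(data)  -> a vector of summary statistics
    Keeps the draws whose simulated summaries fall within `tolerance`
    (Euclidean distance) of the observed summaries.
    """
    obs = np.asarray(observed_stats, dtype=float)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler()
        stats = np.asarray(summarize(simulator(theta)), dtype=float)
        if np.linalg.norm(stats - obs) < tolerance:
            accepted.append(theta)
    return np.array(accepted)

# Toy usage: infer the mean of a normal distribution with known sd = 1.
rng = np.random.default_rng(2)
data = rng.normal(3.0, 1.0, size=100)
posterior = abc_rejection(
    observed_stats=[data.mean(), data.std()],
    prior_sampler=lambda: rng.uniform(-10.0, 10.0),
    simulator=lambda mu: rng.normal(mu, 1.0, size=100),
    summarize=lambda d: [d.mean(), d.std()])
print(posterior.mean() if posterior.size else "no draws accepted")
```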
Grover, Sandeep; Del Greco M, Fabiola; Stein, Catherine M; Ziegler, Andreas
2017-01-01
Confounding and reverse causality have prevented us from drawing meaningful clinical interpretations even in well-powered observational studies. Confounding may be attributed to our inability to randomize the exposure variable in observational studies. Mendelian randomization (MR) is one approach to overcome confounding. It utilizes one or more genetic polymorphisms as a proxy for the exposure variable of interest. Polymorphisms are randomly distributed in a population and remain static throughout an individual's lifetime, and may thus help in inferring directionality in exposure-outcome associations. Genome-wide association studies (GWAS) or meta-analyses of GWAS are characterized by large sample sizes and the availability of many single nucleotide polymorphisms (SNPs), making GWAS-based MR an attractive approach. GWAS-based MR comes with specific challenges, including multiple causality. Despite these shortcomings, it remains one of the most powerful techniques for inferring causality. With MR still an evolving concept with complex statistical challenges, the literature is relatively scarce in terms of providing working examples incorporating real datasets. In this chapter, we provide a step-by-step guide for causal inference based on the principles of MR with a real dataset, using both individual and summary data from unrelated individuals. We suggest best possible practices and give recommendations based on the current literature.
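The chapter's worked examples are not reproduced here, but the simplest building block of summary-data MR, the single-instrument Wald ratio, can be sketched as follows. The summary statistics in the example are hypothetical, the standard error uses only a first-order approximation, and practical analyses would typically combine many SNPs (e.g. by inverse-variance weighting).

```python
import numpy as np

def wald_ratio_mr(beta_gx, se_gx, beta_gy, se_gy):
    """Single-instrument Mendelian randomization (Wald ratio) from summary data.

    beta_gx, se_gx: SNP-exposure association and its standard error
    beta_gy, se_gy: SNP-outcome association and its standard error
    Returns the causal effect estimate and a first-order (delta-method)
    standard error that ignores the uncertainty in beta_gx.
    """
    estimate = beta_gy / beta_gx
    se = abs(se_gy / beta_gx)
    return estimate, se

# Hypothetical summary statistics for one instrument SNP.
est, se = wald_ratio_mr(beta_gx=0.12, se_gx=0.01, beta_gy=0.036, se_gy=0.009)
print(f"causal estimate = {est:.2f}, 95% CI half-width = {1.96 * se:.2f}")
```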
Bayesian Inference for Generalized Linear Models for Spiking Neurons
Gerwinn, Sebastian; Macke, Jakob H.; Bethge, Matthias
2010-01-01
Generalized Linear Models (GLMs) are commonly used statistical methods for modelling the relationship between neural population activity and presented stimuli. When the dimension of the parameter space is large, strong regularization has to be used in order to fit GLMs to datasets of realistic size without overfitting. By imposing properly chosen priors over parameters, Bayesian inference provides an effective and principled approach for achieving regularization. Here we show how the posterior distribution over model parameters of GLMs can be approximated by a Gaussian using the Expectation Propagation algorithm. In this way, we obtain an estimate of the posterior mean and posterior covariance, allowing us to calculate Bayesian confidence intervals that characterize the uncertainty about the optimal solution. From the posterior we also obtain a different point estimate, namely the posterior mean as opposed to the commonly used maximum a posteriori estimate. We systematically compare the different inference techniques on simulated as well as on multi-electrode recordings of retinal ganglion cells, and explore the effects of the chosen prior and the performance measure used. We find that good performance can be achieved by choosing a Laplace prior together with the posterior mean estimate. PMID:20577627
PyClone: statistical inference of clonal population structure in cancer.
Roth, Andrew; Khattra, Jaswinder; Yap, Damian; Wan, Adrian; Laks, Emma; Biele, Justina; Ha, Gavin; Aparicio, Samuel; Bouchard-Côté, Alexandre; Shah, Sohrab P
2014-04-01
We introduce PyClone, a statistical model for inference of clonal population structures in cancers. PyClone is a Bayesian clustering method for grouping sets of deeply sequenced somatic mutations into putative clonal clusters while estimating their cellular prevalences and accounting for allelic imbalances introduced by segmental copy-number changes and normal-cell contamination. Single-cell sequencing validation demonstrates PyClone's accuracy.
Statistical Signal Models and Algorithms for Image Analysis
1984-10-25
In this report, two-dimensional stochastic linear models are used in developing algorithms for image analysis such as classification, segmentation, and object detection in images characterized by textured backgrounds. These models generate two-dimensional random processes as outputs to which statistical inference procedures can naturally be applied. A common thread throughout our algorithms is the interpretation of the inference procedures in terms of linear prediction
Rule groupings in expert systems using nearest neighbour decision rules, and convex hulls
NASA Technical Reports Server (NTRS)
Anastasiadis, Stergios
1991-01-01
Expert system shells are lacking in many areas of software engineering. Large rule-based systems are not semantically comprehensible, are difficult to debug, and are impossible to modify or validate. Partitioning a set of rules found in CLIPS (C Language Integrated Production System) into groups of rules that reflect the underlying semantic subdomains of the problem adequately addresses the concerns stated above. Techniques are introduced to structure a CLIPS rule base into groups of rules that inherently share common semantic information. The concepts involved are imported from the fields of artificial intelligence, pattern recognition, and statistical inference. The techniques focus on feature selection, classification, and a criterion for how 'good' the classification technique is, based on Bayesian decision theory. A variety of distance metrics are discussed for measuring the 'closeness' of CLIPS rules, and various nearest neighbor classification algorithms are described based on these metrics.
NASA Technical Reports Server (NTRS)
Bemra, R. S.; Rastogi, P. K.; Balsley, B. B.
1986-01-01
An analysis of frequency spectra at periods of about 5 days to 5 min from two 20-day sets of velocity measurements in the stratosphere and troposphere region obtained with the Poker Flat mesosphere-stratosphere-troposphere (MST) radar during January and June, 1984 is presented. A technique based on median filtering and averaged order statistics for automatic editing, smoothing and spectral analysis of velocity time series contaminated with spurious data points or outliers is outlined. The validity of this technique and its effects on the inferred spectral index was tested through simulation. Spectra obtained with this technique are discussed. The measured spectral indices show variability with season and height, especially across the tropopause. The discussion briefly outlines the need for obtaining better climatologies of velocity spectra and for the refinements of the existing theories to explain their behavior.
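The exact editing and order-statistics procedure of the paper is not reproduced here; the sketch below shows one common variant of the idea, in which samples that deviate from a running median by more than a few robust standard deviations are replaced before a spectrum is computed. Kernel size and threshold are illustrative choices.

```python
import numpy as np
from scipy.signal import medfilt, periodogram

def despike_and_spectrum(velocity, fs, kernel_size=5, threshold=4.0):
    """Median-filter based editing of a velocity time series, followed by a
    periodogram of the edited series.

    Samples that deviate from the running median by more than `threshold`
    robust standard deviations (1.4826 * MAD of the residuals) are replaced
    by the median value before the spectrum is computed.
    """
    velocity = np.asarray(velocity, dtype=float)
    med = medfilt(velocity, kernel_size=kernel_size)
    resid = velocity - med
    robust_sd = 1.4826 * np.median(np.abs(resid - np.median(resid)))
    edited = np.where(np.abs(resid) > threshold * robust_sd, med, velocity)
    freqs, power = periodogram(edited, fs=fs)
    return edited, freqs, power
```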
Adaption from LWIR to visible wavebands of methods to describe the population of GEO belt debris
NASA Astrophysics Data System (ADS)
Meng, Kevin; Murray-Krezan, Jeremy; Seitzer, Patrick
2018-05-01
Prior efforts to characterize the number of GEO belt debris objects by statistically analyzing the distribution of debris as a function of size have relied on techniques unique to infrared measurements of the debris. Specifically the infrared measurement techniques permitted inference of the characteristic size of the debris. This report describes a method to adapt the previous techniques and measurements to visible wavebands. Results will be presented using data from a NASA optical, visible band survey of objects near the geosynchronous orbit, GEO belt. This survey used the University of Michigan's 0.6-m Curtis-Schmidt telescope, Michigan Orbital DEbris Survey Telescope (MODEST), located at Cerro Tololo Inter-American Observatory in Chile. The system is equipped with a scanning CCD with a field of view of 1.6°×1.6°, and can detect objects smaller than 20 cm diameter at GEO.
Nabi, Razieh; Shpitser, Ilya
2017-01-01
In this paper, we consider the problem of fair statistical inference involving outcome variables. Examples include classification and regression problems, and estimating treatment effects in randomized trials or observational data. The issue of fairness arises in such problems where some covariates or treatments are “sensitive,” in the sense of having the potential to create discrimination. We argue that the presence of discrimination can be formalized in a sensible way as the presence of an effect of a sensitive covariate on the outcome along certain causal pathways, a view which generalizes (Pearl 2009). A fair outcome model can then be learned by solving a constrained optimization problem. We discuss a number of complications that arise in classical statistical inference due to this view and provide workarounds based on recent work in causal and semi-parametric inference.
P values are only an index to evidence: 20th- vs. 21st-century statistical science.
Burnham, K P; Anderson, D R
2014-03-01
Early statistical methods focused on pre-data probability statements (i.e., data as random variables) such as P values; these are not really inferences, nor are P values evidential. Statistical science clung to these principles throughout much of the 20th century as a wide variety of methods were developed for special cases. Looking back, it is clear that the underlying paradigm (i.e., testing and P values) was weak. As Kuhn (1970) suggests, new paradigms have taken the place of earlier ones: this is a goal of good science. New methods have been developed and older methods extended, and these allow proper measures of strength of evidence and multimodel inference. It is time to move forward with sound theory and practice for the difficult practical problems that lie ahead. Given data, the useful foundation shifts to post-data probability statements such as model probabilities (Akaike weights) or related quantities such as odds ratios and likelihood intervals. These new methods allow formal inference from multiple models in the a priori set. These quantities are properly evidential. The past century was aimed at finding the "best" model and making inferences from it. The goal in the 21st century is to base inference on all the models weighted by their model probabilities (model averaging). Estimates of precision can include model selection uncertainty, leading to variances conditional on the model set. The 21st century will be about the quantification of information, proper measures of evidence, and multimodel inference. Nelder (1999:261) concludes, "The most important task before us in developing statistical science is to demolish the P-value culture, which has taken root to a frightening extent in many areas of both pure and applied science and technology".
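Since the abstract explicitly mentions Akaike weights and model averaging, a minimal numerical sketch may help. The AIC values and per-model estimates below are hypothetical; the functions simply convert AIC differences into model probabilities and form a model-averaged estimate.

```python
import numpy as np

def akaike_weights(aic_values):
    """Convert a set of AIC values into Akaike weights (model probabilities)."""
    aic = np.asarray(aic_values, dtype=float)
    delta = aic - aic.min()
    raw = np.exp(-0.5 * delta)
    return raw / raw.sum()

def model_averaged_estimate(estimates, weights):
    """Model-averaged point estimate: the weighted mean of per-model estimates."""
    return float(np.dot(weights, estimates))

# Hypothetical: three candidate models fitted to the same dataset.
aic = [102.3, 100.1, 105.8]
est = [0.42, 0.47, 0.35]  # per-model estimates of a common parameter
w = akaike_weights(aic)
print(w, model_averaged_estimate(est, w))
```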
Statistical inference of the generation probability of T-cell receptors from sequence repertoires.
Murugan, Anand; Mora, Thierry; Walczak, Aleksandra M; Callan, Curtis G
2012-10-02
Stochastic rearrangement of germline V-, D-, and J-genes to create variable coding sequence for certain cell surface receptors is at the origin of immune system diversity. This process, known as "VDJ recombination", is implemented via a series of stochastic molecular events involving gene choices and random nucleotide insertions between, and deletions from, genes. We use large sequence repertoires of the variable CDR3 region of human CD4+ T-cell receptor beta chains to infer the statistical properties of these basic biochemical events. Because any given CDR3 sequence can be produced in multiple ways, the probability distribution of hidden recombination events cannot be inferred directly from the observed sequences; we therefore develop a maximum likelihood inference method to achieve this end. To separate the properties of the molecular rearrangement mechanism from the effects of selection, we focus on nonproductive CDR3 sequences in T-cell DNA. We infer the joint distribution of the various generative events that occur when a new T-cell receptor gene is created. We find a rich picture of correlation (and absence thereof), providing insight into the molecular mechanisms involved. The generative event statistics are consistent between individuals, suggesting a universal biochemical process. Our probabilistic model predicts the generation probability of any specific CDR3 sequence by the primitive recombination process, allowing us to quantify the potential diversity of the T-cell repertoire and to understand why some sequences are shared between individuals. We argue that the use of formal statistical inference methods, of the kind presented in this paper, will be essential for quantitative understanding of the generation and evolution of diversity in the adaptive immune system.
Emmert-Streib, Frank; Glazko, Galina V.; Altay, Gökmen; de Matos Simoes, Ricardo
2012-01-01
In this paper, we present a systematic and conceptual overview of methods for inferring gene regulatory networks from observational gene expression data. Further, we discuss two classic approaches to infer causal structures and compare them with contemporary methods by providing a conceptual categorization thereof. We complement the above by surveying global and local evaluation measures for assessing the performance of inference algorithms. PMID:22408642
NASA Astrophysics Data System (ADS)
Alexandridis, Konstantinos T.
This dissertation adopts a holistic and detailed approach to modeling spatially explicit agent-based artificial intelligent systems, using the Multi Agent-based Behavioral Economic Landscape (MABEL) model. The research questions that it addresses stem from the need to understand and analyze the real-world patterns and dynamics of land use change from a coupled human-environmental systems perspective. It describes the systemic, mathematical, statistical, socio-economic and spatial dynamics of the MABEL modeling framework, and provides a wide array of cross-disciplinary modeling applications within the research, decision-making and policy domains. It establishes the symbolic properties of the MABEL model as a Markov decision process, analyzes the decision-theoretic utility and optimization attributes of agents towards comprising statistically and spatially optimal policies and actions, and explores the probabilistic character of the agents' decision-making and inference mechanisms via the use of Bayesian belief and decision networks. It develops and describes a Monte Carlo methodology for experimental replications of agents' decisions regarding complex spatial parcel acquisition and learning. It recognizes the gap in spatially explicit accuracy assessment techniques for complex spatial models, and proposes an ensemble of statistical tools designed to address this problem. Advanced information assessment techniques such as the receiver operating characteristic curve, the impurity entropy and Gini functions, and Bayesian classification functions are proposed. The theoretical foundation for modular Bayesian inference in spatially explicit multi-agent artificial intelligent systems, and the ensembles of cognitive and scenario assessment modular tools built for the MABEL model, are provided. The dissertation emphasizes modularity and robustness as valuable qualitative modeling attributes, and examines the role of robust intelligent modeling as a tool for improving policy decisions related to land use change. Finally, the major contributions to the science are presented along with valuable directions for future research.
Inferring gene regression networks with model trees
2010-01-01
Background Novel strategies are required in order to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes by building the so-called gene co-expression networks. These are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful for determining whether two genes have a strong global similarity, but they do not detect local similarities. Results We propose model trees as a method to identify gene interaction networks. While correlation-based methods analyze each pair of genes, in our approach we generate a single regression tree for each gene from the remaining genes. Finally, a graph of all the relationships among output and input genes is built, taking into account whether each pair of genes is statistically significant. For this reason, we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: a Saccharomyces cerevisiae data set and an E. coli data set. First, the biological coherence of the results is tested. Second, the E. coli transcriptional network (in the Regulon database) is used as a control to compare the results to those of a correlation-based method. This experiment shows that REGNET performs more accurately at detecting true gene associations than the Pearson and Spearman zeroth- and first-order correlation-based methods. Conclusions REGNET generates gene association networks from gene expression data, and differs from correlation-based methods in that the relationship between one gene and the others is calculated simultaneously. Model trees are very useful techniques for estimating the numerical values of the target genes by means of linear regression functions. They are often more precise than linear regression models because they can fit different linear regressions to separate areas of the search space, favoring the inference of localized similarities over a more global similarity. Furthermore, experimental results show the good performance of REGNET. PMID:20950452
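REGNET itself builds model trees with linear models in the leaves; the sketch below only approximates the per-gene regression idea using scikit-learn's standard regression trees (which have constant leaves) and a feature-importance cutoff in place of the paper's significance procedure. Gene names, cutoff and depth are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def tree_based_edges(expression, gene_names, max_depth=3, importance_cutoff=0.2):
    """For each target gene, fit a regression tree on all remaining genes and
    keep predictors whose feature importance exceeds a cutoff.

    expression: array of shape (n_samples, n_genes).
    Returns a list of (regulator, target) candidate edges.  Note that these
    trees have constant leaves rather than the linear-model leaves of true
    model trees, so this only approximates the approach described above.
    """
    expression = np.asarray(expression, dtype=float)
    n_genes = expression.shape[1]
    edges = []
    for j in range(n_genes):
        y = expression[:, j]
        X = np.delete(expression, j, axis=1)
        others = [g for i, g in enumerate(gene_names) if i != j]
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, y)
        for name, importance in zip(others, tree.feature_importances_):
            if importance > importance_cutoff:
                edges.append((name, gene_names[j]))
    return edges
```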
NASA Astrophysics Data System (ADS)
Weigt, Martin
Over the last years, biological research has been revolutionized by experimental high-throughput techniques, in particular by next-generation sequencing technology. Unprecedented amounts of data are accumulating, and there is a growing demand for computational methods unveiling the information hidden in raw data, thereby increasing our understanding of complex biological systems. Statistical-physics models based on the maximum-entropy principle have, in the last few years, played an important role in this context. To give a specific example, proteins and many non-coding RNA show a remarkable degree of structural and functional conservation in the course of evolution, despite a large variability in amino acid sequences. We have developed a statistical-mechanics inspired inference approach - called Direct-Coupling Analysis - to link this sequence variability (easy to observe in sequence alignments, which are available in public sequence databases) to bio-molecular structure and function. In my presentation I will show how this methodology can be used (i) to infer contacts between residues and thus to guide tertiary and quaternary protein structure prediction and RNA structure prediction, (ii) to discriminate interacting from non-interacting protein families, and thus to infer conserved protein-protein interaction networks, and (iii) to reconstruct mutational landscapes and thus to predict the phenotypic effect of mutations. References [1] M. Figliuzzi, H. Jacquier, A. Schug, O. Tenaillon and M. Weigt, ''Coevolutionary landscape inference and the context-dependence of mutations in beta-lactamase TEM-1'', Mol. Biol. Evol. (2015), doi: 10.1093/molbev/msv211 [2] E. De Leonardis, B. Lutz, S. Ratz, S. Cocco, R. Monasson, A. Schug, M. Weigt, ''Direct-Coupling Analysis of nucleotide coevolution facilitates RNA secondary and tertiary structure prediction'', Nucleic Acids Research (2015), doi: 10.1093/nar/gkv932 [3] F. Morcos, A. Pagnani, B. Lunt, A. Bertolino, D. Marks, C. Sander, R. Zecchina, J.N. Onuchic, T. Hwa, M. Weigt, ''Direct-coupling analysis of residue co-evolution captures native contacts across many protein families'', Proc. Natl. Acad. Sci. 108, E1293-E1301 (2011).
Detection of Failure in Asynchronous Motor Using Soft Computing Method
NASA Astrophysics Data System (ADS)
Vinoth Kumar, K.; Sony, Kevin; Achenkunju John, Alan; Kuriakose, Anto; John, Ano P.
2018-04-01
This paper investigates stator short-winding failures of asynchronous motors and their effects on the motor current spectrum. A fuzzy logic approach, i.e., a model-based technique, can help detect asynchronous motor failures. Fuzzy logic resembles human reasoning in that it enables inferences from vague or imprecise information expressed in linguistic terms. A dynamic model of the asynchronous motor is developed with a fuzzy logic classifier to investigate stator inter-turn failures as well as open-phase failures. A hardware implementation was carried out with LabVIEW for the online monitoring of faults.
Trace element fingerprinting of jewellery rubies by external beam PIXE
NASA Astrophysics Data System (ADS)
Calligaro, T.; Poirot, J.-P.; Querré, G.
1999-04-01
External beam PIXE analysis allows the non-destructive in situ characterisation of gemstones mounted on jewellery pieces. This technique was used to determine the geographical origin of 64 rubies set on a high-valued necklace. The trace element content of these gemstones was measured and compared to that of a set of rubies of known sources. Multivariate statistical processing of the results allowed us to infer the provenance of the rubies: one comes from the Thailand/Cambodia deposit, while the remaining ones are attributed to Burma. This highlights the complementary capabilities of PIXE and conventional gemological observations.
Selvarasu, Suresh; Kim, Do Yun; Karimi, Iftekhar A; Lee, Dong-Yup
2010-10-01
We present an integrated framework for characterizing fed-batch cultures of mouse hybridoma cells producing monoclonal antibody (mAb). This framework systematically combines data preprocessing, elemental balancing and statistical analysis techniques. Initially, specific rates of cell growth, glucose/amino acid consumptions and mAb/metabolite productions were calculated via curve fitting using logistic equations, with subsequent elemental balancing of the preprocessed data indicating the presence of experimental measurement errors. Multivariate statistical analysis was then employed to understand physiological characteristics of the cellular system. The results from principal component analysis (PCA) revealed three major clusters of amino acids with similar trends in their consumption profiles: (i) arginine, threonine and serine, (ii) glycine, tyrosine, phenylalanine, methionine, histidine and asparagine, and (iii) lysine, valine and isoleucine. Further analysis using partial least squares (PLS) regression identified key amino acids which were positively or negatively correlated with cell growth, mAb production and the generation of lactate and ammonia. Based on these results, the optimal concentrations of key amino acids in the feed medium can be inferred, potentially leading to an increase in cell viability and productivity, as well as a decrease in toxic waste production. The study demonstrated how the current methodological framework using multivariate statistical analysis techniques can serve as a potential tool for deriving rational medium design strategies. Copyright © 2010 Elsevier B.V. All rights reserved.
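As a rough illustration of the PCA step, the sketch below standardizes a (here randomly generated, purely hypothetical) matrix of specific consumption rates and extracts the leading components; amino acids with similar loadings on those components would fall into the same cluster, analogous to the three groups reported above. The PLS regression step is omitted.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# Hypothetical matrix of specific consumption rates: rows are sampling times
# of a fed-batch run, columns are 12 amino acids (values randomly generated
# here purely for illustration).
rates = rng.normal(size=(20, 12))
rates_std = (rates - rates.mean(axis=0)) / rates.std(axis=0, ddof=1)

pca = PCA(n_components=3)
scores = pca.fit_transform(rates_std)   # time-point scores
loadings = pca.components_.T            # amino-acid loadings
print("explained variance ratio:", pca.explained_variance_ratio_)
# Amino acids with similar loadings on the leading components would be
# grouped together, analogous to the clusters reported in the abstract.
```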
Network Meta-Analysis Using R: A Review of Currently Available Automated Packages
Neupane, Binod; Richer, Danielle; Bonner, Ashley Joel; Kibret, Taddele; Beyene, Joseph
2014-01-01
Network meta-analysis (NMA) – a statistical technique that allows comparison of multiple treatments in the same meta-analysis simultaneously – has become increasingly popular in the medical literature in recent years. The statistical methodology underpinning this technique and software tools for implementing the methods are evolving. Both commercial and freely available statistical software packages have been developed to facilitate the statistical computations using NMA with varying degrees of functionality and ease of use. This paper aims to introduce the reader to three R packages, namely, gemtc, pcnetmeta, and netmeta, which are freely available software tools implemented in R. Each automates the process of performing NMA so that users can perform the analysis with minimal computational effort. We present, compare and contrast the availability and functionality of different important features of NMA in these three packages so that clinical investigators and researchers can determine which R packages to implement depending on their analysis needs. Four summary tables detailing (i) data input and network plotting, (ii) modeling options, (iii) assumption checking and diagnostic testing, and (iv) inference and reporting tools, are provided, along with an analysis of a previously published dataset to illustrate the outputs available from each package. We demonstrate that each of the three packages provides a useful set of tools, and combined provide users with nearly all functionality that might be desired when conducting a NMA. PMID:25541687
Bayesian inference on earthquake size distribution: a case study in Italy
NASA Astrophysics Data System (ADS)
Licia, Faenza; Carlo, Meletti; Laura, Sandri
2010-05-01
This paper focuses on the statistical distribution of earthquake size, studied using Bayesian inference. The strategy consists of defining an a priori distribution based on instrumental seismicity, modeled as a power law. Using the observed historical data, the power law is then modified to obtain the posterior distribution. The aim of this paper is to define the earthquake size distribution using all the seismic databases available (i.e., instrumental and historical catalogs) and a robust statistical technique. We apply this methodology to Italian seismicity, dividing the territory into source zones as done for the seismic hazard assessment, which is taken here as a reference model. The results suggest that each area has its own peculiar trend: while the power law is able to capture the mean aspect of the earthquake size distribution, the posterior emphasizes different slopes in different areas. Our results are in general agreement with those used in the seismic hazard assessment in Italy. However, there are areas in which a flattening of the curve is observed, indicating a significant departure from power-law behavior and implying that there are local aspects that a power-law distribution is not able to capture.
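The paper's particular prior and posterior construction is not reproduced here. As a generic sketch of Bayesian updating of an earthquake-size power law, the code below performs a conjugate gamma update of the Gutenberg-Richter decay rate, assuming magnitudes above a completeness threshold are exponentially distributed; all numerical values are hypothetical.

```python
import numpy as np

def posterior_beta(magnitudes, m_c, prior_shape, prior_rate):
    """Conjugate Bayesian update for the Gutenberg-Richter decay rate beta
    (b-value = beta / ln 10).

    Assumes magnitudes above the completeness threshold m_c are exponentially
    distributed with rate beta and that the prior on beta is
    Gamma(prior_shape, prior_rate).  Returns the posterior shape and rate.
    """
    m = np.asarray(magnitudes, dtype=float)
    excess = m[m >= m_c] - m_c
    return prior_shape + excess.size, prior_rate + excess.sum()

# Hypothetical: a prior elicited from the instrumental catalog, updated with
# a handful of historical magnitudes for one source zone.
shape, rate = posterior_beta([5.6, 6.1, 5.9, 6.4, 5.7], m_c=5.5,
                             prior_shape=50.0, prior_rate=22.0)
print("posterior mean b-value:", (shape / rate) / np.log(10))
```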
Bayesian Monte Carlo and Maximum Likelihood Approach for ...
Model uncertainty estimation and risk assessment are essential to environmental management and informed decision making on pollution mitigation strategies. In this study, we apply a probabilistic methodology, which combines Bayesian Monte Carlo simulation and Maximum Likelihood estimation (BMCML), to calibrate a lake oxygen recovery model. We first derive an analytical solution of the differential equation governing lake-averaged oxygen dynamics as a function of time-variable wind speed. Statistical inferences on model parameters and predictive uncertainty are then drawn by Bayesian conditioning of the analytical solution on observed daily wind speed and oxygen concentration data obtained from an earlier study during two recovery periods on a eutrophic lake in upstate New York. The model is calibrated using oxygen recovery data for one year, and statistical inferences are validated using recovery data for another year. Compared with an essentially two-step regression and optimization approach, the BMCML results are more comprehensive and performed relatively better in predicting the observed temporal dissolved oxygen (DO) levels in the lake. BMCML also produced calibration and validation results comparable with those obtained using the popular Markov chain Monte Carlo (MCMC) technique and is computationally simpler and easier to implement than the MCMC. Next, using the calibrated model, we derive an optimal relationship between liquid film-transfer coefficien
Does History Repeat Itself? Wavelets and the Phylodynamics of Influenza A
Tom, Jennifer A.; Sinsheimer, Janet S.; Suchard, Marc A.
2012-01-01
Unprecedented global surveillance of viruses will result in massive sequence data sets that require new statistical methods. These data sets press the limits of Bayesian phylogenetics as the high-dimensional parameters that comprise a phylogenetic tree increase the already sizable computational burden of these techniques. This burden often results in partitioning the data set, for example, by gene, and inferring the evolutionary dynamics of each partition independently, a compromise that results in stratified analyses that depend only on data within a given partition. However, parameter estimates inferred from these stratified models are likely strongly correlated, considering they rely on data from a single data set. To overcome this shortfall, we exploit the existing Monte Carlo realizations from stratified Bayesian analyses to efficiently estimate a nonparametric hierarchical wavelet-based model and learn about the time-varying parameters of effective population size that reflect levels of genetic diversity across all partitions simultaneously. Our methods are applied to complete genome influenza A sequences that span 13 years. We find that broad peaks and trends, as opposed to seasonal spikes, in the effective population size history distinguish individual segments from the complete genome. We also address hypotheses regarding intersegment dynamics within a formal statistical framework that accounts for correlation between segment-specific parameters. PMID:22160768
Emura, Takeshi; Konno, Yoshihiko; Michimae, Hirofumi
2015-07-01
Doubly truncated data consist of samples whose observed values fall between the right- and left-truncation limits. With such samples, the distribution function of interest is estimated using the nonparametric maximum likelihood estimator (NPMLE) that is obtained through a self-consistency algorithm. Owing to the complicated asymptotic distribution of the NPMLE, the bootstrap method has been suggested for statistical inference. This paper proposes a closed-form estimator for the asymptotic covariance function of the NPMLE, which is a computationally attractive alternative to bootstrapping. Furthermore, we develop various statistical inference procedures, such as confidence intervals, goodness-of-fit tests, and confidence bands, to demonstrate the usefulness of the proposed covariance estimator. Simulations are performed to compare the proposed method with both the bootstrap and jackknife methods. The methods are illustrated using the childhood cancer dataset.
Structured statistical models of inductive reasoning.
Kemp, Charles; Tenenbaum, Joshua B
2009-01-01
Everyday inductive inferences are often guided by rich background knowledge. Formal models of induction should aim to incorporate this knowledge and should explain how different kinds of knowledge lead to the distinctive patterns of reasoning found in different inductive contexts. This article presents a Bayesian framework that attempts to meet both goals and describes [corrected] 4 applications of the framework: a taxonomic model, a spatial model, a threshold model, and a causal model. Each model makes probabilistic inferences about the extensions of novel properties, but the priors for the 4 models are defined over different kinds of structures that capture different relationships between the categories in a domain. The framework therefore shows how statistical inference can operate over structured background knowledge, and the authors argue that this interaction between structure and statistics is critical for explaining the power and flexibility of human reasoning.
Spriestersbach, Albert; Röhrig, Bernd; du Prel, Jean-Baptist; Gerhold-Ay, Aslihan; Blettner, Maria
2009-09-01
Descriptive statistics are an essential part of biometric analysis and a prerequisite for the understanding of further statistical evaluations, including the drawing of inferences. When data are well presented, it is usually obvious whether the author has collected and evaluated them correctly and in keeping with accepted practice in the field. Statistical variables in medicine may be of either the metric (continuous, quantitative) or categorical (nominal, ordinal) type. Easily understandable examples are given. Basic techniques for the statistical description of collected data are presented and illustrated with examples. The goal of a scientific study must always be clearly defined. The definition of the target value or clinical endpoint determines the level of measurement of the variables in question. Nearly all variables, whatever their level of measurement, can be usefully presented graphically and numerically. The level of measurement determines what types of diagrams and statistical values are appropriate. There are also different ways of presenting combinations of two independent variables graphically and numerically. The description of collected data is indispensable. If the data are of good quality, valid and important conclusions can already be drawn when they are properly described. Furthermore, data description provides a basis for inferential statistics.
Goyal, Ravi; De Gruttola, Victor
2018-01-30
Analysis of sexual history data intended to describe sexual networks presents many challenges arising from the fact that most surveys collect information on only a very small fraction of the population of interest. In addition, partners are rarely identified and responses are subject to reporting biases. Typically, each network statistic of interest, such as mean number of sexual partners for men or women, is estimated independently of other network statistics. There is, however, a complex relationship among network statistics, and knowledge of these relationships can aid in addressing the concerns mentioned earlier. We develop a novel method that constrains a posterior predictive distribution of a collection of network statistics in order to leverage the relationships among network statistics in making inference about network properties of interest. The method ensures that inference on network properties is compatible with an actual network. Through extensive simulation studies, we also demonstrate that use of this method can improve estimates in settings where there is uncertainty that arises both from sampling and from systematic reporting bias compared with currently available approaches to estimation. To illustrate the method, we apply it to estimate network statistics using data from the Chicago Health and Social Life Survey. Copyright © 2017 John Wiley & Sons, Ltd.
Population forecasts for Bangladesh, using a Bayesian methodology.
Mahsin, Md; Hossain, Syed Shahadat
2012-12-01
Population projection for many developing countries can be quite a challenging task for demographers, mostly due to the lack of sufficient reliable data. The objective of this paper is to present an overview of the existing methods for population forecasting and to propose an alternative based on Bayesian statistics, which brings formal inference to the forecasting task. The analysis has been made using the Markov chain Monte Carlo (MCMC) technique for Bayesian methodology available with the software WinBUGS. Convergence diagnostic techniques available with the WinBUGS software have been applied to ensure the convergence of the chains necessary for the implementation of MCMC. The Bayesian approach allows for the use of observed data and expert judgements by means of appropriate priors, and more realistic population forecasts, along with associated uncertainty, have been possible.
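The WinBUGS model itself is not shown in the abstract, so the sketch below is only a generic random-walk Metropolis sampler of the kind that MCMC software automates, applied to a hypothetical one-parameter posterior (a growth rate with an assumed prior and likelihood). Convergence diagnostics, as stressed in the paper, would still be needed.

```python
import numpy as np

def metropolis(log_post, theta0, proposal_sd, n_iter=20_000, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional parameter.

    log_post: function returning the log posterior density up to a constant.
    Returns the array of sampled values; convergence should still be checked,
    e.g. with trace plots or multiple chains.
    """
    rng = np.random.default_rng(seed)
    samples = np.empty(n_iter)
    theta, lp = theta0, log_post(theta0)
    for i in range(n_iter):
        cand = theta + rng.normal(0.0, proposal_sd)
        lp_cand = log_post(cand)
        if np.log(rng.uniform()) < lp_cand - lp:
            theta, lp = cand, lp_cand
        samples[i] = theta
    return samples

# Hypothetical one-parameter posterior: a growth rate r with a Normal(0.015,
# 0.01) prior and a Gaussian likelihood centered on an observed rate of 0.012.
log_post = lambda r: (-0.5 * ((r - 0.015) / 0.01) ** 2
                      - 0.5 * ((0.012 - r) / 0.004) ** 2)
draws = metropolis(log_post, theta0=0.015, proposal_sd=0.005)
print(draws[5000:].mean())  # posterior mean after discarding burn-in
```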
Advanced statistics: linear regression, part I: simple linear regression.
Marill, Keith A
2004-01-01
Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small datasets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables.
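A minimal numerical counterpart to the method of least squares described above, using a small hypothetical clinical dataset (age vs. systolic blood pressure); inference testing, transformations and leverage are not shown.

```python
import numpy as np

def least_squares_line(x, y):
    """Fit y = b0 + b1 * x by ordinary least squares and return the
    intercept, the slope, and the coefficient of determination R^2."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    fitted = b0 + b1 * x
    r2 = 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
    return b0, b1, r2

# Hypothetical clinical example: systolic blood pressure (mmHg) vs. age (years).
age = [34, 41, 48, 55, 62, 69]
sbp = [118, 121, 127, 131, 138, 142]
print(least_squares_line(age, sbp))
```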
Three-dimensional analysis of magnetometer array data
NASA Technical Reports Server (NTRS)
Richmond, A. D.; Baumjohann, W.
1984-01-01
A technique is developed for mapping magnetic variation fields in three dimensions using data from an array of magnetometers, based on the theory of optimal linear estimation. The technique is applied to data from the Scandinavian Magnetometer Array. Estimates of the spatial power spectra for the internal and external magnetic variations are derived, which in turn provide estimates of the spatial autocorrelation functions of the three magnetic variation components. Statistical errors involved in mapping the external and internal fields are quantified and displayed over the mapping region. Examples of field mapping and of separation into external and internal components are presented. A comparison between the three-dimensional field separation and a two-dimensional separation from a single chain of stations shows that significant differences can arise in the inferred internal component.
NASA Astrophysics Data System (ADS)
Lee, K. David; Wiesenfeld, Eric; Gelfand, Andrew
2007-04-01
One of the greatest challenges in modern combat is maintaining a high level of timely Situational Awareness (SA). In many situations, computational complexity and accuracy considerations make the development and deployment of real-time, high-level inference tools very difficult. An innovative hybrid framework that combines Bayesian inference, in the form of Bayesian networks, and possibility theory, in the form of fuzzy logic systems, has recently been introduced to provide a rigorous framework for high-level inference. Previous research developed the theoretical basis and benefits of the hybrid approach. What is lacking, however, is a concrete experimental comparison of the hybrid framework with traditional fusion methods to demonstrate and quantify this benefit. The goal of this research, therefore, is to provide a statistical comparison of the accuracy and performance of hybrid network theory with pure Bayesian and fuzzy systems and with an inexact Bayesian system approximated using particle filtering. To accomplish this task, domain-specific models will be developed under these different theoretical approaches and then evaluated, via Monte Carlo simulation, against situational ground truth to measure accuracy and fidelity. Following this, a rigorous statistical analysis of the performance results will be performed to quantify the benefit of hybrid inference over the other fusion tools.
Statistically optimal perception and learning: from behavior to neural representations
Fiser, József; Berkes, Pietro; Orbán, Gergő; Lengyel, Máté
2010-01-01
Human perception has recently been characterized as statistical inference based on noisy and ambiguous sensory inputs. Moreover, suitable neural representations of uncertainty have been identified that could underlie such probabilistic computations. In this review, we argue that learning an internal model of the sensory environment is another key aspect of the same statistical inference procedure and thus perception and learning need to be treated jointly. We review evidence for statistically optimal learning in humans and animals, and reevaluate possible neural representations of uncertainty based on their potential to support statistically optimal learning. We propose that spontaneous activity can have a functional role in such representations leading to a new, sampling-based, framework of how the cortex represents information and uncertainty. PMID:20153683
On the analysis of very small samples of Gaussian repeated measurements: an alternative approach.
Westgate, Philip M; Burchett, Woodrow W
2017-03-15
The analysis of very small samples of Gaussian repeated measurements can be challenging. First, due to a very small number of independent subjects contributing outcomes over time, statistical power can be quite small. Second, nuisance covariance parameters must be appropriately accounted for in the analysis in order to maintain the nominal test size. However, available statistical strategies that ensure valid statistical inference may lack power, whereas more powerful methods may have the potential for inflated test sizes. Therefore, we explore an alternative approach to the analysis of very small samples of Gaussian repeated measurements, with the goal of maintaining valid inference while also improving statistical power relative to other valid methods. This approach uses generalized estimating equations with a bias-corrected empirical covariance matrix that accounts for all small-sample aspects of nuisance correlation parameter estimation in order to maintain valid inference. Furthermore, the approach utilizes correlation selection strategies with the goal of choosing the working structure that will result in the greatest power. In our study, we show that when accurate modeling of the nuisance correlation structure impacts the efficiency of regression parameter estimation, this method can improve power relative to existing methods that yield valid inference. Copyright © 2017 John Wiley & Sons, Ltd.
Statistical inference for noisy nonlinear ecological dynamic systems.
Wood, Simon N
2010-08-26
Chaotic ecological dynamic systems defy conventional statistical analysis. Systems with near-chaotic dynamics are little better. Such systems are almost invariably driven by endogenous dynamic processes plus demographic and environmental process noise, and are only observable with error. Their sensitivity to history means that minute changes in the driving noise realization, or the system parameters, will cause drastic changes in the system trajectory. This sensitivity is inherited and amplified by the joint probability density of the observable data and the process noise, rendering it useless as the basis for obtaining measures of statistical fit. Because the joint density is the basis for the fit measures used by all conventional statistical methods, this is a major theoretical shortcoming. The inability to make well-founded statistical inferences about biological dynamic models in the chaotic and near-chaotic regimes, other than on an ad hoc basis, leaves dynamic theory without the methods of quantitative validation that are essential tools in the rest of biological science. Here I show that this impasse can be resolved in a simple and general manner, using a method that requires only the ability to simulate the observed data on a system from the dynamic model about which inferences are required. The raw data series are reduced to phase-insensitive summary statistics, quantifying local dynamic structure and the distribution of observations. Simulation is used to obtain the mean and the covariance matrix of the statistics, given model parameters, allowing the construction of a 'synthetic likelihood' that assesses model fit. This likelihood can be explored using a straightforward Markov chain Monte Carlo sampler, but one further post-processing step returns pure likelihood-based inference. I apply the method to establish the dynamic nature of the fluctuations in Nicholson's classic blowfly experiments.
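A minimal sketch of the synthetic-likelihood construction described above: the model is simulated repeatedly at a candidate parameter value, each run is reduced to summary statistics, and the observed summaries are scored under a Gaussian fitted to the simulated ones. Here simulator and summarize are hypothetical user-supplied callables, and a small ridge term is added for numerical stability; the resulting value could then be explored with an MCMC sampler as the paper proposes.

```python
import numpy as np

def synthetic_log_likelihood(theta, observed_stats, simulator, summarize,
                             n_sim=200, seed=0):
    """Synthetic log-likelihood of parameters theta.

    The model is simulated n_sim times at theta, each run is reduced to the
    same summary statistics as the data, and the observed summaries are
    scored under a multivariate Gaussian fitted to the simulated summaries.
    A small ridge term keeps the covariance matrix invertible.
    """
    rng = np.random.default_rng(seed)
    sims = np.array([summarize(simulator(theta, rng)) for _ in range(n_sim)])
    mu = sims.mean(axis=0)
    cov = np.cov(sims, rowvar=False) + 1e-8 * np.eye(sims.shape[1])
    diff = np.asarray(observed_stats, dtype=float) - mu
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (quad + logdet + diff.size * np.log(2.0 * np.pi))
```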
Probabilistic Graphical Model Representation in Phylogenetics
Höhna, Sebastian; Heath, Tracy A.; Boussau, Bastien; Landis, Michael J.; Ronquist, Fredrik; Huelsenbeck, John P.
2014-01-01
Recent years have seen a rapid expansion of the model space explored in statistical phylogenetics, emphasizing the need for new approaches to statistical model representation and software development. Clear communication and representation of the chosen model is crucial for: (i) reproducibility of an analysis, (ii) model development, and (iii) software design. Moreover, a unified, clear and understandable framework for model representation lowers the barrier for beginners and nonspecialists to grasp complex phylogenetic models, including their assumptions and parameter/variable dependencies. Graphical modeling is a unifying framework that has gained in popularity in the statistical literature in recent years. The core idea is to break complex models into conditionally independent distributions. The strength lies in the comprehensibility, flexibility, and adaptability of this formalism, and the large body of computational work based on it. Graphical models are well-suited to teach statistical models, to facilitate communication among phylogeneticists and in the development of generic software for simulation and statistical inference. Here, we provide an introduction to graphical models for phylogeneticists and extend the standard graphical model representation to the realm of phylogenetics. We introduce a new graphical model component, tree plates, to capture the changing structure of the subgraph corresponding to a phylogenetic tree. We describe a range of phylogenetic models using the graphical model framework and introduce modules to simplify the representation of standard components in large and complex models. Phylogenetic model graphs can be readily used in simulation, maximum likelihood inference, and Bayesian inference using, for example, Metropolis–Hastings or Gibbs sampling of the posterior distribution. [Computation; graphical models; inference; modularization; statistical phylogenetics; tree plate.] PMID:24951559
Protein and gene model inference based on statistical modeling in k-partite graphs.
Gerster, Sarah; Qeli, Ermir; Ahrens, Christian H; Bühlmann, Peter
2010-07-06
One of the major goals of proteomics is the comprehensive and accurate description of a proteome. Shotgun proteomics, the method of choice for the analysis of complex protein mixtures, requires that experimentally observed peptides are mapped back to the proteins they were derived from. This process is also known as protein inference. We present Markovian Inference of Proteins and Gene Models (MIPGEM), a statistical model based on clearly stated assumptions to address the problem of protein and gene model inference for shotgun proteomics data. In particular, we are dealing with dependencies among peptides and proteins using a Markovian assumption on k-partite graphs. We are also addressing the problems of shared peptides and ambiguous proteins by scoring the encoding gene models. Empirical results on two control datasets with synthetic mixtures of proteins and on complex protein samples of Saccharomyces cerevisiae, Drosophila melanogaster, and Arabidopsis thaliana suggest that the results with MIPGEM are competitive with existing tools for protein inference.
Meyer, Patrick E; Lafitte, Frédéric; Bontempi, Gianluca
2008-10-29
This paper presents the R/Bioconductor package minet (version 1.1.6), which provides a set of functions to infer mutual information networks from a dataset. Once fed with a microarray dataset, the package returns a network where nodes denote genes, edges model statistical dependencies between genes, and the weight of an edge quantifies the statistical evidence of a specific (e.g. transcriptional) gene-to-gene interaction. Four different entropy estimators are made available in the package minet (empirical, Miller-Madow, Schurmann-Grassberger and shrink) as well as four different inference methods, namely relevance networks, ARACNE, CLR and MRNET. Also, the package integrates accuracy assessment tools, like F-scores, PR-curves and ROC-curves, in order to compare the inferred network with a reference one. The package minet provides a series of tools for inferring transcriptional networks from microarray data. It is freely available from the Comprehensive R Archive Network (CRAN) as well as from the Bioconductor website.
Applying dynamic Bayesian networks to perturbed gene expression data.
Dojer, Norbert; Gambin, Anna; Mizera, Andrzej; Wilczyński, Bartek; Tiuryn, Jerzy
2006-05-08
A central goal of molecular biology is to understand the regulatory mechanisms of gene transcription and protein synthesis. Because of their solid basis in statistics, which allows the stochastic aspects of gene expression and noisy measurements to be handled in a natural way, Bayesian networks appear attractive for inferring gene interaction structure from microarray experiment data. However, the basic formalism has some disadvantages, e.g. it is sometimes hard to distinguish between the origin and the target of an interaction. Two kinds of microarray experiments yield data particularly rich in information regarding the direction of interactions: time series and perturbation experiments. In order to handle them correctly, the basic formalism must be modified. For example, dynamic Bayesian networks (DBN) apply to time series microarray data. To our knowledge, the DBN technique has not been applied in the context of perturbation experiments. We extend the framework of dynamic Bayesian networks in order to incorporate perturbations. Moreover, an exact algorithm for inferring an optimal network is proposed, and a discretization method specialized for time series data from perturbation experiments is introduced. We apply our procedure to realistic simulated data. The results are compared with those obtained by standard DBN learning techniques. Moreover, the advantages of using an exact learning algorithm instead of heuristic methods are analyzed. We show that the quality of inferred networks dramatically improves when using data from perturbation experiments. We also conclude that the exact algorithm should be used when possible, i.e. when the considered set of genes is small enough.
Cardiovascular oscillations: in search of a nonlinear parametric model
NASA Astrophysics Data System (ADS)
Bandrivskyy, Andriy; Luchinsky, Dmitry; McClintock, Peter V.; Smelyanskiy, Vadim; Stefanovska, Aneta; Timucin, Dogan
2003-05-01
We suggest a fresh approach to the modeling of the human cardiovascular system. Taking advantage of a new Bayesian inference technique able to deal with stochastic nonlinear systems, we show that one can estimate parameters for models of the cardiovascular system directly from measured time series. We present preliminary results of inferring parameters of a model of coupled oscillators from measured cardiovascular data, addressing cardiorespiratory interaction. We argue that the inference technique offers a very promising tool for such modeling and can contribute significantly towards the solution of a long-standing challenge: the development of new diagnostic techniques based on noninvasive measurements.
Computational statistics using the Bayesian Inference Engine
NASA Astrophysics Data System (ADS)
Weinberg, Martin D.
2013-09-01
This paper introduces the Bayesian Inference Engine (BIE), a general parallel, optimized software package for parameter inference and model selection. This package is motivated by the analysis needs of modern astronomical surveys and the need to organize and reuse expensive derived data. The BIE is the first platform for computational statistics designed explicitly to enable Bayesian update and model comparison for astronomical problems. Bayesian update is based on the representation of high-dimensional posterior distributions using metric-ball-tree based kernel density estimation. Among its algorithmic offerings, the BIE emphasizes hybrid tempered Markov chain Monte Carlo schemes that robustly sample multimodal posterior distributions in high-dimensional parameter spaces. Moreover, the BIE implements a full persistence or serialization system that stores the full byte-level image of the running inference and previously characterized posterior distributions for later use. Two new algorithms to compute the marginal likelihood from the posterior distribution, developed for and implemented in the BIE, enable model comparison for complex models and data sets. Finally, the BIE was designed to be a collaborative platform for applying Bayesian methodology to astronomy. It includes an extensible object-oriented and easily extended framework that implements every aspect of the Bayesian inference. By providing a variety of statistical algorithms for all phases of the inference problem, a scientist may explore a variety of approaches with a single model and data implementation. Additional technical details and download details are available from http://www.astro.umass.edu/bie. The BIE is distributed under the GNU General Public License.
Hawe, David; Hernández Fernández, Francisco R; O'Suilleabháin, Liam; Huang, Jian; Wolsztynski, Eric; O'Sullivan, Finbarr
2012-05-01
In dynamic mode, positron emission tomography (PET) can be used to track the evolution of injected radio-labelled molecules in living tissue. This is a powerful diagnostic imaging technique that provides a unique opportunity to probe the status of healthy and pathological tissue by examining how it processes substrates. The spatial aspect of PET is well established in the computational statistics literature. This article focuses on its temporal aspect. The interpretation of PET time-course data is complicated because the measured signal is a combination of vascular delivery and tissue retention effects. If the arterial time-course is known, the tissue time-course can typically be expressed in terms of a linear convolution between the arterial time-course and the tissue residue. In statistical terms, the residue function is essentially a survival function - a familiar life-time data construct. Kinetic analysis of PET data is concerned with estimation of the residue and associated functionals such as flow, flux, volume of distribution and transit time summaries. This review emphasises a nonparametric approach to the estimation of the residue based on a piecewise linear form. Rapid implementation of this by quadratic programming is described. The approach provides a reference for statistical assessment of widely used one- and two-compartmental model forms. We illustrate the method with data from two of the most well-established PET radiotracers, (15)O-H(2)O and (18)F-fluorodeoxyglucose, used for assessment of blood perfusion and glucose metabolism respectively. The presentation illustrates the use of two open-source tools, AMIDE and R, for PET scan manipulation and model inference.
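A minimal numerical version of this estimation step can be sketched as follows: discretize the convolution between the arterial input and the residue, enforce that the residue is nonnegative and nonincreasing by reparameterizing it through nonnegative increments, and solve the resulting constrained least-squares problem. The paper uses a piecewise linear residue fitted by quadratic programming; the sketch below substitutes a plain nonnegative least-squares solver and a synthetic arterial curve, so the time grid, time constants and solver choice are assumptions made for the example.

```python
import numpy as np
from scipy.optimize import nnls

def estimate_residue(t, ca, ct):
    """Estimate a nonnegative, nonincreasing residue R on the grid t such that
    ct is approximately the discrete convolution of ca and R, via nonnegative
    least squares on the increments of R."""
    dt = t[1] - t[0]                       # assume a uniform time grid
    n = len(t)
    # Discrete convolution matrix: (A @ r)[i] = sum_j ca[i - j] * r[j] * dt
    A = np.zeros((n, n))
    for i in range(n):
        A[i, : i + 1] = ca[i::-1] * dt
    # Monotone reparameterization: R = L @ d with d >= 0 (R_k = sum_{j >= k} d_j)
    L = np.triu(np.ones((n, n)))
    d, _ = nnls(A @ L, ct)
    return L @ d

# Toy illustration with a synthetic arterial input and a known residue.
t = np.linspace(0, 10, 101)
ca = t * np.exp(-t)                        # synthetic arterial time-course
true_R = np.exp(-0.5 * t)                  # true (exponential) residue
ct = np.convolve(ca, true_R)[: len(t)] * (t[1] - t[0])
R_hat = estimate_residue(t, ca, ct)
print(np.max(np.abs(R_hat - true_R)))      # compare recovered and true residue
```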
Teach a Confidence Interval for the Median in the First Statistics Course
ERIC Educational Resources Information Center
Howington, Eric B.
2017-01-01
Few introductory statistics courses consider statistical inference for the median. This article argues in favour of adding a confidence interval for the median to the first statistics course. Several methods suitable for introductory statistics students are identified and briefly reviewed.
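One method commonly suggested for an introductory course is the distribution-free interval based on order statistics, whose exact coverage follows from the binomial(n, 1/2) distribution of the number of observations below the median. A sketch with an invented exponential sample is below; it is offered as a generic illustration rather than as the specific procedures reviewed in the article.

```python
import numpy as np
from scipy.stats import binom

def median_ci(x, conf=0.95):
    """Distribution-free CI for the median: the interval between the (l+1)-th
    and (n-l)-th order statistics has exact coverage given by binomial(n, 1/2)
    probabilities, since each observation falls below the median with prob 1/2."""
    x = np.sort(np.asarray(x))
    n = len(x)
    best = None
    for l in range(n // 2):
        coverage = binom.cdf(n - l - 1, n, 0.5) - binom.cdf(l, n, 0.5)
        if coverage < conf:
            break
        best = (x[l], x[n - 1 - l], coverage)   # narrowest interval so far
    if best is None:                            # sample too small for this level
        best = (x[0], x[-1], binom.cdf(n - 1, n, 0.5) - binom.cdf(0, n, 0.5))
    return best

rng = np.random.default_rng(1)
sample = rng.exponential(scale=2.0, size=30)
lo, hi, cov = median_ci(sample)
print(f"95% CI for the median: ({lo:.2f}, {hi:.2f}), exact coverage {cov:.3f}")
```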
Statistics, Adjusted Statistics, and Maladjusted Statistics.
Kaufman, Jay S
2017-05-01
Statistical adjustment is a ubiquitous practice in all quantitative fields that is meant to correct for improprieties or limitations in observed data, to remove the influence of nuisance variables or to turn observed correlations into causal inferences. These adjustments proceed by reporting not what was observed in the real world, but instead modeling what would have been observed in an imaginary world in which specific nuisances and improprieties are absent. These techniques are powerful and useful inferential tools, but their application can be hazardous or deleterious if consumers of the adjusted results mistake the imaginary world of models for the real world of data. Adjustments require decisions about which factors are of primary interest and which are imagined away, and yet many adjusted results are presented without any explanation or justification for these decisions. Adjustments can be harmful if poorly motivated, and are frequently misinterpreted in the media's reporting of scientific studies. Adjustment procedures have become so routinized that many scientists and readers lose the habit of relating the reported findings back to the real world in which we live.
Bayesian Orbit Computation Tools for Objects on Geocentric Orbits
NASA Astrophysics Data System (ADS)
Virtanen, J.; Granvik, M.; Muinonen, K.; Oszkiewicz, D.
2013-08-01
We consider the space-debris orbital inversion problem via the concept of Bayesian inference. The methodology was put forward for the orbital analysis of solar system small bodies in the early 1990s [7] and results in a full solution of the statistical inverse problem given in terms of an a posteriori probability density function (PDF) for the orbital parameters. We demonstrate the applicability of our statistical orbital analysis software to Earth-orbiting objects, both using well-established Monte Carlo (MC) techniques (for a review, see e.g. [13]) and recently developed Markov-chain MC (MCMC) techniques (e.g., [9]). In particular, we exploit the novel virtual observation MCMC method [8], which is based on the characterization of the phase-space volume of orbital solutions before the actual MCMC sampling. Our statistical methods and the resulting PDFs immediately enable probabilistic impact predictions to be carried out. Furthermore, this can readily be done also for very sparse data sets and data sets of poor quality, provided that some a priori information on the observational uncertainty is available. For asteroids, impact probabilities with the Earth from the discovery night onwards have been provided, e.g., by [11] and [10]; the latter study includes the sampling of the observational-error standard deviation as a random variable.
Yue, Lilly Q
2012-01-01
In the evaluation of medical products, including drugs, biological products, and medical devices, comparative observational studies can play an important role when properly conducted randomized, well-controlled clinical trials are infeasible for ethical or practical reasons. However, various biases can be introduced at every stage and into every aspect of an observational study, and consequently the interpretation of the resulting statistical inference is of concern. While statistical techniques do exist for addressing some of these challenging issues, often based on propensity score methodology, these tools have probably not been as widely employed in prospectively designing observational studies as they should be. They are also sometimes implemented in an unscientific manner, such as selecting the propensity score model using outcome data from the same dataset, so that the integrity of the observational study design and the interpretability of the outcome analysis results may be compromised. In this paper, regulatory considerations on prospective study design using propensity scores are shared and illustrated with hypothetical examples.
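As a concrete illustration of the design principle emphasized above (estimate and use propensity scores without touching outcome data), here is a minimal sketch on simulated data; the variable names, simulated effect size, and the choice of 1-to-1 nearest-neighbour matching are all assumptions made for the example, not part of the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(42)

# Simulated observational data: covariates X, non-random treatment T, outcome Y.
n = 2000
X = rng.normal(size=(n, 3))
p_treat = 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))
T = rng.binomial(1, p_treat)
Y = 1.0 * T + X[:, 0] + 0.5 * X[:, 2] + rng.normal(size=n)   # true effect = 1

# Step 1: estimate propensity scores from covariates only (no outcome data),
# mirroring the design-stage principle discussed in the abstract.
ps = LogisticRegression().fit(X, T).predict_proba(X)[:, 1]

# Step 2: 1-to-1 nearest-neighbour matching of treated to control units on the
# estimated propensity score (with replacement).
treated, control = np.where(T == 1)[0], np.where(T == 0)[0]
nn = NearestNeighbors(n_neighbors=1).fit(ps[control].reshape(-1, 1))
_, idx = nn.kneighbors(ps[treated].reshape(-1, 1))
matched_control = control[idx.ravel()]

# Step 3: only now look at outcomes to estimate the treatment effect.
att = np.mean(Y[treated] - Y[matched_control])
print(f"Matched estimate of the treatment effect: {att:.2f} (true value 1.0)")
```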
From Wald to Savage: homo economicus becomes a Bayesian statistician.
Giocoli, Nicola
2013-01-01
Bayesian rationality is the paradigm of rational behavior in neoclassical economics. An economic agent is deemed rational when she maximizes her subjective expected utility and consistently revises her beliefs according to Bayes's rule. The paper raises the question of how, when and why this characterization of rationality came to be endorsed by mainstream economists. Though no definitive answer is provided, it is argued that the question is of great historiographic importance. The story begins with Abraham Wald's behaviorist approach to statistics and culminates with Leonard J. Savage's elaboration of subjective expected utility theory in his 1954 classic The Foundations of Statistics. Savage's acknowledged failure to achieve a reinterpretation of traditional inference techniques along subjectivist and behaviorist lines raises the puzzle of how a failed project in statistics could turn into such a big success in economics. Possible answers call into play the emphasis on consistency requirements in neoclassical theory and the impact of the postwar transformation of U.S. business schools. © 2012 Wiley Periodicals, Inc.
On the effect of subliminal priming on subjective perception of images: a machine learning approach.
Kumar, Parmod; Mahmood, Faisal; Mohan, Dhanya Menoth; Wong, Ken; Agrawal, Abhishek; Elgendi, Mohamed; Shukla, Rohit; Dauwels, Justin; Chan, Alice H D
2014-01-01
The research presented in this article investigates the influence of subliminal prime words on people's judgments of images, through electroencephalograms (EEGs). In this cross-domain priming paradigm, the participants are asked to rate how much they like the stimulus images, on a 7-point Likert scale, after being subliminally exposed to masked lexical prime words, with EEG recorded simultaneously. Statistical analysis tools are used to analyze the effect of priming on behavior, and machine learning techniques to infer the primes from the EEGs. The experiment reveals strong effects of subliminal priming on the participants' explicit ratings of images. The priming-induced shift in subjective judgment produces a visible change in event-related potentials (ERPs); the results show larger ERP amplitudes for negative primes compared with positive and neutral primes. In addition, Support Vector Machine (SVM) based classifiers are proposed to infer the prime types from the average ERPs, yielding a classification rate of 70%.
The logical foundations of forensic science: towards reliable knowledge
Evett, Ian
2015-01-01
The generation of observations is a technical process and the advances that have been made in forensic science techniques over the last 50 years have been staggering. But science is about reasoning—about making sense from observations. For the forensic scientist, this is the challenge of interpreting a pattern of observations within the context of a legal trial. Here too, there have been major advances over recent years and there is a broad consensus among serious thinkers, both scientific and legal, that the logical framework is furnished by Bayesian inference (Aitken et al. Fundamentals of Probability and Statistical Evidence in Criminal Proceedings). This paper shows how the paradigm has matured, centred on the notion of the balanced scientist. Progress through the courts has not been always smooth and difficulties arising from recent judgments are discussed. Nevertheless, the future holds exciting prospects, in particular the opportunities for managing and calibrating the knowledge of the forensic scientists who assign the probabilities that are at the foundation of logical inference in the courtroom. PMID:26101288
Probability, statistics, and computational science.
Beerenwinkel, Niko; Siebourg, Juliane
2012-01-01
In this chapter, we review basic concepts from probability theory and computational statistics that are fundamental to evolutionary genomics. We provide a very basic introduction to statistical modeling and discuss general principles, including maximum likelihood and Bayesian inference. Markov chains, hidden Markov models, and Bayesian network models are introduced in more detail as they occur frequently and in many variations in genomics applications. In particular, we discuss efficient inference algorithms and methods for learning these models from partially observed data. Several simple examples are given throughout the text, some of which point to models that are discussed in more detail in subsequent chapters.
A Not-So-Fundamental Limitation on Studying Complex Systems with Statistics: Comment on Rabin (2011)
NASA Astrophysics Data System (ADS)
Thomas, Drew M.
2012-12-01
Although living organisms are affected by many interrelated and unidentified variables, this complexity does not automatically impose a fundamental limitation on statistical inference. Nor need one invoke such complexity as an explanation of the "Truth Wears Off" or "decline" effect; similar "decline" effects occur with far simpler systems studied in physics. Selective reporting and publication bias, and scientists' biases in favor of reporting eye-catching results (in general) or conforming to others' results (in physics) better explain this feature of the "Truth Wears Off" effect than Rabin's suggested limitation on statistical inference.
NASA Astrophysics Data System (ADS)
Yang, Jing; Reichert, Peter; Abbaspour, Karim C.; Yang, Hong
2007-07-01
Calibration of hydrologic models is very difficult because of measurement errors in input and response, errors in model structure, and the large number of non-identifiable parameters of distributed models. The difficulties even increase in arid regions with high seasonal variation of precipitation, where the modelled residuals often exhibit high heteroscedasticity and autocorrelation. On the other hand, support of water management by hydrologic models is important in arid regions, particularly if there is increasing water demand due to urbanization. The use and assessment of model results for this purpose require a careful calibration and uncertainty analysis. Extending earlier work in this field, we developed a procedure to overcome (i) the problem of non-identifiability of distributed parameters by introducing aggregate parameters and using Bayesian inference, (ii) the problem of heteroscedasticity of errors by combining a Box-Cox transformation of results and data with seasonally dependent error variances, (iii) the problems of autocorrelated errors, missing data and outlier omission with a continuous-time autoregressive error model, and (iv) the problem of the seasonal variation of error correlations with seasonally dependent characteristic correlation times. The technique was tested with the calibration of the hydrologic sub-model of the Soil and Water Assessment Tool (SWAT) in the Chaohe Basin in North China. The results demonstrated the good performance of this approach to uncertainty analysis, particularly with respect to the fulfilment of statistical assumptions of the error model. A comparison with an independent error model and with error models that only considered a subset of the suggested techniques clearly showed the superiority of the approach based on all the features (i)-(iv) mentioned above.
NASA Technical Reports Server (NTRS)
Bonavito, N. L.; Gordon, C. L.; Inguva, R.; Serafino, G. N.; Barnes, R. A.
1994-01-01
NASA's Mission to Planet Earth (MTPE) will address important interdisciplinary and environmental issues such as global warming, ozone depletion, deforestation, acid rain, and the like with its long term satellite observations of the Earth and with its comprehensive Data and Information System. Extensive sets of satellite observations supporting MTPE will be provided by the Earth Observing System (EOS), while more specific process related observations will be provided by smaller Earth Probes. MTPE will use data from ground and airborne scientific investigations to supplement and validate the global observations obtained from satellite imagery, while the EOS satellites will support interdisciplinary research and model development. This is important for understanding the processes that control the global environment and for improving the prediction of events. In this paper we illustrate the potential for powerful artificial intelligence (AI) techniques when used in the analysis of the formidable problems that exist in the NASA Earth Science programs and of those to be encountered in the future MTPE and EOS programs. These techniques, based on the logical and probabilistic reasoning aspects of plausible inference, strongly emphasize the synergetic relation between data and information. As such, they are ideally suited for the analysis of the massive data streams to be provided by both MTPE and EOS. To demonstrate this, we address both the satellite imagery and model enhancement issues for the problem of ozone profile retrieval through a method based on plausible scientific inferencing. Since in the retrieval problem, the atmospheric ozone profile that is consistent with a given set of measured radiances may not be unique, an optimum statistical method is used to estimate a 'best' profile solution from the radiances and from additional a priori information.
Wavelet-Bayesian inference of cosmic strings embedded in the cosmic microwave background
NASA Astrophysics Data System (ADS)
McEwen, J. D.; Feeney, S. M.; Peiris, H. V.; Wiaux, Y.; Ringeval, C.; Bouchet, F. R.
2017-12-01
Cosmic strings are a well-motivated extension to the standard cosmological model and could induce a subdominant component in the anisotropies of the cosmic microwave background (CMB), in addition to the standard inflationary component. The detection of strings, while observationally challenging, would provide a direct probe of physics at very high-energy scales. We develop a framework for cosmic string inference from observations of the CMB made over the celestial sphere, performing a Bayesian analysis in wavelet space where the string-induced CMB component has distinct statistical properties to the standard inflationary component. Our wavelet-Bayesian framework provides a principled approach to compute the posterior distribution of the string tension Gμ and the Bayesian evidence ratio comparing the string model to the standard inflationary model. Furthermore, we present a technique to recover an estimate of any string-induced CMB map embedded in observational data. Using Planck-like simulations, we demonstrate the application of our framework and evaluate its performance. The method is sensitive to Gμ ∼ 5 × 10^-7 for Nambu-Goto string simulations that include an integrated Sachs-Wolfe contribution only and do not include any recombination effects, before any parameters of the analysis are optimized. The sensitivity of the method compares favourably with other techniques applied to the same simulations.
Benzy, V K; Jasmin, E A; Koshy, Rachel Cherian; Amal, Frank; Indiradevi, K P
2018-01-01
The advancement in medical research and intelligent modeling techniques has led to developments in anaesthesia management. The present study is targeted at estimating the depth of anaesthesia using cognitive signal processing and intelligent modeling techniques. The neurophysiological signal that reflects the cognitive state under anaesthetic drugs is the electroencephalogram signal. The information available in electroencephalogram signals during anaesthesia is captured by extracting relative wave energy features from the anaesthetic electroencephalogram signals. The discrete wavelet transform is used to decompose the electroencephalogram signals into four levels, and relative wave energy is then computed from the approximation and detail coefficients of the sub-band signals. Relative wave energy is extracted to find out the degree of importance of different electroencephalogram frequency bands associated with the different anaesthetic phases: awake, induction, maintenance and recovery. The Kruskal-Wallis statistical test is applied to the relative wave energy features to check their capability to discriminate between awake, light anaesthesia, moderate anaesthesia and deep anaesthesia. A novel depth of anaesthesia index is generated by implementing an adaptive neuro-fuzzy inference system with fuzzy c-means clustering, which uses the relative wave energy features as inputs. Finally, the generated depth of anaesthesia index is compared with a commercially available depth of anaesthesia monitor, the Bispectral index.
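For illustration, relative wave energy from a four-level discrete wavelet decomposition can be computed along the following lines. This is a generic sketch: the wavelet family, sampling rate and synthetic signal are assumptions for the example, and PyWavelets is used rather than whatever toolchain the authors employed.

```python
import numpy as np
import pywt

def relative_wave_energy(epoch, wavelet="db4", level=4):
    """Four-level discrete wavelet decomposition of an EEG epoch, followed by
    the relative energy of each sub-band (approximation + detail coefficients)."""
    coeffs = pywt.wavedec(epoch, wavelet, level=level)   # [cA4, cD4, cD3, cD2, cD1]
    energies = np.array([np.sum(c ** 2) for c in coeffs])
    return energies / energies.sum()

# Toy epoch: 4 s of synthetic "EEG" sampled at 128 Hz (two rhythms plus noise).
t = np.arange(0, 4, 1 / 128)
eeg = (np.sin(2 * np.pi * 10 * t)            # alpha-like component
       + 0.5 * np.sin(2 * np.pi * 2 * t)     # delta-like component
       + 0.3 * np.random.default_rng(0).normal(size=t.size))
for name, e in zip(["A4", "D4", "D3", "D2", "D1"], relative_wave_energy(eeg)):
    print(f"{name}: {e:.3f}")
```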
Decision trees in epidemiological research.
Venkatasubramaniam, Ashwini; Wolfson, Julian; Mitchell, Nathan; Barnes, Timothy; JaKa, Meghan; French, Simone
2017-01-01
In many studies, it is of interest to identify population subgroups that are relatively homogeneous with respect to an outcome. The nature of these subgroups can provide insight into effect mechanisms and suggest targets for tailored interventions. However, identifying relevant subgroups can be challenging with standard statistical methods. We review the literature on decision trees, a family of techniques for partitioning the population, on the basis of covariates, into distinct subgroups who share similar values of an outcome variable. We compare two decision tree methods, the popular Classification and Regression tree (CART) technique and the newer Conditional Inference tree (CTree) technique, assessing their performance in a simulation study and using data from the Box Lunch Study, a randomized controlled trial of a portion size intervention. Both CART and CTree identify homogeneous population subgroups and offer improved prediction accuracy relative to regression-based approaches when subgroups are truly present in the data. An important distinction between CART and CTree is that the latter uses a formal statistical hypothesis testing framework in building decision trees, which simplifies the process of identifying and interpreting the final tree model. We also introduce a novel way to visualize the subgroups defined by decision trees. Our novel graphical visualization provides a more scientifically meaningful characterization of the subgroups identified by decision trees. Decision trees are a useful tool for identifying homogeneous subgroups defined by combinations of individual characteristics. While all decision tree techniques generate subgroups, we advocate the use of the newer CTree technique due to its simplicity and ease of interpretation.
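As a quick illustration of the CART side of the comparison, the sketch below grows a small regression tree on simulated data containing a genuine subgroup; scikit-learn's DecisionTreeRegressor is a CART-style learner, while CTree's hypothesis-testing splits are available in the R package partykit rather than in Python. The simulated covariates, subgroup definition and tuning values are assumptions for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Simulated epidemiological-style data with one true subgroup:
# elevated outcome only for older participants with high baseline intake.
n = 1000
age = rng.uniform(18, 80, n)
intake = rng.uniform(0, 10, n)
outcome = 2.0 * ((age > 50) & (intake > 5)) + rng.normal(0, 0.5, n)
X = np.column_stack([age, intake])

X_tr, X_te, y_tr, y_te = train_test_split(X, outcome, random_state=0)

# CART-style tree: greedy splits with complexity controlled by max_depth and
# min_samples_leaf (rather than CTree's formal hypothesis tests).
cart = DecisionTreeRegressor(max_depth=3, min_samples_leaf=50).fit(X_tr, y_tr)
print(export_text(cart, feature_names=["age", "intake"]))
print("Held-out R^2:", round(cart.score(X_te, y_te), 3))
```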
NASA Astrophysics Data System (ADS)
Shafii, M.; Tolson, B.; Matott, L. S.
2012-04-01
Hydrologic modeling has benefited from significant developments over the past two decades. This has resulted in building of higher levels of complexity into hydrologic models, which eventually makes the model evaluation process (parameter estimation via calibration and uncertainty analysis) more challenging. In order to avoid unreasonable parameter estimates, many researchers have suggested implementation of multi-criteria calibration schemes. Furthermore, for predictive hydrologic models to be useful, proper consideration of uncertainty is essential. Consequently, recent research has emphasized comprehensive model assessment procedures in which multi-criteria parameter estimation is combined with statistically-based uncertainty analysis routines such as Bayesian inference using Markov Chain Monte Carlo (MCMC) sampling. Such a procedure relies on the use of formal likelihood functions based on statistical assumptions, and moreover, the Bayesian inference structured on MCMC samplers requires a considerably large number of simulations. Due to these issues, especially in complex non-linear hydrological models, a variety of alternative informal approaches have been proposed for uncertainty analysis in the multi-criteria context. This study aims at exploring a number of such informal uncertainty analysis techniques in multi-criteria calibration of hydrological models. The informal methods addressed in this study are (i) Pareto optimality which quantifies the parameter uncertainty using the Pareto solutions, (ii) DDS-AU which uses the weighted sum of objective functions to derive the prediction limits, and (iii) GLUE which describes the total uncertainty through identification of behavioral solutions. The main objective is to compare such methods with MCMC-based Bayesian inference with respect to factors such as computational burden, and predictive capacity, which are evaluated based on multiple comparative measures. The measures for comparison are calculated both for calibration and evaluation periods. The uncertainty analysis methodologies are applied to a simple 5-parameter rainfall-runoff model, called HYMOD.
Confidence set inference with a prior quadratic bound
NASA Technical Reports Server (NTRS)
Backus, George E.
1988-01-01
In the uniqueness part of a geophysical inverse problem, the observer wants to predict all likely values of P unknown numerical properties z = (z_1,...,z_P) of the earth from measurement of D other numerical properties y(0) = (y_1(0),...,y_D(0)) and knowledge of the statistical distribution of the random errors in y(0). The data space Y containing y(0) is D-dimensional, so when the model space X is infinite-dimensional the linear uniqueness problem usually is insoluble without prior information about the correct earth model x. If that information is a quadratic bound on x (e.g., energy or dissipation rate), Bayesian inference (BI) and stochastic inversion (SI) inject spurious structure into x, implied by neither the data nor the quadratic bound. Confidence set inference (CSI) provides an alternative inversion technique free of this objection. CSI is illustrated in the problem of estimating the geomagnetic field B at the core-mantle boundary (CMB) from components of B measured on or above the earth's surface. Neither the heat flow nor the energy bound is strong enough to permit estimation of B(r) at single points on the CMB, but the heat flow bound permits estimation of uniform averages of B(r) over discs on the CMB, and both bounds permit weighted disc-averages with continuous weighting kernels. Both bounds also permit estimation of low-degree Gauss coefficients at the CMB. The heat flow bound resolves them up to degree 8 if the crustal field at satellite altitudes must be treated as a systematic error, but can resolve to degree 11 under the most favorable statistical treatment of the crust. These two limits produce circles of confusion on the CMB with diameters of 25 deg and 19 deg respectively.
Applications of spatial statistical network models to stream data
Isaak, Daniel J.; Peterson, Erin E.; Ver Hoef, Jay M.; Wenger, Seth J.; Falke, Jeffrey A.; Torgersen, Christian E.; Sowder, Colin; Steel, E. Ashley; Fortin, Marie-Josée; Jordan, Chris E.; Ruesch, Aaron S.; Som, Nicholas; Monestiez, Pascal
2014-01-01
Streams and rivers host a significant portion of Earth's biodiversity and provide important ecosystem services for human populations. Accurate information regarding the status and trends of stream resources is vital for their effective conservation and management. Most statistical techniques applied to data measured on stream networks were developed for terrestrial applications and are not optimized for streams. A new class of spatial statistical model, based on valid covariance structures for stream networks, can be used with many common types of stream data (e.g., water quality attributes, habitat conditions, biological surveys) through application of appropriate distributions (e.g., Gaussian, binomial, Poisson). The spatial statistical network models account for spatial autocorrelation (i.e., nonindependence) among measurements, which allows their application to databases with clustered measurement locations. Large amounts of stream data exist in many areas where spatial statistical analyses could be used to develop novel insights, improve predictions at unsampled sites, and aid in the design of efficient monitoring strategies at relatively low cost. We review the topic of spatial autocorrelation and its effects on statistical inference, demonstrate the use of spatial statistics with stream datasets relevant to common research and management questions, and discuss additional applications and development potential for spatial statistics on stream networks. Free software for implementing the spatial statistical network models has been developed that enables custom applications with many stream databases.
NASA Astrophysics Data System (ADS)
Hadwin, Paul J.; Sipkens, T. A.; Thomson, K. A.; Liu, F.; Daun, K. J.
2016-01-01
Auto-correlated laser-induced incandescence (AC-LII) infers the soot volume fraction (SVF) of soot particles by comparing the spectral incandescence from laser-energized particles to the pyrometrically inferred peak soot temperature. This calculation requires detailed knowledge of model parameters such as the absorption function of soot, which may vary with combustion chemistry, soot age, and the internal structure of the soot. This work presents a Bayesian methodology to quantify such uncertainties. This technique treats the additional "nuisance" model parameters, including the soot absorption function, as stochastic variables and incorporates the current state of knowledge of these parameters into the inference process through maximum entropy priors. While standard AC-LII analysis provides a point estimate of the SVF, Bayesian techniques infer the posterior probability density, which will allow scientists and engineers to better assess the reliability of AC-LII inferred SVFs in the context of environmental regulations and competing diagnostics.
Bayesian Inference: with ecological applications
Link, William A.; Barker, Richard J.
2010-01-01
This text provides a mathematically rigorous yet accessible and engaging introduction to Bayesian inference with relevant examples that will be of interest to biologists working in the fields of ecology, wildlife management and environmental studies, as well as students in advanced undergraduate statistics. This text opens the door to Bayesian inference, taking advantage of modern computational efficiencies and easily accessible software to evaluate complex hierarchical models.
Extracting neuronal functional network dynamics via adaptive Granger causality analysis.
Sheikhattar, Alireza; Miran, Sina; Liu, Ji; Fritz, Jonathan B; Shamma, Shihab A; Kanold, Patrick O; Babadi, Behtash
2018-04-24
Quantifying the functional relations between the nodes in a network based on local observations is a key challenge in studying complex systems. Most existing time series analysis techniques for this purpose provide static estimates of the network properties, pertain to stationary Gaussian data, or do not take into account the ubiquitous sparsity in the underlying functional networks. When applied to spike recordings from neuronal ensembles undergoing rapid task-dependent dynamics, they thus hinder a precise statistical characterization of the dynamic neuronal functional networks underlying adaptive behavior. We develop a dynamic estimation and inference paradigm for extracting functional neuronal network dynamics in the sense of Granger, by integrating techniques from adaptive filtering, compressed sensing, point process theory, and high-dimensional statistics. We demonstrate the utility of our proposed paradigm through theoretical analysis, algorithm development, and application to synthetic and real data. Application of our techniques to two-photon Ca 2+ imaging experiments from the mouse auditory cortex reveals unique features of the functional neuronal network structures underlying spontaneous activity at unprecedented spatiotemporal resolution. Our analysis of simultaneous recordings from the ferret auditory and prefrontal cortical areas suggests evidence for the role of rapid top-down and bottom-up functional dynamics across these areas involved in robust attentive behavior.
Theory-based Bayesian Models of Inductive Inference
2010-07-19
Related publications: Subjective randomness and natural scene statistics, Psychonomic Bulletin & Review (http://cocosci.berkeley.edu/tom/papers/randscenes.pdf); Exemplar models as a mechanism for performing Bayesian inference, Psychonomic Bulletin & Review, in press (http://cocosci.berkeley.edu/tom).
Differences in Performance Among Test Statistics for Assessing Phylogenomic Model Adequacy.
Duchêne, David A; Duchêne, Sebastian; Ho, Simon Y W
2018-05-18
Statistical phylogenetic analyses of genomic data depend on models of nucleotide or amino acid substitution. The adequacy of these substitution models can be assessed using a number of test statistics, allowing the model to be rejected when it is found to provide a poor description of the evolutionary process. A potentially valuable use of model-adequacy test statistics is to identify when data sets are likely to produce unreliable phylogenetic estimates, but their differences in performance are rarely explored. We performed a comprehensive simulation study to identify test statistics that are sensitive to some of the most commonly cited sources of phylogenetic estimation error. Our results show that, for many test statistics, traditional thresholds for assessing model adequacy can fail to reject the model when the phylogenetic inferences are inaccurate and imprecise. This is particularly problematic when analysing loci that have few variable informative sites. We propose new thresholds for assessing substitution model adequacy and demonstrate their effectiveness in analyses of three phylogenomic data sets. These thresholds lead to frequent rejection of the model for loci that yield topological inferences that are imprecise and are likely to be inaccurate. We also propose the use of a summary statistic that provides a practical assessment of overall model adequacy. Our approach offers a promising means of enhancing model choice in genome-scale data sets, potentially leading to improvements in the reliability of phylogenomic inference.
Inferring action structure and causal relationships in continuous sequences of human action.
Buchsbaum, Daphna; Griffiths, Thomas L; Plunkett, Dillon; Gopnik, Alison; Baldwin, Dare
2015-02-01
In the real world, causal variables do not come pre-identified or occur in isolation, but instead are embedded within a continuous temporal stream of events. A challenge faced by both human learners and machine learning algorithms is identifying subsequences that correspond to the appropriate variables for causal inference. A specific instance of this problem is action segmentation: dividing a sequence of observed behavior into meaningful actions, and determining which of those actions lead to effects in the world. Here we present a Bayesian analysis of how statistical and causal cues to segmentation should optimally be combined, as well as four experiments investigating human action segmentation and causal inference. We find that both people and our model are sensitive to statistical regularities and causal structure in continuous action, and are able to combine these sources of information in order to correctly infer both causal relationships and segmentation boundaries. Copyright © 2014. Published by Elsevier Inc.
Uncertainties in cylindrical anode current inferences on pulsed power drivers
NASA Astrophysics Data System (ADS)
Porwitzky, Andrew; Brown, Justin
2018-06-01
For over a decade, velocimetry-based techniques have been used to infer the electrical current delivered to dynamic materials properties experiments on pulsed power drivers such as the Z Machine. Though the technique was originally developed for planar load geometries, in recent years inferring the current delivered to cylindrical coaxial loads has become a valuable diagnostic tool for numerous platforms. Presented is a summary of the uncertainties that can propagate through the current inference technique when applied to expanding cylindrical anodes. An equation representing the quantitative uncertainty is developed, which shows the unfold method to be accurate to a few percent above 10 MA of load current.
Applications of statistics to medical science (1) Fundamental concepts.
Watanabe, Hiroshi
2011-01-01
The conceptual framework of statistical tests and statistical inference is discussed, and the epidemiological background of statistics is briefly reviewed. This study is one of a series in which we survey the basics of statistics and practical methods used in medical statistics. Arguments related to actual statistical analysis procedures will be made in subsequent papers.
Indexing the Environmental Quality Performance Based on A Fuzzy Inference Approach
NASA Astrophysics Data System (ADS)
Iswari, Lizda
2018-03-01
Environmental performance is strongly tied to the quality of human life. In Indonesia, this performance is quantified through the Environmental Quality Index (EQI), which consists of three indicators: a river quality index, an air quality index, and land cover coverage. Currently, the data processing for this instrument is done by averaging and weighting each index to represent the EQI at the provincial level. However, we found that EQI interpretations may contain uncertainties and cover a range of circumstances that are less appropriate to process with a common statistical approach. In this research, we aim to manage the EQI indicators with a more intuitive computation technique and to make inferences about the environmental performance of the 33 provinces of Indonesia. The research was conducted in the three stages of a Mamdani Fuzzy Inference System (MAFIS): fuzzification, inference, and defuzzification. The input consists of 10 environmental parameters, and the output is an Environmental Quality Performance (EQP) index. The approach was applied to the 2015 environmental condition data set and the results were quantified on a scale of 0 to 100: 10 provinces, dominated by the eastern part of Indonesia, performed well with an EQP above 80; 22 provinces had an EQP between 50 and 80; and one province on Java had an EQP below 20. This research shows that environmental quality performance can be quantified without suppressing the nature of the data set, while simultaneously revealing environmental behavior and its spatial distribution.
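To make the three-stage pipeline concrete, the sketch below implements a toy two-input Mamdani system by hand (triangular memberships, min activation, max aggregation, centroid defuzzification). The membership functions, rules and input values are invented for the example and are not the ten-parameter system used in the study.

```python
import numpy as np

def trimf(x, a, b, c):
    """Triangular membership function with break points (a, b, c)."""
    return np.maximum(np.minimum((x - a) / (b - a + 1e-12),
                                 (c - x) / (c - b + 1e-12)), 0.0)

def mamdani_eqp(river, air):
    """Tiny two-input Mamdani system: min for rule activation, max for
    aggregation, centroid defuzzification; all universes are 0-100."""
    z = np.linspace(0, 100, 1001)                     # output universe (EQP)
    # Fuzzification: membership degrees of the crisp inputs.
    low = {"river": trimf(river, 0, 0, 60), "air": trimf(air, 0, 0, 60)}
    high = {"river": trimf(river, 40, 100, 100), "air": trimf(air, 40, 100, 100)}
    # Output fuzzy sets.
    eqp_poor, eqp_good = trimf(z, 0, 0, 50), trimf(z, 50, 100, 100)
    # Inference: both inputs high -> good EQP; either input low -> poor EQP.
    fire_good = min(high["river"], high["air"])
    fire_poor = max(low["river"], low["air"])
    aggregated = np.maximum(np.minimum(fire_good, eqp_good),
                            np.minimum(fire_poor, eqp_poor))
    # Defuzzification: centroid of the aggregated output set.
    return np.sum(z * aggregated) / np.sum(aggregated)

print(f"River 85, air 70 -> EQP {mamdani_eqp(85, 70):.1f}")
print(f"River 30, air 65 -> EQP {mamdani_eqp(30, 65):.1f}")
```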
The role of causal criteria in causal inferences: Bradford Hill's "aspects of association".
Ward, Andrew C
2009-06-17
As noted by Wesley Salmon and many others, causal concepts are ubiquitous in every branch of theoretical science, in the practical disciplines and in everyday life. In the theoretical and practical sciences especially, people often base claims about causal relations on applications of statistical methods to data. However, the source and type of data place important constraints on the choice of statistical methods as well as on the warrant attributed to the causal claims based on the use of such methods. For example, much of the data used by people interested in making causal claims come from non-experimental, observational studies in which random allocations to treatment and control groups are not present. Thus, one of the most important problems in the social and health sciences concerns making justified causal inferences using non-experimental, observational data. In this paper, I examine one method of justifying such inferences that is especially widespread in epidemiology and the health sciences generally - the use of causal criteria. I argue that while the use of causal criteria is not appropriate for either deductive or inductive inferences, they do have an important role to play in inferences to the best explanation. As such, causal criteria, exemplified by what Bradford Hill referred to as "aspects of [statistical] associations", have an indispensible part to play in the goal of making justified causal claims.
Jacquin, Hugo; Gilson, Amy; Shakhnovich, Eugene; Cocco, Simona; Monasson, Rémi
2016-05-01
Inverse statistical approaches to determine protein structure and function from Multiple Sequence Alignments (MSA) are emerging as powerful tools in computational biology. However, the underlying assumptions about the relationship between the inferred effective Potts Hamiltonian and real protein structure and energetics have remained untested so far. Here we use the lattice protein (LP) model to benchmark these inverse statistical approaches. We build MSAs of highly stable sequences for target LP structures, and infer the effective pairwise Potts Hamiltonians from those MSAs. We find that the inferred Potts Hamiltonians reproduce many important aspects of the 'true' LP structures and energetics. Careful analysis reveals that the effective pairwise couplings in the inferred Potts Hamiltonians depend not only on the energetics of the native structure but also on competing folds; in particular, the coupling values reflect both positive design (stabilization of the native conformation) and negative design (destabilization of competing folds). In addition to providing detailed structural information, the inferred Potts models, used as protein Hamiltonians for the design of new sequences, are able to generate with high probability completely new sequences with the desired folds, which is not possible using independent-site models. These are remarkable results, as the effective LP Hamiltonians used to generate the MSAs are not simple pairwise models due to the competition between the folds. Our findings elucidate the reasons for the success of inverse approaches to the modelling of proteins from sequence data, and their limitations.
Logical reasoning versus information processing in the dual-strategy model of reasoning.
Markovits, Henry; Brisson, Janie; de Chantal, Pier-Luc
2017-01-01
One of the major debates concerning the nature of inferential reasoning is between counterexample-based strategies such as mental model theory and statistical strategies underlying probabilistic models. The dual-strategy model, proposed by Verschueren, Schaeken, & d'Ydewalle (2005a, 2005b), which suggests that people might have access to both kinds of strategy, has been supported by several recent studies. These have shown that statistical reasoners make inferences based on using information about premises in order to generate a likelihood estimate of conclusion probability. However, while results concerning counterexample reasoners are consistent with a counterexample detection model, these results could equally be interpreted as indicating a greater sensitivity to logical form. In order to distinguish these 2 interpretations, in Studies 1 and 2 we presented reasoners with Modus ponens (MP) inferences with statistical information about premise strength, and in Studies 3 and 4, naturalistic MP inferences with premises having many disabling conditions. Statistical reasoners accepted the MP inference more often than counterexample reasoners in Studies 1 and 2, while the opposite pattern was observed in Studies 3 and 4. Results show that these strategies must be defined in terms of information processing, with no clear relations to "logical" reasoning. These results have additional implications for the underlying debate about the nature of human reasoning. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Yu, Jihnhee; Yang, Luge; Vexler, Albert; Hutson, Alan D
2016-06-15
The receiver operating characteristic (ROC) curve is a popular technique with applications such as investigating the accuracy of a biomarker in delineating between disease and non-disease groups. A common measure of the accuracy of a given diagnostic marker is the area under the ROC curve (AUC). In contrast with the AUC, the partial area under the ROC curve (pAUC) looks only into the area with certain specificities (i.e., true negative rates), and it can often be clinically more relevant than examining the entire ROC curve. The pAUC is commonly estimated based on a U-statistic with a plug-in sample quantile, making the estimator a non-traditional U-statistic. In this article, we propose an accurate and easy method to obtain the variance of the nonparametric pAUC estimator. The proposed method is easy to implement for both a one-biomarker test and the comparison of two correlated biomarkers because it simply adapts the existing variance estimator of U-statistics. We show the accuracy and other advantages of the proposed variance estimation method by broadly comparing it with previously existing methods. Further, we develop an empirical likelihood inference method based on the proposed variance estimator through a simple implementation. In an application, we demonstrate that, depending on whether inference is based on the AUC or the pAUC, we can reach different decisions about the prognostic ability of the same set of biomarkers. Copyright © 2016 John Wiley & Sons, Ltd.
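For orientation, the pAUC itself can be estimated by restricting the trapezoidal integration of the empirical ROC curve to a false-positive-rate range, and its sampling variability can be gauged with a bootstrap; the sketch below does exactly that on synthetic data. It is not the closed-form U-statistic variance estimator proposed in the paper, and the cut-off of 0.2 and the simulated biomarker are assumptions for the example.

```python
import numpy as np
from sklearn.metrics import roc_curve

def pauc(y_true, scores, max_fpr=0.2):
    """Partial AUC over false-positive rates in [0, max_fpr]."""
    fpr, tpr, _ = roc_curve(y_true, scores)
    keep = fpr <= max_fpr
    # Truncate the ROC curve at max_fpr, adding an interpolated end point.
    x = np.append(fpr[keep], max_fpr)
    y = np.append(tpr[keep], np.interp(max_fpr, fpr, tpr))
    return np.trapz(y, x)

rng = np.random.default_rng(7)
n = 300
y = rng.binomial(1, 0.5, n)
biomarker = y * 1.0 + rng.normal(size=n)     # synthetic biomarker

est = pauc(y, biomarker)

# Bootstrap standard error of the pAUC estimator (an alternative illustration,
# not the paper's closed-form variance estimator).
boot = [pauc(y[idx], biomarker[idx])
        for idx in (rng.integers(0, n, n) for _ in range(2000))]
print(f"pAUC = {est:.4f}, bootstrap SE = {np.std(boot, ddof=1):.4f}")
```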
NASA Astrophysics Data System (ADS)
Olsen, M. K.
2017-02-01
We propose and analyze a pumped and damped Bose-Hubbard dimer as a source of continuous-variable Einstein-Podolsky-Rosen (EPR) steering with non-Gaussian statistics. We use and compare the results of the approximate truncated Wigner and the exact positive-P representation to calculate and compare the predictions for intensities, second-order quantum correlations, and third- and fourth-order cumulants. We find agreement for intensities and the products of inferred quadrature variances, which indicate that states demonstrating the EPR paradox are present. We find clear signals of non-Gaussianity in the quantum states of the modes from both the approximate and exact techniques, with quantitative differences in their predictions. Our proposed experimental configuration is extrapolated from current experimental techniques and adds another apparatus to the current toolbox of quantum atom optics.
Graffelman, Jan; Sánchez, Milagros; Cook, Samantha; Moreno, Victor
2013-01-01
In genetic association studies, tests for Hardy-Weinberg proportions are often employed as a quality control checking procedure. Missing genotypes are typically discarded prior to testing. In this paper we show that inference for Hardy-Weinberg proportions can be biased when missing values are discarded. We propose to use multiple imputation of missing values in order to improve inference for Hardy-Weinberg proportions. For imputation we employ a multinomial logit model that uses information from allele intensities and/or neighbouring markers. Analysis of an empirical data set of single nucleotide polymorphisms possibly related to colon cancer reveals that missing genotypes are not missing completely at random. Deviation from Hardy-Weinberg proportions is mostly due to a lack of heterozygotes. Inbreeding coefficients estimated by multiple imputation of the missing values are typically lower than inbreeding coefficients estimated by discarding the missing values. Accounting for missing values by multiple imputation qualitatively changed the results of 10 to 17% of the statistical tests performed. Estimates of inbreeding coefficients obtained by multiple imputation showed high correlation with estimates obtained by single imputation using an external reference panel. Our conclusion is that imputation of missing data leads to improved statistical inference for Hardy-Weinberg proportions.
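As background for the quality-control step being discussed, a basic chi-squared test of Hardy-Weinberg proportions on complete genotypes (i.e., before any treatment of missingness) can be sketched as follows; the genotype counts are invented for the example and the multiple-imputation machinery of the paper is not shown.

```python
import numpy as np
from scipy.stats import chi2

def hwe_chisq(n_AA, n_Aa, n_aa):
    """Chi-squared test of Hardy-Weinberg proportions for a biallelic marker,
    using complete genotype counts only (no handling of missing genotypes)."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)            # allele frequency of A
    expected = np.array([n * p ** 2, 2 * n * p * (1 - p), n * (1 - p) ** 2])
    observed = np.array([n_AA, n_Aa, n_aa])
    stat = np.sum((observed - expected) ** 2 / expected)
    return stat, chi2.sf(stat, df=1)           # 1 degree of freedom

# Example with made-up genotype counts for one marker.
stat, pval = hwe_chisq(298, 489, 213)
print(f"chi2 = {stat:.2f}, p = {pval:.3f}")
```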
[Application of statistics on chronic-diseases-relating observational research papers].
Hong, Zhi-heng; Wang, Ping; Cao, Wei-hua
2012-09-01
To study the application of statistics in observational research papers on chronic diseases recently published in Chinese Medical Association journals with an impact factor above 0.5. Using a self-developed criterion, two investigators independently assessed the application of statistics in these journals, and differences of opinion were resolved through discussion. A total of 352 papers from 6 journals, including the Chinese Journal of Epidemiology, Chinese Journal of Oncology, Chinese Journal of Preventive Medicine, Chinese Journal of Cardiology, Chinese Journal of Internal Medicine and Chinese Journal of Endocrinology and Metabolism, were reviewed. The rates of clear statements of research objectives, target audience, sample issues, inclusion criteria and variable definitions were 99.43%, 98.57%, 95.43%, 92.86% and 96.87%, respectively. The rates of correct description of quantitative and qualitative data were 90.94% and 91.46%, respectively. The rates of correctly expressing the results of statistical inference methods related to quantitative data, qualitative data and modeling were 100%, 95.32% and 87.19%, respectively. In 89.49% of the papers the conclusions directly responded to the research objectives. However, 69.60% of the papers did not state the exact name of the study design that was used, and 11.14% of the papers lacked a statement of the exclusion criteria. Only 5.16% of the papers clearly explained the sample size estimation, and only 24.21% clearly described the variable value assignment. The rate of describing how the statistical analyses and database management were conducted was only 24.15%. 18.75% of the papers did not express the statistical inference methods sufficiently, and a quarter of the papers did not use standardization appropriately. Regarding statistical inference, the rate of describing the prerequisites of the statistical tests was only 24.12%, while 9.94% of the papers did not even employ the statistical inference method that should have been used. The main deficiencies in the application of statistics in these observational research papers on chronic diseases were as follows: lack of sample size determination, insufficient description of variable value assignment, statistical methods not introduced clearly or properly, and lack of consideration of the prerequisites for the statistical inferences used.
Cornuet, Jean-Marie; Santos, Filipe; Beaumont, Mark A.; Robert, Christian P.; Marin, Jean-Michel; Balding, David J.; Guillemaud, Thomas; Estoup, Arnaud
2008-01-01
Summary: Genetic data obtained on population samples convey information about their evolutionary history. Inference methods can extract part of this information but they require sophisticated statistical techniques that have been made available to the biologist community (through computer programs) only for simple and standard situations typically involving a small number of samples. We propose here a computer program (DIY ABC) for inference based on approximate Bayesian computation (ABC), in which scenarios can be customized by the user to fit many complex situations involving any number of populations and samples. Such scenarios involve any combination of population divergences, admixtures and population size changes. DIY ABC can be used to compare competing scenarios, estimate parameters for one or more scenarios and compute bias and precision measures for a given scenario and known values of parameters (the current version applies to unlinked microsatellite data). This article describes key methods used in the program and provides its main features. The analysis of one simulated and one real dataset, both with complex evolutionary scenarios, illustrates the main possibilities of DIY ABC. Availability: The software DIY ABC is freely available at http://www.montpellier.inra.fr/CBGP/diyabc. Contact: j.cornuet@imperial.ac.uk Supplementary information: Supplementary data are also available at http://www.montpellier.inra.fr/CBGP/diyabc PMID:18842597
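The core ABC idea that DIY ABC builds on can be shown in a few lines: draw parameters from the prior, simulate data, and keep the draws whose summary statistics fall close to the observed ones. The toy Gaussian model, uniform prior, summary statistic and tolerance below are all assumptions made for the illustration and bear no relation to the population-genetic scenarios the program actually handles.

```python
import numpy as np

rng = np.random.default_rng(3)

# "Observed" data from a toy model: 100 draws from Normal(mu, 1),
# summarized by the sample mean.
true_mu = 1.3
observed = rng.normal(true_mu, 1.0, size=100)
s_obs = observed.mean()

# ABC rejection sampling: sample from the prior, simulate the summary statistic,
# accept parameters whose simulated summary is within eps of the observed one.
n_sim, eps = 100_000, 0.05
mu_prior = rng.uniform(-5, 5, n_sim)                 # prior on mu
s_sim = rng.normal(mu_prior, 1.0 / np.sqrt(100))     # mean of 100 draws per mu
accepted = mu_prior[np.abs(s_sim - s_obs) < eps]

print(f"accepted {accepted.size} draws; "
      f"posterior mean {accepted.mean():.2f}, 95% interval "
      f"({np.quantile(accepted, 0.025):.2f}, {np.quantile(accepted, 0.975):.2f})")
```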
Royle, J. Andrew; Dorazio, Robert M.
2008-01-01
A guide to data collection, modeling and inference strategies for biological survey data using Bayesian and classical statistical methods. This book describes a general and flexible framework for modeling and inference in ecological systems based on hierarchical models, with a strict focus on the use of probability models and parametric inference. Hierarchical models represent a paradigm shift in the application of statistics to ecological inference problems because they combine explicit models of ecological system structure or dynamics with models of how ecological systems are observed. The principles of hierarchical modeling are developed and applied to problems in population, metapopulation, community, and metacommunity systems. The book provides the first synthetic treatment of many recent methodological advances in ecological modeling and unifies disparate methods and procedures. The authors apply principles of hierarchical modeling to ecological problems, including:
* occurrence or occupancy models for estimating species distribution
* abundance models based on many sampling protocols, including distance sampling
* capture-recapture models with individual effects
* spatial capture-recapture models based on camera trapping and related methods
* population and metapopulation dynamic models
* models of biodiversity, community structure and dynamics.
Truth, models, model sets, AIC, and multimodel inference: a Bayesian perspective
Barker, Richard J.; Link, William A.
2015-01-01
Statistical inference begins with viewing data as realizations of stochastic processes. Mathematical models provide partial descriptions of these processes; inference is the process of using the data to obtain a more complete description of the stochastic processes. Wildlife and ecological scientists have become increasingly concerned with the conditional nature of model-based inference: what if the model is wrong? Over the last 2 decades, Akaike's Information Criterion (AIC) has been widely and increasingly used in wildlife statistics for 2 related purposes, first for model choice and second to quantify model uncertainty. We argue that for the second of these purposes, the Bayesian paradigm provides the natural framework for describing uncertainty associated with model choice and provides the most easily communicated basis for model weighting. Moreover, Bayesian arguments provide the sole justification for interpreting model weights (including AIC weights) as coherent (mathematically self consistent) model probabilities. This interpretation requires treating the model as an exact description of the data-generating mechanism. We discuss the implications of this assumption, and conclude that more emphasis is needed on model checking to provide confidence in the quality of inference.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shores, D.A.; Stout, J.H.; Gerberich, W.W.
1993-06-01
This report summarizes a three-year study of the stresses arising in the oxide scale and underlying metal during high temperature oxidation, and of scale cracking. In-situ XRD was developed to measure strains during oxidation above 1000°C on pure metals. Acoustic emission was used to observe scale fracture during isothermal oxidation and cooling, and statistical analysis was used to infer mechanical aspects of cracking. A microscratch technique was used to measure the fracture toughness of the scale/metal interface. A theoretical model was evaluated for the development and relaxation of stresses in the scale and metal substrate during oxidation.
Sileshi, G
2006-10-01
Researchers and regulatory agencies often make statistical inferences from insect count data using modelling approaches that assume homogeneous variance. Such models do not allow for formal appraisal of variability which in its different forms is the subject of interest in ecology. Therefore, the objectives of this paper were to (i) compare models suitable for handling variance heterogeneity and (ii) select optimal models to ensure valid statistical inferences from insect count data. The log-normal, standard Poisson, Poisson corrected for overdispersion, zero-inflated Poisson, the negative binomial distribution and zero-inflated negative binomial models were compared using six count datasets on foliage-dwelling insects and five families of soil-dwelling insects. Akaike's and Schwarz Bayesian information criteria were used for comparing the various models. Over 50% of the counts were zeros even in locally abundant species such as Ootheca bennigseni Weise, Mesoplatys ochroptera Stål and Diaecoderus spp. The Poisson model after correction for overdispersion and the standard negative binomial distribution model provided better description of the probability distribution of seven out of the 11 insects than the log-normal, standard Poisson, zero-inflated Poisson or zero-inflated negative binomial models. It is concluded that excess zeros and variance heterogeneity are common data phenomena in insect counts. If not properly modelled, these properties can invalidate the normal distribution assumptions resulting in biased estimation of ecological effects and jeopardizing the integrity of the scientific inferences. Therefore, it is recommended that statistical models appropriate for handling these data properties be selected using objective criteria to ensure efficient statistical inference.
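To illustrate the kind of model comparison the study performs, the sketch below fits Poisson, negative binomial and zero-inflated Poisson models to simulated overdispersed, zero-heavy counts and compares their AIC values with statsmodels. The simulated data, the intercept-only design, and the availability of the ZeroInflatedPoisson class (present in recent statsmodels releases) are assumptions of the example; the original insect datasets and model set are not reproduced.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)

# Simulated insect counts with overdispersion and excess zeros:
# gamma-mixed Poisson counts, 40% of which are forced to structural zeros.
n = 500
lam = rng.gamma(shape=1.0, scale=4.0, size=n)        # heterogeneous means
counts = rng.poisson(lam) * rng.binomial(1, 0.6, n)  # structural zeros
X = np.ones((n, 1))                                  # intercept-only design

models = {
    "Poisson": sm.Poisson(counts, X).fit(disp=False),
    "NegativeBinomial": sm.NegativeBinomial(counts, X).fit(disp=False),
    "ZeroInflatedPoisson": sm.ZeroInflatedPoisson(counts, X).fit(disp=False),
}
for name, res in models.items():
    print(f"{name:>20}: AIC = {res.aic:.1f}")   # lower AIC = preferred model
```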
Li, Huanjie; Nickerson, Lisa D; Nichols, Thomas E; Gao, Jia-Hong
2017-03-01
Two powerful methods for statistical inference on MRI brain images have been proposed recently, a non-stationary voxelation-corrected cluster-size test (CST) based on random field theory and threshold-free cluster enhancement (TFCE) based on calculating the level of local support for a cluster, then using permutation testing for inference. Unlike other statistical approaches, these two methods do not rest on the assumptions of a uniform and high degree of spatial smoothness of the statistic image. Thus, they are strongly recommended for group-level fMRI analysis compared to other statistical methods. In this work, the non-stationary voxelation-corrected CST and TFCE methods for group-level analysis were evaluated for both stationary and non-stationary images under varying smoothness levels, degrees of freedom and signal-to-noise ratios. Our results suggest that both methods provide adequate control for the number of voxel-wise statistical tests being performed during inference on fMRI data and they are both superior to current CSTs implemented in popular MRI data analysis software packages. However, TFCE is more sensitive and stable for group-level analysis of VBM data. Thus, the voxelation-corrected CST approach may confer some advantages by being computationally less demanding for fMRI data analysis than TFCE with permutation testing and by also being applicable for single-subject fMRI analyses, while the TFCE approach is advantageous for VBM data. Hum Brain Mapp 38:1269-1280, 2017. © 2016 Wiley Periodicals, Inc.
Powerful Statistical Inference for Nested Data Using Sufficient Summary Statistics
Dowding, Irene; Haufe, Stefan
2018-01-01
Hierarchically-organized data arise naturally in many psychology and neuroscience studies. As the standard assumption of independent and identically distributed samples does not hold for such data, two important problems are to accurately estimate group-level effect sizes, and to obtain powerful statistical tests against group-level null hypotheses. A common approach is to summarize subject-level data by a single quantity per subject, which is often the mean or the difference between class means, and treat these as samples in a group-level t-test. This “naive” approach is, however, suboptimal in terms of statistical power, as it ignores information about the intra-subject variance. To address this issue, we review several approaches to deal with nested data, with a focus on methods that are easy to implement. With what we call the sufficient-summary-statistic approach, we highlight a computationally efficient technique that can improve statistical power by taking into account within-subject variances, and we provide step-by-step instructions on how to apply this approach to a number of frequently-used measures of effect size. The properties of the reviewed approaches and the potential benefits over a group-level t-test are quantitatively assessed on simulated data and demonstrated on EEG data from a simulated-driving experiment. PMID:29615885
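As a rough illustration of the idea of weighting subjects by the precision of their summary statistic, the sketch below contrasts a naive one-sample t-test on subject means with an inverse-variance-weighted estimate; this is a simplification of the approach reviewed in the paper, and the numbers are made up.

```python
import numpy as np
from scipy import stats

# Hypothetical per-subject effect estimates and their within-subject variances.
effect = np.array([0.8, 1.1, 0.4, 0.9, 1.3, 0.7])
var_within = np.array([0.20, 0.05, 0.40, 0.10, 0.08, 0.25])

# Naive approach: one-sample t-test on the per-subject means, ignoring variances.
t_naive, p_naive = stats.ttest_1samp(effect, popmean=0.0)

# Sufficient-summary-statistic flavour (sketch): weight each subject by the
# inverse of its within-subject variance, as in a fixed-effects meta-analysis.
w = 1.0 / var_within
est = np.sum(w * effect) / np.sum(w)
se = np.sqrt(1.0 / np.sum(w))
p_weighted = 2 * stats.norm.sf(abs(est / se))
print(p_naive, p_weighted)
```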
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bennett, Janine Camille; Thompson, David; Pebay, Philippe Pierre
Statistical analysis is typically used to reduce the dimensionality of and infer meaning from data. A key challenge of any statistical analysis package aimed at large-scale, distributed data is to address the orthogonal issues of parallel scalability and numerical stability. Many statistical techniques, e.g., descriptive statistics or principal component analysis, are based on moments and co-moments and, using robust online update formulas, can be computed in an embarrassingly parallel manner, amenable to a map-reduce style implementation. In this paper we focus on contingency tables, through which numerous derived statistics such as joint and marginal probability, point-wise mutual information, information entropy, and χ² independence statistics can be directly obtained. However, contingency tables can become large as data size increases, requiring a correspondingly large amount of communication between processors. This potential increase in communication prevents optimal parallel speedup and is the main difference with moment-based statistics (which we discussed in [1]) where the amount of inter-processor communication is independent of data size. Here we present the design trade-offs which we made to implement the computation of contingency tables in parallel. We also study the parallel speedup and scalability properties of our open source implementation. In particular, we observe optimal speed-up and scalability when the contingency statistics are used in their appropriate context, namely, when the data input is not quasi-diffuse.
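A serial sketch of the derived statistics mentioned above, computed from a small hypothetical contingency table (the paper's contribution is the distributed-memory parallel aggregation, which is not reproduced here):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical two-way contingency table of counts.
table = np.array([[120, 30],
                  [ 40, 60]], dtype=float)

joint = table / table.sum()               # joint probabilities
p_row = joint.sum(axis=1, keepdims=True)  # marginal probabilities (rows)
p_col = joint.sum(axis=0, keepdims=True)  # marginal probabilities (columns)

# Point-wise mutual information and joint entropy (in bits).
with np.errstate(divide="ignore"):
    pmi = np.log2(joint / (p_row * p_col))
entropy = -np.nansum(joint * np.log2(joint))

# Chi-squared test of independence.
chi2, p_value, dof, expected = chi2_contingency(table)
print(pmi, entropy, chi2, p_value)
```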
Wright, Marvin N; Dankowski, Theresa; Ziegler, Andreas
2017-04-15
The most popular approach for analyzing survival data is the Cox regression model. The Cox model may, however, be misspecified, and its proportionality assumption may not always be fulfilled. An alternative approach for survival prediction is random forests for survival outcomes. The standard split criterion for random survival forests is the log-rank test statistic, which favors splitting variables with many possible split points. Conditional inference forests avoid this split variable selection bias. However, linear rank statistics are utilized by default in conditional inference forests to select the optimal splitting variable, which cannot detect non-linear effects in the independent variables. An alternative is to use maximally selected rank statistics for the split point selection. As in conditional inference forests, splitting variables are compared on the p-value scale. However, instead of the conditional Monte-Carlo approach used in conditional inference forests, p-value approximations are employed. We describe several p-value approximations and the implementation of the proposed random forest approach. A simulation study demonstrates that unbiased split variable selection is possible. However, there is a trade-off between unbiased split variable selection and runtime. In benchmark studies of prediction performance on simulated and real datasets, the new method performs better than random survival forests if informative dichotomous variables are combined with uninformative variables with more categories and better than conditional inference forests if non-linear covariate effects are included. In a runtime comparison, the method proves to be computationally faster than both alternatives, if a simple p-value approximation is used. Copyright © 2017 John Wiley & Sons, Ltd.
RESOLVE: A new algorithm for aperture synthesis imaging of extended emission in radio astronomy
NASA Astrophysics Data System (ADS)
Junklewitz, H.; Bell, M. R.; Selig, M.; Enßlin, T. A.
2016-02-01
We present resolve, a new algorithm for radio aperture synthesis imaging of extended and diffuse emission in total intensity. The algorithm is derived using Bayesian statistical inference techniques, estimating the surface brightness in the sky assuming a priori log-normal statistics. resolve estimates the measured sky brightness in total intensity, and the spatial correlation structure in the sky, which is used to guide the algorithm to an optimal reconstruction of extended and diffuse sources. During this process, the algorithm succeeds in deconvolving the effects of the radio interferometric point spread function. Additionally, resolve provides a map with an uncertainty estimate of the reconstructed surface brightness. Furthermore, with resolve we introduce a new, optimal visibility weighting scheme that can be viewed as an extension to robust weighting. In tests using simulated observations, the algorithm shows improved performance against two standard imaging approaches for extended sources, Multiscale-CLEAN and the Maximum Entropy Method.
Constructing networks with correlation maximization methods.
Mellor, Joseph C; Wu, Jie; Delisi, Charles
2004-01-01
Problems of inference in systems biology are ideally reduced to formulations which can efficiently represent the features of interest. In the case of predicting gene regulation and pathway networks, an important feature which describes connected genes and proteins is the relationship between active and inactive forms, i.e. between the "on" and "off" states of the components. While not optimal at the limits of resolution, these logical relationships between discrete states can often yield good approximations of the behavior in larger complex systems, where exact representation of measurement relationships may be intractable. We explore techniques for extracting binary state variables from measurement of gene expression, and go on to describe robust measures for statistical significance and information that can be applied to many such types of data. We show how statistical strength and information are equivalent criteria in limiting cases, and demonstrate the application of these measures to simple systems of gene regulation.
NASA Astrophysics Data System (ADS)
Parodi, A.; von Hardenberg, J.; Provenzale, A.
2012-04-01
Intense precipitation events are often associated with strong convective phenomena in the atmosphere. A deeper understanding of how microphysics affects the spatial and temporal variability of convective processes is relevant for many hydro-meteorological applications, such as the estimation of rainfall using remote sensing techniques and the ability to predict severe precipitation processes. In this paper, high-resolution simulations (0.1-1 km) of an atmosphere in radiative-convective equilibrium are performed using the Weather Research and Forecasting (WRF) model by prescribing different microphysical parameterizations. The dependence of fine-scale spatio-temporal properties of convective structures on microphysical details is investigated and the simulation results are compared with the known properties of radar maps of precipitation fields. We analyze and discuss similarities and differences and, based also on previous results on the dependence of precipitation statistics on the raindrop terminal velocity, try to draw some general inferences.
Incorporating Biological Knowledge into Evaluation of Causal Regulatory Hypothesis
NASA Technical Reports Server (NTRS)
Chrisman, Lonnie; Langley, Pat; Bay, Stephen; Pohorille, Andrew; DeVincenzi, D. (Technical Monitor)
2002-01-01
Biological data can be scarce and costly to obtain. The small number of samples available typically limits statistical power and makes reliable inference of causal relations extremely difficult. However, we argue that statistical power can be increased substantially by incorporating prior knowledge and data from diverse sources. We present a Bayesian framework that combines information from different sources and we show empirically that this lets one make correct causal inferences with small sample sizes that otherwise would be impossible.
A Review of Some Aspects of Robust Inference for Time Series.
1984-09-01
A Review of Some Aspects of Robust Inference for Time Series, by R. Douglas Martin, Technical Report, September 1984, Department of Statistics, University of Washington, Seattle. One cannot hope to have a good method for dealing with outliers in time series by using only an instantaneous nonlinear transformation of the data...
The researcher and the consultant: from testing to probability statements.
Hamra, Ghassan B; Stang, Andreas; Poole, Charles
2015-09-01
In the first instalment of this series, Stang and Poole provided an overview of Fisher significance testing (ST), Neyman-Pearson null hypothesis testing (NHT), and their unfortunate and unintended offspring, null hypothesis significance testing. In addition to elucidating the distinction between the first two and the evolution of the third, the authors alluded to alternative models of statistical inference; namely, Bayesian statistics. Bayesian inference has experienced a revival in recent decades, with many researchers advocating for its use as both a complement and an alternative to NHT and ST. This article will continue in the direction of the first instalment, providing practicing researchers with an introduction to Bayesian inference. Our work will draw on the examples and discussion of the previous dialogue.
Spectral likelihood expansions for Bayesian inference
NASA Astrophysics Data System (ADS)
Nagel, Joseph B.; Sudret, Bruno
2016-03-01
A spectral approach to Bayesian inference is presented. It pursues the emulation of the posterior probability density. The starting point is a series expansion of the likelihood function in terms of orthogonal polynomials. From this spectral likelihood expansion all statistical quantities of interest can be calculated semi-analytically. The posterior is formally represented as the product of a reference density and a linear combination of polynomial basis functions. Both the model evidence and the posterior moments are related to the expansion coefficients. This formulation avoids Markov chain Monte Carlo simulation and allows one to make use of linear least squares instead. The pros and cons of spectral Bayesian inference are discussed and demonstrated on the basis of simple applications from classical statistics and inverse modeling.
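The following toy sketch illustrates the idea in one dimension, assuming a uniform prior on [-1, 1] and a Legendre basis (the paper works with general orthogonal polynomials): once the likelihood has been expanded, the evidence and posterior moments follow from the expansion coefficients without any MCMC.

```python
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(0)

# Toy 1D problem: uniform prior on [-1, 1], Gaussian-shaped likelihood (made up).
theta0, sigma = 0.3, 0.2
def likelihood(theta):
    return np.exp(-0.5 * ((theta - theta0) / sigma) ** 2)

# Spectral likelihood expansion: least-squares fit of Legendre coefficients
# on samples drawn from the prior.
theta_train = rng.uniform(-1, 1, 2000)
coeffs = legendre.legfit(theta_train, likelihood(theta_train), deg=15)

# Semi-analytic post-processing: with prior density 1/2 on [-1, 1], the
# integrals of P_k for k > 0 vanish, so the evidence is just c_0, and the
# posterior mean reduces to c_1 / (3 * c_0).
evidence = coeffs[0]
post_mean = coeffs[1] / (3.0 * coeffs[0])
print(evidence, post_mean)
```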
Automating approximate Bayesian computation by local linear regression.
Thornton, Kevin R
2009-07-07
In several biological contexts, parameter inference often relies on computationally intensive techniques. "Approximate Bayesian Computation", or ABC, methods based on summary statistics have become increasingly popular. A particular flavor of ABC based on using a linear regression to approximate the posterior distribution of the parameters, conditional on the summary statistics, is computationally appealing, yet no standalone tool exists to automate the procedure. Here, I describe a program to implement the method. The software package ABCreg implements the local linear-regression approach to ABC. The advantages are: 1. The code is standalone and fully documented. 2. The program will automatically process multiple data sets, and create unique output files for each (which may be processed immediately in R), facilitating the testing of inference procedures on simulated data, or the analysis of multiple data sets. 3. The program implements two different transformation methods for the regression step. 4. Analysis options are controlled on the command line by the user, and the program is designed to output warnings for cases where the regression fails. 5. The program does not depend on any particular simulation machinery (coalescent, forward-time, etc.), and therefore is a general tool for processing the results from any simulation. 6. The code is open-source and modular. Examples of applying the software to empirical data from Drosophila melanogaster, and testing the procedure on simulated data, are shown. In practice, ABCreg simplifies implementing ABC based on local-linear regression.
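ABCreg itself is a standalone tool; the sketch below is only a NumPy illustration of the underlying rejection-plus-local-linear-regression adjustment (Beaumont et al. 2002) on a toy problem with made-up observed summaries.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: infer the mean of a normal with known sd from two summary stats.
def simulate(theta, n=50):
    x = rng.normal(theta, 1.0, n)
    return np.array([x.mean(), x.std()])

s_obs = np.array([0.12, 1.05])            # "observed" summaries (made up)

# 1. Rejection step: keep the simulations closest to the observed summaries.
prior = rng.uniform(-3, 3, 20000)
sims = np.array([simulate(t) for t in prior])
dist = np.linalg.norm((sims - s_obs) / sims.std(axis=0), axis=1)
keep = dist < np.quantile(dist, 0.01)

# 2. Local linear-regression adjustment: regress theta on the summaries among
#    accepted points, then shift accepted parameters to the observed summaries.
S = np.column_stack([np.ones(keep.sum()), sims[keep]])
beta, *_ = np.linalg.lstsq(S, prior[keep], rcond=None)
theta_adj = prior[keep] - (sims[keep] - s_obs) @ beta[1:]
print(theta_adj.mean(), theta_adj.std())
```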
Spectral Entropies as Information-Theoretic Tools for Complex Network Comparison
NASA Astrophysics Data System (ADS)
De Domenico, Manlio; Biamonte, Jacob
2016-10-01
Any physical system can be viewed from the perspective that information is implicitly represented in its state. However, the quantification of this information when it comes to complex networks has remained largely elusive. In this work, we use techniques inspired by quantum statistical mechanics to define an entropy measure for complex networks and to develop a set of information-theoretic tools, based on network spectral properties, such as Rényi q entropy, generalized Kullback-Leibler and Jensen-Shannon divergences, the latter allowing us to define a natural distance measure between complex networks. First, we show that by minimizing the Kullback-Leibler divergence between an observed network and a parametric network model, inference of model parameter(s) by means of maximum-likelihood estimation can be achieved and model selection can be performed with appropriate information criteria. Second, we show that the information-theoretic metric quantifies the distance between pairs of networks and we can use it, for instance, to cluster the layers of a multilayer system. By applying this framework to networks corresponding to sites of the human microbiome, we perform hierarchical cluster analysis and recover with high accuracy existing community-based associations. Our results imply that spectral-based statistical inference in complex networks results in demonstrably superior performance as well as a conceptual backbone, filling a gap towards a network information theory.
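A minimal sketch of the spectral entropy used in this framework, assuming the density matrix ρ ∝ exp(-βL) built from the graph Laplacian; a small hand-coded network stands in for the microbiome data.

```python
import numpy as np
from scipy.linalg import expm

def density_matrix(adjacency, beta=1.0):
    """Density matrix rho = exp(-beta * L) / Tr[exp(-beta * L)] from the Laplacian."""
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    rho = expm(-beta * laplacian)
    return rho / np.trace(rho)

def spectral_entropy(rho):
    """Von Neumann entropy computed from the eigenvalues of rho."""
    eig = np.linalg.eigvalsh(rho)
    eig = eig[eig > 1e-12]
    return -np.sum(eig * np.log(eig))

# Hypothetical small network: adjacency matrix of a 4-node path graph.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(spectral_entropy(density_matrix(A, beta=1.0)))
```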
Statistics, Computation, and Modeling in Cosmology
NASA Astrophysics Data System (ADS)
Jewell, Jeff; Guiness, Joe; SAMSI 2016 Working Group in Cosmology
2017-01-01
Current and future ground- and space-based missions are designed to not only detect, but map out with increasing precision, details of the universe from its infancy to the present day. As a result we are faced with the challenge of analyzing and interpreting observations from a wide variety of instruments to form a coherent view of the universe. Finding solutions to a broad range of challenging inference problems in cosmology is one of the goals of the “Statistics, Computation, and Modeling in Cosmology” working groups, formed as part of the year-long program on ‘Statistical, Mathematical, and Computational Methods for Astronomy’, hosted by the Statistical and Applied Mathematical Sciences Institute (SAMSI), a National Science Foundation funded institute. Two application areas have emerged for focused development in the cosmology working group involving advanced algorithmic implementations of exact Bayesian inference for the Cosmic Microwave Background, and statistical modeling of galaxy formation. The former includes study and development of advanced Markov Chain Monte Carlo algorithms designed to confront challenging inference problems including inference for spatial Gaussian random fields in the presence of sources of galactic emission (an example of a source separation problem). Extending these methods to future redshift survey data probing the nonlinear regime of large scale structure formation is also included in the working group activities. In addition, the working group is also focused on the study of ‘Galacticus’, a galaxy formation model applied to dark matter-only cosmological N-body simulations operating on time-dependent halo merger trees. The working group is interested in calibrating the Galacticus model to match statistics of galaxy survey observations; specifically stellar mass functions, luminosity functions, and color-color diagrams. The group will use subsampling approaches and fractional factorial designs to explore the Galacticus parameter space in a statistically and computationally efficient way. The group will also use the Galacticus simulations to study the relationship between the topological and physical structure of the halo merger trees and the properties of the resulting galaxies.
Attitude of teaching faculty towards statistics at a medical university in Karachi, Pakistan.
Khan, Nazeer; Mumtaz, Yasmin
2009-01-01
Statistics is mainly used in biological research to verify clinicians' and researchers' findings and impressions, and to give scientific validity to their inferences. In Pakistan, the educational curriculum is structured so that students who intend to enter the biological sciences do not study mathematics after grade 10. Because of this fragile mathematical background, Pakistani medical professionals often feel they lack an adequate base for understanding basic statistical concepts when they try to use them in their research or read a scientific article. The aim of the study was to assess the attitude of medical faculty towards statistics. A questionnaire containing 42 close-ended and 4 open-ended questions on attitudes towards, and knowledge of, statistics was distributed among the teaching faculty of Dow University of Health Sciences (DUHS). One hundred and sixty-seven completed questionnaires were returned from 374 faculty members (response rate 44.7%). Forty-three percent of the respondents reported that they had taken only 'introductory'-level statistics courses, 63% strongly agreed that a good researcher must have some training in statistics, and 82% agreed or strongly agreed that statistics is really useful for research. Only 17% correctly stated that statistics is the science of uncertainty. Half of the respondents acknowledged difficulty writing the statistical section of an article. Sixty-four percent of the subjects indicated that the way statistics is taught is the main reason it is perceived as difficult, and 53% indicated that co-authorship for the statistician should depend on his/her contribution to the study. Gender did not show any significant difference in the responses. However, senior faculty rated the importance of statistics, and the difficulty of writing the results section of articles, higher than junior faculty did. The study showed a low level of knowledge but a high level of awareness of the use of statistical techniques in research, and a good level of motivation for further training.
Hybrid regulatory models: a statistically tractable approach to model regulatory network dynamics.
Ocone, Andrea; Millar, Andrew J; Sanguinetti, Guido
2013-04-01
Computational modelling of the dynamics of gene regulatory networks is a central task of systems biology. For networks of small/medium scale, the dominant paradigm is represented by systems of coupled non-linear ordinary differential equations (ODEs). ODEs afford great mechanistic detail and flexibility, but calibrating these models to data is often an extremely difficult statistical problem. Here, we develop a general statistical inference framework for stochastic transcription-translation networks. We use a coarse-grained approach, which represents the system as a network of stochastic (binary) promoter and (continuous) protein variables. We derive an exact inference algorithm and an efficient variational approximation that allows scalable inference and learning of the model parameters. We demonstrate the power of the approach on two biological case studies, showing that the method allows a high degree of flexibility and is capable of testable novel biological predictions. http://homepages.inf.ed.ac.uk/gsanguin/software.html. Supplementary data are available at Bioinformatics online.
New developments of a knowledge based system (VEG) for inferring vegetation characteristics
NASA Technical Reports Server (NTRS)
Kimes, D. S.; Harrison, P. A.; Harrison, P. R.
1992-01-01
An extraction technique for inferring physical and biological surface properties of vegetation using nadir and/or directional reflectance data as input has been developed. A knowledge-based system (VEG) accepts spectral data of an unknown target as input, determines the best strategy for inferring the desired vegetation characteristic, applies the strategy to the target data, and provides a rigorous estimate of the accuracy of the inference. Progress in developing the system is presented. VEG combines methods from remote sensing and artificial intelligence, and integrates input spectral measurements with diverse knowledge bases. VEG has been developed to (1) infer spectral hemispherical reflectance from any combination of nadir and/or off-nadir view angles; (2) test and develop new extraction techniques on an internal spectral database; (3) browse, plot, or analyze directional reflectance data in the system's spectral database; (4) discriminate between user-defined vegetation classes using spectral and directional reflectance relationships; and (5) infer unknown view angles from known view angles (known as view angle extension).
Zeng, Irene Sui Lan; Lumley, Thomas
2018-01-01
Integrated omics is becoming a new channel for investigating complex molecular systems in modern biological science and sets a foundation for systematic learning for precision medicine. The statistical/machine learning methods that have emerged in the past decade for integrated omics are not only innovative but also multidisciplinary, with integrated knowledge in biology, medicine, statistics, machine learning, and artificial intelligence. Here, we review the nontrivial classes of learning methods from a statistical perspective and streamline these learning methods within the statistical learning framework. The intriguing findings from the review are that the methods used are generalizable to other disciplines with complex systematic structure, and that integrated omics is part of an integrated information science which has collated and integrated different types of information for inferences and decision making. We review the statistical learning methods of exploratory and supervised learning from 42 publications. We also discuss the strengths and limitations of the extended principal component analysis, cluster analysis, network analysis, and regression methods. Statistical techniques such as penalization for sparsity induction when there are fewer observations than the number of features, and the use of a Bayesian approach when there is prior knowledge to be integrated, are also included in the commentary. For the completeness of the review, a table of currently available software and packages from 23 publications for omics is provided in the appendix.
A Comparison of Several Techniques to Assign Heights to Cloud Tracers.
NASA Astrophysics Data System (ADS)
Nieman, Steven J.; Schmetz, Johannes; Menzel, W. Paul
1993-09-01
Satellite-derived cloud-motion vector (CMV) production has been troubled by inaccurate height assignment of cloud tracers, especially in thin semitransparent clouds. This paper presents the results of an intercomparison of current operational height assignment techniques. Currently, heights are assigned by one of three techniques when the appropriate spectral radiance measurements are available. The infrared window (IRW) technique compares measured brightness temperatures to forecast temperature profiles and thus infers opaque cloud levels. In semitransparent or small subpixel clouds, the carbon dioxide (CO2) technique uses the ratio of radiances from different layers of the atmosphere to infer the correct cloud height. In the water vapor (H2O) technique, radiances influenced by upper-tropospheric moisture and IRW radiances are measured for several pixels viewing different cloud amounts, and their linear relationship is used to extrapolate the correct cloud height. The results presented in this paper suggest that the H2O technique is a viable alternative to the CO2 technique for inferring the heights of semitransparent cloud elements. This is important since future National Environmental Satellite, Data, and Information Service (NESDIS) operations will have to rely on H2O-derived cloud-height assignments in the wind field determinations with the next operational geostationary satellite. On a given day, the heights from the two approaches compare to within 60-110 hPa rms; drier atmospheric conditions tend to reduce the effectiveness of the H2O technique. By inference one can conclude that the present height algorithms used operationally at NESDIS (with the CO2 technique) and at the European Satellite Operations Center (ESOC) (with their version of the H2O technique) are providing similar results. Sample wind fields produced with the ESOC and NESDIS algorithms using Meteosat-4 data show good agreement.
CADDIS Volume 4. Data Analysis: PECBO Appendix - R Scripts for Non-Parametric Regressions
Scripts for computing nonparametric regression analyses, with an overview of how the scripts are used to infer environmental conditions from biological observations and to statistically estimate species-environment relationships.
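The CADDIS scripts themselves are written in R; as a rough Python analogue of a nonparametric species-environment regression, one could use a LOWESS smoother, as in the sketch below (synthetic data and hypothetical variable names).

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(3)
# Hypothetical data: a biological response observed along an environmental gradient.
gradient = np.sort(rng.uniform(0, 10, 200))
response = np.sin(gradient) + rng.normal(0, 0.3, 200)

# LOWESS fit; frac controls the span of the local regressions.
fitted = lowess(response, gradient, frac=0.3, return_sorted=True)
# fitted[:, 0] holds the sorted gradient values, fitted[:, 1] the smoothed response.
print(fitted[:5])
```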
A Statistical Description of Neural Ensemble Dynamics
Long, John D.; Carmena, Jose M.
2011-01-01
The growing use of multi-channel neural recording techniques in behaving animals has produced rich datasets that hold immense potential for advancing our understanding of how the brain mediates behavior. One limitation of these techniques is they do not provide important information about the underlying anatomical connections among the recorded neurons within an ensemble. Inferring these connections is often intractable because the set of possible interactions grows exponentially with ensemble size. This is a fundamental challenge one confronts when interpreting these data. Unfortunately, the combination of expert knowledge and ensemble data is often insufficient for selecting a unique model of these interactions. Our approach shifts away from modeling the network diagram of the ensemble toward analyzing changes in the dynamics of the ensemble as they relate to behavior. Our contribution consists of adapting techniques from signal processing and Bayesian statistics to track the dynamics of ensemble data on time-scales comparable with behavior. We employ a Bayesian estimator to weigh prior information against the available ensemble data, and use an adaptive quantization technique to aggregate poorly estimated regions of the ensemble data space. Importantly, our method is capable of detecting changes in both the magnitude and structure of correlations among neurons missed by firing rate metrics. We show that this method is scalable across a wide range of time-scales and ensemble sizes. Lastly, the performance of this method on both simulated and real ensemble data is used to demonstrate its utility. PMID:22319486
ERIC Educational Resources Information Center
Mislevy, Robert J.
Educational test theory consists of statistical and methodological tools to support inferences about examinees' knowledge, skills, and accomplishments. The evolution of test theory has been shaped by the nature of users' inferences which, until recently, have been framed almost exclusively in terms of trait and behavioral psychology. Progress in…
Data-driven sensitivity inference for Thomson scattering electron density measurement systems.
Fujii, Keisuke; Yamada, Ichihiro; Hasuo, Masahiro
2017-01-01
We developed a method to infer the calibration parameters of multichannel measurement systems, such as channel variations of sensitivity and noise amplitude, from experimental data. We regard such uncertainties of the calibration parameters as dependent noise. The statistical properties of the dependent noise and those of the latent functions were modeled and implemented in the Gaussian process kernel. Based on their statistical difference, both parameters were inferred from the data. We applied this method to the electron density measurement system by Thomson scattering for the Large Helical Device plasma, which is equipped with 141 spatial channels. Based on 210 sets of experimental data, we evaluated the correction factor of the sensitivity and the noise amplitude for each channel. The correction factor varies by ≈10%, and the random noise amplitude is ≈2%, i.e., the measurement accuracy increases by a factor of 5 after this sensitivity correction. An improvement in the certainty of the inferred spatial derivatives was also demonstrated.
NASA Astrophysics Data System (ADS)
Calderon, Christopher P.; Weiss, Lucien E.; Moerner, W. E.
2014-05-01
Experimental advances have improved the two- (2D) and three-dimensional (3D) spatial resolution that can be extracted from in vivo single-molecule measurements. This enables researchers to quantitatively infer the magnitude and directionality of forces experienced by biomolecules in their native environment. Situations where such force information is relevant range from mitosis to directed transport of protein cargo along cytoskeletal structures. Models commonly applied to quantify single-molecule dynamics assume that effective forces and velocity in the x ,y (or x ,y,z) directions are statistically independent, but this assumption is physically unrealistic in many situations. We present a hypothesis testing approach capable of determining if there is evidence of statistical dependence between positional coordinates in experimentally measured trajectories; if the hypothesis of independence between spatial coordinates is rejected, then a new model accounting for 2D (3D) interactions can and should be considered. Our hypothesis testing technique is robust, meaning it can detect interactions, even if the noise statistics are not well captured by the model. The approach is demonstrated on control simulations and on experimental data (directed transport of intraflagellar transport protein 88 homolog in the primary cilium).
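The paper develops a likelihood-based hypothesis test; as a greatly simplified stand-in, the sketch below checks for correlation between the x and y displacement increments of a simulated 2D trajectory, which conveys the basic idea of testing independence of the spatial coordinates (all data are synthetic).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Hypothetical 2D trajectory whose x/y displacements are correlated by construction.
steps = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=500)
traj = np.cumsum(steps, axis=0)

dx, dy = np.diff(traj[:, 0]), np.diff(traj[:, 1])
r, p = stats.pearsonr(dx, dy)
# A small p-value suggests rejecting independence of the x and y increments,
# i.e. a model with 2D coupling should be considered.
print(r, p)
```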
Plazzotta, Fernando; Otero, Carlos; Luna, Daniel; de Quiros, Fernan Gonzalez Bernaldo
2013-01-01
Physicians do not always keep the problem list accurate, complete and up to date. The aim was to analyze natural language processing (NLP) techniques and inference rules as strategies for maintaining the completeness and accuracy of the problem list in EHRs. A non-systematic literature review of PubMed covering the last 10 years was conducted. Strategies to maintain the EHR problem list were analyzed in two respects: inputting problems into, and removing problems from, the problem list. NLP and inference rules show acceptable performance for inputting problems into the problem list; no studies using these techniques for removing problems were published. Conclusion: both tools, NLP and inference rules, have shown acceptable results for maintaining the completeness and accuracy of the problem list.
Imaging Extended Emission-Line Regions of Obscured AGN with the Subaru Hyper Suprime-Cam Survey
NASA Astrophysics Data System (ADS)
Sun, Ai-Lei; Greene, Jenny E.; Zakamska, Nadia L.; Goulding, Andy; Strauss, Michael A.; Huang, Song; Johnson, Sean; Kawaguchi, Toshihiro; Matsuoka, Yoshiki; Marsteller, Alisabeth A.; Nagao, Tohru; Toba, Yoshiki
2018-05-01
Narrow-line regions excited by active galactic nuclei (AGN) are important for studying AGN photoionization and feedback. Their strong [O III] lines can be detected with broadband images, allowing morphological studies of these systems with large-area imaging surveys. We develop a new broad-band imaging technique to reconstruct the images of the [O III] line using the Subaru Hyper Suprime-Cam (HSC) Survey aided with spectra from the Sloan Digital Sky Survey (SDSS). The technique involves a careful subtraction of the galactic continuum to isolate emission from the [O III]λ5007 and [O III]λ4959 lines. Compared to traditional targeted observations, this technique is more efficient at covering larger samples without dedicated observational resources. We apply this technique to an SDSS spectroscopically selected sample of 300 obscured AGN at redshifts 0.1 - 0.7, uncovering extended emission-line region candidates with sizes up to tens of kpc. With the largest sample of uniformly derived narrow-line region sizes, we revisit the narrow-line region size - luminosity relation. The area and radii of the [O III] emission-line regions are strongly correlated with the AGN luminosity inferred from the mid-infrared (15 μm rest-frame) with a power-law slope of 0.62^{+0.05}_{-0.06}± 0.10 (statistical and systematic errors), consistent with previous spectroscopic findings. We discuss the implications for the physics of AGN emission-line regions and future applications of this technique, which should be useful for current and next-generation imaging surveys to study AGN photoionization and feedback with large statistical samples.
Local dependence in random graph models: characterization, properties and statistical inference
Schweinberger, Michael; Handcock, Mark S.
2015-01-01
Dependent phenomena, such as relational, spatial and temporal phenomena, tend to be characterized by local dependence in the sense that units which are close in a well-defined sense are dependent. In contrast with spatial and temporal phenomena, though, relational phenomena tend to lack a natural neighbourhood structure in the sense that it is unknown which units are close and thus dependent. Owing to the challenge of characterizing local dependence and constructing random graph models with local dependence, many conventional exponential family random graph models induce strong dependence and are not amenable to statistical inference. We take first steps to characterize local dependence in random graph models, inspired by the notion of finite neighbourhoods in spatial statistics and M-dependence in time series, and we show that local dependence endows random graph models with desirable properties which make them amenable to statistical inference. We show that random graph models with local dependence satisfy a natural domain consistency condition which every model should satisfy, but conventional exponential family random graph models do not satisfy. In addition, we establish a central limit theorem for random graph models with local dependence, which suggests that random graph models with local dependence are amenable to statistical inference. We discuss how random graph models with local dependence can be constructed by exploiting either observed or unobserved neighbourhood structure. In the absence of observed neighbourhood structure, we take a Bayesian view and express the uncertainty about the neighbourhood structure by specifying a prior on a set of suitable neighbourhood structures. We present simulation results and applications to two real world networks with ‘ground truth’. PMID:26560142
FUNSTAT and statistical image representations
NASA Technical Reports Server (NTRS)
Parzen, E.
1983-01-01
General ideas of functional statistical inference for the analysis of one and two samples, univariate and bivariate, are outlined. The ONESAM program is applied to analyze the univariate probability distributions of multi-spectral image data.
Is it possible to identify a trend in problem/failure data
NASA Technical Reports Server (NTRS)
Church, Curtis K.
1990-01-01
One of the major obstacles in identifying and interpreting a trend is the small number of data points. Future trending reports will begin with 1983 data. As the problem/failure data are aggregated by year, there are just seven observations (1983 to 1989) for the 1990 reports. Any statistical inferences with a small amount of data will have a large degree of uncertainty. Consequently, a regression technique approach to identify a trend is limited. Though trend determination by failure mode may be unrealistic, the data may be explored for consistency or stability and the failure rate investigated. Various alternative data analysis procedures are briefly discussed. Techniques that could be used to explore problem/failure data by failure mode are addressed. The data used are taken from Section One, Space Shuttle Main Engine, of the Calspan Quarterly Report dated April 2, 1990.
Damage level prediction of non-reshaped berm breakwater using ANN, SVM and ANFIS models
NASA Astrophysics Data System (ADS)
Mandal, Sukomal; Rao, Subba; N., Harish; Lokesha
2012-06-01
The damage analysis of coastal structures is important because many design parameters must be considered for a better and safer design. In the present study, experimental data for a non-reshaped berm breakwater are collected from the Marine Structures Laboratory, Department of Applied Mechanics and Hydraulics, NITK, Surathkal, India. Soft computing models such as Artificial Neural Network (ANN), Support Vector Machine (SVM) and Adaptive Neuro-Fuzzy Inference System (ANFIS) are constructed using the experimental data sets to predict the damage level of the non-reshaped berm breakwater. The experimental data are used to train the ANN, SVM and ANFIS models, and results are evaluated in terms of statistical measures such as mean square error, root mean square error, correlation coefficient and scatter index. The results show that soft computing techniques, i.e., ANN, SVM and ANFIS, can be efficient tools for predicting the damage levels of non-reshaped berm breakwaters.
NASA Astrophysics Data System (ADS)
Gbedo, Yémalin Gabin; Mangin-Brinet, Mariane
2017-07-01
We present a new procedure to determine parton distribution functions (PDFs), based on Markov chain Monte Carlo (MCMC) methods. The aim of this paper is to show that we can replace the standard χ2 minimization by procedures grounded on statistical methods, and on Bayesian inference in particular, thus offering additional insight into the rich field of PDFs determination. After a basic introduction to these techniques, we introduce the algorithm we have chosen to implement—namely Hybrid (or Hamiltonian) Monte Carlo. This algorithm, initially developed for Lattice QCD, turns out to be very interesting when applied to PDFs determination by global analyses; we show that it allows us to circumvent the difficulties due to the high dimensionality of the problem, in particular concerning the acceptance. A first feasibility study is performed and presented, which indicates that Markov chain Monte Carlo can successfully be applied to the extraction of PDFs and of their uncertainties.
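For readers unfamiliar with the algorithm, a bare-bones Hybrid/Hamiltonian Monte Carlo step looks like the sketch below. A toy Gaussian target stands in for the PDF-fit posterior, whose negative log-density would in practice be built from the global-analysis χ² and the PDF parametrization.

```python
import numpy as np

rng = np.random.default_rng(0)

def neg_log_post(q):
    # Toy target: standard 2D Gaussian (stand-in for a real posterior).
    return 0.5 * np.dot(q, q)

def grad_neg_log_post(q):
    return q

def hmc_step(q, eps=0.1, n_leapfrog=20):
    p = rng.normal(size=q.shape)                 # refresh momentum
    q_new, p_new = q.copy(), p.copy()
    # Leapfrog integration of the Hamiltonian dynamics.
    p_new -= 0.5 * eps * grad_neg_log_post(q_new)
    for _ in range(n_leapfrog - 1):
        q_new += eps * p_new
        p_new -= eps * grad_neg_log_post(q_new)
    q_new += eps * p_new
    p_new -= 0.5 * eps * grad_neg_log_post(q_new)
    # Metropolis accept/reject on the total energy.
    h_old = neg_log_post(q) + 0.5 * np.dot(p, p)
    h_new = neg_log_post(q_new) + 0.5 * np.dot(p_new, p_new)
    return q_new if rng.random() < np.exp(h_old - h_new) else q

q, samples = np.zeros(2), []
for _ in range(2000):
    q = hmc_step(q)
    samples.append(q)
samples = np.array(samples)
print(samples.mean(axis=0), samples.std(axis=0))
```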
Contemporary Quantitative Methods and "Slow" Causal Inference: Response to Palinkas
ERIC Educational Resources Information Center
Stone, Susan
2014-01-01
This response considers together simultaneously occurring discussions about causal inference in social work and allied health and social science disciplines. It places emphasis on scholarship that integrates the potential outcomes model with directed acyclic graphing techniques to extract core steps in causal inference. Although this scholarship…
Heddam, Salim
2014-01-01
In this study, we present an application of an artificial intelligence (AI) model, the dynamic evolving neural-fuzzy inference system (DENFIS) based on an evolving clustering method (ECM), for modelling dissolved oxygen concentration in a river. To demonstrate the forecasting capability of DENFIS, one year of hourly experimental water quality data (1 January 2009 to 30 December 2009) collected by the United States Geological Survey station (USGS Station No: 420853121505500) at Klamath River at Miller Island Boat Ramp, OR, USA, was used for model development. Two DENFIS-based models are presented and compared: (1) an offline system, DENFIS-OF, and (2) an online system, DENFIS-ON. The input variables used for the two models are water pH, temperature, specific conductance, and sensor depth. The performances of the models are evaluated using root mean square error (RMSE), mean absolute error (MAE), Willmott index of agreement (d) and correlation coefficient (CC) statistics. The lowest root mean square error and highest correlation coefficient values were obtained with the DENFIS-ON method. The results obtained with the DENFIS models are compared with linear (multiple linear regression, MLR) and nonlinear (multi-layer perceptron neural network, MLPNN) methods. This study demonstrates that DENFIS-ON outperforms all the other techniques considered for DO modelling.
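The evaluation statistics named above have simple closed forms; a short sketch, with made-up observed and modelled dissolved-oxygen values:

```python
import numpy as np

def evaluation_stats(obs, pred):
    """RMSE, MAE, Willmott index of agreement (d) and correlation coefficient."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    rmse = np.sqrt(np.mean((pred - obs) ** 2))
    mae = np.mean(np.abs(pred - obs))
    d = 1 - np.sum((pred - obs) ** 2) / np.sum(
        (np.abs(pred - obs.mean()) + np.abs(obs - obs.mean())) ** 2)
    cc = np.corrcoef(obs, pred)[0, 1]
    return rmse, mae, d, cc

# Hypothetical observed vs. modelled dissolved oxygen (mg/L).
obs = [8.1, 7.9, 7.4, 6.8, 7.0, 7.6]
pred = [8.0, 7.7, 7.5, 7.0, 6.8, 7.8]
print(evaluation_stats(obs, pred))
```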
Quakefinder: A scalable data mining system for detecting earthquakes from space
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stolorz, P.; Dean, C.
1996-12-31
We present an application of novel massively parallel data mining techniques to highly precise inference of important physical processes from remote sensing imagery. Specifically, we have developed and applied a system, Quakefinder, that automatically detects and measures tectonic activity in the earth's crust by examination of satellite data. We have used Quakefinder to automatically map the direction and magnitude of ground displacements due to the 1992 Landers earthquake in Southern California, over a spatial region of several hundred square kilometers, at a resolution of 10 meters, to a (sub-pixel) precision of 1 meter. This is the first calculation that has ever been able to extract area-mapped information about 2D tectonic processes at this level of detail. We outline the architecture of the Quakefinder system, based upon a combination of techniques drawn from the fields of statistical inference, massively parallel computing and global optimization. We confirm the overall correctness of the procedure by comparison of our results with known locations of targeted faults obtained by careful and time-consuming field measurements. The system also performs knowledge discovery by indicating novel unexplained tectonic activity away from the primary faults that has never before been observed. We conclude by discussing the future potential of this data mining system in the broad context of studying subtle spatio-temporal processes within massive image streams.
Wolfson, Julian; Henn, Lisa
2014-01-01
In many areas of clinical investigation there is great interest in identifying and validating surrogate endpoints, biomarkers that can be measured a relatively short time after a treatment has been administered and that can reliably predict the effect of treatment on the clinical outcome of interest. However, despite dramatic advances in the ability to measure biomarkers, the recent history of clinical research is littered with failed surrogates. In this paper, we present a statistical perspective on why identifying surrogate endpoints is so difficult. We view the problem from the framework of causal inference, with a particular focus on the technique of principal stratification (PS), an approach which is appealing because the resulting estimands are not biased by unmeasured confounding. In many settings, PS estimands are not statistically identifiable and their degree of non-identifiability can be thought of as representing the statistical difficulty of assessing the surrogate value of a biomarker. In this work, we examine the identifiability issue and present key simplifying assumptions and enhanced study designs that enable the partial or full identification of PS estimands. We also present example situations where these assumptions and designs may or may not be feasible, providing insight into the problem characteristics which make the statistical evaluation of surrogate endpoints so challenging.
NASA Astrophysics Data System (ADS)
Nooruddin, Hasan A.; Anifowose, Fatai; Abdulraheem, Abdulazeez
2014-03-01
Soft computing techniques have recently become very popular in the oil industry, and a number of computational intelligence-based predictive methods have been widely applied with high prediction capabilities. Popular methods include feed-forward neural networks, radial basis function networks, generalized regression neural networks, functional networks, support vector regression and adaptive network fuzzy inference systems. A comparative study among the most popular soft computing techniques is presented using a large published dataset describing multimodal pore systems in the Arab D formation. The inputs to the models are air porosity, grain density, and Thomeer parameters obtained from mercury injection capillary pressure profiles. Corrected air permeability is the target variable. Applying the developed permeability models in a recent reservoir characterization workflow ensures consistency between micro- and macro-scale information, represented mainly by the Thomeer parameters and absolute permeability. The dataset was divided into two parts, with 80% of the data used for training and 20% for testing. The target permeability variable was transformed to the logarithmic scale as a pre-processing step and to show better correlations with the input variables. Statistical and graphical analyses of the results, including permeability cross-plots and detailed error measures, were produced. In general, the comparative study showed very close results among the developed models. The feed-forward neural network permeability model showed the lowest average relative error, average absolute relative error, standard deviation of error and root mean square error, making it the best model for such problems. The adaptive network fuzzy inference system also showed very good results.
Dynamic causal modelling: a critical review of the biophysical and statistical foundations.
Daunizeau, J; David, O; Stephan, K E
2011-09-15
The goal of dynamic causal modelling (DCM) of neuroimaging data is to study experimentally induced changes in functional integration among brain regions. This requires (i) biophysically plausible and physiologically interpretable models of neuronal network dynamics that can predict distributed brain responses to experimental stimuli and (ii) efficient statistical methods for parameter estimation and model comparison. These two key components of DCM have been the focus of more than thirty methodological articles since the seminal work of Friston and colleagues published in 2003. In this paper, we provide a critical review of the current state-of-the-art of DCM. We inspect the properties of DCM in relation to the most common neuroimaging modalities (fMRI and EEG/MEG) and the specificity of inference on neural systems that can be made from these data. We then discuss both the plausibility of the underlying biophysical models and the robustness of the statistical inversion techniques. Finally, we discuss potential extensions of the current DCM framework, such as stochastic DCMs, plastic DCMs and field DCMs. Copyright © 2009 Elsevier Inc. All rights reserved.
Generating Models of Infinite-State Communication Protocols Using Regular Inference with Abstraction
NASA Astrophysics Data System (ADS)
Aarts, Fides; Jonsson, Bengt; Uijen, Johan
In order to facilitate model-based verification and validation, effort is underway to develop techniques for generating models of communication system components from observations of their external behavior. Most previous such work has employed regular inference techniques which generate modest-size finite-state models. They typically suppress parameters of messages, although these have a significant impact on control flow in many communication protocols. We present a framework, which adapts regular inference to include data parameters in messages and states for generating components with large or infinite message alphabets. A main idea is to adapt the framework of predicate abstraction, successfully used in formal verification. Since we are in a black-box setting, the abstraction must be supplied externally, using information about how the component manages data parameters. We have implemented our techniques by connecting the LearnLib tool for regular inference with the protocol simulator ns-2, and generated a model of the SIP component as implemented in ns-2.
Building Intuitions about Statistical Inference Based on Resampling
ERIC Educational Resources Information Center
Watson, Jane; Chance, Beth
2012-01-01
Formal inference, which makes theoretical assumptions about distributions and applies hypothesis testing procedures with null and alternative hypotheses, is notoriously difficult for tertiary students to master. The debate about whether this content should appear in Years 11 and 12 of the "Australian Curriculum: Mathematics" has gone on…
Theory-based Bayesian models of inductive learning and reasoning.
Tenenbaum, Joshua B; Griffiths, Thomas L; Kemp, Charles
2006-07-01
Inductive inference allows humans to make powerful generalizations from sparse data when learning about word meanings, unobserved properties, causal relationships, and many other aspects of the world. Traditional accounts of induction emphasize either the power of statistical learning, or the importance of strong constraints from structured domain knowledge, intuitive theories or schemas. We argue that both components are necessary to explain the nature, use and acquisition of human knowledge, and we introduce a theory-based Bayesian framework for modeling inductive learning and reasoning as statistical inferences over structured knowledge representations.
Data free inference with processed data products
Chowdhary, K.; Najm, H. N.
2014-07-12
Here, we consider the context of probabilistic inference of model parameters given error bars or confidence intervals on model output values, when the data is unavailable. We introduce a class of algorithms in a Bayesian framework, relying on maximum entropy arguments and approximate Bayesian computation methods, to generate consistent data with the given summary statistics. Once we obtain consistent data sets, we pool the respective posteriors, to arrive at a single, averaged density on the parameters. This approach allows us to perform accurate forward uncertainty propagation consistent with the reported statistics.
Statistics at the Chinese Universities.
1981-09-01
education in China in the postwar years is provided to give some perspective. My observations on statistics at the Chinese universities are necessarily... has been accepted as a member society of ISI. 3. Education in China. Understanding of statistics in universities in China will be enhanced through some... programming), Statistical Mathematics (inference, data analysis, industrial statistics, information theory), Mathematical Physics (differential
Advanced statistics: linear regression, part II: multiple linear regression.
Marill, Keith A
2004-01-01
The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.
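A minimal sketch of a multiple linear regression with exact confidence intervals for the coefficients, using statsmodels on synthetic data with two correlated predictors (all names and numbers are illustrative only):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
# Hypothetical data: outcome depending on two correlated predictors.
n = 120
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(scale=0.8, size=n)
y = 2.0 + 1.5 * x1 - 0.7 * x2 + rng.normal(scale=1.0, size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
model = sm.OLS(y, X).fit()
print(model.params)        # estimated regression coefficients
print(model.conf_int())    # exact 95% confidence intervals for the coefficients
```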
The role of familiarity in binary choice inferences.
Honda, Hidehito; Abe, Keiga; Matsuka, Toshihiko; Yamagishi, Kimihiko
2011-07-01
In research on the recognition heuristic (Goldstein & Gigerenzer, Psychological Review, 109, 75-90, 2002), knowledge of recognized objects has been categorized as "recognized" or "unrecognized" without regard to the degree of familiarity of the recognized object. In the present article, we propose a new inference model--familiarity-based inference. We hypothesize that when subjective knowledge levels (familiarity) of recognized objects differ, the degree of familiarity of recognized objects will influence inferences. Specifically, people are predicted to infer that the more familiar object in a pair of two objects has a higher criterion value on the to-be-judged dimension. In two experiments, using a binary choice task, we examined inferences about populations in a pair of two cities. Results support predictions of familiarity-based inference. Participants inferred that the more familiar city in a pair was more populous. Statistical modeling showed that individual differences in familiarity-based inference lie in the sensitivity to differences in familiarity. In addition, we found that familiarity-based inference can be generally regarded as an ecologically rational inference. Furthermore, when cue knowledge about the inference criterion was available, participants made inferences based on the cue knowledge about population instead of familiarity. Implications of the role of familiarity in psychological processes are discussed.
Statistical inference methods for sparse biological time series data.
Ndukum, Juliet; Fonseca, Luís L; Santos, Helena; Voit, Eberhard O; Datta, Susmita
2011-04-25
Comparing metabolic profiles under different biological perturbations has become a powerful approach to investigating the functioning of cells. The profiles can be taken as single snapshots of a system, but more information is gained if they are measured longitudinally over time. The results are short time series consisting of relatively sparse data that cannot be analyzed effectively with standard time series techniques, such as autocorrelation and frequency domain methods. In this work, we study longitudinal time series profiles of glucose consumption in the yeast Saccharomyces cerevisiae under different temperatures and preconditioning regimens, which we obtained with methods of in vivo nuclear magnetic resonance (NMR) spectroscopy. For the statistical analysis we first fit several nonlinear mixed effect regression models to the longitudinal profiles and then used an ANOVA likelihood ratio method in order to test for significant differences between the profiles. The proposed methods are capable of distinguishing metabolic time trends resulting from different treatments and associate significance levels to these differences. Among several nonlinear mixed-effects regression models tested, a three-parameter logistic function represents the data with highest accuracy. ANOVA and likelihood ratio tests suggest that there are significant differences between the glucose consumption rate profiles for cells that had been--or had not been--preconditioned by heat during growth. Furthermore, pair-wise t-tests reveal significant differences in the longitudinal profiles for glucose consumption rates between optimal conditions and heat stress, optimal and recovery conditions, and heat stress and recovery conditions (p-values <0.0001). We have developed a nonlinear mixed effects model that is appropriate for the analysis of sparse metabolic and physiological time profiles. The model permits sound statistical inference procedures, based on ANOVA likelihood ratio tests, for testing the significance of differences between short time course data under different biological perturbations.
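A minimal sketch of the comparison step: fit a three-parameter logistic to two short time-course profiles and compare a pooled fit against separate fits with a likelihood-ratio test. This is a fixed-effects simplification of the mixed-effects analysis described in the abstract; the simulated profiles and parameter values are assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import chi2

def logistic3(t, a, b, c):
    return a / (1.0 + np.exp(-b * (t - c)))

rng = np.random.default_rng(2)
t = np.linspace(0, 30, 12)                                     # sparse time points
y1 = logistic3(t, 10, 0.4, 12) + rng.normal(0, 0.4, t.size)    # condition 1
y2 = logistic3(t, 10, 0.3, 16) + rng.normal(0, 0.4, t.size)    # condition 2

# Null model: one common curve for the pooled data.
t_all, y_all = np.concatenate([t, t]), np.concatenate([y1, y2])
p0 = [10, 0.3, 14]
popt0, _ = curve_fit(logistic3, t_all, y_all, p0=p0)
rss0 = np.sum((y_all - logistic3(t_all, *popt0)) ** 2)

# Alternative model: separate curves per condition.
popt1, _ = curve_fit(logistic3, t, y1, p0=p0)
popt2, _ = curve_fit(logistic3, t, y2, p0=p0)
rss1 = np.sum((y1 - logistic3(t, *popt1)) ** 2) + np.sum((y2 - logistic3(t, *popt2)) ** 2)

# Likelihood-ratio statistic for Gaussian errors; df = 3 extra parameters.
n = y_all.size
lr = n * np.log(rss0 / rss1)
p_value = chi2.sf(lr, df=3)
print(f"LR = {lr:.2f}, p = {p_value:.4f}")
```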
NASA Astrophysics Data System (ADS)
Fatichi, S.; Ivanov, V. Y.; Caporali, E.
2013-04-01
This study extends a stochastic downscaling methodology to generation of an ensemble of hourly time series of meteorological variables that express possible future climate conditions at a point-scale. The stochastic downscaling uses general circulation model (GCM) realizations and an hourly weather generator, the Advanced WEather GENerator (AWE-GEN). Marginal distributions of factors of change are computed for several climate statistics using a Bayesian methodology that can weight GCM realizations based on the model relative performance with respect to a historical climate and a degree of disagreement in projecting future conditions. A Monte Carlo technique is used to sample the factors of change from their respective marginal distributions. As a comparison with traditional approaches, factors of change are also estimated by averaging GCM realizations. With either approach, the derived factors of change are applied to the climate statistics inferred from historical observations to re-evaluate parameters of the weather generator. The re-parameterized generator yields hourly time series of meteorological variables that can be considered to be representative of future climate conditions. In this study, the time series are generated in an ensemble mode to fully reflect the uncertainty of GCM projections, climate stochasticity, as well as uncertainties of the downscaling procedure. Applications of the methodology in reproducing future climate conditions for the periods of 2000-2009, 2046-2065 and 2081-2100, using the period of 1962-1992 as the historical baseline are discussed for the location of Firenze (Italy). The inferences of the methodology for the period of 2000-2009 are tested against observations to assess reliability of the stochastic downscaling procedure in reproducing statistics of meteorological variables at different time scales.
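An illustrative sketch of the "factor of change" step only: multiplicative change factors for one climate statistic are derived from several GCM realizations, sampled in a Monte Carlo fashion, and applied to the observed statistic. The numbers are invented, and the real procedure additionally weights GCMs with a Bayesian methodology and re-parameterizes a weather generator with the perturbed statistics.

```python
import numpy as np

rng = np.random.default_rng(3)

obs_mean_precip = 2.4                              # observed statistic (mm/day), assumed
gcm_hist = np.array([2.1, 2.6, 2.3, 2.8, 2.5])     # GCM control-period means (assumed)
gcm_fut  = np.array([2.4, 2.7, 2.6, 3.3, 2.9])     # GCM future-period means (assumed)

factors = gcm_fut / gcm_hist                        # factors of change per GCM

# Monte Carlo sampling from a distribution fitted to the factors
# (a simple normal approximation here, rather than Bayesian-weighted marginals).
samples = rng.normal(factors.mean(), factors.std(ddof=1), size=10_000)
future_mean = obs_mean_precip * samples

lo, hi = np.percentile(future_mean, [5, 95])
print(f"projected mean precipitation: {future_mean.mean():.2f} mm/day "
      f"(90% interval {lo:.2f}-{hi:.2f})")
```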
NASA Technical Reports Server (NTRS)
Matney, M.; Barker, E.; Seitzer, P.; Abercromby, K. J.; Rodriquez, H. M.
2006-01-01
NASA's Orbital Debris measurements program has a goal to characterize the small debris environment in the geosynchronous Earth-orbit (GEO) region using optical telescopes ("small" refers to objects too small to catalog and track with current systems). Traditionally, observations of GEO and near-GEO objects involve following the object with the telescope long enough to obtain an orbit suitable for tracking purposes. Telescopes operating in survey mode, however, randomly observe objects that pass through their field of view. Typically, these short-arc observations are inadequate to obtain detailed orbits, but can be used to estimate approximate circular orbit elements (semimajor axis, inclination, and ascending node). From this information, it should be possible to make statistical inferences about the orbital distributions of the GEO population bright enough to be observed by the system. The Michigan Orbital Debris Survey Telescope (MODEST) has been making such statistical surveys of the GEO region for four years. During that time, the telescope has made enough observations in enough areas of the GEO belt to have had nearly complete coverage. That means that almost all objects in all possible orbits in the GEO and near-GEO region had a non-zero chance of being observed. Some regions (such as those near zero inclination) have had good coverage, while others are poorly covered. Nevertheless, it is possible to remove these statistical biases and reconstruct the orbit populations within the limits of sampling error. In this paper, these statistical techniques and assumptions are described, and the techniques are applied to the current MODEST data set to arrive at our best estimate of the GEO orbit population distribution.
Rich analysis and rational models: Inferring individual behavior from infant looking data
Piantadosi, Steven T.; Kidd, Celeste; Aslin, Richard
2013-01-01
Studies of infant looking times over the past 50 years have provided profound insights about cognitive development, but their dependent measures and analytic techniques are quite limited. In the context of infants' attention to discrete sequential events, we show how a Bayesian data analysis approach can be combined with a rational cognitive model to create a rich data analysis framework for infant looking times. We formalize (i) a statistical learning model (ii) a parametric linking between the learning model's beliefs and infants' looking behavior, and (iii) a data analysis model that infers parameters of the cognitive model and linking function for groups and individuals. Using this approach, we show that recent findings from Kidd, Piantadosi, and Aslin (2012) of a U-shaped relationship between look-away probability and stimulus complexity even holds within infants and is not due to averaging subjects with different types of behavior. Our results indicate that individual infants prefer stimuli of intermediate complexity, reserving attention for events that are moderately predictable given their probabilistic expectations about the world. PMID:24750256
Rich analysis and rational models: inferring individual behavior from infant looking data.
Piantadosi, Steven T; Kidd, Celeste; Aslin, Richard
2014-05-01
Studies of infant looking times over the past 50 years have provided profound insights about cognitive development, but their dependent measures and analytic techniques are quite limited. In the context of infants' attention to discrete sequential events, we show how a Bayesian data analysis approach can be combined with a rational cognitive model to create a rich data analysis framework for infant looking times. We formalize (i) a statistical learning model, (ii) a parametric linking between the learning model's beliefs and infants' looking behavior, and (iii) a data analysis model that infers parameters of the cognitive model and linking function for groups and individuals. Using this approach, we show that recent findings from Kidd, Piantadosi and Aslin (2012) of a U-shaped relationship between look-away probability and stimulus complexity even holds within infants and is not due to averaging subjects with different types of behavior. Our results indicate that individual infants prefer stimuli of intermediate complexity, reserving attention for events that are moderately predictable given their probabilistic expectations about the world. © 2014 John Wiley & Sons Ltd.
Comparison of two correlated ROC curves at a given specificity or sensitivity level
Bantis, Leonidas E.; Feng, Ziding
2017-01-01
The receiver operating characteristic (ROC) curve is the most popular statistical tool for evaluating the discriminatory capability of a given continuous biomarker. The need to compare two correlated ROC curves arises when individuals are measured with two biomarkers, which induces paired and thus correlated measurements. Many researchers have focused on comparing two correlated ROC curves in terms of the area under the curve (AUC), which summarizes the overall performance of the marker. However, particular values of specificity may be of interest. We focus on comparing two correlated ROC curves at a given specificity level. We propose parametric approaches, transformations to normality, and nonparametric kernel-based approaches. Our methods can be straightforwardly extended for inference in terms of ROC^-1(t). This is of particular interest for comparing the accuracy of two correlated biomarkers at a given sensitivity level. Extensions also involve inference for the AUC and accommodating covariates. We evaluate the robustness of our techniques through simulations, compare them with other known approaches, and present a real data application involving prostate cancer screening. PMID:27324068
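A nonparametric sketch of the basic comparison: estimate each marker's sensitivity at a fixed specificity from an empirical threshold, then bootstrap subjects (keeping the paired measurements intact) to get an interval for the difference. This is a simplified empirical analogue of the approaches in the abstract; the simulated correlated markers are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n_cases, n_controls = 100, 120
# Two correlated markers measured on the same individuals (assumed model).
cases = rng.multivariate_normal([1.2, 0.9], [[1, 0.6], [0.6, 1]], n_cases)
controls = rng.multivariate_normal([0.0, 0.0], [[1, 0.6], [0.6, 1]], n_controls)

def sens_at_spec(case_vals, control_vals, spec=0.9):
    threshold = np.quantile(control_vals, spec)    # empirical threshold at given specificity
    return np.mean(case_vals > threshold)

obs_diff = (sens_at_spec(cases[:, 0], controls[:, 0])
            - sens_at_spec(cases[:, 1], controls[:, 1]))

boot_diffs = []
for _ in range(2000):
    ci = rng.integers(0, n_cases, n_cases)         # resample subjects, pairs stay together
    di = rng.integers(0, n_controls, n_controls)
    boot_diffs.append(sens_at_spec(cases[ci, 0], controls[di, 0])
                      - sens_at_spec(cases[ci, 1], controls[di, 1]))

lo, hi = np.percentile(boot_diffs, [2.5, 97.5])
print(f"sensitivity difference at 90% specificity: {obs_diff:.3f} (95% CI {lo:.3f}, {hi:.3f})")
```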
A Catalog of Distances to Molecular Clouds from Pan-STARRS1
NASA Astrophysics Data System (ADS)
Schlafly, Eddie; Green, G.; Finkbeiner, D. P.; Rix, H.
2014-01-01
We present a catalog of distances to molecular clouds, derived from PanSTARRS-1 photometry. We simultaneously infer the full probability distribution function of reddening and distance of the stars towards these clouds using the technique of Green et al. (2013) (see neighboring poster). We fit the resulting measurements using a simple dust screen model to infer the distance to each cloud. The result is a large, homogeneous catalog of distances to molecular clouds. For clouds with heliocentric distances greater than about 200 pc, typical statistical uncertainties in the distances are 5%, with systematic uncertainty stemming from the quality of our stellar models of about 10%. We have applied this analysis to many of the most well-studied clouds in the δ > -30° sky, including Orion, California, Taurus, Perseus, and Cepheus. We have also studied the entire catalog of Magnani, Blitz, and Mundy (1985; MBM), though for about half of those clouds we can provide only upper limits on the distances. We compare our distances with distances from the literature, when available, and find good agreement.
The logical foundations of forensic science: towards reliable knowledge.
Evett, Ian
2015-08-05
The generation of observations is a technical process and the advances that have been made in forensic science techniques over the last 50 years have been staggering. But science is about reasoning: making sense of observations. For the forensic scientist, this is the challenge of interpreting a pattern of observations within the context of a legal trial. Here too, there have been major advances over recent years and there is a broad consensus among serious thinkers, both scientific and legal, that the logical framework is furnished by Bayesian inference (Aitken et al. Fundamentals of Probability and Statistical Evidence in Criminal Proceedings). This paper shows how the paradigm has matured, centred on the notion of the balanced scientist. Progress through the courts has not always been smooth and difficulties arising from recent judgments are discussed. Nevertheless, the future holds exciting prospects, in particular the opportunities for managing and calibrating the knowledge of the forensic scientists who assign the probabilities that are at the foundation of logical inference in the courtroom. © 2015 The Author(s) Published by the Royal Society. All rights reserved.
NASA Astrophysics Data System (ADS)
Kim, Junhan; Marrone, Daniel P.; Chan, Chi-Kwan; Medeiros, Lia; Özel, Feryal; Psaltis, Dimitrios
2016-12-01
The Event Horizon Telescope (EHT) is a millimeter-wavelength, very-long-baseline interferometry (VLBI) experiment that is capable of observing black holes with horizon-scale resolution. Early observations have revealed variable horizon-scale emission in the Galactic Center black hole, Sagittarius A* (Sgr A*). Comparing such observations to time-dependent general relativistic magnetohydrodynamic (GRMHD) simulations requires statistical tools that explicitly consider the variability in both the data and the models. We develop here a Bayesian method to compare time-resolved simulation images to variable VLBI data, in order to infer model parameters and perform model comparisons. We use mock EHT data based on GRMHD simulations to explore the robustness of this Bayesian method and contrast it to approaches that do not consider the effects of variability. We find that time-independent models lead to offset values of the inferred parameters with artificially reduced uncertainties. Moreover, neglecting the variability in the data and the models often leads to erroneous model selections. We finally apply our method to the early EHT data on Sgr A*.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, Junhan; Marrone, Daniel P.; Chan, Chi-Kwan
2016-12-01
The Event Horizon Telescope (EHT) is a millimeter-wavelength, very-long-baseline interferometry (VLBI) experiment that is capable of observing black holes with horizon-scale resolution. Early observations have revealed variable horizon-scale emission in the Galactic Center black hole, Sagittarius A* (Sgr A*). Comparing such observations to time-dependent general relativistic magnetohydrodynamic (GRMHD) simulations requires statistical tools that explicitly consider the variability in both the data and the models. We develop here a Bayesian method to compare time-resolved simulation images to variable VLBI data, in order to infer model parameters and perform model comparisons. We use mock EHT data based on GRMHD simulations to explore the robustness of this Bayesian method and contrast it to approaches that do not consider the effects of variability. We find that time-independent models lead to offset values of the inferred parameters with artificially reduced uncertainties. Moreover, neglecting the variability in the data and the models often leads to erroneous model selections. We finally apply our method to the early EHT data on Sgr A*.
Kwak, Sehyun; Svensson, J; Brix, M; Ghim, Y-C
2016-02-01
A Bayesian model of the emission spectrum of the JET lithium beam has been developed to infer the intensity of the Li I (2p-2s) line radiation and associated uncertainties. The detected spectrum for each channel of the lithium beam emission spectroscopy system is here modelled by a single Li line modified by an instrumental function, Bremsstrahlung background, instrumental offset, and interference filter curve. Both the instrumental function and the interference filter curve are modelled with non-parametric Gaussian processes. All free parameters of the model, the intensities of the Li line, Bremsstrahlung background, and instrumental offset, are inferred using Bayesian probability theory with a Gaussian likelihood for photon statistics and electronic background noise. The prior distributions of the free parameters are chosen as Gaussians. Given these assumptions, the intensity of the Li line and corresponding uncertainties are analytically available using a Bayesian linear inversion technique. The proposed approach makes it possible to extract the intensity of Li line without doing a separate background subtraction through modulation of the Li beam.
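A minimal sketch of Bayesian linear inversion with a Gaussian prior and Gaussian likelihood, for which the posterior mean and covariance are available in closed form. The forward matrix (a Gaussian line profile, a smooth background shape, and a constant offset), the noise level, and the prior widths below are invented stand-ins for the instrumental model described in the abstract.

```python
import numpy as np

rng = np.random.default_rng(5)

n_pix, n_par = 60, 3                     # detector pixels; free intensities (line, background, offset)
A = np.column_stack([
    np.exp(-0.5 * ((np.arange(n_pix) - 30) / 3.0) ** 2),   # assumed Li line shape
    np.linspace(0.8, 1.2, n_pix),                          # Bremsstrahlung-like background shape
    np.ones(n_pix),                                        # constant instrumental offset
])

true_x = np.array([50.0, 5.0, 2.0])
noise_sd = 1.0
y = A @ true_x + rng.normal(0, noise_sd, n_pix)            # simulated spectrum

# Gaussian prior on the parameters and Gaussian likelihood => Gaussian posterior.
prior_mean = np.zeros(n_par)
prior_cov = np.diag([1e4, 1e4, 1e4])                       # weakly informative
noise_cov_inv = np.eye(n_pix) / noise_sd ** 2

post_cov = np.linalg.inv(A.T @ noise_cov_inv @ A + np.linalg.inv(prior_cov))
post_mean = post_cov @ (A.T @ noise_cov_inv @ y + np.linalg.inv(prior_cov) @ prior_mean)

for name, m, s in zip(["Li line", "background", "offset"], post_mean, np.sqrt(np.diag(post_cov))):
    print(f"{name:10s} {m:7.2f} +/- {s:.2f}")
```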
Statistical modeling of software reliability
NASA Technical Reports Server (NTRS)
Miller, Douglas R.
1992-01-01
This working paper discusses the statistical simulation part of a controlled software development experiment being conducted under the direction of the System Validation Methods Branch, Information Systems Division, NASA Langley Research Center. The experiment uses guidance and control software (GCS) aboard a fictitious planetary landing spacecraft: real-time control software operating on a transient mission. Software execution is simulated to study the statistical aspects of reliability and other failure characteristics of the software during development, testing, and random usage. Quantification of software reliability is a major goal. Various reliability concepts are discussed. Experiments are described for performing simulations and collecting appropriate simulated software performance and failure data. This data is then used to make statistical inferences about the quality of the software development and verification processes as well as inferences about the reliability of software versions and reliability growth under random testing and debugging.
Quantum-Like Representation of Non-Bayesian Inference
NASA Astrophysics Data System (ADS)
Asano, M.; Basieva, I.; Khrennikov, A.; Ohya, M.; Tanaka, Y.
2013-01-01
This research is related to the problem of "irrational decision making or inference" that has been discussed in cognitive psychology. There are some experimental studies, and these statistical data cannot be described by classical probability theory. The process of decision making generating these data cannot be reduced to the classical Bayesian inference. For this problem, a number of quantum-like cognitive models of decision making have been proposed. Our previous work represented in a natural way the classical Bayesian inference in the framework of quantum mechanics. By using this representation, in this paper, we try to discuss the non-Bayesian (irrational) inference that is biased by effects like the quantum interference. Further, we describe the "psychological factor" disturbing "rationality" as an "environment" correlating with the "main system" of usual Bayesian inference.
Self-Inference and the Foot-in-the-Door Technique: Quantity of Behavior and Attitudinal Mediation.
ERIC Educational Resources Information Center
Dillard, James Price
1990-01-01
Investigates the self-perception theory account of the foot-in-the-door (FITD) phenomenon, a sequential-request technique using a small first request followed by a larger, target request. Finds that a self-inference explanation is viable, but that a strict self-perception account fails because neither request size nor execution correspond to…
A simple method for processing data with least square method
NASA Astrophysics Data System (ADS)
Wang, Chunyan; Qi, Liqun; Chen, Yongxiang; Pang, Guangning
2017-08-01
The least square method is widely used in data processing and error estimation. It has become an essential technique for parameter estimation, data processing, regression analysis and experimental data fitting, and a standard tool for statistical inference. In measurement data analysis, the treatment of complex distributions is usually based on the least square principle, i.e., matrices are used to obtain the final estimate and to improve its accuracy. In this paper, a new approach to the least squares solution is presented that is based on algebraic computation and is relatively straightforward and easy to understand. The practicability of this method is illustrated with a concrete example.
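A concrete example of the least squares principle for a straight-line fit, solved directly from the normal equations. The measurement values are invented for illustration and are not from the paper.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])       # hypothetical measurements

A = np.column_stack([x, np.ones_like(x)])      # model y = a*x + b
# Normal equations: (A^T A) p = A^T y
p = np.linalg.solve(A.T @ A, A.T @ y)
residuals = y - A @ p
s = np.sqrt(residuals @ residuals / (len(x) - 2))   # standard deviation of the fit

print(f"slope a = {p[0]:.3f}, intercept b = {p[1]:.3f}, residual sd = {s:.3f}")
```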
The basis function approach for modeling autocorrelation in ecological data
Hefley, Trevor J.; Broms, Kristin M.; Brost, Brian M.; Buderman, Frances E.; Kay, Shannon L.; Scharf, Henry; Tipton, John; Williams, Perry J.; Hooten, Mevin B.
2017-01-01
Analyzing ecological data often requires modeling the autocorrelation created by spatial and temporal processes. Many seemingly disparate statistical methods used to account for autocorrelation can be expressed as regression models that include basis functions. Basis functions also enable ecologists to modify a wide range of existing ecological models in order to account for autocorrelation, which can improve inference and predictive accuracy. Furthermore, understanding the properties of basis functions is essential for evaluating the fit of spatial or time-series models, detecting a hidden form of collinearity, and analyzing large data sets. We present important concepts and properties related to basis functions and illustrate several tools and techniques ecologists can use when modeling autocorrelation in ecological data.
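A sketch of the basis-function idea: a small set of smooth temporal basis functions (low-order Fourier terms here) is added to an ordinary regression so that the basis expansion absorbs the autocorrelated trend. The simulated data and the choice of a Fourier basis are illustrative assumptions; splines or other bases work the same way.

```python
import numpy as np

rng = np.random.default_rng(6)
t = np.arange(100)
covariate = rng.normal(size=t.size)
# Simulated response with a smooth temporal trend (the source of autocorrelation).
y = 1.5 * covariate + np.sin(2 * np.pi * t / 50) + rng.normal(0, 0.5, t.size)

def fourier_basis(t, n_harmonics=3, period=100):
    cols = [np.ones_like(t, dtype=float)]
    for k in range(1, n_harmonics + 1):
        cols.append(np.sin(2 * np.pi * k * t / period))
        cols.append(np.cos(2 * np.pi * k * t / period))
    return np.column_stack(cols)

X = np.column_stack([covariate, fourier_basis(t)])   # covariate plus basis functions
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"estimated covariate effect: {beta[0]:.3f} (true value 1.5)")
```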
Why environmental scientists are becoming Bayesians
James S. Clark
2005-01-01
Advances in computational statistics provide a general framework for the high dimensional models typically needed for ecological inference and prediction. Hierarchical Bayes (HB) represents a modelling structure with capacity to exploit diverse sources of information, to accommodate influences that are unknown (or unknowable), and to draw inference on large numbers of...
ERIC Educational Resources Information Center
Meiser, Thorsten; Rummel, Jan; Fleig, Hanna
2018-01-01
Pseudocontingencies are inferences about correlations in the environment that are formed on the basis of statistical regularities like skewed base rates or varying base rates across environmental contexts. Previous research has demonstrated that pseudocontingencies provide a pervasive mechanism of inductive inference in numerous social judgment…
Cross-Situational Learning of Minimal Word Pairs
ERIC Educational Resources Information Center
Escudero, Paola; Mulak, Karen E.; Vlach, Haley A.
2016-01-01
"Cross-situational statistical learning" of words involves tracking co-occurrences of auditory words and objects across time to infer word-referent mappings. Previous research has demonstrated that learners can infer referents across sets of very phonologically distinct words (e.g., WUG, DAX), but it remains unknown whether learners can…
NASA Astrophysics Data System (ADS)
Mullan, Donal; Chen, Jie; Zhang, Xunchang John
2016-02-01
Statistical downscaling (SD) methods have become a popular, low-cost and accessible means of bridging the gap between the coarse spatial resolution at which climate models output climate scenarios and the finer spatial scale at which impact modellers require these scenarios, with various different SD techniques used for a wide range of applications across the world. This paper compares the Generator for Point Climate Change (GPCC) model and the Statistical DownScaling Model (SDSM), two contrasting SD methods, in terms of their ability to generate precipitation series under non-stationary conditions across ten contrasting global climates. The mean, maximum and a selection of distribution statistics as well as the cumulative frequencies of dry and wet spells for four different temporal resolutions were compared between the models and the observed series for a validation period. Results indicate that both methods can generate daily precipitation series that generally closely mirror observed series for a wide range of non-stationary climates. However, GPCC tends to overestimate higher precipitation amounts, whilst SDSM tends to underestimate these. This implies that GPCC is more likely to overestimate the effects of precipitation on a given impact sector, whilst SDSM is likely to underestimate the effects. GPCC performs better than SDSM in reproducing wet and dry day frequency, which is a key advantage for many impact sectors. Overall, the mixed performance of the two methods illustrates the importance of users performing a thorough validation in order to determine the influence of simulated precipitation on their chosen impact sector.
Accounting for measurement error: a critical but often overlooked process.
Harris, Edward F; Smith, Richard N
2009-12-01
Due to instrument imprecision and human inconsistencies, measurements are not free of error. Technical error of measurement (TEM) is the variability encountered between dimensions when the same specimens are measured at multiple sessions. A goal of a data collection regimen is to minimise TEM. The few studies that actually quantify TEM, regardless of discipline, report that it is substantial and can affect results and inferences. This paper reviews some statistical approaches for identifying and controlling TEM. Statistically, TEM is part of the residual ('unexplained') variance in a statistical test, so accounting for TEM, which requires repeated measurements, enhances the chances of finding a statistically significant difference if one exists. The aim of this paper was to review and discuss common statistical designs relating to types of error and statistical approaches to error accountability. This paper addresses issues of landmark location, validity, technical and systematic error, analysis of variance, scaled measures and correlation coefficients in order to guide the reader towards correct identification of true experimental differences. Researchers commonly infer characteristics about populations from comparatively restricted study samples. Most inferences are statistical and, aside from concerns about adequate accounting for known sources of variation with the research design, an important source of variability is measurement error. Variability in locating landmarks that define variables is obvious in odontometrics, cephalometrics and anthropometry, but the same concerns about measurement accuracy and precision extend to all disciplines. With increasing accessibility to computer-assisted methods of data collection, the ease of incorporating repeated measures into statistical designs has improved. Accounting for this technical source of variation increases the chance of finding biologically true differences when they exist.
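A short worked example of the standard two-session technical error of measurement, TEM = sqrt(sum(d_i^2) / (2N)), together with the relative TEM expressed as a percentage of the mean. The repeated measurements below are invented.

```python
import numpy as np

session1 = np.array([31.2, 29.8, 33.1, 30.5, 32.0, 28.9])   # e.g., a cephalometric dimension (mm)
session2 = np.array([31.5, 29.6, 32.7, 30.9, 31.8, 29.2])   # same specimens remeasured

d = session1 - session2
tem = np.sqrt(np.sum(d ** 2) / (2 * d.size))
rel_tem = 100 * tem / np.mean(np.concatenate([session1, session2]))

print(f"TEM = {tem:.2f} mm, relative TEM = {rel_tem:.1f}%")
```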
Sumner, Jeremy G; Taylor, Amelia; Holland, Barbara R; Jarvis, Peter D
2017-12-01
Recently there has been renewed interest in phylogenetic inference methods based on phylogenetic invariants, alongside the related Markov invariants. Broadly speaking, both these approaches give rise to polynomial functions of sequence site patterns that, in expectation value, either vanish for particular evolutionary trees (in the case of phylogenetic invariants) or have well understood transformation properties (in the case of Markov invariants). While both approaches have been valued for their intrinsic mathematical interest, it is not clear how they relate to each other, and to what extent they can be used as practical tools for inference of phylogenetic trees. In this paper, by focusing on the special case of binary sequence data and quartets of taxa, we are able to view these two different polynomial-based approaches within a common framework. To motivate the discussion, we present three desirable statistical properties that we argue any invariant-based phylogenetic method should satisfy: (1) sensible behaviour under reordering of input sequences; (2) stability as the taxa evolve independently according to a Markov process; and (3) explicit dependence on the assumption of a continuous-time process. Motivated by these statistical properties, we develop and explore several new phylogenetic inference methods. In particular, we develop a statistically bias-corrected version of the Markov invariants approach which satisfies all three properties. We also extend previous work by showing that the phylogenetic invariants can be implemented in such a way as to satisfy property (3). A simulation study shows that, in comparison to other methods, our new proposed approach based on bias-corrected Markov invariants is extremely powerful for phylogenetic inference. The binary case is of particular theoretical interest as, in this case only, the Markov invariants can be expressed as linear combinations of the phylogenetic invariants. A wider implication of this is that, for models with more than two states (for example, DNA sequence alignments with four-state models) we find that methods which rely on phylogenetic invariants are incapable of satisfying all three of the stated statistical properties. This is because in these cases the relevant Markov invariants belong to a class of polynomials independent from the phylogenetic invariants.
Imputation approaches for animal movement modeling
Scharf, Henry; Hooten, Mevin B.; Johnson, Devin S.
2017-01-01
The analysis of telemetry data is common in animal ecological studies. While the collection of telemetry data for individual animals has improved dramatically, the methods to properly account for inherent uncertainties (e.g., measurement error, dependence, barriers to movement) have lagged behind. Still, many new statistical approaches have been developed to infer unknown quantities affecting animal movement or predict movement based on telemetry data. Hierarchical statistical models are useful to account for some of the aforementioned uncertainties, as well as provide population-level inference, but they often come with an increased computational burden. For certain types of statistical models, it is straightforward to provide inference if the latent true animal trajectory is known, but challenging otherwise. In these cases, approaches related to multiple imputation have been employed to account for the uncertainty associated with our knowledge of the latent trajectory. Despite the increasing use of imputation approaches for modeling animal movement, the general sensitivity and accuracy of these methods have not been explored in detail. We provide an introduction to animal movement modeling and describe how imputation approaches may be helpful for certain types of models. We also assess the performance of imputation approaches in two simulation studies. Our simulation studies suggest that inference for model parameters directly related to the location of an individual may be more accurate than inference for parameters associated with higher-order processes such as velocity or acceleration. Finally, we apply these methods to analyze a telemetry data set involving northern fur seals (Callorhinus ursinus) in the Bering Sea. Supplementary materials accompanying this paper appear online.
Using a Five-Step Procedure for Inferential Statistical Analyses
ERIC Educational Resources Information Center
Kamin, Lawrence F.
2010-01-01
Many statistics texts pose inferential statistical problems in a disjointed way. By using a simple five-step procedure as a template for statistical inference problems, the student can solve problems in an organized fashion. The problem and its solution will thus be a stand-by-itself organic whole and a single unit of thought and effort. The…
Social Inferences from Faces: Ambient Images Generate a Three-Dimensional Model
ERIC Educational Resources Information Center
Sutherland, Clare A. M.; Oldmeadow, Julian A.; Santos, Isabel M.; Towler, John; Burt, D. Michael; Young, Andrew W.
2013-01-01
Three experiments are presented that investigate the two-dimensional valence/trustworthiness by dominance model of social inferences from faces (Oosterhof & Todorov, 2008). Experiment 1 used image averaging and morphing techniques to demonstrate that consistent facial cues subserve a range of social inferences, even in a highly variable sample of…
Design-based and model-based inference in surveys of freshwater mollusks
Dorazio, R.M.
1999-01-01
Well-known concepts in statistical inference and sampling theory are used to develop recommendations for planning and analyzing the results of quantitative surveys of freshwater mollusks. Two methods of inference commonly used in survey sampling (design-based and model-based) are described and illustrated using examples relevant in surveys of freshwater mollusks. The particular objectives of a survey and the type of information observed in each unit of sampling can be used to help select the sampling design and the method of inference. For example, the mean density of a sparsely distributed population of mollusks can be estimated with higher precision by using model-based inference or by using design-based inference with adaptive cluster sampling than by using design-based inference with conventional sampling. More experience with quantitative surveys of natural assemblages of freshwater mollusks is needed to determine the actual benefits of different sampling designs and inferential procedures.
Bayesian inference for joint modelling of longitudinal continuous, binary and ordinal events.
Li, Qiuju; Pan, Jianxin; Belcher, John
2016-12-01
In medical studies, repeated measurements of continuous, binary and ordinal outcomes are routinely collected from the same patient. Instead of modelling each outcome separately, in this study we propose to jointly model the trivariate longitudinal responses, so as to take account of the inherent association between the different outcomes and thus improve statistical inferences. This work is motivated by a large cohort study in the North West of England, involving trivariate responses from each patient: Body Mass Index, Depression (Yes/No) ascertained with cut-off score not less than 8 at the Hospital Anxiety and Depression Scale, and Pain Interference generated from the Medical Outcomes Study 36-item short-form health survey with values returned on an ordinal scale 1-5. There are some well-established methods for combined continuous and binary, or even continuous and ordinal responses, but little work was done on the joint analysis of continuous, binary and ordinal responses. We propose conditional joint random-effects models, which take into account the inherent association between the continuous, binary and ordinal outcomes. Bayesian analysis methods are used to make statistical inferences. Simulation studies show that, by jointly modelling the trivariate outcomes, standard deviations of the estimates of parameters in the models are smaller and much more stable, leading to more efficient parameter estimates and reliable statistical inferences. In the real data analysis, the proposed joint analysis yields a much smaller deviance information criterion value than the separate analysis, and shows other good statistical properties too. © The Author(s) 2014.
Cocco, Simona; Leibler, Stanislas; Monasson, Rémi
2009-01-01
Complexity of neural systems often makes impracticable explicit measurements of all interactions between their constituents. Inverse statistical physics approaches, which infer effective couplings between neurons from their spiking activity, have been so far hindered by their computational complexity. Here, we present 2 complementary, computationally efficient inverse algorithms based on the Ising and “leaky integrate-and-fire” models. We apply those algorithms to reanalyze multielectrode recordings in the salamander retina in darkness and under random visual stimulus. We find strong positive couplings between nearby ganglion cells common to both stimuli, whereas long-range couplings appear under random stimulus only. The uncertainty on the inferred couplings due to limitations in the recordings (duration, small area covered on the retina) is discussed. Our methods will allow real-time evaluation of couplings for large assemblies of neurons. PMID:19666487
Confidence crisis of results in biomechanics research.
Knudson, Duane
2017-11-01
Many biomechanics studies have small sample sizes and incorrect statistical analyses, so reporting of inaccurate inferences and inflated magnitude of effects are common in the field. This review examines these issues in biomechanics research and summarises potential solutions from research in other fields to increase the confidence in the experimental effects reported in biomechanics. Authors, reviewers and editors of biomechanics research reports are encouraged to improve sample sizes and the resulting statistical power, improve reporting transparency, improve the rigour of statistical analyses used, and increase the acceptance of replication studies to improve the validity of inferences from data in biomechanics research. The application of sports biomechanics research results would also improve if a larger percentage of unbiased effects and their uncertainty were reported in the literature.
The Empirical Nature and Statistical Treatment of Missing Data
ERIC Educational Resources Information Center
Tannenbaum, Christyn E.
2009-01-01
Introduction. Missing data is a common problem in research and can produce severely misleading analyses, including biased estimates of statistical parameters, and erroneous conclusions. In its 1999 report, the APA Task Force on Statistical Inference encouraged authors to report complications such as missing data and discouraged the use of…
Cognitive Transfer Outcomes for a Simulation-Based Introductory Statistics Curriculum
ERIC Educational Resources Information Center
Backman, Matthew D.; Delmas, Robert C.; Garfield, Joan
2017-01-01
Cognitive transfer is the ability to apply learned skills and knowledge to new applications and contexts. This investigation evaluates cognitive transfer outcomes for a tertiary-level introductory statistics course using the CATALST curriculum, which exclusively used simulation-based methods to develop foundations of statistical inference. A…
A statistical method for lung tumor segmentation uncertainty in PET images based on user inference.
Zheng, Chaojie; Wang, Xiuying; Feng, Dagan
2015-01-01
PET has been widely accepted as an effective imaging modality for lung tumor diagnosis and treatment. However, standard criteria for delineating tumor boundaries from PET have yet to be developed, largely due to the relatively low quality of PET images, uncertain tumor boundary definition, and the variety of tumor characteristics. In this paper, we propose a statistical solution to segmentation uncertainty on the basis of user inference. We first define the uncertainty segmentation band on the basis of a segmentation probability map constructed from the Random Walks (RW) algorithm; then, based on the extracted features of the user inference, we use Principal Component Analysis (PCA) to formulate the statistical model for labeling the uncertainty band. We validated our method on 10 lung PET-CT phantom studies from the public RIDER collections [1] and 16 clinical PET studies where tumors were manually delineated by two experienced radiologists. The methods were validated using the Dice similarity coefficient (DSC) to measure the spatial volume overlap. Our method achieved an average DSC of 0.878 ± 0.078 on phantom studies and 0.835 ± 0.039 on clinical studies.
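A minimal sketch of the validation metric used above, the Dice similarity coefficient DSC = 2|A ∩ B| / (|A| + |B|) for two binary masks. The toy 2-D masks stand in for an automatic and a manual tumor delineation.

```python
import numpy as np

def dice(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks."""
    mask_a, mask_b = mask_a.astype(bool), mask_b.astype(bool)
    intersection = np.logical_and(mask_a, mask_b).sum()
    return 2.0 * intersection / (mask_a.sum() + mask_b.sum())

auto = np.zeros((10, 10), dtype=bool)
auto[2:7, 2:7] = True                      # hypothetical automatic segmentation
manual = np.zeros((10, 10), dtype=bool)
manual[3:8, 3:8] = True                    # hypothetical manual delineation

print(f"DSC = {dice(auto, manual):.3f}")
```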
Empirical evidence for acceleration-dependent amplification factors
Borcherdt, R.D.
2002-01-01
Site-specific amplification factors, Fa and Fv, used in current U.S. building codes decrease with increasing base acceleration level as implied by the Loma Prieta earthquake at 0.1g and extrapolated using numerical models and laboratory results. The Northridge earthquake recordings of 17 January 1994 and subsequent geotechnical data permit empirical estimates of amplification at base acceleration levels up to 0.5g. Distance measures and normalization procedures used to infer amplification ratios from soil-rock pairs in predetermined azimuth-distance bins significantly influence the dependence of amplification estimates on base acceleration. Factors inferred using a hypocentral distance norm do not show a statistically significant dependence on base acceleration. Factors inferred using norms implied by the attenuation functions of Abrahamson and Silva show a statistically significant decrease with increasing base acceleration. The decrease is statistically more significant for stiff clay and sandy soil (site class D) sites than for stiffer sites underlain by gravely soils and soft rock (site class C). The decrease in amplification with increasing base acceleration is more pronounced for the short-period amplification factor, Fa, than for the midperiod factor, Fv.
Karl Pearson and eugenics: personal opinions and scientific rigor.
Delzell, Darcie A P; Poliak, Cathy D
2013-09-01
The influence of personal opinions and biases on scientific conclusions is a threat to the advancement of knowledge. Expertise and experience do not render one immune to this temptation. In this work, one of the founding fathers of statistics, Karl Pearson, is used as an illustration of how even the most talented among us can produce misleading results when inferences are made without caution or reference to potential bias and other analysis limitations. A study performed by Pearson on British Jewish schoolchildren is examined in light of ethical and professional statistical practice. The methodology used and inferences made by Pearson and his coauthor are sometimes questionable and offer insight into how Pearson's support of eugenics and his own British nationalism could have potentially influenced his often careless and far-fetched inferences. A short background into Pearson's work and beliefs is provided, along with an in-depth examination of the authors' overall experimental design and statistical practices. In addition, portions of the study regarding intelligence and tuberculosis are discussed in more detail, along with historical reactions to their work.
In silico model-based inference: a contemporary approach for hypothesis testing in network biology
Klinke, David J.
2014-01-01
Inductive inference plays a central role in the study of biological systems where one aims to increase their understanding of the system by reasoning backwards from uncertain observations to identify causal relationships among components of the system. These causal relationships are postulated from prior knowledge as a hypothesis or simply a model. Experiments are designed to test the model. Inferential statistics are used to establish a level of confidence in how well our postulated model explains the acquired data. This iterative process, commonly referred to as the scientific method, either improves our confidence in a model or suggests that we revisit our prior knowledge to develop a new model. Advances in technology impact how we use prior knowledge and data to formulate models of biological networks and how we observe cellular behavior. However, the approach for model-based inference has remained largely unchanged since Fisher, Neyman and Pearson developed the ideas in the early 1900’s that gave rise to what is now known as classical statistical hypothesis (model) testing. Here, I will summarize conventional methods for model-based inference and suggest a contemporary approach to aid in our quest to discover how cells dynamically interpret and transmit information for therapeutic aims that integrates ideas drawn from high performance computing, Bayesian statistics, and chemical kinetics. PMID:25139179
In silico model-based inference: a contemporary approach for hypothesis testing in network biology.
Klinke, David J
2014-01-01
Inductive inference plays a central role in the study of biological systems where one aims to increase their understanding of the system by reasoning backwards from uncertain observations to identify causal relationships among components of the system. These causal relationships are postulated from prior knowledge as a hypothesis or simply a model. Experiments are designed to test the model. Inferential statistics are used to establish a level of confidence in how well our postulated model explains the acquired data. This iterative process, commonly referred to as the scientific method, either improves our confidence in a model or suggests that we revisit our prior knowledge to develop a new model. Advances in technology impact how we use prior knowledge and data to formulate models of biological networks and how we observe cellular behavior. However, the approach for model-based inference has remained largely unchanged since Fisher, Neyman and Pearson developed the ideas in the early 1900s that gave rise to what is now known as classical statistical hypothesis (model) testing. Here, I will summarize conventional methods for model-based inference and suggest a contemporary approach to aid in our quest to discover how cells dynamically interpret and transmit information for therapeutic aims that integrates ideas drawn from high performance computing, Bayesian statistics, and chemical kinetics. © 2014 American Institute of Chemical Engineers.
Spatio-temporal conditional inference and hypothesis tests for neural ensemble spiking precision
Harrison, Matthew T.; Amarasingham, Asohan; Truccolo, Wilson
2014-01-01
The collective dynamics of neural ensembles create complex spike patterns with many spatial and temporal scales. Understanding the statistical structure of these patterns can help resolve fundamental questions about neural computation and neural dynamics. Spatio-temporal conditional inference (STCI) is introduced here as a semiparametric statistical framework for investigating the nature of precise spiking patterns from collections of neurons that is robust to arbitrarily complex and nonstationary coarse spiking dynamics. The main idea is to focus statistical modeling and inference, not on the full distribution of the data, but rather on families of conditional distributions of precise spiking given different types of coarse spiking. The framework is then used to develop families of hypothesis tests for probing the spatio-temporal precision of spiking patterns. Relationships among different conditional distributions are used to improve multiple hypothesis testing adjustments and to design novel Monte Carlo spike resampling algorithms. Of special note are algorithms that can locally jitter spike times while still preserving the instantaneous peri-stimulus time histogram (PSTH) or the instantaneous total spike count from a group of recorded neurons. The framework can also be used to test whether first-order maximum entropy models with possibly random and time-varying parameters can account for observed patterns of spiking. STCI provides a detailed example of the generic principle of conditional inference, which may be applicable in other areas of neurostatistical analysis. PMID:25380339
Magnetic Footpoint Velocities: A Combination Of Minimum Energy Fit And Local Correlation Tracking
NASA Astrophysics Data System (ADS)
Belur, Ravindra; Longcope, D.
2006-06-01
Many numerical and time-dependent MHD simulations of the solar atmosphere require underlying velocity fields that are consistent with the induction equation. Recently, Longcope (2004) introduced a new technique to infer the photospheric velocity field from a sequence of vector magnetograms in agreement with the induction equation. The method, the Minimum Energy Fit (MEF), determines a set of velocities and selects the one with the smallest overall flow speed by minimizing an energy functional. The inferred velocity can be further constrained by information about the velocity inferred from other techniques. With this adopted technique we would expect the inferred velocity to be close to the photospheric velocity of magnetic footpoints. Here, we demonstrate that the horizontal velocities inferred from LCT can be used to constrain the MEF velocities. We also apply this technique to actual vector magnetogram sequences and compare these velocities with velocities from LCT alone. This work is supported by DoD MURI and NSF SHINE programs.
Drug target inference through pathway analysis of genomics data
Ma, Haisu; Zhao, Hongyu
2013-01-01
Statistical modeling coupled with bioinformatics is commonly used for drug discovery. Although there exist many approaches for single target based drug design and target inference, recent years have seen a paradigm shift to system-level pharmacological research. Pathway analysis of genomics data represents one promising direction for computational inference of drug targets. This article aims at providing a comprehensive review on the evolving issues in this field, covering methodological developments, their pros and cons, as well as future research directions. PMID:23369829
Watanabe, Hiroshi
2012-01-01
Procedures of statistical analysis are reviewed to provide an overview of applications of statistics for general use. Topics that are dealt with are inference on a population, comparison of two populations with respect to means and probabilities, and multiple comparisons. This study is the second part of a series in which we survey medical statistics. Arguments related to statistical associations and regressions will be made in subsequent papers.
PROBABILITY SAMPLING AND POPULATION INFERENCE IN MONITORING PROGRAMS
A fundamental difference between probability sampling and conventional statistics is that "sampling" deals with real, tangible populations, whereas "conventional statistics" usually deals with hypothetical populations that have no real-world realization. The focus here is on real ...
Statistical Inference in the Learning of Novel Phonetic Categories
ERIC Educational Resources Information Center
Zhao, Yuan
2010-01-01
Learning a phonetic category (or any linguistic category) requires integrating different sources of information. A crucial unsolved problem for phonetic learning is how this integration occurs: how can we update our previous knowledge about a phonetic category as we hear new exemplars of the category? One model of learning is Bayesian Inference,…
Conceptual Challenges in Coordinating Theoretical and Data-Centered Estimates of Probability
ERIC Educational Resources Information Center
Konold, Cliff; Madden, Sandra; Pollatsek, Alexander; Pfannkuch, Maxine; Wild, Chris; Ziedins, Ilze; Finzer, William; Horton, Nicholas J.; Kazak, Sibel
2011-01-01
A core component of informal statistical inference is the recognition that judgments based on sample data are inherently uncertain. This implies that instruction aimed at developing informal inference needs to foster basic probabilistic reasoning. In this article, we analyze and critique the now-common practice of introducing students to both…
Campbell's and Rubin's Perspectives on Causal Inference
ERIC Educational Resources Information Center
West, Stephen G.; Thoemmes, Felix
2010-01-01
Donald Campbell's approach to causal inference (D. T. Campbell, 1957; W. R. Shadish, T. D. Cook, & D. T. Campbell, 2002) is widely used in psychology and education, whereas Donald Rubin's causal model (P. W. Holland, 1986; D. B. Rubin, 1974, 2005) is widely used in economics, statistics, medicine, and public health. Campbell's approach focuses on…
Direct Evidence for a Dual Process Model of Deductive Inference
ERIC Educational Resources Information Center
Markovits, Henry; Brunet, Marie-Laurence; Thompson, Valerie; Brisson, Janie
2013-01-01
In 2 experiments, we tested a strong version of a dual process theory of conditional inference (cf. Verschueren et al., 2005a, 2005b) that assumes that most reasoners have 2 strategies available, the choice of which is determined by situational variables, cognitive capacity, and metacognitive control. The statistical strategy evaluates inferences…
The Role of Probability in Developing Learners' Models of Simulation Approaches to Inference
ERIC Educational Resources Information Center
Lee, Hollylynne S.; Doerr, Helen M.; Tran, Dung; Lovett, Jennifer N.
2016-01-01
Repeated sampling approaches to inference that rely on simulations have recently gained prominence in statistics education, and probabilistic concepts are at the core of this approach. In this approach, learners need to develop a mapping among the problem situation, a physical enactment, computer representations, and the underlying randomization…
From Blickets to Synapses: Inferring Temporal Causal Networks by Observation
ERIC Educational Resources Information Center
Fernando, Chrisantha
2013-01-01
How do human infants learn the causal dependencies between events? Evidence suggests that this remarkable feat can be achieved by observation of only a handful of examples. Many computational models have been produced to explain how infants perform causal inference without explicit teaching about statistics or the scientific method. Here, we…
It's a Girl! Random Numbers, Simulations, and the Law of Large Numbers
ERIC Educational Resources Information Center
Goodwin, Chris; Ortiz, Enrique
2015-01-01
Modeling using mathematics and making inferences about mathematical situations are becoming more prevalent in most fields of study. Descriptive statistics cannot be used to generalize about a population or make predictions of what can occur. Instead, inference must be used. Simulation and sampling are essential in building a foundation for…
Thou Shalt Not Bear False Witness against Null Hypothesis Significance Testing
ERIC Educational Resources Information Center
García-Pérez, Miguel A.
2017-01-01
Null hypothesis significance testing (NHST) has been the subject of debate for decades and alternative approaches to data analysis have been proposed. This article addresses this debate from the perspective of scientific inquiry and inference. Inference is an inverse problem and application of statistical methods cannot reveal whether effects…
Krefeld-Schwalb, Antonia; Witte, Erich H.; Zenker, Frank
2018-01-01
In psychology as elsewhere, the main statistical inference strategy to establish empirical effects is null-hypothesis significance testing (NHST). The recent failure to replicate allegedly well-established NHST-results, however, implies that such results lack sufficient statistical power, and thus feature unacceptably high error-rates. Using data-simulation to estimate the error-rates of NHST-results, we advocate the research program strategy (RPS) as a superior methodology. RPS integrates Frequentist with Bayesian inference elements, and leads from a preliminary discovery against a (random) H0-hypothesis to a statistical H1-verification. Not only do RPS-results feature significantly lower error-rates than NHST-results, RPS also addresses key-deficits of a “pure” Frequentist and a standard Bayesian approach. In particular, RPS aggregates underpowered results safely. RPS therefore provides a tool to regain the trust the discipline had lost during the ongoing replicability-crisis. PMID:29740363
NIRS-SPM: statistical parametric mapping for near infrared spectroscopy
NASA Astrophysics Data System (ADS)
Tak, Sungho; Jang, Kwang Eun; Jung, Jinwook; Jang, Jaeduck; Jeong, Yong; Ye, Jong Chul
2008-02-01
Even though there exists a powerful statistical parametric mapping (SPM) tool for fMRI, similar public domain tools are not available for near infrared spectroscopy (NIRS). In this paper, we describe a new public domain statistical toolbox called NIRS-SPM for quantitative analysis of NIRS signals. Specifically, NIRS-SPM statistically analyzes the NIRS data using the GLM and makes inference in terms of the excursion probability of the random field that is interpolated from the sparse measurements. In order to obtain correct inference, NIRS-SPM offers the pre-coloring and pre-whitening methods for temporal correlation estimation. For simultaneous recording of the NIRS signal with fMRI, the spatial mapping between the fMRI image and real coordinates from a 3-D digitizer is estimated using Horn's algorithm. These powerful tools allow super-resolution localization of brain activation, which is not possible using conventional NIRS analysis tools.
NASA Astrophysics Data System (ADS)
Knuth, K. H.
2001-05-01
We consider the application of Bayesian inference to the study of self-organized structures in complex adaptive systems. In particular, we examine the distribution of elements, agents, or processes in systems dominated by hierarchical structure. We demonstrate that results obtained by Caianiello [1] on Hierarchical Modular Systems (HMS) can be found by applying Jaynes' Principle of Group Invariance [2] to a few key assumptions about our knowledge of hierarchical organization. Subsequent application of the Principle of Maximum Entropy allows inferences to be made about specific systems. The utility of the Bayesian method is considered by examining both successes and failures of the hierarchical model. We discuss how Caianiello's original statements suffer from the Mind Projection Fallacy [3] and we restate his assumptions thus widening the applicability of the HMS model. The relationship between inference and statistical physics, described by Jaynes [4], is reiterated with the expectation that this realization will aid the field of complex systems research by moving away from often inappropriate direct application of statistical mechanics to a more encompassing inferential methodology.
Conditional statistical inference with multistage testing designs.
Zwitser, Robert J; Maris, Gunter
2015-03-01
In this paper it is demonstrated how statistical inference from multistage test designs can be made based on the conditional likelihood. Special attention is given to parameter estimation, as well as the evaluation of model fit. Two reasons are provided why the fit of simple measurement models is expected to be better in adaptive designs, compared to linear designs: more parameters are available for the same number of observations; and undesirable response behavior, like slipping and guessing, might be avoided owing to a better match between item difficulty and examinee proficiency. The results are illustrated with simulated data, as well as with real data.
ERIC Educational Resources Information Center
Snyder, Patricia A.; Thompson, Bruce
The use of tests of statistical significance was explored, first by reviewing some criticisms of contemporary practice in the use of statistical tests as reflected in a series of articles in the "American Psychologist" and in the appointment of a "Task Force on Statistical Inference" by the American Psychological Association…
Standard deviation and standard error of the mean.
Lee, Dong Kyu; In, Junyong; Lee, Sangseok
2015-06-01
In most clinical and experimental studies, the standard deviation (SD) and the estimated standard error of the mean (SEM) are used to present the characteristics of sample data and to explain statistical analysis results. However, some authors occasionally confuse the usage of the SD and the SEM in the medical literature. Because the processes of calculating the SD and the SEM involve different statistical inferences, each has its own meaning. The SD is the dispersion of data in a normal distribution; in other words, the SD indicates how accurately the mean represents the sample data. The meaning of the SEM, however, involves statistical inference based on the sampling distribution: the SEM is the SD of the theoretical distribution of the sample means (the sampling distribution). While either the SD or the SEM can be applied to describe data and statistical results, one should be aware of the appropriate ways to use each. We aim to elucidate the distinctions between the SD and the SEM and to provide proper usage guidelines for both, which summarize data and describe statistical results.
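The distinction the authors draw can be made concrete with a short computation. This sketch is an editor's illustration, not code from the article: it computes the sample SD, which describes the spread of individual observations, and the SEM, which describes the precision of the sample mean and shrinks as the sample grows.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
sample = rng.normal(loc=120.0, scale=15.0, size=50)  # e.g. 50 blood-pressure readings

sd = sample.std(ddof=1)            # sample standard deviation (spread of the data)
sem = sd / np.sqrt(sample.size)    # standard error of the mean (precision of the mean)

print(f"mean = {sample.mean():.2f}, SD = {sd:.2f}, SEM = {sem:.2f}")
# Quadrupling the sample size roughly halves the SEM but leaves the SD unchanged.
```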
Visual shape perception as Bayesian inference of 3D object-centered shape representations.
Erdogan, Goker; Jacobs, Robert A
2017-11-01
Despite decades of research, little is known about how people visually perceive object shape. We hypothesize that a promising approach to shape perception is provided by a "visual perception as Bayesian inference" framework which augments an emphasis on visual representation with an emphasis on the idea that shape perception is a form of statistical inference. Our hypothesis claims that shape perception of unfamiliar objects can be characterized as statistical inference of 3D shape in an object-centered coordinate system. We describe a computational model based on our theoretical framework, and provide evidence for the model along two lines. First, we show that, counterintuitively, the model accounts for viewpoint-dependency of object recognition, traditionally regarded as evidence against people's use of 3D object-centered shape representations. Second, we report the results of an experiment using a shape similarity task, and present an extensive evaluation of existing models' abilities to account for the experimental data. We find that our shape inference model captures subjects' behaviors better than competing models. Taken as a whole, our experimental and computational results illustrate the promise of our approach and suggest that people's shape representations of unfamiliar objects are probabilistic, 3D, and object-centered. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Wang, Hong-Qiang; Tsai, Chung-Jui
2013-01-01
With the rapid increase of omics data, correlation analysis has become an indispensable tool for inferring meaningful associations from a large number of observations. Pearson correlation coefficient (PCC) and its variants are widely used for such purposes. However, it remains challenging to test whether an observed association is reliable both statistically and biologically. We present here a new method, CorSig, for statistical inference of correlation significance. CorSig is based on a biology-informed null hypothesis, i.e., testing whether the true PCC (ρ) between two variables is statistically larger than a user-specified PCC cutoff (τ), as opposed to the simple null hypothesis of ρ = 0 in existing methods, i.e., testing whether an association can be declared without a threshold. CorSig incorporates Fisher's Z transformation of the observed PCC (r), which facilitates use of standard techniques for p-value computation and multiple testing corrections. We compared CorSig against two methods: one uses a minimum PCC cutoff while the other (Zhu's procedure) controls correlation strength and statistical significance in two discrete steps. CorSig consistently outperformed these methods in various simulation data scenarios by balancing between false positives and false negatives. When tested on real-world Populus microarray data, CorSig effectively identified co-expressed genes in the flavonoid pathway, and discriminated between closely related gene family members for their differential association with flavonoid and lignin pathways. The p-values obtained by CorSig can be used as a stand-alone parameter for stratification of co-expressed genes according to their correlation strength in lieu of an arbitrary cutoff. CorSig requires one single tunable parameter, and can be readily extended to other correlation measures. Thus, CorSig should be useful for a wide range of applications, particularly for network analysis of high-dimensional genomic data. A web server for CorSig is provided at http://202.127.200.1:8080/probeWeb. R code for CorSig is freely available for non-commercial use at http://aspendb.uga.edu/downloads.
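The core test described here, whether the true correlation ρ exceeds a user-chosen cutoff τ, can be sketched with Fisher's Z transformation. The code below is an illustrative approximation of that idea written by the editor, not the CorSig implementation; the function name, the one-sided normal test, and the example data are assumptions.

```python
import numpy as np
from scipy import stats

def corr_exceeds_cutoff(x, y, tau=0.5):
    """One-sided test of H0: rho <= tau versus H1: rho > tau via Fisher's Z."""
    n = len(x)
    r = np.corrcoef(x, y)[0, 1]
    z_r = np.arctanh(r)          # Fisher's Z transform of the observed correlation
    z_tau = np.arctanh(tau)      # Z transform of the cutoff
    se = 1.0 / np.sqrt(n - 3)    # approximate standard error of Z
    z_stat = (z_r - z_tau) / se
    p_value = stats.norm.sf(z_stat)
    return r, p_value

rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = 0.8 * x + rng.normal(scale=0.5, size=200)   # strongly correlated pair
print(corr_exceeds_cutoff(x, y, tau=0.5))
```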
Robust inference for group sequential trials.
Ganju, Jitendra; Lin, Yunzhi; Zhou, Kefei
2017-03-01
For ethical reasons, group sequential trials were introduced to allow trials to stop early in the event of extreme results. Endpoints in such trials are usually mortality or irreversible morbidity. For a given endpoint, the norm is to use a single test statistic and to use that same statistic for each analysis. This approach is risky because the test statistic has to be specified before the study is unblinded, and there is a loss in power if the assumptions that ensure optimality for each analysis are not met. To minimize the risk of moderate to substantial loss in power due to a suboptimal choice of a statistic, a robust method was developed for nonsequential trials. The concept is analogous to diversification of financial investments to minimize risk. The method is based on combining P values from multiple test statistics for formal inference while controlling the type I error rate at its designated value. This article evaluates the performance of two P value combining methods for group sequential trials. The emphasis is on time-to-event trials, although results from less complex trials are also included. The gain or loss in power with the combination method relative to a single statistic is asymmetric in its favor. Depending on the power of each individual test, the combination method can give more power than any single test or give power that is closer to the test with the most power. The versatility of the method is that it can combine P values from different test statistics for analysis at different times. The robustness of results suggests that inference from group sequential trials can be strengthened with the use of combined tests. Copyright © 2017 John Wiley & Sons, Ltd.
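As a generic illustration of combining P values from several test statistics, the sketch below uses Fisher's combination rule. This is the editor's example of the general idea, not necessarily the specific combination method evaluated in the article, and it ignores the correlation between statistics computed on the same data, which a real group sequential procedure must account for.

```python
import numpy as np
from scipy import stats

def fisher_combine(p_values):
    """Fisher's method: -2 * sum(log p) ~ chi-square with 2k degrees of freedom
    when the k P values are independent and the null hypotheses are true."""
    p = np.asarray(p_values, dtype=float)
    statistic = -2.0 * np.log(p).sum()
    return stats.chi2.sf(statistic, df=2 * p.size)

# P values from, say, a log-rank test and a Wilcoxon-type test on the same endpoint.
print(fisher_combine([0.03, 0.11]))
```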
Karain, Wael I
2017-11-28
Proteins undergo conformational transitions over different time scales. These transitions are closely intertwined with the protein's function. Numerous standard techniques such as principal component analysis are used to detect these transitions in molecular dynamics simulations. In this work, we add a new method that has the ability to detect transitions in dynamics based on the recurrences in the dynamical system. It combines bootstrapping and recurrence quantification analysis. We start from the assumption that a protein has a "baseline" recurrence structure over a given period of time. Any statistically significant deviation from this recurrence structure, as inferred from complexity measures provided by recurrence quantification analysis, is considered a transition in the dynamics of the protein. We apply this technique to a 132 ns long molecular dynamics simulation of the β-Lactamase Inhibitory Protein BLIP. We are able to detect conformational transitions in the nanosecond range in the recurrence dynamics of the BLIP protein during the simulation. The results compare favorably to those extracted using the principal component analysis technique. The recurrence quantification analysis-based bootstrap technique is able to detect transitions between different dynamical states of a protein over different time scales. It is not limited to linear dynamics regimes, and can be generalized to any time scale. It also has the potential to be used to cluster frames in molecular dynamics trajectories according to the nature of their recurrence dynamics. One shortcoming of this method is the need to have large enough time windows to ensure good statistical quality for the recurrence complexity measures needed to detect the transitions.
Long-term strategy for the statistical design of a forest health monitoring system
Hans T. Schreuder; Raymond L. Czaplewski
1993-01-01
A conceptual framework is given for a broad-scale survey of forest health that accomplishes three objectives: generate descriptive statistics; detect changes in such statistics; and simplify analytical inferences that identify, and possibly establish cause-effect relationships. Our paper discusses the development of sampling schemes to satisfy these three objectives,...
ERIC Educational Resources Information Center
Beeman, Jennifer Leigh Sloan
2013-01-01
Research has found that students successfully complete an introductory course in statistics without fully comprehending the underlying theory or being able to exhibit statistical reasoning. This is particularly true for the understanding about the sampling distribution of the mean, a crucial concept for statistical inference. This study…
Using Action Research to Develop a Course in Statistical Inference for Workplace-Based Adults
ERIC Educational Resources Information Center
Forbes, Sharleen
2014-01-01
Many adults who need an understanding of statistical concepts have limited mathematical skills. They need a teaching approach that includes as little mathematical context as possible. Iterative participatory qualitative research (action research) was used to develop a statistical literacy course for adult learners informed by teaching in…
Applying Statistical Process Control to Clinical Data: An Illustration.
ERIC Educational Resources Information Center
Pfadt, Al; And Others
1992-01-01
Principles of statistical process control are applied to a clinical setting through the use of control charts to detect changes, as part of treatment planning and clinical decision-making processes. The logic of control chart analysis is derived from principles of statistical inference. Sample charts offer examples of evaluating baselines and…
ERIC Educational Resources Information Center
Madhere, Serge
An analytic procedure, efficiency analysis, is proposed for improving the utility of quantitative program evaluation for decision making. The three features of the procedure are explained: (1) for statistical control, it adopts and extends the regression-discontinuity design; (2) for statistical inferences, it de-emphasizes hypothesis testing in…
Evaluating data-driven causal inference techniques in noisy physical and ecological systems
NASA Astrophysics Data System (ADS)
Tennant, C.; Larsen, L.
2016-12-01
Causal inference from observational time series challenges traditional approaches for understanding processes and offers exciting opportunities to gain new understanding of complex systems where nonlinearity, delayed forcing, and emergent behavior are common. We present a formal evaluation of the performance of convergent cross-mapping (CCM) and transfer entropy (TE) for data-driven causal inference under real-world conditions. CCM is based on nonlinear state-space reconstruction, and causality is determined by the convergence of prediction skill with an increasing number of observations of the system. TE is the uncertainty reduction based on transition probabilities of a pair of time-lagged variables. With TE, causal inference is based on asymmetry in information flow between the variables. Observational data and numerical simulations from a number of classical physical and ecological systems, including atmospheric convection (the Lorenz system), species competition (patch tournaments), and long-term climate change (the Vostok ice core), were used to evaluate the ability of CCM and TE to infer causal relationships as data series become increasingly corrupted by observational (instrument-driven) or process (model- or stochastic-driven) noise. While both techniques show promise for causal inference, TE appears to be applicable to a wider range of systems, especially when the data series are of sufficient length to reliably estimate transition probabilities of system components. Both techniques also show a clear effect of observational noise on causal inference. For example, CCM exhibits a negative logarithmic decline in prediction skill as the noise level of the system increases. Changes in TE strongly depend on noise type and which variable the noise was added to. The ability of CCM and TE to detect driving influences suggests that their application to physical and ecological systems could be transformative for understanding driving mechanisms as Earth systems undergo change.
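A minimal, histogram-based estimate of transfer entropy for two discretized time series is sketched below; it is the editor's simplification, assuming one-step lags and coarse equal-width binning, and is not the estimator used in the study.

```python
import numpy as np

def transfer_entropy(x, y, bins=4):
    """Estimate TE from x to y (in bits) using plug-in probabilities
    of the triplet (y_t+1, y_t, x_t) after equal-width binning."""
    xd = np.digitize(x, np.histogram_bin_edges(x, bins)[1:-1])
    yd = np.digitize(y, np.histogram_bin_edges(y, bins)[1:-1])
    y_next, y_now, x_now = yd[1:], yd[:-1], xd[:-1]

    def entropy(*cols):
        _, counts = np.unique(np.stack(cols, axis=1), axis=0, return_counts=True)
        p = counts / counts.sum()
        return -(p * np.log2(p)).sum()

    # TE = H(y_next, y_now) + H(y_now, x_now) - H(y_now) - H(y_next, y_now, x_now)
    return (entropy(y_next, y_now) + entropy(y_now, x_now)
            - entropy(y_now) - entropy(y_next, y_now, x_now))

rng = np.random.default_rng(3)
x = rng.normal(size=5000)
y = np.roll(x, 1) + 0.5 * rng.normal(size=5000)   # y is driven by lagged x
print(transfer_entropy(x, y), transfer_entropy(y, x))
```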
A shift from significance test to hypothesis test through power analysis in medical research.
Singh, G
2006-01-01
Until recently, the medical research literature exhibited a substantial dominance of Fisher's significance test approach to statistical inference, which concentrates on the probability of type I error, over the Neyman-Pearson hypothesis test, which considers the probabilities of both type I and type II errors. Fisher's approach dichotomises results into significant or non-significant with a P value. The Neyman-Pearson approach speaks of acceptance or rejection of the null hypothesis. Based on the same theory, these two approaches address the same objective and conclude in their own way. The advancement in computing techniques and the availability of statistical software have resulted in an increasing application of power calculations in medical research, and thereby in reporting the results of significance tests in light of the power of the test as well. The significance test approach, when it incorporates power analysis, contains the essence of the hypothesis test approach. It may be safely argued that the rising application of power analysis in medical research may have initiated a shift from Fisher's significance test to the Neyman-Pearson hypothesis test procedure.
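The power calculations the author refers to can be illustrated with a normal approximation for a two-sample comparison of means. The sketch below is a textbook-style approximation assumed by the editor, not a method taken from the article.

```python
from scipy import stats

def power_two_sample(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z/t test for a
    standardized effect size (Cohen's d) with equal group sizes."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    noncentrality = effect_size * (n_per_group / 2) ** 0.5
    return stats.norm.sf(z_alpha - noncentrality) + stats.norm.cdf(-z_alpha - noncentrality)

# Power to detect a medium effect (d = 0.5) with 64 patients per arm.
print(round(power_two_sample(0.5, 64), 3))   # roughly 0.80
```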
Aoyagi, Miki; Nagata, Kenji
2012-06-01
The term algebraic statistics arises from the study of probabilistic models and techniques for statistical inference using methods from algebra and geometry (Sturmfels, 2009). The purpose of our study is to consider the generalization error and stochastic complexity in learning theory by using the log-canonical threshold in algebraic geometry. Such thresholds correspond to the main term of the generalization error in Bayesian estimation, which is called a learning coefficient (Watanabe, 2001a, 2001b). The learning coefficient serves to measure the learning efficiencies in hierarchical learning models. In this letter, we consider learning coefficients for Vandermonde matrix-type singularities, by using a new approach: focusing on the generators of the ideal, which defines singularities. We give tight new bound values of learning coefficients for the Vandermonde matrix-type singularities and the explicit values with certain conditions. By applying our results, we can show the learning coefficients of three-layered neural networks and normal mixture models.
Pearson's chi-square test and rank correlation inferences for clustered data.
Shih, Joanna H; Fay, Michael P
2017-09-01
Pearson's chi-square test has been widely used in testing for association between two categorical responses. Spearman rank correlation and Kendall's tau are often used for measuring and testing association between two continuous or ordered categorical responses. However, the established statistical properties of these tests are only valid when each pair of responses are independent, where each sampling unit has only one pair of responses. When each sampling unit consists of a cluster of paired responses, the assumption of independent pairs is violated. In this article, we apply the within-cluster resampling technique to U-statistics to form new tests and rank-based correlation estimators for possibly tied clustered data. We develop large sample properties of the new proposed tests and estimators and evaluate their performance by simulations. The proposed methods are applied to a data set collected from a PET/CT imaging study for illustration. Published 2017. This article is a U.S. Government work and is in the public domain in the USA.
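The within-cluster resampling idea can be sketched as follows: repeatedly draw one pair of responses per cluster, compute the ordinary statistic on the resampled (now independent) pairs, and average across resamples. The simplified sketch below shows that resampling loop for a Spearman rank correlation; it is the editor's illustration of the general scheme, not the authors' full inferential procedure, which also requires a variance estimate for the averaged statistic.

```python
import numpy as np
from scipy import stats

def wcr_spearman(clusters, n_resamples=500, seed=0):
    """Within-cluster resampling estimate of Spearman rank correlation.
    clusters: list of (n_i, 2) arrays of paired responses, one array per cluster."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_resamples):
        # Draw one randomly chosen pair per cluster -> independent pairs.
        sampled = np.array([c[rng.integers(len(c))] for c in clusters])
        estimates.append(stats.spearmanr(sampled[:, 0], sampled[:, 1])[0])
    return float(np.mean(estimates))

rng = np.random.default_rng(1)
# 40 clusters of 2-5 positively associated paired responses each.
clusters = []
for _ in range(40):
    n_i = rng.integers(2, 6)
    u = rng.normal(size=n_i)
    clusters.append(np.column_stack([u, u + rng.normal(scale=1.0, size=n_i)]))
print(wcr_spearman(clusters))
```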
The penumbra of learning: a statistical theory of synaptic tagging and capture.
Gershman, Samuel J
2014-01-01
Learning in humans and animals is accompanied by a penumbra: Learning one task benefits from learning an unrelated task shortly before or after. At the cellular level, the penumbra of learning appears when weak potentiation of one synapse is amplified by strong potentiation of another synapse on the same neuron during a critical time window. Weak potentiation sets a molecular tag that enables the synapse to capture plasticity-related proteins synthesized in response to strong potentiation at another synapse. This paper describes a computational model which formalizes synaptic tagging and capture in terms of statistical learning mechanisms. According to this model, synaptic strength encodes a probabilistic inference about the dynamically changing association between pre- and post-synaptic firing rates. The rate of change is itself inferred, coupling together different synapses on the same neuron. When the inputs to one synapse change rapidly, the inferred rate of change increases, amplifying learning at other synapses.
Inference of missing data and chemical model parameters using experimental statistics
NASA Astrophysics Data System (ADS)
Casey, Tiernan; Najm, Habib
2017-11-01
A method for determining the joint parameter density of Arrhenius rate expressions through the inference of missing experimental data is presented. This approach proposes noisy hypothetical data sets from target experiments and accepts those which agree with the reported statistics, in the form of nominal parameter values and their associated uncertainties. The data exploration procedure is formalized using Bayesian inference, employing maximum entropy and approximate Bayesian computation methods to arrive at a joint density on data and parameters. The method is demonstrated in the context of reactions in the H2-O2 system for predictive modeling of combustion systems of interest. Work supported by the US DOE BES CSGB. Sandia National Labs is a multimission lab managed and operated by Nat. Technology and Eng'g Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell Intl, for the US DOE NCSA under contract DE-NA-0003525.
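The accept/reject logic described here resembles approximate Bayesian computation. The toy sketch below is the editor's illustration rather than the authors' implementation: it proposes hypothetical data sets from candidate parameters and keeps only the parameters whose simulated summary statistics match the (assumed, made-up) reported statistics within a tolerance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Reported experiment statistics we must honor (assumed values for illustration):
reported_mean, reported_sd, n_obs = 5.2, 0.8, 12

accepted = []
for _ in range(20_000):
    theta = rng.uniform(3.0, 7.0)                    # candidate mean (prior draw)
    sigma = rng.uniform(0.2, 2.0)                    # candidate noise level (prior draw)
    fake_data = rng.normal(theta, sigma, size=n_obs) # hypothetical noisy data set
    # Accept when the simulated summaries agree with the reported statistics.
    if (abs(fake_data.mean() - reported_mean) < 0.1 and
            abs(fake_data.std(ddof=1) - reported_sd) < 0.2):
        accepted.append(theta)

accepted = np.array(accepted)
print(len(accepted), accepted.mean(), accepted.std())
```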
Statistical numeracy as a moderator of (pseudo)contingency effects on decision behavior.
Fleig, Hanna; Meiser, Thorsten; Ettlin, Florence; Rummel, Jan
2017-03-01
Pseudocontingencies denote contingency estimates inferred from base rates rather than from cell frequencies. We examined the role of statistical numeracy for effects of such fallible but adaptive inferences on choice behavior. In Experiment 1, we provided information on single observations as well as on base rates and tracked participants' eye movements. In Experiment 2, we manipulated the availability of information on cell frequencies and base rates between conditions. Our results demonstrate that a focus on base rates rather than cell frequencies benefits pseudocontingency effects. Learners who are more proficient in (conditional) probability calculation prefer to rely on cell frequencies in order to judge contingencies, though, as was evident from their gaze behavior. If cell frequencies are available in summarized format, they may infer the true contingency between options and outcomes. Otherwise, however, even highly numerate learners are susceptible to pseudocontingency effects. Copyright © 2017 Elsevier B.V. All rights reserved.
2010-01-01
Background Methods for the calculation and application of quantitative electromyographic (EMG) statistics for the characterization of EMG data detected from forearm muscles of individuals with and without pain associated with repetitive strain injury are presented. Methods A classification procedure using a multi-stage application of Bayesian inference is presented that characterizes a set of motor unit potentials acquired using needle electromyography. The utility of this technique in characterizing EMG data obtained from both normal individuals and those presenting with symptoms of "non-specific arm pain" is explored and validated. The efficacy of the Bayesian technique is compared with simple voting methods. Results The aggregate Bayesian classifier presented is found to perform with accuracy equivalent to that of majority voting on the test data, with an overall accuracy greater than 0.85. Theoretical foundations of the technique are discussed, and are related to the observations found. Conclusions Aggregation of motor unit potential conditional probability distributions estimated using quantitative electromyographic analysis, may be successfully used to perform electrodiagnostic characterization of "non-specific arm pain." It is expected that these techniques will also be able to be applied to other types of electrodiagnostic data. PMID:20156353
Comparison of three artificial intelligence techniques for discharge routing
NASA Astrophysics Data System (ADS)
Khatibi, Rahman; Ghorbani, Mohammad Ali; Kashani, Mahsa Hasanpour; Kisi, Ozgur
2011-06-01
An inter-comparison of three artificial intelligence (AI) techniques is presented using the results of river flow/stage time series that are otherwise handled by traditional discharge routing techniques. These models comprise an Artificial Neural Network (ANN), an Adaptive Neuro-Fuzzy Inference System (ANFIS), and Genetic Programming (GP), applied to discharge routing of the Kizilirmak River, Turkey. The daily mean river discharge data for the period between 1999 and 2003 were used for training and testing the models. The comparison includes both visual and parametric approaches using such statistics as the Coefficient of Correlation (CC), Mean Absolute Error (MAE), and Mean Square Relative Error (MSRE), as well as a basic scoring system. Overall, the results indicate that ANN and ANFIS have mixed fortunes in discharge routing, and both have different abilities in capturing and reproducing some of the observed information. However, the performance of GP displays a better edge over the other two modelling approaches in most respects. Attention is given to the information content of the recorded time series in terms of their peak values and timings, where one performance measure may capture some of the information content but be ineffective for others. This makes a case for compiling a knowledge base for the various modelling techniques.
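The comparison statistics named here are straightforward to compute. The helper below is an editor's sketch with assumed definitions, since the abstract does not spell out the exact formulas; it evaluates the coefficient of correlation, mean absolute error, and mean squared relative error for a set of observed and simulated discharges.

```python
import numpy as np

def routing_scores(observed, simulated):
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    cc = np.corrcoef(observed, simulated)[0, 1]              # Coefficient of Correlation
    mae = np.mean(np.abs(observed - simulated))              # Mean Absolute Error
    msre = np.mean(((observed - simulated) / observed) ** 2) # Mean Square Relative Error
    return {"CC": cc, "MAE": mae, "MSRE": msre}

obs = [42.0, 55.0, 61.0, 48.0, 39.0]   # observed daily discharge (m^3/s)
sim = [40.5, 57.0, 59.0, 50.5, 41.0]   # modelled discharge
print(routing_scores(obs, sim))
```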
Feinauer, Christoph; Procaccini, Andrea; Zecchina, Riccardo; Weigt, Martin; Pagnani, Andrea
2014-01-01
In the course of evolution, proteins show a remarkable conservation of their three-dimensional structure and their biological function, leading to strong evolutionary constraints on the sequence variability between homologous proteins. Our method aims at extracting such constraints from rapidly accumulating sequence data, and thereby at inferring protein structure and function from sequence information alone. Recently, global statistical inference methods (e.g. direct-coupling analysis, sparse inverse covariance estimation) have achieved a breakthrough towards this aim, and their predictions have been successfully implemented into tertiary and quaternary protein structure prediction methods. However, due to the discrete nature of the underlying variable (amino-acids), exact inference requires exponential time in the protein length, and efficient approximations are needed for practical applicability. Here we propose a very efficient multivariate Gaussian modeling approach as a variant of direct-coupling analysis: the discrete amino-acid variables are replaced by continuous Gaussian random variables. The resulting statistical inference problem is efficiently and exactly solvable. We show that the quality of inference is comparable or superior to the one achieved by mean-field approximations to inference with discrete variables, as done by direct-coupling analysis. This is true for (i) the prediction of residue-residue contacts in proteins, and (ii) the identification of protein-protein interaction partner in bacterial signal transduction. An implementation of our multivariate Gaussian approach is available at the website http://areeweb.polito.it/ricerca/cmp/code. PMID:24663061
The space of ultrametric phylogenetic trees.
Gavryushkin, Alex; Drummond, Alexei J
2016-08-21
The reliability of a phylogenetic inference method from genomic sequence data is ensured by its statistical consistency. Bayesian inference methods produce a sample of phylogenetic trees from the posterior distribution given sequence data. Hence the question of statistical consistency of such methods is equivalent to the consistency of the summary of the sample. More generally, statistical consistency is ensured by the tree space used to analyse the sample. In this paper, we consider two standard parameterisations of phylogenetic time-trees used in evolutionary models: inter-coalescent interval lengths and absolute times of divergence events. For each of these parameterisations we introduce a natural metric space on ultrametric phylogenetic trees. We compare the introduced spaces with existing models of tree space and formulate several formal requirements that a metric space on phylogenetic trees must possess in order to be a satisfactory space for statistical analysis, and justify them. We show that only a few known constructions of the space of phylogenetic trees satisfy these requirements. However, our results suggest that these basic requirements are not enough to distinguish between the two metric spaces we introduce and that the choice between metric spaces requires additional properties to be considered. Particularly, that the summary tree minimising the square distance to the trees from the sample might be different for different parameterisations. This suggests that further fundamental insight is needed into the problem of statistical consistency of phylogenetic inference methods. Copyright © 2016 The Authors. Published by Elsevier Ltd.. All rights reserved.
Assessing risk factors for dental caries: a statistical modeling approach.
Trottini, Mario; Bossù, Maurizio; Corridore, Denise; Ierardo, Gaetano; Luzzi, Valeria; Saccucci, Matteo; Polimeni, Antonella
2015-01-01
The problem of identifying potential determinants and predictors of dental caries is of key importance in caries research and it has received considerable attention in the scientific literature. From the methodological side, a broad range of statistical models is currently available to analyze dental caries indices (DMFT, dmfs, etc.). These models have been applied in several studies to investigate the impact of different risk factors on the cumulative severity of dental caries experience. However, in most of the cases (i) these studies focus on a very specific subset of risk factors; and (ii) in the statistical modeling only a few candidate models are considered and model selection is at best only marginally addressed. As a result, our understanding of the robustness of the statistical inferences with respect to the choice of the model is very limited; the richness of the set of statistical models available for analysis is only marginally exploited; and inferences could be biased due to the omission of potentially important confounding variables in the model's specification. In this paper we argue that these limitations can be overcome by considering a general class of candidate models and carefully exploring the model space using standard model selection criteria and measures of global fit and predictive performance of the candidate models. Strengths and limitations of the proposed approach are illustrated with a real data set. In our illustration the model space contains more than 2.6 million models, which require inferences to be adjusted for 'optimism'.
Causal inference in economics and marketing.
Varian, Hal R
2016-07-05
This is an elementary introduction to causal inference in economics written for readers familiar with machine learning methods. The critical step in any causal analysis is estimating the counterfactual-a prediction of what would have happened in the absence of the treatment. The powerful techniques used in machine learning may be useful for developing better estimates of the counterfactual, potentially improving causal inference.
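One common way to use machine learning for the counterfactual is to fit a predictive model on untreated units only and then predict the untreated outcome for the treated units; the difference between observed and predicted outcomes estimates the treatment effect. The sketch below illustrates that idea with scikit-learn under strong simplifying assumptions (no confounding beyond the observed covariates, simulated data); it is an editor's example, not a method prescribed by the article.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=(n, 3))                       # covariates
treated = rng.random(n) < 0.4                     # treatment indicator
true_effect = 2.0
y = x @ np.array([1.0, -0.5, 0.3]) + rng.normal(scale=0.5, size=n) + true_effect * treated

# Fit an outcome model on control units, predict the counterfactual for treated units.
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(x[~treated], y[~treated])
counterfactual = model.predict(x[treated])

print("estimated average treatment effect:", (y[treated] - counterfactual).mean())
```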
de la Rúa, Nicholas M.; Bustamante, Dulce M.; Menes, Marianela; Stevens, Lori; Monroy, Carlota; Kilpatrick, William; Rizzo, Donna; Klotz, Stephen A.; Schmidt, Justin; Axen, Heather J.; Dorn, Patricia L.
2014-01-01
Phylogenetic relationships of insect vectors of parasitic diseases are important for understanding the evolution of epidemiologically relevant traits, and may be useful in vector control. The subfamily Triatominae (Hemiptera:Reduviidae) includes ~140 extant species arranged in five tribes comprised of 15 genera. The genus Triatoma is the most species-rich and contains important vectors of Trypanosoma cruzi, the causative agent of Chagas disease. Triatoma species were grouped into complexes originally by morphology and more recently with the addition of information from molecular phylogenetics (the four-complex hypothesis); however, without a strict adherence to monophyly. To date, the validity of proposed species complexes has not been tested by statistical tests of topology. The goal of this study was to clarify the systematics of 19 Triatoma species from North and Central America. We inferred their evolutionary relatedness using two independent data sets: the complete nuclear Internal Transcribed Spacer-2 ribosomal DNA (ITS-2 rDNA) and head morphometrics. In addition, we used the Shimodaira-Hasegawa statistical test of topology to assess the fit of the data to a set of competing systematic hypotheses (topologies). An unconstrained topology inferred from the ITS-2 data was compared to topologies constrained based on the four-complex hypothesis or one inferred from our morphometry results. The unconstrained topology represents a statistically significant better fit of the molecular data than either the four-complex or the morphometric topology. We propose an update to the composition of species complexes in the North and Central American Triatoma, based on a phylogeny inferred from ITS-2 as a first step towards updating the phylogeny of the complexes based on monophyly and statistical tests of topologies. PMID:24681261
Emerging Concepts of Data Integration in Pathogen Phylodynamics.
Baele, Guy; Suchard, Marc A; Rambaut, Andrew; Lemey, Philippe
2017-01-01
Phylodynamics has become an increasingly popular statistical framework to extract evolutionary and epidemiological information from pathogen genomes. By harnessing such information, epidemiologists aim to shed light on the spatio-temporal patterns of spread and to test hypotheses about the underlying interaction of evolutionary and ecological dynamics in pathogen populations. Although the field has witnessed a rich development of statistical inference tools with increasing levels of sophistication, these tools initially focused on sequences as their sole primary data source. Integrating various sources of information, however, promises to deliver more precise insights in infectious diseases and to increase opportunities for statistical hypothesis testing. Here, we review how the emerging concept of data integration is stimulating new advances in Bayesian evolutionary inference methodology which formalize a marriage of statistical thinking and evolutionary biology. These approaches include connecting sequence to trait evolution, such as for host, phenotypic and geographic sampling information, but also the incorporation of covariates of evolutionary and epidemic processes in the reconstruction procedures. We highlight how a full Bayesian approach to covariate modeling and testing can generate further insights into sequence evolution, trait evolution, and population dynamics in pathogen populations. Specific examples demonstrate how such approaches can be used to test the impact of host on rabies and HIV evolutionary rates, to identify the drivers of influenza dispersal as well as the determinants of rabies cross-species transmissions, and to quantify the evolutionary dynamics of influenza antigenicity. Finally, we briefly discuss how data integration is now also permeating through the inference of transmission dynamics, leading to novel insights into tree-generative processes and detailed reconstructions of transmission trees. [Bayesian inference; birth–death models; coalescent models; continuous trait evolution; covariates; data integration; discrete trait evolution; pathogen phylodynamics.
Ayres, Daniel L; Darling, Aaron; Zwickl, Derrick J; Beerli, Peter; Holder, Mark T; Lewis, Paul O; Huelsenbeck, John P; Ronquist, Fredrik; Swofford, David L; Cummings, Michael P; Rambaut, Andrew; Suchard, Marc A
2012-01-01
Phylogenetic inference is fundamental to our understanding of most aspects of the origin and evolution of life, and in recent years, there has been a concentration of interest in statistical approaches such as Bayesian inference and maximum likelihood estimation. Yet, for large data sets and realistic or interesting models of evolution, these approaches remain computationally demanding. High-throughput sequencing can yield data for thousands of taxa, but scaling to such problems using serial computing often necessitates the use of nonstatistical or approximate approaches. The recent emergence of graphics processing units (GPUs) provides an opportunity to leverage their excellent floating-point computational performance to accelerate statistical phylogenetic inference. A specialized library for phylogenetic calculation would allow existing software packages to make more effective use of available computer hardware, including GPUs. Adoption of a common library would also make it easier for other emerging computing architectures, such as field programmable gate arrays, to be used in the future. We present BEAGLE, an application programming interface (API) and library for high-performance statistical phylogenetic inference. The API provides a uniform interface for performing phylogenetic likelihood calculations on a variety of compute hardware platforms. The library includes a set of efficient implementations and can currently exploit hardware including GPUs using NVIDIA CUDA, central processing units (CPUs) with Streaming SIMD Extensions and related processor supplementary instruction sets, and multicore CPUs via OpenMP. To demonstrate the advantages of a common API, we have incorporated the library into several popular phylogenetic software packages. The BEAGLE library is free open source software licensed under the Lesser GPL and available from http://beagle-lib.googlecode.com. An example client program is available as public domain software.
NASA Astrophysics Data System (ADS)
Oriani, Fabio
2017-04-01
The unpredictable nature of rainfall makes its estimation as difficult as it is essential to hydrological applications. Stochastic simulation is often considered a convenient approach to assess the uncertainty of rainfall processes, but preserving their irregular behavior and variability at multiple scales is a challenge even for the most advanced techniques. In this presentation, an overview of the Direct Sampling technique [1] and its recent application to rainfall and hydrological data simulation [2, 3] is given. The algorithm, having its roots in multiple-point statistics, makes use of a training data set to simulate the outcome of a process without inferring any explicit probability measure: the data are simulated in time or space by sampling the training data set where a sufficiently similar group of neighbor data exists. This approach allows preserving complex statistical dependencies at different scales with a good approximation, while reducing the parameterization to the minimum. The strengths and weaknesses of the Direct Sampling approach are shown through a series of applications to rainfall and hydrological data: from time-series simulation to spatial rainfall fields conditioned by elevation or a climate scenario. In the era of vast databases, is this data-driven approach a valid alternative to parametric simulation techniques? [1] Mariethoz G., Renard P., and Straubhaar J. (2010), The Direct Sampling method to perform multiple-point geostatistical simulations, Water Resour. Res., 46(11), http://dx.doi.org/10.1029/2008WR007621 [2] Oriani F., Straubhaar J., Renard P., and Mariethoz G. (2014), Simulation of rainfall time series from different climatic regions using the direct sampling technique, Hydrol. Earth Syst. Sci., 18, 3015-3031, http://dx.doi.org/10.5194/hess-18-3015-2014 [3] Oriani F., Borghi A., Straubhaar J., Mariethoz G., Renard P. (2016), Missing data simulation inside flow rate time-series using multiple-point statistics, Environ. Model. Softw., vol. 86, pp. 264 - 276, http://dx.doi.org/10.1016/j.envsoft.2016.10.002
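The core resampling step of a Direct Sampling-style simulation can be sketched in a few lines: each new value is drawn from the training series at a position whose recent past resembles the recent past of the simulation. This is a deliberately minimal, editor-written caricature of the approach in [1], omitting the multivariate conditioning, spatial patterns, and scan-fraction controls of the real algorithm.

```python
import numpy as np

def direct_sampling_1d(training, length, n_neighbors=5, threshold=0.2, seed=0):
    """Simulate a series whose local patterns are borrowed from `training`."""
    rng = np.random.default_rng(seed)
    start = rng.integers(0, len(training) - n_neighbors)
    sim = list(training[start:start + n_neighbors])          # seed the simulation
    for _ in range(length - n_neighbors):
        pattern = np.array(sim[-n_neighbors:])
        # Scan candidate positions in random order; accept the first good match.
        candidates = rng.permutation(np.arange(n_neighbors, len(training) - 1))
        best_pos, best_dist = candidates[0], np.inf
        for pos in candidates:
            dist = np.mean(np.abs(training[pos - n_neighbors:pos] - pattern))
            if dist < best_dist:
                best_pos, best_dist = pos, dist
            if dist <= threshold:                            # "sufficiently similar"
                break
        sim.append(training[best_pos])                       # copy the value that follows
    return np.array(sim)

rng = np.random.default_rng(1)
training = np.abs(np.sin(np.linspace(0, 20, 500)) + 0.3 * rng.normal(size=500))
print(direct_sampling_1d(training, length=50)[:10])
```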
ERIC Educational Resources Information Center
Page, Robert; Satake, Eiki
2017-01-01
While interest in Bayesian statistics has been growing in statistics education, the treatment of the topic is still inadequate in both textbooks and the classroom. Because so many fields of study lead to careers that involve a decision-making process requiring an understanding of Bayesian methods, it is becoming increasingly clear that Bayesian…
Human Inferences about Sequences: A Minimal Transition Probability Model
2016-01-01
The brain constantly infers the causes of the inputs it receives and uses these inferences to generate statistical expectations about future observations. Experimental evidence for these expectations and their violations include explicit reports, sequential effects on reaction times, and mismatch or surprise signals recorded in electrophysiology and functional MRI. Here, we explore the hypothesis that the brain acts as a near-optimal inference device that constantly attempts to infer the time-varying matrix of transition probabilities between the stimuli it receives, even when those stimuli are in fact fully unpredictable. This parsimonious Bayesian model, with a single free parameter, accounts for a broad range of findings on surprise signals, sequential effects and the perception of randomness. Notably, it explains the pervasive asymmetry between repetitions and alternations encountered in those studies. Our analysis suggests that a neural machinery for inferring transition probabilities lies at the core of human sequence knowledge. PMID:28030543
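The kind of model described, continuously updated estimates of transition probabilities governed by a single forgetting parameter, can be sketched with leaky counts. The code below is the editor's simplified reading of that idea, not the authors' exact model.

```python
import numpy as np

def leaky_transition_estimates(sequence, n_symbols=2, leak=0.02):
    """Track P(next symbol | current symbol) with exponentially decaying counts.
    `leak` plays the role of the model's single free parameter (how quickly
    old observations are forgotten)."""
    counts = np.ones((n_symbols, n_symbols))      # flat prior pseudo-counts
    estimates = []
    for prev, nxt in zip(sequence[:-1], sequence[1:]):
        probs = counts / counts.sum(axis=1, keepdims=True)
        estimates.append(probs[prev, nxt])        # predicted probability of what occurred
        counts *= (1.0 - leak)                    # forget a little
        counts[prev, nxt] += 1.0                  # learn from the new transition
    return np.array(estimates)

rng = np.random.default_rng(0)
seq = rng.integers(0, 2, size=200)                # a fully random binary stream
surprise = -np.log(leaky_transition_estimates(seq))
print(surprise[:5], surprise.mean())
```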
Exploring High School Students Beginning Reasoning about Significance Tests with Technology
ERIC Educational Resources Information Center
García, Víctor N.; Sánchez, Ernesto
2017-01-01
In the present study we analyze how students reason about or make inferences given a particular hypothesis testing problem (without having studied formal methods of statistical inference) when using Fathom. They use Fathom to create an empirical sampling distribution through computer simulation. It is found that most students' reasoning relies on…
Stan: A Probabilistic Programming Language for Bayesian Inference and Optimization
ERIC Educational Resources Information Center
Gelman, Andrew; Lee, Daniel; Guo, Jiqiang
2015-01-01
Stan is a free and open-source C++ program that performs Bayesian inference or optimization for arbitrary user-specified models and can be called from the command line, R, Python, Matlab, or Julia and has great promise for fitting large and complex statistical models in many areas of application. We discuss Stan from users' and developers'…
IMNN: Information Maximizing Neural Networks
NASA Astrophysics Data System (ADS)
Charnock, Tom; Lavaux, Guilhem; Wandelt, Benjamin D.
2018-04-01
This software trains artificial neural networks to find non-linear functionals of data that maximize Fisher information: information maximizing neural networks (IMNNs). Compressing large data sets vastly simplifies both frequentist and Bayesian inference, but important information may be inadvertently lost. Likelihood-free inference based on automatically derived IMNN summaries produces summaries that are good approximations to sufficient statistics. IMNNs are robustly capable of automatically finding optimal, non-linear summaries of the data even in cases where linear compression fails: inferring the variance of a Gaussian signal in the presence of noise, inferring cosmological parameters from mock simulations of the Lyman-α forest in quasar spectra, and inferring frequency-domain parameters from LISA-like detections of gravitational waveforms. In this final case, the IMNN summary outperforms linear data compression by avoiding the introduction of spurious likelihood maxima.
Anchoring quartet-based phylogenetic distances and applications to species tree reconstruction.
Sayyari, Erfan; Mirarab, Siavash
2016-11-11
Inferring species trees from gene trees using the coalescent-based summary methods has been the subject of much attention, yet new scalable and accurate methods are needed. We introduce DISTIQUE, a new statistically consistent summary method for inferring species trees from gene trees under the coalescent model. We generalize our results to arbitrary phylogenetic inference problems; we show that two arbitrarily chosen leaves, called anchors, can be used to estimate relative distances between all other pairs of leaves by inferring relevant quartet trees. This results in a family of distance-based tree inference methods, with running times ranging between quadratic to quartic in the number of leaves. We show in simulated studies that DISTIQUE has comparable accuracy to leading coalescent-based summary methods and reduced running times.
Data Acquisition and Preprocessing in Studies on Humans: What Is Not Taught in Statistics Classes?
Zhu, Yeyi; Hernandez, Ladia M; Mueller, Peter; Dong, Yongquan; Forman, Michele R
2013-01-01
The aim of this paper is to address issues in research that may be missing from statistics classes and important for (bio-)statistics students. In the context of a case study, we discuss data acquisition and preprocessing steps that fill the gap between research questions posed by subject matter scientists and statistical methodology for formal inference. Issues include participant recruitment, data collection training and standardization, variable coding, data review and verification, data cleaning and editing, and documentation. Despite the critical importance of these details in research, most of these issues are rarely discussed in an applied statistics program. One reason for the lack of more formal training is the difficulty in addressing the many challenges that can possibly arise in the course of a study in a systematic way. This article can help to bridge this gap between research questions and formal statistical inference by using an illustrative case study for a discussion. We hope that reading and discussing this paper and practicing data preprocessing exercises will sensitize statistics students to these important issues and achieve optimal conduct, quality control, analysis, and interpretation of a study.
An Artificial Intelligence Approach to Analyzing Student Errors in Statistics.
ERIC Educational Resources Information Center
Sebrechts, Marc M.; Schooler, Lael J.
1987-01-01
Describes the development of an artificial intelligence system called GIDE that analyzes student errors in statistics problems by inferring the students' intentions. Learning strategies involved in problem solving are discussed and the inclusion of goal structures is explained. (LRW)
Network inference using informative priors.
Mukherjee, Sach; Speed, Terence P
2008-09-23
Recent years have seen much interest in the study of systems characterized by multiple interacting components. A class of statistical models called graphical models, in which graphs are used to represent probabilistic relationships between variables, provides a framework for formal inference regarding such systems. In many settings, the object of inference is the network structure itself. This problem of "network inference" is well known to be a challenging one. However, in scientific settings there is very often existing information regarding network connectivity. A natural idea then is to take account of such information during inference. This article addresses the question of incorporating prior information into network inference. We focus on directed models called Bayesian networks, and use Markov chain Monte Carlo to draw samples from posterior distributions over network structures. We introduce prior distributions on graphs capable of capturing information regarding network features including edges, classes of edges, degree distributions, and sparsity. We illustrate our approach in the context of systems biology, applying our methods to network inference in cancer signaling.
Mixed-Methods Research in Nutrition and Dietetics.
Zoellner, Jamie; Harris, Jeffrey E
2017-05-01
This work focuses on mixed-methods research (MMR) and is the 11th in a series exploring the importance of research design, statistical analysis, and epidemiologic methods as applied to nutrition and dietetics research. MMR is an investigative approach that combines quantitative and qualitative data. The purpose of this article is to define MMR; describe its history and nature; provide reasons for its use; describe and explain the six different MMR designs; describe sample selection; and provide guidance in data collection, analysis, and inference. MMR concepts are applied and integrated with nutrition-related scenarios in real-world research contexts, and summary recommendations are provided. Copyright © 2017 Academy of Nutrition and Dietetics. Published by Elsevier Inc. All rights reserved.
High-Reproducibility and High-Accuracy Method for Automated Topic Classification
NASA Astrophysics Data System (ADS)
Lancichinetti, Andrea; Sirer, M. Irmak; Wang, Jane X.; Acuna, Daniel; Körding, Konrad; Amaral, Luís A. Nunes
2015-01-01
Much of human knowledge sits in large databases of unstructured text. Leveraging this knowledge requires algorithms that extract and record metadata on unstructured text documents. Assigning topics to documents will enable intelligent searching, statistical characterization, and meaningful classification. Latent Dirichlet allocation (LDA) is the state of the art in topic modeling. Here, we perform a systematic theoretical and numerical analysis that demonstrates that current optimization techniques for LDA often yield results that are not accurate in inferring the most suitable model parameters. Adapting approaches from community detection in networks, we propose a new algorithm that displays high reproducibility and high accuracy and also has high computational efficiency. We apply it to a large set of documents in the English Wikipedia and reveal its hierarchical structure.
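For readers who want to try topic inference on their own corpus, the few lines below fit a standard LDA model with scikit-learn. This shows conventional variational optimization only, not the community-detection-inspired algorithm proposed in the article; the toy corpus is invented for illustration.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "bayesian inference posterior prior likelihood",
    "gene expression regulation transcription protein",
    "posterior sampling markov chain monte carlo",
    "protein structure folding sequence alignment",
]

counts = CountVectorizer().fit(docs)
dtm = counts.transform(docs)                         # document-term matrix
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(dtm)

terms = counts.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-4:][::-1]]
    print(f"topic {k}: {top}")
```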
Restoration of HST images with missing data
NASA Technical Reports Server (NTRS)
Adorf, Hans-Martin
1992-01-01
Missing data are a fairly common problem when restoring Hubble Space Telescope observations of extended sources. On Wide Field and Planetary Camera images cosmic ray hits and CCD hot spots are the prevalent causes of data losses, whereas on Faint Object Camera images data are lost due to reseaux marks, blemishes, areas of saturation and the omnipresent frame edges. This contribution discusses a technique for 'filling in' missing data by statistical inference using information from the surrounding pixels. The major gain consists in minimizing adverse spill-over effects to the restoration in areas neighboring those where data are missing. When the mask delineating the support of 'missing data' is made dynamic, cosmic ray hits, etc. can be detected on the fly during restoration.
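The exact restoration scheme is not given in the abstract, so the sketch below only illustrates the general idea of inferring masked pixels from their surroundings, using iterative local averaging on a synthetic image; the mask, image, and iteration count are assumptions, not the author's method.

```python
# Fill masked pixels by repeatedly replacing them with the average of their 4 neighbours.
import numpy as np

def fill_missing(image, mask, n_iter=200):
    """Replace pixels where mask is True with values diffused in from neighbours."""
    filled = image.copy()
    filled[mask] = image[~mask].mean()               # coarse initial guess
    for _ in range(n_iter):
        # 4-neighbour average via shifted copies (np.roll wraps at the frame edges,
        # which is acceptable for this illustration)
        neighbours = (np.roll(filled, 1, 0) + np.roll(filled, -1, 0) +
                      np.roll(filled, 1, 1) + np.roll(filled, -1, 1)) / 4.0
        filled[mask] = neighbours[mask]              # update only the missing pixels
    return filled

rng = np.random.default_rng(0)
img = rng.normal(100.0, 5.0, size=(64, 64))
cosmic_ray_mask = rng.random(img.shape) < 0.02       # fake cosmic-ray hits
restored = fill_missing(img, cosmic_ray_mask)
```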
Inferring molecular interactions pathways from eQTL data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rashid, Imran; McDermott, Jason E.; Samudrala, Ram
Analysis of expression quantitative trait loci (eQTL) helps elucidate the connection between genotype, gene expression levels, and phenotype. However, standard statistical genetics can only attribute changes in expression levels to loci on the genome, not specific genes. Each locus can contain many genes, making it very difficult to discover which gene is controlling the expression levels of other genes. Furthermore, it is even more difficult to find a pathway of molecular interactions responsible for controlling the expression levels. Here we describe a series of techniques for finding explanatory pathways by exploring graphs of molecular interactions. We show that several simple methods can find complete pathways that explain the mechanism of differential expression in eQTL data.
The basis function approach for modeling autocorrelation in ecological data.
Hefley, Trevor J; Broms, Kristin M; Brost, Brian M; Buderman, Frances E; Kay, Shannon L; Scharf, Henry R; Tipton, John R; Williams, Perry J; Hooten, Mevin B
2017-03-01
Analyzing ecological data often requires modeling the autocorrelation created by spatial and temporal processes. Many seemingly disparate statistical methods used to account for autocorrelation can be expressed as regression models that include basis functions. Basis functions also enable ecologists to modify a wide range of existing ecological models in order to account for autocorrelation, which can improve inference and predictive accuracy. Furthermore, understanding the properties of basis functions is essential for evaluating the fit of spatial or time-series models, detecting a hidden form of collinearity, and analyzing large data sets. We present important concepts and properties related to basis functions and illustrate several tools and techniques ecologists can use when modeling autocorrelation in ecological data. © 2016 by the Ecological Society of America.
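As a concrete illustration of the basis-function idea (not taken from the paper), the sketch below absorbs smooth spatial structure in a response by adding Gaussian radial basis functions of the coordinates to an ordinary least-squares regression; the data, knot placement, and bandwidth are invented.

```python
# Regression with Gaussian radial basis functions to soak up spatial autocorrelation.
import numpy as np
from numpy.linalg import lstsq

rng = np.random.default_rng(1)
coords = rng.uniform(0, 10, size=(200, 1))                  # 1-D "spatial" locations
y = np.sin(coords[:, 0]) + 0.3 * rng.normal(size=200)       # smooth spatial signal + noise

knots = np.linspace(0, 10, 15)                              # basis-function centres
sigma = 1.0                                                 # basis-function bandwidth
basis = np.exp(-((coords - knots) ** 2) / (2 * sigma ** 2)) # 200 x 15 design columns
X = np.column_stack([np.ones(len(y)), basis])               # intercept + basis functions

coef, *_ = lstsq(X, y, rcond=None)
fitted = X @ coef                                           # spatially smooth fit
```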
NASA Technical Reports Server (NTRS)
Ding, Y. J.; Hong, Q. F.; Hagyard, M. J.; Deloach, A. C.; Liu, X. P.
1987-01-01
Techniques to identify sources of electric current systems and their channels of flow in solar active regions are explored. Measured photospheric vector magnetic fields together with high-resolution white-light and H-alpha filtergrams provide the data base to derive the current systems in the photosphere and chromosphere. As an example, the techniques are then applied to infer current systems in AR 2372 in early April 1980.
Reaction Time in Grade 5: Data Collection within the Practice of Statistics
ERIC Educational Resources Information Center
Watson, Jane; English, Lyn
2017-01-01
This study reports on a classroom activity for Grade 5 students investigating their reaction times. The investigation was part of a 3-year research project introducing students to informal inference and giving them experience carrying out the practice of statistics. For this activity the focus within the practice of statistics was on introducing…
ERIC Educational Resources Information Center
Bakker, Arthur; Ben-Zvi, Dani; Makar, Katie
2017-01-01
To understand how statistical and other types of reasoning are coordinated with actions to reduce uncertainty, we conducted a case study in vocational education that involved statistical hypothesis testing. We analyzed an intern's research project in a hospital laboratory in which reducing uncertainties was crucial to make a valid statistical…
Causal inference in biology networks with integrated belief propagation.
Chang, Rui; Karr, Jonathan R; Schadt, Eric E
2015-01-01
Inferring causal relationships among molecular and higher order phenotypes is a critical step in elucidating the complexity of living systems. Here we propose a novel method for inferring causality that is no longer constrained by the conditional dependency arguments that limit the ability of statistical causal inference methods to resolve causal relationships within sets of graphical models that are Markov equivalent. Our method utilizes Bayesian belief propagation to infer the responses of perturbation events on molecular traits given a hypothesized graph structure. A distance measure between the inferred response distribution and the observed data is defined to assess the 'fitness' of the hypothesized causal relationships. To test our algorithm, we infer causal relationships within equivalence classes of gene networks in which the form of the functional interactions that are possible are assumed to be nonlinear, given synthetic microarray and RNA sequencing data. We also apply our method to infer causality in a real metabolic network containing a v-structure and a feedback loop. We show that our method can recapitulate the causal structure and recover the feedback loop from steady-state data alone, which conventional methods cannot.
Wilkinson, Michael
2014-03-01
Decisions about support for predictions of theories in light of data are made using statistical inference. The dominant approach in sport and exercise science is the Neyman-Pearson (N-P) significance-testing approach. When applied correctly it provides a reliable procedure for making dichotomous decisions for accepting or rejecting zero-effect null hypotheses with known and controlled long-run error rates. Type I and type II error rates must be specified in advance and the latter controlled by conducting an a priori sample size calculation. The N-P approach does not provide the probability of hypotheses or indicate the strength of support for hypotheses in light of data, yet many scientists believe it does. Outcomes of analyses allow conclusions only about the existence of non-zero effects, and provide no information about the likely size of true effects or their practical/clinical value. Bayesian inference can show how much support data provide for different hypotheses, and how personal convictions should be altered in light of data, but the approach is complicated by formulating probability distributions about prior subjective estimates of population effects. A pragmatic solution is magnitude-based inference, which allows scientists to estimate the true magnitude of population effects and how likely they are to exceed an effect magnitude of practical/clinical importance, thereby integrating elements of subjective Bayesian-style thinking. While this approach is gaining acceptance, progress might be hastened if scientists appreciate the shortcomings of traditional N-P null hypothesis significance testing.
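A minimal numerical sketch of the contrast drawn above, comparing a conventional p-value with magnitude-based quantities computed from the same estimate and standard error under a normal approximation; the effect size, standard error, and smallest worthwhile change below are invented.

```python
# Normal-approximation magnitude-based quantities alongside a conventional p-value.
from scipy.stats import norm

estimate, se = 0.45, 0.20          # observed effect and its standard error
swc = 0.20                         # smallest worthwhile change (practical threshold)

p_two_sided = 2 * norm.sf(abs(estimate) / se)      # conventional N-P style p-value
p_beneficial = norm.sf((swc - estimate) / se)      # P(true effect > +SWC)
p_harmful = norm.cdf((-swc - estimate) / se)       # P(true effect < -SWC)

print(f"p = {p_two_sided:.3f}, P(beneficial) = {p_beneficial:.2f}, "
      f"P(harmful) = {p_harmful:.3f}")
```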
Bayesian multimodel inference for dose-response studies
Link, W.A.; Albers, P.H.
2007-01-01
Statistical inference in dose-response studies is model-based: The analyst posits a mathematical model of the relation between exposure and response, estimates parameters of the model, and reports conclusions conditional on the model. Such analyses rarely include any accounting for the uncertainties associated with model selection. The Bayesian inferential system provides a convenient framework for model selection and multimodel inference. In this paper we briefly describe the Bayesian paradigm and Bayesian multimodel inference. We then present a family of models for multinomial dose-response data and apply Bayesian multimodel inferential methods to the analysis of data on the reproductive success of American kestrels (Falco sparverius) exposed to various sublethal dietary concentrations of methylmercury.
Tropical geometry of statistical models.
Pachter, Lior; Sturmfels, Bernd
2004-11-16
This article presents a unified mathematical framework for inference in graphical models, building on the observation that graphical models are algebraic varieties. From this geometric viewpoint, observations generated from a model are coordinates of a point in the variety, and the sum-product algorithm is an efficient tool for evaluating specific coordinates. Here, we address the question of how the solutions to various inference problems depend on the model parameters. The proposed answer is expressed in terms of tropical algebraic geometry. The Newton polytope of a statistical model plays a key role. Our results are applied to the hidden Markov model and the general Markov model on a binary tree.
Salehi, Sohrab; Steif, Adi; Roth, Andrew; Aparicio, Samuel; Bouchard-Côté, Alexandre; Shah, Sohrab P
2017-03-01
Next-generation sequencing (NGS) of bulk tumour tissue can identify constituent cell populations in cancers and measure their abundance. This requires computational deconvolution of allelic counts from somatic mutations, which may be incapable of fully resolving the underlying population structure. Single cell sequencing (SCS) is a more direct method, although its replacement of NGS is impeded by technical noise and sampling limitations. We propose ddClone, which analytically integrates NGS and SCS data, leveraging their complementary attributes through joint statistical inference. We show on real and simulated datasets that ddClone produces more accurate results than can be achieved by either method alone.
The Heuristic Value of p in Inductive Statistical Inference
Krueger, Joachim I.; Heck, Patrick R.
2017-01-01
Many statistical methods yield the probability of the observed data – or data more extreme – under the assumption that a particular hypothesis is true. This probability is commonly known as ‘the’ p-value. (Null Hypothesis) Significance Testing ([NH]ST) is the most prominent of these methods. The p-value has been subjected to much speculation, analysis, and criticism. We explore how well the p-value predicts what researchers presumably seek: the probability of the hypothesis being true given the evidence, and the probability of reproducing significant results. We also explore the effect of sample size on inferential accuracy, bias, and error. In a series of simulation experiments, we find that the p-value performs quite well as a heuristic cue in inductive inference, although there are identifiable limits to its usefulness. We conclude that despite its general usefulness, the p-value cannot bear the full burden of inductive inference; it is but one of several heuristic cues available to the data analyst. Depending on the inferential challenge at hand, investigators may supplement their reports with effect size estimates, Bayes factors, or other suitable statistics, to communicate what they think the data say. PMID:28649206
Subjective randomness as statistical inference.
Griffiths, Thomas L; Daniels, Dylan; Austerweil, Joseph L; Tenenbaum, Joshua B
2018-06-01
Some events seem more random than others. For example, when tossing a coin, a sequence of eight heads in a row does not seem very random. Where do these intuitions about randomness come from? We argue that subjective randomness can be understood as the result of a statistical inference assessing the evidence that an event provides for having been produced by a random generating process. We show how this account provides a link to previous work relating randomness to algorithmic complexity, in which random events are those that cannot be described by short computer programs. Algorithmic complexity is both incomputable and too general to capture the regularities that people can recognize, but viewing randomness as statistical inference provides two paths to addressing these problems: considering regularities generated by simpler computing machines, and restricting the set of probability distributions that characterize regularity. Building on previous work exploring these different routes to a more restricted notion of randomness, we define strong quantitative models of human randomness judgments that apply not just to binary sequences - which have been the focus of much of the previous work on subjective randomness - but also to binary matrices and spatial clustering. Copyright © 2018 Elsevier Inc. All rights reserved.
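A toy version of the inference described above: the evidence a binary sequence provides for a "random" (fair-coin) generating process can be expressed as a log marginal-likelihood ratio against a simple regular alternative, here a biased coin with a uniform prior on its bias. This is only a sketch of the idea, not the authors' models.

```python
# Log evidence for a fair-coin process versus a biased-coin alternative (Beta(1,1) prior).
import numpy as np
from scipy.special import betaln

def log_evidence_for_randomness(seq):
    seq = np.asarray(seq)
    n, k = len(seq), int(seq.sum())
    log_p_random = n * np.log(0.5)                          # fair coin
    log_p_biased = betaln(k + 1, n - k + 1) - betaln(1, 1)  # Beta-Binomial marginal
    return log_p_random - log_p_biased                      # > 0 favours "random"

print(log_evidence_for_randomness([1, 1, 1, 1, 1, 1, 1, 1]))   # eight heads: negative
print(log_evidence_for_randomness([1, 0, 0, 1, 0, 1, 1, 0]))   # mixed sequence: positive
```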
The Heuristic Value of p in Inductive Statistical Inference.
Krueger, Joachim I; Heck, Patrick R
2017-01-01
Many statistical methods yield the probability of the observed data - or data more extreme - under the assumption that a particular hypothesis is true. This probability is commonly known as 'the' p -value. (Null Hypothesis) Significance Testing ([NH]ST) is the most prominent of these methods. The p -value has been subjected to much speculation, analysis, and criticism. We explore how well the p -value predicts what researchers presumably seek: the probability of the hypothesis being true given the evidence, and the probability of reproducing significant results. We also explore the effect of sample size on inferential accuracy, bias, and error. In a series of simulation experiments, we find that the p -value performs quite well as a heuristic cue in inductive inference, although there are identifiable limits to its usefulness. We conclude that despite its general usefulness, the p -value cannot bear the full burden of inductive inference; it is but one of several heuristic cues available to the data analyst. Depending on the inferential challenge at hand, investigators may supplement their reports with effect size estimates, Bayes factors, or other suitable statistics, to communicate what they think the data say.
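A hedged re-creation of the kind of simulation experiment described above: under an assumed base rate of true effects, how often is the null actually false when p < .05? The base rate, effect size, and sample size below are invented and are not the authors' settings.

```python
# Simulate many one-sample studies, some with a real effect, and ask what a
# significant p-value implies about the hypothesis being true.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, effect, base_rate, n_sim = 30, 0.5, 0.3, 20_000

h1_true = rng.random(n_sim) < base_rate                  # which studies have a real effect
means = np.where(h1_true, effect, 0.0)
data = rng.normal(means[:, None], 1.0, size=(n_sim, n))  # one-sample studies
t, p = stats.ttest_1samp(data, 0.0, axis=1)

significant = p < 0.05
print("P(H1 true | p < .05)  ≈", h1_true[significant].mean())
print("P(H1 true | p >= .05) ≈", h1_true[~significant].mean())
```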
Statistical inference of seabed sound-speed structure in the Gulf of Oman Basin.
Sagers, Jason D; Knobles, David P
2014-06-01
Addressed is the statistical inference of the sound-speed depth profile of a thick soft seabed from broadband sound propagation data recorded in the Gulf of Oman Basin in 1977. The acoustic data are in the form of time series signals recorded on a sparse vertical line array and generated by explosive sources deployed along a 280 km track. The acoustic data offer a unique opportunity to study a deep-water bottom-limited thickly sedimented environment because of the large number of time series measurements, very low seabed attenuation, and auxiliary measurements. A maximum entropy method is employed to obtain a conditional posterior probability distribution (PPD) for the sound-speed ratio and the near-surface sound-speed gradient. The multiple data samples allow for a determination of the average error constraint value required to uniquely specify the PPD for each data sample. Two complicating features of the statistical inference study are addressed: (1) the need to develop an error function that can both utilize the measured multipath arrival structure and mitigate the effects of data errors and (2) the effect of small bathymetric slopes on the structure of the bottom interacting arrivals.
Experimental and environmental factors affect spurious detection of ecological thresholds
Daily, Jonathan P.; Hitt, Nathaniel P.; Smith, David; Snyder, Craig D.
2012-01-01
Threshold detection methods are increasingly popular for assessing nonlinear responses to environmental change, but their statistical performance remains poorly understood. We simulated linear change in stream benthic macroinvertebrate communities and evaluated the performance of commonly used threshold detection methods based on model fitting (piecewise quantile regression [PQR]), data partitioning (nonparametric change point analysis [NCPA]), and a hybrid approach (significant zero crossings [SiZer]). We demonstrated that false detection of ecological thresholds (type I errors) and inferences on threshold locations are influenced by sample size, rate of linear change, and frequency of observations across the environmental gradient (i.e., sample-environment distribution, SED). However, the relative importance of these factors varied among statistical methods and between inference types. False detection rates were influenced primarily by user-selected parameters for PQR (τ) and SiZer (bandwidth) and secondarily by sample size (for PQR) and SED (for SiZer). In contrast, the location of reported thresholds was influenced primarily by SED. Bootstrapped confidence intervals for NCPA threshold locations revealed strong correspondence to SED. We conclude that the choice of statistical methods for threshold detection should be matched to experimental and environmental constraints to minimize false detection rates and avoid spurious inferences regarding threshold location.
Identifying biologically relevant differences between metagenomic communities.
Parks, Donovan H; Beiko, Robert G
2010-03-15
Metagenomics is the study of genetic material recovered directly from environmental samples. Taxonomic and functional differences between metagenomic samples can highlight the influence of ecological factors on patterns of microbial life in a wide range of habitats. Statistical hypothesis tests can help us distinguish ecological influences from sampling artifacts, but knowledge of only the P-value from a statistical hypothesis test is insufficient to make inferences about biological relevance. Current reporting practices for pairwise comparative metagenomics are inadequate, and better tools are needed for comparative metagenomic analysis. We have developed a new software package, STAMP, for comparative metagenomics that supports best practices in analysis and reporting. Examination of a pair of iron mine metagenomes demonstrates that deeper biological insights can be gained using statistical techniques available in our software. An analysis of the functional potential of 'Candidatus Accumulibacter phosphatis' in two enhanced biological phosphorus removal metagenomes identified several subsystems that differ between the A. phosphatis strains in these related communities, including phosphate metabolism, secretion and metal transport. Python source code and binaries are freely available from our website at http://kiwi.cs.dal.ca/Software/STAMP CONTACT: beiko@cs.dal.ca Supplementary data are available at Bioinformatics online.
NASA Astrophysics Data System (ADS)
Liu, Bilan; Qiu, Xing; Zhu, Tong; Tian, Wei; Hu, Rui; Ekholm, Sven; Schifitto, Giovanni; Zhong, Jianhui
2016-03-01
Subject-specific longitudinal DTI study is vital for investigation of pathological changes of lesions and disease evolution. Spatial Regression Analysis of Diffusion tensor imaging (SPREAD) is a non-parametric permutation-based statistical framework that combines spatial regression and resampling techniques to achieve effective detection of localized longitudinal diffusion changes within the whole brain at individual level without a priori hypotheses. However, boundary blurring and dislocation limit its sensitivity, especially towards detecting lesions of irregular shapes. In the present study, we propose an improved SPREAD method (iSPREAD) by incorporating a three-dimensional (3D) nonlinear anisotropic diffusion filtering method, which provides edge-preserving image smoothing through a nonlinear scale space approach. The statistical inference based on iSPREAD was evaluated and compared with the original SPREAD method using both simulated and in vivo human brain data. Results demonstrated that the sensitivity and accuracy of the SPREAD method has been improved substantially by adapting nonlinear anisotropic filtering. iSPREAD identifies subject-specific longitudinal changes in the brain with improved sensitivity, accuracy, and enhanced statistical power, especially when the spatial correlation is heterogeneous among neighboring image pixels in DTI.
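The following is not the iSPREAD implementation; it is a minimal 2-D Perona-Malik-style anisotropic diffusion filter of the general kind referred to above, which smooths within regions while preserving edges. The parameters and the test image are illustrative.

```python
# Edge-preserving smoothing: diffusion is suppressed where the local gradient is large.
import numpy as np

def anisotropic_diffusion(img, n_iter=20, kappa=20.0, gamma=0.2):
    out = img.astype(float).copy()
    for _ in range(n_iter):
        # finite differences towards the four neighbours
        d_n = np.roll(out, -1, 0) - out
        d_s = np.roll(out, 1, 0) - out
        d_e = np.roll(out, -1, 1) - out
        d_w = np.roll(out, 1, 1) - out
        # conductance: close to 1 in flat regions, close to 0 across strong edges
        c = lambda d: np.exp(-(d / kappa) ** 2)
        out += gamma * (c(d_n) * d_n + c(d_s) * d_s + c(d_e) * d_e + c(d_w) * d_w)
    return out

rng = np.random.default_rng(0)
noisy = np.zeros((64, 64))
noisy[:, 32:] = 10.0                                   # a step edge
noisy += rng.normal(0, 1.0, noisy.shape)
smoothed = anisotropic_diffusion(noisy)                # noise reduced, edge preserved
```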
Time-dependent summary receiver operating characteristics for meta-analysis of prognostic studies.
Hattori, Satoshi; Zhou, Xiao-Hua
2016-11-20
Prognostic studies are widely conducted to examine whether biomarkers are associated with patient's prognoses and play important roles in medical decisions. Because findings from one prognostic study may be very limited, meta-analyses may be useful to obtain sound evidence. However, prognostic studies are often analyzed by relying on a study-specific cut-off value, which can lead to difficulty in applying the standard meta-analysis techniques. In this paper, we propose two methods to estimate a time-dependent version of the summary receiver operating characteristics curve for meta-analyses of prognostic studies with a right-censored time-to-event outcome. We introduce a bivariate normal model for the pair of time-dependent sensitivity and specificity and propose a method to form inferences based on summary statistics reported in published papers. This method provides a valid inference asymptotically. In addition, we consider a bivariate binomial model. To draw inferences from this bivariate binomial model, we introduce a multiple imputation method. The multiple imputation is found to be approximately proper multiple imputation, and thus the standard Rubin's variance formula is justified from a Bayesian view point. Our simulation study and application to a real dataset revealed that both methods work well with a moderate or large number of studies and the bivariate binomial model coupled with the multiple imputation outperforms the bivariate normal model with a small number of studies. Copyright © 2016 John Wiley & Sons, Ltd.
Valid statistical approaches for analyzing sholl data: Mixed effects versus simple linear models.
Wilson, Machelle D; Sethi, Sunjay; Lein, Pamela J; Keil, Kimberly P
2017-03-01
The Sholl technique is widely used to quantify dendritic morphology. Data from such studies, which typically sample multiple neurons per animal, are often analyzed using simple linear models. However, simple linear models fail to account for intra-class correlation that occurs with clustered data, which can lead to faulty inferences. Mixed effects models account for intra-class correlation that occurs with clustered data; thus, these models more accurately estimate the standard deviation of the parameter estimate, which produces more accurate p-values. While mixed models are not new, their use in neuroscience has lagged behind their use in other disciplines. A review of the published literature illustrates common mistakes in analyses of Sholl data. Analysis of Sholl data collected from Golgi-stained pyramidal neurons in the hippocampus of male and female mice using both simple linear and mixed effects models demonstrates that the p-values and standard deviations obtained using the simple linear models are biased downwards and lead to erroneous rejection of the null hypothesis in some analyses. The mixed effects approach more accurately models the true variability in the data set, which leads to correct inference. Mixed effects models avoid faulty inference in Sholl analysis of data sampled from multiple neurons per animal by accounting for intra-class correlation. Given the widespread practice in neuroscience of obtaining multiple measurements per subject, there is a critical need to apply mixed effects models more widely. Copyright © 2017 Elsevier B.V. All rights reserved.
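A small sketch of the comparison made above, using the statsmodels formula API: the same clustered data (several neurons per animal) analysed with ordinary least squares versus a mixed model with a random intercept per animal. The simulated data and effect sizes are invented.

```python
# OLS ignores the animal-level clustering; the mixed model accounts for it.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
animals = np.repeat(np.arange(8), 10)                        # 8 animals, 10 neurons each
animal_effect = rng.normal(0, 2.0, 8)[animals]               # source of intra-class correlation
group = animals % 2                                          # e.g. treatment vs control
y = 10 + 1.0 * group + animal_effect + rng.normal(0, 1.0, len(animals))
df = pd.DataFrame({"y": y, "group": group, "animal": animals})

ols_fit = smf.ols("y ~ group", data=df).fit()                # treats neurons as independent
mixed_fit = smf.mixedlm("y ~ group", data=df, groups=df["animal"]).fit()

# The OLS standard error for the group effect is typically biased downwards here.
print(ols_fit.bse["group"], mixed_fit.bse["group"])
```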
Finding user personal interests by tweet-mining using advanced machine learning algorithm in R
NASA Astrophysics Data System (ADS)
Krithika, L. B.; Roy, P.; Asha Jerlin, M.
2017-11-01
Social media plays a key role in every individual's life as a record of personal views about what they like and dislike. The proposed methodology is a sharp departure from the traditional technique of inferring a user's interests from the tweets that he/she posts or receives. It is shown that the topics of interest inferred by the proposed methodology are far superior to the topics extracted by state-of-the-art techniques such as topic models (Labelled LDA) applied to tweets. Based upon the proposed methodology, a system has been built, "Who is interested in what", which can infer the interests of millions of Twitter users. A novel mechanism is proposed to infer topics of interest of individual users in the Twitter social network. It has been observed that on Twitter, a user generally follows experts on various topics of his/her interest in order to acquire information on those topics. A methodology based on social annotations is used to first deduce the topical expertise of popular Twitter users and then transitively infer the interests of the users who follow them.
Advances in Bayesian Modeling in Educational Research
ERIC Educational Resources Information Center
Levy, Roy
2016-01-01
In this article, I provide a conceptually oriented overview of Bayesian approaches to statistical inference and contrast them with frequentist approaches that currently dominate conventional practice in educational research. The features and advantages of Bayesian approaches are illustrated with examples spanning several statistical modeling…
Statistical Inference for Quality-Adjusted Survival Time
2003-08-01
survival functions of QAL. If an influence function for a test statistic exists for complete data case, denoted as ’i, then a test statistic for...the survival function for the censoring variable. Zhao and Tsiatis (2001) proposed a test statistic where O is the influence function of the general...to 1 everywhere until a subject’s death. We have considered other forms of test statistics. One option is to use an influence function 0i that is
Intimate Partner Violence in the United States - 2010
Contents (excerpt): … administration; Statistical testing and inference; Additional methodological information; 2. Prevalence and Frequency of Individual …
Estimating the probability of rare events: addressing zero failure data.
Quigley, John; Revie, Matthew
2011-07-01
Traditional statistical procedures for estimating the probability of an event result in an estimate of zero when no events are realized. Alternative inferential procedures have been proposed for the situation where zero events have been realized but often these are ad hoc, relying on selecting methods dependent on the data that have been realized. Such data-dependent inference decisions violate fundamental statistical principles, resulting in estimation procedures whose benefits are difficult to assess. In this article, we propose estimating the probability of an event occurring through minimax inference on the probability that future samples of equal size realize no more events than that in the data on which the inference is based. Although motivated by inference on rare events, the method is not restricted to zero event data and closely approximates the maximum likelihood estimate (MLE) for nonzero data. The use of the minimax procedure provides a risk-averse inferential procedure where there are no events realized. A comparison is made with the MLE and regions of the underlying probability are identified where this approach is superior. Moreover, a comparison is made with three standard approaches to supporting inference where no event data are realized, which we argue are unduly pessimistic. We show that for situations of zero events the estimator can be simply approximated with 1/(2.5n), where n is the number of trials. © 2011 Society for Risk Analysis.
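For concreteness, the numbers below compare the estimators mentioned above when zero events are observed in n trials: the maximum likelihood estimate, the familiar "rule of three" upper bound, and the 1/(2.5n) approximation to the minimax estimate reported by the authors.

```python
# Point estimates and a bound for the event probability after observing 0 events in n trials.
n = 200                               # number of trials with zero events observed
mle = 0.0 / n                         # maximum likelihood estimate: exactly zero
rule_of_three_upper = 3.0 / n         # approximate 95% upper confidence bound
minimax_approx = 1.0 / (2.5 * n)      # approximation to the minimax estimate

print(mle, rule_of_three_upper, minimax_approx)   # 0.0, 0.015, 0.002
```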
Xu, Jason; Minin, Vladimir N
2015-07-01
Branching processes are a class of continuous-time Markov chains (CTMCs) with ubiquitous applications. A general difficulty in statistical inference under partially observed CTMC models arises in computing transition probabilities when the discrete state space is large or uncountable. Classical methods such as matrix exponentiation are infeasible for large or countably infinite state spaces, and sampling-based alternatives are computationally intensive, requiring integration over all possible hidden events. Recent work has successfully applied generating function techniques to computing transition probabilities for linear multi-type branching processes. While these techniques often require significantly fewer computations than matrix exponentiation, they also become prohibitive in applications with large populations. We propose a compressed sensing framework that significantly accelerates the generating function method, decreasing computational cost up to a logarithmic factor by only assuming the probability mass of transitions is sparse. We demonstrate accurate and efficient transition probability computations in branching process models for blood cell formation and evolution of self-replicating transposable elements in bacterial genomes.
NASA Astrophysics Data System (ADS)
Troupin, C.; Lenartz, F.; Sirjacobs, D.; Alvera-Azcárate, A.; Barth, A.; Ouberdous, M.; Beckers, J.-M.
2009-04-01
In order to evaluate the variability of the sea surface temperature (SST) in the Western Mediterranean Sea between 1985 and 2005, an integrated approach combining geostatistical tools and modelling techniques has been set up. The objectives are to underline the capability of each tool to capture characteristic phenomena, to compare and assess the quality of their outputs, and to infer an interannual trend from the results. Diva (Data Interpolating Variational Analysis, Brasseur et al. (1996) Deep-Sea Res.) was applied to a collection of in situ data gathered from various sources (World Ocean Database 2005, Hydrobase2, Coriolis and MedAtlas2), from which duplicates and suspect values were removed. This provided monthly gridded fields in the region of interest. Heterogeneous temporal data coverage was taken into account by computing and removing the annual trend, provided by the Diva detrending tool. A heterogeneous correlation length was applied through an advection constraint. The statistical technique DINEOF (Data Interpolation with Empirical Orthogonal Functions, Alvera-Azc
Understanding and controlling regime switching in molecular diffusion
NASA Astrophysics Data System (ADS)
Hallerberg, S.; de Wijn, A. S.
2014-12-01
Diffusion can be strongly affected by ballistic flights (long jumps) as well as long-lived sticking trajectories (long sticks). Using statistical inference techniques in the spirit of Granger causality, we investigate the appearance of long jumps and sticks in molecular-dynamics simulations of diffusion in a prototype system, a benzene molecule on a graphite substrate. We find that specific fluctuations in certain, but not all, internal degrees of freedom of the molecule can be linked to either long jumps or sticks. Furthermore, by changing the prevalence of these predictors with an outside influence, the diffusion of the molecule can be controlled. The approach presented in this proof of concept study is very generic and can be applied to larger and more complex molecules. Additionally, the predictor variables can be chosen in a general way so as to be accessible in experiments, making the method feasible for control of diffusion in applications. Our results also demonstrate that data-mining techniques can be used to investigate the phase-space structure of high-dimensional nonlinear dynamical systems.
Xu, Jason; Minin, Vladimir N.
2016-01-01
Branching processes are a class of continuous-time Markov chains (CTMCs) with ubiquitous applications. A general difficulty in statistical inference under partially observed CTMC models arises in computing transition probabilities when the discrete state space is large or uncountable. Classical methods such as matrix exponentiation are infeasible for large or countably infinite state spaces, and sampling-based alternatives are computationally intensive, requiring integration over all possible hidden events. Recent work has successfully applied generating function techniques to computing transition probabilities for linear multi-type branching processes. While these techniques often require significantly fewer computations than matrix exponentiation, they also become prohibitive in applications with large populations. We propose a compressed sensing framework that significantly accelerates the generating function method, decreasing computational cost up to a logarithmic factor by only assuming the probability mass of transitions is sparse. We demonstrate accurate and efficient transition probability computations in branching process models for blood cell formation and evolution of self-replicating transposable elements in bacterial genomes. PMID:26949377
DOE Office of Scientific and Technical Information (OSTI.GOV)
Del Pozzo, Walter; Nikhef National Institute for Subatomic Physics, Science Park 105, 1098 XG Amsterdam; Veitch, John
Second-generation interferometric gravitational-wave detectors, such as Advanced LIGO and Advanced Virgo, are expected to begin operation by 2015. Such instruments plan to reach sensitivities that will offer the unique possibility to test general relativity in the dynamical, strong-field regime and investigate departures from its predictions, in particular, using the signal from coalescing binary systems. We introduce a statistical framework based on Bayesian model selection in which the Bayes factor between two competing hypotheses measures which theory is favored by the data. Probability density functions of the model parameters are then used to quantify the inference on individual parameters. We alsomore » develop a method to combine the information coming from multiple independent observations of gravitational waves, and show how much stronger inference could be. As an introduction and illustration of this framework-and a practical numerical implementation through the Monte Carlo integration technique of nested sampling-we apply it to gravitational waves from the inspiral phase of coalescing binary systems as predicted by general relativity and a very simple alternative theory in which the graviton has a nonzero mass. This method can (and should) be extended to more realistic and physically motivated theories.« less
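One ingredient of the framework, combining evidence across independent detections, reduces to simple arithmetic on the per-event Bayes factors when the per-event analyses share no common parameters. The sketch below assumes that case; the numbers are invented and would in practice come from nested-sampling runs.

```python
# Combining evidence across independent events: evidences multiply, so log Bayes factors add.
import numpy as np

# ln Bayes factors (general relativity versus the massive-graviton alternative), one per
# detected event; these values are invented for illustration.
log_bayes_factors = np.array([0.8, -0.3, 1.5, 0.4])
combined = log_bayes_factors.sum()
print("combined ln Bayes factor:", combined)
```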
Comparison of two correlated ROC curves at a given specificity or sensitivity level.
Bantis, Leonidas E; Feng, Ziding
2016-10-30
The receiver operating characteristic (ROC) curve is the most popular statistical tool for evaluating the discriminatory capability of a given continuous biomarker. The need to compare two correlated ROC curves arises when individuals are measured with two biomarkers, which induces paired and thus correlated measurements. Many researchers have focused on comparing two correlated ROC curves in terms of the area under the curve (AUC), which summarizes the overall performance of the marker. However, particular values of specificity may be of interest. We focus on comparing two correlated ROC curves at a given specificity level. We propose parametric approaches, transformations to normality, and nonparametric kernel-based approaches. Our methods can be straightforwardly extended for inference in terms of ROC⁻¹(t). This is of particular interest for comparing the accuracy of two correlated biomarkers at a given sensitivity level. Extensions also involve inference for the AUC and accommodating covariates. We evaluate the robustness of our techniques through simulations, compare them with other known approaches, and present a real-data application involving prostate cancer screening. Copyright © 2016 John Wiley & Sons, Ltd.
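The sketch below is not the authors' parametric or kernel method; it is a simple nonparametric, paired-bootstrap comparison of two correlated markers' sensitivities at a fixed specificity, on simulated data, to make the estimand concrete.

```python
# Paired bootstrap for the difference in sensitivity at a fixed specificity (here 90%).
import numpy as np

def sens_at_spec(y, scores, spec=0.9):
    cut = np.quantile(scores[y == 0], spec)        # threshold achieving ~90% specificity
    return np.mean(scores[y == 1] > cut)

rng = np.random.default_rng(0)
n = 300
y = rng.integers(0, 2, n)                           # disease status
shared = rng.normal(0, 1, n)                        # shared noise makes the markers correlated
m1 = 1.0 * y + 0.7 * shared + 0.7 * rng.normal(0, 1, n)
m2 = 0.8 * y + 0.7 * shared + 0.7 * rng.normal(0, 1, n)

diffs = []
for _ in range(2000):                               # resampling pairs preserves the correlation
    idx = rng.integers(0, n, n)
    diffs.append(sens_at_spec(y[idx], m1[idx]) - sens_at_spec(y[idx], m2[idx]))
lo, hi = np.percentile(diffs, [2.5, 97.5])
print(f"95% CI for the sensitivity difference at 90% specificity: ({lo:.3f}, {hi:.3f})")
```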
Williams, M S; Ebel, E D; Cao, Y
2013-01-01
The fitting of statistical distributions to microbial sampling data is a common application in quantitative microbiology and risk assessment applications. An underlying assumption of most fitting techniques is that data are collected with simple random sampling, which is often times not the case. This study develops a weighted maximum likelihood estimation framework that is appropriate for microbiological samples that are collected with unequal probabilities of selection. A weighted maximum likelihood estimation framework is proposed for microbiological samples that are collected with unequal probabilities of selection. Two examples, based on the collection of food samples during processing, are provided to demonstrate the method and highlight the magnitude of biases in the maximum likelihood estimator when data are inappropriately treated as a simple random sample. Failure to properly weight samples to account for how data are collected can introduce substantial biases into inferences drawn from the data. The proposed methodology will reduce or eliminate an important source of bias in inferences drawn from the analysis of microbial data. This will also make comparisons between studies and the combination of results from different studies more reliable, which is important for risk assessment applications. © 2012 No claim to US Government works.
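A minimal sketch of the weighted maximum likelihood idea, assuming a lognormal concentration model: each sampled unit's log-likelihood contribution is weighted by the inverse of its selection probability before maximisation. The population, size-biased selection mechanism, and distribution are invented.

```python
# Weighted maximum likelihood for samples collected with unequal selection probabilities.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import lognorm

rng = np.random.default_rng(0)
population = rng.lognormal(mean=1.0, sigma=0.5, size=5000)      # all production units
incl_prob = np.clip(population / population.max(), 0.02, 1.0)   # size-biased selection
chosen = rng.random(population.size) < incl_prob
x = population[chosen]                                          # observed sample
w = 1.0 / incl_prob[chosen]                                     # design weights

def weighted_nll(params):
    mu, log_sigma = params                                      # log_sigma keeps sigma > 0
    logpdf = lognorm.logpdf(x, s=np.exp(log_sigma), scale=np.exp(mu))
    return -np.sum(w * logpdf)                                  # weighted negative log-likelihood

fit = minimize(weighted_nll, x0=[0.0, 0.0], method="Nelder-Mead")
print("weighted estimates: mu =", fit.x[0], "sigma =", np.exp(fit.x[1]))
```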
Fortunato, Laura; Jordan, Fiona
2010-01-01
Accurate reconstruction of prehistoric social organization is important if we are to put together satisfactory multidisciplinary scenarios about, for example, the dispersal of human groups. Such considerations apply in the case of Indo-European and Austronesian, two large-scale language families that are thought to represent Neolithic expansions. Ancestral kinship patterns have mostly been inferred through reconstruction of kin terminologies in ancestral proto-languages using the linguistic comparative method, and through geographical or distributional arguments based on the comparative patterns of kin terms and ethnographic kinship ‘facts’. While these approaches are detailed and valuable, the processes through which conclusions have been drawn from the data fail to provide explicit criteria for systematic testing of alternative hypotheses. Here, we use language trees derived using phylogenetic tree-building techniques on Indo-European and Austronesian vocabulary data. With these trees, ethnographic data and Bayesian phylogenetic comparative methods, we statistically reconstruct past marital residence and infer rates of cultural change between different residence forms, showing Proto-Indo-European to be virilocal and Proto-Malayo-Polynesian uxorilocal. The instability of uxorilocality and the rare loss of virilocality once gained emerge as common features of both families. PMID:21041215
Dynamic prediction in functional concurrent regression with an application to child growth.
Leroux, Andrew; Xiao, Luo; Crainiceanu, Ciprian; Checkley, William
2018-04-15
In many studies, it is of interest to predict the future trajectory of subjects based on their historical data, referred to as dynamic prediction. Mixed effects models have traditionally been used for dynamic prediction. However, the commonly used random intercept and slope model is often not sufficiently flexible for modeling subject-specific trajectories. In addition, there may be useful exposures/predictors of interest that are measured concurrently with the outcome, complicating dynamic prediction. To address these problems, we propose a dynamic functional concurrent regression model to handle the case where both the functional response and the functional predictors are irregularly measured. Currently, such a model cannot be fit by existing software. We apply the model to dynamically predict children's length conditional on prior length, weight, and baseline covariates. Inference on model parameters and subject-specific trajectories is conducted using the mixed effects representation of the proposed model. An extensive simulation study shows that the dynamic functional regression model provides more accurate estimation and inference than existing methods. Methods are supported by fast, flexible, open source software that uses heavily tested smoothing techniques. © 2017 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
Hansen, Angela; Kraus, Tamara; Pellerin, Brian; Fleck, Jacob; Downing, Bryan D.; Bergamaschi, Brian
2016-01-01
Advances in spectroscopic techniques have led to an increase in the use of optical properties (absorbance and fluorescence) to assess dissolved organic matter (DOM) composition and infer sources and processing. However, little information is available to assess the impact of biological and photolytic processing on the optical properties of original DOM source materials. We measured changes in commonly used optical properties and indices in DOM leached from peat soil, plants, and algae following biological and photochemical degradation to determine whether they provide unique signatures that can be linked to original DOM source. Changes in individual optical parameters varied by source material and process, with biodegradation and photodegradation often causing values to shift in opposite directions. Although values for different source materials overlapped at the end of the 111-day lab experiment, multivariate statistical analyses showed that unique optical signatures could be linked to original DOM source material even after degradation, with 17 optical properties determined by discriminant analysis to be significant (p<0.05) in distinguishing between DOM source and environmental processing. These results demonstrate that inferring the source material from optical properties is possible when parameters are evaluated in combination even after extensive biological and photochemical alteration.
Statistical primer: how to deal with missing data in scientific research?
Papageorgiou, Grigorios; Grant, Stuart W; Takkenberg, Johanna J M; Mokhles, Mostafa M
2018-05-10
Missing data are a common challenge encountered in research which can compromise the results of statistical inference when not handled appropriately. This paper aims to introduce basic concepts of missing data to a non-statistical audience, list and compare some of the most popular approaches for handling missing data in practice and provide guidelines and recommendations for dealing with and reporting missing data in scientific research. Complete case analysis and single imputation are simple approaches for handling missing data and are popular in practice, however, in most cases they are not guaranteed to provide valid inferences. Multiple imputation is a robust and general alternative which is appropriate for data missing at random, surpassing the disadvantages of the simpler approaches, but should always be conducted with care. The aforementioned approaches are illustrated and compared in an example application using Cox regression.
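As a rough illustration of the approaches compared above, the sketch below contrasts complete-case analysis with an iterative-imputation stand-in for multiple imputation (scikit-learn's IterativeImputer refit under several seeds, pooling only the point estimates; full Rubin's rules would also pool the variances). The data and the missingness mechanism are simulated.

```python
# Complete-case analysis versus a simple multiple-imputation-style analysis.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (activates the API)
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 + 0.8 * x1 - 0.5 * x2 + rng.normal(size=n)
x2[rng.random(n) < 0.3] = np.nan                          # ~30% of x2 missing at random
df = pd.DataFrame({"y": y, "x1": x1, "x2": x2})

# 1. Complete-case analysis: drop any row with a missing value.
complete = df.dropna()
cc_fit = sm.OLS(complete["y"], sm.add_constant(complete[["x1", "x2"]])).fit()

# 2. Impute several times, fit each completed data set, and pool the estimates.
estimates = []
for seed in range(5):
    imputer = IterativeImputer(sample_posterior=True, random_state=seed)
    filled = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
    fit = sm.OLS(filled["y"], sm.add_constant(filled[["x1", "x2"]])).fit()
    estimates.append(fit.params)
pooled = pd.concat(estimates, axis=1).mean(axis=1)        # point estimates only

print(cc_fit.params, pooled, sep="\n")
```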
ERIC Educational Resources Information Center
Lane-Getaz, Sharon
2017-01-01
In reaction to misuses and misinterpretations of p-values and confidence intervals, a social science journal editor banned p-values from its pages. This study aimed to show that education could address misuse and abuse. This study examines inference-related learning outcomes for social science students in an introductory course supplemented with…
Sandoval-Castellanos, Edson; Palkopoulou, Eleftheria; Dalén, Love
2014-01-01
Inference of population demographic history has vastly improved in recent years due to a number of technological and theoretical advances including the use of ancient DNA. Approximate Bayesian computation (ABC) stands among the most promising methods due to its simple theoretical fundament and exceptional flexibility. However, limited availability of user-friendly programs that perform ABC analysis renders it difficult to implement, and hence programming skills are frequently required. In addition, there is limited availability of programs able to deal with heterochronous data. Here we present the software BaySICS: Bayesian Statistical Inference of Coalescent Simulations. BaySICS provides an integrated and user-friendly platform that performs ABC analyses by means of coalescent simulations from DNA sequence data. It estimates historical demographic population parameters and performs hypothesis testing by means of Bayes factors obtained from model comparisons. Although providing specific features that improve inference from datasets with heterochronous data, BaySICS also has several capabilities making it a suitable tool for analysing contemporary genetic datasets. Those capabilities include joint analysis of independent tables, a graphical interface and the implementation of Markov-chain Monte Carlo without likelihoods.
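BaySICS itself is a graphical program, but the core rejection-ABC logic it automates can be sketched generically: draw parameters from the prior, simulate data, and keep the draws whose summary statistics land close to the observed ones. The simulator below is a plain normal model standing in for a coalescent simulation; everything is illustrative.

```python
# Generic rejection ABC: accepted prior draws approximate the posterior.
import numpy as np

rng = np.random.default_rng(0)
observed = rng.normal(5.0, 2.0, size=50)                  # stand-in for the observed data
obs_summary = np.array([observed.mean(), observed.std()])

accepted = []
for _ in range(50_000):
    theta = rng.uniform(0.0, 10.0)                        # draw from the prior
    sim = rng.normal(theta, 2.0, size=observed.size)      # run the simulator
    sim_summary = np.array([sim.mean(), sim.std()])
    if np.linalg.norm(sim_summary - obs_summary) < 0.3:   # keep "close enough" draws
        accepted.append(theta)

posterior = np.array(accepted)                            # approximate posterior sample
print(len(posterior), posterior.mean(), posterior.std())
```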
Multiple outcomes are often measured on each experimental unit in toxicology experiments. These multiple observations typically imply the existence of correlation between endpoints, and a statistical analysis that incorporates it may result in improved inference. When both disc...
The Love of Large Numbers: A Popularity Bias in Consumer Choice.
Powell, Derek; Yu, Jingqi; DeWolf, Melissa; Holyoak, Keith J
2017-10-01
Social learning-the ability to learn from observing the decisions of other people and the outcomes of those decisions-is fundamental to human evolutionary and cultural success. The Internet now provides social evidence on an unprecedented scale. However, properly utilizing this evidence requires a capacity for statistical inference. We examined how people's interpretation of online review scores is influenced by the numbers of reviews-a potential indicator both of an item's popularity and of the precision of the average review score. Our task was designed to pit statistical information against social information. We modeled the behavior of an "intuitive statistician" using empirical prior information from millions of reviews posted on Amazon.com and then compared the model's predictions with the behavior of experimental participants. Under certain conditions, people preferred a product with more reviews to one with fewer reviews even though the statistical model indicated that the latter was likely to be of higher quality than the former. Overall, participants' judgments suggested that they failed to make meaningful statistical inferences.
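A toy "intuitive statistician" in the spirit of the study above (not the authors' model, whose prior was estimated from Amazon review data): treat each review as a noisy vote of approval, put a Beta prior on the underlying quality, and compare two products by the posterior probability that one exceeds the other. The prior and the mapping from star averages to approval counts are loose assumptions.

```python
# Posterior comparison of a sparsely reviewed item against a heavily reviewed one.
import numpy as np

rng = np.random.default_rng(0)
a0, b0 = 3.0, 2.0                                   # Beta prior over latent "quality"

def quality_posterior(avg_stars, n_reviews, size=50_000):
    approvals = avg_stars / 5.0 * n_reviews         # crude mapping of stars to approval votes
    return rng.beta(a0 + approvals, b0 + n_reviews - approvals, size=size)

few_but_high = quality_posterior(4.8, 12)           # few reviews, higher average
many_but_lower = quality_posterior(4.4, 1200)       # many reviews, slightly lower average
print("P(few-review item is actually better) ≈", (few_but_high > many_but_lower).mean())
```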
Learning what to expect (in visual perception)
Seriès, Peggy; Seitz, Aaron R.
2013-01-01
Expectations are known to greatly affect our experience of the world. A growing theory in computational neuroscience is that perception can be successfully described using Bayesian inference models and that the brain is “Bayes-optimal” under some constraints. In this context, expectations are particularly interesting, because they can be viewed as prior beliefs in the statistical inference process. A number of questions remain unsolved, however, for example: How fast do priors change over time? Are there limits in the complexity of the priors that can be learned? How do an individual’s priors compare to the true scene statistics? Can we unlearn priors that are thought to correspond to natural scene statistics? Where and what are the neural substrate of priors? Focusing on the perception of visual motion, we here review recent studies from our laboratories and others addressing these issues. We discuss how these data on motion perception fit within the broader literature on perceptual Bayesian priors, perceptual expectations, and statistical and perceptual learning and review the possible neural basis of priors. PMID:24187536
Statistical Inference for Porous Materials using Persistent Homology.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moon, Chul; Heath, Jason E.; Mitchell, Scott A.
2017-12-01
We propose a porous materials analysis pipeline using persistent homology. We first compute persistent homology of binarized 3D images of sampled material subvolumes. For each image we compute sets of homology intervals, which are represented as summary graphics called persistence diagrams. We convert persistence diagrams into image vectors in order to analyze the similarity of the homology of the material images using the mature tools for image analysis. Each image is treated as a vector and we compute its principal components to extract features. We fit a statistical model using the loadings of principal components to estimate material porosity, permeability, anisotropy, and tortuosity. We also propose an adaptive version of the structural similarity index (SSIM), a similarity metric for images, as a measure to determine the statistical representative elementary volumes (sREV) for persistent homology. Thus we provide a capability for making a statistical inference of the fluid flow and transport properties of porous materials based on their geometry and connectivity.
Philosophy and the practice of Bayesian statistics
Gelman, Andrew; Shalizi, Cosma Rohilla
2015-01-01
A substantial school in the philosophy of science identifies Bayesian inference with inductive inference and even rationality as such, and seems to be strengthened by the rise and practical success of Bayesian statistics. We argue that the most successful forms of Bayesian statistics do not actually support that particular philosophy but rather accord much better with sophisticated forms of hypothetico-deductivism. We examine the actual role played by prior distributions in Bayesian models, and the crucial aspects of model checking and model revision, which fall outside the scope of Bayesian confirmation theory. We draw on the literature on the consistency of Bayesian updating and also on our experience of applied work in social science. Clarity about these matters should benefit not just philosophy of science, but also statistical practice. At best, the inductivist view has encouraged researchers to fit and compare models without checking them; at worst, theorists have actively discouraged practitioners from performing model checking because it does not fit into their framework. PMID:22364575
Philosophy and the practice of Bayesian statistics.
Gelman, Andrew; Shalizi, Cosma Rohilla
2013-02-01
A substantial school in the philosophy of science identifies Bayesian inference with inductive inference and even rationality as such, and seems to be strengthened by the rise and practical success of Bayesian statistics. We argue that the most successful forms of Bayesian statistics do not actually support that particular philosophy but rather accord much better with sophisticated forms of hypothetico-deductivism. We examine the actual role played by prior distributions in Bayesian models, and the crucial aspects of model checking and model revision, which fall outside the scope of Bayesian confirmation theory. We draw on the literature on the consistency of Bayesian updating and also on our experience of applied work in social science. Clarity about these matters should benefit not just philosophy of science, but also statistical practice. At best, the inductivist view has encouraged researchers to fit and compare models without checking them; at worst, theorists have actively discouraged practitioners from performing model checking because it does not fit into their framework. © 2012 The British Psychological Society.
Sever, Ivan; Klaric, Eva; Tarle, Zrinka
2016-07-01
Dental microhardness experiments are influenced by unobserved factors related to the varying tooth characteristics that affect measurement reproducibility. This paper explores the appropriate analytical tools for modeling different sources of unobserved variability to reduce the biases encountered and increase the validity of microhardness studies. The enamel microhardness of human third molars was measured by Vickers diamond. The effects of five bleaching agents-10, 16, and 30 % carbamide peroxide, and 25 and 38 % hydrogen peroxide-were examined, as well as the effect of artificial saliva and amorphous calcium phosphate. To account for both between- and within-tooth heterogeneity in evaluating treatment effects, the statistical analysis was performed in the mixed-effects framework, which also included the appropriate weighting procedure to adjust for confounding. The results were compared to those of the standard ANOVA model usually applied. The weighted mixed-effects model produced the parameter estimates of different magnitude and significance than the standard ANOVA model. The results of the former model were more intuitive, with more precise estimates and better fit. Confounding could seriously bias the study outcomes, highlighting the need for more robust statistical procedures in dental research that account for the measurement reliability. The presented framework is more flexible and informative than existing analytical techniques and may improve the quality of inference in dental research. Reported results could be misleading if underlying heterogeneity of microhardness measurements is not taken into account. The confidence in treatment outcomes could be increased by applying the framework presented.
Inference of a Nonlinear Stochastic Model of the Cardiorespiratory Interaction
NASA Astrophysics Data System (ADS)
Smelyanskiy, V. N.; Luchinsky, D. G.; Stefanovska, A.; McClintock, P. V.
2005-03-01
We reconstruct a nonlinear stochastic model of the cardiorespiratory interaction in terms of a set of polynomial basis functions representing the nonlinear force governing system oscillations. The strength and direction of coupling and noise intensity are simultaneously inferred from a univariate blood pressure signal. Our new inference technique does not require extensive global optimization, and it is applicable to a wide range of complex dynamical systems subject to noise.
Process Mining for Individualized Behavior Modeling Using Wireless Tracking in Nursing Homes
Fernández-Llatas, Carlos; Benedi, José-Miguel; García-Gómez, Juan M.; Traver, Vicente
2013-01-01
The analysis of human behavior patterns is increasingly used in several research fields. The individualized modeling of behavior using classical techniques requires too much time and resources to be effective. A possible solution would be the use of pattern recognition techniques to automatically infer models that allow experts to understand individual behavior. However, traditional pattern recognition algorithms infer models that are not readily understood by human experts. This limits the capacity to benefit from the inferred models. Process mining technologies can infer models as workflows, specifically designed to be understood by experts, enabling them to detect specific behavior patterns in users. In this paper, the eMotiva process mining algorithms are presented. These algorithms filter, infer and visualize workflows. The workflows are inferred from the samples produced by an indoor location system that stores the location of a resident in a nursing home. The visualization tool is able to compare and highlight behavior patterns in order to facilitate expert understanding of human behavior. This tool was tested with nine real users that were monitored for a 25-week period. The results achieved suggest that the behavior of users is continuously evolving and changing and that this change can be measured, allowing for behavioral change detection. PMID:24225907
Federal Register 2010, 2011, 2012, 2013, 2014
2013-04-24
... Bureau, Statistical Abstract of the United States: 2011, Table 427 (2007). \\28\\ The 2007 U.S. Census data.... (U.S. CENSUS BUREAU, STATISTICAL ABSTRACT OF THE UNITED STATES 2011, Table 428.) The criterion by... Statistical Abstract of the U.S., that inference is further supported by the fact that in both Tables, many...
NASA Astrophysics Data System (ADS)
Lukman, Iing; Ibrahim, Noor A.; Daud, Isa B.; Maarof, Fauziah; Hassan, Mohd N.
2002-03-01
Survival analysis algorithms are often applied in the data mining process. Cox regression is one of the survival analysis tools that has been used in many areas, and it can be used to analyze the failure times of crashed aircraft. Another survival analysis tool is competing risks, where more than one cause of failure acts simultaneously. Lunn-McNeil analyzed competing risks in the survival model using Cox regression with censored data. The modified Lunn-McNeil technique is a simplification of the Lunn-McNeil technique. The Kalbfleisch-Prentice technique involves fitting models separately for each type of failure, treating the other failure types as censored. To compare the two techniques (the modified Lunn-McNeil and Kalbfleisch-Prentice), a simulation study was performed. Samples with various sizes and censoring percentages were generated and fitted using both techniques. The study was conducted by comparing the inference of the models using Root Mean Square Error (RMSE), power tests, and Schoenfeld residual analysis. The power tests in this study were the likelihood ratio test, the Rao score test, and the Wald statistic. The Schoenfeld residual analysis was conducted to check the proportionality of the model through its covariates. The estimated parameters were computed for the cause-specific hazard situation. Results showed that the modified Lunn-McNeil technique was better than the Kalbfleisch-Prentice technique based on the RMSE measurement and the Schoenfeld residual analysis. However, the Kalbfleisch-Prentice technique was better than the modified Lunn-McNeil technique based on the power test measurements.
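A hedged sketch of the cause-specific, Kalbfleisch-Prentice-style approach described above, assuming the lifelines CoxPHFitter API: fit one Cox model per failure cause, treating failures from the other cause as censored. The simulated data and covariate effect are invented.

```python
# Cause-specific Cox models: for each cause, the competing cause is coded as censoring.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 400
x = rng.normal(size=n)                                   # a covariate
t1 = rng.exponential(scale=np.exp(-0.5 * x))             # latent time to failure cause 1
t2 = rng.exponential(scale=1.0, size=n)                  # latent time to failure cause 2
df = pd.DataFrame({
    "time": np.minimum(t1, t2),                          # observed failure time
    "cause": np.where(t1 <= t2, 1, 2),                   # which cause occurred first
    "x": x,
})

for c in (1, 2):
    d = df.assign(event=(df["cause"] == c).astype(int))  # other cause treated as censored
    cph = CoxPHFitter().fit(d[["time", "event", "x"]],
                            duration_col="time", event_col="event")
    print(f"cause {c}: log-hazard ratio for x =", float(cph.params_["x"]))
```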
Augmenting Latent Dirichlet Allocation and Rank Threshold Detection with Ontologies
2010-03-01
Probabilistic Latent Semantic Indexing (PLSI) is an automated indexing information retrieval model [20]. It is based on a statistical latent class model which is...uses a statistical foundation that is more accurate in finding hidden semantic relationships [20]. The model uses factor analysis of count data, number...principle of statistical inference which asserts that all of the information in a sample is contained in the likelihood function [20]. The statistical
NASA Technical Reports Server (NTRS)
Mcdade, Ian C.
1991-01-01
Techniques were developed for recovering two-dimensional distributions of auroral volume emission rates from rocket photometer measurements made in a tomographic spin scan mode. These tomographic inversion procedures are based upon an algebraic reconstruction technique (ART) and utilize two different iterative relaxation techniques for solving the problems associated with noise in the observational data. One of the inversion algorithms is based upon a least squares method and the other on a maximum probability approach. The performance of the inversion algorithms, and the limitations of the rocket tomography technique, were critically assessed using various factors such as (1) statistical and non-statistical noise in the observational data, (2) rocket penetration of the auroral form, (3) background sources of emission, (4) smearing due to the photometer field of view, and (5) temporal variations in the auroral form. These tests show that the inversion procedures may be successfully applied to rocket observations made in medium intensity aurora with standard rocket photometer instruments. The inversion procedures have been used to recover two-dimensional distributions of auroral emission rates and ionization rates from an existing set of N2+ 3914 Å rocket photometer measurements which were made in a tomographic spin scan mode during the ARIES auroral campaign. The two-dimensional distributions of the 3914 Å volume emission rates recovered from the inversion of the rocket data compare very well with the distributions that were inferred from ground-based measurements using triangulation-tomography techniques, and the N2 ionization rates derived from the rocket tomography results are in very good agreement with the in situ particle measurements that were made during the flight. Three preprints describing the tomographic inversion techniques and the tomographic analysis of the ARIES rocket data are included as appendices.
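The core ART step is a relaxed Kaczmarz update applied ray by ray. The toy sketch below (unrelated to the ARIES geometry or flight data) builds a small emission grid, simulates noisy line-integral measurements from a fan of viewing directions, and reconstructs the grid with a non-negativity-constrained ART sweep; the grid size, ray geometry, and noise level are all invented.

```python
# A toy sketch of tomographic recovery with an algebraic reconstruction
# technique (relaxed Kaczmarz / ART); all geometry and noise choices are invented.
import numpy as np

rng = np.random.default_rng(0)
N = 32                                                    # emission grid is N x N
yy, xx = np.mgrid[0:N, 0:N]
truth = np.exp(-(((xx - 20) ** 2) + ((yy - 12) ** 2)) / 30.0)   # a single "auroral" blob

# Build line-of-sight weights for a fan of rays from several "spin" positions.
rays = []
for x0 in np.linspace(0, N - 1, 8):                       # instrument positions along one edge
    for angle in np.deg2rad(np.linspace(30, 150, 40)):
        t = np.linspace(0, 1.5 * N, 4 * N)
        px = x0 + t * np.cos(angle)
        py = t * np.sin(angle)
        keep = (px >= 0) & (px < N) & (py >= 0) & (py < N)
        w = np.zeros(N * N)
        np.add.at(w, py[keep].astype(int) * N + px[keep].astype(int), 1.0)
        if w.any():
            rays.append(w)
A = np.array(rays)
b = A @ truth.ravel()
b += rng.normal(scale=0.01 * b.max(), size=b.size)        # statistical noise

# Relaxed Kaczmarz / ART sweeps with a non-negativity constraint.
x = np.zeros(N * N)
lam = 0.1
for sweep in range(20):
    for ai, bi in zip(A, b):
        x += lam * (bi - ai @ x) / (ai @ ai) * ai
        np.maximum(x, 0.0, out=x)

recon = x.reshape(N, N)
print("relative error:", np.linalg.norm(recon - truth) / np.linalg.norm(truth))
```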
Inference Control Mechanism for Statistical Database: Frequency-Imposed Data Distortions.
ERIC Educational Resources Information Center
Liew, Chong K.; And Others
1985-01-01
Introduces two data distortion methods (Frequency-Imposed Distortion, Frequency-Imposed Probability Distortion) and uses a Monte Carlo study to compare their performance with that of other distortion methods (Point Distortion, Probability Distortion). Indications that data generated by these two methods produce accurate statistics and protect…
An argument for mechanism-based statistical inference in cancer
Ochs, Michael; Price, Nathan D.; Tomasetti, Cristian; Younes, Laurent
2015-01-01
Cancer is perhaps the prototypical systems disease, and as such has been the focus of extensive study in quantitative systems biology. However, translating these programs into personalized clinical care remains elusive and incomplete. In this perspective, we argue that realizing this agenda—in particular, predicting disease phenotypes, progression and treatment response for individuals—requires going well beyond standard computational and bioinformatics tools and algorithms. It entails designing global mathematical models over network-scale configurations of genomic states and molecular concentrations, and learning the model parameters from limited available samples of high-dimensional and integrative omics data. As such, any plausible design should accommodate: biological mechanism, necessary for both feasible learning and interpretable decision making; stochasticity, to deal with uncertainty and observed variation at many scales; and a capacity for statistical inference at the patient level. This program, which requires a close, sustained collaboration between mathematicians and biologists, is illustrated in several contexts, including learning bio-markers, metabolism, cell signaling, network inference and tumorigenesis. PMID:25381197
Network inference using informative priors
Mukherjee, Sach; Speed, Terence P.
2008-01-01
Recent years have seen much interest in the study of systems characterized by multiple interacting components. A class of statistical models called graphical models, in which graphs are used to represent probabilistic relationships between variables, provides a framework for formal inference regarding such systems. In many settings, the object of inference is the network structure itself. This problem of “network inference” is well known to be a challenging one. However, in scientific settings there is very often existing information regarding network connectivity. A natural idea then is to take account of such information during inference. This article addresses the question of incorporating prior information into network inference. We focus on directed models called Bayesian networks, and use Markov chain Monte Carlo to draw samples from posterior distributions over network structures. We introduce prior distributions on graphs capable of capturing information regarding network features including edges, classes of edges, degree distributions, and sparsity. We illustrate our approach in the context of systems biology, applying our methods to network inference in cancer signaling. PMID:18799736
Bayesian Inference for Functional Dynamics Exploring in fMRI Data.
Guo, Xuan; Liu, Bing; Chen, Le; Chen, Guantao; Pan, Yi; Zhang, Jing
2016-01-01
This paper aims to review state-of-the-art Bayesian-inference-based methods applied to functional magnetic resonance imaging (fMRI) data. Particularly, we focus on one specific long-standing challenge in the computational modeling of fMRI datasets: how to effectively explore typical functional interactions from fMRI time series and the corresponding boundaries of temporal segments. Bayesian inference is a method of statistical inference which has been shown to be a powerful tool to encode dependence relationships among the variables with uncertainty. Here we provide an introduction to a group of Bayesian-inference-based methods for fMRI data analysis, which were designed to detect magnitude or functional connectivity change points and to infer their functional interaction patterns based on corresponding temporal boundaries. We also provide a comparison of three popular Bayesian models, that is, Bayesian Magnitude Change Point Model (BMCPM), Bayesian Connectivity Change Point Model (BCCPM), and Dynamic Bayesian Variable Partition Model (DBVPM), and give a summary of their applications. We envision that more delicate Bayesian inference models will be emerging and play increasingly important roles in modeling brain functions in the years to come.
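As a minimal, generic illustration of the magnitude-change-point idea, far simpler than BMCPM, BCCPM, or DBVPM and not tied to fMRI, the sketch below computes an exact posterior over the location of a single mean shift in a Gaussian series by integrating out the segment means under a conjugate prior; all constants are invented.

```python
# A minimal Bayesian change-point sketch: posterior over a single mean-shift
# boundary in a Gaussian time series with known noise variance.
import numpy as np

rng = np.random.default_rng(42)
T, tau_true = 200, 120
y = np.concatenate([rng.normal(0.0, 1.0, tau_true),
                    rng.normal(0.8, 1.0, T - tau_true)])    # magnitude change at tau_true

sigma2, v = 1.0, 10.0          # known noise variance; prior variance of each segment mean

def log_marginal(seg):
    # marginal likelihood of a segment with its mean ~ N(0, v) integrated out,
    # i.e. seg ~ N(0, sigma2*I + v*J) with J the all-ones matrix
    n, s1, s2 = len(seg), seg.sum(), (seg ** 2).sum()
    logdet = n * np.log(sigma2) + np.log1p(n * v / sigma2)
    quad = (s2 - v * s1 ** 2 / (sigma2 + n * v)) / sigma2
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)

taus = np.arange(5, T - 5)     # uniform prior over interior boundaries
logpost = np.array([log_marginal(y[:t]) + log_marginal(y[t:]) for t in taus])
post = np.exp(logpost - logpost.max())
post /= post.sum()
print("MAP boundary:", taus[post.argmax()], " true boundary:", tau_true)
```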
Modular Spectral Inference Framework Applied to Young Stars and Brown Dwarfs
NASA Technical Reports Server (NTRS)
Gully-Santiago, Michael A.; Marley, Mark S.
2017-01-01
In practice, synthetic spectral models are imperfect, causing inaccurate estimates of stellar parameters. Using forward modeling and statistical inference, we derive accurate stellar parameters for a given observed spectrum by emulating a grid of precomputed spectra to track uncertainties. The framework is applied to brown dwarfs with synthetic spectral models (Marley et al. 1996, 2014), whose newest grid spans a massive multi-dimensional parameter space, and to IGRINS spectra, improving atmospheric models ahead of JWST. When applied to young stars (~10 Myr) with large starspots, the spot properties can be measured spectroscopically, especially in the near-IR with IGRINS.
Inference of neuronal network spike dynamics and topology from calcium imaging data
Lütcke, Henry; Gerhard, Felipe; Zenke, Friedemann; Gerstner, Wulfram; Helmchen, Fritjof
2013-01-01
Two-photon calcium imaging enables functional analysis of neuronal circuits by inferring action potential (AP) occurrence (“spike trains”) from cellular fluorescence signals. It remains unclear how experimental parameters such as signal-to-noise ratio (SNR) and acquisition rate affect spike inference and whether additional information about network structure can be extracted. Here we present a simulation framework for quantitatively assessing how well spike dynamics and network topology can be inferred from noisy calcium imaging data. For simulated AP-evoked calcium transients in neocortical pyramidal cells, we analyzed the quality of spike inference as a function of SNR and data acquisition rate using a recently introduced peeling algorithm. Given experimentally attainable values of SNR and acquisition rate, neural spike trains could be reconstructed accurately and with up to millisecond precision. We then applied statistical neuronal network models to explore how remaining uncertainties in spike inference affect estimates of network connectivity and topological features of network organization. We define the experimental conditions suitable for inferring whether the network has a scale-free structure and determine how well hub neurons can be identified. Our findings provide a benchmark for future calcium imaging studies that aim to reliably infer neuronal network properties. PMID:24399936
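A stripped-down, greedy "peeling"-style deconvolution conveys the basic idea (the published peeling algorithm is considerably more elaborate): simulate a fluorescence trace as noise plus stereotyped exponentially decaying transients, then repeatedly take the largest residual peak as an event and subtract the template there. All constants below are invented.

```python
# A greedy, peeling-style spike-inference sketch on a simulated calcium trace.
import numpy as np

rng = np.random.default_rng(7)
fs, dur = 30.0, 20.0                          # imaging rate (Hz) and duration (s)
t = np.arange(0, dur, 1 / fs)
tau, amp, noise_sd = 0.5, 1.0, 0.15           # transient decay (s), amplitude (dF/F), noise SD
kernel = amp * np.exp(-np.arange(0, 5 * tau, 1 / fs) / tau)   # stereotypical transient

def add_kernel(trace, idx, sign=1.0):
    k = kernel[: len(trace) - idx]
    trace[idx: idx + len(k)] += sign * k

# Simulate a fluorescence trace: noise plus transients at random spike times.
true_spikes = np.sort(rng.choice(len(t) - 30, size=15, replace=False))
f = rng.normal(0.0, noise_sd, len(t))
for s in true_spikes:
    add_kernel(f, s)

# Greedy "peeling": repeatedly take the largest residual peak as an event and
# subtract one transient template there, until nothing super-threshold remains.
residual, inferred = f.copy(), []
while residual.max() > 0.5 * amp and len(inferred) < 200:
    idx = int(residual.argmax())
    inferred.append(idx)
    add_kernel(residual, idx, sign=-1.0)

inferred = np.array(sorted(inferred))
hits = sum(np.any(np.abs(true_spikes - i) <= 2) for i in inferred)
print(f"{len(inferred)} events inferred, {hits} within 2 frames of a true spike")
```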
NASA Astrophysics Data System (ADS)
Mandel, Kaisey; Kirshner, R. P.; Narayan, G.; Wood-Vasey, W. M.; Friedman, A. S.; Hicken, M.
2010-01-01
I have constructed a comprehensive statistical model for Type Ia supernova light curves spanning optical through near infrared data simultaneously. The near infrared light curves are found to be excellent standard candles (sigma(MH) = 0.11 +/- 0.03 mag) that are less vulnerable to systematic error from dust extinction, a major confounding factor for cosmological studies. A hierarchical statistical framework incorporates coherently multiple sources of randomness and uncertainty, including photometric error, intrinsic supernova light curve variations and correlations, dust extinction and reddening, peculiar velocity dispersion and distances, for probabilistic inference with Type Ia SN light curves. Inferences are drawn from the full probability density over individual supernovae and the SN Ia and dust populations, conditioned on a dataset of SN Ia light curves and redshifts. To compute probabilistic inferences with hierarchical models, I have developed BayeSN, a Markov Chain Monte Carlo algorithm based on Gibbs sampling. This code explores and samples the global probability density of parameters describing individual supernovae and the population. I have applied this hierarchical model to optical and near infrared data of over 100 nearby Type Ia SN from PAIRITEL, the CfA3 sample, and the literature. Using this statistical model, I find that SN with optical and NIR data have a smaller residual scatter in the Hubble diagram than SN with only optical data. The continued study of Type Ia SN in the near infrared will be important for improving their utility as precise and accurate cosmological distance indicators.
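The flavor of the hierarchical approach can be conveyed by a drastically reduced toy model, not BayeSN itself: each object has a latent absolute magnitude drawn from a population with unknown mean and spread, is observed with known photometric error, and a Gibbs sampler alternates between the latent magnitudes and the population parameters. All numbers are invented.

```python
# A toy Gibbs sampler for a normal hierarchical model with known measurement errors.
import numpy as np

rng = np.random.default_rng(3)
n, mu_true, tau_true = 80, -18.3, 0.12            # population mean/SD of absolute magnitude
sigma = rng.uniform(0.05, 0.20, n)                # known per-object photometric errors
M = rng.normal(mu_true, tau_true, n)              # latent magnitudes
m = rng.normal(M, sigma)                          # observed magnitudes

# Gibbs sweep over (latent M_i, population mu, population tau^2),
# with a flat prior on mu and a weak inverse-gamma prior on tau^2.
a0, b0 = 0.01, 0.01
mu, tau2 = m.mean(), m.var()
draws = []
for it in range(4000):
    prec = 1 / sigma**2 + 1 / tau2
    M_lat = rng.normal((m / sigma**2 + mu / tau2) / prec, np.sqrt(1 / prec))
    mu = rng.normal(M_lat.mean(), np.sqrt(tau2 / n))
    b = b0 + 0.5 * np.sum((M_lat - mu) ** 2)
    tau2 = 1.0 / rng.gamma(a0 + n / 2, 1.0 / b)
    if it >= 1000:                                # discard burn-in
        draws.append((mu, np.sqrt(tau2)))
mu_s, tau_s = np.array(draws).T
print(f"mu = {mu_s.mean():.3f} +/- {mu_s.std():.3f},  tau = {tau_s.mean():.3f} +/- {tau_s.std():.3f}")
```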
Statistical inference involving binomial and negative binomial parameters.
García-Pérez, Miguel A; Núñez-Antón, Vicente
2009-05-01
Statistical inference about two binomial parameters implies that they are both estimated by binomial sampling. There are occasions in which one aims at testing the equality of two binomial parameters before and after the occurrence of the first success along a sequence of Bernoulli trials. In these cases, the binomial parameter before the first success is estimated by negative binomial sampling whereas that after the first success is estimated by binomial sampling, and both estimates are related. This paper derives statistical tools to test two hypotheses, namely, that both binomial parameters equal some specified value and that both parameters are equal though unknown. Simulation studies are used to show that in small samples both tests are accurate in keeping the nominal Type-I error rates, and also to determine sample size requirements to detect large, medium, and small effects with adequate power. Additional simulations also show that the tests are sufficiently robust to certain violations of their assumptions.
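The sampling setup can be written down directly: the pre-success parameter enters through a geometric (negative binomial with r = 1) likelihood for the number of failures before the first success, and the post-success parameter through an ordinary binomial likelihood. The sketch below implements a generic asymptotic likelihood-ratio test of equality; it is a stand-in for, not a reproduction of, the small-sample tools derived in the paper.

```python
# A generic likelihood-ratio test for equality of a geometrically estimated and a
# binomially estimated success probability (asymptotic chi-square reference).
import numpy as np
from scipy.stats import chi2

def loglik_geom(p, f):
    # negative binomial (r = 1) sampling: f failures observed before the first success
    return np.log(p) + f * np.log1p(-p)

def loglik_binom(p, x, n):
    # binomial sampling after the first success: x successes in n further trials
    return x * np.log(p) + (n - x) * np.log1p(-p)   # constant term omitted

def lrt_equal_p(f, x, n):
    """LRT of H0: the success probability is the same before and after the first success."""
    p1_hat, p2_hat = 1.0 / (f + 1), x / n           # separate maximum-likelihood estimates
    p_pool = (1 + x) / (1 + f + n)                  # MLE under H0 from the pooled likelihood
    lr = 2 * (loglik_geom(p1_hat, f) + loglik_binom(p2_hat, x, n)
              - loglik_geom(p_pool, f) - loglik_binom(p_pool, x, n))
    return lr, chi2.sf(lr, df=1)

print(lrt_equal_p(f=9, x=14, n=40))   # e.g. 9 failures before the first success, then 14/40
```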
Econophysical visualization of Adam Smith’s invisible hand
NASA Astrophysics Data System (ADS)
Cohen, Morrel H.; Eliazar, Iddo I.
2013-02-01
Consider a complex system whose macrostate is statistically observable, but whose operating mechanism is an unknown black box. In this paper we address the problem of inferring, from the system’s macrostate statistics, the system’s intrinsic force yielding the observed statistics. The inference is established via two diametrically opposite approaches which result in the very same intrinsic force: a top-down approach based on the notion of entropy, and a bottom-up approach based on the notion of Langevin dynamics. The general results established are applied to the problem of visualizing the intrinsic socioeconomic force, Adam Smith’s invisible hand, shaping the distribution of wealth in human societies. Our analysis yields quantitative econophysical representations of figurative socioeconomic forces, quantitative definitions of “poor” and “rich”, and a quantitative characterization of the “poor-get-poorer” and the “rich-get-richer” phenomena.
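In the bottom-up (Langevin) picture with unit diffusion, a stationary density p(x) corresponds to an intrinsic force f(x) = d/dx ln p(x), so the force can be read off a density estimate of sampled data. The sketch below applies this to a synthetic lognormal "wealth" sample; it is a generic illustration of the idea, not the paper's analysis.

```python
# Reading an intrinsic force off an empirical stationary density (unit diffusion assumed).
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(5)
wealth = rng.lognormal(mean=1.0, sigma=0.9, size=20_000)   # synthetic "wealth" sample

# Overdamped Langevin dynamics with unit diffusion: the stationary density p(x)
# implies an intrinsic force f(x) = d/dx ln p(x).
kde = gaussian_kde(wealth)
x = np.linspace(np.percentile(wealth, 1), np.percentile(wealth, 99), 400)
logp = np.log(kde(x))
force = np.gradient(logp, x)

# The sign of the force shows where the dynamics push wealth up or down;
# it vanishes at the mode of the fitted density.
crossing = x[np.argmin(np.abs(force))]
print(f"force vanishes near wealth ~ {crossing:.2f}")
```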
Sampling and counting genome rearrangement scenarios
2015-01-01
Background: Even for moderate-size inputs, there are a tremendous number of optimal rearrangement scenarios, regardless of what the model is and which specific question is to be answered. Therefore giving one optimal solution might be misleading and cannot be used for statistical inference. Statistically well-founded methods are necessary to sample uniformly from the solution space, after which a small number of samples is sufficient for statistical inference. Contribution: In this paper, we give a mini-review of the state of the art of sampling and counting rearrangement scenarios, focusing on the reversal, DCJ and SCJ models. In addition, we give a Gibbs sampler for sampling most parsimonious labelings of evolutionary trees under the SCJ model. The method has been implemented and tested on real-life data. The software package together with example data can be downloaded from http://www.renyi.hu/~miklosi/SCJ-Gibbs/ PMID:26452124
Online Updating of Statistical Inference in the Big Data Setting.
Schifano, Elizabeth D; Wu, Jing; Wang, Chun; Yan, Jun; Chen, Ming-Hui
2016-01-01
We present statistical methods for big data arising from online analytical processing, where large amounts of data arrive in streams and require fast analysis without storage/access to the historical data. In particular, we develop iterative estimating algorithms and statistical inferences for linear models and estimating equations that update as new data arrive. These algorithms are computationally efficient, minimally storage-intensive, and allow for possible rank deficiencies in the subset design matrices due to rare-event covariates. Within the linear model setting, the proposed online-updating framework leads to predictive residual tests that can be used to assess the goodness-of-fit of the hypothesized model. We also propose a new online-updating estimator under the estimating equation setting. Theoretical properties of the goodness-of-fit tests and proposed estimators are examined in detail. In simulation studies and real data applications, our estimator compares favorably with competing approaches under the estimating equation setting.
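A minimal sketch of the block-wise online-updating idea for linear models (omitting the paper's rank-deficiency handling, estimating-equation extensions, and predictive residual tests): accumulate X'X and X'y as blocks stream in, so the running least-squares estimate never needs the historical rows.

```python
# Online updating of a linear model from streaming data blocks via sufficient statistics.
import numpy as np

rng = np.random.default_rng(11)
p, beta_true = 4, np.array([1.0, -2.0, 0.5, 3.0])

# Running sufficient statistics for least squares: X'X and X'y.
XtX = np.zeros((p, p))
Xty = np.zeros(p)

def update(XtX, Xty, Xb, yb):
    """Fold one incoming data block into the sufficient statistics."""
    return XtX + Xb.T @ Xb, Xty + Xb.T @ yb

for block in range(50):                      # 50 streaming blocks of 200 rows each
    Xb = rng.normal(size=(200, p))
    yb = Xb @ beta_true + rng.normal(scale=1.0, size=200)
    XtX, Xty = update(XtX, Xty, Xb, yb)

beta_hat = np.linalg.solve(XtX, Xty)         # identical to OLS on the full 10,000 rows
print(beta_hat)
```

Tracking the running residual sum of squares as well would give the error variance, and hence standard errors from (X'X)^{-1}, without revisiting old data.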
When mechanism matters: Bayesian forecasting using models of ecological diffusion
Hefley, Trevor J.; Hooten, Mevin B.; Russell, Robin E.; Walsh, Daniel P.; Powell, James A.
2017-01-01
Ecological diffusion is a theory that can be used to understand and forecast spatio-temporal processes such as dispersal, invasion, and the spread of disease. Hierarchical Bayesian modelling provides a framework to make statistical inference and probabilistic forecasts, using mechanistic ecological models. To illustrate, we show how hierarchical Bayesian models of ecological diffusion can be implemented for large data sets that are distributed densely across space and time. The hierarchical Bayesian approach is used to understand and forecast the growth and geographic spread in the prevalence of chronic wasting disease in white-tailed deer (Odocoileus virginianus). We compare statistical inference and forecasts from our hierarchical Bayesian model to phenomenological regression-based methods that are commonly used to analyse spatial occurrence data. The mechanistic statistical model based on ecological diffusion led to important ecological insights, obviated a commonly ignored type of collinearity, and was the most accurate method for forecasting.
Probabilistic Signal Recovery and Random Matrices
2016-12-08
applications in statistics , biomedical data analysis, quantization, dimen- sion reduction, and networks science. 1. High-dimensional inference and geometry Our...low-rank approxima- tion, with applications to community detection in networks, Annals of Statistics 44 (2016), 373–400. [7] C. Le, E. Levina, R...approximation, with applications to community detection in networks, Annals of Statistics 44 (2016), 373–400. C. Le, E. Levina, R. Vershynin, Concentration
Synthetic Training Data Generation for Activity Monitoring and Behavior Analysis
NASA Astrophysics Data System (ADS)
Monekosso, Dorothy; Remagnino, Paolo
This paper describes a data generator that produces synthetic data to simulate observations from an array of environment monitoring sensors. The overall goal of our work is to monitor the well-being of one occupant in a home. Sensors are embedded in a smart home to unobtrusively record environmental parameters. Based on the sensor observations, behavior analysis and modeling are performed. However, behavior analysis and modeling require large data sets collected over long periods of time to achieve the expected level of accuracy. A data generator was developed based on initial data, i.e. data collected over periods lasting weeks, to facilitate concurrent data collection and algorithm development. The data generator is based on statistical inference techniques. Variation is introduced into the data using perturbation models.
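The seed-plus-perturbation idea can be sketched generically: take a template day of timed sensor events distilled from the initial weeks of data and emit synthetic days by jittering event times and occasionally dropping events. The event labels, jitter scale, and drop-out rate below are invented, not those of the actual deployment.

```python
# A toy synthetic-day generator: template events plus a simple perturbation model.
import numpy as np

rng = np.random.default_rng(2)

# Template day distilled from a few weeks of real observations (invented here):
# (event label, nominal time in minutes after midnight)
template = [("wake_up", 7 * 60), ("kitchen", 7 * 60 + 20), ("bathroom", 8 * 60),
            ("leave_home", 9 * 60), ("return_home", 17 * 60 + 30),
            ("kitchen", 18 * 60), ("living_room", 19 * 60), ("bed", 22 * 60 + 45)]

def synthetic_day(template, time_jitter_sd=12.0, p_drop=0.05):
    """Perturb the template: Gaussian jitter on event times, occasional drop-outs."""
    day = []
    for label, minute in template:
        if rng.random() < p_drop:          # the sensor missed the event
            continue
        day.append((label, float(np.clip(minute + rng.normal(0, time_jitter_sd), 0, 1439))))
    return sorted(day, key=lambda e: e[1])

dataset = [synthetic_day(template) for _ in range(7 * 52)]   # a synthetic "year"
print(dataset[0])
```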
An Introduction to Confidence Intervals for Both Statistical Estimates and Effect Sizes.
ERIC Educational Resources Information Center
Capraro, Mary Margaret
This paper summarizes methods of estimating confidence intervals, including classical intervals and intervals for effect sizes. The recent American Psychological Association (APA) Task Force on Statistical Inference report suggested that confidence intervals should always be reported, and the fifth edition of the APA "Publication Manual"…
Modeling Cross-Situational Word-Referent Learning: Prior Questions
ERIC Educational Resources Information Center
Yu, Chen; Smith, Linda B.
2012-01-01
Both adults and young children possess powerful statistical computation capabilities--they can infer the referent of a word from highly ambiguous contexts involving many words and many referents by aggregating cross-situational statistical information across contexts. This ability has been explained by models of hypothesis testing and by models of…
Temporal and Statistical Information in Causal Structure Learning
ERIC Educational Resources Information Center
McCormack, Teresa; Frosch, Caren; Patrick, Fiona; Lagnado, David
2015-01-01
Three experiments examined children's and adults' abilities to use statistical and temporal information to distinguish between common cause and causal chain structures. In Experiment 1, participants were provided with conditional probability information and/or temporal information and asked to infer the causal structure of a 3-variable mechanical…
Some General Goals in Teaching Statistics.
ERIC Educational Resources Information Center
Blalock, H. M.
1987-01-01
States that regardless of the content or level of a statistics course, five goals to reach are: (1) overcoming fears, resistances, and tendencies to memorize; (2) the importance of intellectual honesty and integrity; (3) understanding relationship between deductive and inductive inferences; (4) learning to play role of reasonable critic; and (5)…
Propensity Score Analysis: An Alternative Statistical Approach for HRD Researchers
ERIC Educational Resources Information Center
Keiffer, Greggory L.; Lane, Forrest C.
2016-01-01
Purpose: This paper aims to introduce matching in propensity score analysis (PSA) as an alternative statistical approach for researchers looking to make causal inferences using intact groups. Design/methodology/approach: An illustrative example demonstrated the varying results of analysis of variance, analysis of covariance and PSA on a heuristic…
Technology Focus: Using Technology to Explore Statistical Inference
ERIC Educational Resources Information Center
Garofalo, Joe; Juersivich, Nicole
2007-01-01
There is much research that documents what many teachers know, that students struggle with many concepts in probability and statistics. This article presents two sample activities the authors use to help preservice teachers develop ideas about how they can use technology to promote their students' ability to understand mathematics and connect…
ERIC Educational Resources Information Center
Conant, Darcy Lynn
2013-01-01
Stochastic understanding of probability distribution undergirds development of conceptual connections between probability and statistics and supports development of a principled understanding of statistical inference. This study investigated the impact of an instructional course intervention designed to support development of stochastic…
Basic Statistical Concepts and Methods for Earth Scientists
Olea, Ricardo A.
2008-01-01
INTRODUCTION Statistics is the science of collecting, analyzing, interpreting, modeling, and displaying masses of numerical data primarily for the characterization and understanding of incompletely known systems. Over the years, these objectives have led to a fair amount of analytical work to achieve, substantiate, and guide descriptions and inferences.
ERIC Educational Resources Information Center
Amundson, Vickie E.; Bernstein, Ira H.
1973-01-01
Authors note that Fehrer and Biederman's two statistical tests were not of equal power and that their conclusion could be a statistical artifact of both the lesser power of the verbal report comparison and the insensitivity of their particular verbal report indicator. (Editor)
Optimism bias leads to inconclusive results - an empirical study
Djulbegovic, Benjamin; Kumar, Ambuj; Magazin, Anja; Schroen, Anneke T.; Soares, Heloisa; Hozo, Iztok; Clarke, Mike; Sargent, Daniel; Schell, Michael J.
2010-01-01
Objective: Optimism bias refers to unwarranted belief in the efficacy of new therapies. We assessed the impact of optimism bias on the proportion of trials that did not answer their research question successfully, and explored whether poor accrual or optimism bias is responsible for inconclusive results. Study Design: Systematic review. Setting: Retrospective analysis of a consecutive series of phase III randomized controlled trials (RCTs) performed under the aegis of National Cancer Institute Cooperative groups. Results: 359 trials (374 comparisons) enrolling 150,232 patients were analyzed. 70% (262/374) of the trials generated conclusive results according to the statistical criteria. Investigators made definitive statements related to the treatment preference in 73% (273/374) of studies. Investigators’ judgments and statistical inferences were concordant in 75% (279/374) of trials. Investigators consistently overestimated their expected treatment effects, but to a significantly larger extent for inconclusive trials. The median ratio of expected over observed hazard ratio or odds ratio was 1.34 (range 0.19–15.40) in conclusive trials compared to 1.86 (range 1.09–12.00) in inconclusive studies (p<0.0001). Only 17% of the trials had treatment effects that matched the original researchers’ expectations. Conclusion: Formal statistical inference is sufficient to answer the research question in 75% of RCTs. The answers to the other 25% depend mostly on subjective judgments, which at times are in conflict with statistical inference. Optimism bias significantly contributes to inconclusive results. PMID:21163620
Forward and backward inference in spatial cognition.
Penny, Will D; Zeidman, Peter; Burgess, Neil
2013-01-01
This paper shows that the various computations underlying spatial cognition can be implemented using statistical inference in a single probabilistic model. Inference is implemented using a common set of 'lower-level' computations involving forward and backward inference over time. For example, to estimate where you are in a known environment, forward inference is used to optimally combine location estimates from path integration with those from sensory input. To decide which way to turn to reach a goal, forward inference is used to compute the likelihood of reaching that goal under each option. To work out which environment you are in, forward inference is used to compute the likelihood of sensory observations under the different hypotheses. For reaching sensory goals that require a chaining together of decisions, forward inference can be used to compute a state trajectory that will lead to that goal, and backward inference to refine the route and estimate control signals that produce the required trajectory. We propose that these computations are reflected in recent findings of pattern replay in the mammalian brain. Specifically, that theta sequences reflect decision making, theta flickering reflects model selection, and remote replay reflects route and motor planning. We also propose a mapping of the above computational processes onto lateral and medial entorhinal cortex and hippocampus.
NASA Astrophysics Data System (ADS)
Ma, Chuang; Chen, Han-Shuang; Lai, Ying-Cheng; Zhang, Hai-Feng
2018-02-01
Complex networks hosting binary-state dynamics arise in a variety of contexts. In spite of previous works, to fully reconstruct the network structure from observed binary data remains challenging. We articulate a statistical inference based approach to this problem. In particular, exploiting the expectation-maximization (EM) algorithm, we develop a method to ascertain the neighbors of any node in the network based solely on binary data, thereby recovering the full topology of the network. A key ingredient of our method is the maximum-likelihood estimation of the probabilities associated with actual or nonexistent links, and we show that the EM algorithm can distinguish the two kinds of probability values without any ambiguity, insofar as the length of the available binary time series is reasonably long. Our method does not require any a priori knowledge of the detailed dynamical processes, is parameter-free, and is capable of accurate reconstruction even in the presence of noise. We demonstrate the method using combinations of distinct types of binary dynamical processes and network topologies, and provide a physical understanding of the underlying reconstruction mechanism. Our statistical inference based reconstruction method contributes an additional piece to the rapidly expanding "toolbox" of data based reverse engineering of complex networked systems.
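The final step described above, using EM to separate the estimated probabilities of actual links from those of nonexistent ones, can be illustrated with a plain two-component Gaussian mixture fitted to one-dimensional link scores. The machinery that produces those scores from binary time series is not reproduced here, and the numbers are synthetic.

```python
# EM for a two-component 1-D Gaussian mixture, used to separate "real" from
# "nonexistent" link scores; the scores themselves are synthetic.
import numpy as np

rng = np.random.default_rng(8)
# Synthetic "link probabilities": nonexistent links cluster near 0.05, real links near 0.6.
scores = np.concatenate([rng.normal(0.05, 0.03, 900), rng.normal(0.60, 0.08, 100)])

w, mu, var = np.array([0.5, 0.5]), np.array([0.2, 0.5]), np.array([0.05, 0.05])
for _ in range(200):
    # E-step: responsibilities of each component for each score
    dens = w * np.exp(-(scores[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: update weights, means, and variances
    nk = r.sum(axis=0)
    w = nk / len(scores)
    mu = (r * scores[:, None]).sum(axis=0) / nk
    var = (r * (scores[:, None] - mu) ** 2).sum(axis=0) / nk

link = r[:, mu.argmax()] > 0.5          # classify each candidate edge
print("estimated component means:", mu, " links recovered:", link.sum())
```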
A statistical model for interpreting computerized dynamic posturography data
NASA Technical Reports Server (NTRS)
Feiveson, Alan H.; Metter, E. Jeffrey; Paloski, William H.
2002-01-01
Computerized dynamic posturography (CDP) is widely used for assessment of altered balance control. CDP trials are quantified using the equilibrium score (ES), which ranges from zero to 100, as a decreasing function of peak sway angle. The problem of how best to model and analyze ESs from a controlled study is considered. The ES often exhibits a skewed distribution in repeated trials, which can lead to incorrect inference when applying standard regression or analysis of variance models. Furthermore, CDP trials are terminated when a patient loses balance. In these situations, the ES is not observable, but is assigned the lowest possible score--zero. As a result, the response variable has a mixed discrete-continuous distribution, further compromising inference obtained by standard statistical methods. Here, we develop alternative methodology for analyzing ESs under a stochastic model extending the ES to a continuous latent random variable that always exists, but is unobserved in the event of a fall. Loss of balance occurs conditionally, with probability depending on the realized latent ES. After fitting the model by a form of quasi-maximum-likelihood, one may perform statistical inference to assess the effects of explanatory variables. An example is provided, using data from the NIH/NIA Baltimore Longitudinal Study on Aging.
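A toy version of the latent-score idea (not the authors' quasi-maximum-likelihood implementation): the latent ES is Gaussian, a fall occurs with a logistic probability that grows as the latent score drops, and fallen trials are recorded as zero, so the likelihood mixes a continuous density for completed trials with an integrated fall probability for zeros. The fall-model coefficients are treated as known to keep the sketch short; all numbers are invented.

```python
# A toy latent-equilibrium-score model: Gaussian latent score, logistic fall
# probability, zeros recorded for falls, fitted by maximum likelihood.
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize
from scipy.special import expit
from scipy.stats import norm

rng = np.random.default_rng(9)
mu_true, sd_true = 60.0, 15.0
a0, a1 = 4.0, 0.12                      # fall model: P(fall | z) = expit(a0 - a1*z), assumed known

z = rng.normal(mu_true, sd_true, 400)   # latent equilibrium scores (essentially always positive here)
fall = rng.random(400) < expit(a0 - a1 * z)
es = np.where(fall, 0.0, z)             # falls recorded as the lowest possible score

def negloglik(theta):
    mu, sd = theta
    if sd <= 0:
        return np.inf
    obs = es[es > 0]
    # completed trials: density of the latent score times the no-fall probability
    ll = np.sum(norm.logpdf(obs, mu, sd) + np.log1p(-expit(a0 - a1 * obs)))
    # fallen trials: integrate density * fall probability over the latent score
    p_fall, _ = quad(lambda v: norm.pdf(v, mu, sd) * expit(a0 - a1 * v), -np.inf, np.inf)
    ll += (es == 0).sum() * np.log(p_fall)
    return -ll

fit = minimize(negloglik, x0=[50.0, 10.0], method="Nelder-Mead")
print("estimated latent mean and SD:", fit.x)
```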
NASA Astrophysics Data System (ADS)
Hincks, Ian; Granade, Christopher; Cory, David G.
2018-01-01
The analysis of photon count data from the standard nitrogen vacancy (NV) measurement process is treated as a statistical inference problem. This has applications toward gaining better and more rigorous error bars for tasks such as parameter estimation (e.g. magnetometry), tomography, and randomized benchmarking. We start by providing a summary of the standard phenomenological model of the NV optical process in terms of Lindblad jump operators. This model is used to derive random variables describing emitted photons during measurement, to which finite visibility, dark counts, and imperfect state preparation are added. NV spin-state measurement is then stated as an abstract statistical inference problem consisting of an underlying biased coin obstructed by three Poisson rates. Relevant frequentist and Bayesian estimators are provided, discussed, and quantitatively compared. We show numerically that the risk of the maximum likelihood estimator is well approximated by the Cramér-Rao bound, for which we provide a simple formula. Of the estimators, we in particular promote the Bayes estimator, owing to its slightly better risk performance, and straightforward error propagation into more complex experiments. This is illustrated on experimental data, where quantum Hamiltonian learning is performed and cross-validated in a fully Bayesian setting, and compared to a more traditional weighted least squares fit.
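A reduced version of the "biased coin obstructed by Poisson rates" formulation, with invented calibration rates and without finite visibility or imperfect state preparation, can be sketched as follows: the expected photon count per shot interpolates between bright- and dark-state rates plus background, and both a maximum-likelihood and a grid-based Bayes estimate of the bright-state probability follow directly.

```python
# Inferring a bright-state probability from Poisson photon counts: MLE and a
# grid-based Bayes estimate. Calibration rates are invented for illustration.
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(4)
alpha_b, alpha_d, bg = 0.040, 0.025, 0.002   # photons/shot: bright state, dark state, background
p_true, shots = 0.30, 200_000                # bright-state probability; number of repetitions

rate = p_true * alpha_b + (1 - p_true) * alpha_d + bg
counts = rng.poisson(rate * shots)           # total photons summed over all repetitions

# Maximum-likelihood estimate: invert the mean count.
p_mle = (counts / shots - alpha_d - bg) / (alpha_b - alpha_d)

# Bayes estimate: uniform prior on p, Poisson likelihood evaluated on a grid.
grid = np.linspace(0, 1, 2001)
loglik = poisson.logpmf(counts, (grid * alpha_b + (1 - grid) * alpha_d + bg) * shots)
post = np.exp(loglik - loglik.max())
post /= post.sum()
p_bayes = (grid * post).sum()
p_sd = np.sqrt(((grid - p_bayes) ** 2 * post).sum())
print(f"MLE: {p_mle:.3f}   Bayes: {p_bayes:.3f} +/- {p_sd:.3f}")
```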