Sample records for statistical inference procedures

  1. Introducing Statistical Inference to Biology Students through Bootstrapping and Randomization

    ERIC Educational Resources Information Center

    Lock, Robin H.; Lock, Patti Frazer

    2008-01-01

    Bootstrap methods and randomization tests are increasingly being used as alternatives to standard statistical procedures in biology. They also serve as an effective introduction to the key ideas of statistical inference in introductory courses for biology students. We discuss the use of such simulation based procedures in an integrated curriculum…
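
    The two simulation-based procedures named in this abstract can be sketched in a few lines. This is a minimal illustration and not the authors' classroom material: the function names, the measurements, and the choice of a percentile interval are all assumptions made for the example.

```python
import random
import statistics

def bootstrap_ci(sample, stat=statistics.mean, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for a statistic."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement and recompute the statistic each time.
    reps = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_boot))
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return reps[lo_idx], reps[hi_idx]

def randomization_test(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided randomization test for a difference in means."""
    rng = random.Random(seed)
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        # Shuffling the pooled values mimics random reassignment to groups.
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add 1 to numerator and denominator so the p-value is never exactly 0.
    return (extreme + 1) / (n_perm + 1)

# Made-up measurements, used only to exercise the functions.
treated = [12.1, 13.4, 11.8, 14.2, 13.9, 12.7, 13.1, 14.0]
control = [10.2, 11.1, 10.8, 11.9, 10.5, 11.4, 10.9, 11.6]
ci_low, ci_high = bootstrap_ci(treated)
p_value = randomization_test(treated, control)
```

    Both functions take a seed so that a rerun reproduces the same resamples, which tends to help when the output is discussed in class.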

  2. Data-driven inference for the spatial scan statistic.

    PubMed

    Almeida, Alexandre C L; Duarte, Anderson R; Duczmal, Luiz H; Oliveira, Fernando L P; Takahashi, Ricardo H C

    2011-08-02

    Kulldorff's spatial scan statistic for aggregated area maps searches for clusters of cases without specifying their size (number of areas) or geographic location in advance. Their statistical significance is tested while adjusting for the multiple testing inherent in such a procedure. However, as is shown in this work, this adjustment is not done in an even manner for all possible cluster sizes. A modification is proposed to the usual inference test of the spatial scan statistic, incorporating additional information about the size of the most likely cluster found. A new interpretation of the results of the spatial scan statistic is given, posing a modified inference question: what is the probability that the null hypothesis is rejected for the original observed cases map with a most likely cluster of size k, taking into account only those most likely clusters of size k found under the null hypothesis for comparison? This question is especially important when the p-value computed by the usual inference process is near the alpha significance level, regarding the correctness of the decision based on this inference. A practical procedure is provided to make more accurate inferences about the most likely cluster found by the spatial scan statistic.

  3. Application of Transformations in Parametric Inference

    ERIC Educational Resources Information Center

    Brownstein, Naomi; Pensky, Marianna

    2008-01-01

    The objective of the present paper is to provide a simple approach to statistical inference using the method of transformations of variables. We demonstrate performance of this powerful tool on examples of constructions of various estimation procedures, hypothesis testing, Bayes analysis and statistical inference for the stress-strength systems.…

  4. Statistical Signal Models and Algorithms for Image Analysis

    DTIC Science & Technology

    1984-10-25

    In this report, two-dimensional stochastic linear models are used in developing algorithms for image analysis such as classification, segmentation, and object detection in images characterized by textured backgrounds. These models generate two-dimensional random processes as outputs to which statistical inference procedures can naturally be applied. A common thread throughout our algorithms is the interpretation of the inference procedures in terms of linear prediction

  5. Statistical inference for extended or shortened phase II studies based on Simon's two-stage designs.

    PubMed

    Zhao, Junjun; Yu, Menggang; Feng, Xi-Ping

    2015-06-07

    Simon's two-stage designs are popular choices for conducting phase II clinical trials, especially in oncology trials, to reduce the number of patients placed on ineffective experimental therapies. Recently Koyama and Chen (2008) discussed how to conduct proper inference for such studies because they found that inference procedures used with Simon's designs almost always ignore the actual sampling plan used. In particular, they proposed an inference method for studies when the actual second stage sample sizes differ from planned ones. We consider an alternative inference method based on the likelihood ratio. In particular, we order permissible sample paths under Simon's two-stage designs using their corresponding conditional likelihood. In this way, we can calculate p-values using the common definition: the probability of obtaining a test statistic value at least as extreme as that observed under the null hypothesis. In addition to providing inference for a couple of scenarios where Koyama and Chen's method can be difficult to apply, the resulting estimate based on our method appears to have a certain advantage in terms of inference properties in many numerical simulations. It generally led to smaller biases and narrower confidence intervals while maintaining similar coverage. We also illustrated the two methods in a real data setting. Inference procedures used with Simon's designs almost always ignore the actual sampling plan. Reported P-values, point estimates and confidence intervals for the response rate are not usually adjusted for the design's adaptiveness. Proper statistical inference procedures should be used.
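
    To make the sampling plan concrete, the sketch below computes the operating characteristics of a generic Simon two-stage design by summing over the permissible sample paths. It is not the likelihood-ratio ordering proposed in the abstract, nor Koyama and Chen's method. The design parameters (r1/n1 = 1/10, r/n = 5/29, response rates 0.10 and 0.30) are assumed here purely for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """Binomial probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def simon_operating_characteristics(r1, n1, r, n, p):
    """Return (probability of early termination, probability of rejecting H0).

    Stage 1: enrol n1 patients and stop for futility if responses <= r1.
    Stage 2: enrol n - n1 more and reject H0 if total responses > r.
    """
    n2 = n - n1
    pet = sum(binom_pmf(x1, n1, p) for x1 in range(r1 + 1))
    reject = 0.0
    for x1 in range(r1 + 1, n1 + 1):
        # Rejection needs x1 + x2 > r, i.e. x2 >= r - x1 + 1 (never below 0).
        need = max(r - x1 + 1, 0)
        tail = sum(binom_pmf(x2, n2, p) for x2 in range(need, n2 + 1))
        reject += binom_pmf(x1, n1, p) * tail
    return pet, reject

# Assumed illustrative design: r1/n1 = 1/10, r/n = 5/29.
pet_null, type_one_error = simon_operating_characteristics(1, 10, 5, 29, 0.10)
_, power = simon_operating_characteristics(1, 10, 5, 29, 0.30)
```

    Because the rejection probability depends on the stage 1 stopping rule, a p-value computed as if all n patients had been enrolled unconditionally refers to a different sample space. That is the kind of mismatch the abstract warns about.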

  6. A Test by Any Other Name: P Values, Bayes Factors, and Statistical Inference.

    PubMed

    Stern, Hal S

    2016-01-01

    Procedures used for statistical inference are receiving increased scrutiny as the scientific community studies the factors associated with ensuring reproducible research. This note addresses recent negative attention directed at p values, the relationship of confidence intervals and tests, and the role of Bayesian inference and Bayes factors, with an eye toward better understanding these different strategies for statistical inference. We argue that researchers and data analysts too often resort to binary decisions (e.g., whether to reject or accept the null hypothesis) in settings where this may not be required.

  7. Using a Five-Step Procedure for Inferential Statistical Analyses

    ERIC Educational Resources Information Center

    Kamin, Lawrence F.

    2010-01-01

    Many statistics texts pose inferential statistical problems in a disjointed way. By using a simple five-step procedure as a template for statistical inference problems, the student can solve problems in an organized fashion. The problem and its solution will thus be a stand-by-itself organic whole and a single unit of thought and effort. The…

  8. Efficiency Analysis: Enhancing the Statistical and Evaluative Power of the Regression-Discontinuity Design.

    ERIC Educational Resources Information Center

    Madhere, Serge

    An analytic procedure, efficiency analysis, is proposed for improving the utility of quantitative program evaluation for decision making. The three features of the procedure are explained: (1) for statistical control, it adopts and extends the regression-discontinuity design; (2) for statistical inferences, it de-emphasizes hypothesis testing in…

  9. Estimating the probability of rare events: addressing zero failure data.

    PubMed

    Quigley, John; Revie, Matthew

    2011-07-01

    Traditional statistical procedures for estimating the probability of an event result in an estimate of zero when no events are realized. Alternative inferential procedures have been proposed for the situation where zero events have been realized but often these are ad hoc, relying on selecting methods dependent on the data that have been realized. Such data-dependent inference decisions violate fundamental statistical principles, resulting in estimation procedures whose benefits are difficult to assess. In this article, we propose estimating the probability of an event occurring through minimax inference on the probability that future samples of equal size realize no more events than that in the data on which the inference is based. Although motivated by inference on rare events, the method is not restricted to zero event data and closely approximates the maximum likelihood estimate (MLE) for nonzero data. The use of the minimax procedure provides a risk-averse inferential procedure where there are no events realized. A comparison is made with the MLE and regions of the underlying probability are identified where this approach is superior. Moreover, a comparison is made with three standard approaches to supporting inference where no event data are realized, which we argue are unduly pessimistic. We show that for situations of zero events the estimator can be simply approximated with 1/(2.5n), where n is the number of trials. © 2011 Society for Risk Analysis.
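
    The closing approximation is easy to tabulate next to the maximum likelihood estimate. The sketch assumes the abstract's expression means 1 divided by the product 2.5 × n. The function names are hypothetical, and the minimax derivation itself is not reproduced.

```python
def mle(events, trials):
    """Maximum likelihood estimate of an event probability."""
    return events / trials

def zero_event_approximation(trials):
    """Approximation 1/(2.5 n) quoted in the abstract for zero observed events."""
    return 1.0 / (2.5 * trials)

# With zero events the MLE is always 0, while the approximation shrinks with n.
for n in (10, 100, 1000):
    print(n, mle(0, n), zero_event_approximation(n))
```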

  10. Applications of statistics to medical science, II overview of statistical procedures for general use.

    PubMed

    Watanabe, Hiroshi

    2012-01-01

    Procedures of statistical analysis are reviewed to provide an overview of applications of statistics for general use. Topics that are dealt with are inference on a population, comparison of two populations with respect to means and probabilities, and multiple comparisons. This study is the second part of a series in which we survey medical statistics. Arguments related to statistical associations and regressions will be made in subsequent papers.

  11. Evaluating sufficient similarity for drinking-water disinfection by-product (DBP) mixtures with bootstrap hypothesis test procedures.

    PubMed

    Feder, Paul I; Ma, Zhenxu J; Bull, Richard J; Teuschler, Linda K; Rice, Glenn

    2009-01-01

    In chemical mixtures risk assessment, the use of dose-response data developed for one mixture to estimate risk posed by a second mixture depends on whether the two mixtures are sufficiently similar. While evaluations of similarity may be made using qualitative judgments, this article uses nonparametric statistical methods based on the "bootstrap" resampling technique to address the question of similarity among mixtures of chemical disinfectant by-products (DBP) in drinking water. The bootstrap resampling technique is a general-purpose, computer-intensive approach to statistical inference that substitutes empirical sampling for theoretically based parametric mathematical modeling. Nonparametric, bootstrap-based inference involves fewer assumptions than parametric normal theory based inference. The bootstrap procedure is appropriate, at least in an asymptotic sense, whether or not the parametric, distributional assumptions hold, even approximately. The statistical analysis procedures in this article are initially illustrated with data from 5 water treatment plants (Schenck et al., 2009), and then extended using data developed from a study of 35 drinking-water utilities (U.S. EPA/AMWA, 1989), which permits inclusion of a greater number of water constituents and increased structure in the statistical models.

  12. Statistics for nuclear engineers and scientists. Part 1. Basic statistical inference

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Beggs, W.J.

    1981-02-01

    This report is intended for the use of engineers and scientists working in the nuclear industry, especially at the Bettis Atomic Power Laboratory. It serves as the basis for several Bettis in-house statistics courses. The objectives of the report are to introduce the reader to the language and concepts of statistics and to provide a basic set of techniques to apply to problems of the collection and analysis of data. Part 1 covers subjects of basic inference. The subjects include: descriptive statistics; probability; simple inference for normally distributed populations, and for non-normal populations as well; comparison of two populations; the analysis of variance; quality control procedures; and linear regression analysis.

  13. Applications of statistics to medical science (1) Fundamental concepts.

    PubMed

    Watanabe, Hiroshi

    2011-01-01

    The conceptual framework of statistical tests and statistical inferences is discussed, and the epidemiological background of statistics is briefly reviewed. This study is one of a series in which we survey the basics of statistics and practical methods used in medical statistics. Arguments related to actual statistical analysis procedures will be made in subsequent papers.

  14. Building Intuitions about Statistical Inference Based on Resampling

    ERIC Educational Resources Information Center

    Watson, Jane; Chance, Beth

    2012-01-01

    Formal inference, which makes theoretical assumptions about distributions and applies hypothesis testing procedures with null and alternative hypotheses, is notoriously difficult for tertiary students to master. The debate about whether this content should appear in Years 11 and 12 of the "Australian Curriculum: Mathematics" has gone on…

  15. Statistically optimal perception and learning: from behavior to neural representations

    PubMed Central

    Fiser, József; Berkes, Pietro; Orbán, Gergő; Lengyel, Máté

    2010-01-01

    Human perception has recently been characterized as statistical inference based on noisy and ambiguous sensory inputs. Moreover, suitable neural representations of uncertainty have been identified that could underlie such probabilistic computations. In this review, we argue that learning an internal model of the sensory environment is another key aspect of the same statistical inference procedure and thus perception and learning need to be treated jointly. We review evidence for statistically optimal learning in humans and animals, and reevaluate possible neural representations of uncertainty based on their potential to support statistically optimal learning. We propose that spontaneous activity can have a functional role in such representations leading to a new, sampling-based, framework of how the cortex represents information and uncertainty. PMID:20153683

  16. Notes on power of normality tests of error terms in regression models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Střelec, Luboš

    2015-03-10

    Normality is one of the basic assumptions in applying statistical procedures. For example, in linear regression most inferential procedures are based on the assumption of normality, i.e. the disturbance vector is assumed to be normally distributed. Failure to assess non-normality of the error terms may lead to incorrect results from the usual statistical inference techniques such as the t-test or F-test. Thus, error terms should be normally distributed in order to allow us to make exact inferences. As a consequence, normally distributed stochastic errors are necessary in order to avoid misleading inferences, which explains the necessity and importance of robust tests of normality. Therefore, the aim of this contribution is to discuss normality testing of error terms in regression models. In this contribution, we introduce the general RT class of robust tests for normality, and present and discuss the trade-off between power and robustness of selected classical and robust normality tests of error terms in regression models.
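
    As a point of reference for what a normality test on regression error terms looks like, here is a minimal sketch of the classical Jarque-Bera test applied to ordinary least squares residuals. The RT class introduced in the abstract is not implemented. The simulated data, the seed, and the function names are assumptions, and the p-value relies on the asymptotic chi-square approximation with 2 degrees of freedom.

```python
import math
import random

def ols_residuals(x, y):
    """Residuals from a simple linear regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def jarque_bera(resid):
    """Jarque-Bera statistic and its asymptotic p-value."""
    n = len(resid)
    mean = sum(resid) / n
    m2 = sum((e - mean) ** 2 for e in resid) / n
    m3 = sum((e - mean) ** 3 for e in resid) / n
    m4 = sum((e - mean) ** 4 for e in resid) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    # Survival function of a chi-square variable with 2 degrees of freedom.
    return jb, math.exp(-jb / 2)

# Simulated data: one response with normal errors, one with skewed errors.
rng = random.Random(1)
x = [rng.uniform(0, 10) for _ in range(500)]
y_normal = [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]
y_skewed = [1.0 + 2.0 * xi + rng.expovariate(1.0) for xi in x]
jb_normal, p_normal = jarque_bera(ols_residuals(x, y_normal))
jb_skewed, p_skewed = jarque_bera(ols_residuals(x, y_skewed))
```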

  17. In defence of model-based inference in phylogeography

    PubMed Central

    Beaumont, Mark A.; Nielsen, Rasmus; Robert, Christian; Hey, Jody; Gaggiotti, Oscar; Knowles, Lacey; Estoup, Arnaud; Panchal, Mahesh; Corander, Jukka; Hickerson, Mike; Sisson, Scott A.; Fagundes, Nelson; Chikhi, Lounès; Beerli, Peter; Vitalis, Renaud; Cornuet, Jean-Marie; Huelsenbeck, John; Foll, Matthieu; Yang, Ziheng; Rousset, Francois; Balding, David; Excoffier, Laurent

    2017-01-01

    Recent papers have promoted the view that model-based methods in general, and those based on Approximate Bayesian Computation (ABC) in particular, are flawed in a number of ways, and are therefore inappropriate for the analysis of phylogeographic data. These papers further argue that Nested Clade Phylogeographic Analysis (NCPA) offers the best approach in statistical phylogeography. In order to remove the confusion and misconceptions introduced by these papers, we justify and explain the reasoning behind model-based inference. We argue that ABC is a statistically valid approach, alongside other computational statistical techniques that have been successfully used to infer parameters and compare models in population genetics. We also examine the NCPA method and highlight numerous deficiencies, when used with either single or multiple loci. We further show that the ages of clades are carelessly used to infer ages of demographic events, that these ages are estimated under a simple model of panmixia and population stationarity but are then used under different and unspecified models to test hypotheses, a usage that invalidates these testing procedures. We conclude by encouraging researchers to study and use model-based inference in population genetics. PMID:29284924

  18. Statistical inference based on the nonparametric maximum likelihood estimator under double-truncation.

    PubMed

    Emura, Takeshi; Konno, Yoshihiko; Michimae, Hirofumi

    2015-07-01

    Doubly truncated data consist of samples whose observed values fall between the right- and left-truncation limits. With such samples, the distribution function of interest is estimated using the nonparametric maximum likelihood estimator (NPMLE) that is obtained through a self-consistency algorithm. Owing to the complicated asymptotic distribution of the NPMLE, the bootstrap method has been suggested for statistical inference. This paper proposes a closed-form estimator for the asymptotic covariance function of the NPMLE, which is a computationally attractive alternative to bootstrapping. Furthermore, we develop various statistical inference procedures, such as confidence intervals, goodness-of-fit tests, and confidence bands, to demonstrate the usefulness of the proposed covariance estimator. Simulations are performed to compare the proposed method with both the bootstrap and jackknife methods. The methods are illustrated using the childhood cancer dataset.

  19. Ensemble stacking mitigates biases in inference of synaptic connectivity.

    PubMed

    Chambers, Brendan; Levy, Maayan; Dechery, Joseph B; MacLean, Jason N

    2018-01-01

    A promising alternative to directly measuring the anatomical connections in a neuronal population is inferring the connections from the activity. We employ simulated spiking neuronal networks to compare and contrast commonly used inference methods that identify likely excitatory synaptic connections using statistical regularities in spike timing. We find that simple adjustments to standard algorithms improve inference accuracy: A signing procedure improves the power of unsigned mutual-information-based approaches and a correction that accounts for differences in mean and variance of background timing relationships, such as those expected to be induced by heterogeneous firing rates, increases the sensitivity of frequency-based methods. We also find that different inference methods reveal distinct subsets of the synaptic network and each method exhibits different biases in the accurate detection of reciprocity and local clustering. To correct for errors and biases specific to single inference algorithms, we combine methods into an ensemble. Ensemble predictions, generated as a linear combination of multiple inference algorithms, are more sensitive than the best individual measures alone, and are more faithful to ground-truth statistics of connectivity, mitigating biases specific to single inference methods. These weightings generalize across simulated datasets, emphasizing the potential for the broad utility of ensemble-based approaches.

  20. Outcome-Dependent Sampling Design and Inference for Cox's Proportional Hazards Model.

    PubMed

    Yu, Jichang; Liu, Yanyan; Cai, Jianwen; Sandler, Dale P; Zhou, Haibo

    2016-11-01

    We propose a cost-effective outcome-dependent sampling (ODS) design for failure time data and develop an efficient inference procedure for data collected with this design. To account for the biased sampling scheme, we derive estimators from a weighted partial likelihood estimating equation. The proposed estimators for regression parameters are shown to be consistent and asymptotically normally distributed. A criterion that can be used to optimally implement the ODS design in practice is proposed and studied. The small sample performance of the proposed method is evaluated by simulation studies. The proposed design and inference procedure is shown to be statistically more powerful than existing alternative designs with the same sample sizes. We illustrate the proposed method with existing real data from the Cancer Incidence and Mortality of Uranium Miners Study.

  1. Testing manifest monotonicity using order-constrained statistical inference.

    PubMed

    Tijmstra, Jesper; Hessen, David J; van der Heijden, Peter G M; Sijtsma, Klaas

    2013-01-01

    Most dichotomous item response models share the assumption of latent monotonicity, which states that the probability of a positive response to an item is a nondecreasing function of a latent variable intended to be measured. Latent monotonicity cannot be evaluated directly, but it implies manifest monotonicity across a variety of observed scores, such as the restscore, a single item score, and in some cases the total score. In this study, we show that manifest monotonicity can be tested by means of the order-constrained statistical inference framework. We propose a procedure that uses this framework to determine whether manifest monotonicity should be rejected for specific items. This approach provides a likelihood ratio test for which the p-value can be approximated through simulation. A simulation study is presented that evaluates the Type I error rate and power of the test, and the procedure is applied to empirical data.

  2. Using the Coefficient of Confidence to Make the Philosophical Switch from a Posteriori to a Priori Inferential Statistics

    ERIC Educational Resources Information Center

    Trafimow, David

    2017-01-01

    There has been much controversy over the null hypothesis significance testing procedure, with much of the criticism centered on the problem of inverse inference. Specifically, p gives the probability of the finding (or one more extreme) given the null hypothesis, whereas the null hypothesis significance testing procedure involves drawing a…

  3. Outcome-Dependent Sampling Design and Inference for Cox’s Proportional Hazards Model

    PubMed Central

    Yu, Jichang; Liu, Yanyan; Cai, Jianwen; Sandler, Dale P.; Zhou, Haibo

    2016-01-01

    We propose a cost-effective outcome-dependent sampling (ODS) design for failure time data and develop an efficient inference procedure for data collected with this design. To account for the biased sampling scheme, we derive estimators from a weighted partial likelihood estimating equation. The proposed estimators for regression parameters are shown to be consistent and asymptotically normally distributed. A criterion that can be used to optimally implement the ODS design in practice is proposed and studied. The small sample performance of the proposed method is evaluated by simulation studies. The proposed design and inference procedure is shown to be statistically more powerful than existing alternative designs with the same sample sizes. We illustrate the proposed method with existing real data from the Cancer Incidence and Mortality of Uranium Miners Study. PMID:28090134

  4. Managing heteroscedasticity in general linear models.

    PubMed

    Rosopa, Patrick J; Schaffer, Meline M; Schroeder, Amber N

    2013-09-01

    Heteroscedasticity refers to a phenomenon where data violate a statistical assumption. This assumption is known as homoscedasticity. When the homoscedasticity assumption is violated, this can lead to increased Type I error rates or decreased statistical power. Because this can adversely affect substantive conclusions, the failure to detect and manage heteroscedasticity could have serious implications for theory, research, and practice. In addition, heteroscedasticity is not uncommon in the behavioral and social sciences. Thus, in the current article, we synthesize extant literature in applied psychology, econometrics, quantitative psychology, and statistics, and we offer recommendations for researchers and practitioners regarding available procedures for detecting heteroscedasticity and mitigating its effects. In addition to discussing the strengths and weaknesses of various procedures and comparing them in terms of existing simulation results, we describe a 3-step data-analytic process for detecting and managing heteroscedasticity: (a) fitting a model based on theory and saving residuals, (b) the analysis of residuals, and (c) statistical inferences (e.g., hypothesis tests and confidence intervals) involving parameter estimates. We also demonstrate this data-analytic process using an illustrative example. Overall, detecting violations of the homoscedasticity assumption and mitigating its biasing effects can strengthen the validity of inferences from behavioral and social science data.
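
    The 3-step process described above can be sketched for a simple linear regression. This is only an illustration under stated assumptions: the simulated data, the crude residual diagnostic (correlation between the predictor and the squared residuals), and the choice of the HC0 heteroscedasticity-consistent standard error are picked for the example and are not necessarily the procedures the authors recommend.

```python
import math
import random

def fit_simple_ols(x, y):
    """Step (a): fit y = intercept + slope * x and save the residuals."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return intercept, slope, resid

def correlation(a, b):
    """Pearson correlation, used in step (b) as a crude diagnostic."""
    n = len(a)
    abar, bbar = sum(a) / n, sum(b) / n
    cov = sum((ai - abar) * (bi - bbar) for ai, bi in zip(a, b))
    var_a = sum((ai - abar) ** 2 for ai in a)
    var_b = sum((bi - bbar) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

def slope_standard_errors(x, resid):
    """Step (c): classical and HC0 robust standard errors of the slope."""
    n = len(x)
    xbar = sum(x) / n
    dev = [xi - xbar for xi in x]
    sxx = sum(d * d for d in dev)
    classical = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)
    robust_hc0 = math.sqrt(sum(d * d * e * e for d, e in zip(dev, resid))) / sxx
    return classical, robust_hc0

# Simulated data whose error standard deviation grows with x.
rng = random.Random(2)
x = [rng.uniform(1, 10) for _ in range(400)]
y = [2.0 + 0.5 * xi + rng.gauss(0, 0.3 * xi) for xi in x]

intercept, slope, resid = fit_simple_ols(x, y)
diagnostic = correlation(x, [e * e for e in resid])
se_classical, se_robust = slope_standard_errors(x, resid)
```

    A clearly positive value of `diagnostic` suggests that the residual spread increases with the predictor. In that case a confidence interval for the slope built from `se_robust` is likely to be more trustworthy than one built from `se_classical`.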

  5. Hierarchical modeling and inference in ecology: The analysis of data from populations, metapopulations and communities

    USGS Publications Warehouse

    Royle, J. Andrew; Dorazio, Robert M.

    2008-01-01

    A guide to data collection, modeling and inference strategies for biological survey data using Bayesian and classical statistical methods. This book describes a general and flexible framework for modeling and inference in ecological systems based on hierarchical models, with a strict focus on the use of probability models and parametric inference. Hierarchical models represent a paradigm shift in the application of statistics to ecological inference problems because they combine explicit models of ecological system structure or dynamics with models of how ecological systems are observed. The principles of hierarchical modeling are developed and applied to problems in population, metapopulation, community, and metacommunity systems. The book provides the first synthetic treatment of many recent methodological advances in ecological modeling and unifies disparate methods and procedures. The authors apply principles of hierarchical modeling to ecological problems, including * occurrence or occupancy models for estimating species distribution * abundance models based on many sampling protocols, including distance sampling * capture-recapture models with individual effects * spatial capture-recapture models based on camera trapping and related methods * population and metapopulation dynamic models * models of biodiversity, community structure and dynamics.

  6. Design-based and model-based inference in surveys of freshwater mollusks

    USGS Publications Warehouse

    Dorazio, R.M.

    1999-01-01

    Well-known concepts in statistical inference and sampling theory are used to develop recommendations for planning and analyzing the results of quantitative surveys of freshwater mollusks. Two methods of inference commonly used in survey sampling (design-based and model-based) are described and illustrated using examples relevant in surveys of freshwater mollusks. The particular objectives of a survey and the type of information observed in each unit of sampling can be used to help select the sampling design and the method of inference. For example, the mean density of a sparsely distributed population of mollusks can be estimated with higher precision by using model-based inference or by using design-based inference with adaptive cluster sampling than by using design-based inference with conventional sampling. More experience with quantitative surveys of natural assemblages of freshwater mollusks is needed to determine the actual benefits of different sampling designs and inferential procedures.

  7. Statistical inference for Hardy-Weinberg proportions in the presence of missing genotype information.

    PubMed

    Graffelman, Jan; Sánchez, Milagros; Cook, Samantha; Moreno, Victor

    2013-01-01

    In genetic association studies, tests for Hardy-Weinberg proportions are often employed as a quality control checking procedure. Missing genotypes are typically discarded prior to testing. In this paper we show that inference for Hardy-Weinberg proportions can be biased when missing values are discarded. We propose to use multiple imputation of missing values in order to improve inference for Hardy-Weinberg proportions. For imputation we employ a multinomial logit model that uses information from allele intensities and/or neighbouring markers. Analysis of an empirical data set of single nucleotide polymorphisms possibly related to colon cancer reveals that missing genotypes are not missing completely at random. Deviation from Hardy-Weinberg proportions is mostly due to a lack of heterozygotes. Inbreeding coefficients estimated by multiple imputation of the missings are typically lowered with respect to inbreeding coefficients estimated by discarding the missings. Accounting for missings by multiple imputation qualitatively changed the results of 10 to 17% of the statistical tests performed. Estimates of inbreeding coefficients obtained by multiple imputation showed high correlation with estimates obtained by single imputation using an external reference panel. Our conclusion is that imputation of missing data leads to improved statistical inference for Hardy-Weinberg proportions.
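
    For readers unfamiliar with the quality control check mentioned at the start of this abstract, the sketch below runs the standard chi-square test for Hardy-Weinberg proportions on complete genotype counts and reports an inbreeding coefficient. It does not implement the multiple imputation approach of the paper, and the genotype counts are invented for illustration.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) for Hardy-Weinberg proportions at a biallelic marker.

    Returns (chi-square statistic, p-value, inbreeding coefficient).
    Assumes both alleles are observed, so no expected count is zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # Positive values indicate fewer heterozygotes than expected.
    inbreeding = 1 - n_ab / expected[1]
    return chi2, p_value, inbreeding

# Invented counts: one marker near equilibrium, one lacking heterozygotes.
chi2_ok, p_ok, f_ok = hwe_chi_square(298, 489, 213)
chi2_def, p_def, f_def = hwe_chi_square(350, 385, 265)
```

    If genotypes with missing calls are dropped before counting, the inputs to this function may already be biased, which is the concern the abstract raises.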

  8. Methods for estimating private forest ownership statistics: revised methods for the USDA Forest Service's National Woodland Owner Survey

    Treesearch

    Brenton J. Dickinson; Brett J. Butler

    2013-01-01

    The USDA Forest Service's National Woodland Owner Survey (NWOS) is conducted to better understand the attitudes and behaviors of private forest ownerships, which control more than half of US forestland. Inferences about the populations of interest should be based on theoretically sound estimation procedures. A recent review of the procedures disclosed an error in...

  9. Analysis of Sensitivity Experiments - An Expanded Primer

    DTIC Science & Technology

    2017-03-08

    diehard practitioners. The difficulty associated with mastering statistical inference presents a true dilemma. Statistics is an extremely applied...lost, perhaps forever. In other words, when on this safari, you need a guide. This report is designed to be a guide, of sorts. It focuses on analytical...estimated accurately if our analysis is to have real meaning. For this reason, the sensitivity test procedure is designed to concentrate measurements

  10. Inference of missing data and chemical model parameters using experimental statistics

    NASA Astrophysics Data System (ADS)

    Casey, Tiernan; Najm, Habib

    2017-11-01

    A method for determining the joint parameter density of Arrhenius rate expressions through the inference of missing experimental data is presented. This approach proposes noisy hypothetical data sets from target experiments and accepts those which agree with the reported statistics, in the form of nominal parameter values and their associated uncertainties. The data exploration procedure is formalized using Bayesian inference, employing maximum entropy and approximate Bayesian computation methods to arrive at a joint density on data and parameters. The method is demonstrated in the context of reactions in the H2-O2 system for predictive modeling of combustion systems of interest. Work supported by the US DOE BES CSGB. Sandia National Labs is a multimission lab managed and operated by Nat. Technology and Eng'g Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell Intl, for the US DOE NCSA under contract DE-NA-0003525.
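
    The propose-and-accept idea can be illustrated with a toy approximate Bayesian computation (ABC) rejection sampler. This is a deliberately simplified sketch: it infers one location parameter from a reported mean, uses a uniform prior and Gaussian noise chosen for convenience, and leaves out the maximum entropy component and the Arrhenius rate expressions of the actual method. All names and numbers are hypothetical.

```python
import random
import statistics

def abc_rejection(reported_mean, n_obs, noise_sd, prior_low, prior_high,
                  tolerance, n_draws=20000, seed=0):
    """Keep parameter draws whose simulated data reproduce the reported mean."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        # Propose a parameter value from the uniform prior.
        theta = rng.uniform(prior_low, prior_high)
        # Propose a noisy hypothetical data set for that parameter value.
        data = [rng.gauss(theta, noise_sd) for _ in range(n_obs)]
        # Accept only if the data agree with the reported statistic.
        if abs(statistics.fmean(data) - reported_mean) <= tolerance:
            accepted.append(theta)
    return accepted

posterior = abc_rejection(reported_mean=3.0, n_obs=20, noise_sd=1.0,
                          prior_low=0.0, prior_high=6.0, tolerance=0.1)
posterior_mean = statistics.fmean(posterior)
```

    A tighter tolerance makes the accepted draws a closer approximation to the posterior but lowers the acceptance rate, so the number of proposals usually has to grow with it.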

  11. Inference for multivariate regression model based on multiply imputed synthetic data generated via posterior predictive sampling

    NASA Astrophysics Data System (ADS)

    Moura, Ricardo; Sinha, Bimal; Coelho, Carlos A.

    2017-06-01

    The recent popularity of the use of synthetic data as a Statistical Disclosure Control technique has enabled the development of several methods of generating and analyzing such data, but almost always relying on asymptotic distributions and consequently being inadequate for small sample datasets. Thus, a likelihood-based exact inference procedure is derived for the matrix of regression coefficients of the multivariate regression model, for multiply imputed synthetic data generated via Posterior Predictive Sampling. Since it is based on exact distributions, this procedure may even be used with small sample datasets. Simulation studies compare the results obtained from the proposed exact inferential procedure with the results obtained from an adaptation of Reiter's combination rule to multiply imputed synthetic datasets, and an application to the 2000 Current Population Survey is discussed.

  12. On Statistical Analysis of Neuroimages with Imperfect Registration

    PubMed Central

    Kim, Won Hwa; Ravi, Sathya N.; Johnson, Sterling C.; Okonkwo, Ozioma C.; Singh, Vikas

    2016-01-01

    A variety of studies in neuroscience/neuroimaging seek to perform statistical inference on the acquired brain image scans for diagnosis as well as understanding the pathological manifestation of diseases. To do so, an important first step is to register (or co-register) all of the image data into a common coordinate system. This permits meaningful comparison of the intensities at each voxel across groups (e.g., diseased versus healthy) to evaluate the effects of the disease and/or use machine learning algorithms in a subsequent step. But errors in the underlying registration make this problematic: they either decrease the statistical power or make the follow-up inference tasks less effective/accurate. In this paper, we derive a novel algorithm which offers immunity to local errors in the underlying deformation field obtained from registration procedures. By deriving a deformation invariant representation of the image, the downstream analysis can be made more robust as if one had access to a (hypothetical) far superior registration procedure. Our algorithm is based on recent work on scattering transform. Using this as a starting point, we show how results from harmonic analysis (especially, non-Euclidean wavelets) yield strategies for designing deformation and additive noise invariant representations of large 3-D brain image volumes. We present a set of results on synthetic and real brain images where we achieve robust statistical analysis even in the presence of substantial deformation errors; here, standard analysis procedures significantly under-perform and fail to identify the true signal. PMID:27042168

  13. Estimating Classification Consistency and Accuracy for Cognitive Diagnostic Assessment

    ERIC Educational Resources Information Center

    Cui, Ying; Gierl, Mark J.; Chang, Hua-Hua

    2012-01-01

    This article introduces procedures for the computation and asymptotic statistical inference for classification consistency and accuracy indices specifically designed for cognitive diagnostic assessments. The new classification indices can be used as important indicators of the reliability and validity of classification results produced by…

  14. Empirical evidence for acceleration-dependent amplification factors

    USGS Publications Warehouse

    Borcherdt, R.D.

    2002-01-01

    Site-specific amplification factors, Fa and Fv, used in current U.S. building codes decrease with increasing base acceleration level as implied by the Loma Prieta earthquake at 0.1g and extrapolated using numerical models and laboratory results. The Northridge earthquake recordings of 17 January 1994 and subsequent geotechnical data permit empirical estimates of amplification at base acceleration levels up to 0.5g. Distance measures and normalization procedures used to infer amplification ratios from soil-rock pairs in predetermined azimuth-distance bins significantly influence the dependence of amplification estimates on base acceleration. Factors inferred using a hypocentral distance norm do not show a statistically significant dependence on base acceleration. Factors inferred using norms implied by the attenuation functions of Abrahamson and Silva show a statistically significant decrease with increasing base acceleration. The decrease is statistically more significant for stiff clay and sandy soil (site class D) sites than for stiffer sites underlain by gravelly soils and soft rock (site class C). The decrease in amplification with increasing base acceleration is more pronounced for the short-period amplification factor, Fa, than for the midperiod factor, Fv.

  15. Bayesian estimation of the transmissivity spatial structure from pumping test data

    NASA Astrophysics Data System (ADS)

    Demir, Mehmet Taner; Copty, Nadim K.; Trinchero, Paolo; Sanchez-Vila, Xavier

    2017-06-01

    Estimating the statistical parameters (mean, variance, and integral scale) that define the spatial structure of the transmissivity or hydraulic conductivity fields is a fundamental step for the accurate prediction of subsurface flow and contaminant transport. In practice, the determination of the spatial structure is a challenge because of spatial heterogeneity and data scarcity. In this paper, we describe a novel approach that uses time-drawdown data from multiple pumping tests to determine the transmissivity statistical spatial structure. The method builds on the pumping test interpretation procedure of Copty et al. (2011) (Continuous Derivation method, CD), which uses the time-drawdown data and its time derivative to estimate apparent transmissivity values as a function of radial distance from the pumping well. A Bayesian approach is then used to infer the statistical parameters of the transmissivity field by combining prior information about the parameters and the likelihood function expressed in terms of radially-dependent apparent transmissivities determined from pumping tests. A major advantage of the proposed Bayesian approach is that the likelihood function is readily determined from randomly generated multiple realizations of the transmissivity field, without the need to solve the groundwater flow equation. Applying the method to synthetically-generated pumping test data, we demonstrate that, through a relatively simple procedure, information on the spatial structure of the transmissivity may be inferred from pumping test data. It is also shown that the prior parameter distribution has a significant influence on the estimation procedure, given the non-uniqueness of the estimation procedure. Results also indicate that the reliability of the estimated transmissivity statistical parameters increases with the number of available pumping tests.

  16. New robust statistical procedures for the polytomous logistic regression models.

    PubMed

    Castilla, Elena; Ghosh, Abhik; Martin, Nirian; Pardo, Leandro

    2018-05-17

    This article derives a new family of estimators, namely the minimum density power divergence estimators, as a robust generalization of the maximum likelihood estimator for the polytomous logistic regression model. Based on these estimators, a family of Wald-type test statistics for linear hypotheses is introduced. Robustness properties of both the proposed estimators and the test statistics are theoretically studied through the classical influence function analysis. Appropriate real life examples are presented to justify the requirement of suitable robust statistical procedures in place of the likelihood based inference for the polytomous logistic regression model. The validity of the theoretical results established in the article is further confirmed empirically through suitable simulation studies. Finally, an approach for the data-driven selection of the robustness tuning parameter is proposed with empirical justifications. © 2018, The International Biometric Society.

  17. Hierarchical animal movement models for population-level inference

    USGS Publications Warehouse

    Hooten, Mevin B.; Buderman, Frances E.; Brost, Brian M.; Hanks, Ephraim M.; Ivans, Jacob S.

    2016-01-01

    New methods for modeling animal movement based on telemetry data are developed regularly. With advances in telemetry capabilities, animal movement models are becoming increasingly sophisticated. Despite a need for population-level inference, animal movement models are still predominantly developed for individual-level inference. Most efforts to upscale the inference to the population level are either post hoc or complicated enough that only the developer can implement the model. Hierarchical Bayesian models provide an ideal platform for the development of population-level animal movement models but can be challenging to fit due to computational limitations or extensive tuning required. We propose a two-stage procedure for fitting hierarchical animal movement models to telemetry data. The two-stage approach is statistically rigorous and allows one to fit individual-level movement models separately, then resample them using a secondary MCMC algorithm. The primary advantages of the two-stage approach are that the first stage is easily parallelizable and the second stage is completely unsupervised, allowing for an automated fitting procedure in many cases. We demonstrate the two-stage procedure with two applications of animal movement models. The first application involves a spatial point process approach to modeling telemetry data, and the second involves a more complicated continuous-time discrete-space animal movement model. We fit these models to simulated data and real telemetry data arising from a population of monitored Canada lynx in Colorado, USA.

  18. Statistical Inference at Work: Statistical Process Control as an Example

    ERIC Educational Resources Information Center

    Bakker, Arthur; Kent, Phillip; Derry, Jan; Noss, Richard; Hoyles, Celia

    2008-01-01

    To characterise statistical inference in the workplace this paper compares a prototypical type of statistical inference at work, statistical process control (SPC), with a type of statistical inference that is better known in educational settings, hypothesis testing. Although there are some similarities between the reasoning structure involved in…

  19. Statistical analysis of particle trajectories in living cells

    NASA Astrophysics Data System (ADS)

    Briane, Vincent; Kervrann, Charles; Vimond, Myriam

    2018-06-01

    Recent advances in molecular biology and fluorescence microscopy imaging have made possible the inference of the dynamics of molecules in living cells. Such inference allows us to understand and determine the organization and function of the cell. The trajectories of particles (e.g., biomolecules) in living cells, computed with the help of object tracking methods, can be modeled with diffusion processes. Three types of diffusion are considered: (i) free diffusion, (ii) subdiffusion, and (iii) superdiffusion. The mean-square displacement (MSD) is generally used to discriminate the three types of particle dynamics. We propose here a nonparametric three-decision test as an alternative to the MSD method. The rejection of the null hypothesis, i.e., free diffusion, is accompanied by claims of the direction of the alternative (subdiffusion or superdiffusion). We study the asymptotic behavior of the test statistic under the null hypothesis and under parametric alternatives which are currently considered in the biophysics literature. In addition, we adapt the multiple-testing procedure of Benjamini and Hochberg to fit with the three-decision-test setting, in order to apply the test procedure to a collection of independent trajectories. The performance of our procedure is much better than that of the MSD method, as confirmed by Monte Carlo experiments. The method is demonstrated on real data sets corresponding to protein dynamics observed in fluorescence microscopy.
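
    As a point of reference for the multiple-testing step mentioned in this record, the following is a minimal sketch of the standard Benjamini-Hochberg step-up procedure, not the three-decision adaptation the authors describe. The function name, the p-values, and the level alpha = 0.05 are illustrative assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Standard Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, True where the corresponding
    null hypothesis is rejected at false discovery rate alpha.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, e.g. one per tracked trajectory.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042,
            0.060, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p_values, alpha=0.05)
print(rejected)
```

    With these made-up values only the two smallest p-values fall below their rank-dependent thresholds, so two hypotheses are rejected even though five p-values are below 0.05 individually.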

  20. A close examination of double filtering with fold change and t test in microarray analysis

    PubMed Central

    2009-01-01

    Background: Many researchers use the double filtering procedure with fold change and t test to identify differentially expressed genes, in the hope that the double filtering will provide extra confidence in the results. Due to its simplicity, the double filtering procedure has been popular with applied researchers despite the development of more sophisticated methods. Results: This paper, for the first time to our knowledge, provides theoretical insight on the drawback of the double filtering procedure. We show that fold change assumes all genes to have a common variance while t statistic assumes gene-specific variances. The two statistics are based on contradicting assumptions. Under the assumption that gene variances arise from a mixture of a common variance and gene-specific variances, we develop the theoretically most powerful likelihood ratio test statistic. We further demonstrate that the posterior inference based on a Bayesian mixture model and the widely used significance analysis of microarrays (SAM) statistic are better approximations to the likelihood ratio test than the double filtering procedure. Conclusion: We demonstrate through hypothesis testing theory, simulation studies and real data examples, that well constructed shrinkage testing methods, which can be united under the mixture gene variance assumption, can considerably outperform the double filtering procedure. PMID:19995439
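
    The contrast between the two filters can be illustrated with a small sketch. It assumes log-scale expression values, takes fold change as a difference of group means, and uses Welch's unequal-variance t statistic as the "t test" (the record does not say which variant is meant). The two genes and their values are made up.

```python
import math
import statistics


def log_fold_change(treated, control):
    """Difference of mean log-expression; ignores within-group variance."""
    return statistics.mean(treated) - statistics.mean(control)


def welch_t(treated, control):
    """Welch t statistic; scales the difference by gene-specific variances."""
    var_t = statistics.variance(treated)
    var_c = statistics.variance(control)
    std_err = math.sqrt(var_t / len(treated) + var_c / len(control))
    return (statistics.mean(treated) - statistics.mean(control)) / std_err


# Two hypothetical genes with the same mean shift but different noise levels.
quiet_gene = ([8.0, 8.1, 7.9, 8.0], [7.0, 7.1, 6.9, 7.0])
noisy_gene = ([10.0, 6.0, 9.0, 7.0], [9.0, 5.0, 8.0, 6.0])

lfc_quiet = log_fold_change(*quiet_gene)
lfc_noisy = log_fold_change(*noisy_gene)
t_quiet = welch_t(*quiet_gene)
t_noisy = welch_t(*noisy_gene)

print(f"quiet gene: log fold change {lfc_quiet:.2f}, t {t_quiet:.2f}")
print(f"noisy gene: log fold change {lfc_noisy:.2f}, t {t_noisy:.2f}")
```

    Both genes show the same fold change, so a fold-change filter treats them alike, while the t statistic separates them sharply. This is the sense in which the two filters rest on different variance assumptions.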

  1. Kernel canonical-correlation Granger causality for multiple time series

    NASA Astrophysics Data System (ADS)

    Wu, Guorong; Duan, Xujun; Liao, Wei; Gao, Qing; Chen, Huafu

    2011-04-01

    Canonical-correlation analysis as a multivariate statistical technique has been applied to multivariate Granger causality analysis to infer information flow in complex systems. It shows unique appeal and great superiority over the traditional vector autoregressive method, due to the simplified procedure that detects causal interaction between multiple time series, and the avoidance of potential model estimation problems. However, it is limited to the linear case. Here, we extend the framework of canonical correlation to include the estimation of multivariate nonlinear Granger causality for drawing inference about directed interaction. Its feasibility and effectiveness are verified on simulated data.

  2. Reproducing a Prospective Clinical Study as a Computational Retrospective Study in MIMIC-II.

    PubMed

    Kury, Fabrício S P; Huser, Vojtech; Cimino, James J

    2015-01-01

    In this paper we sought to reproduce, as a computational retrospective study in an EHR database (MIMIC-II), a recent large prospective clinical study: the 2013 publication, by the Japanese Association for Acute Medicine (JAAM), about disseminated intravascular coagulation, in the journal Critical Care (PMID: 23787004). We designed in SQL and Java a set of electronic phenotypes that reproduced the study's data sampling, and used R to perform the same statistical inference procedures. All produced source code is available online at https://github.com/fabkury/paamia2015. Our program identified 2,257 eligible patients in MIMIC-II, and the results remarkably agreed with the prospective study. A minority of the needed data elements was not found in MIMIC-II, and statistically significant inferences were possible in the majority of the cases.

  3. Knowledge dimensions in hypothesis test problems

    NASA Astrophysics Data System (ADS)

    Krishnan, Saras; Idris, Noraini

    2012-05-01

    The reformation in statistics education over the past two decades has predominantly shifted the focus of statistical teaching and learning from procedural understanding to conceptual understanding. The emphasis of procedural understanding is on the formulas and calculation procedures. Meanwhile, conceptual understanding emphasizes students knowing why they are using a particular formula or executing a specific procedure. In addition, the Revised Bloom's Taxonomy offers a two-dimensional framework to describe learning objectives, comprising the six revised cognition levels of the original Bloom's taxonomy and four knowledge dimensions. Depending on the level of complexity, the four knowledge dimensions essentially distinguish basic understanding from the more connected understanding. This study identifies the factual, procedural and conceptual knowledge dimensions in hypothesis test problems. The hypothesis test, being an important tool for making inferences about a population from sample information, is taught in many introductory statistics courses. However, researchers find that students in these courses still have difficulty in understanding the underlying concepts of the hypothesis test. Past studies also show that even though students can perform the hypothesis testing procedure, they may not understand the rationale for executing these steps or know how to apply them in novel contexts. Besides knowing the procedural steps in conducting a hypothesis test, students must have fundamental statistical knowledge and a deep understanding of the underlying inferential concepts such as the sampling distribution and the central limit theorem. By identifying the knowledge dimensions of hypothesis test problems in this study, suitable instructional and assessment strategies can be developed in future to enhance students' learning of the hypothesis test as a valuable inferential tool.

  4. Investigation of Statistical Inference Methodologies Through Scale Model Propagation Experiments

    DTIC Science & Technology

    2015-09-30

    statistical inference methodologies for ocean-acoustic problems by investigating and applying statistical methods to data collected from scale-model...to begin planning experiments for statistical inference applications. APPROACH In the ocean acoustics community over the past two decades...solutions for waveguide parameters. With the introduction of statistical inference to the field of ocean acoustics came the desire to interpret marginal

  5. Distinguishing between statistical significance and practical/clinical meaningfulness using statistical inference.

    PubMed

    Wilkinson, Michael

    2014-03-01

    Decisions about support for predictions of theories in light of data are made using statistical inference. The dominant approach in sport and exercise science is the Neyman-Pearson (N-P) significance-testing approach. When applied correctly it provides a reliable procedure for making dichotomous decisions for accepting or rejecting zero-effect null hypotheses with known and controlled long-run error rates. Type I and type II error rates must be specified in advance and the latter controlled by conducting an a priori sample size calculation. The N-P approach does not provide the probability of hypotheses or indicate the strength of support for hypotheses in light of data, yet many scientists believe it does. Outcomes of analyses allow conclusions only about the existence of non-zero effects, and provide no information about the likely size of true effects or their practical/clinical value. Bayesian inference can show how much support data provide for different hypotheses, and how personal convictions should be altered in light of data, but the approach is complicated by formulating probability distributions about prior subjective estimates of population effects. A pragmatic solution is magnitude-based inference, which allows scientists to estimate the true magnitude of population effects and how likely they are to exceed an effect magnitude of practical/clinical importance, thereby integrating elements of subjective Bayesian-style thinking. While this approach is gaining acceptance, progress might be hastened if scientists appreciate the shortcomings of traditional N-P null hypothesis significance testing.

  6. Inference of reaction rate parameters based on summary statistics from experiments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Khalil, Mohammad; Chowdhary, Kamaljit Singh; Safta, Cosmin

    Here, we present the results of an application of Bayesian inference and maximum entropy methods for the estimation of the joint probability density for the Arrhenius rate parameters of the rate coefficient of the H2/O2-mechanism chain branching reaction H + O2 → OH + O. Available published data is in the form of summary statistics in terms of nominal values and error bars of the rate coefficient of this reaction at a number of temperature values obtained from shock-tube experiments. Our approach relies on generating data, in this case OH concentration profiles, consistent with the given summary statistics, using Approximate Bayesian Computation methods and a Markov Chain Monte Carlo procedure. The approach permits the forward propagation of parametric uncertainty through the computational model in a manner that is consistent with the published statistics. A consensus joint posterior on the parameters is obtained by pooling the posterior parameter densities given each consistent data set. To expedite this process, we construct efficient surrogates for the OH concentration using a combination of Padé and polynomial approximants. These surrogate models adequately represent forward model observables and their dependence on input parameters and are computationally efficient to allow their use in the Bayesian inference procedure. We also utilize Gauss-Hermite quadrature with Gaussian proposal probability density functions for moment computation resulting in orders of magnitude speedup in data likelihood evaluation. Despite the strong non-linearity in the model, the consistent data sets all result in nearly Gaussian conditional parameter probability density functions. The technique also accounts for nuisance parameters in the form of Arrhenius parameters of other rate coefficients with prescribed uncertainty. The resulting pooled parameter probability density function is propagated through stoichiometric hydrogen-air auto-ignition computations to illustrate the need to account for correlation among the Arrhenius rate parameters of one reaction and across rate parameters of different reactions.

  7. Inference of reaction rate parameters based on summary statistics from experiments

    DOE PAGES

    Khalil, Mohammad; Chowdhary, Kamaljit Singh; Safta, Cosmin; ...

    2016-10-15

    Here, we present the results of an application of Bayesian inference and maximum entropy methods for the estimation of the joint probability density for the Arrhenius rate parameters of the rate coefficient of the H2/O2-mechanism chain branching reaction H + O2 → OH + O. Available published data is in the form of summary statistics in terms of nominal values and error bars of the rate coefficient of this reaction at a number of temperature values obtained from shock-tube experiments. Our approach relies on generating data, in this case OH concentration profiles, consistent with the given summary statistics, using Approximate Bayesian Computation methods and a Markov Chain Monte Carlo procedure. The approach permits the forward propagation of parametric uncertainty through the computational model in a manner that is consistent with the published statistics. A consensus joint posterior on the parameters is obtained by pooling the posterior parameter densities given each consistent data set. To expedite this process, we construct efficient surrogates for the OH concentration using a combination of Padé and polynomial approximants. These surrogate models adequately represent forward model observables and their dependence on input parameters and are computationally efficient to allow their use in the Bayesian inference procedure. We also utilize Gauss-Hermite quadrature with Gaussian proposal probability density functions for moment computation resulting in orders of magnitude speedup in data likelihood evaluation. Despite the strong non-linearity in the model, the consistent data sets all result in nearly Gaussian conditional parameter probability density functions. The technique also accounts for nuisance parameters in the form of Arrhenius parameters of other rate coefficients with prescribed uncertainty. The resulting pooled parameter probability density function is propagated through stoichiometric hydrogen-air auto-ignition computations to illustrate the need to account for correlation among the Arrhenius rate parameters of one reaction and across rate parameters of different reactions.

  8. Lifetime Prediction for Degradation of Solar Mirrors using Step-Stress Accelerated Testing (Presentation)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lee, J.; Elmore, R.; Kennedy, C.

    This research illustrates the use of statistical inference techniques to quantify the uncertainty surrounding reliability estimates in a step-stress accelerated degradation testing (SSADT) scenario. SSADT can be used when a researcher is faced with a resource-constrained environment, e.g., limits on chamber time or on the number of units to test. We apply the SSADT methodology to a degradation experiment involving concentrated solar power (CSP) mirrors and compare the results to a more traditional multiple accelerated testing paradigm. Specifically, our work includes: (1) designing a durability testing plan for solar mirrors (3M's new improved silvered acrylic "Solar Reflector Film (SFM) 1100") through the ultra-accelerated weathering system (UAWS), (2) defining degradation paths of optical performance based on the SSADT model which is accelerated by high UV-radiant exposure, and (3) developing service lifetime prediction models for solar mirrors using advanced statistical inference. We use the method of least squares to estimate the model parameters and this serves as the basis for the statistical inference in SSADT. Several quantities of interest can be estimated from this procedure, e.g., mean-time-to-failure (MTTF) and warranty time. The methods allow for the estimation of quantities that may be of interest to the domain scientists.

  9. Refinement of a Bias-Correction Procedure for the Weighted Likelihood Estimator of Ability. Research Report. ETS RR-07-23

    ERIC Educational Resources Information Center

    Zhang, Jinming; Lu, Ting

    2007-01-01

    In practical applications of item response theory (IRT), item parameters are usually estimated first from a calibration sample. After treating these estimates as fixed and known, ability parameters are then estimated. However, the statistical inferences based on the estimated abilities can be misleading if the uncertainty of the item parameter…

  10. Numerical Differentiation Methods for Computing Error Covariance Matrices in Item Response Theory Modeling: An Evaluation and a New Proposal

    ERIC Educational Resources Information Center

    Tian, Wei; Cai, Li; Thissen, David; Xin, Tao

    2013-01-01

    In item response theory (IRT) modeling, the item parameter error covariance matrix plays a critical role in statistical inference procedures. When item parameters are estimated using the EM algorithm, the parameter error covariance matrix is not an automatic by-product of item calibration. Cai proposed the use of Supplemented EM algorithm for…

  11. Assessing the fit of site-occupancy models

    USGS Publications Warehouse

    MacKenzie, D.I.; Bailey, L.L.

    2004-01-01

    Few species are likely to be so evident that they will always be detected at a site when present. Recently a model has been developed that enables estimation of the proportion of area occupied, when the target species is not detected with certainty. Here we apply this modeling approach to data collected on terrestrial salamanders in the Plethodon glutinosus complex in the Great Smoky Mountains National Park, USA, and wish to address the question 'how accurately does the fitted model represent the data?' The goodness-of-fit of the model needs to be assessed in order to make accurate inferences. This article presents a method where a simple Pearson chi-square statistic is calculated and a parametric bootstrap procedure is used to determine whether the observed statistic is unusually large. We found evidence that the most global model considered provides a poor fit to the data, hence estimated an overdispersion factor to adjust model selection procedures and inflate standard errors. Two hypothetical datasets with known assumption violations are also analyzed, illustrating that the method may be used to guide researchers to making appropriate inferences. The results of a simulation study are presented to provide a broader view of the method's properties.
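
    The general pattern described in this record (compute a Pearson chi-square statistic, then use a parametric bootstrap to judge whether it is unusually large) can be sketched as follows. This is a deliberately simplified illustration: it assumes a plain binomial detection model with a single probability p rather than the site-occupancy model of the paper, and the function names and detection counts are hypothetical.

```python
import random


def fit_p(counts, n_visits):
    """Maximum likelihood estimate of a common detection probability."""
    return sum(counts) / (len(counts) * n_visits)


def pearson_stat(counts, n_visits, p):
    """Pearson chi-square statistic for binomial counts with probability p."""
    expected = n_visits * p
    variance = n_visits * p * (1 - p)
    return sum((y - expected) ** 2 / variance for y in counts)


def parametric_bootstrap_gof(counts, n_visits, n_boot=500, seed=0):
    """Return the observed statistic and its bootstrap p-value."""
    rng = random.Random(seed)
    p_hat = fit_p(counts, n_visits)
    observed = pearson_stat(counts, n_visits, p_hat)
    boot_stats = []
    for _ in range(n_boot):
        # Simulate a data set from the fitted model, then refit and re-score.
        sim = [sum(rng.random() < p_hat for _visit in range(n_visits))
               for _site in counts]
        p_sim = fit_p(sim, n_visits)
        if 0 < p_sim < 1:  # skip degenerate simulations
            boot_stats.append(pearson_stat(sim, n_visits, p_sim))
    p_value = sum(s >= observed for s in boot_stats) / len(boot_stats)
    return observed, p_value


# Hypothetical detections out of 5 visits at 20 sites; the all-or-nothing
# pattern is far more variable than a single binomial would produce.
counts = [0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 0, 5, 0, 5, 0, 5, 0, 5, 0, 5]
observed, p_value = parametric_bootstrap_gof(counts, n_visits=5)
print(f"observed statistic {observed:.1f}, bootstrap p-value {p_value:.3f}")
```

    A small bootstrap p-value indicates that the fitted model reproduces the data poorly. One common convention, stated here as an assumption rather than as the paper's exact recipe, is to estimate an overdispersion factor as the observed statistic divided by the mean of the bootstrap statistics.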

  12. Automating approximate Bayesian computation by local linear regression.

    PubMed

    Thornton, Kevin R

    2009-07-07

    In several biological contexts, parameter inference often relies on computationally-intensive techniques. "Approximate Bayesian Computation", or ABC, methods based on summary statistics have become increasingly popular. A particular flavor of ABC based on using a linear regression to approximate the posterior distribution of the parameters, conditional on the summary statistics, is computationally appealing, yet no standalone tool exists to automate the procedure. Here, I describe a program to implement the method. The software package ABCreg implements the local linear-regression approach to ABC. The advantages are: 1. The code is standalone, and fully-documented. 2. The program will automatically process multiple data sets, and create unique output files for each (which may be processed immediately in R), facilitating the testing of inference procedures on simulated data, or the analysis of multiple data sets. 3. The program implements two different transformation methods for the regression step. 4. Analysis options are controlled on the command line by the user, and the program is designed to output warnings for cases where the regression fails. 5. The program does not depend on any particular simulation machinery (coalescent, forward-time, etc.), and therefore is a general tool for processing the results from any simulation. 6. The code is open-source, and modular. Examples of applying the software to empirical data from Drosophila melanogaster, and testing the procedure on simulated data, are shown. In practice, ABCreg simplifies implementing ABC based on local-linear regression.
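
    The regression-adjustment idea behind this flavor of ABC can be sketched in a few lines. The example below is not ABCreg and does not reproduce its interface or its transformations. It assumes a toy model (normal data with unknown mean and unit variance, a uniform prior, the sample mean as the single summary statistic), an unweighted least-squares fit on the accepted draws, and hypothetical names and tuning values throughout.

```python
import random
import statistics


def simulate_summary(theta, n_obs, rng):
    """Simulate n_obs normal observations and return their mean."""
    return statistics.fmean(rng.gauss(theta, 1.0) for _ in range(n_obs))


def abc_local_linear(s_obs, n_sims=5000, accept_frac=0.1, n_obs=20, seed=1):
    """Rejection ABC followed by a linear regression adjustment."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_sims):
        theta = rng.uniform(-5.0, 5.0)  # draw from the (assumed) prior
        draws.append((theta, simulate_summary(theta, n_obs, rng)))
    # Keep the simulations whose summary is closest to the observed one.
    draws.sort(key=lambda d: abs(d[1] - s_obs))
    accepted = draws[: int(accept_frac * n_sims)]
    mean_t = statistics.fmean(t for t, _ in accepted)
    mean_s = statistics.fmean(s for _, s in accepted)
    # Least-squares slope of theta on the summary statistic.
    slope = (sum((s - mean_s) * (t - mean_t) for t, s in accepted)
             / sum((s - mean_s) ** 2 for _, s in accepted))
    # Shift each accepted draw to where it would sit if s equalled s_obs.
    return [t - slope * (s - s_obs) for t, s in accepted]


posterior = abc_local_linear(s_obs=1.3)
print(f"posterior mean {statistics.fmean(posterior):.2f}, "
      f"sd {statistics.stdev(posterior):.2f}")
```

    The adjustment lets a fairly loose acceptance window be used without inflating the posterior spread as much as plain rejection would. Weighting accepted draws by their distance to the observed summary is a common refinement that this sketch omits.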

  13. Application of maximum entropy to statistical inference for inversion of data from a single track segment.

    PubMed

    Stotts, Steven A; Koch, Robert A

    2017-08-01

    In this paper an approach is presented to estimate the constraint required to apply maximum entropy (ME) for statistical inference with underwater acoustic data from a single track segment. Previous algorithms for estimating the ME constraint require multiple source track segments to determine the constraint. The approach is relevant for addressing model mismatch effects, i.e., inaccuracies in parameter values determined from inversions because the propagation model does not account for all acoustic processes that contribute to the measured data. One effect of model mismatch is that the lowest cost inversion solution may be well outside a relatively well-known parameter value's uncertainty interval (prior), e.g., source speed from track reconstruction or towed source levels. The approach requires, for some particular parameter value, the ME constraint to produce an inferred uncertainty interval that encompasses the prior. Motivating this approach is the hypothesis that the proposed constraint determination procedure would produce a posterior probability density that accounts for the effect of model mismatch on inferred values of other inversion parameters for which the priors might be quite broad. Applications to both measured and simulated data are presented for model mismatch that produces minimum cost solutions either inside or outside some priors.

  14. Robust regression for large-scale neuroimaging studies.

    PubMed

    Fritsch, Virgile; Da Mota, Benoit; Loth, Eva; Varoquaux, Gaël; Banaschewski, Tobias; Barker, Gareth J; Bokde, Arun L W; Brühl, Rüdiger; Butzek, Brigitte; Conrod, Patricia; Flor, Herta; Garavan, Hugh; Lemaitre, Hervé; Mann, Karl; Nees, Frauke; Paus, Tomas; Schad, Daniel J; Schümann, Gunter; Frouin, Vincent; Poline, Jean-Baptiste; Thirion, Bertrand

    2015-05-01

    Multi-subject datasets used in neuroimaging group studies have a complex structure, as they exhibit non-stationary statistical properties across regions and display various artifacts. While studies with small sample sizes can rarely be shown to deviate from standard hypotheses (such as the normality of the residuals) due to the poor sensitivity of normality tests with low degrees of freedom, large-scale studies (e.g. >100 subjects) exhibit more obvious deviations from these hypotheses and call for more refined models for statistical inference. Here, we demonstrate the benefits of robust regression as a tool for analyzing large neuroimaging cohorts. First, we use an analytic test based on robust parameter estimates; based on simulations, this procedure is shown to provide an accurate statistical control without resorting to permutations. Second, we show that robust regression yields more detections than standard algorithms using as an example an imaging genetics study with 392 subjects. Third, we show that robust regression can avoid false positives in a large-scale analysis of brain-behavior relationships with over 1500 subjects. Finally we embed robust regression in the Randomized Parcellation Based Inference (RPBI) method and demonstrate that this combination further improves the sensitivity of tests carried out across the whole brain. Altogether, our results show that robust procedures provide important advantages in large-scale neuroimaging group studies. Copyright © 2015 Elsevier Inc. All rights reserved.

  15. Emerging Concepts of Data Integration in Pathogen Phylodynamics.

    PubMed

    Baele, Guy; Suchard, Marc A; Rambaut, Andrew; Lemey, Philippe

    2017-01-01

    Phylodynamics has become an increasingly popular statistical framework to extract evolutionary and epidemiological information from pathogen genomes. By harnessing such information, epidemiologists aim to shed light on the spatio-temporal patterns of spread and to test hypotheses about the underlying interaction of evolutionary and ecological dynamics in pathogen populations. Although the field has witnessed a rich development of statistical inference tools with increasing levels of sophistication, these tools initially focused on sequences as their sole primary data source. Integrating various sources of information, however, promises to deliver more precise insights in infectious diseases and to increase opportunities for statistical hypothesis testing. Here, we review how the emerging concept of data integration is stimulating new advances in Bayesian evolutionary inference methodology which formalize a marriage of statistical thinking and evolutionary biology. These approaches include connecting sequence to trait evolution, such as for host, phenotypic and geographic sampling information, but also the incorporation of covariates of evolutionary and epidemic processes in the reconstruction procedures. We highlight how a full Bayesian approach to covariate modeling and testing can generate further insights into sequence evolution, trait evolution, and population dynamics in pathogen populations. Specific examples demonstrate how such approaches can be used to test the impact of host on rabies and HIV evolutionary rates, to identify the drivers of influenza dispersal as well as the determinants of rabies cross-species transmissions, and to quantify the evolutionary dynamics of influenza antigenicity. Finally, we briefly discuss how data integration is now also permeating through the inference of transmission dynamics, leading to novel insights into tree-generative processes and detailed reconstructions of transmission trees. 
    [Bayesian inference; birth–death models; coalescent models; continuous trait evolution; covariates; data integration; discrete trait evolution; pathogen phylodynamics.]

  16. Emerging Concepts of Data Integration in Pathogen Phylodynamics

    PubMed Central

    Baele, Guy; Suchard, Marc A.; Rambaut, Andrew; Lemey, Philippe

    2017-01-01

    Phylodynamics has become an increasingly popular statistical framework to extract evolutionary and epidemiological information from pathogen genomes. By harnessing such information, epidemiologists aim to shed light on the spatio-temporal patterns of spread and to test hypotheses about the underlying interaction of evolutionary and ecological dynamics in pathogen populations. Although the field has witnessed a rich development of statistical inference tools with increasing levels of sophistication, these tools initially focused on sequences as their sole primary data source. Integrating various sources of information, however, promises to deliver more precise insights in infectious diseases and to increase opportunities for statistical hypothesis testing. Here, we review how the emerging concept of data integration is stimulating new advances in Bayesian evolutionary inference methodology which formalize a marriage of statistical thinking and evolutionary biology. These approaches include connecting sequence to trait evolution, such as for host, phenotypic and geographic sampling information, but also the incorporation of covariates of evolutionary and epidemic processes in the reconstruction procedures. We highlight how a full Bayesian approach to covariate modeling and testing can generate further insights into sequence evolution, trait evolution, and population dynamics in pathogen populations. Specific examples demonstrate how such approaches can be used to test the impact of host on rabies and HIV evolutionary rates, to identify the drivers of influenza dispersal as well as the determinants of rabies cross-species transmissions, and to quantify the evolutionary dynamics of influenza antigenicity. Finally, we briefly discuss how data integration is now also permeating through the inference of transmission dynamics, leading to novel insights into tree-generative processes and detailed reconstructions of transmission trees. 
    [Bayesian inference; birth–death models; coalescent models; continuous trait evolution; covariates; data integration; discrete trait evolution; pathogen phylodynamics.] PMID:28173504

  17. [Methodological design of the National Health and Nutrition Survey 2016].

    PubMed

    Romero-Martínez, Martín; Shamah-Levy, Teresa; Cuevas-Nasu, Lucía; Gómez-Humarán, Ignacio Méndez; Gaona-Pineda, Elsa Berenice; Gómez-Acosta, Luz María; Rivera-Dommarco, Juan Ángel; Hernández-Ávila, Mauricio

    2017-01-01

    To describe the design methodology of the 2016 Halfway National Health and Nutrition Survey (Ensanut-MC). The Ensanut-MC is a national probabilistic survey whose target population is the inhabitants of private households in Mexico. The sample size was determined to allow inferences for urban and rural areas in four regions. The main design elements are described: target population, topics of study, sampling procedure, measurement procedure and logistics organization. The final sample comprised 9 479 completed household interviews and 16 591 individual interviews. The response rate for households was 77.9%, and the response rate for individuals was 91.9%. The Ensanut-MC probabilistic design allows valid statistical inferences about parameters of interest for Mexico's public health and nutrition, specifically on overweight, obesity and diabetes mellitus. The updated information also supports the monitoring, updating and formulation of new policies and priority programs.

  18. Stan: Statistical inference

    NASA Astrophysics Data System (ADS)

    Stan Development Team

    2018-01-01

    Stan facilitates statistical inference at the frontiers of applied statistics and provides both a modeling language for specifying complex statistical models and a library of statistical algorithms for computing inferences with those models. These components are exposed through interfaces in environments such as R, Python, and the command line.

  19. The search for causal inferences: using propensity scores post hoc to reduce estimation error with nonexperimental research.

    PubMed

    Tumlinson, Samuel E; Sass, Daniel A; Cano, Stephanie M

    2014-03-01

    While experimental designs are regarded as the gold standard for establishing causal relationships, such designs are usually impractical owing to common methodological limitations. The objective of this article is to illustrate how propensity score matching (PSM) and using propensity scores (PS) as a covariate are viable alternatives to reduce estimation error when experimental designs cannot be implemented. To mimic common pediatric research practices, data from 140 simulated participants were used to resemble an experimental and nonexperimental design that assessed the effect of treatment status on participant weight loss for diabetes. Pretreatment participant characteristics (age, gender, physical activity, etc.) were then used to generate PS for use in the various statistical approaches. Results demonstrate how PSM and using the PS as a covariate can be used to reduce estimation error and improve statistical inferences. References for issues related to the implementation of these procedures are provided to assist researchers.

  20. The Development of Introductory Statistics Students' Informal Inferential Reasoning and Its Relationship to Formal Inferential Reasoning

    ERIC Educational Resources Information Center

    Jacob, Bridgette L.

    2013-01-01

    The difficulties introductory statistics students have with formal statistical inference are well known in the field of statistics education. "Informal" statistical inference has been studied as a means to introduce inferential reasoning well before and without the formalities of formal statistical inference. This mixed methods study…

  1. Students' Emergent Articulations of Statistical Models and Modeling in Making Informal Statistical Inferences

    ERIC Educational Resources Information Center

    Braham, Hana Manor; Ben-Zvi, Dani

    2017-01-01

    A fundamental aspect of statistical inference is representation of real-world data using statistical models. This article analyzes students' articulations of statistical models and modeling during their first steps in making informal statistical inferences. An integrated modeling approach (IMA) was designed and implemented to help students…

  2. Bayesian hypothesis testing for human threat conditioning research: an introduction and the condir R package

    PubMed Central

    Krypotos, Angelos-Miltiadis; Klugkist, Irene; Engelhard, Iris M.

    2017-01-01

    Threat conditioning procedures have allowed the experimental investigation of the pathogenesis of Post-Traumatic Stress Disorder. The findings of these procedures have also provided stable foundations for the development of relevant intervention programs (e.g. exposure therapy). Statistical inference of threat conditioning procedures is commonly based on p-values and Null Hypothesis Significance Testing (NHST). Nowadays, however, there is a growing concern about this statistical approach, as many scientists point to the various limitations of p-values and NHST. As an alternative, the use of Bayes factors and Bayesian hypothesis testing has been suggested. In this article, we apply this statistical approach to threat conditioning data. In order to enable the easy computation of Bayes factors for threat conditioning data we present a new R package named condir, which can be used either via the R console or via a Shiny application. This article provides both a non-technical introduction to Bayesian analysis for researchers using the threat conditioning paradigm, and the necessary tools for computing Bayes factors easily. PMID:29038683

  3. NASA DOE POD NDE Capabilities Data Book

    NASA Technical Reports Server (NTRS)

    Generazio, Edward R.

    2015-01-01

    This data book contains the Directed Design of Experiments for Validating Probability of Detection (POD) Capability of NDE Systems (DOEPOD) analyses of the nondestructive inspection data presented in the NTIAC, Nondestructive Evaluation (NDE) Capabilities Data Book, 3rd ed., NTIAC DB-97-02. DOEPOD is designed as a decision support system to validate inspection system, personnel, and protocol demonstrating 0.90 POD with 95% confidence at critical flaw sizes, a90/95. The test methodology used in DOEPOD is based on the field of statistical sequential analysis founded by Abraham Wald. Sequential analysis is a method of statistical inference whose characteristic feature is that the number of observations required by the procedure is not determined in advance of the experiment. The decision to terminate the experiment depends, at each stage, on the results of the observations previously made. A merit of the sequential method, as applied to testing statistical hypotheses, is that test procedures can be constructed which require, on average, a substantially smaller number of observations than equally reliable test procedures based on a predetermined number of observations.
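
    The sequential idea described in this abstract can be illustrated with Wald's sequential probability ratio test (SPRT) for hit/miss data. The following is a minimal sketch, not the DOEPOD procedure itself: the function name, the hypothesized detection probabilities p0 and p1, and the error rates alpha and beta are assumptions chosen for illustration, and the stopping thresholds are Wald's standard approximations.

```python
import math


def sprt_bernoulli(observations, p0, p1, alpha=0.05, beta=0.05):
    """Wald's SPRT for H0: p = p0 versus H1: p = p1 on hit/miss outcomes.

    Returns (decision, number of observations used). All parameter
    values are illustrative assumptions, not taken from the data book.
    """
    upper = math.log((1 - beta) / alpha)  # crossing above favors H1
    lower = math.log(beta / (1 - alpha))  # crossing below favors H0
    llr = 0.0  # running log-likelihood ratio
    n = 0
    for n, hit in enumerate(observations, start=1):
        if hit:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        # The decision to stop depends only on the data seen so far.
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n
    return "continue sampling", n
```

    With p0 = 0.5 and p1 = 0.9, a run of consecutive hits stops the test after six observations, which shows how the sample size is determined by the data rather than fixed in advance.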

  4. Statistical atlas based extrapolation of CT data

    NASA Astrophysics Data System (ADS)

    Chintalapani, Gouthami; Murphy, Ryan; Armiger, Robert S.; Lepisto, Jyri; Otake, Yoshito; Sugano, Nobuhiko; Taylor, Russell H.; Armand, Mehran

    2010-02-01

    We present a framework to estimate the missing anatomical details from a partial CT scan with the help of statistical shape models. The motivating application is periacetabular osteotomy (PAO), a technique for treating developmental hip dysplasia, an abnormal condition of the hip socket that, if untreated, may lead to osteoarthritis. The common goals of PAO are to reduce pain, joint subluxation and improve contact pressure distribution by increasing the coverage of the femoral head by the hip socket. While current diagnosis and planning is based on radiological measurements, because of significant structural variations in dysplastic hips, a computer-assisted geometrical and biomechanical planning based on CT data is desirable to help the surgeon achieve optimal joint realignments. Most of the patients undergoing PAO are young females, hence it is usually desirable to minimize the radiation dose by scanning only the joint portion of the hip anatomy. These partial scans, however, do not provide enough information for biomechanical analysis due to missing iliac region. A statistical shape model of full pelvis anatomy is constructed from a database of CT scans. The partial volume is first aligned with the statistical atlas using an iterative affine registration, followed by a deformable registration step and the missing information is inferred from the atlas. The atlas inferences are further enhanced by the use of X-ray images of the patient, which are very common in an osteotomy procedure. The proposed method is validated with a leave-one-out analysis method. Osteotomy cuts are simulated and the effect of atlas predicted models on the actual procedure is evaluated.

  5. Inference of median difference based on the Box-Cox model in randomized clinical trials.

    PubMed

    Maruo, K; Isogawa, N; Gosho, M

    2015-05-10

    In randomized clinical trials, many medical and biological measurements are not normally distributed and are often skewed. The Box-Cox transformation is a powerful procedure for comparing two treatment groups for skewed continuous variables in terms of a statistical test. However, it is difficult to directly estimate and interpret the location difference between the two groups on the original scale of the measurement. We propose a helpful method that infers the difference of the treatment effect on the original scale in a more easily interpretable form. We also provide statistical analysis packages that consistently include an estimate of the treatment effect, covariance adjustments, standard errors, and statistical hypothesis tests. The simulation study that focuses on randomized parallel group clinical trials with two treatment groups indicates that the performance of the proposed method is equivalent to or better than that of the existing non-parametric approaches in terms of the type-I error rate and power. We illustrate our method with cluster of differentiation 4 data in an acquired immune deficiency syndrome clinical trial. Copyright © 2015 John Wiley & Sons, Ltd.
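
    The difficulty of interpreting a location difference on the transformed scale can be made concrete with a small sketch. It is not the authors' method, which also covers estimation of the transformation parameter, covariance adjustment and standard errors. Here the parameter lam is assumed known, the function names are hypothetical, and the transformed data are assumed roughly symmetric so that the back-transformed mean approximates the median on the original scale.

```python
import math
import statistics


def box_cox(x, lam):
    """Box-Cox power transformation of a positive value x."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def inv_box_cox(y, lam):
    """Inverse transformation back to the original measurement scale."""
    if lam == 0:
        return math.exp(y)
    return (lam * y + 1.0) ** (1.0 / lam)


def median_difference(group_a, group_b, lam):
    """Difference of back-transformed means, read as a median difference.

    Assumes lam is fixed in advance (an illustrative simplification).
    """
    mean_a = statistics.fmean(box_cox(x, lam) for x in group_a)
    mean_b = statistics.fmean(box_cox(x, lam) for x in group_b)
    return inv_box_cox(mean_a, lam) - inv_box_cox(mean_b, lam)
```

    For example, median_difference([1.0, 4.0, 16.0], [1.0, 2.0, 4.0], lam=0) back-transforms the two log-scale means to 4 and 2, giving a difference of about 2 in the original units.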

  6. The Reasoning behind Informal Statistical Inference

    ERIC Educational Resources Information Center

    Makar, Katie; Bakker, Arthur; Ben-Zvi, Dani

    2011-01-01

    Informal statistical inference (ISI) has been a frequent focus of recent research in statistics education. Considering the role that context plays in developing ISI calls into question the need to be more explicit about the reasoning that underpins ISI. This paper uses educational literature on informal statistical inference and philosophical…

  7. METEOR - an artificial intelligence system for convective storm forecasting

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Elio, R.; De haan, J.; Strong, G.S.

    1987-03-01

    An AI system called METEOR, which uses the meteorologist's heuristics, strategies, and statistical tools to forecast severe hailstorms in Alberta, is described, emphasizing the information and knowledge that METEOR uses to mimic the forecasting procedure of an expert meteorologist. METEOR is then discussed as an AI system, emphasizing the ways in which it is qualitatively different from algorithmic or statistical approaches to prediction. Some features of METEOR's design and the AI techniques for representing meteorological knowledge and for reasoning and inference are presented. Finally, some observations on designing and implementing intelligent consultants for meteorological applications are made. 7 references.

  8. Statistical Models and Inference Procedures for Structural and Materials Reliability

    DTIC Science & Technology

    1990-12-01

    as an official Department of the Army position, policy, or decision, unless so designated by other documentation. 12a. DISTRIBUTION /AVAILABILITY...Some general stress-strength models were also developed and applied to the failure of systems subject to cyclic loading. Involved in the failure of...process control ideas and sequential design and analysis methods. Finally, smooth nonparametric quantile function estimators were studied. All of

  9. Automated Box-Cox Transformations for Improved Visual Encoding.

    PubMed

    Maciejewski, Ross; Pattath, Avin; Ko, Sungahn; Hafen, Ryan; Cleveland, William S; Ebert, David S

    2013-01-01

    The concept of preconditioning data (utilizing a power transformation as an initial step) for analysis and visualization is well established within the statistical community and is employed as part of statistical modeling and analysis. Such transformations condition the data to various inherent assumptions of statistical inference procedures, as well as making the data more symmetric and easier to visualize and interpret. In this paper, we explore the use of the Box-Cox family of power transformations to semiautomatically adjust visual parameters. We focus on time-series scaling, axis transformations, and color binning for choropleth maps. We illustrate the usage of this transformation through various examples, and discuss the value and some issues in semiautomatically using these transformations for more effective data visualization.

  10. Multi-objective calibration and uncertainty analysis of hydrologic models; A comparative study between formal and informal methods

    NASA Astrophysics Data System (ADS)

    Shafii, M.; Tolson, B.; Matott, L. S.

    2012-04-01

    Hydrologic modeling has benefited from significant developments over the past two decades. This has resulted in building of higher levels of complexity into hydrologic models, which eventually makes the model evaluation process (parameter estimation via calibration and uncertainty analysis) more challenging. In order to avoid unreasonable parameter estimates, many researchers have suggested implementation of multi-criteria calibration schemes. Furthermore, for predictive hydrologic models to be useful, proper consideration of uncertainty is essential. Consequently, recent research has emphasized comprehensive model assessment procedures in which multi-criteria parameter estimation is combined with statistically-based uncertainty analysis routines such as Bayesian inference using Markov Chain Monte Carlo (MCMC) sampling. Such a procedure relies on the use of formal likelihood functions based on statistical assumptions, and moreover, the Bayesian inference structured on MCMC samplers requires a considerably large number of simulations. Due to these issues, especially in complex non-linear hydrological models, a variety of alternative informal approaches have been proposed for uncertainty analysis in the multi-criteria context. This study aims at exploring a number of such informal uncertainty analysis techniques in multi-criteria calibration of hydrological models. The informal methods addressed in this study are (i) Pareto optimality which quantifies the parameter uncertainty using the Pareto solutions, (ii) DDS-AU which uses the weighted sum of objective functions to derive the prediction limits, and (iii) GLUE which describes the total uncertainty through identification of behavioral solutions. The main objective is to compare such methods with MCMC-based Bayesian inference with respect to factors such as computational burden, and predictive capacity, which are evaluated based on multiple comparative measures. The measures for comparison are calculated both for calibration and evaluation periods. The uncertainty analysis methodologies are applied to a simple 5-parameter rainfall-runoff model, called HYMOD.

  11. Assessing dynamics, spatial scale, and uncertainty in task-related brain network analyses

    PubMed Central

    Stephen, Emily P.; Lepage, Kyle Q.; Eden, Uri T.; Brunner, Peter; Schalk, Gerwin; Brumberg, Jonathan S.; Guenther, Frank H.; Kramer, Mark A.

    2014-01-01

    The brain is a complex network of interconnected elements, whose interactions evolve dynamically in time to cooperatively perform specific functions. A common technique to probe these interactions involves multi-sensor recordings of brain activity during a repeated task. Many techniques exist to characterize the resulting task-related activity, including establishing functional networks, which represent the statistical associations between brain areas. Although functional network inference is commonly employed to analyze neural time series data, techniques to assess the uncertainty—both in the functional network edges and the corresponding aggregate measures of network topology—are lacking. To address this, we describe a statistically principled approach for computing uncertainty in functional networks and aggregate network measures in task-related data. The approach is based on a resampling procedure that utilizes the trial structure common in experimental recordings. We show in simulations that this approach successfully identifies functional networks and associated measures of confidence emergent during a task in a variety of scenarios, including dynamically evolving networks. In addition, we describe a principled technique for establishing functional networks based on predetermined regions of interest using canonical correlation. Doing so provides additional robustness to the functional network inference. Finally, we illustrate the use of these methods on example invasive brain voltage recordings collected during an overt speech task. The general strategy described here—appropriate for static and dynamic network inference and different statistical measures of coupling—permits the evaluation of confidence in network measures in a variety of settings common to neuroscience. PMID:24678295
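
    The trial-based resampling idea in this abstract can be sketched as a bootstrap over whole trials. This is a minimal illustration under stated assumptions, not the authors' procedure: the coupling measure is taken to be the Pearson correlation between two sensors, the interval is a simple percentile interval, and the names pearson and trial_bootstrap_ci are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def trial_bootstrap_ci(trials_a, trials_b, n_boot=1000, level=0.95, seed=0):
    """Percentile interval for the coupling between two sensors.

    trials_a[i] and trials_b[i] hold the recordings of the two sensors
    on trial i. Whole trials are resampled with replacement, so the
    within-trial time structure is left intact.
    """
    rng = random.Random(seed)
    n_trials = len(trials_a)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n_trials) for _ in range(n_trials)]
        xs = [v for i in idx for v in trials_a[i]]
        ys = [v for i in idx for v in trials_b[i]]
        stats.append(pearson(xs, ys))
    stats.sort()
    lo = stats[int((1 - level) / 2 * n_boot)]
    hi = stats[min(n_boot - 1, int((1 + level) / 2 * n_boot))]
    return lo, hi
```

    An edge between the two sensors could then be declared when the interval excludes zero. The same resampled trial indices could be reused across all sensor pairs to obtain intervals for aggregate network measures.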

  12. Assessing dynamics, spatial scale, and uncertainty in task-related brain network analyses.

    PubMed

    Stephen, Emily P; Lepage, Kyle Q; Eden, Uri T; Brunner, Peter; Schalk, Gerwin; Brumberg, Jonathan S; Guenther, Frank H; Kramer, Mark A

    2014-01-01

    The brain is a complex network of interconnected elements, whose interactions evolve dynamically in time to cooperatively perform specific functions. A common technique to probe these interactions involves multi-sensor recordings of brain activity during a repeated task. Many techniques exist to characterize the resulting task-related activity, including establishing functional networks, which represent the statistical associations between brain areas. Although functional network inference is commonly employed to analyze neural time series data, techniques to assess the uncertainty-both in the functional network edges and the corresponding aggregate measures of network topology-are lacking. To address this, we describe a statistically principled approach for computing uncertainty in functional networks and aggregate network measures in task-related data. The approach is based on a resampling procedure that utilizes the trial structure common in experimental recordings. We show in simulations that this approach successfully identifies functional networks and associated measures of confidence emergent during a task in a variety of scenarios, including dynamically evolving networks. In addition, we describe a principled technique for establishing functional networks based on predetermined regions of interest using canonical correlation. Doing so provides additional robustness to the functional network inference. Finally, we illustrate the use of these methods on example invasive brain voltage recordings collected during an overt speech task. The general strategy described here-appropriate for static and dynamic network inference and different statistical measures of coupling-permits the evaluation of confidence in network measures in a variety of settings common to neuroscience.

  13. Design-based Sample and Probability Law-Assumed Sample: Their Role in Scientific Investigation.

    ERIC Educational Resources Information Center

    Ojeda, Mario Miguel; Sahai, Hardeo

    2002-01-01

    Discusses some key statistical concepts in probabilistic and non-probabilistic sampling to provide an overview for understanding the inference process. Suggests a statistical model constituting the basis of statistical inference and provides a brief review of the finite population descriptive inference and a quota sampling inferential theory.…

  14. The Importance of Statistical Modeling in Data Analysis and Inference

    ERIC Educational Resources Information Center

    Rollins, Derrick, Sr.

    2017-01-01

    Statistical inference simply means to draw a conclusion based on information that comes from data. Error bars are the most commonly used tool for data analysis and inference in chemical engineering data studies. This work demonstrates, using common types of data collection studies, the importance of specifying the statistical model for sound…

  15. The geographic mosaic of Ecuadorian Y-chromosome ancestry.

    PubMed

    Toscanini, U; Gaviria, A; Pardo-Seco, J; Gómez-Carballa, A; Moscoso, F; Vela, M; Cobos, S; Lupero, A; Zambrano, A K; Martinón-Torres, F; Carabajo-Marcillo, A; Yunga-León, R; Ugalde-Noritz, N; Ordoñez-Ugalde, A; Salas, A

    2018-03-01

    Ecuadorians originated from a complex mixture of Native American indigenous people with Europeans and Africans. We analyzed Y-chromosome STRs (Y-STRs) in a sample of 415 Ecuadorians (145 using the AmpFlSTR® Yfiler™ system [Life Technologies, USA] and 270 using the PowerPlex® Y23 system [Promega Corp., USA]; hereafter Yfiler and PPY23, respectively) representing three main ecological continental regions of the country, namely Amazon rainforest, Andes, and Pacific coast. Diversity values are high in the three regions, and the PPY23 exhibits higher discrimination power than the Yfiler set. While summary statistics, AMOVA, and R_ST distances show low to moderate levels of population stratification, inferred ancestry derived from Y-STRs reveals clear patterns of geographic variation. The major ancestry in Ecuadorian males is European (61%), followed by an important Native American component (34%); whereas the African ancestry (5%) is mainly concentrated in the Northwest corner of the country. We conclude that classical procedures for measuring population stratification do not have the desirable sensitivity. Statistical inference of ancestry from Y-STRs is a satisfactory alternative for revealing patterns of spatial variation that would pass unnoticed when using popular statistical summary indices. Copyright © 2017 Elsevier B.V. All rights reserved.

  16. Suggestions for presenting the results of data analyses

    USGS Publications Warehouse

    Anderson, David R.; Link, William A.; Johnson, Douglas H.; Burnham, Kenneth P.

    2001-01-01

    We give suggestions for the presentation of research results from frequentist, information-theoretic, and Bayesian analysis paradigms, followed by several general suggestions. The information-theoretic and Bayesian methods offer alternative approaches to data analysis and inference compared to traditionally used methods. Guidance is lacking on the presentation of results under these alternative procedures and on nontesting aspects of classical frequentist methods of statistical analysis. Null hypothesis testing has come under intense criticism. We recommend less reporting of the results of statistical tests of null hypotheses in cases where the null is surely false anyway, or where the null hypothesis is of little interest to science or management.

  17. A Robust Adaptive Autonomous Approach to Optimal Experimental Design

    NASA Astrophysics Data System (ADS)

    Gu, Hairong

    Experimentation is the fundamental tool of scientific inquiry for understanding the laws governing nature and human behavior. Many complex real-world experimental scenarios, particularly those in quest of prediction accuracy, are difficult to conduct using an existing experimental procedure, for the following two reasons. First, the existing experimental procedures require a parametric model to serve as the proxy of the latent data structure or data-generating mechanism at the beginning of an experiment. However, for those experimental scenarios of concern, a sound model is often unavailable before an experiment. Second, those experimental scenarios usually contain a large number of design variables, which potentially leads to a lengthy and costly data collection cycle. The existing experimental procedures, however, are unable to optimize large-scale experiments so as to minimize the experimental length and cost. Facing the two challenges in those experimental scenarios, the aim of the present study is to develop a new experimental procedure that allows an experiment to be conducted without the assumption of a parametric model while still achieving satisfactory prediction, and performs optimization of experimental designs to improve the efficiency of an experiment. The new experimental procedure developed in the present study is named robust adaptive autonomous system (RAAS). RAAS is a procedure for sequential experiments composed of multiple experimental trials, which performs function estimation, variable selection, reverse prediction and design optimization on each trial. Directly addressing the challenges in those experimental scenarios of concern, function estimation and variable selection are performed by data-driven modeling methods to generate a predictive model from data collected during the course of an experiment, thus removing the requirement of a parametric model at the beginning of an experiment; design optimization is performed to select experimental designs on the fly during an experiment based on their usefulness, so that the fewest designs are needed to reach useful inferential conclusions. Technically, function estimation is realized by Bayesian P-splines, variable selection by a Bayesian spike-and-slab prior, reverse prediction by grid search, and design optimization by the concepts of active learning. The present study demonstrated that RAAS achieves statistical robustness by making accurate predictions without the assumption of a parametric model serving as the proxy of latent data structure, while the existing procedures can draw poor statistical inferences if a misspecified model is assumed; RAAS also achieves inferential efficiency by taking fewer designs to acquire useful statistical inferences than non-optimal procedures. Thus, RAAS is expected to be a principled solution to real-world experimental scenarios pursuing robust prediction and efficient experimentation.

  18. Inferring Demographic History Using Two-Locus Statistics.

    PubMed

    Ragsdale, Aaron P; Gutenkunst, Ryan N

    2017-06-01

    Population demographic history may be learned from contemporary genetic variation data. Methods based on aggregating the statistics of many single loci into an allele frequency spectrum (AFS) have proven powerful, but such methods ignore potentially informative patterns of linkage disequilibrium (LD) between neighboring loci. To leverage such patterns, we developed a composite-likelihood framework for inferring demographic history from aggregated statistics of pairs of loci. Using this framework, we show that two-locus statistics are more sensitive to demographic history than single-locus statistics such as the AFS. In particular, two-locus statistics escape the notorious confounding of depth and duration of a bottleneck, and they provide a means to estimate effective population size based on the recombination rather than mutation rate. We applied our approach to a Zambian population of Drosophila melanogaster. Notably, using both single- and two-locus statistics, we inferred a substantially lower ancestral effective population size than previous works and did not infer a bottleneck history. Together, our results demonstrate the broad potential for two-locus statistics to enable powerful population genetic inference. Copyright © 2017 by the Genetics Society of America.

  19. Teaching Statistical Inference for Causal Effects in Experiments and Observational Studies

    ERIC Educational Resources Information Center

    Rubin, Donald B.

    2004-01-01

    Inference for causal effects is a critical activity in many branches of science and public policy. The field of statistics is the one field most suited to address such problems, whether from designed experiments or observational studies. Consequently, it is arguably essential that departments of statistics teach courses in causal inference to both…

  20. M-071 critical data analysis

    NASA Technical Reports Server (NTRS)

    Hegsted, D. M.

    1975-01-01

    A prototype balance study was conducted on earth prior to the balance studies conducted in Skylab itself. Collected were daily dietary intake data of 6 minerals and nitrogen, and fecal and urinary outputs on each of three astronauts. Essential statistical issues show what quantities need to be estimated and establish the scope of inference associated with alternative variance estimates. The procedures for obtaining the final variability due both to errors of measurement and total error (total = measurement and biological variability) are exhibited.

  1. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Priel, Nadav; Landsman, Hagar; Manfredini, Alessandro

    We propose a safeguard procedure for statistical inference that provides universal protection against mismodeling of the background. The method quantifies and incorporates the signal-like residuals of the background model into the likelihood function, using information available in a calibration dataset. This prevents possible false discovery claims that may arise through unknown mismodeling, and corrects the bias in limit setting created by overestimated or underestimated background. We demonstrate how the method removes the bias created by an incomplete background model using three realistic case studies.

  2. Fast maximum likelihood estimation using continuous-time neural point process models.

    PubMed

    Lepage, Kyle Q; MacDonald, Christopher J

    2015-06-01

    A recent report estimates that the number of simultaneously recorded neurons is growing exponentially. A commonly employed statistical paradigm using discrete-time point process models of neural activity involves the computation of a maximum-likelihood estimate. The time to compute this estimate, per neuron, is proportional to the number of bins in a finely spaced discretization of time. By using continuous-time models of neural activity and the optimally efficient Gaussian quadrature, memory requirements and computation times are dramatically decreased in the commonly encountered situation where the number of parameters p is much less than the number of time-bins n. In this regime, with q equal to the quadrature order, memory requirements are decreased from O(np) to O(qp), and the number of floating-point operations is decreased from O(np^2) to O(qp^2). Accuracy of the proposed estimates is assessed based upon physiological consideration, error bounds, and mathematical results describing the relation between numerical integration error and numerical error affecting both parameter estimates and the observed Fisher information. A check is provided which is used to adapt the order of numerical integration. The procedure is verified in simulation and for hippocampal recordings. It is found that in 95% of hippocampal recordings a q of 60 yields numerical error negligible with respect to parameter estimate standard error. Statistical inference using the proposed methodology is a fast and convenient alternative to statistical inference performed using a discrete-time point process model of neural activity. It enables the employment of the statistical methodology available with discrete-time inference, but is faster, uses less memory, and avoids any error due to discretization.
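    The idea of replacing a fine time discretization with a q-point Gaussian quadrature can be illustrated with a minimal sketch. It is not the authors' implementation: the log-linear intensity, the parameter values, the spike times and all function names below are assumptions chosen for illustration. The continuous-time point process log-likelihood is the sum of log-intensities at the event times minus the integral of the intensity over the observation window, and only that integral needs numerical approximation.

```python
import math

def gauss_legendre(q):
    """Gauss-Legendre nodes and weights on [-1, 1], found by Newton iteration."""
    nodes, weights = [], []
    for i in range(1, q + 1):
        x = math.cos(math.pi * (i - 0.25) / (q + 0.5))  # standard initial guess
        for _ in range(100):
            p0, p1 = 1.0, x  # Legendre recurrence: P_0 and P_1
            for k in range(2, q + 1):
                p0, p1 = p1, ((2 * k - 1) * x * p1 - (k - 1) * p0) / k
            dp = q * (x * p1 - p0) / (x * x - 1.0)  # derivative of P_q at x
            dx = p1 / dp
            x -= dx
            if abs(dx) < 1e-15:
                break
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights

def integrate(f, a, b, q):
    """Approximate the integral of f over [a, b] with q quadrature nodes."""
    nodes, weights = gauss_legendre(q)
    half, mid = 0.5 * (b - a), 0.5 * (b + a)
    return half * sum(w * f(mid + half * x) for x, w in zip(nodes, weights))

def log_likelihood(spikes, intensity, T, q):
    """Continuous-time point process log-likelihood on [0, T]."""
    return sum(math.log(intensity(t)) for t in spikes) - integrate(intensity, 0.0, T, q)

# Hypothetical log-linear intensity lambda(t) = exp(B0 + B1 * t).
B0, B1, T = 1.0, 0.5, 2.0
lam = lambda t: math.exp(B0 + B1 * t)
exact = math.exp(B0) * (math.exp(B1 * T) - 1.0) / B1  # closed-form integral
spikes = [0.3, 0.9, 1.4, 1.8]  # made-up event times
ll = log_likelihood(spikes, lam, T, q=10)
```

    In this sketch the cost of the integral grows with the quadrature order q rather than with the number of time bins, which is the trade-off the abstract describes.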

  3. Reasoning about Informal Statistical Inference: One Statistician's View

    ERIC Educational Resources Information Center

    Rossman, Allan J.

    2008-01-01

    This paper identifies key concepts and issues associated with the reasoning of informal statistical inference. I focus on key ideas of inference that I think all students should learn, including at secondary level as well as tertiary. I argue that a fundamental component of inference is to go beyond the data at hand, and I propose that statistical…

  4. Assessment of statistical education in Indonesia: Preliminary results and initiation to simulation-based inference

    NASA Astrophysics Data System (ADS)

    Saputra, K. V. I.; Cahyadi, L.; Sembiring, U. A.

    2018-01-01

    In this paper, we assess our traditional elementary statistics education and introduce elementary statistics with simulation-based inference. To assess our statistics class, we adapt the well-known CAOS (Comprehensive Assessment of Outcomes in Statistics) test, which serves as an external measure of students' basic statistical literacy. This test is generally accepted as a measure of statistical literacy. We also introduce a new teaching method for the elementary statistics class. In contrast to the traditional elementary statistics course, we introduce a simulation-based inference method for conducting hypothesis tests. The literature has shown that this new teaching method works very well in increasing students' understanding of statistics.

  5. Apes are intuitive statisticians.

    PubMed

    Rakoczy, Hannes; Clüver, Annette; Saucke, Liane; Stoffregen, Nicole; Gräbener, Alice; Migura, Judith; Call, Josep

    2014-04-01

    Inductive learning and reasoning, as we use it both in everyday life and in science, is characterized by flexible inferences based on statistical information: inferences from populations to samples and vice versa. Many forms of such statistical reasoning have been found to develop late in human ontogeny, depending on formal education and language, and to be fragile even in adults. New revolutionary research, however, suggests that even preverbal human infants make use of intuitive statistics. Here, we conducted the first investigation of such intuitive statistical reasoning with non-human primates. In a series of 7 experiments, Bonobos, Chimpanzees, Gorillas and Orangutans drew flexible statistical inferences from populations to samples. These inferences, furthermore, were truly based on statistical information regarding the relative frequency distributions in a population, and not on absolute frequencies. Intuitive statistics in its most basic form is thus an evolutionarily more ancient rather than a uniquely human capacity. Copyright © 2014 Elsevier B.V. All rights reserved.

  6. Cluster mass inference via random field theory.

    PubMed

    Zhang, Hui; Nichols, Thomas E; Johnson, Timothy D

    2009-01-01

    Cluster extent and voxel intensity are two widely used statistics in neuroimaging inference. Cluster extent is sensitive to spatially extended signals while voxel intensity is better for intense but focal signals. In order to leverage strength from both statistics, several nonparametric permutation methods have been proposed to combine the two methods. Simulation studies have shown that of the different cluster permutation methods, the cluster mass statistic is generally the best. However, to date, there is no parametric cluster mass inference available. In this paper, we propose a cluster mass inference method based on random field theory (RFT). We develop this method for Gaussian images, evaluate it on Gaussian and Gaussianized t-statistic images and investigate its statistical properties via simulation studies and real data. Simulation results show that the method is valid under the null hypothesis and demonstrate that it can be more powerful than the cluster extent inference method. Further, analyses with a single subject and a group fMRI dataset demonstrate better power than traditional cluster size inference, and good accuracy relative to a gold-standard permutation test.
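    The three statistics contrasted above can be made concrete with a small sketch. It is a one-dimensional simplification with made-up intensities and a made-up threshold, and it assumes cluster mass is defined as the sum of intensity in excess of the cluster-forming threshold; it does not implement the random field theory inference itself.

```python
def clusters_1d(values, threshold):
    """Return (extent, peak, mass) for each contiguous run above threshold."""
    found, run = [], []
    for v in list(values) + [float("-inf")]:  # sentinel closes a trailing run
        if v > threshold:
            run.append(v)
        elif run:
            extent = len(run)                        # number of voxels in the cluster
            peak = max(run)                          # largest voxel intensity
            mass = sum(x - threshold for x in run)   # excess intensity summed over the cluster
            found.append((extent, peak, mass))
            run = []
    return found

# Hypothetical statistic image: one broad, weak cluster and one focal, intense voxel.
signal = [0.1, 2.6, 2.8, 2.7, 2.5, 0.2, 0.0, 5.0, 0.3]
result = clusters_1d(signal, threshold=2.3)
```

    Here the broad cluster wins on extent and the single intense voxel wins on peak height, while mass combines both aspects in one number.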

  7. Inverse probability weighting for covariate adjustment in randomized studies

    PubMed Central

    Li, Xiaochun; Li, Lingling

    2013-01-01

    Covariate adjustment in randomized clinical trials has the potential benefit of precision gain. It also has the potential pitfall of reduced objectivity as it opens the possibility of selecting a “favorable” model that yields strong treatment benefit estimate. Although there is a large volume of statistical literature targeting on the first aspect, realistic solutions to enforce objective inference and improve precision are rare. As a typical randomized trial needs to accommodate many implementation issues beyond statistical considerations, maintaining the objectivity is at least as important as precision gain if not more, particularly from the perspective of the regulatory agencies. In this article, we propose a two-stage estimation procedure based on inverse probability weighting to achieve better precision without compromising objectivity. The procedure is designed in a way such that the covariate adjustment is performed before seeing the outcome, effectively reducing the possibility of selecting a “favorable” model that yields a strong intervention effect. Both theoretical and numerical properties of the estimation procedure are presented. Application of the proposed method to a real data example is presented. PMID:24038458

  8. Inverse probability weighting for covariate adjustment in randomized studies.

    PubMed

    Shen, Changyu; Li, Xiaochun; Li, Lingling

    2014-02-20

    Covariate adjustment in randomized clinical trials has the potential benefit of precision gain. It also has the potential pitfall of reduced objectivity as it opens the possibility of selecting a 'favorable' model that yields strong treatment benefit estimate. Although there is a large volume of statistical literature targeting on the first aspect, realistic solutions to enforce objective inference and improve precision are rare. As a typical randomized trial needs to accommodate many implementation issues beyond statistical considerations, maintaining the objectivity is at least as important as precision gain if not more, particularly from the perspective of the regulatory agencies. In this article, we propose a two-stage estimation procedure based on inverse probability weighting to achieve better precision without compromising objectivity. The procedure is designed in a way such that the covariate adjustment is performed before seeing the outcome, effectively reducing the possibility of selecting a 'favorable' model that yields a strong intervention effect. Both theoretical and numerical properties of the estimation procedure are presented. Application of the proposed method to a real data example is presented. Copyright © 2013 John Wiley & Sons, Ltd.

  9. Classification image analysis: estimation and statistical inference for two-alternative forced-choice experiments

    NASA Technical Reports Server (NTRS)

    Abbey, Craig K.; Eckstein, Miguel P.

    2002-01-01

    We consider estimation and statistical hypothesis testing on classification images obtained from the two-alternative forced-choice experimental paradigm. We begin with a probabilistic model of task performance for simple forced-choice detection and discrimination tasks. Particular attention is paid to general linear filter models because these models lead to a direct interpretation of the classification image as an estimate of the filter weights. We then describe an estimation procedure for obtaining classification images from observer data. A number of statistical tests are presented for testing various hypotheses from classification images based on some more compact set of features derived from them. As an example of how the methods we describe can be used, we present a case study investigating detection of a Gaussian bump profile.

  10. An evaluation of three statistical estimation methods for assessing health policy effects on prescription drug claims.

    PubMed

    Mittal, Manish; Harrison, Donald L; Thompson, David M; Miller, Michael J; Farmer, Kevin C; Ng, Yu-Tze

    2016-01-01

    While the choice of analytical approach affects study results and their interpretation, there is no consensus to guide the choice of statistical approaches to evaluate public health policy change. This study compared and contrasted three statistical estimation procedures in the assessment of a U.S. Food and Drug Administration (FDA) suicidality warning, communicated in January 2008 and implemented in May 2009, on antiepileptic drug (AED) prescription claims. Longitudinal designs were utilized to evaluate Oklahoma (U.S. State) Medicaid claim data from January 2006 through December 2009. The study included 9289 continuously eligible individuals with prevalent diagnoses of epilepsy and/or psychiatric disorder. Segmented regression models using three estimation procedures [i.e., generalized linear models (GLM), generalized estimation equations (GEE), and generalized linear mixed models (GLMM)] were used to estimate trends of AED prescription claims across three time periods: before (January 2006-January 2008); during (February 2008-May 2009); and after (June 2009-December 2009) the FDA warning. All three statistical procedures estimated an increasing trend (P < 0.0001) in AED prescription claims before the FDA warning period. No procedures detected a significant change in trend during (GLM: -30.0%, 99% CI: -60.0% to 10.0%; GEE: -20.0%, 99% CI: -70.0% to 30.0%; GLMM: -23.5%, 99% CI: -58.8% to 1.2%) and after (GLM: 50.0%, 99% CI: -70.0% to 160.0%; GEE: 80.0%, 99% CI: -20.0% to 200.0%; GLMM: 47.1%, 99% CI: -41.2% to 135.3%) the FDA warning when compared to pre-warning period. Although the three procedures provided consistent inferences, the GEE and GLMM approaches accounted appropriately for correlation. Further, marginal models estimated using GEE produced more robust and valid population-level estimations. Copyright © 2016 Elsevier Inc. All rights reserved.

  11. Efficient inference for genetic association studies with multiple outcomes.

    PubMed

    Ruffieux, Helene; Davison, Anthony C; Hager, Jorg; Irincheeva, Irina

    2017-10-01

    Combined inference for heterogeneous high-dimensional data is critical in modern biology, where clinical and various kinds of molecular data may be available from a single study. Classical genetic association studies regress a single clinical outcome on many genetic variants one by one, but there is an increasing demand for joint analysis of many molecular outcomes and genetic variants in order to unravel functional interactions. Unfortunately, most existing approaches to joint modeling are either too simplistic to be powerful or are impracticable for computational reasons. Inspired by Richardson and others (2010, Bayesian Statistics 9), we consider a sparse multivariate regression model that allows simultaneous selection of predictors and associated responses. As Markov chain Monte Carlo (MCMC) inference on such models can be prohibitively slow when the number of genetic variants exceeds a few thousand, we propose a variational inference approach which produces posterior information very close to that of MCMC inference, at a much reduced computational cost. Extensive numerical experiments show that our approach outperforms popular variable selection methods and tailored Bayesian procedures, dealing within hours with problems involving hundreds of thousands of genetic variants and tens to hundreds of clinical or molecular outcomes. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  12. Integrating Genetic and Functional Genomic Data to Elucidate Common Disease Tra

    NASA Astrophysics Data System (ADS)

    Schadt, Eric

    2005-03-01

    The reconstruction of genetic networks in mammalian systems is one of the primary goals in biological research, especially as such reconstructions relate to elucidating not only common, polygenic human diseases, but living systems more generally. Here I present a statistical procedure for inferring causal relationships between gene expression traits and more classic clinical traits, including complex disease traits. This procedure has been generalized to the gene network reconstruction problem, where naturally occurring genetic variations in segregating mouse populations are used as a source of perturbations to elucidate tissue-specific gene networks. Differences in the extent of genetic control between genders and among four different tissues are highlighted. I also demonstrate that the networks derived from expression data in segregating mouse populations using the novel network reconstruction algorithm are able to capture causal associations between genes that result in increased predictive power, compared to more classically reconstructed networks derived from the same data. This approach to causal inference in large segregating mouse populations over multiple tissues not only elucidates fundamental aspects of transcriptional control, it also allows for the objective identification of key drivers of common human diseases.

  13. Gaussian process based modeling and experimental design for sensor calibration in drifting environments

    PubMed Central

    Geng, Zongyu; Yang, Feng; Chen, Xi; Wu, Nianqiang

    2016-01-01

    It remains a challenge to accurately calibrate a sensor subject to environmental drift. The calibration task for such a sensor is to quantify the relationship between the sensor’s response and its exposure condition, which is specified by not only the analyte concentration but also the environmental factors such as temperature and humidity. This work developed a Gaussian Process (GP)-based procedure for the efficient calibration of sensors in drifting environments. Adopted as the calibration model, GP is not only able to capture the possibly nonlinear relationship between the sensor responses and the various exposure-condition factors, but also able to provide valid statistical inference for uncertainty quantification of the target estimates (e.g., the estimated analyte concentration of an unknown environment). Built on GP’s inference ability, an experimental design method was developed to achieve efficient sampling of calibration data in a batch sequential manner. The resulting calibration procedure, which integrates the GP-based modeling and experimental design, was applied on a simulated chemiresistor sensor to demonstrate its effectiveness and its efficiency over the traditional method. PMID:26924894

  14. Using Guided Reinvention to Develop Teachers' Understanding of Hypothesis Testing Concepts

    ERIC Educational Resources Information Center

    Dolor, Jason; Noll, Jennifer

    2015-01-01

    Statistics education reform efforts emphasize the importance of informal inference in the learning of statistics. Research suggests statistics teachers experience similar difficulties understanding statistical inference concepts as students and how teacher knowledge can impact student learning. This study investigates how teachers reinvented an…

  15. Inferring thermodynamic stability relationship of polymorphs from melting data.

    PubMed

    Yu, L

    1995-08-01

    This study investigates the possibility of inferring the thermodynamic stability relationship of polymorphs from their melting data. Thermodynamic formulas are derived for calculating the Gibbs free energy difference (delta G) between two polymorphs and its temperature slope from mainly the temperatures and heats of melting. This information is then used to estimate delta G, thus relative stability, at other temperatures by extrapolation. Both linear and nonlinear extrapolations are considered. Extrapolating delta G to zero gives an estimation of the transition (or virtual transition) temperature, from which the presence of monotropy or enantiotropy is inferred. This procedure is analogous to the use of solubility data measured near the ambient temperature to estimate a transition point at higher temperature. For several systems examined, the two methods are in good agreement. The qualitative rule introduced this way for inferring the presence of monotropy or enantiotropy is approximately the same as The Heat of Fusion Rule introduced previously on a statistical mechanical basis. This method is applied to 96 pairs of polymorphs from the literature. In most cases, the result agrees with the previous determination. The deviation of the calculated transition temperatures from their previous values (n = 18) is 2% on average and 7% at maximum.

  16. TARGETED SEQUENTIAL DESIGN FOR TARGETED LEARNING INFERENCE OF THE OPTIMAL TREATMENT RULE AND ITS MEAN REWARD.

    PubMed

    Chambaz, Antoine; Zheng, Wenjing; van der Laan, Mark J

    2017-01-01

    This article studies the targeted sequential inference of an optimal treatment rule (TR) and its mean reward in the non-exceptional case, i.e., assuming that there is no stratum of the baseline covariates where treatment is neither beneficial nor harmful, and under a companion margin assumption. Our pivotal estimator, whose definition hinges on the targeted minimum loss estimation (TMLE) principle, actually infers the mean reward under the current estimate of the optimal TR. This data-adaptive statistical parameter is worthy of interest on its own. Our main result is a central limit theorem which enables the construction of confidence intervals on both mean rewards under the current estimate of the optimal TR and under the optimal TR itself. The asymptotic variance of the estimator takes the form of the variance of an efficient influence curve at a limiting distribution, allowing to discuss the efficiency of inference. As a by-product, we also derive confidence intervals on two cumulated pseudo-regrets, a key notion in the study of bandit problems. A simulation study illustrates the procedure. One of the corner-stones of the theoretical study is a new maximal inequality for martingales with respect to the uniform entropy integral.

  17. Assessment of a stochastic downscaling methodology in generating an ensemble of hourly future climate time series

    NASA Astrophysics Data System (ADS)

    Fatichi, S.; Ivanov, V. Y.; Caporali, E.

    2013-04-01

    This study extends a stochastic downscaling methodology to generation of an ensemble of hourly time series of meteorological variables that express possible future climate conditions at a point-scale. The stochastic downscaling uses general circulation model (GCM) realizations and an hourly weather generator, the Advanced WEather GENerator (AWE-GEN). Marginal distributions of factors of change are computed for several climate statistics using a Bayesian methodology that can weight GCM realizations based on the model relative performance with respect to a historical climate and a degree of disagreement in projecting future conditions. A Monte Carlo technique is used to sample the factors of change from their respective marginal distributions. As a comparison with traditional approaches, factors of change are also estimated by averaging GCM realizations. With either approach, the derived factors of change are applied to the climate statistics inferred from historical observations to re-evaluate parameters of the weather generator. The re-parameterized generator yields hourly time series of meteorological variables that can be considered to be representative of future climate conditions. In this study, the time series are generated in an ensemble mode to fully reflect the uncertainty of GCM projections, climate stochasticity, as well as uncertainties of the downscaling procedure. Applications of the methodology in reproducing future climate conditions for the periods of 2000-2009, 2046-2065 and 2081-2100, using the period of 1962-1992 as the historical baseline are discussed for the location of Firenze (Italy). The inferences of the methodology for the period of 2000-2009 are tested against observations to assess reliability of the stochastic downscaling procedure in reproducing statistics of meteorological variables at different time scales.

  18. Evaluating the Use of Random Distribution Theory to Introduce Statistical Inference Concepts to Business Students

    ERIC Educational Resources Information Center

    Larwin, Karen H.; Larwin, David A.

    2011-01-01

    Bootstrapping methods and random distribution methods are increasingly recommended as better approaches for teaching students about statistical inference in introductory-level statistics courses. The authors examined the effect of teaching undergraduate business statistics students using random distribution and bootstrapping simulations. It is the…

  19. Temporal variation and scale in movement-based resource selection functions

    USGS Publications Warehouse

    Hooten, M.B.; Hanks, E.M.; Johnson, D.S.; Alldredge, M.W.

    2013-01-01

    A common population characteristic of interest in animal ecology studies pertains to the selection of resources. That is, given the resources available to animals, what do they ultimately choose to use? A variety of statistical approaches have been employed to examine this question and each has advantages and disadvantages with respect to the form of available data and the properties of estimators given model assumptions. A wealth of high resolution telemetry data are now being collected to study animal population movement and space use and these data present both challenges and opportunities for statistical inference. We summarize traditional methods for resource selection and then describe several extensions to deal with measurement uncertainty and an explicit movement process that exists in studies involving high-resolution telemetry data. Our approach uses a correlated random walk movement model to obtain temporally varying use and availability distributions that are employed in a weighted distribution context to estimate selection coefficients. The temporally varying coefficients are then weighted by their contribution to selection and combined to provide inference at the population level. The result is an intuitive and accessible statistical procedure that uses readily available software and is computationally feasible for large datasets. These methods are demonstrated using data collected as part of a large-scale mountain lion monitoring study in Colorado, USA.

  20. Statistical inference for tumor growth inhibition T/C ratio.

    PubMed

    Wu, Jianrong

    2010-09-01

    The tumor growth inhibition T/C ratio is commonly used to quantify treatment effects in drug screening tumor xenograft experiments. The T/C ratio is converted to an antitumor activity rating using an arbitrary cutoff point and often without any formal statistical inference. Here, we applied a nonparametric bootstrap method and a small sample likelihood ratio statistic to make a statistical inference of the T/C ratio, including both hypothesis testing and a confidence interval estimate. Furthermore, sample size and power are also discussed for statistical design of tumor xenograft experiments. Tumor xenograft data from an actual experiment were analyzed to illustrate the application.
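    A nonparametric bootstrap for the T/C ratio can be sketched as follows. This is only an illustration under stated assumptions: the tumor volumes are invented, the T/C ratio is taken to be the ratio of group means, and a simple percentile interval is used. The small-sample likelihood ratio statistic mentioned in the abstract is not covered.

```python
import random

def tc_ratio(treated, control):
    """T/C ratio, assumed here to be mean treated volume over mean control volume."""
    return (sum(treated) / len(treated)) / (sum(control) / len(control))

def bootstrap_ci(treated, control, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the T/C ratio."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        # Resample each group independently, with replacement.
        t_star = rng.choices(treated, k=len(treated))
        c_star = rng.choices(control, k=len(control))
        ratios.append(tc_ratio(t_star, c_star))
    ratios.sort()
    lower = ratios[int((alpha / 2) * n_boot)]
    upper = ratios[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical relative tumor volumes for six mice per group.
treated = [0.42, 0.55, 0.38, 0.61, 0.47, 0.50]
control = [1.10, 0.95, 1.30, 1.05, 1.20, 0.98]
tc = tc_ratio(treated, control)
lo, hi = bootstrap_ci(treated, control)
```

    Comparing the upper limit of the interval with a chosen activity cutoff, rather than the point estimate alone, is one way to attach formal uncertainty to an antitumor activity rating.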

  1. Quantifying temporal change in biodiversity: challenges and opportunities

    PubMed Central

    Dornelas, Maria; Magurran, Anne E.; Buckland, Stephen T.; Chao, Anne; Chazdon, Robin L.; Colwell, Robert K.; Curtis, Tom; Gaston, Kevin J.; Gotelli, Nicholas J.; Kosnik, Matthew A.; McGill, Brian; McCune, Jenny L.; Morlon, Hélène; Mumby, Peter J.; Øvreås, Lise; Studeny, Angelika; Vellend, Mark

    2013-01-01

    Growing concern about biodiversity loss underscores the need to quantify and understand temporal change. Here, we review the opportunities presented by biodiversity time series, and address three related issues: (i) recognizing the characteristics of temporal data; (ii) selecting appropriate statistical procedures for analysing temporal data; and (iii) inferring and forecasting biodiversity change. With regard to the first issue, we draw attention to defining characteristics of biodiversity time series—lack of physical boundaries, uni-dimensionality, autocorrelation and directionality—that inform the choice of analytic methods. Second, we explore methods of quantifying change in biodiversity at different timescales, noting that autocorrelation can be viewed as a feature that sheds light on the underlying structure of temporal change. Finally, we address the transition from inferring to forecasting biodiversity change, highlighting potential pitfalls associated with phase-shifts and novel conditions. PMID:23097514

  2. Comparing nonparametric Bayesian tree priors for clonal reconstruction of tumors.

    PubMed

    Deshwar, Amit G; Vembu, Shankar; Morris, Quaid

    2015-01-01

    Statistical machine learning methods, especially nonparametric Bayesian methods, have become increasingly popular to infer clonal population structure of tumors. Here we describe the treeCRP, an extension of the Chinese restaurant process (CRP), a popular construction used in nonparametric mixture models, to infer the phylogeny and genotype of major subclonal lineages represented in the population of cancer cells. We also propose new split-merge updates tailored to the subclonal reconstruction problem that improve the mixing time of Markov chains. In comparisons with the tree-structured stick breaking prior used in PhyloSub, we demonstrate superior mixing and running time using the treeCRP with our new split-merge procedures. We also show that given the same number of samples, TSSB and treeCRP have similar ability to recover the subclonal structure of a tumor…

  3. Investigating Mathematics Teachers' Thoughts of Statistical Inference

    ERIC Educational Resources Information Center

    Yang, Kai-Lin

    2012-01-01

    Research on statistical cognition and application suggests that statistical inference concepts are commonly misunderstood by students and even misinterpreted by researchers. Although some research has been done on students' misunderstanding or misconceptions of confidence intervals (CIs), few studies explore either students' or mathematics…

  4. Lessons from Inferentialism for Statistics Education

    ERIC Educational Resources Information Center

    Bakker, Arthur; Derry, Jan

    2011-01-01

    This theoretical paper relates recent interest in informal statistical inference (ISI) to the semantic theory termed inferentialism, a significant development in contemporary philosophy, which places inference at the heart of human knowing. This theory assists epistemological reflection on challenges in statistics education encountered when…

  5. Statistical Inference and Patterns of Inequality in the Global North

    ERIC Educational Resources Information Center

    Moran, Timothy Patrick

    2006-01-01

    Cross-national inequality trends have historically been a crucial field of inquiry across the social sciences, and new methodological techniques of statistical inference have recently improved the ability to analyze these trends over time. This paper applies Monte Carlo, bootstrap inference methods to the income surveys of the Luxembourg Income…

  6. Statistical inference and Aristotle's Rhetoric.

    PubMed

    Macdonald, Ranald R

    2004-11-01

    Formal logic operates in a closed system where all the information relevant to any conclusion is present, whereas this is not the case when one reasons about events and states of the world. Pollard and Richardson drew attention to the fact that the reasoning behind statistical tests does not lead to logically justifiable conclusions. In this paper statistical inferences are defended not by logic but by the standards of everyday reasoning. Aristotle invented formal logic, but argued that people mostly get at the truth with the aid of enthymemes--incomplete syllogisms which include arguing from examples, analogies and signs. It is proposed that statistical tests work in the same way--in that they are based on examples, invoke the analogy of a model and use the size of the effect under test as a sign that the chance hypothesis is unlikely. Of existing theories of statistical inference only a weak version of Fisher's takes this into account. Aristotle anticipated Fisher by producing an argument of the form that there were too many cases in which an outcome went in a particular direction for that direction to be plausibly attributed to chance. We can therefore conclude that Aristotle would have approved of statistical inference and there is a good reason for calling this form of statistical inference classical.

  7. Gene network inference by fusing data from diverse distributions

    PubMed Central

    Žitnik, Marinka; Zupan, Blaž

    2015-01-01

    Motivation: Markov networks are undirected graphical models that are widely used to infer relations between genes from experimental data. Their state-of-the-art inference procedures assume the data arise from a Gaussian distribution. High-throughput omics data, such as that from next generation sequencing, often violates this assumption. Furthermore, when collected data arise from multiple related but otherwise nonidentical distributions, their underlying networks are likely to have common features. New principled statistical approaches are needed that can deal with different data distributions and jointly consider collections of datasets. Results: We present FuseNet, a Markov network formulation that infers networks from a collection of nonidentically distributed datasets. Our approach is computationally efficient and general: given any number of distributions from an exponential family, FuseNet represents model parameters through shared latent factors that define neighborhoods of network nodes. In a simulation study, we demonstrate good predictive performance of FuseNet in comparison to several popular graphical models. We show its effectiveness in an application to breast cancer RNA-sequencing and somatic mutation data, a novel application of graphical models. Fusion of datasets offers substantial gains relative to inference of separate networks for each dataset. Our results demonstrate that network inference methods for non-Gaussian data can help in accurate modeling of the data generated by emergent high-throughput technologies. Availability and implementation: Source code is at https://github.com/marinkaz/fusenet. Contact: blaz.zupan@fri.uni-lj.si Supplementary information: Supplementary information is available at Bioinformatics online. PMID:26072487

  8. CADDIS Volume 4. Data Analysis: Biological and Environmental Data Requirements

    EPA Pesticide Factsheets

    Overview of PECBO Module, using scripts to infer environmental conditions from biological observations, statistically estimating species-environment relationships, methods for inferring environmental conditions, statistical scripts in module.

  9. Statistical methods for the beta-binomial model in teratology.

    PubMed Central

    Yamamoto, E; Yanagimoto, T

    1994-01-01

    The beta-binomial model is widely used for analyzing teratological data involving littermates. Recent developments in statistical analyses of teratological data are briefly reviewed with emphasis on the model. For statistical inference of the parameters in the beta-binomial distribution, separation of the likelihood introduces a likelihood inference. This leads to reducing biases of estimators and also to improving accuracy of empirical significance levels of tests. Separate inference of the parameters can be conducted in a unified way. PMID:8187716

  10. On Some Assumptions of the Null Hypothesis Statistical Testing

    ERIC Educational Resources Information Center

    Patriota, Alexandre Galvão

    2017-01-01

    Bayesian and classical statistical approaches are based on different types of logical principles. In order to avoid mistaken inferences and misguided interpretations, the practitioner must respect the inference rules embedded into each statistical method. Ignoring these principles leads to the paradoxical conclusions that the hypothesis…

  11. Improving mass-univariate analysis of neuroimaging data by modelling important unknown covariates: Application to Epigenome-Wide Association Studies.

    PubMed

    Guillaume, Bryan; Wang, Changqing; Poh, Joann; Shen, Mo Jun; Ong, Mei Lyn; Tan, Pei Fang; Karnani, Neerja; Meaney, Michael; Qiu, Anqi

    2018-06-01

    Statistical inference on neuroimaging data is often conducted using a mass-univariate model, equivalent to fitting a linear model at every voxel with a known set of covariates. Due to the large number of linear models, it is challenging to check if the selection of covariates is appropriate and to modify this selection adequately. The use of standard diagnostics, such as residual plotting, is clearly not practical for neuroimaging data. However, the selection of covariates is crucial for linear regression to ensure valid statistical inference. In particular, the mean model of regression needs to be reasonably well specified. Unfortunately, this issue is often overlooked in the field of neuroimaging. This study aims to adopt the existing Confounder Adjusted Testing and Estimation (CATE) approach and to extend it for use with neuroimaging data. We propose a modification of CATE that can yield valid statistical inferences using Principal Component Analysis (PCA) estimators instead of Maximum Likelihood (ML) estimators. We then propose a non-parametric hypothesis testing procedure that can improve upon parametric testing. Monte Carlo simulations show that the modification of CATE allows for more accurate modelling of neuroimaging data and can in turn yield a better control of False Positive Rate (FPR) and Family-Wise Error Rate (FWER). We demonstrate its application to an Epigenome-Wide Association Study (EWAS) on neonatal brain imaging and umbilical cord DNA methylation data obtained as part of a longitudinal cohort study. Software for this CATE study is freely available at http://www.bioeng.nus.edu.sg/cfa/Imaging_Genetics2.html. Copyright © 2018 The Author(s). Published by Elsevier Inc. All rights reserved.

  12. Efficient statistical tests to compare Youden index: accounting for contingency correlation.

    PubMed

    Chen, Fangyao; Xue, Yuqiang; Tan, Ming T; Chen, Pingyan

    2015-04-30

    Youden index is widely utilized in studies evaluating accuracy of diagnostic tests and performance of predictive, prognostic, or risk models. However, both one and two independent sample tests on Youden index have been derived ignoring the dependence (association) between sensitivity and specificity, resulting in potentially misleading findings. Besides, paired sample test on Youden index is currently unavailable. This article develops efficient statistical inference procedures for one sample, independent, and paired sample tests on Youden index by accounting for contingency correlation, namely associations between sensitivity and specificity and paired samples typically represented in contingency tables. For one and two independent sample tests, the variances are estimated by Delta method, and the statistical inference is based on the central limit theory, which are then verified by bootstrap estimates. For paired samples test, we show that the estimated covariance of the two sensitivities and specificities can be represented as a function of kappa statistic so the test can be readily carried out. We then show the remarkable accuracy of the estimated variance using a constrained optimization approach. Simulation is performed to evaluate the statistical properties of the derived tests. The proposed approaches yield more stable type I errors at the nominal level and substantially higher power (efficiency) than does the original Youden's approach. Therefore, the simple explicit large sample solution performs very well. Because we can readily implement the asymptotic and exact bootstrap computation with common software like R, the method is broadly applicable to the evaluation of diagnostic tests and model performance. Copyright © 2015 John Wiley & Sons, Ltd.
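
    As a point of reference for the abstract above, the following is a minimal sketch of the classical large-sample calculation: the Youden index J = sensitivity + specificity - 1, with a delta-method standard error that treats sensitivity and specificity as independent binomial proportions. It is not the adjusted procedure the authors propose; the function name and the example counts are hypothetical.

```python
import math

def youden_index(tp, fn, tn, fp):
    """Youden index and a naive large-sample standard error.

    Assumption: sensitivity and specificity are treated as independent
    binomial proportions (the classical approach, not the paper's
    correlation-adjusted variance).
    """
    sens = tp / (tp + fn)          # true positive rate among diseased
    spec = tn / (tn + fp)          # true negative rate among non-diseased
    j = sens + spec - 1.0
    var = sens * (1 - sens) / (tp + fn) + spec * (1 - spec) / (tn + fp)
    return j, math.sqrt(var)

# Hypothetical 2x2 table: 100 diseased and 100 non-diseased subjects.
j, se = youden_index(tp=80, fn=20, tn=90, fp=10)
# Normal-approximation 95% confidence interval.
ci = (j - 1.96 * se, j + 1.96 * se)
print(j, se, ci)
```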

  13. Interpretable inference on the mixed effect model with the Box-Cox transformation.

    PubMed

    Maruo, K; Yamaguchi, Y; Noma, H; Gosho, M

    2017-07-10

    We derived results for inference on parameters of the marginal model of the mixed effect model with the Box-Cox transformation based on the asymptotic theory approach. We also provided a robust variance estimator of the maximum likelihood estimator of the parameters of this model in consideration of the model misspecifications. Using these results, we developed an inference procedure for the difference of the model median between treatment groups at the specified occasion in the context of mixed effects models for repeated measures analysis for randomized clinical trials, which provided interpretable estimates of the treatment effect. From simulation studies, it was shown that our proposed method controlled type I error of the statistical test for the model median difference in almost all the situations and had moderate or high performance for power compared with the existing methods. We illustrated our method with cluster of differentiation 4 (CD4) data in an AIDS clinical trial, where the interpretability of the analysis results based on our proposed method is demonstrated. Copyright © 2017 John Wiley & Sons, Ltd.

  14. Inference with viral quasispecies diversity indices: clonal and NGS approaches.

    PubMed

    Gregori, Josep; Salicrú, Miquel; Domingo, Esteban; Sanchez, Alex; Esteban, Juan I; Rodríguez-Frías, Francisco; Quer, Josep

    2014-04-15

    Given the inherent dynamics of a viral quasispecies, we are often interested in the comparison of diversity indices of sequential samples of a patient, or in the comparison of diversity indices of virus in groups of patients in a treated versus control design. It is then important to make sure that the diversity measures from each sample may be compared with no bias and within a consistent statistical framework. In the present report, we review some indices often used as measures for viral quasispecies complexity and provide means for statistical inference, applying procedures taken from the ecology field. In particular, we examine the Shannon entropy and the mutation frequency, and we discuss the appropriateness of different normalization methods of the Shannon entropy found in the literature. By taking amplicons ultra-deep pyrosequencing (UDPS) raw data as a surrogate of a real hepatitis C virus viral population, we study through in-silico sampling the statistical properties of these indices under two methods of viral quasispecies sampling, classical cloning followed by Sanger sequencing (CCSS) and next-generation sequencing (NGS) such as UDPS. We propose solutions specific to each of the two sampling methods (CCSS and NGS) to guarantee statistically conforming conclusions as free of bias as possible. Contact: josep.gregori@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
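
    To make the quantities in this abstract concrete, here is a minimal sketch of the Shannon entropy of haplotype frequencies, together with two normalizations of the kind the abstract alludes to (dividing by the log of the number of distinct haplotypes, or by the log of the number of clones or reads). Which normalization is appropriate is exactly what the paper discusses; the function names and the example counts are hypothetical.

```python
import math

def shannon_entropy(haplotype_counts):
    """Shannon entropy (natural log) of the observed haplotype frequencies."""
    total = sum(haplotype_counts)
    freqs = [c / total for c in haplotype_counts if c > 0]
    return -sum(p * math.log(p) for p in freqs)

def normalized_entropy(haplotype_counts, by="haplotypes"):
    """Entropy divided by ln(h) (distinct haplotypes) or ln(N) (clones/reads).

    Assumption: these are two commonly seen normalizations, shown for
    comparison only; neither is claimed to be the paper's recommendation.
    """
    counts = [c for c in haplotype_counts if c > 0]
    denom = math.log(len(counts)) if by == "haplotypes" else math.log(sum(counts))
    return shannon_entropy(counts) / denom if denom > 0 else 0.0

# Hypothetical clonal sample: 10 clones split evenly over 2 haplotypes.
clonal = [5, 5]
print(shannon_entropy(clonal))
print(normalized_entropy(clonal, by="haplotypes"))
print(normalized_entropy(clonal, by="reads"))
```

    The two normalizations give very different values for the same sample, which illustrates why comparisons across samples of different depth need a consistent choice.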

  15. A Measure of the Conformity of a Parameter Set to a Trend: The Partially Ordered Case.

    DTIC Science & Technology

    1983-05-01

    Iowa Univ., Iowa City, Dept. of Statistics and Actuarial Science. ...and j with i ≼ j. Such a vector θ = (θ_1, θ_2, ..., θ_k) is said to be isotone (with respect to ≼). In studying such inference procedures it is helpful to...noticed that none of the measures studied here are applicable in all the situations considered. In studying location parameters which are not

  16. Direct evidence for a dual process model of deductive inference.

    PubMed

    Markovits, Henry; Brunet, Marie-Laurence; Thompson, Valerie; Brisson, Janie

    2013-07-01

    In 2 experiments, we tested a strong version of a dual process theory of conditional inference (cf. Verschueren et al., 2005a, 2005b) that assumes that most reasoners have 2 strategies available, the choice of which is determined by situational variables, cognitive capacity, and metacognitive control. The statistical strategy evaluates inferences probabilistically, accepting those with high conditional probability. The counterexample strategy rejects inferences when a counterexample shows the inference to be invalid. To discriminate strategy use, we presented reasoners with conditional statements (if p, then q) and explicit statistical information about the relative frequency of the probability of p/q (50% vs. 90%). A statistical strategy would accept the more probable inferences more frequently, whereas the counterexample one would reject both. In Experiment 1, reasoners under time pressure used the statistical strategy more, but switched to the counterexample strategy when time constraints were removed; the former took less time than the latter. These data are consistent with the hypothesis that the statistical strategy is the default heuristic. Under a free-time condition, reasoners preferred the counterexample strategy and kept it when put under time pressure. Thus, it is not simply a lack of capacity that produces a statistical strategy; instead, it seems that time pressure disrupts the ability to make good metacognitive choices. In line with this conclusion, in a 2nd experiment, we measured reasoners' confidence in their performance; those under time pressure were less confident in the statistical than the counterexample strategy and more likely to switch strategies under free-time conditions. PsycINFO Database Record (c) 2013 APA, all rights reserved.

  17. The APA Task Force on Statistical Inference (TFSI) Report as a Framework for Teaching and Evaluating Students' Understandings of Study Validity.

    ERIC Educational Resources Information Center

    Thompson, Bruce

    Web-based statistical instruction, like all statistical instruction, ought to focus on teaching the essence of the research endeavor: the exercise of reflective judgment. Using the framework of the recent report of the American Psychological Association (APA) Task Force on Statistical Inference (Wilkinson and the APA Task Force on Statistical…

  18. Data Analysis Techniques for Physical Scientists

    NASA Astrophysics Data System (ADS)

    Pruneau, Claude A.

    2017-10-01

    Preface; How to read this book; 1. The scientific method; Part I. Foundation in Probability and Statistics: 2. Probability; 3. Probability models; 4. Classical inference I: estimators; 5. Classical inference II: optimization; 6. Classical inference III: confidence intervals and statistical tests; 7. Bayesian inference; Part II. Measurement Techniques: 8. Basic measurements; 9. Event reconstruction; 10. Correlation functions; 11. The multiple facets of correlation functions; 12. Data correction methods; Part III. Simulation Techniques: 13. Monte Carlo methods; 14. Collision and detector modeling; List of references; Index.

  19. CADDIS Volume 4. Data Analysis: Predicting Environmental Conditions from Biological Observations (PECBO Appendix)

    EPA Pesticide Factsheets

    Overview of PECBO Module, using scripts to infer environmental conditions from biological observations, statistically estimating species-environment relationships, methods for inferring environmental conditions, statistical scripts in module.

  20. The statistical analysis of circadian phase and amplitude in constant-routine core-temperature data

    NASA Technical Reports Server (NTRS)

    Brown, E. N.; Czeisler, C. A.

    1992-01-01

    Accurate estimation of the phases and amplitude of the endogenous circadian pacemaker from constant-routine core-temperature series is crucial for making inferences about the properties of the human biological clock from data collected under this protocol. This paper presents a set of statistical methods based on a harmonic-regression-plus-correlated-noise model for estimating the phases and the amplitude of the endogenous circadian pacemaker from constant-routine core-temperature data. The methods include a Bayesian Monte Carlo procedure for computing the uncertainty in these circadian functions. We illustrate the techniques with a detailed study of a single subject's core-temperature series and describe their relationship to other statistical methods for circadian data analysis. In our laboratory, these methods have been successfully used to analyze more than 300 constant routines and provide a highly reliable means of extracting phase and amplitude information from core-temperature data.
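
    The harmonic-regression part of the model described above can be illustrated with a minimal sketch: a single 24-hour harmonic fitted by least squares to evenly spaced samples covering whole periods, in which case the estimates have a closed form. This ignores the correlated-noise component and the Bayesian Monte Carlo uncertainty step of the paper; the function name and the synthetic data are hypothetical.

```python
import math

def fit_single_harmonic(times_h, temps, period_h=24.0):
    """Least-squares fit of y = mean + a*cos(wt) + b*sin(wt).

    Assumption: samples are evenly spaced and span an integer number of
    periods, so the regressors are orthogonal and the closed-form
    estimates below coincide with ordinary least squares.
    """
    n = len(temps)
    omega = 2.0 * math.pi / period_h
    mean = sum(temps) / n
    a = 2.0 / n * sum(y * math.cos(omega * t) for t, y in zip(times_h, temps))
    b = 2.0 / n * sum(y * math.sin(omega * t) for t, y in zip(times_h, temps))
    amplitude = math.hypot(a, b)
    peak_time_h = (math.atan2(b, a) / omega) % period_h  # time of the maximum
    return mean, amplitude, peak_time_h

# Synthetic noise-free series: mean 37.0, amplitude 0.5, peak at hour 4.
hours = list(range(24))
series = [37.0 + 0.5 * math.cos(2.0 * math.pi * (t - 4) / 24.0) for t in hours]
mean, amplitude, peak_time_h = fit_single_harmonic(hours, series)
print(mean, amplitude, peak_time_h)
```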

  1. A probabilistic framework for microarray data analysis: fundamental probability models and statistical inference.

    PubMed

    Ogunnaike, Babatunde A; Gelmi, Claudio A; Edwards, Jeremy S

    2010-05-21

    Gene expression studies generate large quantities of data with the defining characteristic that the number of genes (whose expression profiles are to be determined) exceeds the number of available replicates by several orders of magnitude. Standard spot-by-spot analysis still seeks to extract useful information for each gene on the basis of the number of available replicates, and thus plays to the weakness of microarrays. On the other hand, because of the data volume, treating the entire data set as an ensemble, and developing theoretical distributions for these ensembles provides a framework that plays instead to the strength of microarrays. We present theoretical results that under reasonable assumptions, the distribution of microarray intensities follows the Gamma model, with the biological interpretations of the model parameters emerging naturally. We subsequently establish that for each microarray data set, the fractional intensities can be represented as a mixture of Beta densities, and develop a procedure for using these results to draw statistical inference regarding differential gene expression. We illustrate the results with experimental data from gene expression studies on Deinococcus radiodurans following DNA damage using cDNA microarrays. Copyright (c) 2010 Elsevier Ltd. All rights reserved.

  2. The same analysis approach: Practical protection against the pitfalls of novel neuroimaging analysis methods.

    PubMed

    Görgen, Kai; Hebart, Martin N; Allefeld, Carsten; Haynes, John-Dylan

    2017-12-27

    Standard neuroimaging data analysis based on traditional principles of experimental design, modelling, and statistical inference is increasingly complemented by novel analysis methods, driven e.g. by machine learning methods. While these novel approaches provide new insights into neuroimaging data, they often have unexpected properties, generating a growing literature on possible pitfalls. We propose to meet this challenge by adopting a habit of systematic testing of experimental design, analysis procedures, and statistical inference. Specifically, we suggest to apply the analysis method used for experimental data also to aspects of the experimental design, simulated confounds, simulated null data, and control data. We stress the importance of keeping the analysis method the same in main and test analyses, because only this way possible confounds and unexpected properties can be reliably detected and avoided. We describe and discuss this Same Analysis Approach in detail, and demonstrate it in two worked examples using multivariate decoding. With these examples, we reveal two sources of error: A mismatch between counterbalancing (crossover designs) and cross-validation which leads to systematic below-chance accuracies, and linear decoding of a nonlinear effect, a difference in variance. Copyright © 2017 Elsevier Inc. All rights reserved.

  3. Statistical Inferences from Formaldehyde DNA-Protein Cross-Link Data

    EPA Science Inventory

    Physiologically-based pharmacokinetic (PBPK) modeling has reached considerable sophistication in its application in the pharmacological and environmental health areas. Yet, mature methodologies for making statistical inferences have not been routinely incorporated in these applic...

  4. Developing Young Children's Emergent Inferential Practices in Statistics

    ERIC Educational Resources Information Center

    Makar, Katie

    2016-01-01

    Informal statistical inference has now been researched at all levels of schooling and initial tertiary study. Work in informal statistical inference is least understood in the early years, where children have had little if any exposure to data handling. A qualitative study in Australia was carried out through a series of teaching experiments with…

  5. Estimating pseudocounts and fold changes for digital expression measurements.

    PubMed

    Erhard, Florian

    2018-06-19

    Fold changes from count based high-throughput experiments such as RNA-seq suffer from a zero-frequency problem. To circumvent division by zero, so-called pseudocounts are added to make all observed counts strictly positive. The magnitude of pseudocounts for digital expression measurements and on which stage of the analysis they are introduced remained an arbitrary choice. Moreover, in the strict sense, fold changes are not quantities that can be computed. Instead, due to the stochasticity involved in the experiments, they must be estimated by statistical inference. Here, we build on a statistical framework for fold changes, where pseudocounts correspond to the parameters of the prior distribution used for Bayesian inference of the fold change. We show that arbitrary and widely used choices for applying pseudocounts can lead to biased results. As a statistically rigorous alternative, we propose and test an empirical Bayes procedure to choose appropriate pseudocounts. Moreover, we introduce the novel estimator Ψ LFC for fold changes showing favorable properties with small counts and smaller deviations from the truth in simulations and real data compared to existing methods. Our results have direct implications for entities with few reads in sequencing experiments, and indirectly also affect results for entities with many reads. Ψ LFC is available as an R package under https://github.com/erhard-lab/lfc (Apache 2.0 license); R scripts to generate all figures are available at zenodo (doi:10.5281/zenodo.1163029).
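
    The zero-frequency problem described above can be seen in a few lines. The sketch below computes a plain log2 fold change with a fixed pseudocount added to both counts and shows how strongly the result depends on that choice when counts are small. It is not the Ψ LFC estimator or the empirical Bayes procedure from the paper; the function name and the counts are hypothetical, and library-size normalization is omitted.

```python
import math

def log2_fold_change(count_a, count_b, pseudocount=1.0):
    """Log2 fold change of condition A over B with a fixed pseudocount.

    Assumption: the same pseudocount is added to both counts, one of the
    ad hoc conventions the abstract criticizes.
    """
    return math.log2((count_a + pseudocount) / (count_b + pseudocount))

# Hypothetical low-count entity: 3 reads in A, 0 reads in B.
lfc_one = log2_fold_change(3, 0, pseudocount=1.0)
lfc_half = log2_fold_change(3, 0, pseudocount=0.5)
# Hypothetical high-count entity: the pseudocount barely matters.
lfc_high = log2_fold_change(3000, 1000, pseudocount=1.0)
print(lfc_one, lfc_half, lfc_high)
```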

  6. Accounting for measurement reliability to improve the quality of inference in dental microhardness research: a worked example.

    PubMed

    Sever, Ivan; Klaric, Eva; Tarle, Zrinka

    2016-07-01

    Dental microhardness experiments are influenced by unobserved factors related to the varying tooth characteristics that affect measurement reproducibility. This paper explores the appropriate analytical tools for modeling different sources of unobserved variability to reduce the biases encountered and increase the validity of microhardness studies. The enamel microhardness of human third molars was measured by Vickers diamond. The effects of five bleaching agents (10, 16, and 30% carbamide peroxide, and 25 and 38% hydrogen peroxide) were examined, as well as the effect of artificial saliva and amorphous calcium phosphate. To account for both between- and within-tooth heterogeneity in evaluating treatment effects, the statistical analysis was performed in the mixed-effects framework, which also included the appropriate weighting procedure to adjust for confounding. The results were compared to those of the standard ANOVA model usually applied. The weighted mixed-effects model produced parameter estimates that differed in magnitude and significance from those of the standard ANOVA model. The results of the former model were more intuitive, with more precise estimates and better fit. Confounding could seriously bias the study outcomes, highlighting the need for more robust statistical procedures in dental research that account for the measurement reliability. The presented framework is more flexible and informative than existing analytical techniques and may improve the quality of inference in dental research. Reported results could be misleading if underlying heterogeneity of microhardness measurements is not taken into account. The confidence in treatment outcomes could be increased by applying the framework presented.

  7. Using Alien Coins to Test Whether Simple Inference Is Bayesian

    ERIC Educational Resources Information Center

    Cassey, Peter; Hawkins, Guy E.; Donkin, Chris; Brown, Scott D.

    2016-01-01

    Reasoning and inference are well-studied aspects of basic cognition that have been explained as statistically optimal Bayesian inference. Using a simplified experimental design, we conducted quantitative comparisons between Bayesian inference and human inference at the level of individuals. In 3 experiments, with more than 13,000 participants, we…

  8. Students' Expressions of Uncertainty in Making Informal Inference When Engaged in a Statistical Investigation Using TinkerPlots

    ERIC Educational Resources Information Center

    Henriques, Ana; Oliveira, Hélia

    2016-01-01

    This paper reports on the results of a study investigating the potential to embed Informal Statistical Inference in statistical investigations, using TinkerPlots, for assisting 8th grade students' informal inferential reasoning to emerge, particularly their articulations of uncertainty. Data collection included students' written work on a…

  9. Faster Mass Spectrometry-based Protein Inference: Junction Trees are More Efficient than Sampling and Marginalization by Enumeration

    PubMed Central

    Serang, Oliver; Noble, William Stafford

    2012-01-01

    The problem of identifying the proteins in a complex mixture using tandem mass spectrometry can be framed as an inference problem on a graph that connects peptides to proteins. Several existing protein identification methods make use of statistical inference methods for graphical models, including expectation maximization, Markov chain Monte Carlo, and full marginalization coupled with approximation heuristics. We show that, for this problem, the majority of the cost of inference usually comes from a few highly connected subgraphs. Furthermore, we evaluate three different statistical inference methods using a common graphical model, and we demonstrate that junction tree inference substantially improves rates of convergence compared to existing methods. The Python code used for this paper is available at http://noble.gs.washington.edu/proj/fido. PMID:22331862

  10. Using genetic data to strengthen causal inference in observational research.

    PubMed

    Pingault, Jean-Baptiste; O'Reilly, Paul F; Schoeler, Tabea; Ploubidis, George B; Rijsdijk, Frühling; Dudbridge, Frank

    2018-06-05

    Causal inference is essential across the biomedical, behavioural and social sciences. By progressing from confounded statistical associations to evidence of causal relationships, causal inference can reveal complex pathways underlying traits and diseases and help to prioritize targets for intervention. Recent progress in genetic epidemiology - including statistical innovation, massive genotyped data sets and novel computational tools for deep data mining - has fostered the intense development of methods exploiting genetic data and relatedness to strengthen causal inference in observational research. In this Review, we describe how such genetically informed methods differ in their rationale, applicability and inherent limitations and outline how they should be integrated in the future to offer a rich causal inference toolbox.

  11. Risk, statistical inference, and the law of evidence: the use of epidemiological data in toxic tort cases.

    PubMed

    Brannigan, V M; Bier, V M; Berg, C

    1992-09-01

    Toxic torts are product liability cases dealing with alleged injuries due to chemical or biological hazards such as radiation, thalidomide, or Agent Orange. Toxic tort cases typically rely more heavily than other product liability cases on indirect or statistical proof of injury. There have been numerous theoretical analyses of statistical proof of injury in toxic tort cases. However, there have been only a handful of actual legal decisions regarding the use of such statistical evidence, and most of those decisions have been inconclusive. Recently, a major case from the Fifth Circuit, involving allegations that Bendectin (a morning sickness drug) caused birth defects, was decided entirely on the basis of statistical inference. This paper examines both the conceptual basis of that decision, and also the relationships among statistical inference, scientific evidence, and the rules of product liability in general.

  12. Global statistics of microphysical properties of cloud-top ice crystals

    NASA Astrophysics Data System (ADS)

    van Diedenhoven, B.; Fridlind, A. M.; Cairns, B.; Ackerman, A. S.; Riedi, J.

    2017-12-01

    Ice crystals in clouds are highly complex. Their sizes, macroscale shape (i.e., habit), mesoscale shape (i.e., aspect ratio of components) and microscale shape (i.e., surface roughness) determine optical properties and affect physical properties such as fall speeds, growth rates and aggregation efficiency. Our current understanding on the formation and evolution of ice crystals under various conditions can be considered poor. Commonly, ice crystal size and shape are related to ambient temperature and humidity, but global observational statistics on the variation of ice crystal size and particularly shape have not been available. Here we show results of a project aiming to infer ice crystal size, shape and scattering properties from a combination of MODIS measurements and POLDER-PARASOL multi-angle polarimetry. The shape retrieval procedure infers the mean aspect ratios of components of ice crystals and the mean microscale surface roughness levels, which are quantifiable parameters that mostly affect the scattering properties, in contrast to "habit". We present global statistics on the variation of ice effective radius, component aspect ratio, microscale surface roughness and scattering asymmetry parameter as a function of cloud top temperature, latitude, location, cloud type, season, etc. Generally, with increasing height, sizes decrease, roughness increases, asymmetry parameters decrease and aspect ratios increase towards unity. Some systematic differences are observed for clouds warmer and colder than the homogeneous freezing level. Uncertainties in the retrievals will be discussed. These statistics can be used as observational targets for modeling efforts and to better constrain other satellite remote sensing applications and their uncertainties.

  13. Global Statistics of Microphysical Properties of Cloud-Top Ice Crystals

    NASA Technical Reports Server (NTRS)

    Van Diedenhoven, Bastiaan; Fridlind, Ann; Cairns, Brian; Ackerman, Andrew; Riedi, Jerome

    2017-01-01

    Ice crystals in clouds are highly complex. Their sizes, macroscale shape (i.e., habit), mesoscale shape (i.e., aspect ratio of components) and microscale shape (i.e., surface roughness) determine optical properties and affect physical properties such as fall speeds, growth rates and aggregation efficiency. Our current understanding on the formation and evolution of ice crystals under various conditions can be considered poor. Commonly, ice crystal size and shape are related to ambient temperature and humidity, but global observational statistics on the variation of ice crystal size and particularly shape have not been available. Here we show results of a project aiming to infer ice crystal size, shape and scattering properties from a combination of MODIS measurements and POLDER-PARASOL multi-angle polarimetry. The shape retrieval procedure infers the mean aspect ratios of components of ice crystals and the mean microscale surface roughness levels, which are quantifiable parameters that mostly affect the scattering properties, in contrast to "habit". We present global statistics on the variation of ice effective radius, component aspect ratio, microscale surface roughness and scattering asymmetry parameter as a function of cloud top temperature, latitude, location, cloud type, season, etc. Generally, with increasing height, sizes decrease, roughness increases, asymmetry parameters decrease and aspect ratios increase towards unity. Some systematic differences are observed for clouds warmer and colder than the homogeneous freezing level. Uncertainties in the retrievals will be discussed. These statistics can be used as observational targets for modeling efforts and to better constrain other satellite remote sensing applications and their uncertainties.

  14. Making statistical inferences about software reliability

    NASA Technical Reports Server (NTRS)

    Miller, Douglas R.

    1988-01-01

    Failure times of software undergoing random debugging can be modelled as order statistics of independent but nonidentically distributed exponential random variables. Using this model inferences can be made about current reliability and, if debugging continues, future reliability. This model also shows the difficulty inherent in statistical verification of very highly reliable software such as that used by digital avionics in commercial aircraft.
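
    The order-statistics model described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the author's procedure: the per-bug rates, the function name `simulate_debugging`, and the rule that a bug is removed at its first failure are all hypothetical choices made for illustration.

```python
import random


def simulate_debugging(rates, horizon, seed=0):
    """Simulate one debugging run under the exponential order-statistics model.

    Assumption (illustrative): bug i first causes a failure at an
    Exponential(rates[i]) time and is removed at that moment. The observed
    failure times are then the order statistics of those independent,
    non-identically distributed draws.

    Returns (sorted failure times observed up to `horizon`,
             residual failure rate = sum of rates of bugs not yet found).
    """
    rng = random.Random(seed)
    first_failures = [rng.expovariate(r) for r in rates]
    observed = sorted(t for t in first_failures if t <= horizon)
    residual_rate = sum(r for r, t in zip(rates, first_failures) if t > horizon)
    return observed, residual_rate


# Hypothetical per-bug failure rates spanning several orders of magnitude.
rates = [1.0, 0.5, 0.1, 0.01, 0.001]
observed, residual_rate = simulate_debugging(rates, horizon=10.0)
print(len(observed), "failures seen; residual failure rate:", residual_rate)
```

    Bugs with very small rates rarely surface within the testing horizon, so they contribute little observable data. This is one way to see why verifying very high reliability by testing alone is difficult.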

  15. Comparing Trend and Gap Statistics across Tests: Distributional Change Using Ordinal Methods and Bayesian Inference

    ERIC Educational Resources Information Center

    Denbleyker, John Nickolas

    2012-01-01

    The shortcomings of the proportion above cut (PAC) statistic used so prominently in the educational landscape render it a very problematic measure for making correct inferences with student test data. The limitations of PAC-based statistics are more pronounced with cross-test comparisons due to their dependency on cut-score locations. A better…

  16. Aspects of First Year Statistics Students' Reasoning When Performing Intuitive Analysis of Variance: Effects of Within- and Between-Group Variability

    ERIC Educational Resources Information Center

    Trumpower, David L.

    2015-01-01

    Making inferences about population differences based on samples of data, that is, performing intuitive analysis of variance (IANOVA), is common in everyday life. However, the intuitive reasoning of individuals when making such inferences (even following statistics instruction) often differs from the normative logic of formal statistics. The…

  17. Difference to Inference: teaching logical and statistical reasoning through on-line interactivity.

    PubMed

    Malloy, T E

    2001-05-01

    Difference to Inference is an on-line JAVA program that simulates theory testing and falsification through research design and data collection in a game format. The program, based on cognitive and epistemological principles, is designed to support learning of the thinking skills underlying deductive and inductive logic and statistical reasoning. Difference to Inference has database connectivity so that game scores can be counted as part of course grades.

  18. Deep Unfolding for Topic Models.

    PubMed

    Chien, Jen-Tzung; Lee, Chao-Hsi

    2018-02-01

    Deep unfolding provides an approach to integrate probabilistic generative models and deterministic neural networks. Such an approach benefits from deep representation, easy interpretation, flexible learning and stochastic modeling. This study develops the unsupervised and supervised learning of deep unfolded topic models for document representation and classification. Conventionally, the unsupervised and supervised topic models are inferred via the variational inference algorithm where the model parameters are estimated by maximizing the lower bound of the logarithm of the marginal likelihood using input documents without and with class labels, respectively. The representation capability or classification accuracy is constrained by the variational lower bound and the tied model parameters across the inference procedure. This paper aims to relax these constraints by directly maximizing the end performance criterion and continuously untying the parameters in the learning process via deep unfolding inference (DUI). The inference procedure is treated as the layer-wise learning in a deep neural network. The end performance is iteratively improved by using the estimated topic parameters according to the exponentiated updates. Deep learning of topic models is therefore implemented through a back-propagation procedure. Experimental results show the merits of DUI with an increasing number of layers compared with variational inference in unsupervised as well as supervised topic models.

  19. Identifying and exploiting trait-relevant tissues with multiple functional annotations in genome-wide association studies

    PubMed Central

    Zhang, Shujun

    2018-01-01

    Genome-wide association studies (GWASs) have identified many disease associated loci, the majority of which have unknown biological functions. Understanding the mechanism underlying trait associations requires identifying trait-relevant tissues and investigating associations in a trait-specific fashion. Here, we extend the widely used linear mixed model to incorporate multiple SNP functional annotations from omics studies with GWAS summary statistics to facilitate the identification of trait-relevant tissues, with which to further construct powerful association tests. Specifically, we rely on a generalized estimating equation based algorithm for parameter inference, a mixture modeling framework for trait-tissue relevance classification, and a weighted sequence kernel association test constructed based on the identified trait-relevant tissues for powerful association analysis. We refer to our analytic procedure as the Scalable Multiple Annotation integration for trait-Relevant Tissue identification and usage (SMART). With extensive simulations, we show how our method can make use of multiple complementary annotations to improve the accuracy for identifying trait-relevant tissues. In addition, our procedure allows us to make use of the inferred trait-relevant tissues, for the first time, to construct more powerful SNP set tests. We apply our method for an in-depth analysis of 43 traits from 28 GWASs using tissue-specific annotations in 105 tissues derived from ENCODE and Roadmap. Our results reveal new trait-tissue relevance, pinpoint important annotations that are informative of trait-tissue relationship, and illustrate how we can use the inferred trait-relevant tissues to construct more powerful association tests in the Wellcome trust case control consortium study. PMID:29377896

  20. Inference as Prediction

    ERIC Educational Resources Information Center

    Watson, Jane

    2007-01-01

    Inference, or decision making, is seen in curriculum documents as the final step in a statistical investigation. For a formal statistical enquiry this may be associated with sophisticated tests involving probability distributions. For young students without the mathematical background to perform such tests, it is still possible to draw informal…

  1. A Framework for Thinking about Informal Statistical Inference

    ERIC Educational Resources Information Center

    Makar, Katie; Rubin, Andee

    2009-01-01

    Informal inferential reasoning has shown some promise in developing students' deeper understanding of statistical processes. This paper presents a framework to think about three key principles of informal inference--generalizations "beyond the data," probabilistic language, and data as evidence. The authors use primary school classroom…

  2. Sensitivity to the Sampling Process Emerges From the Principle of Efficiency.

    PubMed

    Jara-Ettinger, Julian; Sun, Felix; Schulz, Laura; Tenenbaum, Joshua B

    2018-05-01

    Humans can seamlessly infer other people's preferences, based on what they do. Broadly, two types of accounts have been proposed to explain different aspects of this ability. The first account focuses on spatial information: Agents' efficient navigation in space reveals what they like. The second account focuses on statistical information: Uncommon choices reveal stronger preferences. Together, these two lines of research suggest that we have two distinct capacities for inferring preferences. Here we propose that this is not the case, and that spatial-based and statistical-based preference inferences can be explained by the assumption that agents are efficient alone. We show that people's sensitivity to spatial and statistical information when they infer preferences is best predicted by a computational model of the principle of efficiency, and that this model outperforms dual-system models, even when the latter are fit to participant judgments. Our results suggest that, as adults, a unified understanding of agency under the principle of efficiency underlies our ability to infer preferences. Copyright © 2018 Cognitive Science Society, Inc.

  3. Statistical inferences for data from studies conducted with an aggregated multivariate outcome-dependent sample design

    PubMed Central

    Lu, Tsui-Shan; Longnecker, Matthew P.; Zhou, Haibo

    2016-01-01

    An outcome-dependent sampling (ODS) scheme is a cost-effective sampling scheme where one observes the exposure with a probability that depends on the outcome. Well-known designs of this kind are the case-control design for binary response, the case-cohort design for failure time data and the general ODS design for a continuous response. While substantial work has been done for the univariate response case, statistical inference and design for the ODS with multivariate cases remain under-developed. Motivated by the need in biological studies to take advantage of the available responses for subjects in a cluster, we propose a multivariate outcome dependent sampling (Multivariate-ODS) design that is based on a general selection of the continuous responses within a cluster. The proposed inference procedure for the Multivariate-ODS design is semiparametric where all the underlying distributions of covariates are modeled nonparametrically using the empirical likelihood methods. We show that the proposed estimator is consistent and develop its asymptotic normality properties. Simulation studies show that the proposed estimator is more efficient than the estimator obtained using only the simple-random-sample portion of the Multivariate-ODS or the estimator from a simple random sample with the same sample size. The Multivariate-ODS design together with the proposed estimator provides an approach to further improve study efficiency for a given fixed study budget. We illustrate the proposed design and estimator with an analysis of association of PCB exposure to hearing loss in children born to the Collaborative Perinatal Study. PMID:27966260

  4. Massive optimal data compression and density estimation for scalable, likelihood-free inference in cosmology

    NASA Astrophysics Data System (ADS)

    Alsing, Justin; Wandelt, Benjamin; Feeney, Stephen

    2018-07-01

    Many statistical models in cosmology can be simulated forwards but have intractable likelihood functions. Likelihood-free inference methods allow us to perform Bayesian inference from these models using only forward simulations, free from any likelihood assumptions or approximations. Likelihood-free inference generically involves simulating mock data and comparing to the observed data; this comparison in data space suffers from the curse of dimensionality and requires compression of the data to a small number of summary statistics to be tractable. In this paper, we use massive asymptotically optimal data compression to reduce the dimensionality of the data space to just one number per parameter, providing a natural and optimal framework for summary statistic choice for likelihood-free inference. Secondly, we present the first cosmological application of Density Estimation Likelihood-Free Inference (DELFI), which learns a parametrized model for joint distribution of data and parameters, yielding both the parameter posterior and the model evidence. This approach is conceptually simple, requires less tuning than traditional Approximate Bayesian Computation approaches to likelihood-free inference and can give high-fidelity posteriors from orders of magnitude fewer forward simulations. As an additional bonus, it enables parameter inference and Bayesian model comparison simultaneously. We demonstrate DELFI with massive data compression on an analysis of the joint light-curve analysis supernova data, as a simple validation case study. We show that high-fidelity posterior inference is possible for full-scale cosmological data analyses with as few as ~10^4 simulations, with substantial scope for further improvement, demonstrating the scalability of likelihood-free inference to large and complex cosmological data sets.
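
    For orientation, the "simulate, compress, compare" loop that the abstract attributes to traditional Approximate Bayesian Computation can be sketched as a rejection sampler. This is not DELFI and not the authors' compression scheme. It is a toy example under stated assumptions: a Gaussian model with unknown mean, a uniform prior, the sample mean as the single summary statistic, and a hypothetical tolerance `eps`.

```python
import random
import statistics


def abc_rejection(observed, prior_sampler, simulator, summary, eps, n_draws, seed=0):
    """Rejection ABC: keep prior draws whose simulated summary lies within
    `eps` of the observed summary. All argument names are illustrative."""
    rng = random.Random(seed)
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)           # draw a parameter from the prior
        mock = simulator(theta, rng)         # forward-simulate mock data
        if abs(summary(mock) - s_obs) <= eps:
            accepted.append(theta)           # approximate posterior draw
    return accepted


# Toy "observed" data: 50 draws from a Gaussian with mean 2 and unit variance.
data_rng = random.Random(1)
observed = [data_rng.gauss(2.0, 1.0) for _ in range(50)]

posterior_draws = abc_rejection(
    observed,
    prior_sampler=lambda rng: rng.uniform(-5.0, 5.0),
    simulator=lambda theta, rng: [rng.gauss(theta, 1.0) for _ in range(50)],
    summary=statistics.fmean,                # one summary number per parameter
    eps=0.1,
    n_draws=5000,
)
print(len(posterior_draws), "accepted; posterior mean ~", statistics.fmean(posterior_draws))
```

    Most prior draws are rejected in this toy setup, which illustrates why approaches that need far fewer forward simulations are attractive when each simulation is expensive.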

  5. A Coalitional Game for Distributed Inference in Sensor Networks With Dependent Observations

    NASA Astrophysics Data System (ADS)

    He, Hao; Varshney, Pramod K.

    2016-04-01

    We consider the problem of collaborative inference in a sensor network with heterogeneous and statistically dependent sensor observations. Each sensor aims to maximize its inference performance by forming a coalition with other sensors and sharing information within the coalition. It is proved that the inference performance is a nondecreasing function of the coalition size. However, in an energy constrained network, the energy consumption of inter-sensor communication also increases with increasing coalition size, which discourages the formation of the grand coalition (the set of all sensors). In this paper, the formation of non-overlapping coalitions with statistically dependent sensors is investigated under a specific communication constraint. We apply a game theoretical approach to fully explore and utilize the information contained in the spatial dependence among sensors to maximize individual sensor performance. Before formulating the distributed inference problem as a coalition formation game, we first quantify the gain and loss in forming a coalition by introducing the concepts of diversity gain and redundancy loss for both estimation and detection problems. These definitions, enabled by the statistical theory of copulas, allow us to characterize the influence of statistical dependence among sensor observations on inference performance. An iterative algorithm based on merge-and-split operations is proposed for the solution and the stability of the proposed algorithm is analyzed. Numerical results are provided to demonstrate the superiority of our proposed game theoretical approach.

  6. Statistical Estimation of Orbital Debris Populations with a Spectrum of Object Size

    NASA Technical Reports Server (NTRS)

    Xu, Y. -l; Horstman, M.; Krisko, P. H.; Liou, J. -C; Matney, M.; Stansbery, E. G.; Stokely, C. L.; Whitlock, D.

    2008-01-01

    Orbital debris is a real concern for the safe operations of satellites. In general, the hazard of debris impact is a function of the size and spatial distributions of the debris populations. To describe and characterize the debris environment as reliably as possible, the current NASA Orbital Debris Engineering Model (ORDEM2000) is being upgraded to a new version based on new and better quality data. The data-driven ORDEM model covers a wide range of object sizes from 10 microns to greater than 1 meter. This paper reviews the statistical process for the estimation of the debris populations in the new ORDEM upgrade, and discusses the representation of large-size (greater than or equal to 1 m and greater than or equal to 10 cm) populations by SSN catalog objects and the validation of the statistical approach. Also, it presents results for the populations with sizes of greater than or equal to 3.3 cm, greater than or equal to 1 cm, greater than or equal to 100 micrometers, and greater than or equal to 10 micrometers. The orbital debris populations used in the new version of ORDEM are inferred from data based upon appropriate reference (or benchmark) populations instead of the binning of the multi-dimensional orbital-element space. This paper describes all of the major steps used in the population-inference procedure for each size-range. Detailed discussions on data analysis, parameter definition, the correlation between parameters and data, and uncertainty assessment are included.

  7. Rational approximations to rational models: alternative algorithms for category learning.

    PubMed

    Sanborn, Adam N; Griffiths, Thomas L; Navarro, Daniel J

    2010-10-01

    Rational models of cognition typically consider the abstract computational problems posed by the environment, assuming that people are capable of optimally solving those problems. This differs from more traditional formal models of cognition, which focus on the psychological processes responsible for behavior. A basic challenge for rational models is thus explaining how optimal solutions can be approximated by psychological processes. We outline a general strategy for answering this question, namely to explore the psychological plausibility of approximation algorithms developed in computer science and statistics. In particular, we argue that Monte Carlo methods provide a source of rational process models that connect optimal solutions to psychological processes. We support this argument through a detailed example, applying this approach to Anderson's (1990, 1991) rational model of categorization (RMC), which involves a particularly challenging computational problem. Drawing on a connection between the RMC and ideas from nonparametric Bayesian statistics, we propose 2 alternative algorithms for approximate inference in this model. The algorithms we consider include Gibbs sampling, a procedure appropriate when all stimuli are presented simultaneously, and particle filters, which sequentially approximate the posterior distribution with a small number of samples that are updated as new data become available. Applying these algorithms to several existing datasets shows that a particle filter with a single particle provides a good description of human inferences.

  8. Statistical detection of EEG synchrony using empirical Bayesian inference.

    PubMed

    Singh, Archana K; Asoh, Hideki; Takeda, Yuji; Phillips, Steven

    2015-01-01

    There is growing interest in understanding how the brain utilizes synchronized oscillatory activity to integrate information across functionally connected regions. Computing phase-locking values (PLV) between EEG signals is a popular method for quantifying such synchronizations and elucidating their role in cognitive tasks. However, high-dimensionality in PLV data incurs a serious multiple testing problem. Standard multiple testing methods in neuroimaging research (e.g., false discovery rate, FDR) suffer severe loss of power, because they fail to exploit complex dependence structure between hypotheses that vary in spectral, temporal and spatial dimension. Previously, we showed that a hierarchical FDR and optimal discovery procedures could be effectively applied for PLV analysis to provide better power than FDR. In this article, we revisit the multiple comparison problem from a new Empirical Bayes perspective and propose the application of the local FDR method (locFDR; Efron, 2001) for PLV synchrony analysis to compute FDR as a posterior probability that an observed statistic belongs to a null hypothesis. We demonstrate the application of Efron's Empirical Bayes approach for PLV synchrony analysis for the first time. We use simulations to validate the specificity and sensitivity of locFDR and a real EEG dataset from a visual search study for experimental validation. We also compare locFDR with hierarchical FDR and optimal discovery procedures in both simulation and experimental analyses. Our simulation results showed that the locFDR can effectively control false positives without compromising on the power of PLV synchrony inference. Our results from the application of locFDR to experimental data detected more significant discoveries than our previously proposed methods, whereas the standard FDR method failed to detect any significant discoveries.
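
    The local FDR idea mentioned above, the posterior probability that a statistic is null, is commonly written as fdr(z) = pi0 * f0(z) / f(z), where f0 is the null density and f the mixture density of all statistics. The sketch below is a minimal illustration, not the authors' pipeline. It makes these assumptions: z-scores as input, a theoretical N(0, 1) null, the conservative choice pi0 = 1, and a plain Gaussian kernel density estimate of f with a hypothetical bandwidth. Published implementations may estimate these quantities differently.

```python
import math
import random


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def local_fdr(z, z_scores, bandwidth=0.3, pi0=1.0):
    """Estimate fdr(z) = pi0 * f0(z) / f(z), capped at 1.

    f0 is the theoretical N(0, 1) null (assumption); f is a Gaussian kernel
    density estimate built from all z-scores (assumption). Intended for z
    values inside the range of the data, where the estimate of f is positive.
    """
    f_hat = sum(normal_pdf(z, zi, bandwidth) for zi in z_scores) / len(z_scores)
    return min(1.0, pi0 * normal_pdf(z) / f_hat)


# Simulated z-scores: 900 null cases plus 100 non-null cases centred at 4.
rng = random.Random(0)
z_scores = [rng.gauss(0.0, 1.0) for _ in range(900)]
z_scores += [rng.gauss(4.0, 1.0) for _ in range(100)]

fdr_at_0 = local_fdr(0.0, z_scores)   # typical null value: fdr close to 1
fdr_at_5 = local_fdr(5.0, z_scores)   # far in the non-null tail: fdr near 0
print(fdr_at_0, fdr_at_5)
```

    Statistics with a small estimated fdr(z) would be reported as discoveries. The threshold is a separate choice that this sketch does not make.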

  9. Reservoir inflow forecasting with a modified coactive neuro-fuzzy inference system: a case study for a semi-arid region

    NASA Astrophysics Data System (ADS)

    Allawi, Mohammed Falah; Jaafar, Othman; Mohamad Hamzah, Firdaus; Mohd, Nuruol Syuhadaa; Deo, Ravinesh C.; El-Shafie, Ahmed

    2017-10-01

    Existing forecast models applied for reservoir inflow forecasting encounter several drawbacks, because the underlying mathematical procedures have difficulty coping with and mimicking the naturalization and stochasticity of the inflow data patterns. In this study, appropriate adjustments to the conventional coactive neuro-fuzzy inference system (CANFIS) method are proposed to improve the mathematical procedure, thus enabling a better detection of the high nonlinearity patterns found in the reservoir inflow training data. This modification includes the updating of the back propagation algorithm, leading to a consequent update of the membership rules and the induction of the centre-weighted set rather than the global weighted set used in feature extraction. The modification also aids in constructing an integrated model that is able to detect not only the nonlinearity in the training data but also the wide range of features within the training data records used to simulate the forecasting model. To demonstrate the model's efficacy, the proposed CANFIS method has been applied to forecast monthly inflow data at Aswan High Dam (AHD), located in southern Egypt. Comparative analyses of the forecasting skill of the modified CANFIS and the conventional ANFIS model are carried out with statistical score indicators to assess the reliability of the developed method. The statistical metrics support the better performance of the developed CANFIS model, which significantly outperforms the ANFIS model to attain a low relative error value (23%), mean absolute error (1.4 BCM month^-1), root mean square error (1.14 BCM month^-1), and a relatively large coefficient of determination (0.94). The present study ascertains the better utility of the modified CANFIS model with respect to the traditional ANFIS model applied in reservoir inflow forecasting for a semi-arid region.

  10. Multicategory reclassification statistics for assessing improvements in diagnostic accuracy

    PubMed Central

    Li, Jialiang; Jiang, Binyan; Fine, Jason P.

    2013-01-01

    In this paper, we extend the definitions of the net reclassification improvement (NRI) and the integrated discrimination improvement (IDI) in the context of multicategory classification. Both measures were proposed in Pencina and others (2008. Evaluating the added predictive ability of a new marker: from area under the receiver operating characteristic (ROC) curve to reclassification and beyond. Statistics in Medicine 27, 157–172) as numeric characterizations of accuracy improvement for binary diagnostic tests and were shown to have certain advantage over analyses based on ROC curves or other regression approaches. Estimation and inference procedures for the multiclass NRI and IDI are provided in this paper along with necessary asymptotic distributional results. Simulations are conducted to study the finite-sample properties of the proposed estimators. Two medical examples are considered to illustrate our methodology. PMID:23197381
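
    As background for the multicategory extension, the binary-outcome NRI that the abstract attributes to Pencina and others (2008) compares how a new model moves subjects between risk categories: NRI = [P(up | event) - P(down | event)] + [P(down | non-event) - P(up | non-event)]. The sketch below computes this binary version only. The function name and the toy data are hypothetical, and the multiclass estimators proposed in the paper are not reproduced here.

```python
def net_reclassification_improvement(old_cat, new_cat, event):
    """Binary-outcome NRI from ordinal risk categories (higher = riskier).

    old_cat, new_cat: category assigned to each subject by the old/new model.
    event: truthy for subjects who experienced the event.
    """
    up_e = down_e = n_e = 0      # movements among events
    up_n = down_n = n_n = 0      # movements among non-events
    for old, new, is_event in zip(old_cat, new_cat, event):
        if is_event:
            n_e += 1
            up_e += new > old
            down_e += new < old
        else:
            n_n += 1
            up_n += new > old
            down_n += new < old
    # Upward moves are improvements for events; downward moves for non-events.
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n


# Toy data (hypothetical): three events followed by three non-events.
old_cat = [0, 0, 1, 1, 0, 1]
new_cat = [1, 0, 1, 0, 0, 0]
event = [1, 1, 1, 0, 0, 0]
nri = net_reclassification_improvement(old_cat, new_cat, event)
print(nri)  # (1 - 0)/3 + (2 - 0)/3
```

    In the toy data one of three events moves up and two of three non-events move down, so the two components are 1/3 and 2/3.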

  11. Spurious correlations and inference in landscape genetics

    Treesearch

    Samuel A. Cushman; Erin L. Landguth

    2010-01-01

    Reliable interpretation of landscape genetic analyses depends on statistical methods that have high power to identify the correct process driving gene flow while rejecting incorrect alternative hypotheses. Little is known about statistical power and inference in individual-based landscape genetics. Our objective was to evaluate the power of causal modelling with partial...

  12. The Philosophical Foundations of Prescriptive Statements and Statistical Inference

    ERIC Educational Resources Information Center

    Sun, Shuyan; Pan, Wei

    2011-01-01

    From the perspectives of the philosophy of science and statistical inference, we discuss the challenges of making prescriptive statements in quantitative research articles. We first consider the prescriptive nature of educational research and argue that prescriptive statements are a necessity in educational research. The logic of deduction,…

  13. Inference and the Introductory Statistics Course

    ERIC Educational Resources Information Center

    Pfannkuch, Maxine; Regan, Matt; Wild, Chris; Budgett, Stephanie; Forbes, Sharleen; Harraway, John; Parsonage, Ross

    2011-01-01

    This article sets out some of the rationale and arguments for making major changes to the teaching and learning of statistical inference in introductory courses at our universities by changing from a norm-based, mathematical approach to more conceptually accessible computer-based approaches. The core problem of the inferential argument with its…

  14. "Magnitude-based inference": a statistical review.

    PubMed

    Welsh, Alan H; Knight, Emma J

    2015-04-01

    We consider "magnitude-based inference" and its interpretation by examining in detail its use in the problem of comparing two means. We extract from the spreadsheets, which are provided to users of the analysis (http://www.sportsci.org/), a precise description of how "magnitude-based inference" is implemented. We compare the implemented version of the method with general descriptions of it and interpret the method in familiar statistical terms. We show that "magnitude-based inference" is not a progressive improvement on modern statistics. The additional probabilities introduced are not directly related to the confidence interval but, rather, are interpretable either as P values for two different nonstandard tests (for different null hypotheses) or as approximate Bayesian calculations, which also lead to a type of test. We also discuss sample size calculations associated with "magnitude-based inference" and show that the substantial reduction in sample sizes claimed for the method (30% of the sample size obtained from standard frequentist calculations) is not justifiable so the sample size calculations should not be used. Rather than using "magnitude-based inference," a better solution is to be realistic about the limitations of the data and use either confidence intervals or a fully Bayesian analysis.

  15. Multi-Agent Inference in Social Networks: A Finite Population Learning Approach.

    PubMed

    Fan, Jianqing; Tong, Xin; Zeng, Yao

    When people in a society want to make inferences about some parameter, each person may want to use data collected by other people. Information (data) exchange in social networks is usually costly, so to make reliable statistical decisions, people need to trade off the benefits and costs of information acquisition. Conflicts of interests and coordination problems will arise in the process. Classical statistics does not consider people's incentives and interactions in the data collection process. To address this imperfection, this work explores multi-agent Bayesian inference problems with a game theoretic social network model. Motivated by our interest in aggregate inference at the societal level, we propose a new concept, finite population learning, to address whether with high probability, a large fraction of people in a given finite population network can make "good" inferences. Serving as a foundation, this concept enables us to study the long run trend of aggregate inference quality as population grows.

  16. Inferring gene regression networks with model trees

    PubMed Central

    2010-01-01

    Background: Novel strategies are required in order to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes, building the so-called gene co-expression networks. They are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful in order to determine whether two genes have a strong global similarity but do not detect local similarities. Results: We propose model trees as a method to identify gene interaction networks. While correlation-based methods analyze each pair of genes, in our approach we generate a single regression tree for each gene from the remaining genes. Finally, a graph from all the relationships among output and input genes is built taking into account whether the pair of genes is statistically significant. For this reason we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: the Saccharomyces cerevisiae and E. coli data sets. First, the biological coherence of the results is tested. Second, the E. coli transcriptional network (in the Regulon database) is used as control to compare the results to those of a correlation-based method. This experiment shows that REGNET performs more accurately at detecting true gene associations than the Pearson and Spearman zeroth and first-order correlation-based methods. Conclusions: REGNET generates gene association networks from gene expression data, and differs from correlation-based methods in that the relationship between one gene and others is calculated simultaneously. Model trees are very useful techniques to estimate the numerical values for the target genes by linear regression functions. They are very often more precise than linear regression models because they can fit different linear regressions to separate areas of the search space, favoring the inference of localized similarities over a more global similarity. Furthermore, experimental results show the good performance of REGNET. PMID:20950452

  17. A Bayesian test for Hardy–Weinberg equilibrium of biallelic X-chromosomal markers

    PubMed Central

    Puig, X; Ginebra, J; Graffelman, J

    2017-01-01

    The X chromosome is a relatively large chromosome, harboring a lot of genetic information. Much of the statistical analysis of X-chromosomal information is complicated by the fact that males only have one copy. Recently, frequentist statistical tests for Hardy–Weinberg equilibrium have been proposed specifically for dealing with markers on the X chromosome. Bayesian test procedures for Hardy–Weinberg equilibrium for the autosomes have been described, but Bayesian work on the X chromosome in this context is lacking. This paper gives the first Bayesian approach for testing Hardy–Weinberg equilibrium with biallelic markers at the X chromosome. Marginal and joint posterior distributions for the inbreeding coefficient in females and the male to female allele frequency ratio are computed, and used for statistical inference. The paper gives a detailed account of the proposed Bayesian test, and illustrates it with data from the 1000 Genomes project. In that implementation, a novel approach to tackle multiple testing from a Bayesian perspective through posterior predictive checks is used. PMID:28900292

  18. Rank-based permutation approaches for non-parametric factorial designs.

    PubMed

    Umlauft, Maria; Konietschke, Frank; Pauly, Markus

    2017-11-01

    Inference methods for null hypotheses formulated in terms of distribution functions in general non-parametric factorial designs are studied. The methods can be applied to continuous, ordinal or even ordered categorical data in a unified way, and are based only on ranks. In this set-up Wald-type statistics and ANOVA-type statistics are the current state of the art. The first method is asymptotically exact but a rather liberal statistical testing procedure for small to moderate sample size, while the latter is only an approximation which does not possess the correct asymptotic α level under the null. To bridge these gaps, a novel permutation approach is proposed which can be seen as a flexible generalization of the Kruskal-Wallis test to all kinds of factorial designs with independent observations. It is proven that the permutation principle is asymptotically correct while keeping its finite exactness property when data are exchangeable. The results of extensive simulation studies foster these theoretical findings. A real data set exemplifies its applicability. © 2017 The British Psychological Society.
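
    The abstract presents its method as a generalization of the Kruskal-Wallis test, so the simplest special case can be sketched: a one-way layout in which group labels are permuted and the rank-based H statistic is recomputed. This is an illustrative sketch only, not the authors' procedure for general factorial designs. The function names, the toy data and the number of permutations are hypothetical, and H is computed without a tie correction, which is a constant factor under permutation.

```python
import random


def midranks(values):
    """Ranks 1..N, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(ranks, labels):
    """H = 12 / (N (N + 1)) * sum_g R_g^2 / n_g - 3 (N + 1)."""
    n = len(ranks)
    sums, counts = {}, {}
    for r, g in zip(ranks, labels):
        sums[g] = sums.get(g, 0.0) + r
        counts[g] = counts.get(g, 0) + 1
    total = sum(sums[g] ** 2 / counts[g] for g in sums)
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)


def permutation_pvalue(values, labels, n_perm=2000, seed=0):
    """Monte Carlo permutation p-value for H under exchangeability."""
    rng = random.Random(seed)
    ranks = midranks(values)
    h_obs = kruskal_wallis_h(ranks, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)                      # reassign group labels
        if kruskal_wallis_h(ranks, shuffled) >= h_obs - 1e-12:
            hits += 1
    return h_obs, (hits + 1) / (n_perm + 1)        # add-one keeps p > 0


# Toy data (hypothetical): three well-separated groups of four observations.
values = [1.1, 2.3, 1.9, 2.8, 3.5, 4.1, 3.9, 4.4, 6.2, 5.8, 7.0, 6.5]
labels = ["a"] * 4 + ["b"] * 4 + ["c"] * 4
h_obs, p_value = permutation_pvalue(values, labels)
print(h_obs, p_value)
```

    Because only the labels are shuffled, the ranks are computed once. The validity of the p-value rests on exchangeability of the observations under the null, the condition the abstract names for finite-sample exactness.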

  19. Statistics, Adjusted Statistics, and Maladjusted Statistics.

    PubMed

    Kaufman, Jay S

    2017-05-01

    Statistical adjustment is a ubiquitous practice in all quantitative fields that is meant to correct for improprieties or limitations in observed data, to remove the influence of nuisance variables or to turn observed correlations into causal inferences. These adjustments proceed by reporting not what was observed in the real world, but instead modeling what would have been observed in an imaginary world in which specific nuisances and improprieties are absent. These techniques are powerful and useful inferential tools, but their application can be hazardous or deleterious if consumers of the adjusted results mistake the imaginary world of models for the real world of data. Adjustments require decisions about which factors are of primary interest and which are imagined away, and yet many adjusted results are presented without any explanation or justification for these decisions. Adjustments can be harmful if poorly motivated, and are frequently misinterpreted in the media's reporting of scientific studies. Adjustment procedures have become so routinized that many scientists and readers lose the habit of relating the reported findings back to the real world in which we live.

  20. Intuitive statistics by 8-month-old infants

    PubMed Central

    Xu, Fei; Garcia, Vashti

    2008-01-01

    Human learners make inductive inferences based on small amounts of data: we generalize from samples to populations and vice versa. The academic discipline of statistics formalizes these intuitive statistical inferences. What is the origin of this ability? We report six experiments investigating whether 8-month-old infants are “intuitive statisticians.” Our results showed that, given a sample, the infants were able to make inferences about the population from which the sample had been drawn. Conversely, given information about the entire population of relatively small size, the infants were able to make predictions about the sample. Our findings provide evidence that infants possess a powerful mechanism for inductive learning, either using heuristics or basic principles of probability. This ability to make inferences based on samples or information about the population develops early and in the absence of schooling or explicit teaching. Human infants may be rational learners from very early in development. PMID:18378901

  1. Evaluation of Second-Level Inference in fMRI Analysis

    PubMed Central

    Roels, Sanne P.; Loeys, Tom; Moerkerke, Beatrijs

    2016-01-01

    We investigate the impact of decisions in the second-level (i.e., over subjects) inferential process in functional magnetic resonance imaging on (1) the balance between false positives and false negatives and on (2) the data-analytical stability, both proxies for the reproducibility of results. Second-level analysis based on a mass univariate approach typically consists of 3 phases. First, one proceeds via a general linear model for a test image that consists of pooled information from different subjects. We evaluate models that take into account first-level (within-subjects) variability and models that do not take into account this variability. Second, one proceeds via inference based on parametric assumptions or via permutation-based inference. Third, we evaluate 3 commonly used procedures to address the multiple testing problem: familywise error rate correction, False Discovery Rate (FDR) correction, and a two-step procedure with minimal cluster size. Based on a simulation study and real data we find that the two-step procedure with minimal cluster size yields the most stable results, followed by the familywise error rate correction. FDR correction yields the most variable results, for both permutation-based inference and parametric inference. Modeling the subject-specific variability yields a better balance between false positives and false negatives when using parametric inference. PMID:26819578
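
    To make the contrast between familywise and FDR correction concrete, the sketch below applies two textbook procedures to a list of p-values. The abstract does not state which specific procedures the authors used, so Bonferroni (for the familywise error rate) and the Benjamini-Hochberg step-up rule (for the FDR) are assumptions chosen for illustration, as are the function names.

```python
def bonferroni_reject(pvalues, alpha=0.05):
    """Familywise control: reject only p-values at or below alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg_reject(pvalues, alpha=0.05):
    """FDR control: find the largest rank k with p_(k) <= alpha * k / m
    and reject the k smallest p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            cutoff = rank
    reject = [False] * m
    for idx in order[:cutoff]:
        reject[idx] = True
    return reject
```

    On the same input the Benjamini-Hochberg rule rejects at least as many hypotheses as Bonferroni, which is one reason FDR-based results can vary more from sample to sample.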

  2. Assessing colour-dependent occupation statistics inferred from galaxy group catalogues

    NASA Astrophysics Data System (ADS)

    Campbell, Duncan; van den Bosch, Frank C.; Hearin, Andrew; Padmanabhan, Nikhil; Berlind, Andreas; Mo, H. J.; Tinker, Jeremy; Yang, Xiaohu

    2015-09-01

    We investigate the ability of current implementations of galaxy group finders to recover colour-dependent halo occupation statistics. To test the fidelity of group catalogue inferred statistics, we run three different group finders used in the literature over a mock that includes galaxy colours in a realistic manner. Overall, the resulting mock group catalogues are remarkably similar, and most colour-dependent statistics are recovered with reasonable accuracy. However, it is also clear that certain systematic errors arise as a consequence of correlated errors in group membership determination, central/satellite designation, and halo mass assignment. We introduce a new statistic, the halo transition probability (HTP), which captures the combined impact of all these errors. As a rule of thumb, errors tend to equalize the properties of distinct galaxy populations (i.e. red versus blue galaxies or centrals versus satellites), and to result in inferred occupation statistics that are more accurate for red galaxies than for blue galaxies. A statistic that is particularly poorly recovered from the group catalogues is the red fraction of central galaxies as a function of halo mass. Group finders do a good job in recovering galactic conformity, but also have a tendency to introduce weak conformity when none is present. We conclude that proper inference of colour-dependent statistics from group catalogues is best achieved using forward modelling (i.e. running group finders over mock data) or by implementing a correction scheme based on the HTP, as long as the latter is not too strongly model dependent.

  3. Statistical inference for the additive hazards model under outcome-dependent sampling.

    PubMed

    Yu, Jichang; Liu, Yanyan; Sandler, Dale P; Zhou, Haibo

    2015-09-01

    Cost-effective study design and proper inference procedures for data from such designs are always of particular interest to study investigators. In this article, we propose a biased sampling scheme, an outcome-dependent sampling (ODS) design for survival data with right censoring under the additive hazards model. We develop a weighted pseudo-score estimator for the regression parameters for the proposed design and derive the asymptotic properties of the proposed estimator. We also provide some suggestions for using the proposed method by evaluating the relative efficiency of the proposed method against a simple random sampling design and derive the optimal allocation of the subsamples for the proposed design. Simulation studies show that the proposed ODS design is more powerful than other existing designs and the proposed estimator is more efficient than other estimators. We apply our method to analyze a cancer study conducted at NIEHS, the Cancer Incidence and Mortality of Uranium Miners Study, to study the risk of radon exposure to cancer.

  4. Statistical inference for the additive hazards model under outcome-dependent sampling

    PubMed Central

    Yu, Jichang; Liu, Yanyan; Sandler, Dale P.; Zhou, Haibo

    2015-01-01

    Cost-effective study design and proper inference procedures for data from such designs are always of particular interest to study investigators. In this article, we propose a biased sampling scheme, an outcome-dependent sampling (ODS) design for survival data with right censoring under the additive hazards model. We develop a weighted pseudo-score estimator for the regression parameters for the proposed design and derive the asymptotic properties of the proposed estimator. We also provide some suggestions for using the proposed method by evaluating the relative efficiency of the proposed method against a simple random sampling design and derive the optimal allocation of the subsamples for the proposed design. Simulation studies show that the proposed ODS design is more powerful than other existing designs and the proposed estimator is more efficient than other estimators. We apply our method to analyze a cancer study conducted at NIEHS, the Cancer Incidence and Mortality of Uranium Miners Study, to study the risk of radon exposure to cancer. PMID:26379363

  5. Statistical learning and selective inference.

    PubMed

    Taylor, Jonathan; Tibshirani, Robert J

    2015-06-23

    We describe the problem of "selective inference." This addresses the following challenge: Having mined a set of data to find potential associations, how do we properly assess the strength of these associations? The fact that we have "cherry-picked"--searched for the strongest associations--means that we must set a higher bar for declaring significant the associations that we see. This challenge becomes more important in the era of big data and complex statistical modeling. The cherry tree (dataset) can be very large and the tools for cherry picking (statistical learning methods) are now very sophisticated. We describe some recent new developments in selective inference and illustrate their use in forward stepwise regression, the lasso, and principal components analysis.

  6. Variations on Bayesian Prediction and Inference

    DTIC Science & Technology

    2016-05-09

    inference 2.2.1 Background There are a number of statistical inference problems that are not generally formulated via a full probability model...problem of inference about an unknown parameter, the Bayesian approach requires a full probability...the problem of inference about an unknown parameter, the Bayesian approach requires a full probability model/likelihood which can be an obstacle

  7. Using the Bootstrap Method for a Statistical Significance Test of Differences between Summary Histograms

    NASA Technical Reports Server (NTRS)

    Xu, Kuan-Man

    2006-01-01

    A new method is proposed to compare statistical differences between summary histograms, which are the histograms summed over a large ensemble of individual histograms. It consists of choosing a distance statistic for measuring the difference between summary histograms and using a bootstrap procedure to calculate the statistical significance level. Bootstrapping is an approach to statistical inference that makes few assumptions about the underlying probability distribution that describes the data. Three distance statistics are compared in this study. They are the Euclidean distance, the Jeffries-Matusita distance and the Kuiper distance. The data used in testing the bootstrap method are satellite measurements of cloud systems called cloud objects. Each cloud object is defined as a contiguous region/patch composed of individual footprints or fields of view. A histogram of measured values over footprints is generated for each parameter of each cloud object and then summary histograms are accumulated over all individual histograms in a given cloud-object size category. The results of statistical hypothesis tests using all three distances as test statistics are generally similar, indicating the validity of the proposed method. The Euclidean distance is determined to be most suitable after comparing the statistical tests of several parameters with distinct probability distributions among three cloud-object size categories. Impacts on the statistical significance levels resulting from differences in the total lengths of satellite footprint data between two size categories are also discussed.
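
    The two ingredients named in this abstract, a distance statistic between summary histograms and a bootstrap significance level, can be sketched as follows. The abstract does not spell out the resampling scheme, so the pooled resampling of whole individual histograms under the null hypothesis is an assumption, as are the function names and the default number of bootstrap replicates. Only the Euclidean distance is shown.

```python
import math
import random


def summary_histogram(histograms):
    """Sum individual histograms bin by bin and normalize to unit total."""
    n_bins = len(histograms[0])
    totals = [sum(h[b] for h in histograms) for b in range(n_bins)]
    grand_total = sum(totals)
    return [t / grand_total for t in totals]


def euclidean_distance(p, q):
    """Euclidean distance between two normalized histograms."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def bootstrap_pvalue(ens_a, ens_b, n_boot=1000, seed=0):
    """Bootstrap significance of the distance between two summary histograms.

    Assumed null hypothesis: both ensembles come from one population, so
    individual histograms are resampled with replacement from the pooled set.
    """
    rng = random.Random(seed)
    observed = euclidean_distance(summary_histogram(ens_a),
                                  summary_histogram(ens_b))
    pooled = list(ens_a) + list(ens_b)
    hits = 0
    for _ in range(n_boot):
        boot_a = [rng.choice(pooled) for _ in ens_a]
        boot_b = [rng.choice(pooled) for _ in ens_b]
        d = euclidean_distance(summary_histogram(boot_a),
                               summary_histogram(boot_b))
        if d >= observed:
            hits += 1
    return (hits + 1) / (n_boot + 1)
```

    Here each ensemble is a list of equal-length count lists, one per cloud object. Swapping `euclidean_distance` for another distance function would give the corresponding test for that statistic.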

  8. Inferring causal relationships between phenotypes using summary statistics from genome-wide association studies.

    PubMed

    Meng, Xiang-He; Shen, Hui; Chen, Xiang-Ding; Xiao, Hong-Mei; Deng, Hong-Wen

    2018-03-01

    Genome-wide association studies (GWAS) have successfully identified numerous genetic variants associated with diverse complex phenotypes and diseases, and provided tremendous opportunities for further analyses using summary association statistics. Recently, Pickrell et al. developed a robust method for causal inference using independent putative causal SNPs. However, this method may fail to infer the causal relationship between two phenotypes when only a limited number of independent putative causal SNPs are identified. Here, we extended Pickrell's method to make it more applicable to general situations. We extended the causal inference method by replacing the putative causal SNPs with the lead SNPs (the set of the most significant SNPs in each independent locus) and tested the performance of our extended method using both simulation and empirical data. Simulations suggested that when the same number of genetic variants is used, our extended method had a similar distribution of the test statistic under the null model as well as comparable power under the causal model compared with the original method by Pickrell et al. In practice, however, our extended method would generally be more powerful because the number of independent lead SNPs was often larger than the number of independent putative causal SNPs. Including more SNPs, moreover, would not cause more false positives. By applying our extended method to summary statistics from GWAS for blood metabolites and femoral neck bone mineral density (FN-BMD), we successfully identified ten blood metabolites that may causally influence FN-BMD. We extended a causal inference method for inferring putative causal relationships between two phenotypes using summary statistics from GWAS, and identified a number of potential causal metabolites for FN-BMD, which may provide novel insights into the pathophysiological mechanisms underlying osteoporosis.

  9. Statistical inference for remote sensing-based estimates of net deforestation

    Treesearch

    Ronald E. McRoberts; Brian F. Walters

    2012-01-01

    Statistical inference requires expression of an estimate in probabilistic terms, usually in the form of a confidence interval. An approach to constructing confidence intervals for remote sensing-based estimates of net deforestation is illustrated. The approach is based on post-classification methods using two independent forest/non-forest classifications because...

  10. Marginal evidence for cosmic acceleration from Type Ia supernovae

    NASA Astrophysics Data System (ADS)

    Nielsen, J. T.; Guffanti, A.; Sarkar, S.

    2016-10-01

    The ‘standard’ model of cosmology is founded on the basis that the expansion rate of the universe is accelerating at present — as was inferred originally from the Hubble diagram of Type Ia supernovae. There exists now a much bigger database of supernovae so we can perform rigorous statistical tests to check whether these ‘standardisable candles’ indeed indicate cosmic acceleration. Taking account of the empirical procedure by which corrections are made to their absolute magnitudes to allow for the varying shape of the light curve and extinction by dust, we find, rather surprisingly, that the data are still quite consistent with a constant rate of expansion.

  11. Marginal evidence for cosmic acceleration from Type Ia supernovae

    PubMed Central

    Nielsen, J. T.; Guffanti, A.; Sarkar, S.

    2016-01-01

    The ‘standard’ model of cosmology is founded on the basis that the expansion rate of the universe is accelerating at present — as was inferred originally from the Hubble diagram of Type Ia supernovae. There exists now a much bigger database of supernovae so we can perform rigorous statistical tests to check whether these ‘standardisable candles’ indeed indicate cosmic acceleration. Taking account of the empirical procedure by which corrections are made to their absolute magnitudes to allow for the varying shape of the light curve and extinction by dust, we find, rather surprisingly, that the data are still quite consistent with a constant rate of expansion. PMID:27767125

  12. A method for modeling co-occurrence propensity of clinical codes with application to ICD-10-PCS auto-coding.

    PubMed

    Subotin, Michael; Davis, Anthony R

    2016-09-01

    Natural language processing methods for medical auto-coding, or automatic generation of medical billing codes from electronic health records, generally assign each code independently of the others. They may thus assign codes for closely related procedures or diagnoses to the same document, even when they do not tend to occur together in practice, simply because the right choice can be difficult to infer from the clinical narrative. We propose a method that injects awareness of the propensities for code co-occurrence into this process. First, a model is trained to estimate the conditional probability that one code is assigned by a human coder, given that another code is known to have been assigned to the same document. Then, at runtime, an iterative algorithm is used to apply this model to the output of an existing statistical auto-coder to modify the confidence scores of the codes. We tested this method in combination with a primary auto-coder for International Statistical Classification of Diseases-10 procedure codes, achieving a 12% relative improvement in F-score over the primary auto-coder baseline. The proposed method can be used, with appropriate features, in combination with any auto-coder that generates codes with different levels of confidence. The promising results obtained for International Statistical Classification of Diseases-10 procedure codes suggest that the proposed method may have wider applications in auto-coding. © The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  13. Detection of crossover time scales in multifractal detrended fluctuation analysis

    NASA Astrophysics Data System (ADS)

    Ge, Erjia; Leung, Yee

    2013-04-01

    Fractal analysis is employed in this paper as a scale-based method for the identification of the scaling behavior of time series. Many spatial and temporal processes exhibiting complex multi(mono)-scaling behaviors are fractals. One of the important concepts in fractals is the crossover time scale(s) that separate distinct regimes having different fractal scaling behaviors. A common method is multifractal detrended fluctuation analysis (MF-DFA). The detection of crossover time scale(s) is, however, relatively subjective since it has been made without rigorous statistical procedures and has generally been determined by eyeballing or subjective observation. Crossover time scales so determined may be spurious and problematic. They may not reflect the genuine underlying scaling behavior of a time series. The purpose of this paper is to propose a statistical procedure to model complex fractal scaling behaviors and reliably identify the crossover time scales under MF-DFA. The scaling-identification regression model, grounded on a solid statistical foundation, is first proposed to describe multi-scaling behaviors of fractals. Through the regression analysis and statistical inference, we can (1) identify the crossover time scales that cannot be detected by eyeballing, (2) determine the number and locations of the genuine crossover time scales, (3) give confidence intervals for the crossover time scales, and (4) establish the statistically significant regression model depicting the underlying scaling behavior of a time series. To substantiate our argument, the regression model is applied to analyze the multi-scaling behaviors of avian-influenza outbreaks, water consumption, daily mean temperature, and rainfall of Hong Kong. Through the proposed model, we can have a deeper understanding of fractals in general and a statistical approach to identify multi-scaling behavior under MF-DFA in particular.

  14. Thermodynamics of statistical inference by cells.

    PubMed

    Lang, Alex H; Fisher, Charles K; Mora, Thierry; Mehta, Pankaj

    2014-10-03

    The deep connection between thermodynamics, computation, and information is now well established both theoretically and experimentally. Here, we extend these ideas to show that thermodynamics also places fundamental constraints on statistical estimation and learning. To do so, we investigate the constraints placed by (nonequilibrium) thermodynamics on the ability of biochemical signaling networks to estimate the concentration of an external signal. We show that accuracy is limited by energy consumption, suggesting that there are fundamental thermodynamic constraints on statistical inference.

  15. BIOREL: the benchmark resource to estimate the relevance of the gene networks.

    PubMed

    Antonov, Alexey V; Mewes, Hans W

    2006-02-06

    The progress of high-throughput methodologies in functional genomics has led to the development of statistical procedures to infer gene networks from various types of high-throughput data. However, due to the lack of common standards, the biological significance of the results of the different studies is hard to compare. To overcome this problem we propose a benchmark procedure and have developed a web resource (BIOREL), which is useful for estimating the biological relevance of any genetic network by integrating different sources of biological information. The associations of each gene from the network are classified as biologically relevant or not. The proportion of genes in the network classified as "relevant" is used as the overall network relevance score. Employing synthetic data we demonstrated that such a score ranks the networks fairly with respect to the relevance level. Using BIOREL as the benchmark resource we compared the quality of experimental and theoretically predicted protein interaction data.

  16. Markov chain Monte Carlo techniques applied to parton distribution functions determination: Proof of concept

    NASA Astrophysics Data System (ADS)

    Gbedo, Yémalin Gabin; Mangin-Brinet, Mariane

    2017-07-01

    We present a new procedure to determine parton distribution functions (PDFs), based on Markov chain Monte Carlo (MCMC) methods. The aim of this paper is to show that we can replace the standard χ² minimization by procedures grounded on statistical methods, and on Bayesian inference in particular, thus offering additional insight into the rich field of PDFs determination. After a basic introduction to these techniques, we introduce the algorithm we have chosen to implement, namely Hybrid (or Hamiltonian) Monte Carlo. This algorithm, initially developed for Lattice QCD, turns out to be very interesting when applied to PDFs determination by global analyses; we show that it allows us to circumvent the difficulties due to the high dimensionality of the problem, in particular concerning the acceptance. A first feasibility study is performed and presented, which indicates that Markov chain Monte Carlo can successfully be applied to the extraction of PDFs and of their uncertainties.
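
    For readers unfamiliar with Hybrid (Hamiltonian) Monte Carlo, the following is a minimal one-dimensional sketch of the generic algorithm: momentum refresh, leapfrog integration, and a Metropolis accept/reject step. It is not the authors' implementation and has nothing PDF-specific in it. The function name `hmc_sample`, the step size, the number of leapfrog steps and the toy Gaussian target in the usage note are all assumptions for illustration.

```python
import math
import random


def hmc_sample(neg_log_prob, grad, x0, n_samples=2000,
               step=0.2, n_leapfrog=10, seed=0):
    """Draw samples from exp(-neg_log_prob(x)) with basic 1-D HMC.

    neg_log_prob: potential energy U(x); grad: its derivative dU/dx.
    Returns the list of samples and the acceptance rate.
    """
    rng = random.Random(seed)
    x = x0
    samples, accepted = [], 0
    for _ in range(n_samples):
        p0 = rng.gauss(0.0, 1.0)      # fresh momentum with unit mass
        x_new, p = x, p0
        # Leapfrog: half momentum step, alternating full steps, half step.
        p -= 0.5 * step * grad(x_new)
        for i in range(n_leapfrog):
            x_new += step * p
            if i < n_leapfrog - 1:
                p -= step * grad(x_new)
        p -= 0.5 * step * grad(x_new)
        # Metropolis test on the change in total energy H = U + p^2 / 2.
        h_old = neg_log_prob(x) + 0.5 * p0 * p0
        h_new = neg_log_prob(x_new) + 0.5 * p * p
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            x = x_new
            accepted += 1
        samples.append(x)
    return samples, accepted / n_samples
```

    For a standard normal target one would call `hmc_sample(lambda x: 0.5 * x * x, lambda x: x, x0=0.0)`. Because the leapfrog integrator nearly conserves the total energy, most proposals are accepted even for long trajectories, which is the property that makes the method attractive in high dimensions.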

  17. Identifying fMRI Model Violations with Lagrange Multiplier Tests

    PubMed Central

    Cassidy, Ben; Long, Christopher J; Rae, Caroline; Solo, Victor

    2013-01-01

    The standard modeling framework in Functional Magnetic Resonance Imaging (fMRI) is predicated on assumptions of linearity, time invariance and stationarity. These assumptions are rarely checked because doing so requires specialised software, although failure to do so can lead to bias and mistaken inference. Identifying model violations is an essential but largely neglected step in standard fMRI data analysis. Using Lagrange Multiplier testing methods we have developed simple and efficient procedures for detecting model violations such as non-linearity, non-stationarity and validity of the common Double Gamma specification for hemodynamic response. These procedures are computationally cheap and can easily be added to a conventional analysis. The test statistic is calculated at each voxel and displayed as a spatial anomaly map which shows regions where a model is violated. The methodology is illustrated with a large number of real data examples. PMID:22542665

  18. Proper and Paradigmatic Metonymy as a Lens for Characterizing Student Conceptions of Distributions and Sampling

    ERIC Educational Resources Information Center

    Noll, Jennifer; Hancock, Stacey

    2015-01-01

    This research investigates what students' use of statistical language can tell us about their conceptions of distribution and sampling in relation to informal inference. Prior research documents students' challenges in understanding ideas of distribution and sampling as tools for making informal statistical inferences. We know that these…

  19. Statistical inferences for data from studies conducted with an aggregated multivariate outcome-dependent sample design.

    PubMed

    Lu, Tsui-Shan; Longnecker, Matthew P; Zhou, Haibo

    2017-03-15

    The outcome-dependent sampling (ODS) scheme is a cost-effective sampling scheme where one observes the exposure with a probability that depends on the outcome. Well-known designs of this kind are the case-control design for binary response, the case-cohort design for failure time data, and the general ODS design for a continuous response. While substantial work has been carried out for the univariate response case, statistical inference and design for the ODS with multivariate cases remain under-developed. Motivated by the need in biological studies to take advantage of the available responses for subjects in a cluster, we propose a multivariate outcome-dependent sampling (multivariate-ODS) design that is based on a general selection of the continuous responses within a cluster. The proposed inference procedure for the multivariate-ODS design is semiparametric where all the underlying distributions of covariates are modeled nonparametrically using the empirical likelihood methods. We show that the proposed estimator is consistent and develop its asymptotic normality properties. Simulation studies show that the proposed estimator is more efficient than the estimator obtained using only the simple-random-sample portion of the multivariate-ODS or the estimator from a simple random sample with the same sample size. The multivariate-ODS design together with the proposed estimator provides an approach to further improve study efficiency for a given fixed study budget. We illustrate the proposed design and estimator with an analysis of association of polychlorinated biphenyl exposure to hearing loss in children born to the Collaborative Perinatal Study. Copyright © 2016 John Wiley & Sons, Ltd.

  20. Multi-Agent Inference in Social Networks: A Finite Population Learning Approach

    PubMed Central

    Tong, Xin; Zeng, Yao

    2016-01-01

    When people in a society want to make inference about some parameter, each person may want to use data collected by other people. Information (data) exchange in social networks is usually costly, so to make reliable statistical decisions, people need to trade off the benefits and costs of information acquisition. Conflicts of interests and coordination problems will arise in the process. Classical statistics does not consider people’s incentives and interactions in the data collection process. To address this imperfection, this work explores multi-agent Bayesian inference problems with a game theoretic social network model. Motivated by our interest in aggregate inference at the societal level, we propose a new concept, finite population learning, to address whether with high probability, a large fraction of people in a given finite population network can make “good” inference. Serving as a foundation, this concept enables us to study the long run trend of aggregate inference quality as population grows. PMID:27076691

  1. Robust inference from multiple test statistics via permutations: a better alternative to the single test statistic approach for randomized trials.

    PubMed

    Ganju, Jitendra; Yu, Xinxin; Ma, Guoguang Julie

    2013-01-01

    Formal inference in randomized clinical trials is based on controlling the type I error rate associated with a single pre-specified statistic. The deficiency of using just one method of analysis is that it depends on assumptions that may not be met. For robust inference, we propose pre-specifying multiple test statistics and relying on the minimum p-value for testing the null hypothesis of no treatment effect. The null hypothesis associated with the various test statistics is that the treatment groups are indistinguishable. The critical value for hypothesis testing comes from permutation distributions. Rejection of the null hypothesis when the smallest p-value is less than the critical value controls the type I error rate at its designated value. Even if one of the candidate test statistics has low power, the adverse effect on the power of the minimum p-value statistic is not much. Its use is illustrated with examples. We conclude that it is better to rely on the minimum p-value rather than a single statistic particularly when that single statistic is the logrank test, because of the cost and complexity of many survival trials. Copyright © 2013 John Wiley & Sons, Ltd.
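
    The minimum p-value idea in this abstract can be sketched for a two-arm trial as follows. The abstract does not list the candidate statistics or the exact algorithm, so the two statistics used here (absolute difference in means and in medians), the function names, and the Monte Carlo permutation scheme are assumptions for illustration. The key point is that the smallest per-statistic p-value is itself referred to its permutation distribution.

```python
import random
import statistics


def abs_mean_diff(x, y):
    return abs(statistics.mean(x) - statistics.mean(y))


def abs_median_diff(x, y):
    return abs(statistics.median(x) - statistics.median(y))


def min_p_permutation_test(treat, control,
                           stats=(abs_mean_diff, abs_median_diff),
                           n_perm=499, seed=0):
    """Return (observed minimum p-value, permutation-adjusted p-value)."""
    rng = random.Random(seed)
    pooled = list(treat) + list(control)
    n_t = len(treat)
    # Row 0 holds the observed statistics; later rows hold permuted ones.
    table = [[s(treat, control) for s in stats]]
    for _ in range(n_perm):
        rng.shuffle(pooled)
        table.append([s(pooled[:n_t], pooled[n_t:]) for s in stats])
    n_rows = len(table)

    def min_p(row):
        # Per-statistic permutation p-value of this row, then the minimum.
        return min(
            sum(other[k] >= row[k] - 1e-12 for other in table) / n_rows
            for k in range(len(stats))
        )

    min_ps = [min_p(row) for row in table]
    observed = min_ps[0]
    # Compare the observed minimum p-value with its permutation distribution.
    adjusted = sum(mp <= observed + 1e-12 for mp in min_ps) / n_rows
    return observed, adjusted
```

    The adjusted p-value is never smaller than the observed minimum p-value, which reflects the price paid for choosing the best of several statistics after seeing the data.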

  2. A Protein Standard That Emulates Homology for the Characterization of Protein Inference Algorithms.

    PubMed

    The, Matthew; Edfors, Fredrik; Perez-Riverol, Yasset; Payne, Samuel H; Hoopmann, Michael R; Palmblad, Magnus; Forsström, Björn; Käll, Lukas

    2018-05-04

    A natural way to benchmark the performance of an analytical experimental setup is to use samples of known composition and see to what degree one can correctly infer the content of such a sample from the data. For shotgun proteomics, one of the inherent problems of interpreting data is that the measured analytes are peptides and not the actual proteins themselves. As some proteins share proteolytic peptides, there might be more than one possible causative set of proteins resulting in a given set of peptides and there is a need for mechanisms that infer proteins from lists of detected peptides. A weakness of commercially available samples of known content is that they consist of proteins that are deliberately selected for producing tryptic peptides that are unique to a single protein. Unfortunately, such samples do not expose any complications in protein inference. Hence, for a realistic benchmark of protein inference procedures, there is a need for samples of known content where the present proteins share peptides with known absent proteins. Here, we present such a standard, that is based on E. coli expressed human protein fragments. To illustrate the application of this standard, we benchmark a set of different protein inference procedures on the data. We observe that inference procedures excluding shared peptides provide more accurate estimates of errors compared to methods that include information from shared peptides, while still giving a reasonable performance in terms of the number of identified proteins. We also demonstrate that using a sample of known protein content without proteins with shared tryptic peptides can give a false sense of accuracy for many protein inference methods.

  3. Bayesian inference of physiologically meaningful parameters from body sway measurements.

    PubMed

    Tietäväinen, A; Gutmann, M U; Keski-Vakkuri, E; Corander, J; Hæggström, E

    2017-06-19

    The control of the human body sway by the central nervous system, muscles, and conscious brain is of interest since body sway carries information about the physiological status of a person. Several models have been proposed to describe body sway in an upright standing position, however, due to the statistical intractability of the more realistic models, no formal parameter inference has previously been conducted and the expressive power of such models for real human subjects remains unknown. Using the latest advances in Bayesian statistical inference for intractable models, we fitted a nonlinear control model to posturographic measurements, and we showed that it can accurately predict the sway characteristics of both simulated and real subjects. Our method provides a full statistical characterization of the uncertainty related to all model parameters as quantified by posterior probability density functions, which is useful for comparisons across subjects and test settings. The ability to infer intractable control models from sensor data opens new possibilities for monitoring and predicting body status in health applications.

  4. Research participant compensation: A matter of statistical inference as well as ethics.

    PubMed

    Swanson, David M; Betensky, Rebecca A

    2015-11-01

    The ethics of compensation of research subjects for participation in clinical trials has been debated for years. One ethical issue of concern is variation among subjects in the level of compensation for identical treatments. Surprisingly, the impact of variation on the statistical inferences made from trial results has not been examined. We seek to identify how variation in compensation may influence any existing dependent censoring in clinical trials, thereby also influencing inference about the survival curve, hazard ratio, or other measures of treatment efficacy. In simulation studies, we consider a model for how compensation structure may influence the censoring model. Under existing dependent censoring, we estimate survival curves under different compensation structures and observe how these structures induce variability in the estimates. We show through this model that if the compensation structure affects the censoring model and dependent censoring is present, then variation in that structure induces variation in the estimates and affects the accuracy of estimation and inference on treatment efficacy. From the perspectives of both ethics and statistical inference, standardization and transparency in the compensation of participants in clinical trials are warranted. Copyright © 2015 Elsevier Inc. All rights reserved.

  5. Pre-Service Mathematics Teachers' Use of Probability Models in Making Informal Inferences about a Chance Game

    ERIC Educational Resources Information Center

    Kazak, Sibel; Pratt, Dave

    2017-01-01

    This study considers probability models as tools for both making informal statistical inferences and building stronger conceptual connections between data and chance topics in teaching statistics. In this paper, we aim to explore pre-service mathematics teachers' use of probability models for a chance game, where the sum of two dice matters in…

  6. Phylogeography Takes a Relaxed Random Walk in Continuous Space and Time

    PubMed Central

    Lemey, Philippe; Rambaut, Andrew; Welch, John J.; Suchard, Marc A.

    2010-01-01

    Research aimed at understanding the geographic context of evolutionary histories is burgeoning across biological disciplines. Recent endeavors attempt to interpret contemporaneous genetic variation in the light of increasingly detailed geographical and environmental observations. Such interest has promoted the development of phylogeographic inference techniques that explicitly aim to integrate such heterogeneous data. One promising development involves reconstructing phylogeographic history on a continuous landscape. Here, we present a Bayesian statistical approach to infer continuous phylogeographic diffusion using random walk models while simultaneously reconstructing the evolutionary history in time from molecular sequence data. Moreover, by accommodating branch-specific variation in dispersal rates, we relax the most restrictive assumption of the standard Brownian diffusion process and demonstrate increased statistical efficiency in spatial reconstructions of overdispersed random walks by analyzing both simulated and real viral genetic data. We further illustrate how drawing inference about summary statistics from a fully specified stochastic process over both sequence evolution and spatial movement reveals important characteristics of a rabies epidemic. Together with recent advances in discrete phylogeographic inference, the continuous model developments furnish a flexible statistical framework for biogeographical reconstructions that is easily expanded upon to accommodate various landscape genetic features. PMID:20203288

  7. Variation in reaction norms: Statistical considerations and biological interpretation.

    PubMed

    Morrissey, Michael B; Liefting, Maartje

    2016-09-01

    Analysis of reaction norms, the functions by which the phenotype produced by a given genotype depends on the environment, is critical to studying many aspects of phenotypic evolution. Different techniques are available for quantifying different aspects of reaction norm variation. We examine what biological inferences can be drawn from some of the more readily applicable analyses for studying reaction norms. We adopt a strongly biologically motivated view, but draw on statistical theory to highlight strengths and drawbacks of different techniques. In particular, consideration of some formal statistical theory leads to revision of some recently, and forcefully, advocated opinions on reaction norm analysis. We clarify what simple analysis of the slope between mean phenotype in two environments can tell us about reaction norms, explore the conditions under which polynomial regression can provide robust inferences about reaction norm shape, and explore how different existing approaches may be used to draw inferences about variation in reaction norm shape. We show how mixed model-based approaches can provide more robust inferences than more commonly used multistep statistical approaches, and derive new metrics of the relative importance of variation in reaction norm intercepts, slopes, and curvatures. © 2016 The Author(s). Evolution © 2016 The Society for the Study of Evolution.

  8. On the Ability To Infer Deficiency in Mathematics From Performance in Physics Using Hierarchies

    ERIC Educational Resources Information Center

    Riban, David M.

    1971-01-01

    Presents the procedures, results, and conclusions of a study designed to see if mathematical deficiencies can be inferred from PSSC students' performance by using a hierarchical model of requisite skills. Assuming inferences were possible, remediation was given. No effect due to remediation was observed but analysis indicated incidental learning…

  9. The Role of Probability-Based Inference in an Intelligent Tutoring System.

    ERIC Educational Resources Information Center

    Mislevy, Robert J.; Gitomer, Drew H.

    Probability-based inference in complex networks of interdependent variables is an active topic in statistical research, spurred by such diverse applications as forecasting, pedigree analysis, troubleshooting, and medical diagnosis. This paper concerns the role of Bayesian inference networks for updating student models in intelligent tutoring…

  10. Boosting Bayesian parameter inference of stochastic differential equation models with methods from statistical physics

    NASA Astrophysics Data System (ADS)

    Albert, Carlo; Ulzega, Simone; Stoop, Ruedi

    2016-04-01

    Measured time-series of both precipitation and runoff are known to exhibit highly non-trivial statistical properties. For making reliable probabilistic predictions in hydrology, it is therefore desirable to have stochastic models with output distributions that share these properties. When parameters of such models have to be inferred from data, we also need to quantify the associated parametric uncertainty. For non-trivial stochastic models, however, this latter step is typically very demanding, both conceptually and numerically, and almost never done in hydrology. Here, we demonstrate that methods developed in statistical physics make a large class of stochastic differential equation (SDE) models amenable to a full-fledged Bayesian parameter inference. For concreteness we demonstrate these methods by means of a simple yet non-trivial toy SDE model. We consider a natural catchment that can be described by a linear reservoir, at the scale of observation. All the neglected processes are assumed to happen at much shorter time-scales and are therefore modeled with a Gaussian white noise term, the standard deviation of which is assumed to scale linearly with the system state (water volume in the catchment). Even for constant input, the outputs of this simple non-linear SDE model show a wealth of desirable statistical properties, such as fat-tailed distributions and long-range correlations. Standard algorithms for Bayesian inference fail, for models of this kind, because their likelihood functions are extremely high-dimensional intractable integrals over all possible model realizations. The use of Kalman filters is illegitimate due to the non-linearity of the model. Particle filters could be used but become increasingly inefficient with a growing number of data points. Hamiltonian Monte Carlo algorithms allow us to translate this inference problem to the problem of simulating the dynamics of a statistical mechanics system and give us access to the most sophisticated methods that have been developed in the statistical physics community over the last few decades. We demonstrate that such methods, along with automated differentiation algorithms, allow us to perform a full-fledged Bayesian inference, for a large class of SDE models, in a highly efficient and largely automatized manner. Furthermore, our algorithm is highly parallelizable. For our toy model, discretized with a few hundred points, a full Bayesian inference can be performed in a matter of seconds on a standard PC.
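
    The toy model described in this abstract (a linear reservoir whose noise amplitude scales with the stored water volume) can be illustrated with a forward simulation. The sketch below is not the authors' code and does not perform the Hamiltonian Monte Carlo inference; it only generates one realization with the Euler-Maruyama scheme. The exact drift form dV = (inflow - V/k) dt + sigma V dW, all parameter values, and the function name simulate_reservoir are assumptions made for illustration.

```python
import math
import random


def simulate_reservoir(v0, inflow, k, sigma, dt, n_steps, seed=0):
    """Euler-Maruyama path of an assumed linear-reservoir SDE.

    Assumed model (illustrative, not taken from the paper):
        dV = (inflow - V / k) dt + sigma * V dW
    where V is the water volume, k the residence time, and the noise
    standard deviation scales linearly with V.
    """
    rng = random.Random(seed)
    v = v0
    path = [v]
    for _ in range(n_steps):
        # Brownian increment over one time step: N(0, dt)
        dw = rng.gauss(0.0, math.sqrt(dt))
        v = v + (inflow - v / k) * dt + sigma * v * dw
        # Crude guard: the discretized path may overshoot below zero
        v = max(v, 0.0)
        path.append(v)
    return path


# Hypothetical parameter values, chosen only to produce a short example path
path = simulate_reservoir(v0=1.0, inflow=1.0, k=1.0, sigma=0.3, dt=0.01, n_steps=500)
runoff = [v / 1.0 for v in path]  # linear reservoir: runoff = V / k, with k = 1.0 here
```

    With sigma set to zero the path relaxes to the deterministic steady state inflow * k, which is a convenient sanity check on the discretization.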

  11. An inferentialist perspective on the coordination of actions and reasons involved in making a statistical inference

    NASA Astrophysics Data System (ADS)

    Bakker, Arthur; Ben-Zvi, Dani; Makar, Katie

    2017-12-01

    To understand how statistical and other types of reasoning are coordinated with actions to reduce uncertainty, we conducted a case study in vocational education that involved statistical hypothesis testing. We analyzed an intern's research project in a hospital laboratory in which reducing uncertainties was crucial to make a valid statistical inference. In his project, the intern, Sam, investigated whether patients' blood could be sent through pneumatic post without influencing the measurement of particular blood components. We asked, in the process of making a statistical inference, how are reasons and actions coordinated to reduce uncertainty? For the analysis, we used the semantic theory of inferentialism, specifically, the concept of webs of reasons and actions—complexes of interconnected reasons for facts and actions; these reasons include premises and conclusions, inferential relations, implications, motives for action, and utility of tools for specific purposes in a particular context. Analysis of interviews with Sam, his supervisor and teacher as well as video data of Sam in the classroom showed that many of Sam's actions aimed to reduce variability, rule out errors, and thus reduce uncertainties so as to arrive at a valid inference. Interestingly, the decisive factor was not the outcome of a t test but of the reference change value, a clinical chemical measure of analytic and biological variability. With insights from this case study, we expect that students can be better supported in connecting statistics with context and in dealing with uncertainty.

  12. 77 FR 62350 - Practices and Procedures

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-10-12

    ... inference against a respondent agency with respect to non-appearance of any employee not under its control... inference in favor of the requesting party with regard to the information sought. The existing regulation...

  13. Analysis of Statistical Methods and Errors in the Articles Published in the Korean Journal of Pain

    PubMed Central

    Yim, Kyoung Hoon; Han, Kyoung Ah; Park, Soo Young

    2010-01-01

    Background: Statistical analysis is essential in regard to obtaining objective reliability for medical research. However, medical researchers do not have enough statistical knowledge to properly analyze their study data. To help understand and potentially alleviate this problem, we have analyzed the statistical methods and errors of articles published in the Korean Journal of Pain (KJP), with the intention to improve the statistical quality of the journal. Methods: All the articles, except case reports and editorials, published from 2004 to 2008 in the KJP were reviewed. The types of applied statistical methods and errors in the articles were evaluated. Results: One hundred and thirty-nine original articles were reviewed. Inferential statistics and descriptive statistics were used in 119 papers and 20 papers, respectively. Only 20.9% of the papers were free from statistical errors. The most commonly adopted statistical method was the t-test (21.0%) followed by the chi-square test (15.9%). Errors of omission were encountered 101 times in 70 papers. Among the errors of omission, "no statistics used even though statistical methods were required" was the most common (40.6%). The errors of commission were encountered 165 times in 86 papers, among which "parametric inference for nonparametric data" was the most common (33.9%). Conclusions: We found various types of statistical errors in the articles published in the KJP. This suggests that meticulous attention should be given not only to applying statistical procedures but also to the reviewing process to improve the value of the article. PMID:20552071

  14. PyClone: statistical inference of clonal population structure in cancer.

    PubMed

    Roth, Andrew; Khattra, Jaswinder; Yap, Damian; Wan, Adrian; Laks, Emma; Biele, Justina; Ha, Gavin; Aparicio, Samuel; Bouchard-Côté, Alexandre; Shah, Sohrab P

    2014-04-01

    We introduce PyClone, a statistical model for inference of clonal population structures in cancers. PyClone is a Bayesian clustering method for grouping sets of deeply sequenced somatic mutations into putative clonal clusters while estimating their cellular prevalences and accounting for allelic imbalances introduced by segmental copy-number changes and normal-cell contamination. Single-cell sequencing validation demonstrates PyClone's accuracy.

  15. Fair Inference on Outcomes

    PubMed Central

    Nabi, Razieh; Shpitser, Ilya

    2017-01-01

    In this paper, we consider the problem of fair statistical inference involving outcome variables. Examples include classification and regression problems, and estimating treatment effects in randomized trials or observational data. The issue of fairness arises in such problems where some covariates or treatments are “sensitive,” in the sense of having potential of creating discrimination. In this paper, we argue that the presence of discrimination can be formalized in a sensible way as the presence of an effect of a sensitive covariate on the outcome along certain causal pathways, a view which generalizes (Pearl 2009). A fair outcome model can then be learned by solving a constrained optimization problem. We discuss a number of complications that arise in classical statistical inference due to this view and provide workarounds based on recent work in causal and semi-parametric inference.

  16. P values are only an index to evidence: 20th- vs. 21st-century statistical science.

    PubMed

    Burnham, K P; Anderson, D R

    2014-03-01

    Early statistical methods focused on pre-data probability statements (i.e., data as random variables) such as P values; these are not really inferences nor are P values evidential. Statistical science clung to these principles throughout much of the 20th century as a wide variety of methods were developed for special cases. Looking back, it is clear that the underlying paradigm (i.e., testing and P values) was weak. As Kuhn (1970) suggests, new paradigms have taken the place of earlier ones: this is a goal of good science. New methods have been developed and older methods extended and these allow proper measures of strength of evidence and multimodel inference. It is time to move forward with sound theory and practice for the difficult practical problems that lie ahead. Given data, the useful foundation shifts to post-data probability statements such as model probabilities (Akaike weights) or related quantities such as odds ratios and likelihood intervals. These new methods allow formal inference from multiple models in the a priori set. These quantities are properly evidential. The past century was aimed at finding the "best" model and making inferences from it. The goal in the 21st century is to base inference on all the models weighted by their model probabilities (model averaging). Estimates of precision can include model selection uncertainty leading to variances conditional on the model set. The 21st century will be about the quantification of information, proper measures of evidence, and multi-model inference. Nelder (1999:261) concludes, "The most important task before us in developing statistical science is to demolish the P-value culture, which has taken root to a frightening extent in many areas of both pure and applied science and technology".
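
    The model probabilities (Akaike weights) and model averaging mentioned in this abstract follow a standard recipe: each model's AIC difference from the best model is converted to a relative likelihood exp(-delta/2), and the relative likelihoods are normalized to sum to one. The sketch below is a minimal illustration of that recipe; the AIC scores, the per-model estimates, and the function names are hypothetical and are not taken from the article.

```python
import math


def akaike_weights(aic_values):
    """Return Akaike weights for a list of AIC scores, one per candidate model."""
    best = min(aic_values)
    # delta_i = AIC_i - AIC_min; relative likelihood of model i is exp(-delta_i / 2)
    rel_lik = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel_lik)
    return [r / total for r in rel_lik]


def model_average(estimates, weights):
    """Weighted average of per-model estimates of the same quantity."""
    return sum(w * e for w, e in zip(weights, estimates))


# Hypothetical AIC scores and parameter estimates for three candidate models
aic = [100.0, 102.0, 110.0]
estimates = [1.20, 1.50, 0.90]

weights = akaike_weights(aic)
averaged = model_average(estimates, weights)
print(weights, averaged)
```

    Subtracting the minimum AIC before exponentiating leaves the weights unchanged and avoids numerical underflow when the raw AIC scores are large.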

  17. Statistical inference of the generation probability of T-cell receptors from sequence repertoires.

    PubMed

    Murugan, Anand; Mora, Thierry; Walczak, Aleksandra M; Callan, Curtis G

    2012-10-02

    Stochastic rearrangement of germline V-, D-, and J-genes to create variable coding sequence for certain cell surface receptors is at the origin of immune system diversity. This process, known as "VDJ recombination", is implemented via a series of stochastic molecular events involving gene choices and random nucleotide insertions between, and deletions from, genes. We use large sequence repertoires of the variable CDR3 region of human CD4+ T-cell receptor beta chains to infer the statistical properties of these basic biochemical events. Because any given CDR3 sequence can be produced in multiple ways, the probability distribution of hidden recombination events cannot be inferred directly from the observed sequences; we therefore develop a maximum likelihood inference method to achieve this end. To separate the properties of the molecular rearrangement mechanism from the effects of selection, we focus on nonproductive CDR3 sequences in T-cell DNA. We infer the joint distribution of the various generative events that occur when a new T-cell receptor gene is created. We find a rich picture of correlation (and absence thereof), providing insight into the molecular mechanisms involved. The generative event statistics are consistent between individuals, suggesting a universal biochemical process. Our probabilistic model predicts the generation probability of any specific CDR3 sequence by the primitive recombination process, allowing us to quantify the potential diversity of the T-cell repertoire and to understand why some sequences are shared between individuals. We argue that the use of formal statistical inference methods, of the kind presented in this paper, will be essential for quantitative understanding of the generation and evolution of diversity in the adaptive immune system.

  18. Statistical inference methods for two crossing survival curves: a comparison of methods.

    PubMed

    Li, Huimin; Han, Dong; Hou, Yawen; Chen, Huilin; Chen, Zheng

    2015-01-01

    A common problem that is encountered in medical applications is the overall homogeneity of survival distributions when two survival curves cross each other. A survey demonstrated that under this condition, which was an obvious violation of the assumption of proportional hazard rates, the log-rank test was still used in 70% of studies. Several statistical methods have been proposed to solve this problem. However, in many applications, it is difficult to specify the types of survival differences and choose an appropriate method prior to analysis. Thus, we conducted an extensive series of Monte Carlo simulations to investigate the power and type I error rate of these procedures under various patterns of crossing survival curves with different censoring rates and distribution parameters. Our objective was to evaluate the strengths and weaknesses of tests in different situations and for various censoring rates and to recommend an appropriate test that will not fail for a wide range of applications. Simulation studies demonstrated that adaptive Neyman's smooth tests and the two-stage procedure offer higher power and greater stability than other methods when the survival distributions cross at early, middle or late times. Even for proportional hazards, both methods maintain acceptable power compared with the log-rank test. In terms of the type I error rate, Renyi and Cramér-von Mises tests are relatively conservative, whereas the statistics of the Lin-Xu test exhibit apparent inflation as the censoring rate increases. Other tests produce results close to the nominal 0.05 level. In conclusion, adaptive Neyman's smooth tests and the two-stage procedure are found to be the most stable and feasible approaches for a variety of situations and censoring rates. Therefore, they are applicable to a wider spectrum of alternatives compared with other tests.

  19. Statistical Inference Methods for Two Crossing Survival Curves: A Comparison of Methods

    PubMed Central

    Li, Huimin; Han, Dong; Hou, Yawen; Chen, Huilin; Chen, Zheng

    2015-01-01

    A common problem that is encountered in medical applications is the overall homogeneity of survival distributions when two survival curves cross each other. A survey demonstrated that under this condition, which was an obvious violation of the assumption of proportional hazard rates, the log-rank test was still used in 70% of studies. Several statistical methods have been proposed to solve this problem. However, in many applications, it is difficult to specify the types of survival differences and choose an appropriate method prior to analysis. Thus, we conducted an extensive series of Monte Carlo simulations to investigate the power and type I error rate of these procedures under various patterns of crossing survival curves with different censoring rates and distribution parameters. Our objective was to evaluate the strengths and weaknesses of tests in different situations and for various censoring rates and to recommend an appropriate test that will not fail for a wide range of applications. Simulation studies demonstrated that adaptive Neyman’s smooth tests and the two-stage procedure offer higher power and greater stability than other methods when the survival distributions cross at early, middle or late times. Even for proportional hazards, both methods maintain acceptable power compared with the log-rank test. In terms of the type I error rate, Renyi and Cramér-von Mises tests are relatively conservative, whereas the statistics of the Lin-Xu test exhibit apparent inflation as the censoring rate increases. Other tests produce results close to the nominal 0.05 level. In conclusion, adaptive Neyman’s smooth tests and the two-stage procedure are found to be the most stable and feasible approaches for a variety of situations and censoring rates. Therefore, they are applicable to a wider spectrum of alternatives compared with other tests. PMID:25615624

  20. Statistical Inference and Reverse Engineering of Gene Regulatory Networks from Observational Expression Data

    PubMed Central

    Emmert-Streib, Frank; Glazko, Galina V.; Altay, Gökmen; de Matos Simoes, Ricardo

    2012-01-01

    In this paper, we present a systematic and conceptual overview of methods for inferring gene regulatory networks from observational gene expression data. Further, we discuss two classic approaches to infer causal structures and compare them with contemporary methods by providing a conceptual categorization thereof. We complement the above by surveying global and local evaluation measures for assessing the performance of inference algorithms. PMID:22408642

  1. Information Entropy Production of Maximum Entropy Markov Chains from Spike Trains

    NASA Astrophysics Data System (ADS)

    Cofré, Rodrigo; Maldonado, Cesar

    2018-01-01

    We consider the maximum entropy Markov chain inference approach to characterize the collective statistics of neuronal spike trains, focusing on the statistical properties of the inferred model. We review large deviations techniques useful in this context to describe properties of accuracy and convergence in terms of sampling size. We use these results to study the statistical fluctuation of correlations, distinguishability and irreversibility of maximum entropy Markov chains. We illustrate these applications using simple examples where the large deviation rate function is explicitly obtained for maximum entropy models of relevance in this field.

  2. Weighted analysis of composite endpoints with simultaneous inference for flexible weight constraints.

    PubMed

    Duc, Anh Nguyen; Wolbers, Marcel

    2017-02-10

    Composite endpoints are widely used as primary endpoints of randomized controlled trials across clinical disciplines. A common critique of the conventional analysis of composite endpoints is that all disease events are weighted equally, whereas their clinical relevance may differ substantially. We address this by introducing a framework for the weighted analysis of composite endpoints and interpretable test statistics, which are applicable to both binary and time-to-event data. To cope with the difficulty of selecting an exact set of weights, we propose a method for constructing simultaneous confidence intervals and tests that asymptotically preserve the family-wise type I error in the strong sense across families of weights satisfying flexible inequality or order constraints based on the theory of chi-bar-squared distributions. We show that the method achieves the nominal simultaneous coverage rate with substantial efficiency gains over Scheffé's procedure in a simulation study and apply it to trials in cardiovascular disease and enteric fever. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

  3. Structured statistical models of inductive reasoning.

    PubMed

    Kemp, Charles; Tenenbaum, Joshua B

    2009-01-01

    Everyday inductive inferences are often guided by rich background knowledge. Formal models of induction should aim to incorporate this knowledge and should explain how different kinds of knowledge lead to the distinctive patterns of reasoning found in different inductive contexts. This article presents a Bayesian framework that attempts to meet both goals and describes 4 applications of the framework: a taxonomic model, a spatial model, a threshold model, and a causal model. Each model makes probabilistic inferences about the extensions of novel properties, but the priors for the 4 models are defined over different kinds of structures that capture different relationships between the categories in a domain. The framework therefore shows how statistical inference can operate over structured background knowledge, and the authors argue that this interaction between structure and statistics is critical for explaining the power and flexibility of human reasoning.

  4. Inference on network statistics by restricting to the network space: applications to sexual history data.

    PubMed

    Goyal, Ravi; De Gruttola, Victor

    2018-01-30

    Analysis of sexual history data intended to describe sexual networks presents many challenges arising from the fact that most surveys collect information on only a very small fraction of the population of interest. In addition, partners are rarely identified and responses are subject to reporting biases. Typically, each network statistic of interest, such as mean number of sexual partners for men or women, is estimated independently of other network statistics. There is, however, a complex relationship among network statistics, and knowledge of these relationships can aid in addressing the concerns mentioned earlier. We develop a novel method that constrains a posterior predictive distribution of a collection of network statistics in order to leverage the relationships among network statistics in making inference about network properties of interest. The method ensures that inference on network properties is compatible with an actual network. Through extensive simulation studies, we also demonstrate that use of this method can improve estimates in settings where there is uncertainty that arises both from sampling and from systematic reporting bias compared with currently available approaches to estimation. To illustrate the method, we apply it to estimate network statistics using data from the Chicago Health and Social Life Survey. Copyright © 2017 John Wiley & Sons, Ltd.

  5. Statistical comparison of a hybrid approach with approximate and exact inference models for Fusion 2+

    NASA Astrophysics Data System (ADS)

    Lee, K. David; Wiesenfeld, Eric; Gelfand, Andrew

    2007-04-01

    One of the greatest challenges in modern combat is maintaining a high level of timely Situational Awareness (SA). In many situations, computational complexity and accuracy considerations make the development and deployment of real-time, high-level inference tools very difficult. An innovative hybrid framework that combines Bayesian inference, in the form of Bayesian Networks, and Possibility Theory, in the form of Fuzzy Logic systems, has recently been introduced to provide a rigorous framework for high-level inference. In previous research, the theoretical basis and benefits of the hybrid approach have been developed. However, a concrete experimental comparison of the hybrid framework with traditional fusion methods, one that demonstrates and quantifies this benefit, is still lacking. The goal of this research, therefore, is to provide a statistical analysis on the comparison of the accuracy and performance of hybrid network theory, with pure Bayesian and Fuzzy systems and an inexact Bayesian system approximated using Particle Filtering. To accomplish this task, domain specific models will be developed under these different theoretical approaches and then evaluated, via Monte Carlo Simulation, in comparison to situational ground truth to measure accuracy and fidelity. Following this, a rigorous statistical analysis of the performance results will be performed, to quantify the benefit of hybrid inference over other fusion tools.

  6. On the analysis of very small samples of Gaussian repeated measurements: an alternative approach.

    PubMed

    Westgate, Philip M; Burchett, Woodrow W

    2017-03-15

    The analysis of very small samples of Gaussian repeated measurements can be challenging. First, due to a very small number of independent subjects contributing outcomes over time, statistical power can be quite small. Second, nuisance covariance parameters must be appropriately accounted for in the analysis in order to maintain the nominal test size. However, available statistical strategies that ensure valid statistical inference may lack power, whereas more powerful methods may have the potential for inflated test sizes. Therefore, we explore an alternative approach to the analysis of very small samples of Gaussian repeated measurements, with the goal of maintaining valid inference while also improving statistical power relative to other valid methods. This approach uses generalized estimating equations with a bias-corrected empirical covariance matrix that accounts for all small-sample aspects of nuisance correlation parameter estimation in order to maintain valid inference. Furthermore, the approach utilizes correlation selection strategies with the goal of choosing the working structure that will result in the greatest power. In our study, we show that when accurate modeling of the nuisance correlation structure impacts the efficiency of regression parameter estimation, this method can improve power relative to existing methods that yield valid inference. Copyright © 2017 John Wiley & Sons, Ltd.

  7. Statistical inference for noisy nonlinear ecological dynamic systems.

    PubMed

    Wood, Simon N

    2010-08-26

    Chaotic ecological dynamic systems defy conventional statistical analysis. Systems with near-chaotic dynamics are little better. Such systems are almost invariably driven by endogenous dynamic processes plus demographic and environmental process noise, and are only observable with error. Their sensitivity to history means that minute changes in the driving noise realization, or the system parameters, will cause drastic changes in the system trajectory. This sensitivity is inherited and amplified by the joint probability density of the observable data and the process noise, rendering it useless as the basis for obtaining measures of statistical fit. Because the joint density is the basis for the fit measures used by all conventional statistical methods, this is a major theoretical shortcoming. The inability to make well-founded statistical inferences about biological dynamic models in the chaotic and near-chaotic regimes, other than on an ad hoc basis, leaves dynamic theory without the methods of quantitative validation that are essential tools in the rest of biological science. Here I show that this impasse can be resolved in a simple and general manner, using a method that requires only the ability to simulate the observed data on a system from the dynamic model about which inferences are required. The raw data series are reduced to phase-insensitive summary statistics, quantifying local dynamic structure and the distribution of observations. Simulation is used to obtain the mean and the covariance matrix of the statistics, given model parameters, allowing the construction of a 'synthetic likelihood' that assesses model fit. This likelihood can be explored using a straightforward Markov chain Monte Carlo sampler, but one further post-processing step returns pure likelihood-based inference. I apply the method to establish the dynamic nature of the fluctuations in Nicholson's classic blowfly experiments.
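
    The synthetic-likelihood recipe described above (simulate, reduce to summary statistics, estimate their mean and covariance, evaluate a Gaussian density at the observed summaries) can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the noisy Ricker map, the three summary statistics, the parameter values, the ridge term and all function names are hypothetical choices made here for demonstration.

```python
import numpy as np


def simulate_ricker(log_r, sigma, phi, n_steps, rng):
    """Simulate Poisson-observed counts from a noisy Ricker map (toy model)."""
    n = 1.0  # hypothetical initial population density
    y = np.empty(n_steps)
    for t in range(n_steps):
        # Process noise enters the dynamics; observation error is Poisson.
        n = np.exp(log_r) * n * np.exp(-n + sigma * rng.standard_normal())
        y[t] = rng.poisson(phi * n)
    return y


def summaries(y):
    """Phase-insensitive statistics: mean, spread, lag-1 autocovariance."""
    centred = y - y.mean()
    return np.array([y.mean(), y.std(), np.mean(centred[1:] * centred[:-1])])


def synthetic_loglik(log_r, s_obs, rng, n_sims=200, n_steps=100,
                     sigma=0.3, phi=10.0, ridge=1e-6):
    """Gaussian log-likelihood of the observed summaries under simulated ones."""
    sims = np.array([summaries(simulate_ricker(log_r, sigma, phi, n_steps, rng))
                     for _ in range(n_sims)])
    mu = sims.mean(axis=0)
    # A small ridge keeps the covariance invertible (illustrative safeguard).
    cov = np.cov(sims, rowvar=False) + ridge * np.eye(sims.shape[1])
    diff = s_obs - mu
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (quad + logdet + len(diff) * np.log(2.0 * np.pi))


rng = np.random.default_rng(1)
y_obs = simulate_ricker(3.8, 0.3, 10.0, 100, rng)  # stand-in for real data
s_obs = summaries(y_obs)
ll_true = synthetic_loglik(3.8, s_obs, rng)   # parameter that generated y_obs
ll_wrong = synthetic_loglik(1.0, s_obs, rng)  # a badly mis-specified value
print(ll_true, ll_wrong)
```

    In a full analysis the function `synthetic_loglik` would be called inside a Markov chain Monte Carlo sampler over all model parameters; here only `log_r` is varied to keep the sketch short.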

  8. Statistical inference on censored data for targeted clinical trials under enrichment design.

    PubMed

    Chen, Chen-Fang; Lin, Jr-Rung; Liu, Jen-Pei

    2013-01-01

    For the traditional clinical trials, inclusion and exclusion criteria are usually based on some clinical endpoints; the genetic or genomic variability of the trial participants are not totally utilized in the criteria. After completion of the human genome project, the disease targets at the molecular level can be identified and can be utilized for the treatment of diseases. However, the accuracy of diagnostic devices for identification of such molecular targets is usually not perfect. Some of the patients enrolled in targeted clinical trials with a positive result for the molecular target might not have the specific molecular targets. As a result, the treatment effect may be underestimated in the patient population truly with the molecular target. To resolve this issue, under the exponential distribution, we develop inferential procedures for the treatment effects of the targeted drug based on the censored endpoints in the patients truly with the molecular targets. Under an enrichment design, we propose using the expectation-maximization algorithm in conjunction with the bootstrap technique to incorporate the inaccuracy of the diagnostic device for detection of the molecular targets on the inference of the treatment effects. A simulation study was conducted to empirically investigate the performance of the proposed methods. Simulation results demonstrate that under the exponential distribution, the proposed estimator is nearly unbiased with adequate precision, and the confidence interval can provide adequate coverage probability. In addition, the proposed testing procedure can adequately control the size with sufficient power. On the other hand, when the proportional hazard assumption is violated, additional simulation studies show that the type I error rate is not controlled at the nominal level and is an increasing function of the positive predictive value. A numerical example illustrates the proposed procedures. Copyright © 2013 John Wiley & Sons, Ltd.

  9. Probabilistic Graphical Model Representation in Phylogenetics

    PubMed Central

    Höhna, Sebastian; Heath, Tracy A.; Boussau, Bastien; Landis, Michael J.; Ronquist, Fredrik; Huelsenbeck, John P.

    2014-01-01

    Recent years have seen a rapid expansion of the model space explored in statistical phylogenetics, emphasizing the need for new approaches to statistical model representation and software development. Clear communication and representation of the chosen model is crucial for: (i) reproducibility of an analysis, (ii) model development, and (iii) software design. Moreover, a unified, clear and understandable framework for model representation lowers the barrier for beginners and nonspecialists to grasp complex phylogenetic models, including their assumptions and parameter/variable dependencies. Graphical modeling is a unifying framework that has gained in popularity in the statistical literature in recent years. The core idea is to break complex models into conditionally independent distributions. The strength lies in the comprehensibility, flexibility, and adaptability of this formalism, and the large body of computational work based on it. Graphical models are well-suited to teach statistical models, to facilitate communication among phylogeneticists and in the development of generic software for simulation and statistical inference. Here, we provide an introduction to graphical models for phylogeneticists and extend the standard graphical model representation to the realm of phylogenetics. We introduce a new graphical model component, tree plates, to capture the changing structure of the subgraph corresponding to a phylogenetic tree. We describe a range of phylogenetic models using the graphical model framework and introduce modules to simplify the representation of standard components in large and complex models. Phylogenetic model graphs can be readily used in simulation, maximum likelihood inference, and Bayesian inference using, for example, Metropolis–Hastings or Gibbs sampling of the posterior distribution. [Computation; graphical models; inference; modularization; statistical phylogenetics; tree plate.] PMID:24951559

  10. Protein and gene model inference based on statistical modeling in k-partite graphs.

    PubMed

    Gerster, Sarah; Qeli, Ermir; Ahrens, Christian H; Bühlmann, Peter

    2010-07-06

    One of the major goals of proteomics is the comprehensive and accurate description of a proteome. Shotgun proteomics, the method of choice for the analysis of complex protein mixtures, requires that experimentally observed peptides are mapped back to the proteins they were derived from. This process is also known as protein inference. We present Markovian Inference of Proteins and Gene Models (MIPGEM), a statistical model based on clearly stated assumptions to address the problem of protein and gene model inference for shotgun proteomics data. In particular, we are dealing with dependencies among peptides and proteins using a Markovian assumption on k-partite graphs. We are also addressing the problems of shared peptides and ambiguous proteins by scoring the encoding gene models. Empirical results on two control datasets with synthetic mixtures of proteins and on complex protein samples of Saccharomyces cerevisiae, Drosophila melanogaster, and Arabidopsis thaliana suggest that the results with MIPGEM are competitive with existing tools for protein inference.

  11. minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information.

    PubMed

    Meyer, Patrick E; Lafitte, Frédéric; Bontempi, Gianluca

    2008-10-29

    This paper presents the R/Bioconductor package minet (version 1.1.6) which provides a set of functions to infer mutual information networks from a dataset. Once fed with a microarray dataset, the package returns a network where nodes denote genes, edges model statistical dependencies between genes and the weight of an edge quantifies the statistical evidence of a specific (e.g. transcriptional) gene-to-gene interaction. Four different entropy estimators are made available in the package minet (empirical, Miller-Madow, Schurmann-Grassberger and shrink) as well as four different inference methods, namely relevance networks, ARACNE, CLR and MRNET. Also, the package integrates accuracy assessment tools, like F-scores, PR-curves and ROC-curves in order to compare the inferred network with a reference one. The package minet provides a series of tools for inferring transcriptional networks from microarray data. It is freely available from the Comprehensive R Archive Network (CRAN) as well as from the Bioconductor website.

  12. Computational statistics using the Bayesian Inference Engine

    NASA Astrophysics Data System (ADS)

    Weinberg, Martin D.

    2013-09-01

    This paper introduces the Bayesian Inference Engine (BIE), a general parallel, optimized software package for parameter inference and model selection. This package is motivated by the analysis needs of modern astronomical surveys and the need to organize and reuse expensive derived data. The BIE is the first platform for computational statistics designed explicitly to enable Bayesian update and model comparison for astronomical problems. Bayesian update is based on the representation of high-dimensional posterior distributions using metric-ball-tree based kernel density estimation. Among its algorithmic offerings, the BIE emphasizes hybrid tempered Markov chain Monte Carlo schemes that robustly sample multimodal posterior distributions in high-dimensional parameter spaces. Moreover, the BIE implements a full persistence or serialization system that stores the full byte-level image of the running inference and previously characterized posterior distributions for later use. Two new algorithms to compute the marginal likelihood from the posterior distribution, developed for and implemented in the BIE, enable model comparison for complex models and data sets. Finally, the BIE was designed to be a collaborative platform for applying Bayesian methodology to astronomy. It includes an extensible object-oriented and easily extended framework that implements every aspect of the Bayesian inference. By providing a variety of statistical algorithms for all phases of the inference problem, a scientist may explore a variety of approaches with a single model and data implementation. Additional technical details and download details are available from http://www.astro.umass.edu/bie. The BIE is distributed under the GNU General Public License.

  13. Where Low and High Inference Data Converge: Validation of CLASS Assessment of Mathematics Instruction Using Mobile Eye Tracking with Expert and Novice Teachers

    ERIC Educational Resources Information Center

    Cortina, Kai S.; Miller, Kevin F.; McKenzie, Ryan; Epstein, Alanna

    2015-01-01

    Classroom observation research and research on teacher expertise are similar in their reliance on observational data with high-inference procedure to assess the quality of instruction. Expertise research usually uses low-inference measures like eye tracking to identify qualitative difference between expert and novice behaviors and cognition. In…

  14. Estimation of parameter uncertainty for an activated sludge model using Bayesian inference: a comparison with the frequentist method.

    PubMed

    Zonta, Zivko J; Flotats, Xavier; Magrí, Albert

    2014-08-01

    The procedure commonly used for the assessment of the parameters included in activated sludge models (ASMs) relies on the estimation of their optimal value within a confidence region (i.e. frequentist inference). Once optimal values are estimated, parameter uncertainty is computed through the covariance matrix. However, alternative approaches based on the consideration of the model parameters as probability distributions (i.e. Bayesian inference), may be of interest. The aim of this work is to apply (and compare) both Bayesian and frequentist inference methods when assessing uncertainty for an ASM-type model, which considers intracellular storage and biomass growth, simultaneously. Practical identifiability was addressed exclusively considering respirometric profiles based on the oxygen uptake rate and with the aid of probabilistic global sensitivity analysis. Parameter uncertainty was thus estimated according to both the Bayesian and frequentist inferential procedures. Results were compared in order to evidence the strengths and weaknesses of both approaches. Since it was demonstrated that Bayesian inference could be reduced to a frequentist approach under particular hypotheses, the former can be considered as a more generalist methodology. Hence, the use of Bayesian inference is encouraged for tackling inferential issues in ASM environments.

  15. Fast inference of interactions in assemblies of stochastic integrate-and-fire neurons from spike recordings.

    PubMed

    Monasson, Remi; Cocco, Simona

    2011-10-01

    We present two Bayesian procedures to infer the interactions and external currents in an assembly of stochastic integrate-and-fire neurons from the recording of their spiking activity. The first procedure is based on the exact calculation of the most likely time courses of the neuron membrane potentials conditioned by the recorded spikes, and is exact for a vanishing noise variance and for an instantaneous synaptic integration. The second procedure takes into account the presence of fluctuations around the most likely time courses of the potentials, and can deal with moderate noise levels. The running time of both procedures is proportional to the number S of spikes multiplied by the squared number N of neurons. The algorithms are validated on synthetic data generated by networks with known couplings and currents. We also reanalyze previously published recordings of the activity of the salamander retina (including from 32 to 40 neurons, and from 65,000 to 170,000 spikes). We study the dependence of the inferred interactions on the membrane leaking time; the differences and similarities with the classical cross-correlation analysis are discussed.

  16. Teach a Confidence Interval for the Median in the First Statistics Course

    ERIC Educational Resources Information Center

    Howington, Eric B.

    2017-01-01

    Few introductory statistics courses consider statistical inference for the median. This article argues in favour of adding a confidence interval for the median to the first statistics course. Several methods suitable for introductory statistics students are identified and briefly reviewed.
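
    As an illustration of what such an interval can look like, the sketch below computes a distribution-free confidence interval for the median from order statistics, using the Binomial(n, 1/2) distribution of the number of observations below the median. This is one common textbook construction and is not necessarily among the methods the article reviews; the function name `median_ci` and the sample data are hypothetical.

```python
from math import comb


def median_ci(data, conf=0.95):
    """Order-statistic confidence interval for the median.

    Returns (lower, upper, achieved_coverage). The interval is
    (x_(k), x_(n-k+1)) for the largest k whose two-sided binomial
    tail probability does not exceed 1 - conf.
    """
    x = sorted(data)
    n = len(x)
    alpha = 1.0 - conf
    cdf = 0.0  # P(B <= k - 1) for B ~ Binomial(n, 1/2)
    k = 0
    while k < n // 2:
        cdf_next = cdf + comb(n, k) / 2 ** n  # P(B <= k)
        if 2.0 * cdf_next > alpha:
            break
        cdf = cdf_next
        k += 1
    if k == 0:
        raise ValueError("sample too small for the requested confidence level")
    return x[k - 1], x[n - k], 1.0 - 2.0 * cdf


sample = [12, 15, 9, 20, 17, 11, 14, 22, 13, 16]  # made-up measurements
lower, upper, coverage = median_ci(sample)
print(lower, upper, coverage)
```

    Because the binomial distribution is discrete, the achieved coverage is usually somewhat above the nominal level, which the function reports alongside the interval.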

  17. Infinite von Mises-Fisher Mixture Modeling of Whole Brain fMRI Data.

    PubMed

    Røge, Rasmus E; Madsen, Kristoffer H; Schmidt, Mikkel N; Mørup, Morten

    2017-10-01

    Cluster analysis of functional magnetic resonance imaging (fMRI) data is often performed using gaussian mixture models, but when the time series are standardized such that the data reside on a hypersphere, this modeling assumption is questionable. The consequences of ignoring the underlying spherical manifold are rarely analyzed, in part due to the computational challenges imposed by directional statistics. In this letter, we discuss a Bayesian von Mises-Fisher (vMF) mixture model for data on the unit hypersphere and present an efficient inference procedure based on collapsed Markov chain Monte Carlo sampling. Comparing the vMF and gaussian mixture models on synthetic data, we demonstrate that the vMF model has a slight advantage inferring the true underlying clustering when compared to gaussian-based models on data generated from both a mixture of vMFs and a mixture of gaussians subsequently normalized. Thus, when performing model selection, the two models are not in agreement. Analyzing multisubject whole brain resting-state fMRI data from healthy adult subjects, we find that the vMF mixture model is considerably more reliable than the gaussian mixture model when comparing solutions across models trained on different groups of subjects, and again we find that the two models disagree on the optimal number of components. The analysis indicates that the fMRI data support more than a thousand clusters, and we confirm this is not a result of overfitting by demonstrating better prediction on data from held-out subjects. Our results highlight the utility of using directional statistics to model standardized fMRI data and demonstrate that whole brain segmentation of fMRI data requires a very large number of functional units in order to adequately account for the discernible statistical patterns in the data.

  18. Statistical inference methods for sparse biological time series data.

    PubMed

    Ndukum, Juliet; Fonseca, Luís L; Santos, Helena; Voit, Eberhard O; Datta, Susmita

    2011-04-25

    Comparing metabolic profiles under different biological perturbations has become a powerful approach to investigating the functioning of cells. The profiles can be taken as single snapshots of a system, but more information is gained if they are measured longitudinally over time. The results are short time series consisting of relatively sparse data that cannot be analyzed effectively with standard time series techniques, such as autocorrelation and frequency domain methods. In this work, we study longitudinal time series profiles of glucose consumption in the yeast Saccharomyces cerevisiae under different temperatures and preconditioning regimens, which we obtained with methods of in vivo nuclear magnetic resonance (NMR) spectroscopy. For the statistical analysis we first fit several nonlinear mixed effect regression models to the longitudinal profiles and then used an ANOVA likelihood ratio method in order to test for significant differences between the profiles. The proposed methods are capable of distinguishing metabolic time trends resulting from different treatments and associate significance levels to these differences. Among several nonlinear mixed-effects regression models tested, a three-parameter logistic function represents the data with highest accuracy. ANOVA and likelihood ratio tests suggest that there are significant differences between the glucose consumption rate profiles for cells that had been--or had not been--preconditioned by heat during growth. Furthermore, pair-wise t-tests reveal significant differences in the longitudinal profiles for glucose consumption rates between optimal conditions and heat stress, optimal and recovery conditions, and heat stress and recovery conditions (p-values <0.0001). We have developed a nonlinear mixed effects model that is appropriate for the analysis of sparse metabolic and physiological time profiles. The model permits sound statistical inference procedures, based on ANOVA likelihood ratio tests, for testing the significance of differences between short time course data under different biological perturbations.

  19. Pointwise probability reinforcements for robust statistical inference.

    PubMed

    Frénay, Benoît; Verleysen, Michel

    2014-02-01

    Statistical inference using machine learning techniques may be difficult with small datasets because of abnormally frequent data (AFDs). AFDs are observations that are much more frequent in the training sample than they should be, with respect to their theoretical probability, and include e.g. outliers. Estimates of parameters tend to be biased towards models which support such data. This paper proposes to introduce pointwise probability reinforcements (PPRs): the probability of each observation is reinforced by a PPR and a regularisation allows controlling the amount of reinforcement which compensates for AFDs. The proposed solution is very generic, since it can be used to robustify any statistical inference method which can be formulated as a likelihood maximisation. Experiments show that PPRs can be easily used to tackle regression, classification and projection: models are freed from the influence of outliers. Moreover, outliers can be filtered manually since an abnormality degree is obtained for each observation. Copyright © 2013 Elsevier Ltd. All rights reserved.

  20. Probability, statistics, and computational science.

    PubMed

    Beerenwinkel, Niko; Siebourg, Juliane

    2012-01-01

    In this chapter, we review basic concepts from probability theory and computational statistics that are fundamental to evolutionary genomics. We provide a very basic introduction to statistical modeling and discuss general principles, including maximum likelihood and Bayesian inference. Markov chains, hidden Markov models, and Bayesian network models are introduced in more detail as they occur frequently and in many variations in genomics applications. In particular, we discuss efficient inference algorithms and methods for learning these models from partially observed data. Several simple examples are given throughout the text, some of which point to models that are discussed in more detail in subsequent chapters.

  1. A Not-So-Fundamental Limitation on Studying Complex Systems with Statistics: Comment on Rabin (2011)

    NASA Astrophysics Data System (ADS)

    Thomas, Drew M.

    2012-12-01

    Although living organisms are affected by many interrelated and unidentified variables, this complexity does not automatically impose a fundamental limitation on statistical inference. Nor need one invoke such complexity as an explanation of the "Truth Wears Off" or "decline" effect; similar "decline" effects occur with far simpler systems studied in physics. Selective reporting and publication bias, and scientists' biases in favor of reporting eye-catching results (in general) or conforming to others' results (in physics) better explain this feature of the "Truth Wears Off" effect than Rabin's suggested limitation on statistical inference.

  2. Extending the Functionality of Behavioural Change-Point Analysis with k-Means Clustering: A Case Study with the Little Penguin (Eudyptula minor)

    PubMed Central

    Zhang, Jingjing; Dennis, Todd E.

    2015-01-01

    We present a simple framework for classifying mutually exclusive behavioural states within the geospatial lifelines of animals. This method involves use of three sequentially applied statistical procedures: (1) behavioural change point analysis to partition movement trajectories into discrete bouts of same-state behaviours, based on abrupt changes in the spatio-temporal autocorrelation structure of movement parameters; (2) hierarchical multivariate cluster analysis to determine the number of different behavioural states; and (3) k-means clustering to classify inferred bouts of same-state location observations into behavioural modes. We demonstrate application of the method by analysing synthetic trajectories of known ‘artificial behaviours’ comprised of different correlated random walks, as well as real foraging trajectories of little penguins (Eudyptula minor) obtained by global-positioning-system telemetry. Our results show that the modelling procedure correctly classified 92.5% of all individual location observations in the synthetic trajectories, demonstrating reasonable ability to successfully discriminate behavioural modes. Most individual little penguins were found to exhibit three unique behavioural states (resting, commuting/active searching, area-restricted foraging), with variation in the timing and locations of observations apparently related to ambient light, bathymetry, and proximity to coastlines and river mouths. Addition of k-means clustering extends the utility of behavioural change point analysis, by providing a simple means through which the behaviours inferred for the location observations comprising individual movement trajectories can be objectively classified. PMID:25922935
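
    Step (3) of the framework above, assigning bouts to behavioural modes with k-means, can be sketched as below. This is a bare-bones Lloyd's algorithm written for illustration only, not the authors' code: the bout features (mean speed, mean absolute turning angle), the example values and the function name `kmeans` are assumptions, and steps (1) and (2) are taken as already done.

```python
import numpy as np


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centres)."""
    rng = np.random.default_rng(seed)
    # Initialise the centres at k distinct observations chosen at random.
    centres = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance of every point to every centre, shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centre; keep the old one if a cluster is empty.
        new_centres = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels, centres


# Hypothetical per-bout summaries: (mean speed, mean absolute turning angle).
bouts = np.array([
    [0.10, 1.50], [0.20, 1.40], [0.15, 1.60],   # slow, tortuous bouts
    [2.00, 0.10], [2.10, 0.20], [1.90, 0.15],   # fast, directed bouts
])
labels, centres = kmeans(bouts, k=2)
print(labels)
```

    In practice the features would usually be standardised first, and the number of clusters k would come from the hierarchical cluster analysis in step (2) rather than being fixed by hand.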

  3. Extending the Functionality of Behavioural Change-Point Analysis with k-Means Clustering: A Case Study with the Little Penguin (Eudyptula minor).

    PubMed

    Zhang, Jingjing; O'Reilly, Kathleen M; Perry, George L W; Taylor, Graeme A; Dennis, Todd E

    2015-01-01

    We present a simple framework for classifying mutually exclusive behavioural states within the geospatial lifelines of animals. This method involves use of three sequentially applied statistical procedures: (1) behavioural change point analysis to partition movement trajectories into discrete bouts of same-state behaviours, based on abrupt changes in the spatio-temporal autocorrelation structure of movement parameters; (2) hierarchical multivariate cluster analysis to determine the number of different behavioural states; and (3) k-means clustering to classify inferred bouts of same-state location observations into behavioural modes. We demonstrate application of the method by analysing synthetic trajectories of known 'artificial behaviours' comprised of different correlated random walks, as well as real foraging trajectories of little penguins (Eudyptula minor) obtained by global-positioning-system telemetry. Our results show that the modelling procedure correctly classified 92.5% of all individual location observations in the synthetic trajectories, demonstrating reasonable ability to successfully discriminate behavioural modes. Most individual little penguins were found to exhibit three unique behavioural states (resting, commuting/active searching, area-restricted foraging), with variation in the timing and locations of observations apparently related to ambient light, bathymetry, and proximity to coastlines and river mouths. Addition of k-means clustering extends the utility of behavioural change point analysis, by providing a simple means through which the behaviours inferred for the location observations comprising individual movement trajectories can be objectively classified.

  4. Statistical inference in comparing DInSAR and GPS data in fault areas

    NASA Astrophysics Data System (ADS)

    Barzaghi, R.; Borghi, A.; Kunzle, A.

    2012-04-01

    DInSAR and GPS data are now routinely used in geophysical investigation, e.g. for estimating slip rate over the fault plane in seismogenic areas. This analysis is usually done by mapping the surface deformation rates as estimated by GPS and DInSAR over the fault plane using suitable geophysical models (e.g. the Okada model). Usually, DInSAR vertical velocities and GPS horizontal velocities are used to obtain an integrated slip estimate. However, merging the two kinds of information can be problematic, since they may reflect a common underlying geophysical signal plus different disturbing signals that are not related to the fault dynamics. In GPS and DInSAR data analysis, these artifacts are mainly connected to signal propagation in the atmosphere and to hydrological phenomena (e.g. variation in the water table). Thus, a coherence test between the two sources of information must be carried out in order to properly merge the GPS and DInSAR velocities in the inversion procedure. To this aim, statistical tests have been studied to check the compatibility of the two deformation rate estimates coming from GPS and DInSAR data analysis. This has been done according to both standard and Bayesian testing methodology. The effectiveness of the proposed inference methods has been checked with numerical simulations in the case of a normal fault. The fault structure is defined following the Pollino fault model and both GPS and DInSAR data are simulated according to real data acquired in this area.

  5. Bayesian Inference: with ecological applications

    USGS Publications Warehouse

    Link, William A.; Barker, Richard J.

    2010-01-01

    This text provides a mathematically rigorous yet accessible and engaging introduction to Bayesian inference with relevant examples that will be of interest to biologists working in the fields of ecology, wildlife management and environmental studies as well as students in advanced undergraduate statistics. This text opens the door to Bayesian inference, taking advantage of modern computational efficiencies and easily accessible software to evaluate complex hierarchical models.

  6. Theory-based Bayesian Models of Inductive Inference

    DTIC Science & Technology

    2010-07-19

    Subjective randomness and natural scene statistics. Psychonomic Bulletin & Review . http://cocosci.berkeley.edu/tom/papers/randscenes.pdf Page 1...in press). Exemplar models as a mechanism for performing Bayesian inference. Psychonomic Bulletin & Review . http://cocosci.berkeley.edu/tom

  7. Differences in Performance Among Test Statistics for Assessing Phylogenomic Model Adequacy.

    PubMed

    Duchêne, David A; Duchêne, Sebastian; Ho, Simon Y W

    2018-05-18

    Statistical phylogenetic analyses of genomic data depend on models of nucleotide or amino acid substitution. The adequacy of these substitution models can be assessed using a number of test statistics, allowing the model to be rejected when it is found to provide a poor description of the evolutionary process. A potentially valuable use of model-adequacy test statistics is to identify when data sets are likely to produce unreliable phylogenetic estimates, but their differences in performance are rarely explored. We performed a comprehensive simulation study to identify test statistics that are sensitive to some of the most commonly cited sources of phylogenetic estimation error. Our results show that, for many test statistics, traditional thresholds for assessing model adequacy can fail to reject the model when the phylogenetic inferences are inaccurate and imprecise. This is particularly problematic when analysing loci that have few variable informative sites. We propose new thresholds for assessing substitution model adequacy and demonstrate their effectiveness in analyses of three phylogenomic data sets. These thresholds lead to frequent rejection of the model for loci that yield topological inferences that are imprecise and are likely to be inaccurate. We also propose the use of a summary statistic that provides a practical assessment of overall model adequacy. Our approach offers a promising means of enhancing model choice in genome-scale data sets, potentially leading to improvements in the reliability of phylogenomic inference.

  8. Inferring action structure and causal relationships in continuous sequences of human action.

    PubMed

    Buchsbaum, Daphna; Griffiths, Thomas L; Plunkett, Dillon; Gopnik, Alison; Baldwin, Dare

    2015-02-01

    In the real world, causal variables do not come pre-identified or occur in isolation, but instead are embedded within a continuous temporal stream of events. A challenge faced by both human learners and machine learning algorithms is identifying subsequences that correspond to the appropriate variables for causal inference. A specific instance of this problem is action segmentation: dividing a sequence of observed behavior into meaningful actions, and determining which of those actions lead to effects in the world. Here we present a Bayesian analysis of how statistical and causal cues to segmentation should optimally be combined, as well as four experiments investigating human action segmentation and causal inference. We find that both people and our model are sensitive to statistical regularities and causal structure in continuous action, and are able to combine these sources of information in order to correctly infer both causal relationships and segmentation boundaries. Copyright © 2014. Published by Elsevier Inc.

  9. A shift from significance test to hypothesis test through power analysis in medical research.

    PubMed

    Singh, G

    2006-01-01

    Until recently, the medical research literature was dominated by Fisher's significance test approach to statistical inference, which concentrates on the probability of type I error, over the Neyman-Pearson hypothesis test, which considers the probabilities of both type I and type II error. Fisher's approach dichotomises results into significant or not significant on the basis of a P value. The Neyman-Pearson approach speaks of acceptance or rejection of the null hypothesis. Based on the same theory, the two approaches address the same objective and conclude in their own ways. Advances in computing techniques and the availability of statistical software have led to increasing use of power calculations in medical research, and thereby to reporting the results of significance tests in the light of the power of the test. The significance test approach, when it incorporates power analysis, contains the essence of the hypothesis test approach. It may be safely argued that the rising application of power analysis in medical research may have initiated a shift from Fisher's significance test to the Neyman-Pearson hypothesis test procedure.
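
    The link between the type I error rate, the type II error rate and power can be made concrete with a small calculation. The sketch below gives the power of a two-sided, two-sample z-test with equal group sizes and known unit variance; the choice of test, the function name `two_sample_z_power` and the example numbers are assumptions made here for illustration and do not come from the article.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_power(effect_size, n_per_group, alpha=0.05):
    """Power of a two-sided two-sample z-test.

    effect_size is the true mean difference in standard-deviation units;
    alpha is the type I error rate. The type II error rate is 1 - power.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    # Mean of the test statistic under the alternative hypothesis.
    shift = effect_size * sqrt(n_per_group / 2.0)
    # Probability of landing in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


power = two_sample_z_power(effect_size=0.5, n_per_group=64)
type_ii_error = 1.0 - power
print(round(power, 3), round(type_ii_error, 3))
```

    With a medium standardised effect of 0.5 and 64 subjects per group the power comes out a little above 0.8, and with an effect size of zero the function returns alpha, the probability of a false rejection.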

  10. Accounting for response misclassification and covariate measurement error improves power and reduces bias in epidemiologic studies.

    PubMed

    Cheng, Dunlei; Branscum, Adam J; Stamey, James D

    2010-07-01

    To quantify the impact of ignoring misclassification of a response variable and measurement error in a covariate on statistical power, and to develop software for sample size and power analysis that accounts for these flaws in epidemiologic data. A Monte Carlo simulation-based procedure is developed to illustrate the differences in design requirements and inferences between analytic methods that properly account for misclassification and measurement error to those that do not in regression models for cross-sectional and cohort data. We found that failure to account for these flaws in epidemiologic data can lead to a substantial reduction in statistical power, over 25% in some cases. The proposed method substantially reduced bias by up to a ten-fold margin compared to naive estimates obtained by ignoring misclassification and mismeasurement. We recommend as routine practice that researchers account for errors in measurement of both response and covariate data when determining sample size, performing power calculations, or analyzing data from epidemiological studies. 2010 Elsevier Inc. All rights reserved.

  11. Optimal design of gene knockout experiments for gene regulatory network inference

    PubMed Central

    Ud-Dean, S. M. Minhaz; Gunawan, Rudiyanto

    2016-01-01

    Motivation: We addressed the problem of inferring gene regulatory network (GRN) from gene expression data of knockout (KO) experiments. This inference is known to be underdetermined and the GRN is not identifiable from data. Past studies have shown that suboptimal design of experiments (DOE) contributes significantly to the identifiability issue of biological networks, including GRNs. However, optimizing DOE has received much less attention than developing methods for GRN inference. Results: We developed REDuction of UnCertain Edges (REDUCE) algorithm for finding the optimal gene KO experiment for inferring directed graphs (digraphs) of GRNs. REDUCE employed ensemble inference to define uncertain gene interactions that could not be verified by prior data. The optimal experiment corresponds to the maximum number of uncertain interactions that could be verified by the resulting data. For this purpose, we introduced the concept of edge separatoid which gave a list of nodes (genes) that upon their removal would allow the verification of a particular gene interaction. Finally, we proposed a procedure that iterates over performing KO experiments, ensemble update and optimal DOE. The case studies including the inference of Escherichia coli GRN and DREAM 4 100-gene GRNs, demonstrated the efficacy of the iterative GRN inference. In comparison to systematic KOs, REDUCE could provide much higher information return per gene KO experiment and consequently more accurate GRN estimates. Conclusions: REDUCE represents an enabling tool for tackling the underdetermined GRN inference. Along with advances in gene deletion and automation technology, the iterative procedure brings an efficient and fully automated GRN inference closer to reality. Availability and implementation: MATLAB and Python scripts of REDUCE are available on www.cabsel.ethz.ch/tools/REDUCE. Contact: rudi.gunawan@chem.ethz.ch Supplementary information: Supplementary data are available at Bioinformatics online. 
    PMID:26568633

  12. The role of causal criteria in causal inferences: Bradford Hill's "aspects of association".

    PubMed

    Ward, Andrew C

    2009-06-17

    As noted by Wesley Salmon and many others, causal concepts are ubiquitous in every branch of theoretical science, in the practical disciplines and in everyday life. In the theoretical and practical sciences especially, people often base claims about causal relations on applications of statistical methods to data. However, the source and type of data place important constraints on the choice of statistical methods as well as on the warrant attributed to the causal claims based on the use of such methods. For example, much of the data used by people interested in making causal claims come from non-experimental, observational studies in which random allocations to treatment and control groups are not present. Thus, one of the most important problems in the social and health sciences concerns making justified causal inferences using non-experimental, observational data. In this paper, I examine one method of justifying such inferences that is especially widespread in epidemiology and the health sciences generally - the use of causal criteria. I argue that while the use of causal criteria is not appropriate for either deductive or inductive inferences, they do have an important role to play in inferences to the best explanation. As such, causal criteria, exemplified by what Bradford Hill referred to as "aspects of [statistical] associations", have an indispensable part to play in the goal of making justified causal claims.

  13. The role of causal criteria in causal inferences: Bradford Hill's "aspects of association"

    PubMed Central

    Ward, Andrew C

    2009-01-01

    As noted by Wesley Salmon and many others, causal concepts are ubiquitous in every branch of theoretical science, in the practical disciplines and in everyday life. In the theoretical and practical sciences especially, people often base claims about causal relations on applications of statistical methods to data. However, the source and type of data place important constraints on the choice of statistical methods as well as on the warrant attributed to the causal claims based on the use of such methods. For example, much of the data used by people interested in making causal claims come from non-experimental, observational studies in which random allocations to treatment and control groups are not present. Thus, one of the most important problems in the social and health sciences concerns making justified causal inferences using non-experimental, observational data. In this paper, I examine one method of justifying such inferences that is especially widespread in epidemiology and the health sciences generally – the use of causal criteria. I argue that while the use of causal criteria is not appropriate for either deductive or inductive inferences, they do have an important role to play in inferences to the best explanation. As such, causal criteria, exemplified by what Bradford Hill referred to as "aspects of [statistical] associations", have an indispensable part to play in the goal of making justified causal claims. PMID:19534788

  14. Benchmarking Inverse Statistical Approaches for Protein Structure and Design with Exactly Solvable Models.

    PubMed

    Jacquin, Hugo; Gilson, Amy; Shakhnovich, Eugene; Cocco, Simona; Monasson, Rémi

    2016-05-01

    Inverse statistical approaches to determine protein structure and function from Multiple Sequence Alignments (MSA) are emerging as powerful tools in computational biology. However the underlying assumptions of the relationship between the inferred effective Potts Hamiltonian and real protein structure and energetics remain untested so far. Here we use lattice protein model (LP) to benchmark those inverse statistical approaches. We build MSA of highly stable sequences in target LP structures, and infer the effective pairwise Potts Hamiltonians from those MSA. We find that inferred Potts Hamiltonians reproduce many important aspects of 'true' LP structures and energetics. Careful analysis reveals that effective pairwise couplings in inferred Potts Hamiltonians depend not only on the energetics of the native structure but also on competing folds; in particular, the coupling values reflect both positive design (stabilization of native conformation) and negative design (destabilization of competing folds). In addition to providing detailed structural information, the inferred Potts models used as protein Hamiltonian for design of new sequences are able to generate with high probability completely new sequences with the desired folds, which is not possible using independent-site models. Those are remarkable results as the effective LP Hamiltonians used to generate MSA are not simple pairwise models due to the competition between the folds. Our findings elucidate the reasons for the success of inverse approaches to the modelling of proteins from sequence data, and their limitations.

  15. Logical reasoning versus information processing in the dual-strategy model of reasoning.

    PubMed

    Markovits, Henry; Brisson, Janie; de Chantal, Pier-Luc

    2017-01-01

    One of the major debates concerning the nature of inferential reasoning is between counterexample-based strategies such as mental model theory and statistical strategies underlying probabilistic models. The dual-strategy model, proposed by Verschueren, Schaeken, & d'Ydewalle (2005a, 2005b), which suggests that people might have access to both kinds of strategy has been supported by several recent studies. These have shown that statistical reasoners make inferences based on using information about premises in order to generate a likelihood estimate of conclusion probability. However, while results concerning counterexample reasoners are consistent with a counterexample detection model, these results could equally be interpreted as indicating a greater sensitivity to logical form. In order to distinguish these 2 interpretations, in Studies 1 and 2, we presented reasoners with Modus ponens (MP) inferences with statistical information about premise strength and in Studies 3 and 4, naturalistic MP inferences with premises having many disabling conditions. Statistical reasoners accepted the MP inference more often than counterexample reasoners in Studies 1 and 2, while the opposite pattern was observed in Studies 3 and 4. Results show that these strategies must be defined in terms of information processing, with no clear relations to "logical" reasoning. These results have additional implications for the underlying debate about the nature of human reasoning. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  16. [Application of statistics on chronic-diseases-relating observational research papers].

    PubMed

    Hong, Zhi-heng; Wang, Ping; Cao, Wei-hua

    2012-09-01

    To study the application of statistics in observational research papers on chronic diseases recently published in Chinese Medical Association journals with an influential index above 0.5. Using a self-developed criterion, two investigators independently assessed the application of statistics in these papers. Disagreements were resolved through discussion. A total of 352 papers from 6 journals, including the Chinese Journal of Epidemiology, Chinese Journal of Oncology, Chinese Journal of Preventive Medicine, Chinese Journal of Cardiology, Chinese Journal of Internal Medicine and Chinese Journal of Endocrinology and Metabolism, were reviewed. The rates of clear statement of research objectives, target audience, sample issues, objective inclusion criteria and variable definitions were 99.43%, 98.57%, 95.43%, 92.86% and 96.87%, respectively. The rates of correct description of quantitative and qualitative data were 90.94% and 91.46%, respectively. The rates of correctly expressing the results of statistical inference methods related to quantitative data, qualitative data and modeling were 100%, 95.32% and 87.19%, respectively. 89.49% of the conclusions responded directly to the research objectives. However, 69.60% of the papers did not state the exact name of the study design they used. 11.14% of the papers lacked a further statement of the exclusion criteria. Only 5.16% of the papers clearly explained the sample size estimation. Only 24.21% of the papers clearly described the variable value assignment. Only 24.15% of the papers introduced the statistical procedures and database methods. 18.75% of the papers did not describe the statistical inference methods sufficiently. A quarter of the papers did not use 'standardization' appropriately. As for statistical inference, only 24.12% of the papers described the prerequisites of the statistical tests, while 9.94% of the papers did not employ the statistical inference method that should have been used. The main deficiencies in the application of statistics in observational research papers on chronic diseases were: lack of sample size determination, insufficient description of variable value assignment, statistical methods not introduced clearly or properly, and lack of consideration of the prerequisites for the use of statistical inference.

  17. Truth, models, model sets, AIC, and multimodel inference: a Bayesian perspective

    USGS Publications Warehouse

    Barker, Richard J.; Link, William A.

    2015-01-01

    Statistical inference begins with viewing data as realizations of stochastic processes. Mathematical models provide partial descriptions of these processes; inference is the process of using the data to obtain a more complete description of the stochastic processes. Wildlife and ecological scientists have become increasingly concerned with the conditional nature of model-based inference: what if the model is wrong? Over the last 2 decades, Akaike's Information Criterion (AIC) has been widely and increasingly used in wildlife statistics for 2 related purposes, first for model choice and second to quantify model uncertainty. We argue that for the second of these purposes, the Bayesian paradigm provides the natural framework for describing uncertainty associated with model choice and provides the most easily communicated basis for model weighting. Moreover, Bayesian arguments provide the sole justification for interpreting model weights (including AIC weights) as coherent (mathematically self consistent) model probabilities. This interpretation requires treating the model as an exact description of the data-generating mechanism. We discuss the implications of this assumption, and conclude that more emphasis is needed on model checking to provide confidence in the quality of inference.
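
    The AIC-based model weights mentioned in this abstract are conventionally computed by rescaling AIC differences so that they sum to one. The following minimal sketch shows that standard Akaike-weight formula; it is not code from the paper, and the function name and example AIC values are made up for illustration.

```python
import math


def akaike_weights(aic_values):
    """Return Akaike weights for a list of AIC values.

    w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2),
    where delta_i = AIC_i - min(AIC).
    """
    best = min(aic_values)
    # Relative likelihood of each model compared with the best-scoring one.
    relative = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative)
    return [value / total for value in relative]


if __name__ == "__main__":
    # Hypothetical AIC values for three candidate models.
    for aic, weight in zip([210.4, 212.4, 219.0], akaike_weights([210.4, 212.4, 219.0])):
        print(aic, round(weight, 3))
```

    Subtracting the minimum AIC before exponentiating leaves the weights unchanged and avoids numerical underflow. Whether the resulting numbers may be read as model probabilities is exactly the question the abstract raises.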

  18. Selecting the right statistical model for analysis of insect count data by using information theoretic measures.

    PubMed

    Sileshi, G

    2006-10-01

    Researchers and regulatory agencies often make statistical inferences from insect count data using modelling approaches that assume homogeneous variance. Such models do not allow for formal appraisal of variability which in its different forms is the subject of interest in ecology. Therefore, the objectives of this paper were to (i) compare models suitable for handling variance heterogeneity and (ii) select optimal models to ensure valid statistical inferences from insect count data. The log-normal, standard Poisson, Poisson corrected for overdispersion, zero-inflated Poisson, the negative binomial distribution and zero-inflated negative binomial models were compared using six count datasets on foliage-dwelling insects and five families of soil-dwelling insects. Akaike's and Schwarz Bayesian information criteria were used for comparing the various models. Over 50% of the counts were zeros even in locally abundant species such as Ootheca bennigseni Weise, Mesoplatys ochroptera Stål and Diaecoderus spp. The Poisson model after correction for overdispersion and the standard negative binomial distribution model provided better description of the probability distribution of seven out of the 11 insects than the log-normal, standard Poisson, zero-inflated Poisson or zero-inflated negative binomial models. It is concluded that excess zeros and variance heterogeneity are common data phenomena in insect counts. If not properly modelled, these properties can invalidate the normal distribution assumptions resulting in biased estimation of ecological effects and jeopardizing the integrity of the scientific inferences. Therefore, it is recommended that statistical models appropriate for handling these data properties be selected using objective criteria to ensure efficient statistical inference.

  19. Comparison of a non-stationary voxelation-corrected cluster-size test with TFCE for group-Level MRI inference.

    PubMed

    Li, Huanjie; Nickerson, Lisa D; Nichols, Thomas E; Gao, Jia-Hong

    2017-03-01

    Two powerful methods for statistical inference on MRI brain images have been proposed recently: a non-stationary voxelation-corrected cluster-size test (CST) based on random field theory and threshold-free cluster enhancement (TFCE) based on calculating the level of local support for a cluster, then using permutation testing for inference. Unlike other statistical approaches, these two methods do not rest on the assumptions of a uniform and high degree of spatial smoothness of the statistic image. Thus, they are strongly recommended for group-level fMRI analysis compared to other statistical methods. In this work, the non-stationary voxelation-corrected CST and TFCE methods for group-level analysis were evaluated for both stationary and non-stationary images under varying smoothness levels, degrees of freedom and signal to noise ratios. Our results suggest that both methods provide adequate control for the number of voxel-wise statistical tests being performed during inference on fMRI data and they are both superior to current CSTs implemented in popular MRI data analysis software packages. However, TFCE is more sensitive and stable for group-level analysis of VBM data. Thus, the voxelation-corrected CST approach may confer some advantages by being computationally less demanding for fMRI data analysis than TFCE with permutation testing and by also being applicable for single-subject fMRI analyses, while the TFCE approach is advantageous for VBM data. Hum Brain Mapp 38:1269-1280, 2017. © 2016 Wiley Periodicals, Inc.

  20. An introduction to medical statistics for health care professionals: Hypothesis tests and estimation.

    PubMed

    Thomas, Elaine

    2005-01-01

    This article is the second in a series of three that will give health care professionals (HCPs) a sound introduction to medical statistics (Thomas, 2004). The objective of research is to find out about the population at large. However, it is generally not possible to study the whole of the population and research questions are addressed in an appropriate study sample. The next crucial step is then to use the information from the sample of individuals to make statements about the wider population of like individuals. This procedure of drawing conclusions about the population, based on study data, is known as inferential statistics. The findings from the study give us the best estimate of what is true for the relevant population, given the sample is representative of the population. It is important to consider how accurate this best estimate is, based on a single sample, when compared to the unknown population figure. Any difference between the observed sample result and the population characteristic is termed the sampling error. This article will cover the two main forms of statistical inference (hypothesis tests and estimation) along with issues that need to be addressed when considering the implications of the study results. Copyright (c) 2005 Whurr Publishers Ltd.

  1. Statistics and Informatics in Space Astrophysics

    NASA Astrophysics Data System (ADS)

    Feigelson, E.

    2017-12-01

    The interest in statistical and computational methodology has seen rapid growth in space-based astrophysics, parallel to the growth seen in Earth remote sensing. There is widespread agreement that scientific interpretation of the cosmic microwave background, discovery of exoplanets, and classifying multiwavelength surveys is too complex to be accomplished with traditional techniques. NASA operates several well-functioning Science Archive Research Centers providing 0.5 PBy datasets to the research community. These databases are integrated with full-text journal articles in the NASA Astrophysics Data System (200K pageviews/day). Data products use interoperable formats and protocols established by the International Virtual Observatory Alliance. NASA supercomputers also support complex astrophysical models of systems such as accretion disks and planet formation. Academic researcher interest in methodology has significantly grown in areas such as Bayesian inference and machine learning, and statistical research is underway to treat problems such as irregularly spaced time series and astrophysical model uncertainties. Several scholarly societies have created interest groups in astrostatistics and astroinformatics. Improvements are needed on several fronts. Community education in advanced methodology is not sufficiently rapid to meet the research needs. Statistical procedures within NASA science analysis software are sometimes not optimal, and pipeline development may not use modern software engineering techniques. NASA offers few grant opportunities supporting research in astroinformatics and astrostatistics.

  2. Unbiased split variable selection for random survival forests using maximally selected rank statistics.

    PubMed

    Wright, Marvin N; Dankowski, Theresa; Ziegler, Andreas

    2017-04-15

    The most popular approach for analyzing survival data is the Cox regression model. The Cox model may, however, be misspecified, and its proportionality assumption may not always be fulfilled. An alternative approach for survival prediction is random forests for survival outcomes. The standard split criterion for random survival forests is the log-rank test statistic, which favors splitting variables with many possible split points. Conditional inference forests avoid this split variable selection bias. However, linear rank statistics are utilized by default in conditional inference forests to select the optimal splitting variable, which cannot detect non-linear effects in the independent variables. An alternative is to use maximally selected rank statistics for the split point selection. As in conditional inference forests, splitting variables are compared on the p-value scale. However, instead of the conditional Monte-Carlo approach used in conditional inference forests, p-value approximations are employed. We describe several p-value approximations and the implementation of the proposed random forest approach. A simulation study demonstrates that unbiased split variable selection is possible. However, there is a trade-off between unbiased split variable selection and runtime. In benchmark studies of prediction performance on simulated and real datasets, the new method performs better than random survival forests if informative dichotomous variables are combined with uninformative variables with more categories and better than conditional inference forests if non-linear covariate effects are included. In a runtime comparison, the method proves to be computationally faster than both alternatives, if a simple p-value approximation is used. Copyright © 2017 John Wiley & Sons, Ltd.

  3. CorSig: a general framework for estimating statistical significance of correlation and its application to gene co-expression analysis.

    PubMed

    Wang, Hong-Qiang; Tsai, Chung-Jui

    2013-01-01

    With the rapid increase of omics data, correlation analysis has become an indispensable tool for inferring meaningful associations from a large number of observations. Pearson correlation coefficient (PCC) and its variants are widely used for such purposes. However, it remains challenging to test whether an observed association is reliable both statistically and biologically. We present here a new method, CorSig, for statistical inference of correlation significance. CorSig is based on a biology-informed null hypothesis, i.e., testing whether the true PCC (ρ) between two variables is statistically larger than a user-specified PCC cutoff (τ), as opposed to the simple null hypothesis of ρ = 0 in existing methods, i.e., testing whether an association can be declared without a threshold. CorSig incorporates Fisher's Z transformation of the observed PCC (r), which facilitates use of standard techniques for p-value computation and multiple testing corrections. We compared CorSig against two methods: one uses a minimum PCC cutoff while the other (Zhu's procedure) controls correlation strength and statistical significance in two discrete steps. CorSig consistently outperformed these methods in various simulation data scenarios by balancing between false positives and false negatives. When tested on real-world Populus microarray data, CorSig effectively identified co-expressed genes in the flavonoid pathway, and discriminated between closely related gene family members for their differential association with flavonoid and lignin pathways. The p-values obtained by CorSig can be used as a stand-alone parameter for stratification of co-expressed genes according to their correlation strength in lieu of an arbitrary cutoff. CorSig requires one single tunable parameter, and can be readily extended to other correlation measures. Thus, CorSig should be useful for a wide range of applications, particularly for network analysis of high-dimensional genomic data. A web server for CorSig is provided at http://202.127.200.1:8080/probeWeb. R code for CorSig is freely available for non-commercial use at http://aspendb.uga.edu/downloads.
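
    The idea of testing an observed correlation against a cutoff τ rather than against zero can be sketched with the textbook Fisher z test. This is a generic illustration and not the CorSig implementation: it assumes approximately bivariate normal data, uses the usual standard error 1/sqrt(n - 3), and the function name is hypothetical.

```python
import math


def fisher_z_pvalue(r, n, tau=0.0):
    """One-sided p-value for H0: rho <= tau versus H1: rho > tau.

    r   : observed Pearson correlation, strictly between -1 and 1
    n   : number of paired observations (must exceed 3)
    tau : user-chosen correlation cutoff, strictly between -1 and 1
    """
    if n <= 3:
        raise ValueError("Fisher's z approximation needs n > 3")
    # Fisher's z transform is atanh; on that scale the estimate is
    # approximately normal with standard error 1 / sqrt(n - 3).
    z_stat = (math.atanh(r) - math.atanh(tau)) * math.sqrt(n - 3)
    # Upper-tail probability of the standard normal distribution.
    return 0.5 * math.erfc(z_stat / math.sqrt(2))


if __name__ == "__main__":
    # The same observed correlation is judged against two different cutoffs.
    print(fisher_z_pvalue(0.6, 30, tau=0.0))
    print(fisher_z_pvalue(0.6, 30, tau=0.5))
```

    Raising τ makes the same observed r less significant, which is the behaviour the abstract contrasts with the simple null hypothesis ρ = 0.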

  4. Incorporating Biological Knowledge into Evaluation of Casual Regulatory Hypothesis

    NASA Technical Reports Server (NTRS)

    Chrisman, Lonnie; Langley, Pat; Bay, Stephen; Pohorille, Andrew; DeVincenzi, D. (Technical Monitor)

    2002-01-01

    Biological data can be scarce and costly to obtain. The small number of samples available typically limits statistical power and makes reliable inference of causal relations extremely difficult. However, we argue that statistical power can be increased substantially by incorporating prior knowledge and data from diverse sources. We present a Bayesian framework that combines information from different sources and we show empirically that this lets one make correct causal inferences with small sample sizes that otherwise would be impossible.

  5. A Review of Some Aspects of Robust Inference for Time Series.

    DTIC Science & Technology

    1984-09-01

    A Review of Some Aspects of Robust Inference for Time Series, by R. Douglas Martin. Technical Report No. 63, September 1984. Department of Statistics, University of ... clear. One cannot hope to have a good method for dealing with outliers in time series by using only an instantaneous nonlinear transformation of the data ... AI.49 716 A Review of Some Aspects of Robust Inference for Time Series (U). Washington Univ., Seattle, Dept. of Statistics. R. D. Martin. Sep 84. TR-53

  6. The researcher and the consultant: from testing to probability statements.

    PubMed

    Hamra, Ghassan B; Stang, Andreas; Poole, Charles

    2015-09-01

    In the first instalment of this series, Stang and Poole provided an overview of Fisher significance testing (ST), Neyman-Pearson null hypothesis testing (NHT), and their unfortunate and unintended offspring, null hypothesis significance testing. In addition to elucidating the distinction between the first two and the evolution of the third, the authors alluded to alternative models of statistical inference; namely, Bayesian statistics. Bayesian inference has experienced a revival in recent decades, with many researchers advocating for its use as both a complement and an alternative to NHT and ST. This article will continue in the direction of the first instalment, providing practicing researchers with an introduction to Bayesian inference. Our work will draw on the examples and discussion of the previous dialogue.

  7. Spectral likelihood expansions for Bayesian inference

    NASA Astrophysics Data System (ADS)

    Nagel, Joseph B.; Sudret, Bruno

    2016-03-01

    A spectral approach to Bayesian inference is presented. It pursues the emulation of the posterior probability density. The starting point is a series expansion of the likelihood function in terms of orthogonal polynomials. From this spectral likelihood expansion all statistical quantities of interest can be calculated semi-analytically. The posterior is formally represented as the product of a reference density and a linear combination of polynomial basis functions. Both the model evidence and the posterior moments are related to the expansion coefficients. This formulation avoids Markov chain Monte Carlo simulation and allows one to make use of linear least squares instead. The pros and cons of spectral Bayesian inference are discussed and demonstrated on the basis of simple applications from classical statistics and inverse modeling.

  8. Statistics, Computation, and Modeling in Cosmology

    NASA Astrophysics Data System (ADS)

    Jewell, Jeff; Guiness, Joe; SAMSI 2016 Working Group in Cosmology

    2017-01-01

    Current and future ground and space based missions are designed to not only detect, but map out with increasing precision, details of the universe in its infancy to the present-day. As a result we are faced with the challenge of analyzing and interpreting observations from a wide variety of instruments to form a coherent view of the universe. Finding solutions to a broad range of challenging inference problems in cosmology is one of the goals of the “Statistics, Computation, and Modeling in Cosmology” working groups, formed as part of the year-long program on ‘Statistical, Mathematical, and Computational Methods for Astronomy’, hosted by the Statistical and Applied Mathematical Sciences Institute (SAMSI), a National Science Foundation funded institute. Two application areas have emerged for focused development in the cosmology working group involving advanced algorithmic implementations of exact Bayesian inference for the Cosmic Microwave Background, and statistical modeling of galaxy formation. The former includes study and development of advanced Markov Chain Monte Carlo algorithms designed to confront challenging inference problems including inference for spatial Gaussian random fields in the presence of sources of galactic emission (an example of a source separation problem). Extending these methods to future redshift survey data probing the nonlinear regime of large scale structure formation is also included in the working group activities. In addition, the working group is also focused on the study of ‘Galacticus’, a galaxy formation model applied to dark matter-only cosmological N-body simulations operating on time-dependent halo merger trees. The working group is interested in calibrating the Galacticus model to match statistics of galaxy survey observations; specifically stellar mass functions, luminosity functions, and color-color diagrams. The group will use subsampling approaches and fractional factorial designs to statistically and computationally efficiently explore the Galacticus parameter space. The group will also use the Galacticus simulations to study the relationship between the topological and physical structure of the halo merger trees and the properties of the resulting galaxies.

  9. Multivariate normality

    NASA Technical Reports Server (NTRS)

    Crutcher, H. L.; Falls, L. W.

    1976-01-01

    Sets of experimentally determined or routinely observed data provide information about the past, present and, hopefully, future sets of similarly produced data. An infinite set of statistical models exists which may be used to describe the data sets. The normal distribution is one model. If it serves at all, it serves well. If a data set, or a transformation of the set, representative of a larger population can be described by the normal distribution, then valid statistical inferences can be drawn. There are several tests which may be applied to a data set to determine whether the univariate normal model adequately describes the set. The chi-square test based on Pearson's work in the late nineteenth and early twentieth centuries is often used. Like all tests, it has some weaknesses which are discussed in elementary texts. Extension of the chi-square test to the multivariate normal model is provided. Tables and graphs permit easier application of the test in the higher dimensions. Several examples, using recorded data, illustrate the procedures. Tests of maximum absolute differences, mean sum of squares of residuals, runs and changes of sign are included in these tests. Dimensions one through five with selected sample sizes 11 to 101 are used to illustrate the statistical tests developed.

  10. Parameter estimation method that directly compares gravitational wave observations to numerical relativity

    NASA Astrophysics Data System (ADS)

    Lange, J.; O'Shaughnessy, R.; Boyle, M.; Calderón Bustillo, J.; Campanelli, M.; Chu, T.; Clark, J. A.; Demos, N.; Fong, H.; Healy, J.; Hemberger, D. A.; Hinder, I.; Jani, K.; Khamesra, B.; Kidder, L. E.; Kumar, P.; Laguna, P.; Lousto, C. O.; Lovelace, G.; Ossokine, S.; Pfeiffer, H.; Scheel, M. A.; Shoemaker, D. M.; Szilagyi, B.; Teukolsky, S.; Zlochower, Y.

    2017-11-01

    We present and assess a Bayesian method to interpret gravitational wave signals from binary black holes. Our method directly compares gravitational wave data to numerical relativity (NR) simulations. In this study, we present a detailed investigation of the systematic and statistical parameter estimation errors of this method. This procedure bypasses approximations used in semianalytical models for compact binary coalescence. In this work, we use the full posterior parameter distribution for only generic nonprecessing binaries, drawing inferences away from the set of NR simulations used, via interpolation of a single scalar quantity (the marginalized log likelihood, ln L) evaluated by comparing data to nonprecessing binary black hole simulations. We also compare the data to generic simulations, and discuss the effectiveness of this procedure for generic sources. We specifically assess the impact of higher order modes, repeating our interpretation with both l ≤ 2 as well as l ≤ 3 harmonic modes. Using the l ≤ 3 higher modes, we gain more information from the signal and can better constrain the parameters of the gravitational wave signal. We assess and quantify several sources of systematic error that our procedure could introduce, including simulation resolution and duration; most are negligible. We show through examples that our method can recover the parameters for equal mass, zero spin, GW150914-like, and unequal mass, precessing spin sources. Our study of this new parameter estimation method demonstrates that we can quantify and understand the systematic and statistical error. This method allows us to use higher order modes from numerical relativity simulations to better constrain the black hole binary parameters.

  11. The Development of Inferences About Affect.

    ERIC Educational Resources Information Center

    Lindauer, Barbara K.

    This paper describes two studies which investigated the development in elementary school children of the ability to derive inferences about the subjective states (physiological and psychological) of others. A cued recall procedure was utilized to assess the relative effectiveness of implicitly or explicitly stated emotional states as cues for…

  12. Hybrid regulatory models: a statistically tractable approach to model regulatory network dynamics.

    PubMed

    Ocone, Andrea; Millar, Andrew J; Sanguinetti, Guido

    2013-04-01

    Computational modelling of the dynamics of gene regulatory networks is a central task of systems biology. For networks of small/medium scale, the dominant paradigm is represented by systems of coupled non-linear ordinary differential equations (ODEs). ODEs afford great mechanistic detail and flexibility, but calibrating these models to data is often an extremely difficult statistical problem. Here, we develop a general statistical inference framework for stochastic transcription-translation networks. We use a coarse-grained approach, which represents the system as a network of stochastic (binary) promoter and (continuous) protein variables. We derive an exact inference algorithm and an efficient variational approximation that allows scalable inference and learning of the model parameters. We demonstrate the power of the approach on two biological case studies, showing that the method allows a high degree of flexibility and is capable of testable novel biological predictions. http://homepages.inf.ed.ac.uk/gsanguin/software.html. Supplementary data are available at Bioinformatics online.

  13. Reduction of Complications of Local Anaesthesia in Dental Healthcare Setups by Application of the Six Sigma Methodology: A Statistical Quality Improvement Technique.

    PubMed

    Akifuddin, Syed; Khatoon, Farheen

    2015-12-01

    Health care faces challenges due to complications, inefficiencies and other concerns that threaten the safety of patients. The purpose of this study was to identify causes of complications encountered after administration of local anaesthesia for dental and oral surgical procedures and to reduce the incidence of complications by introduction of Six Sigma methodology. The DMAIC (Define, Measure, Analyse, Improve and Control) process of Six Sigma was used to reduce the incidence of complications encountered after administration of local anaesthesia injections for dental and oral surgical procedures using failure mode and effect analysis. Pareto analysis was used to analyse the most recurring complications. A paired z-sample test using Minitab Statistical Inference and Fisher's exact test were used to statistically analyse the obtained data. A p-value <0.05 was considered significant. A total of 54 systemic and 62 local complications occurred during the three months of the analyse and measure phases. Syncope, failure of anaesthesia, trismus, auto mordeduras and pain at injection site were found to be the most recurring complications. The cumulative defective percentage was 7.99 in the pre-improved data and decreased to 4.58 in the control phase. The estimate for the difference was 0.0341228 and the 95% lower bound for the difference was 0.0193966. The p-value was found to be highly significant, with p = 0.000. The application of Six Sigma improvement methodology in healthcare tends to deliver consistently better results to the patients as well as hospitals and results in better patient compliance as well as satisfaction.

  14. A simple signaling rule for variable life-adjusted display derived from an equivalent risk-adjusted CUSUM chart.

    PubMed

    Wittenberg, Philipp; Gan, Fah Fatt; Knoth, Sven

    2018-04-17

    The variable life-adjusted display (VLAD) is the first risk-adjusted graphical procedure proposed in the literature for monitoring the performance of a surgeon. It displays the cumulative sum of expected minus observed deaths. It has since become highly popular because the statistic plotted is easy to understand. But it is also easy to misinterpret a surgeon's performance by utilizing the VLAD, potentially leading to grave consequences. The problem of misinterpretation is essentially caused by the variance of the VLAD's statistic that increases with sample size. In order for the VLAD to be truly useful, a simple signaling rule is desperately needed. Various forms of signaling rules have been developed, but they are usually quite complicated. Without signaling rules, making inferences using the VLAD alone is difficult if not misleading. In this paper, we establish an equivalence between a VLAD with V-mask and a risk-adjusted cumulative sum (RA-CUSUM) chart based on the difference between the estimated probability of death and surgical outcome. Average run length analysis based on simulation shows that this particular RA-CUSUM chart has similar performance as compared to the established RA-CUSUM chart based on the log-likelihood ratio statistic obtained by testing the odds ratio of death. We provide a simple design procedure for determining the V-mask parameters based on a resampling approach. Resampling from a real data set ensures that these parameters can be estimated appropriately. Finally, we illustrate the monitoring of a real surgeon's performance using VLAD with V-mask. Copyright © 2018 John Wiley & Sons, Ltd.
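
    The plotted VLAD statistic described in this abstract can be sketched as follows. This is an illustration only and does not implement the paper's V-mask signaling rule; the function names and example numbers are hypothetical. The variance helper assumes independent outcomes and a correctly calibrated risk model, under which the variance of the plotted sum is the running total of p(1 - p), which is why it grows with the number of patients.

```python
from itertools import accumulate


def vlad(predicted_risks, outcomes):
    """Cumulative sum of expected minus observed deaths.

    predicted_risks: estimated probability of death for each patient.
    outcomes: 1 if the patient died, 0 if the patient survived.
    """
    if len(predicted_risks) != len(outcomes):
        raise ValueError("need one outcome per predicted risk")
    return list(accumulate(p - y for p, y in zip(predicted_risks, outcomes)))


def vlad_variance(predicted_risks):
    """Running variance of the VLAD statistic.

    Assumes independent Bernoulli outcomes with the stated risks,
    so each patient contributes p * (1 - p).
    """
    return list(accumulate(p * (1.0 - p) for p in predicted_risks))


# Made-up example: four operations, one death on a low-risk patient.
risks = [0.10, 0.20, 0.05, 0.50]
deaths = [0, 0, 1, 0]
for step, (value, var) in enumerate(zip(vlad(risks, deaths), vlad_variance(risks)), 1):
    print(f"patient {step}: VLAD = {value:+.2f}, variance = {var:.4f}")
```

    A rising path means fewer deaths than expected; the growing variance is the reason a raw path is hard to interpret without a signaling rule.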

  15. Deterministic physical systems under uncertain initial conditions: the case of maximum entropy applied to projectile motion

    NASA Astrophysics Data System (ADS)

    Montecinos, Alejandra; Davis, Sergio; Peralta, Joaquín

    2018-07-01

    The kinematics and dynamics of deterministic physical systems have been a foundation of our understanding of the world since Galileo and Newton. For real systems, however, uncertainty is largely present via external forces such as friction or lack of precise knowledge about the initial conditions of the system. In this work we focus on the latter case and describe the use of inference methodologies in solving the statistical properties of classical systems subject to uncertain initial conditions. In particular we describe the application of the formalism of maximum entropy (MaxEnt) inference to the problem of projectile motion, given information about the average horizontal range over many realizations. By using MaxEnt we can invert the problem and use the provided information on the average range to reduce the original uncertainty in the initial conditions. Also, additional insight into the initial condition's probabilities, and the projectile path distribution itself, can be achieved based on the value of the average horizontal range. The wide applicability of this procedure, as well as its ease of use, reveals a useful tool with which to revisit a large number of physics problems, from classrooms to frontier research.
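
    The inversion described in this abstract can be written compactly. The following is a sketch under stated assumptions, not the paper's own derivation: an ideal projectile on flat ground without drag, a prior density P_0 over launch speed v_0 and angle \theta, and a single constraint on the mean range \bar{R}. All symbols are introduced here for illustration.

```latex
% Range of an ideal projectile launched with speed v_0 at angle \theta
R(v_0, \theta) = \frac{v_0^{2} \sin(2\theta)}{g}

% Maximum entropy update of the prior P_0 given the constraint <R> = \bar{R}
P(v_0, \theta \mid \bar{R}) = \frac{1}{Z(\lambda)} \, P_0(v_0, \theta) \, e^{-\lambda R(v_0, \theta)},
\qquad
Z(\lambda) = \int \mathrm{d}v_0 \, \mathrm{d}\theta \; P_0(v_0, \theta) \, e^{-\lambda R(v_0, \theta)}

% The Lagrange multiplier \lambda is fixed by the constraint
-\frac{\partial \ln Z(\lambda)}{\partial \lambda} = \bar{R}
```

    Solving the last equation for \lambda, numerically in most cases, gives the updated distribution over initial conditions.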

  16. Reading biological processes from nucleotide sequences

    NASA Astrophysics Data System (ADS)

    Murugan, Anand

    Cellular processes have traditionally been investigated by techniques of imaging and biochemical analysis of the molecules involved. The recent rapid progress in our ability to manipulate and read nucleic acid sequences gives us direct access to the genetic information that directs and constrains biological processes. While sequence data is being used widely to investigate genotype-phenotype relationships and population structure, here we use sequencing to understand biophysical mechanisms. We present work on two different systems. First, in chapter 2, we characterize the stochastic genetic editing mechanism that produces diverse T-cell receptors in the human immune system. We do this by inferring statistical distributions of the underlying biochemical events that generate T-cell receptor coding sequences from the statistics of the observed sequences. This inferred model quantitatively describes the potential repertoire of T-cell receptors that can be produced by an individual, providing insight into its potential diversity and the probability of generation of any specific T-cell receptor. Then in chapter 3, we present work on understanding the functioning of regulatory DNA sequences in both prokaryotes and eukaryotes. Here we use experiments that measure the transcriptional activity of large libraries of mutagenized promoters and enhancers and infer models of the sequence-function relationship from this data. For the bacterial promoter, we infer a physically motivated 'thermodynamic' model of the interaction of DNA-binding proteins and RNA polymerase determining the transcription rate of the downstream gene. For the eukaryotic enhancers, we infer heuristic models of the sequence-function relationship and use these models to find synthetic enhancer sequences that optimize inducibility of expression. Both projects demonstrate the utility of sequence information in conjunction with sophisticated statistical inference techniques for dissecting underlying biophysical mechanisms.

  17. A Fiducial Approach to Extremes and Multiple Comparisons

    ERIC Educational Resources Information Center

    Wandler, Damian V.

    2010-01-01

    Generalized fiducial inference is a powerful tool for many difficult problems. Based on an extension of R. A. Fisher's work, we used generalized fiducial inference for two extreme value problems and a multiple comparison procedure. The first extreme value problem is dealing with the generalized Pareto distribution. The generalized Pareto…

  18. CADDIS Volume 4. Data Analysis: PECBO Appendix - R Scripts for Non-Parametric Regressions

    EPA Pesticide Factsheets

    Script for computing nonparametric regression analysis. Overview of using scripts to infer environmental conditions from biological observations, statistically estimating species-environment relationships, statistical scripts.

  19. Test Theory Reconceived.

    ERIC Educational Resources Information Center

    Mislevy, Robert J.

    Educational test theory consists of statistical and methodological tools to support inferences about examinees' knowledge, skills, and accomplishments. The evolution of test theory has been shaped by the nature of users' inferences which, until recently, have been framed almost exclusively in terms of trait and behavioral psychology. Progress in…

  20. Data-driven sensitivity inference for Thomson scattering electron density measurement systems.

    PubMed

    Fujii, Keisuke; Yamada, Ichihiro; Hasuo, Masahiro

    2017-01-01

    We developed a method to infer the calibration parameters of multichannel measurement systems, such as channel variations of sensitivity and noise amplitude, from experimental data. We regard such uncertainties of the calibration parameters as dependent noise. The statistical properties of the dependent noise and that of the latent functions were modeled and implemented in the Gaussian process kernel. Based on their statistical difference, both parameters were inferred from the data. We applied this method to the electron density measurement system by Thomson scattering for the Large Helical Device plasma, which is equipped with 141 spatial channels. Based on the 210 sets of experimental data, we evaluated the correction factor of the sensitivity and noise amplitude for each channel. The correction factor varies by ≈10%, and the random noise amplitude is ≈2%, i.e., the measurement accuracy increases by a factor of 5 after this sensitivity correction. The certainty improvement in the spatial derivative inference was demonstrated.

  1. Local dependence in random graph models: characterization, properties and statistical inference

    PubMed Central

    Schweinberger, Michael; Handcock, Mark S.

    2015-01-01

    Dependent phenomena, such as relational, spatial and temporal phenomena, tend to be characterized by local dependence in the sense that units which are close in a well-defined sense are dependent. In contrast with spatial and temporal phenomena, though, relational phenomena tend to lack a natural neighbourhood structure in the sense that it is unknown which units are close and thus dependent. Owing to the challenge of characterizing local dependence and constructing random graph models with local dependence, many conventional exponential family random graph models induce strong dependence and are not amenable to statistical inference. We take first steps to characterize local dependence in random graph models, inspired by the notion of finite neighbourhoods in spatial statistics and M-dependence in time series, and we show that local dependence endows random graph models with desirable properties which make them amenable to statistical inference. We show that random graph models with local dependence satisfy a natural domain consistency condition which every model should satisfy, but conventional exponential family random graph models do not satisfy. In addition, we establish a central limit theorem for random graph models with local dependence, which suggests that random graph models with local dependence are amenable to statistical inference. We discuss how random graph models with local dependence can be constructed by exploiting either observed or unobserved neighbourhood structure. In the absence of observed neighbourhood structure, we take a Bayesian view and express the uncertainty about the neighbourhood structure by specifying a prior on a set of suitable neighbourhood structures. We present simulation results and applications to two real world networks with ‘ground truth’. PMID:26560142

  2. Statistical inference, the bootstrap, and neural-network modeling with application to foreign exchange rates.

    PubMed

    White, H; Racine, J

    2001-01-01

    We propose tests for individual and joint irrelevance of network inputs. Such tests can be used to determine whether an input or group of inputs "belong" in a particular model, thus permitting valid statistical inference based on estimated feedforward neural-network models. The approaches employ well-known statistical resampling techniques. We conduct a small Monte Carlo experiment showing that our tests have reasonable level and power behavior, and we apply our methods to examine whether there are predictable regularities in foreign exchange rates. We find that exchange rates do appear to contain information that is exploitable for enhanced point prediction, but the nature of the predictive relations evolves through time.

  3. From pull-down data to protein interaction networks and complexes with biological relevance.

    PubMed

    Zhang, Bing; Park, Byung-Hoon; Karpinets, Tatiana; Samatova, Nagiza F

    2008-04-01

    Recent improvements in high-throughput Mass Spectrometry (MS) technology have expedited genome-wide discovery of protein-protein interactions by providing a capability of detecting protein complexes in a physiological setting. Computational inference of protein interaction networks and protein complexes from MS data is challenging. Advances are required in developing robust and seamlessly integrated procedures for assessment of protein-protein interaction affinities, mathematical representation of protein interaction networks, discovery of protein complexes and evaluation of their biological relevance. A multi-step but easy-to-follow framework for identifying protein complexes from MS pull-down data is introduced. It assesses interaction affinity between two proteins based on similarity of their co-purification patterns derived from MS data. It constructs a protein interaction network by adopting a knowledge-guided threshold selection method. Based on the network, it identifies protein complexes and infers their core components using a graph-theoretical approach. It deploys a statistical evaluation procedure to assess biological relevance of each found complex. On Saccharomyces cerevisiae pull-down data, the framework outperformed other more complicated schemes by at least 10% in F1-measure and identified 610 protein complexes with high functional homogeneity based on the enrichment in Gene Ontology (GO) annotation. Manual examination of the complexes suggested hypotheses on the causes of false identifications. Namely, co-purification of different protein complexes as mediated by a common non-protein molecule, such as DNA, might be a source of false positives. Protein identification bias in pull-down technology, such as the hydrophilic bias, could result in false negatives.

  4. FUNSTAT and statistical image representations

    NASA Technical Reports Server (NTRS)

    Parzen, E.

    1983-01-01

    General ideas of functional statistical inference analysis of one sample and two samples, univariate and bivariate, are outlined. The ONESAM program is applied to analyze the univariate probability distributions of multi-spectral image data.

  5. Simulation of an ensemble of future climate time series with an hourly weather generator

    NASA Astrophysics Data System (ADS)

    Caporali, E.; Fatichi, S.; Ivanov, V. Y.; Kim, J.

    2010-12-01

    There is evidence that climate change is occurring in many regions of the world. The necessity of climate change predictions at the local scale and fine temporal resolution is thus warranted for hydrological, ecological, geomorphological, and agricultural applications that can provide thematic insights into the corresponding impacts. Numerous downscaling techniques have been proposed to bridge the gap between the spatial scales adopted in General Circulation Models (GCM) and regional analyses. Nevertheless, the time and spatial resolutions obtained as well as the type of meteorological variables may not be sufficient for detailed studies of climate change effects at the local scales. In this context, this study presents a stochastic downscaling technique that makes use of an hourly weather generator to simulate time series of predicted future climate. Using a Bayesian approach, the downscaling procedure derives distributions of factors of change for several climate statistics from a multi-model ensemble of GCMs. Factors of change are sampled from their distributions using a Monte Carlo technique to entirely account for the probabilistic information obtained with the Bayesian multi-model ensemble. Factors of change are subsequently applied to the statistics derived from observations to re-evaluate the parameters of the weather generator. The weather generator can reproduce a wide set of climate variables and statistics over a range of temporal scales, from extremes, to the low-frequency inter-annual variability. The final result of such a procedure is the generation of an ensemble of hourly time series of meteorological variables that can be considered as representative of future climate, as inferred from GCMs. The generated ensemble of scenarios also accounts for the uncertainty derived from multiple GCMs used in downscaling. Applications of the procedure in reproducing present and future climates are presented for different locations world-wide: Tucson (AZ), Detroit (MI), and Firenze (Italy). The stochastic downscaling is carried out with eight GCMs from the CMIP3 multi-model dataset (IPCC 4AR, A1B scenario).

  6. A semiparametric Bayesian proportional hazards model for interval censored data with frailty effects.

    PubMed

    Henschel, Volkmar; Engel, Jutta; Hölzel, Dieter; Mansmann, Ulrich

    2009-02-10

    Multivariate analysis of interval censored event data based on classical likelihood methods is notoriously cumbersome. Likelihood inference for models which additionally include random effects is not available at all. Existing algorithms pose problems for practical users, such as matrix inversion, slow convergence, and no assessment of statistical uncertainty. MCMC procedures combined with imputation are used to implement hierarchical models for interval censored data within a Bayesian framework. Two examples from clinical practice demonstrate the handling of clustered interval censored event times as well as multilayer random effects for inter-institutional quality assessment. The software developed is called survBayes and is freely available at CRAN. The proposed software supports the solution of complex analyses in many fields of clinical epidemiology as well as health services research.

  7. Low-dimensional approximation searching strategy for transfer entropy from non-uniform embedding

    PubMed Central

    2018-01-01

    Transfer entropy from non-uniform embedding is a popular tool for the inference of causal relationships among dynamical subsystems. In this study we present an approach that makes use of low-dimensional conditional mutual information quantities to decompose the original high-dimensional conditional mutual information in the searching procedure of non-uniform embedding for significant variables at different lags. We perform a series of simulation experiments to assess the sensitivity and specificity of our proposed method to demonstrate its advantage compared to previous algorithms. The results provide concrete evidence that low-dimensional approximations can help to improve the statistical accuracy of transfer entropy in multivariate causality analysis and yield a better performance over other methods. The proposed method is especially efficient as the data length grows. PMID:29547669

  8. Theory-based Bayesian models of inductive learning and reasoning.

    PubMed

    Tenenbaum, Joshua B; Griffiths, Thomas L; Kemp, Charles

    2006-07-01

    Inductive inference allows humans to make powerful generalizations from sparse data when learning about word meanings, unobserved properties, causal relationships, and many other aspects of the world. Traditional accounts of induction emphasize either the power of statistical learning, or the importance of strong constraints from structured domain knowledge, intuitive theories or schemas. We argue that both components are necessary to explain the nature, use and acquisition of human knowledge, and we introduce a theory-based Bayesian framework for modeling inductive learning and reasoning as statistical inferences over structured knowledge representations.

  9. Data free inference with processed data products

    DOE PAGES

    Chowdhary, K.; Najm, H. N.

    2014-07-12

    Here, we consider the context of probabilistic inference of model parameters given error bars or confidence intervals on model output values, when the data is unavailable. We introduce a class of algorithms in a Bayesian framework, relying on maximum entropy arguments and approximate Bayesian computation methods, to generate consistent data with the given summary statistics. Once we obtain consistent data sets, we pool the respective posteriors, to arrive at a single, averaged density on the parameters. This approach allows us to perform accurate forward uncertainty propagation consistent with the reported statistics.

  10. Statistics at the Chinese Universities.

    DTIC Science & Technology

    1981-09-01

    education in China in the postwar years is provided to give some perspective. My observations on statistics at the Chinese universities are necessarily...has been accepted as a member society of ISI. 3. Education in China Understanding of statistics in universities in China will be enhanced through some...programming), Statistical Mathematics (inference, data analysis, industrial statistics, information theory), Mathematical Physics (differential

  11. The FRIGG project: From intermediate galactic scales to self-gravitating cores

    NASA Astrophysics Data System (ADS)

    Hennebelle, Patrick

    2018-03-01

    Context. Understanding the detailed structure of the interstellar gas is essential for our knowledge of the star formation process. Aim. The small-scale structure of the interstellar medium (ISM) is a direct consequence of the galactic scales and making the link between the two is essential. Methods. We perform adaptive mesh simulations that aim to bridge the gap between the intermediate galactic scales and the self-gravitating prestellar cores. For this purpose we use stratified supernova regulated ISM magneto-hydrodynamical simulations at the kpc scale to set up the initial conditions. We then zoom, performing a series of concentric uniform refinement and then refining on the Jeans length for the last levels. This allows us to reach a spatial resolution of a few 10^-3 pc. The cores are identified using a clump finder and various criteria based on virial analysis. Their most relevant properties are computed and, due to the large number of objects formed in the simulations, reliable statistics are obtained. Results. The cores' properties show encouraging agreements with observations. The mass spectrum presents a clear power law at high masses with an exponent close to ≃-1.3 and a peak at about 1-2 M⊙. The velocity dispersion and the angular momentum distributions are respectively a few times the local sound speed and a few 10^-2 pc km s^-1. We also find that the distribution of thermally supercritical cores presents a range of magnetic mass-to-flux over critical mass-to-flux ratios, typically between ≃0.3 and 3, indicating that they are significantly magnetized. Investigating the time and spatial dependence of these statistical properties, we conclude that they are not significantly affected by the zooming procedure and that they do not present very large fluctuations. The most severe issue appears to be the dependence on the numerical resolution of the core mass function (CMF). While the core definition process may possibly introduce some biases, the peak tends to shift to smaller values when the resolution improves. Conclusions. Our simulations, which use self-consistently generated initial conditions at the kpc scale, produce a large number of prestellar cores from which reliable statistics can be inferred. Preliminary comparisons with observations show encouraging agreements. In particular the inferred CMFs resemble the ones inferred from recent observations. We stress, however, a possible issue with the peak position shifting with numerical resolution.

  12. The role of familiarity in binary choice inferences.

    PubMed

    Honda, Hidehito; Abe, Keiga; Matsuka, Toshihiko; Yamagishi, Kimihiko

    2011-07-01

    In research on the recognition heuristic (Goldstein & Gigerenzer, Psychological Review, 109, 75-90, 2002), knowledge of recognized objects has been categorized as "recognized" or "unrecognized" without regard to the degree of familiarity of the recognized object. In the present article, we propose a new inference model--familiarity-based inference. We hypothesize that when subjective knowledge levels (familiarity) of recognized objects differ, the degree of familiarity of recognized objects will influence inferences. Specifically, people are predicted to infer that the more familiar object in a pair of two objects has a higher criterion value on the to-be-judged dimension. In two experiments, using a binary choice task, we examined inferences about populations in a pair of two cities. Results support predictions of familiarity-based inference. Participants inferred that the more familiar city in a pair was more populous. Statistical modeling showed that individual differences in familiarity-based inference lie in the sensitivity to differences in familiarity. In addition, we found that familiarity-based inference can be generally regarded as an ecologically rational inference. Furthermore, when cue knowledge about the inference criterion was available, participants made inferences based on the cue knowledge about population instead of familiarity. Implications of the role of familiarity in psychological processes are discussed.

  13. Children use partial resource sharing as a cue to friendship.

    PubMed

    Liberman, Zoe; Shaw, Alex

    2017-07-01

    Resource sharing is an important aspect of human society, and how resources are distributed can provide people with crucial information about social structure. Indeed, a recent partiality account of resource distribution suggested that people may use unequal partial resource distributions to make inferences about a distributor's social affiliations. To empirically test this suggestion derived from the theoretical argument of the partiality account, we presented 4- to 9-year-old children with distributors who gave out resources unequally using either a partial procedure (intentionally choosing which recipient would get more) or an impartial procedure (rolling a die to determine which recipient would get more) and asked children to make judgments about whom the distributor was better friends with. At each age tested, children expected a distributor who gave partially to be better friends with the favored recipient (Studies 1-3). Interestingly, younger children (4- to 6-year-olds) inferred friendship between the distributor and the favored recipient even in cases where the distributor used an impartial procedure, whereas older children (7- to 9-year-olds) did not infer friendship based on impartial distributions (Study 1). These studies demonstrate that children use third-party resource distributions to make important predictions about the social world and add to our knowledge about the developmental trajectory of understanding the importance of partiality in addition to inequity when making social inferences. Copyright © 2017 Elsevier Inc. All rights reserved.

  14. Statistical modeling of software reliability

    NASA Technical Reports Server (NTRS)

    Miller, Douglas R.

    1992-01-01

    This working paper discusses the statistical simulation part of a controlled software development experiment being conducted under the direction of the System Validation Methods Branch, Information Systems Division, NASA Langley Research Center. The experiment uses guidance and control software (GCS) aboard a fictitious planetary landing spacecraft: real-time control software operating on a transient mission. Software execution is simulated to study the statistical aspects of reliability and other failure characteristics of the software during development, testing, and random usage. Quantification of software reliability is a major goal. Various reliability concepts are discussed. Experiments are described for performing simulations and collecting appropriate simulated software performance and failure data. This data is then used to make statistical inferences about the quality of the software development and verification processes as well as inferences about the reliability of software versions and reliability growth under random testing and debugging.

  15. Quantum-Like Representation of Non-Bayesian Inference

    NASA Astrophysics Data System (ADS)

    Asano, M.; Basieva, I.; Khrennikov, A.; Ohya, M.; Tanaka, Y.

    2013-01-01

    This research is related to the problem of "irrational decision making or inference" that has been discussed in cognitive psychology. There are some experimental studies, and these statistical data cannot be described by classical probability theory. The process of decision making generating these data cannot be reduced to the classical Bayesian inference. For this problem, a number of quantum-like cognitive models of decision making have been proposed. Our previous work represented in a natural way the classical Bayesian inference in the framework of quantum mechanics. By using this representation, in this paper, we try to discuss the non-Bayesian (irrational) inference that is biased by effects like the quantum interference. Further, we describe "psychological factor" disturbing "rationality" as an "environment" correlating with the "main system" of usual Bayesian inference.

  16. 48 CFR 6101.21 - Hearing procedures [Rule 21].

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... determination of the amount of recovery, if any, for other proceedings. (5) Before the hearing begins, the Board... the record the inferences it draws from the witness's refusal to testify under oath or affirmation... and, in the event of continued refusal, the Board may state for the record the inferences it draws...

  17. What Do You Want? How Perceivers Use Cues to Make Goal Inferences about Others

    ERIC Educational Resources Information Center

    Magliano, Joseph P.; Skowronski, John J.; Britt, M. Anne; Guss, C. Dominik; Forsythe, Chris

    2008-01-01

    Variables influencing inferences about a stranger's goal during an unsolicited social interaction were explored. Experiment 1 developed a procedure for identifying cues. Experiments 2 and 3 assessed the relative importance of various cues (space, time, characteristics of oneself, characteristics of the stranger, and the stranger's behavior) for…

  18. Why environmental scientists are becoming Bayesians

    Treesearch

    James S. Clark

    2005-01-01

    Advances in computational statistics provide a general framework for the high dimensional models typically needed for ecological inference and prediction. Hierarchical Bayes (HB) represents a modelling structure with capacity to exploit diverse sources of information, to accommodate influences that are unknown (or unknowable), and to draw inference on large numbers of...

  19. Pseudocontingencies and Choice Behavior in Probabilistic Environments with Context-Dependent Outcomes

    ERIC Educational Resources Information Center

    Meiser, Thorsten; Rummel, Jan; Fleig, Hanna

    2018-01-01

    Pseudocontingencies are inferences about correlations in the environment that are formed on the basis of statistical regularities like skewed base rates or varying base rates across environmental contexts. Previous research has demonstrated that pseudocontingencies provide a pervasive mechanism of inductive inference in numerous social judgment…

  20. Cross-Situational Learning of Minimal Word Pairs

    ERIC Educational Resources Information Center

    Escudero, Paola; Mulak, Karen E.; Vlach, Haley A.

    2016-01-01

    "Cross-situational statistical learning" of words involves tracking co-occurrences of auditory words and objects across time to infer word-referent mappings. Previous research has demonstrated that learners can infer referents across sets of very phonologically distinct words (e.g., WUG, DAX), but it remains unknown whether learners can…

  1. Statistical analysis of fNIRS data: a comprehensive review.

    PubMed

    Tak, Sungho; Ye, Jong Chul

    2014-01-15

    Functional near-infrared spectroscopy (fNIRS) is a non-invasive method to measure brain activities using the changes of optical absorption in the brain through the intact skull. fNIRS has many advantages over other neuroimaging modalities such as positron emission tomography (PET), functional magnetic resonance imaging (fMRI), or magnetoencephalography (MEG), since it can directly measure blood oxygenation level changes related to neural activation with high temporal resolution. However, fNIRS signals are highly corrupted by measurement noises and physiology-based systemic interference. Careful statistical analyses are therefore required to extract neuronal activity-related signals from fNIRS data. In this paper, we provide an extensive review of historical developments of statistical analyses of fNIRS signal, which include motion artifact correction, short source-detector separation correction, principal component analysis (PCA)/independent component analysis (ICA), false discovery rate (FDR), serially-correlated errors, as well as inference techniques such as the standard t-test, F-test, analysis of variance (ANOVA), and statistical parameter mapping (SPM) framework. In addition, to provide a unified view of various existing inference techniques, we explain a linear mixed effect model with restricted maximum likelihood (ReML) variance estimation, and show that most of the existing inference methods for fNIRS analysis can be derived as special cases. Some of the open issues in statistical analysis are also described. Copyright © 2013 Elsevier Inc. All rights reserved.

  2. Accounting for measurement error: a critical but often overlooked process.

    PubMed

    Harris, Edward F; Smith, Richard N

    2009-12-01

    Due to instrument imprecision and human inconsistencies, measurements are not free of error. Technical error of measurement (TEM) is the variability encountered between dimensions when the same specimens are measured at multiple sessions. A goal of a data collection regimen is to minimise TEM. The few studies that actually quantify TEM, regardless of discipline, report that it is substantial and can affect results and inferences. This paper reviews some statistical approaches for identifying and controlling TEM. Statistically, TEM is part of the residual ('unexplained') variance in a statistical test, so accounting for TEM, which requires repeated measurements, enhances the chances of finding a statistically significant difference if one exists. The aim of this paper was to review and discuss common statistical designs relating to types of error and statistical approaches to error accountability. This paper addresses issues of landmark location, validity, technical and systematic error, analysis of variance, scaled measures and correlation coefficients in order to guide the reader towards correct identification of true experimental differences. Researchers commonly infer characteristics about populations from comparatively restricted study samples. Most inferences are statistical and, aside from concerns about adequate accounting for known sources of variation with the research design, an important source of variability is measurement error. Variability in locating landmarks that define variables is obvious in odontometrics, cephalometrics and anthropometry, but the same concerns about measurement accuracy and precision extend to all disciplines. With increasing accessibility to computer-assisted methods of data collection, the ease of incorporating repeated measures into statistical designs has improved. Accounting for this technical source of variation increases the chance of finding biologically true differences when they exist.
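
    The abstract above does not give a formula for TEM. As a minimal sketch, assuming the commonly used two-session (Dahlberg) form TEM = sqrt(sum(d_i^2) / (2n)), where d_i is the difference between the two measurements of specimen i, the calculation could look like this. The function name and the sample values are hypothetical.

```python
import math


def technical_error_of_measurement(session_a, session_b):
    """Two-session TEM, assuming the Dahlberg form sqrt(sum(d^2) / (2n))."""
    if len(session_a) != len(session_b) or not session_a:
        raise ValueError("need two non-empty sessions of equal length")
    # Squared difference between the repeated measurements of each specimen.
    squared_diffs = [(a - b) ** 2 for a, b in zip(session_a, session_b)]
    return math.sqrt(sum(squared_diffs) / (2 * len(squared_diffs)))


# Hypothetical repeated measurements (same specimens, two sessions, in mm).
first_session = [10.2, 11.5, 9.8, 12.1]
second_session = [10.0, 11.9, 9.7, 12.4]
tem = technical_error_of_measurement(first_session, second_session)
print(f"TEM = {tem:.4f} mm")
```

    The result is in the same units as the measurements, so it can be compared directly with the size of the differences a study hopes to detect.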

  3. Developing a statistically powerful measure for quartet tree inference using phylogenetic identities and Markov invariants.

    PubMed

    Sumner, Jeremy G; Taylor, Amelia; Holland, Barbara R; Jarvis, Peter D

    2017-12-01

    Recently there has been renewed interest in phylogenetic inference methods based on phylogenetic invariants, alongside the related Markov invariants. Broadly speaking, both these approaches give rise to polynomial functions of sequence site patterns that, in expectation value, either vanish for particular evolutionary trees (in the case of phylogenetic invariants) or have well understood transformation properties (in the case of Markov invariants). While both approaches have been valued for their intrinsic mathematical interest, it is not clear how they relate to each other, and to what extent they can be used as practical tools for inference of phylogenetic trees. In this paper, by focusing on the special case of binary sequence data and quartets of taxa, we are able to view these two different polynomial-based approaches within a common framework. To motivate the discussion, we present three desirable statistical properties that we argue any invariant-based phylogenetic method should satisfy: (1) sensible behaviour under reordering of input sequences; (2) stability as the taxa evolve independently according to a Markov process; and (3) explicit dependence on the assumption of a continuous-time process. Motivated by these statistical properties, we develop and explore several new phylogenetic inference methods. In particular, we develop a statistically bias-corrected version of the Markov invariants approach which satisfies all three properties. We also extend previous work by showing that the phylogenetic invariants can be implemented in such a way as to satisfy property (3). A simulation study shows that, in comparison to other methods, our new proposed approach based on bias-corrected Markov invariants is extremely powerful for phylogenetic inference. The binary case is of particular theoretical interest as, in this case only, the Markov invariants can be expressed as linear combinations of the phylogenetic invariants. A wider implication of this is that, for models with more than two states (for example, DNA sequence alignments with four-state models), we find that methods which rely on phylogenetic invariants are incapable of satisfying all three of the stated statistical properties. This is because in these cases the relevant Markov invariants belong to a class of polynomials independent from the phylogenetic invariants.

  4. Imputation approaches for animal movement modeling

    USGS Publications Warehouse

    Scharf, Henry; Hooten, Mevin B.; Johnson, Devin S.

    2017-01-01

    The analysis of telemetry data is common in animal ecological studies. While the collection of telemetry data for individual animals has improved dramatically, the methods to properly account for inherent uncertainties (e.g., measurement error, dependence, barriers to movement) have lagged behind. Still, many new statistical approaches have been developed to infer unknown quantities affecting animal movement or predict movement based on telemetry data. Hierarchical statistical models are useful to account for some of the aforementioned uncertainties, as well as provide population-level inference, but they often come with an increased computational burden. For certain types of statistical models, it is straightforward to provide inference if the latent true animal trajectory is known, but challenging otherwise. In these cases, approaches related to multiple imputation have been employed to account for the uncertainty associated with our knowledge of the latent trajectory. Despite the increasing use of imputation approaches for modeling animal movement, the general sensitivity and accuracy of these methods have not been explored in detail. We provide an introduction to animal movement modeling and describe how imputation approaches may be helpful for certain types of models. We also assess the performance of imputation approaches in two simulation studies. Our simulation studies suggest that inference for model parameters directly related to the location of an individual may be more accurate than inference for parameters associated with higher-order processes such as velocity or acceleration. Finally, we apply these methods to analyze a telemetry data set involving northern fur seals (Callorhinus ursinus) in the Bering Sea. Supplementary materials accompanying this paper appear online.

  5. Estimating False Discovery Proportion Under Arbitrary Covariance Dependence*

    PubMed Central

    Fan, Jianqing; Han, Xu; Gu, Weijie

    2012-01-01

    Multiple hypothesis testing is a fundamental problem in high dimensional inference, with wide applications in many scientific fields. In genome-wide association studies, tens of thousands of tests are performed simultaneously to find if any SNPs are associated with some traits and those tests are correlated. When test statistics are correlated, false discovery control becomes very challenging under arbitrary dependence. In the current paper, we propose a novel method based on principal factor approximation, which successfully subtracts the common dependence and weakens significantly the correlation structure, to deal with an arbitrary dependence structure. We derive an approximate expression for false discovery proportion (FDP) in large scale multiple testing when a common threshold is used and provide a consistent estimate of realized FDP. This result has important applications in controlling FDR and FDP. Our estimate of realized FDP compares favorably with Efron's (2007) approach, as demonstrated in the simulated examples. Our approach is further illustrated by some real data applications. We also propose a dependence-adjusted procedure, which is more powerful than the fixed threshold procedure. PMID:24729644

  6. Efficient Posterior Probability Mapping Using Savage-Dickey Ratios

    PubMed Central

    Penny, William D.; Ridgway, Gerard R.

    2013-01-01

    Statistical Parametric Mapping (SPM) is the dominant paradigm for mass-univariate analysis of neuroimaging data. More recently, a Bayesian approach termed Posterior Probability Mapping (PPM) has been proposed as an alternative. PPM offers two advantages: (i) inferences can be made about effect size thus lending a precise physiological meaning to activated regions, (ii) regions can be declared inactive. This latter facility is most parsimoniously provided by PPMs based on Bayesian model comparisons. To date these comparisons have been implemented by an Independent Model Optimization (IMO) procedure which separately fits null and alternative models. This paper proposes a more computationally efficient procedure based on Savage-Dickey approximations to the Bayes factor, and Taylor-series approximations to the voxel-wise posterior covariance matrices. Simulations show the accuracy of this Savage-Dickey-Taylor (SDT) method to be comparable to that of IMO. Results on fMRI data show excellent agreement between SDT and IMO for second-level models, and reasonable agreement for first-level models. This Savage-Dickey test is a Bayesian analogue of the classical SPM-F and allows users to implement model comparison in a truly interactive manner. PMID:23533640
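
    The Savage-Dickey idea mentioned above can be illustrated on a toy problem. The sketch below is not the SDT method of the paper (it uses no Taylor-series approximation and no voxel-wise model); it assumes a single mean with known noise variance and a zero-mean Gaussian prior, where the Bayes factor for the point null is the posterior density at the null value divided by the prior density there. All names and numbers are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def savage_dickey_bf01(ybar, n, sigma2, tau2, theta0=0.0):
    """Bayes factor for H0: theta = theta0 against H1: theta ~ N(0, tau2).

    Assumes n observations with known noise variance sigma2 and sample
    mean ybar, so the posterior under H1 is Gaussian in closed form.
    """
    post_var = 1.0 / (1.0 / tau2 + n / sigma2)
    post_mean = post_var * (n * ybar / sigma2)  # the prior mean is zero
    # Savage-Dickey ratio: posterior over prior density at the null value.
    return normal_pdf(theta0, post_mean, post_var) / normal_pdf(theta0, 0.0, tau2)


def direct_bf01(ybar, n, sigma2, tau2, theta0=0.0):
    """Same Bayes factor from the two marginal likelihoods of ybar."""
    null_evidence = normal_pdf(ybar, theta0, sigma2 / n)
    alt_evidence = normal_pdf(ybar, 0.0, tau2 + sigma2 / n)
    return null_evidence / alt_evidence


bf_ratio = savage_dickey_bf01(ybar=0.1, n=25, sigma2=1.0, tau2=1.0)
bf_direct = direct_bf01(ybar=0.1, n=25, sigma2=1.0, tau2=1.0)
print(bf_ratio, bf_direct)
```

    The two functions agree because the models are nested; the ratio form needs only the alternative model to be fitted, which is the source of the computational saving the abstract describes.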

  7. Using SPM 12’s Second-Level Bayesian Inference Procedure for fMRI Analysis: Practical Guidelines for End Users

    PubMed Central

    Han, Hyemin; Park, Joonsuk

    2018-01-01

    Recent debates about the conventional threshold used in the fields of neuroscience and psychology, namely P < 0.05, have spurred researchers to consider alternative ways to analyze fMRI data. A group of methodologists and statisticians have considered Bayesian inference as a candidate methodology. However, few previous studies have attempted to provide end users of fMRI analysis tools, such as SPM 12, with practical guidelines about how to conduct Bayesian inference. In the present study, we aim to demonstrate how to utilize Bayesian inference, Bayesian second-level inference in particular, implemented in SPM 12 by analyzing fMRI data available to the public via NeuroVault. In addition, to help end users understand how Bayesian inference actually works in SPM 12, we examine outcomes from Bayesian second-level inference implemented in SPM 12 by comparing them with those from classical second-level inference. Finally, we provide practical guidelines about how to set the parameters for Bayesian inference and how to interpret the results, such as Bayes factors, from the inference. We also discuss the practical and philosophical benefits of Bayesian inference and directions for future research. PMID:29456498

  8. Bayesian inference for joint modelling of longitudinal continuous, binary and ordinal events.

    PubMed

    Li, Qiuju; Pan, Jianxin; Belcher, John

    2016-12-01

    In medical studies, repeated measurements of continuous, binary and ordinal outcomes are routinely collected from the same patient. Instead of modelling each outcome separately, in this study we propose to jointly model the trivariate longitudinal responses, so as to take account of the inherent association between the different outcomes and thus improve statistical inferences. This work is motivated by a large cohort study in the North West of England, involving trivariate responses from each patient: Body Mass Index, Depression (Yes/No) ascertained with cut-off score not less than 8 at the Hospital Anxiety and Depression Scale, and Pain Interference generated from the Medical Outcomes Study 36-item short-form health survey with values returned on an ordinal scale 1-5. There are some well-established methods for combined continuous and binary, or even continuous and ordinal responses, but little work was done on the joint analysis of continuous, binary and ordinal responses. We propose conditional joint random-effects models, which take into account the inherent association between the continuous, binary and ordinal outcomes. Bayesian analysis methods are used to make statistical inferences. Simulation studies show that, by jointly modelling the trivariate outcomes, standard deviations of the estimates of parameters in the models are smaller and much more stable, leading to more efficient parameter estimates and reliable statistical inferences. In the real data analysis, the proposed joint analysis yields a much smaller deviance information criterion value than the separate analysis, and shows other good statistical properties too. © The Author(s) 2014.

  9. Computational State Space Models for Activity and Intention Recognition. A Feasibility Study

    PubMed Central

    Krüger, Frank; Nyolt, Martin; Yordanova, Kristina; Hein, Albert; Kirste, Thomas

    2014-01-01

    Background: Computational state space models (CSSMs) enable the knowledge-based construction of Bayesian filters for recognizing intentions and reconstructing activities of human protagonists in application domains such as smart environments, assisted living, or security. Computational, i.e., algorithmic, representations allow the construction of increasingly complex human behaviour models. However, the symbolic models used in CSSMs potentially suffer from combinatorial explosion, rendering inference intractable outside of the limited experimental settings investigated in present research. The objective of this study was to obtain data on the feasibility of CSSM-based inference in domains of realistic complexity. Methods: A typical instrumental activity of daily living was used as a trial scenario. As primary sensor modality, wearable inertial measurement units were employed. The results achievable by CSSM methods were evaluated by comparison with those obtained from established training-based methods (hidden Markov models, HMMs) using Wilcoxon signed rank tests. The influence of modeling factors on CSSM performance was analyzed via repeated measures analysis of variance. Results: The symbolic domain model was found to have more than states, exceeding the complexity of models considered in previous research by at least three orders of magnitude. Nevertheless, if factors and procedures governing the inference process were suitably chosen, CSSMs outperformed HMMs. Specifically, inference methods used in previous studies (particle filters) were found to perform substantially worse than a marginal filtering procedure. Conclusions: Our results suggest that the combinatorial explosion caused by rich CSSM models does not inevitably lead to intractable inference or inferior performance. This means that the potential benefits of CSSM models (knowledge-based model construction, model reusability, reduced need for training data) are available without performance penalty. However, our results also show that research on CSSMs needs to consider sufficiently complex domains in order to understand the effects of design decisions such as choice of heuristics or inference procedure on performance. PMID:25372138

  10. Neuronal couplings between retinal ganglion cells inferred by efficient inverse statistical physics methods

    PubMed Central

    Cocco, Simona; Leibler, Stanislas; Monasson, Rémi

    2009-01-01

    Complexity of neural systems often makes impracticable explicit measurements of all interactions between their constituents. Inverse statistical physics approaches, which infer effective couplings between neurons from their spiking activity, have been so far hindered by their computational complexity. Here, we present 2 complementary, computationally efficient inverse algorithms based on the Ising and “leaky integrate-and-fire” models. We apply those algorithms to reanalyze multielectrode recordings in the salamander retina in darkness and under random visual stimulus. We find strong positive couplings between nearby ganglion cells common to both stimuli, whereas long-range couplings appear under random stimulus only. The uncertainty on the inferred couplings due to limitations in the recordings (duration, small area covered on the retina) is discussed. Our methods will allow real-time evaluation of couplings for large assemblies of neurons. PMID:19666487

  11. Confidence crisis of results in biomechanics research.

    PubMed

    Knudson, Duane

    2017-11-01

    Many biomechanics studies have small sample sizes and incorrect statistical analyses, so reporting of inaccurate inferences and inflated magnitude of effects are common in the field. This review examines these issues in biomechanics research and summarises potential solutions from research in other fields to increase the confidence in the experimental effects reported in biomechanics. Authors, reviewers and editors of biomechanics research reports are encouraged to improve sample sizes and the resulting statistical power, improve reporting transparency, improve the rigour of statistical analyses used, and increase the acceptance of replication studies to improve the validity of inferences from data in biomechanics research. The application of sports biomechanics research results would also improve if a larger percentage of unbiased effects and their uncertainty were reported in the literature.

  12. The Empirical Nature and Statistical Treatment of Missing Data

    ERIC Educational Resources Information Center

    Tannenbaum, Christyn E.

    2009-01-01

    Introduction. Missing data is a common problem in research and can produce severely misleading analyses, including biased estimates of statistical parameters, and erroneous conclusions. In its 1999 report, the APA Task Force on Statistical Inference encouraged authors to report complications such as missing data and discouraged the use of…

  13. Cognitive Transfer Outcomes for a Simulation-Based Introductory Statistics Curriculum

    ERIC Educational Resources Information Center

    Backman, Matthew D.; Delmas, Robert C.; Garfield, Joan

    2017-01-01

    Cognitive transfer is the ability to apply learned skills and knowledge to new applications and contexts. This investigation evaluates cognitive transfer outcomes for a tertiary-level introductory statistics course using the CATALST curriculum, which exclusively used simulation-based methods to develop foundations of statistical inference. A…

  14. The Role of the Sampling Distribution in Understanding Statistical Inference

    ERIC Educational Resources Information Center

    Lipson, Kay

    2003-01-01

    Many statistics educators believe that few students develop the level of conceptual understanding essential for them to apply correctly the statistical techniques at their disposal and to interpret their outcomes appropriately. It is also commonly believed that the sampling distribution plays an important role in developing this understanding.…

  15. Updated Intensity - Duration - Frequency Curves Under Different Future Climate Scenarios

    NASA Astrophysics Data System (ADS)

    Ragno, E.; AghaKouchak, A.

    2016-12-01

    Current infrastructure design procedures rely on the use of Intensity - Duration - Frequency (IDF) curves retrieved under the assumption of temporal stationarity, meaning that occurrences of extreme events are expected to be time invariant. However, numerous studies have observed more severe extreme events over time. Hence, the stationarity assumption for extreme analysis may not be appropriate in a warming climate. This issue raises concerns regarding the safety and resilience of the existing and future infrastructures. Here we employ historical and projected (RCP 8.5) CMIP5 runs to investigate IDF curves of 14 urban areas across the United States. We first statistically assess changes in precipitation extremes using an energy-based test for equal distributions. Then, through a Bayesian inference approach for stationary and non-stationary extreme value analysis, we provide updated IDF curves based on climatic model projections. This presentation summarizes the projected changes in statistics of extremes. We show that, based on CMIP5 simulations, extreme precipitation events in some urban areas can be 20% more severe in the future, even when projected annual mean precipitation is expected to remain similar to the ground-based climatology.

  16. A statistical method for lung tumor segmentation uncertainty in PET images based on user inference.

    PubMed

    Zheng, Chaojie; Wang, Xiuying; Feng, Dagan

    2015-01-01

    PET has been widely accepted as an effective imaging modality for lung tumor diagnosis and treatment. However, standard criteria for delineating tumor boundary from PET are yet to be developed, largely due to the relatively low quality of PET images, uncertain tumor boundary definition, and the variety of tumor characteristics. In this paper, we propose a statistical solution to segmentation uncertainty on the basis of user inference. We first define the uncertainty segmentation band on the basis of a segmentation probability map constructed from the Random Walks (RW) algorithm; then, based on the extracted features of the user inference, we use Principal Component Analysis (PCA) to formulate the statistical model for labeling the uncertainty band. We validated our method on 10 lung PET-CT phantom studies from the public RIDER collections [1] and 16 clinical PET studies where tumors were manually delineated by two experienced radiologists. The methods were validated using the Dice similarity coefficient (DSC) to measure the spatial volume overlap. Our method achieved an average DSC of 0.878 ± 0.078 on phantom studies and 0.835 ± 0.039 on clinical studies.
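
    For readers unfamiliar with the Dice similarity coefficient used for validation above, the following minimal sketch computes DSC = 2|A ∩ B| / (|A| + |B|) for two segmentations represented as sets of voxel indices. The representation, the function name and the toy voxel sets are assumptions for illustration, not the authors' implementation.

```python
def dice_similarity(segmentation_a, segmentation_b):
    """Dice similarity coefficient between two sets of voxel indices."""
    a, b = set(segmentation_a), set(segmentation_b)
    if not a and not b:
        # Convention assumed here: two empty segmentations overlap perfectly.
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


# Hypothetical voxel indices labelled as tumor by a method and by a reader.
automatic = {(0, 1, 1), (0, 1, 2), (0, 2, 1), (0, 2, 2)}
manual = {(0, 2, 1), (0, 2, 2), (0, 3, 1)}
dsc = dice_similarity(automatic, manual)
print(f"DSC = {dsc:.3f}")
```

    A value of 1 means the two delineations coincide and 0 means they share no voxels, so the reported averages of 0.878 and 0.835 indicate substantial but not perfect overlap.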

  17. Karl Pearson and eugenics: personal opinions and scientific rigor.

    PubMed

    Delzell, Darcie A P; Poliak, Cathy D

    2013-09-01

    The influence of personal opinions and biases on scientific conclusions is a threat to the advancement of knowledge. Expertise and experience do not render one immune to this temptation. In this work, one of the founding fathers of statistics, Karl Pearson, is used as an illustration of how even the most talented among us can produce misleading results when inferences are made without caution or reference to potential bias and other analysis limitations. A study performed by Pearson on British Jewish schoolchildren is examined in light of ethical and professional statistical practice. The methodology used and inferences made by Pearson and his coauthor are sometimes questionable and offer insight into how Pearson's support of eugenics and his own British nationalism could have potentially influenced his often careless and far-fetched inferences. A short background into Pearson's work and beliefs is provided, along with an in-depth examination of the authors' overall experimental design and statistical practices. In addition, portions of the study regarding intelligence and tuberculosis are discussed in more detail, along with historical reactions to their work.

  18. Assessing the significance of pedobarographic signals using random field theory.

    PubMed

    Pataky, Todd C

    2008-08-07

    Traditional pedobarographic statistical analyses are conducted over discrete regions. Recent studies have demonstrated that regionalization can corrupt pedobarographic field data through conflation when arbitrary dividing lines inappropriately delineate smooth field processes. An alternative is to register images such that homologous structures optimally overlap and then conduct statistical tests at each pixel to generate statistical parametric maps (SPMs). The significance of SPM processes may be assessed within the framework of random field theory (RFT). RFT is ideally suited to pedobarographic image analysis because its fundamental data unit is a lattice sampling of a smooth and continuous spatial field. To correct for the vast number of multiple comparisons inherent in such data, recent pedobarographic studies have employed a Bonferroni correction to retain a constant family-wise error rate. This approach unfortunately neglects the spatial correlation of neighbouring pixels, so provides an overly conservative (albeit valid) statistical threshold. RFT generally relaxes the threshold depending on field smoothness and on the geometry of the search area, but it also provides a framework for assigning p values to suprathreshold clusters based on their spatial extent. The current paper provides an overview of basic RFT concepts and uses simulated and experimental data to validate both RFT-relevant field smoothness estimations and RFT predictions regarding the topological characteristics of random pedobarographic fields. Finally, previously published experimental data are re-analysed using RFT inference procedures to demonstrate how RFT yields easily understandable statistical results that may be incorporated into routine clinical and laboratory analyses.
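
    The Bonferroni baseline that the abstract describes as valid but overly conservative is easy to state in code. The sketch below shows only that baseline, not the RFT procedure: each pixel is tested at alpha divided by the number of pixels, ignoring spatial correlation. Function names and p-values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_pixels(p_values, alpha=0.05):
    """Indices of pixels whose p-value passes the Bonferroni threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [i for i, p in enumerate(p_values) if p <= threshold]


# Hypothetical per-pixel p-values from a tiny five-pixel image.
p_values = [0.2, 0.004, 0.0001, 0.03, 0.6]
flagged = significant_pixels(p_values)
print(flagged)

# With a realistic image size the per-pixel threshold becomes very strict.
print(bonferroni_threshold(0.05, 10_000))
```

    Because the threshold shrinks in proportion to the pixel count regardless of how smooth the field is, neighbouring pixels that carry nearly the same information are each counted as a separate test, which is the conservatism RFT is meant to relax.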

  19. In silico model-based inference: a contemporary approach for hypothesis testing in network biology

    PubMed Central

    Klinke, David J.

    2014-01-01

    Inductive inference plays a central role in the study of biological systems where one aims to increase their understanding of the system by reasoning backwards from uncertain observations to identify causal relationships among components of the system. These causal relationships are postulated from prior knowledge as a hypothesis or simply a model. Experiments are designed to test the model. Inferential statistics are used to establish a level of confidence in how well our postulated model explains the acquired data. This iterative process, commonly referred to as the scientific method, either improves our confidence in a model or suggests that we revisit our prior knowledge to develop a new model. Advances in technology impact how we use prior knowledge and data to formulate models of biological networks and how we observe cellular behavior. However, the approach for model-based inference has remained largely unchanged since Fisher, Neyman and Pearson developed the ideas in the early 1900’s that gave rise to what is now known as classical statistical hypothesis (model) testing. Here, I will summarize conventional methods for model-based inference and suggest a contemporary approach to aid in our quest to discover how cells dynamically interpret and transmit information for therapeutic aims that integrates ideas drawn from high performance computing, Bayesian statistics, and chemical kinetics. PMID:25139179

  20. In silico model-based inference: a contemporary approach for hypothesis testing in network biology.

    PubMed

    Klinke, David J

    2014-01-01

    Inductive inference plays a central role in the study of biological systems where one aims to increase their understanding of the system by reasoning backwards from uncertain observations to identify causal relationships among components of the system. These causal relationships are postulated from prior knowledge as a hypothesis or simply a model. Experiments are designed to test the model. Inferential statistics are used to establish a level of confidence in how well our postulated model explains the acquired data. This iterative process, commonly referred to as the scientific method, either improves our confidence in a model or suggests that we revisit our prior knowledge to develop a new model. Advances in technology impact how we use prior knowledge and data to formulate models of biological networks and how we observe cellular behavior. However, the approach for model-based inference has remained largely unchanged since Fisher, Neyman and Pearson developed the ideas in the early 1900s that gave rise to what is now known as classical statistical hypothesis (model) testing. Here, I will summarize conventional methods for model-based inference and suggest a contemporary approach to aid in our quest to discover how cells dynamically interpret and transmit information for therapeutic aims that integrates ideas drawn from high performance computing, Bayesian statistics, and chemical kinetics. © 2014 American Institute of Chemical Engineers.

  1. Spatio-temporal conditional inference and hypothesis tests for neural ensemble spiking precision

    PubMed Central

    Harrison, Matthew T.; Amarasingham, Asohan; Truccolo, Wilson

    2014-01-01

    The collective dynamics of neural ensembles create complex spike patterns with many spatial and temporal scales. Understanding the statistical structure of these patterns can help resolve fundamental questions about neural computation and neural dynamics. Spatio-temporal conditional inference (STCI) is introduced here as a semiparametric statistical framework for investigating the nature of precise spiking patterns from collections of neurons that is robust to arbitrarily complex and nonstationary coarse spiking dynamics. The main idea is to focus statistical modeling and inference, not on the full distribution of the data, but rather on families of conditional distributions of precise spiking given different types of coarse spiking. The framework is then used to develop families of hypothesis tests for probing the spatio-temporal precision of spiking patterns. Relationships among different conditional distributions are used to improve multiple hypothesis testing adjustments and to design novel Monte Carlo spike resampling algorithms. Of special note are algorithms that can locally jitter spike times while still preserving the instantaneous peri-stimulus time histogram (PSTH) or the instantaneous total spike count from a group of recorded neurons. The framework can also be used to test whether first-order maximum entropy models with possibly random and time-varying parameters can account for observed patterns of spiking. STCI provides a detailed example of the generic principle of conditional inference, which may be applicable in other areas of neurostatistical analysis. PMID:25380339
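
    The idea of conditioning on coarse spiking while resampling precise spike times can be sketched with basic interval jitter. This Python example is an assumption-laden illustration, not the PSTH-preserving or total-count-preserving algorithms developed in the paper: each spike is redrawn uniformly within its own fixed window, so the spike count per window is preserved. The function names, the 20 ms window, and the synthetic spike train are hypothetical.

```python
import numpy as np


def interval_jitter(spike_times, window, rng):
    """Redraw each spike uniformly inside the fixed window that contains it."""
    spike_times = np.asarray(spike_times, dtype=float)
    bins = np.floor(spike_times / window)
    return np.sort((bins + rng.random(spike_times.shape)) * window)


def window_counts(spike_times, window, n_windows):
    """Coarse spiking summary: number of spikes in each window."""
    idx = np.floor(np.asarray(spike_times) / window).astype(int)
    return np.bincount(idx, minlength=n_windows)


rng = np.random.default_rng(0)
spikes = np.sort(rng.uniform(0.0, 1000.0, 200))   # synthetic spike times in ms
jittered = interval_jitter(spikes, 20.0, rng)

# The coarse counts are identical; only the fine timing differs.
same_counts = np.array_equal(
    window_counts(spikes, 20.0, 50), window_counts(jittered, 20.0, 50)
)
print(same_counts)
```

    Repeating the jitter many times and recomputing a precision-sensitive statistic (for example, a synchrony count) on each surrogate would give a Monte Carlo reference distribution for a hypothesis test conditional on the coarse counts.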

  2. A hierarchical Bayesian approach to adaptive vision testing: A case study with the contrast sensitivity function.

    PubMed

    Gu, Hairong; Kim, Woojae; Hou, Fang; Lesmes, Luis Andres; Pitt, Mark A; Lu, Zhong-Lin; Myung, Jay I

    2016-01-01

    Measurement efficiency is of concern when a large number of observations are required to obtain reliable estimates for parametric models of vision. The standard entropy-based Bayesian adaptive testing procedures addressed the issue by selecting the most informative stimulus in sequential experimental trials. Noninformative, diffuse priors were commonly used in those tests. Hierarchical adaptive design optimization (HADO; Kim, Pitt, Lu, Steyvers, & Myung, 2014) further improves the efficiency of the standard Bayesian adaptive testing procedures by constructing an informative prior using data from observers who have already participated in the experiment. The present study represents an empirical validation of HADO in estimating the human contrast sensitivity function. The results show that HADO significantly improves the accuracy and precision of parameter estimates, and therefore requires many fewer observations to obtain reliable inference about contrast sensitivity, compared to the method of quick contrast sensitivity function (Lesmes, Lu, Baek, & Albright, 2010), which uses the standard Bayesian procedure. The improvement with HADO was maintained even when the prior was constructed from heterogeneous populations or a relatively small number of observers. These results of this case study support the conclusion that HADO can be used in Bayesian adaptive testing by replacing noninformative, diffuse priors with statistically justified informative priors without introducing unwanted bias.

  3. A hierarchical Bayesian approach to adaptive vision testing: A case study with the contrast sensitivity function

    PubMed Central

    Gu, Hairong; Kim, Woojae; Hou, Fang; Lesmes, Luis Andres; Pitt, Mark A.; Lu, Zhong-Lin; Myung, Jay I.

    2016-01-01

    Measurement efficiency is of concern when a large number of observations are required to obtain reliable estimates for parametric models of vision. The standard entropy-based Bayesian adaptive testing procedures addressed the issue by selecting the most informative stimulus in sequential experimental trials. Noninformative, diffuse priors were commonly used in those tests. Hierarchical adaptive design optimization (HADO; Kim, Pitt, Lu, Steyvers, & Myung, 2014) further improves the efficiency of the standard Bayesian adaptive testing procedures by constructing an informative prior using data from observers who have already participated in the experiment. The present study represents an empirical validation of HADO in estimating the human contrast sensitivity function. The results show that HADO significantly improves the accuracy and precision of parameter estimates, and therefore requires many fewer observations to obtain reliable inference about contrast sensitivity, compared to the method of quick contrast sensitivity function (Lesmes, Lu, Baek, & Albright, 2010), which uses the standard Bayesian procedure. The improvement with HADO was maintained even when the prior was constructed from heterogeneous populations or a relatively small number of observers. These results of this case study support the conclusion that HADO can be used in Bayesian adaptive testing by replacing noninformative, diffuse priors with statistically justified informative priors without introducing unwanted bias. PMID:27105061

  4. High-Dimensional Intrinsic Interpolation Using Gaussian Process Regression and Diffusion Maps

    DOE PAGES

    Thimmisetty, Charanraj A.; Ghanem, Roger G.; White, Joshua A.; ...

    2017-10-10

    This article considers the challenging task of estimating geologic properties of interest using a suite of proxy measurements. The current work recast this task as a manifold learning problem. In this process, this article introduces a novel regression procedure for intrinsic variables constrained onto a manifold embedded in an ambient space. The procedure is meant to sharpen high-dimensional interpolation by inferring non-linear correlations from the data being interpolated. The proposed approach augments manifold learning procedures with a Gaussian process regression. It first identifies, using diffusion maps, a low-dimensional manifold embedded in an ambient high-dimensional space associated with the data. It relies on the diffusion distance associated with this construction to define a distance function with which the data model is equipped. This distance metric function is then used to compute the correlation structure of a Gaussian process that describes the statistical dependence of quantities of interest in the high-dimensional ambient space. The proposed method is applicable to arbitrarily high-dimensional data sets. Here, it is applied to subsurface characterization using a suite of well log measurements. The predictions obtained in original, principal component, and diffusion space are compared using both qualitative and quantitative metrics. Considerable improvement in the prediction of the geological structural properties is observed with the proposed method.

  5. High-Dimensional Intrinsic Interpolation Using Gaussian Process Regression and Diffusion Maps

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Thimmisetty, Charanraj A.; Ghanem, Roger G.; White, Joshua A.

    This article considers the challenging task of estimating geologic properties of interest using a suite of proxy measurements. The current work recast this task as a manifold learning problem. In this process, this article introduces a novel regression procedure for intrinsic variables constrained onto a manifold embedded in an ambient space. The procedure is meant to sharpen high-dimensional interpolation by inferring non-linear correlations from the data being interpolated. The proposed approach augments manifold learning procedures with a Gaussian process regression. It first identifies, using diffusion maps, a low-dimensional manifold embedded in an ambient high-dimensional space associated with the data. It relies on the diffusion distance associated with this construction to define a distance function with which the data model is equipped. This distance metric function is then used to compute the correlation structure of a Gaussian process that describes the statistical dependence of quantities of interest in the high-dimensional ambient space. The proposed method is applicable to arbitrarily high-dimensional data sets. Here, it is applied to subsurface characterization using a suite of well log measurements. The predictions obtained in original, principal component, and diffusion space are compared using both qualitative and quantitative metrics. Considerable improvement in the prediction of the geological structural properties is observed with the proposed method.

  6. Data mining and statistical inference in selective laser melting

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kamath, Chandrika

    Selective laser melting (SLM) is an additive manufacturing process that builds a complex three-dimensional part, layer-by-layer, using a laser beam to fuse fine metal powder together. The design freedom afforded by SLM comes associated with complexity. As the physical phenomena occur over a broad range of length and time scales, the computational cost of modeling the process is high. At the same time, the large number of parameters that control the quality of a part make experiments expensive. In this paper, we describe ways in which we can use data mining and statistical inference techniques to intelligently combine simulations and experiments to build parts with desired properties. We start with a brief summary of prior work in finding process parameters for high-density parts. We then expand on this work to show how we can improve the approach by using feature selection techniques to identify important variables, data-driven surrogate models to reduce computational costs, improved sampling techniques to cover the design space adequately, and uncertainty analysis for statistical inference. Here, our results indicate that techniques from data mining and statistics can complement those from physical modeling to provide greater insight into complex processes such as selective laser melting.

  7. Data mining and statistical inference in selective laser melting

    DOE PAGES

    Kamath, Chandrika

    2016-01-11

    Selective laser melting (SLM) is an additive manufacturing process that builds a complex three-dimensional part, layer-by-layer, using a laser beam to fuse fine metal powder together. The design freedom afforded by SLM comes associated with complexity. As the physical phenomena occur over a broad range of length and time scales, the computational cost of modeling the process is high. At the same time, the large number of parameters that control the quality of a part make experiments expensive. In this paper, we describe ways in which we can use data mining and statistical inference techniques to intelligently combine simulations and experiments to build parts with desired properties. We start with a brief summary of prior work in finding process parameters for high-density parts. We then expand on this work to show how we can improve the approach by using feature selection techniques to identify important variables, data-driven surrogate models to reduce computational costs, improved sampling techniques to cover the design space adequately, and uncertainty analysis for statistical inference. Here, our results indicate that techniques from data mining and statistics can complement those from physical modeling to provide greater insight into complex processes such as selective laser melting.

  8. Fully Bayesian inference for structural MRI: application to segmentation and statistical analysis of T2-hypointensities.

    PubMed

    Schmidt, Paul; Schmid, Volker J; Gaser, Christian; Buck, Dorothea; Bührlen, Susanne; Förschler, Annette; Mühlau, Mark

    2013-01-01

    Aiming at iron-related T2-hypointensity, which is related to normal aging and neurodegenerative processes, we here present two practicable approaches, based on Bayesian inference, for preprocessing and statistical analysis of a complex set of structural MRI data. In particular, Markov Chain Monte Carlo methods were used to simulate posterior distributions. First, we rendered a segmentation algorithm that uses outlier detection based on model checking techniques within a Bayesian mixture model. Second, we rendered an analytical tool comprising a Bayesian regression model with smoothness priors (in the form of Gaussian Markov random fields) mitigating the necessity to smooth data prior to statistical analysis. For validation, we used simulated data and MRI data of 27 healthy controls (age: [Formula: see text]; range, [Formula: see text]). We first observed robust segmentation of both simulated T2-hypointensities and gray-matter regions known to be T2-hypointense. Second, simulated data and images of segmented T2-hypointensity were analyzed. We found not only robust identification of simulated effects but also a biologically plausible age-related increase of T2-hypointensity primarily within the dentate nucleus but also within the globus pallidus, substantia nigra, and red nucleus. Our results indicate that fully Bayesian inference can successfully be applied for preprocessing and statistical analysis of structural MRI data.

  9. Drug target inference through pathway analysis of genomics data

    PubMed Central

    Ma, Haisu; Zhao, Hongyu

    2013-01-01

    Statistical modeling coupled with bioinformatics is commonly used for drug discovery. Although there exist many approaches for single target based drug design and target inference, recent years have seen a paradigm shift to system-level pharmacological research. Pathway analysis of genomics data represents one promising direction for computational inference of drug targets. This article aims at providing a comprehensive review on the evolving issues in this field, covering methodological developments, their pros and cons, as well as future research directions. PMID:23369829

  10. Reduction of Complications of Local Anaesthesia in Dental Healthcare Setups by Application of the Six Sigma Methodology: A Statistical Quality Improvement Technique

    PubMed Central

    Khatoon, Farheen

    2015-01-01

    Background: Health care faces challenges due to complications, inefficiencies and other concerns that threaten the safety of patients. Aim: The purpose of this study was to identify causes of complications encountered after administration of local anaesthesia for dental and oral surgical procedures and to reduce the incidence of complications by introduction of Six Sigma methodology. Materials and Methods: The DMAIC (Define, Measure, Analyse, Improve and Control) process of Six Sigma was taken into consideration to reduce the incidence of complications encountered after administration of local anaesthesia injections for dental and oral surgical procedures using failure mode and effect analysis. Pareto analysis was taken into consideration to analyse the most recurring complications. Paired z-sample test using Minitab Statistical Inference and Fisher’s exact test were used to statistically analyse the obtained data. A p-value <0.05 was considered significant. Results: A total of 54 systemic and 62 local complications occurred during three months of the analyse and measure phases. Syncope, failure of anaesthesia, trismus, auto mordeduras and pain at injection site were found to be the most recurring complications. Cumulative defective percentage was 7.99 in case of pre-improved data and decreased to 4.58 in the control phase. Estimate for difference was 0.0341228 and 95% lower bound for difference was 0.0193966. The p-value was found to be highly significant with p = 0.000. Conclusion: The application of Six Sigma improvement methodology in healthcare tends to deliver consistently better results to the patients as well as hospitals and results in better patient compliance as well as satisfaction. PMID:26816989

  11. PROBABILITY SAMPLING AND POPULATION INFERENCE IN MONITORING PROGRAMS

    EPA Science Inventory

    A fundamental difference between probability sampling and conventional statistics is that "sampling" deals with real, tangible populations, whereas "conventional statistics" usually deals with hypothetical populations that have no real-world realization. The focus here is on real ...

  12. Statistical Inference in the Learning of Novel Phonetic Categories

    ERIC Educational Resources Information Center

    Zhao, Yuan

    2010-01-01

    Learning a phonetic category (or any linguistic category) requires integrating different sources of information. A crucial unsolved problem for phonetic learning is how this integration occurs: how can we update our previous knowledge about a phonetic category as we hear new exemplars of the category? One model of learning is Bayesian Inference,…

  13. Conceptual Challenges in Coordinating Theoretical and Data-Centered Estimates of Probability

    ERIC Educational Resources Information Center

    Konold, Cliff; Madden, Sandra; Pollatsek, Alexander; Pfannkuch, Maxine; Wild, Chris; Ziedins, Ilze; Finzer, William; Horton, Nicholas J.; Kazak, Sibel

    2011-01-01

    A core component of informal statistical inference is the recognition that judgments based on sample data are inherently uncertain. This implies that instruction aimed at developing informal inference needs to foster basic probabilistic reasoning. In this article, we analyze and critique the now-common practice of introducing students to both…

  14. Campbell's and Rubin's Perspectives on Causal Inference

    ERIC Educational Resources Information Center

    West, Stephen G.; Thoemmes, Felix

    2010-01-01

    Donald Campbell's approach to causal inference (D. T. Campbell, 1957; W. R. Shadish, T. D. Cook, & D. T. Campbell, 2002) is widely used in psychology and education, whereas Donald Rubin's causal model (P. W. Holland, 1986; D. B. Rubin, 1974, 2005) is widely used in economics, statistics, medicine, and public health. Campbell's approach focuses on…

  15. Direct Evidence for a Dual Process Model of Deductive Inference

    ERIC Educational Resources Information Center

    Markovits, Henry; Brunet, Marie-Laurence; Thompson, Valerie; Brisson, Janie

    2013-01-01

    In 2 experiments, we tested a strong version of a dual process theory of conditional inference (cf. Verschueren et al., 2005a, 2005b) that assumes that most reasoners have 2 strategies available, the choice of which is determined by situational variables, cognitive capacity, and metacognitive control. The statistical strategy evaluates inferences…

  16. The Role of Probability in Developing Learners' Models of Simulation Approaches to Inference

    ERIC Educational Resources Information Center

    Lee, Hollylynne S.; Doerr, Helen M.; Tran, Dung; Lovett, Jennifer N.

    2016-01-01

    Repeated sampling approaches to inference that rely on simulations have recently gained prominence in statistics education, and probabilistic concepts are at the core of this approach. In this approach, learners need to develop a mapping among the problem situation, a physical enactment, computer representations, and the underlying randomization…

  17. From Blickets to Synapses: Inferring Temporal Causal Networks by Observation

    ERIC Educational Resources Information Center

    Fernando, Chrisantha

    2013-01-01

    How do human infants learn the causal dependencies between events? Evidence suggests that this remarkable feat can be achieved by observation of only a handful of examples. Many computational models have been produced to explain how infants perform causal inference without explicit teaching about statistics or the scientific method. Here, we…

  18. It's a Girl! Random Numbers, Simulations, and the Law of Large Numbers

    ERIC Educational Resources Information Center

    Goodwin, Chris; Ortiz, Enrique

    2015-01-01

    Modeling using mathematics and making inferences about mathematical situations are becoming more prevalent in most fields of study. Descriptive statistics cannot be used to generalize about a population or make predictions of what can occur. Instead, inference must be used. Simulation and sampling are essential in building a foundation for…

  19. Thou Shalt Not Bear False Witness against Null Hypothesis Significance Testing

    ERIC Educational Resources Information Center

    García-Pérez, Miguel A.

    2017-01-01

    Null hypothesis significance testing (NHST) has been the subject of debate for decades and alternative approaches to data analysis have been proposed. This article addresses this debate from the perspective of scientific inquiry and inference. Inference is an inverse problem and application of statistical methods cannot reveal whether effects…

  20. Hypothesis-Testing Demands Trustworthy Data—A Simulation Approach to Inferential Statistics Advocating the Research Program Strategy

    PubMed Central

    Krefeld-Schwalb, Antonia; Witte, Erich H.; Zenker, Frank

    2018-01-01

    In psychology as elsewhere, the main statistical inference strategy to establish empirical effects is null-hypothesis significance testing (NHST). The recent failure to replicate allegedly well-established NHST-results, however, implies that such results lack sufficient statistical power, and thus feature unacceptably high error-rates. Using data-simulation to estimate the error-rates of NHST-results, we advocate the research program strategy (RPS) as a superior methodology. RPS integrates Frequentist with Bayesian inference elements, and leads from a preliminary discovery against a (random) H0-hypothesis to a statistical H1-verification. Not only do RPS-results feature significantly lower error-rates than NHST-results, RPS also addresses key-deficits of a “pure” Frequentist and a standard Bayesian approach. In particular, RPS aggregates underpowered results safely. RPS therefore provides a tool to regain the trust the discipline had lost during the ongoing replicability-crisis. PMID:29740363

  1. NIRS-SPM: statistical parametric mapping for near infrared spectroscopy

    NASA Astrophysics Data System (ADS)

    Tak, Sungho; Jang, Kwang Eun; Jung, Jinwook; Jang, Jaeduck; Jeong, Yong; Ye, Jong Chul

    2008-02-01

    Even though there exists a powerful statistical parametric mapping (SPM) tool for fMRI, similar public domain tools are not available for near infrared spectroscopy (NIRS). In this paper, we describe a new public domain statistical toolbox called NIRS-SPM for quantitative analysis of NIRS signals. Specifically, NIRS-SPM statistically analyzes the NIRS data using GLM and makes inference as the excursion probability which comes from the random fields that are interpolated from the sparse measurements. In order to obtain correct inference, NIRS-SPM offers the pre-coloring and pre-whitening methods for temporal correlation estimation. For simultaneous recording of NIRS signals with fMRI, the spatial mapping between the fMRI image and real coordinates in a 3-D digitizer is estimated using Horn's algorithm. These powerful tools allow super-resolution localization of brain activation, which is not possible using the conventional NIRS analysis tools.

  2. Hypothesis-Testing Demands Trustworthy Data-A Simulation Approach to Inferential Statistics Advocating the Research Program Strategy.

    PubMed

    Krefeld-Schwalb, Antonia; Witte, Erich H; Zenker, Frank

    2018-01-01

    In psychology as elsewhere, the main statistical inference strategy to establish empirical effects is null-hypothesis significance testing (NHST). The recent failure to replicate allegedly well-established NHST-results, however, implies that such results lack sufficient statistical power, and thus feature unacceptably high error-rates. Using data-simulation to estimate the error-rates of NHST-results, we advocate the research program strategy (RPS) as a superior methodology. RPS integrates Frequentist with Bayesian inference elements, and leads from a preliminary discovery against a (random) H0-hypothesis to a statistical H1-verification. Not only do RPS-results feature significantly lower error-rates than NHST-results, RPS also addresses key-deficits of a "pure" Frequentist and a standard Bayesian approach. In particular, RPS aggregates underpowered results safely. RPS therefore provides a tool to regain the trust the discipline had lost during the ongoing replicability-crisis.

  3. Application of Bayesian inference to the study of hierarchical organization in self-organized complex adaptive systems

    NASA Astrophysics Data System (ADS)

    Knuth, K. H.

    2001-05-01

    We consider the application of Bayesian inference to the study of self-organized structures in complex adaptive systems. In particular, we examine the distribution of elements, agents, or processes in systems dominated by hierarchical structure. We demonstrate that results obtained by Caianiello [1] on Hierarchical Modular Systems (HMS) can be found by applying Jaynes' Principle of Group Invariance [2] to a few key assumptions about our knowledge of hierarchical organization. Subsequent application of the Principle of Maximum Entropy allows inferences to be made about specific systems. The utility of the Bayesian method is considered by examining both successes and failures of the hierarchical model. We discuss how Caianiello's original statements suffer from the Mind Projection Fallacy [3] and we restate his assumptions thus widening the applicability of the HMS model. The relationship between inference and statistical physics, described by Jaynes [4], is reiterated with the expectation that this realization will aid the field of complex systems research by moving away from often inappropriate direct application of statistical mechanics to a more encompassing inferential methodology.

  4. Conditional statistical inference with multistage testing designs.

    PubMed

    Zwitser, Robert J; Maris, Gunter

    2015-03-01

    In this paper it is demonstrated how statistical inference from multistage test designs can be made based on the conditional likelihood. Special attention is given to parameter estimation, as well as the evaluation of model fit. Two reasons are provided why the fit of simple measurement models is expected to be better in adaptive designs, compared to linear designs: more parameters are available for the same number of observations; and undesirable response behavior, like slipping and guessing, might be avoided owing to a better match between item difficulty and examinee proficiency. The results are illustrated with simulated data, as well as with real data.

  5. Use of Tests of Statistical Significance and Other Analytic Choices in a School Psychology Journal: Review of Practices and Suggested Alternatives.

    ERIC Educational Resources Information Center

    Snyder, Patricia A.; Thompson, Bruce

    The use of tests of statistical significance was explored, first by reviewing some criticisms of contemporary practice in the use of statistical tests as reflected in a series of articles in the "American Psychologist" and in the appointment of a "Task Force on Statistical Inference" by the American Psychological Association…

  6. Standard deviation and standard error of the mean.

    PubMed

    Lee, Dong Kyu; In, Junyong; Lee, Sangseok

    2015-06-01

    In most clinical and experimental studies, the standard deviation (SD) and the estimated standard error of the mean (SEM) are used to present the characteristics of sample data and to explain statistical analysis results. However, some authors occasionally muddle the distinctive usage between the SD and SEM in medical literature. Because the process of calculating the SD and SEM includes different statistical inferences, each of them has its own meaning. SD is the dispersion of data in a normal distribution. In other words, SD indicates how accurately the mean represents sample data. However, the meaning of SEM includes statistical inference based on the sampling distribution. SEM is the SD of the theoretical distribution of the sample means (the sampling distribution). While either SD or SEM can be applied to describe data and statistical results, one should be aware of reasonable methods with which to use SD and SEM. We aim to elucidate the distinctions between SD and SEM and to provide proper usage guidelines for both, which summarize data and describe statistical results.
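
    The distinction between SD and SEM can be made concrete with a short calculation. The Python sketch below is an illustration rather than anything from the article: it uses the usual sample SD (n - 1 denominator) and the estimate SEM = SD / sqrt(n), and the helper name and the simulated measurements (mean 100, SD 15) are hypothetical.

```python
import math
import random
import statistics


def sd_and_sem(sample):
    """Return the sample standard deviation and the estimated standard error of the mean."""
    sd = statistics.stdev(sample)          # spread of the individual observations
    sem = sd / math.sqrt(len(sample))      # uncertainty of the sample mean
    return sd, sem


rng = random.Random(42)
small = [rng.gauss(100, 15) for _ in range(25)]
large = [rng.gauss(100, 15) for _ in range(2500)]

sd_small, sem_small = sd_and_sem(small)
sd_large, sem_large = sd_and_sem(large)
print(f"n=25:   SD={sd_small:.2f}  SEM={sem_small:.2f}")
print(f"n=2500: SD={sd_large:.2f}  SEM={sem_large:.2f}")
```

    With a larger sample the SD stays near the population value because it describes the data, whereas the SEM shrinks roughly as 1/sqrt(n) because it describes the precision of the mean.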

  7. Standard deviation and standard error of the mean

    PubMed Central

    Lee, Dong Kyu; In, Junyong; Lee, Sangseok

    2015-01-01

    In most clinical and experimental studies, the standard deviation (SD) and the estimated standard error of the mean (SEM) are used to present the characteristics of sample data and to explain statistical analysis results. However, some authors occasionally muddle the distinctive usage between the SD and SEM in medical literature. Because the process of calculating the SD and SEM includes different statistical inferences, each of them has its own meaning. SD is the dispersion of data in a normal distribution. In other words, SD indicates how accurately the mean represents sample data. However, the meaning of SEM includes statistical inference based on the sampling distribution. SEM is the SD of the theoretical distribution of the sample means (the sampling distribution). While either SD or SEM can be applied to describe data and statistical results, one should be aware of reasonable methods with which to use SD and SEM. We aim to elucidate the distinctions between SD and SEM and to provide proper usage guidelines for both, which summarize data and describe statistical results. PMID:26045923

  8. Influence of atmospheric transport on ozone and trace-level toxic air contaminants over the northeastern United States

    NASA Astrophysics Data System (ADS)

    Brankov, Elvira

    This thesis presents a methodology for examining the relationship between synoptic-scale atmospheric transport patterns and observed pollutant concentration levels. It involves calculating a large number of back-trajectories from the observational site and subjecting them to cluster analysis. The pollutant concentration data observed at that site are then segregated according to the back-trajectory clusters. If the pollutant observations extend over several seasons, it is important to filter out seasonal and long-term components from the time series data before pollutant cluster-segregation, because only the short-term component of the time series data is related to the synoptic-scale transport. Multiple comparison procedures are used to test for significant differences in the chemical composition of pollutant data associated with each cluster. This procedure is useful in indicating potential pollutant source regions and isolating meteorological regimes associated with pollutant transport from those regions. If many observational sites are available, the spatial and temporal scales of the pollution transport from a given direction can be extracted through the time-lagged inter- site correlation analysis of pollutant concentrations. The proposed methodology is applicable to any pollutant at any site if sufficiently abundant data set is available. This is illustrated through examination of five-year long time series data of ozone concentrations at several sites in the Northeast. The results provide evidence of ozone transport to these sites, revealing the characteristic spatial and temporal scales involved in the transport and identifying source regions for this pollutant. Problems related to statistical analyses of censored data are addressed in the second half of this thesis. 
Although censoring (reporting concentrations in a non-quantitative way) is typical for trace-level measurements, methods for statistical analysis, inference and interpretation of such data are complex and still under development. In this study, multiple comparison of censored data sets was required in order to examine the influence of synoptic-scale circulations on concentration levels of several trace-level toxic pollutants observed in the Northeast (e.g., As, Se, Mn, V, etc.). Since the traditional multiple comparison procedures are not readily applicable to such data sets, a Monte Carlo simulation study was performed to assess several nonparametric methods for multiple comparison of censored data sets. Application of an appropriate comparison procedure to clusters of toxic trace elements observed in the Northeast led to the identification of potential source regions and atmospheric patterns associated with the long-range transport of these pollutants. A method for comparison of proportions and elemental ratio calculations were used to confirm/clarify these inferences with a greater degree of confidence.

  9. Visual shape perception as Bayesian inference of 3D object-centered shape representations.

    PubMed

    Erdogan, Goker; Jacobs, Robert A

    2017-11-01

    Despite decades of research, little is known about how people visually perceive object shape. We hypothesize that a promising approach to shape perception is provided by a "visual perception as Bayesian inference" framework which augments an emphasis on visual representation with an emphasis on the idea that shape perception is a form of statistical inference. Our hypothesis claims that shape perception of unfamiliar objects can be characterized as statistical inference of 3D shape in an object-centered coordinate system. We describe a computational model based on our theoretical framework, and provide evidence for the model along two lines. First, we show that, counterintuitively, the model accounts for viewpoint-dependency of object recognition, traditionally regarded as evidence against people's use of 3D object-centered shape representations. Second, we report the results of an experiment using a shape similarity task, and present an extensive evaluation of existing models' abilities to account for the experimental data. We find that our shape inference model captures subjects' behaviors better than competing models. Taken as a whole, our experimental and computational results illustrate the promise of our approach and suggest that people's shape representations of unfamiliar objects are probabilistic, 3D, and object-centered. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  10. On the multiple imputation variance estimator for control-based and delta-adjusted pattern mixture models.

    PubMed

    Tang, Yongqiang

    2017-12-01

    Control-based pattern mixture models (PMM) and delta-adjusted PMMs are commonly used as sensitivity analyses in clinical trials with non-ignorable dropout. These PMMs assume that the statistical behavior of outcomes varies by pattern in the experimental arm in the imputation procedure, but the imputed data are typically analyzed by a standard method such as the primary analysis model. In the multiple imputation (MI) inference, Rubin's variance estimator is generally biased when the imputation and analysis models are uncongenial. One objective of the article is to quantify the bias of Rubin's variance estimator in the control-based and delta-adjusted PMMs for longitudinal continuous outcomes. These PMMs assume the same observed data distribution as the mixed effects model for repeated measures (MMRM). We derive analytic expressions for the MI treatment effect estimator and the associated Rubin's variance in these PMMs and MMRM as functions of the maximum likelihood estimator from the MMRM analysis and the observed proportion of subjects in each dropout pattern when the number of imputations is infinite. The asymptotic bias is generally small or negligible in the delta-adjusted PMM, but can be sizable in the control-based PMM. This indicates that the inference based on Rubin's rule is approximately valid in the delta-adjusted PMM. A simple variance estimator is proposed to ensure asymptotically valid MI inferences in these PMMs, and compared with the bootstrap variance. The proposed method is illustrated by the analysis of an antidepressant trial, and its performance is further evaluated via a simulation study. © 2017, The International Biometric Society.
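
    For orientation, the standard form of Rubin's variance estimator that the abstract refers to can be sketched as follows. This is only the textbook pooling rule (total variance = within-imputation variance + (1 + 1/m) × between-imputation variance), not the corrected estimator the article proposes; the function name `rubin_pool` and the example numbers are hypothetical.

```python
from statistics import mean, variance


def rubin_pool(estimates, variances):
    """Pool m imputation-specific estimates with Rubin's rules.

    estimates: point estimates, one per imputed data set
    variances: squared standard errors, one per imputed data set
    Returns (pooled estimate, Rubin's total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)       # pooled point estimate
    w = mean(variances)           # within-imputation variance
    b = variance(estimates)       # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b       # Rubin's total variance
    return q_bar, t


# Hypothetical treatment-effect estimates from m = 3 imputed data sets.
pooled, total_var = rubin_pool([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, total_var)
```

    The bias discussed in the abstract concerns how well this total variance matches the true sampling variance when the imputation and analysis models are uncongenial; the sketch only shows what is being evaluated.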

  11. Applying dynamic Bayesian networks to perturbed gene expression data.

    PubMed

    Dojer, Norbert; Gambin, Anna; Mizera, Andrzej; Wilczyński, Bartek; Tiuryn, Jerzy

    2006-05-08

    A central goal of molecular biology is to understand the regulatory mechanisms of gene transcription and protein synthesis. Because of their solid basis in statistics, which allows them to deal with the stochastic aspects of gene expression and noisy measurements in a natural way, Bayesian networks appear attractive for inferring the structure of gene interactions from microarray experiment data. However, the basic formalism has some disadvantages, e.g. it is sometimes hard to distinguish between the origin and the target of an interaction. Two kinds of microarray experiments yield data particularly rich in information regarding the direction of interactions: time series and perturbation experiments. In order to correctly handle them, the basic formalism must be modified. For example, dynamic Bayesian networks (DBN) apply to time series microarray data. To our knowledge the DBN technique has not been applied in the context of perturbation experiments. We extend the framework of dynamic Bayesian networks in order to incorporate perturbations. Moreover, an exact algorithm for inferring an optimal network is proposed and a discretization method specialized for time series data from perturbation experiments is introduced. We apply our procedure to realistic simulation data. The results are compared with those obtained by standard DBN learning techniques. Moreover, the advantages of using an exact learning algorithm instead of heuristic methods are analyzed. We show that the quality of inferred networks dramatically improves when using data from perturbation experiments. We also conclude that the exact algorithm should be used when it is possible, i.e. when the considered set of genes is small enough.

  12. Statistical Methods for Generalized Linear Models with Covariates Subject to Detection Limits.

    PubMed

    Bernhardt, Paul W; Wang, Huixia J; Zhang, Daowen

    2015-05-01

    Censored observations are a common occurrence in biomedical data sets. Although a large amount of research has been devoted to estimation and inference for data with censored responses, very little research has focused on proper statistical procedures when predictors are censored. In this paper, we consider statistical methods for dealing with multiple predictors subject to detection limits within the context of generalized linear models. We investigate and adapt several conventional methods and develop a new multiple imputation approach for analyzing data sets with predictors censored due to detection limits. We establish the consistency and asymptotic normality of the proposed multiple imputation estimator and suggest a computationally simple and consistent variance estimator. We also demonstrate that the conditional mean imputation method often leads to inconsistent estimates in generalized linear models, while several other methods are either computationally intensive or lead to parameter estimates that are biased or more variable compared to the proposed multiple imputation estimator. In an extensive simulation study, we assess the bias and variability of different approaches within the context of a logistic regression model and compare variance estimation methods for the proposed multiple imputation estimator. Lastly, we apply several methods to analyze the data set from a recently-conducted GenIMS study.

  13. The Effects of Statistical Multiplicity of Infection on Virus Quantification and Infectivity Assays.

    PubMed

    Mistry, Bhaven A; D'Orsogna, Maria R; Chou, Tom

    2018-06-19

    Many biological assays are employed in virology to quantify parameters of interest. Two such classes of assays, virus quantification assays (VQAs) and infectivity assays (IAs), aim to estimate the number of viruses present in a solution and the ability of a viral strain to successfully infect a host cell, respectively. VQAs operate at extremely dilute concentrations, and results can be subject to stochastic variability in virus-cell interactions. At the other extreme, high viral-particle concentrations are used in IAs, resulting in large numbers of viruses infecting each cell, enough for measurable change in total transcription activity. Furthermore, host cells can be infected at any concentration regime by multiple particles, resulting in a statistical multiplicity of infection and yielding potentially significant variability in the assay signal and parameter estimates. We develop probabilistic models for statistical multiplicity of infection at low and high viral-particle-concentration limits and apply them to the plaque (VQA), endpoint dilution (VQA), and luciferase reporter (IA) assays. A web-based tool implementing our models and analysis is also developed and presented. We test our proposed new methods for inferring experimental parameters from data using numerical simulations and show improvement on existing procedures in all limits. Copyright © 2018 Biophysical Society. Published by Elsevier Inc. All rights reserved.
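
    As a rough illustration of what "statistical multiplicity of infection" means, the sketch below assumes the common textbook simplification that the number of particles infecting a cell is Poisson distributed with mean equal to the multiplicity of infection (MOI). This is an assumption made here for illustration, not the authors' models, and all function names are hypothetical.

```python
import math


def poisson_pmf(k, moi):
    """Probability that a cell is infected by exactly k particles."""
    return math.exp(-moi) * moi ** k / math.factorial(k)


def fraction_infected(moi):
    """Probability that a cell receives at least one particle."""
    return 1.0 - math.exp(-moi)


def moi_from_fraction_infected(fraction):
    """Invert fraction_infected: estimate the MOI from the infected fraction."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must lie in [0, 1)")
    return -math.log(1.0 - fraction)


# At a nominal MOI of 1, a sizable share of cells still receives no particle,
# and some cells receive two or more.
print(fraction_infected(1.0))
print(1.0 - poisson_pmf(0, 1.0) - poisson_pmf(1, 1.0))
```

    Even under this simplified assumption, the spread of particles per cell shows why assay signals can vary between wells prepared at the same nominal concentration.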

  14. Robust inference for group sequential trials.

    PubMed

    Ganju, Jitendra; Lin, Yunzhi; Zhou, Kefei

    2017-03-01

    For ethical reasons, group sequential trials were introduced to allow trials to stop early in the event of extreme results. Endpoints in such trials are usually mortality or irreversible morbidity. For a given endpoint, the norm is to use a single test statistic and to use that same statistic for each analysis. This approach is risky because the test statistic has to be specified before the study is unblinded, and there is loss in power if the assumptions that ensure optimality for each analysis are not met. To minimize the risk of moderate to substantial loss in power due to a suboptimal choice of a statistic, a robust method was developed for nonsequential trials. The concept is analogous to diversification of financial investments to minimize risk. The method is based on combining P values from multiple test statistics for formal inference while controlling the type I error rate at its designated value. This article evaluates the performance of 2 P value combining methods for group sequential trials. The emphasis is on time to event trials although results from less complex trials are also included. The gain or loss in power with the combination method relative to a single statistic is asymmetric in its favor. Depending on the power of each individual test, the combination method can give more power than any single test or give power that is closer to the test with the most power. The versatility of the method is that it can combine P values from different test statistics for analysis at different times. The robustness of results suggests that inference from group sequential trials can be strengthened with the use of combined tests. Copyright © 2017 John Wiley & Sons, Ltd.
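
    The abstract does not name the two combining methods it evaluates. Purely to illustrate the general idea of combining P values, the sketch below uses Fisher's method, which is a swapped-in example and assumes independent P values. Test statistics computed on the same trial data are correlated, so a real application would need a reference distribution that accounts for that dependence. The function name is hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent P values with Fisher's method.

    Under the null hypothesis, X = -2 * sum(ln p_i) follows a chi-square
    distribution with 2k degrees of freedom, where k = len(p_values).
    """
    k = len(p_values)
    if k == 0 or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("P values must lie in (0, 1]")
    half = -sum(math.log(p) for p in p_values)  # this is X / 2
    # Chi-square survival function for even degrees of freedom 2k:
    # exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Two individually borderline results yield a smaller combined P value.
print(fisher_combined_p([0.05, 0.05]))
```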

  15. Précis of statistical significance: rationale, validity, and utility.

    PubMed

    Chow, S L

    1998-04-01

    The null-hypothesis significance-test procedure (NHSTP) is defended in the context of the theory-corroboration experiment, as well as the following contrasts: (a) substantive hypotheses versus statistical hypotheses, (b) theory corroboration versus statistical hypothesis testing, (c) theoretical inference versus statistical decision, (d) experiments versus nonexperimental studies, and (e) theory corroboration versus treatment assessment. The null hypothesis can be true because it is the hypothesis that errors are randomly distributed in data. Moreover, the null hypothesis is never used as a categorical proposition. Statistical significance means only that chance influences can be excluded as an explanation of data; it does not identify the nonchance factor responsible. The experimental conclusion is drawn with the inductive principle underlying the experimental design. A chain of deductive arguments gives rise to the theoretical conclusion via the experimental conclusion. The anomalous relationship between statistical significance and the effect size often used to criticize NHSTP is more apparent than real. The absolute size of the effect is not an index of evidential support for the substantive hypothesis. Nor is the effect size, by itself, informative as to the practical importance of the research result. Being a conditional probability, statistical power cannot be the a priori probability of statistical significance. The validity of statistical power is debatable because statistical significance is determined with a single sampling distribution of the test statistic based on H0, whereas it takes two distributions to represent statistical power or effect size. Sample size should not be determined in the mechanical manner envisaged in power analysis. It is inappropriate to criticize NHSTP for nonstatistical reasons. 
At the same time, neither effect size, nor confidence interval estimate, nor posterior probability can be used to exclude chance as an explanation of data. Neither can any of them fulfill the nonstatistical functions expected of them by critics.

  16. Phylogenomics of plant genomes: a methodology for genome-wide searches for orthologs in plants

    PubMed Central

    Conte, Matthieu G; Gaillard, Sylvain; Droc, Gaetan; Perin, Christophe

    2008-01-01

    Background: Gene ortholog identification is now a major objective for mining the increasing amount of sequence data generated by complete or partial genome sequencing projects. Comparative and functional genomics urgently need a method for ortholog detection to reduce gene function inference and to aid in the identification of conserved or divergent genetic pathways between several species. As gene functions change during evolution, reconstructing the evolutionary history of genes should be a more accurate way to differentiate orthologs from paralogs. Phylogenomics takes into account phylogenetic information from high-throughput genome annotation and is the most straightforward way to infer orthologs. However, procedures for automatic detection of orthologs are still scarce and suffer from several limitations. Results: We developed a procedure for ortholog prediction between Oryza sativa and Arabidopsis thaliana. Firstly, we established an efficient method to cluster A. thaliana and O. sativa full proteomes into gene families. Then, we developed an optimized phylogenomics pipeline for ortholog inference. We validated the full procedure using test sets of orthologs and paralogs to demonstrate that our method outperforms pairwise methods for ortholog predictions. Conclusion: Our procedure achieved a high level of accuracy in predicting ortholog and paralog relationships. Phylogenomic predictions for all validated gene families in both species were easily achieved and we can conclude that our methodology outperforms similarly based methods. PMID:18426584

  17. Long-term strategy for the statistical design of a forest health monitoring system

    Treesearch

    Hans T. Schreuder; Raymond L. Czaplewski

    1993-01-01

    A conceptual framework is given for a broad-scale survey of forest health that accomplishes three objectives: generate descriptive statistics; detect changes in such statistics; and simplify analytical inferences that identify, and possibly establish cause-effect relationships. Our paper discusses the development of sampling schemes to satisfy these three objectives,...

  18. Assessing Understanding of Sampling Distributions and Differences in Learning amongst Different Learning Styles

    ERIC Educational Resources Information Center

    Beeman, Jennifer Leigh Sloan

    2013-01-01

    Research has found that students successfully complete an introductory course in statistics without fully comprehending the underlying theory or being able to exhibit statistical reasoning. This is particularly true for the understanding about the sampling distribution of the mean, a crucial concept for statistical inference. This study…

  19. Using Action Research to Develop a Course in Statistical Inference for Workplace-Based Adults

    ERIC Educational Resources Information Center

    Forbes, Sharleen

    2014-01-01

    Many adults who need an understanding of statistical concepts have limited mathematical skills. They need a teaching approach that includes as little mathematical context as possible. Iterative participatory qualitative research (action research) was used to develop a statistical literacy course for adult learners informed by teaching in…

  20. Applying Statistical Process Control to Clinical Data: An Illustration.

    ERIC Educational Resources Information Center

    Pfadt, Al; And Others

    1992-01-01

    Principles of statistical process control are applied to a clinical setting through the use of control charts to detect changes, as part of treatment planning and clinical decision-making processes. The logic of control chart analysis is derived from principles of statistical inference. Sample charts offer examples of evaluating baselines and…
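
    A minimal sketch of the control-chart logic mentioned above, assuming the simplest variant: limits set at the baseline mean plus or minus three sample standard deviations. Practical individuals charts often estimate the spread from moving ranges instead; the function names and data here are hypothetical.

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Return (lower limit, center line, upper limit) from baseline data."""
    center = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation of the baseline
    return center - k * spread, center, center + k * spread


def out_of_control(observations, limits):
    """Indices of observations falling outside the control limits."""
    lower, _, upper = limits
    return [i for i, x in enumerate(observations) if x < lower or x > upper]


# Hypothetical baseline counts of a target behavior, then new sessions.
baseline = [4, 5, 6, 5, 4, 6, 5, 5]
limits = control_limits(baseline)
print(limits)
print(out_of_control([5, 9, 4, 1], limits))
```

    Points outside the limits are treated as signals of change rather than routine variation, which is the inferential step the abstract alludes to.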

  1. Functional interaction-based nonlinear models with application to multiplatform genomics data.

    PubMed

    Davenport, Clemontina A; Maity, Arnab; Baladandayuthapani, Veerabhadran

    2018-05-07

    Functional regression allows for a scalar response to be dependent on a functional predictor; however, not much work has been done when a scalar exposure that interacts with the functional covariate is introduced. In this paper, we present 2 functional regression models that account for this interaction and propose 2 novel estimation procedures for the parameters in these models. These estimation methods allow for a noisy and/or sparsely observed functional covariate and are easily extended to generalized exponential family responses. We compute standard errors of our estimators, which allows for further statistical inference and hypothesis testing. We compare the performance of the proposed estimators to each other and to one found in the literature via simulation and demonstrate our methods using a real data example. Copyright © 2018 John Wiley & Sons, Ltd.

  2. The two-sample problem with induced dependent censorship.

    PubMed

    Huang, Y

    1999-12-01

    Induced dependent censorship is a general phenomenon in health service evaluation studies in which a measure such as quality-adjusted survival time or lifetime medical cost is of interest. We investigate the two-sample problem and propose two classes of nonparametric tests. Based on consistent estimation of the survival function for each sample, the two classes of test statistics examine the cumulative weighted difference in hazard functions and in survival functions. We derive a unified asymptotic null distribution theory and inference procedure. The tests are applied to trial V of the International Breast Cancer Study Group and show that long duration chemotherapy significantly improves time without symptoms of disease and toxicity of treatment as compared with the short duration treatment. Simulation studies demonstrate that the proposed tests, with a wide range of weight choices, perform well under moderate sample sizes.

  3. The penumbra of learning: a statistical theory of synaptic tagging and capture.

    PubMed

    Gershman, Samuel J

    2014-01-01

    Learning in humans and animals is accompanied by a penumbra: Learning one task benefits from learning an unrelated task shortly before or after. At the cellular level, the penumbra of learning appears when weak potentiation of one synapse is amplified by strong potentiation of another synapse on the same neuron during a critical time window. Weak potentiation sets a molecular tag that enables the synapse to capture plasticity-related proteins synthesized in response to strong potentiation at another synapse. This paper describes a computational model which formalizes synaptic tagging and capture in terms of statistical learning mechanisms. According to this model, synaptic strength encodes a probabilistic inference about the dynamically changing association between pre- and post-synaptic firing rates. The rate of change is itself inferred, coupling together different synapses on the same neuron. When the inputs to one synapse change rapidly, the inferred rate of change increases, amplifying learning at other synapses.

  4. Space-Time Data fusion for Remote Sensing Applications

    NASA Technical Reports Server (NTRS)

    Braverman, Amy; Nguyen, H.; Cressie, N.

    2011-01-01

    NASA has been collecting massive amounts of remote sensing data about Earth's systems for more than a decade. Missions are selected to be complementary in quantities measured, retrieval techniques, and sampling characteristics, so these datasets are highly synergistic. To fully exploit this, a rigorous methodology for combining data with heterogeneous sampling characteristics is required. For scientific purposes, the methodology must also provide quantitative measures of uncertainty that propagate input-data uncertainty appropriately. We view this as a statistical inference problem. The true but not-directly-observed quantities form a vector-valued field continuous in space and time. Our goal is to infer those true values or some function of them, and provide uncertainty quantification for those inferences. We use a spatiotemporal statistical model that relates the unobserved quantities of interest at point-level to the spatially aggregated, observed data. We describe and illustrate our method using CO2 data from two NASA data sets.

  5. Statistical numeracy as a moderator of (pseudo)contingency effects on decision behavior.

    PubMed

    Fleig, Hanna; Meiser, Thorsten; Ettlin, Florence; Rummel, Jan

    2017-03-01

    Pseudocontingencies denote contingency estimates inferred from base rates rather than from cell frequencies. We examined the role of statistical numeracy for effects of such fallible but adaptive inferences on choice behavior. In Experiment 1, we provided information on single observations as well as on base rates and tracked participants' eye movements. In Experiment 2, we manipulated the availability of information on cell frequencies and base rates between conditions. Our results demonstrate that a focus on base rates rather than cell frequencies benefits pseudocontingency effects. Learners who are more proficient in (conditional) probability calculation prefer to rely on cell frequencies in order to judge contingencies, though, as was evident from their gaze behavior. If cell frequencies are available in summarized format, they may infer the true contingency between options and outcomes. Otherwise, however, even highly numerate learners are susceptible to pseudocontingency effects. Copyright © 2017 Elsevier B.V. All rights reserved.

  6. Penalized regression procedures for variable selection in the potential outcomes framework

    PubMed Central

    Ghosh, Debashis; Zhu, Yeying; Coffman, Donna L.

    2015-01-01

    A recent topic of much interest in causal inference is model selection. In this article, we describe a framework in which to consider penalized regression approaches to variable selection for causal effects. The framework leads to a simple ‘impute, then select’ class of procedures that is agnostic to the type of imputation algorithm as well as penalized regression used. It also clarifies how model selection involves a multivariate regression model for causal inference problems, and that these methods can be applied for identifying subgroups in which treatment effects are homogeneous. Analogies and links with the literature on machine learning methods, missing data and imputation are drawn. A difference LASSO algorithm is defined, along with its multiple imputation analogues. The procedures are illustrated using a well-known right heart catheterization dataset. PMID:25628185

  7. Fast and Accurate Multivariate Gaussian Modeling of Protein Families: Predicting Residue Contacts and Protein-Interaction Partners

    PubMed Central

    Feinauer, Christoph; Procaccini, Andrea; Zecchina, Riccardo; Weigt, Martin; Pagnani, Andrea

    2014-01-01

    In the course of evolution, proteins show a remarkable conservation of their three-dimensional structure and their biological function, leading to strong evolutionary constraints on the sequence variability between homologous proteins. Our method aims at extracting such constraints from rapidly accumulating sequence data, and thereby at inferring protein structure and function from sequence information alone. Recently, global statistical inference methods (e.g. direct-coupling analysis, sparse inverse covariance estimation) have achieved a breakthrough towards this aim, and their predictions have been successfully implemented into tertiary and quaternary protein structure prediction methods. However, due to the discrete nature of the underlying variable (amino-acids), exact inference requires exponential time in the protein length, and efficient approximations are needed for practical applicability. Here we propose a very efficient multivariate Gaussian modeling approach as a variant of direct-coupling analysis: the discrete amino-acid variables are replaced by continuous Gaussian random variables. The resulting statistical inference problem is efficiently and exactly solvable. We show that the quality of inference is comparable or superior to the one achieved by mean-field approximations to inference with discrete variables, as done by direct-coupling analysis. This is true for (i) the prediction of residue-residue contacts in proteins, and (ii) the identification of protein-protein interaction partners in bacterial signal transduction. An implementation of our multivariate Gaussian approach is available at the website http://areeweb.polito.it/ricerca/cmp/code. PMID:24663061

  8. The space of ultrametric phylogenetic trees.

    PubMed

    Gavryushkin, Alex; Drummond, Alexei J

    2016-08-21

    The reliability of a phylogenetic inference method from genomic sequence data is ensured by its statistical consistency. Bayesian inference methods produce a sample of phylogenetic trees from the posterior distribution given sequence data. Hence the question of statistical consistency of such methods is equivalent to the consistency of the summary of the sample. More generally, statistical consistency is ensured by the tree space used to analyse the sample. In this paper, we consider two standard parameterisations of phylogenetic time-trees used in evolutionary models: inter-coalescent interval lengths and absolute times of divergence events. For each of these parameterisations we introduce a natural metric space on ultrametric phylogenetic trees. We compare the introduced spaces with existing models of tree space and formulate several formal requirements that a metric space on phylogenetic trees must possess in order to be a satisfactory space for statistical analysis, and justify them. We show that only a few known constructions of the space of phylogenetic trees satisfy these requirements. However, our results suggest that these basic requirements are not enough to distinguish between the two metric spaces we introduce and that the choice between metric spaces requires additional properties to be considered. In particular, the summary tree minimising the square distance to the trees from the sample might be different for different parameterisations. This suggests that further fundamental insight is needed into the problem of statistical consistency of phylogenetic inference methods. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.

  9. Assessing risk factors for dental caries: a statistical modeling approach.

    PubMed

    Trottini, Mario; Bossù, Maurizio; Corridore, Denise; Ierardo, Gaetano; Luzzi, Valeria; Saccucci, Matteo; Polimeni, Antonella

    2015-01-01

    The problem of identifying potential determinants and predictors of dental caries is of key importance in caries research and it has received considerable attention in the scientific literature. From the methodological side, a broad range of statistical models is currently available to analyze dental caries indices (DMFT, dmfs, etc.). These models have been applied in several studies to investigate the impact of different risk factors on the cumulative severity of dental caries experience. However, in most of the cases (i) these studies focus on a very specific subset of risk factors; and (ii) in the statistical modeling only a few candidate models are considered and model selection is at best only marginally addressed. As a result, our understanding of the robustness of the statistical inferences with respect to the choice of the model is very limited; the richness of the set of statistical models available for analysis is only marginally exploited; and inferences could be biased due to the omission of potentially important confounding variables in the model's specification. In this paper we argue that these limitations can be overcome considering a general class of candidate models and carefully exploring the model space using standard model selection criteria and measures of global fit and predictive performance of the candidate models. Strengths and limitations of the proposed approach are illustrated with a real data set. In our illustration the model space contains more than 2.6 million models, which require inferences to be adjusted for 'optimism'.

  10. Model weights and the foundations of multimodel inference

    USGS Publications Warehouse

    Link, W.A.; Barker, R.J.

    2006-01-01

    Statistical thinking in wildlife biology and ecology has been profoundly influenced by the introduction of AIC (Akaike's information criterion) as a tool for model selection and as a basis for model averaging. In this paper, we advocate the Bayesian paradigm as a broader framework for multimodel inference, one in which model averaging and model selection are naturally linked, and in which the performance of AIC-based tools is naturally evaluated. Prior model weights implicitly associated with the use of AIC are seen to highly favor complex models: in some cases, all but the most highly parameterized models in the model set are virtually ignored a priori. We suggest the usefulness of the weighted BIC (Bayesian information criterion) as a computationally simple alternative to AIC, based on explicit selection of prior model probabilities rather than acceptance of default priors associated with AIC. We note, however, that both procedures are only approximate to the use of exact Bayes factors. We discuss and illustrate technical difficulties associated with Bayes factors, and suggest approaches to avoiding these difficulties in the context of model selection for a logistic regression. Our example highlights the predisposition of AIC weighting to favor complex models and suggests a need for caution in using the BIC for computing approximate posterior model weights.
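
    The model weights discussed above are usually computed from differences in an information criterion. The sketch below shows the standard calculation, in which each weight is proportional to the prior model probability times exp(-Δ/2), with Δ the difference from the smallest criterion value. With equal priors and AIC values this gives the familiar Akaike weights; with BIC values and explicit priors it gives approximate posterior model probabilities. The function name and the numbers are hypothetical.

```python
import math


def information_criterion_weights(ic_values, prior=None):
    """Model weights proportional to prior * exp(-0.5 * (IC - min IC))."""
    n = len(ic_values)
    if n == 0:
        raise ValueError("need at least one model")
    if prior is None:
        prior = [1.0 / n] * n          # equal prior model probabilities
    if len(prior) != n:
        raise ValueError("prior must have one entry per model")
    best = min(ic_values)              # subtract the minimum for stability
    raw = [p * math.exp(-0.5 * (v - best)) for v, p in zip(ic_values, prior)]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical criterion values for three candidate models.
print(information_criterion_weights([100.0, 102.0, 110.0]))
# The same values with an explicit prior that favors the third model.
print(information_criterion_weights([100.0, 102.0, 110.0], [0.2, 0.2, 0.6]))
```

    Making the prior an explicit argument mirrors the abstract's point that prior model probabilities can be chosen deliberately rather than accepted implicitly.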

  11. A pitfall of piecewise-polytropic equation of state inference

    NASA Astrophysics Data System (ADS)

    Raaijmakers, Geert; Riley, Thomas E.; Watts, Anna L.

    2018-05-01

    The only messenger radiation in the Universe which one can use to statistically probe the Equation of State (EOS) of cold dense matter is that originating from the near-field vicinities of compact stars. Constraining gravitational masses and equatorial radii of rotating compact stars is a major goal for current and future telescope missions, with a primary purpose of constraining the EOS. From a Bayesian perspective it is necessary to carefully discuss prior definition; in this context a complicating issue is that in practice there exist pathologies in the general relativistic mapping between spaces of local (interior source matter) and global (exterior spacetime) parameters. In a companion paper, these issues were raised on a theoretical basis. In this study we reproduce a probability transformation procedure from the literature in order to map a joint posterior distribution of Schwarzschild gravitational masses and radii into a joint posterior distribution of EOS parameters. We demonstrate computationally that EOS parameter inferences are sensitive to the choice to define a prior on a joint space of these masses and radii, instead of on a joint space of interior source matter parameters. We focus on the piecewise-polytropic EOS model, which is currently standard in the field of astrophysical dense matter study. We discuss the implications of this issue for the field.

  12. A Method for Using Player Tracking Data in Basketball to Learn Player Skills and Predict Team Performance.

    PubMed

    Skinner, Brian; Guy, Stephen J

    2015-01-01

    Player tracking data represents a revolutionary new data source for basketball analysis, in which essentially every aspect of a player's performance is tracked and can be analyzed numerically. We suggest a way by which this data set, when coupled with a network-style model of the offense that relates players' skills to the team's success at running different plays, can be used to automatically learn players' skills and predict the performance of untested 5-man lineups in a way that accounts for the interaction between players' respective skill sets. After developing a general analysis procedure, we present as an example a specific implementation of our method using a simplified network model. While player tracking data is not yet available in the public domain, we evaluate our model using simulated data and show that player skills can be accurately inferred by a simple statistical inference scheme. Finally, we use the model to analyze games from the 2011 playoff series between the Memphis Grizzlies and the Oklahoma City Thunder and we show that, even with a very limited data set, the model can consistently describe a player's interactions with a given lineup based only on his performance with a different lineup.

  13. A Method for Using Player Tracking Data in Basketball to Learn Player Skills and Predict Team Performance

    PubMed Central

    Skinner, Brian; Guy, Stephen J.

    2015-01-01

    Player tracking data represents a revolutionary new data source for basketball analysis, in which essentially every aspect of a player’s performance is tracked and can be analyzed numerically. We suggest a way by which this data set, when coupled with a network-style model of the offense that relates players’ skills to the team’s success at running different plays, can be used to automatically learn players’ skills and predict the performance of untested 5-man lineups in a way that accounts for the interaction between players’ respective skill sets. After developing a general analysis procedure, we present as an example a specific implementation of our method using a simplified network model. While player tracking data is not yet available in the public domain, we evaluate our model using simulated data and show that player skills can be accurately inferred by a simple statistical inference scheme. Finally, we use the model to analyze games from the 2011 playoff series between the Memphis Grizzlies and the Oklahoma City Thunder and we show that, even with a very limited data set, the model can consistently describe a player’s interactions with a given lineup based only on his performance with a different lineup. PMID:26351846

  14. Regional surnames and genetic structure in Great Britain.

    PubMed

    Kandt, Jens; Cheshire, James A; Longley, Paul A

    2016-10-01

    Following the increasing availability of DNA-sequenced data, the genetic structure of populations can now be inferred and studied in unprecedented detail. Across social science, this innovation is shaping new bio-social research agendas, attracting substantial investment in the collection of genetic, biological and social data for large population samples. Yet genetic samples are special because the precise populations that they represent are uncertain and ill-defined. Unlike most social surveys, a genetic sample's representativeness of the population cannot be established by conventional procedures of statistical inference, and the implications for population-wide generalisations about bio-social phenomena are little understood. In this paper, we seek to address these problems by linking surname data to a censored and geographically uneven sample of DNA scans, collected for the People of the British Isles study. Based on a combination of global and local spatial correspondence measures, we identify eight regions in Great Britain that are most likely to represent the geography of genetic structure of Great Britain's long-settled population. We discuss the implications of this regionalisation for bio-social investigations. We conclude that, as the often highly selective collection of DNA and biomarkers becomes a more common practice, geography is crucial to understanding variation in genetic information within diverse populations.

  15. Fast and reliable prediction of domain-peptide binding affinity using coarse-grained structure models.

    PubMed

    Tian, Feifei; Tan, Rui; Guo, Tailin; Zhou, Peng; Yang, Li

    2013-07-01

    Domain-peptide recognition and interaction are fundamentally important for eukaryotic signaling and regulatory networks. It is thus essential to quantitatively infer the binding stability and specificity of such interactions based upon large-scale but low-accuracy complex structure models, which can be readily obtained from sophisticated molecular modeling procedures. In the present study, a new method is described for the fast and reliable prediction of domain-peptide binding affinity with coarse-grained structure models. This method is designed to tolerate the strong random noise involved in domain-peptide complex structures and uses a statistical modeling approach to eliminate systematic bias associated with a group of investigated samples. As a paradigm, this method was employed to model and predict the binding behavior of various peptides to four evolutionarily unrelated peptide-recognition domains (PRDs), i.e. human amph SH3, human nherf PDZ, yeast syh GYF and yeast bmh 14-3-3, and moreover, we explored the molecular mechanism and biological implication underlying the binding of cognate and noncognate peptide ligands to their domain receptors. It is expected that the newly proposed method could be further used to perform genome-wide inference of domain-peptide binding at three-dimensional structure level. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  16. ESTuber db: an online database for Tuber borchii EST sequences.

    PubMed

    Lazzari, Barbara; Caprera, Andrea; Cosentino, Cristian; Stella, Alessandra; Milanesi, Luciano; Viotti, Angelo

    2007-03-08

    The ESTuber database (http://www.itb.cnr.it/estuber) includes 3,271 Tuber borchii expressed sequence tags (EST). The dataset consists of 2,389 sequences from an in-house prepared cDNA library from truffle vegetative hyphae, and 882 sequences downloaded from GenBank and representing four libraries from white truffle mycelia and ascocarps at different developmental stages. An automated pipeline was prepared to process EST sequences using public software integrated by in-house developed Perl scripts. Data were collected in a MySQL database, which can be queried via a php-based web interface. Sequences included in the ESTuber db were clustered and annotated against three databases: the GenBank nr database, the UniProtKB database and a third in-house prepared database of fungi genomic sequences. An algorithm was implemented to infer statistical classification among Gene Ontology categories from the ontology occurrences deduced from the annotation procedure against the UniProtKB database. Ontologies were also deduced from the annotation of more than 130,000 EST sequences from five filamentous fungi, for intra-species comparison purposes. Further analyses were performed on the ESTuber db dataset, including tandem repeats search and comparison of the putative protein dataset inferred from the EST sequences to the PROSITE database for protein patterns identification. All the analyses were performed both on the complete sequence dataset and on the contig consensus sequences generated by the EST assembly procedure. The resulting web site is a resource of data and links related to truffle expressed genes. The Sequence Report and Contig Report pages are the web interface core structures which, together with the Text search utility and the Blast utility, allow easy access to the data stored in the database.

  17. A Bayesian procedure for evaluating the frequency of calibration factor updates in highway safety manual (HSM) applications.

    PubMed

    Saha, Dibakar; Alluri, Priyanka; Gan, Albert

    2017-01-01

    The Highway Safety Manual (HSM) presents statistical models to quantitatively estimate an agency's safety performance. The models were developed using data from only a few U.S. states. To account for the effects of the local attributes and temporal factors on crash occurrence, agencies are required to calibrate the HSM-default models for crash predictions. The manual suggests updating calibration factors every two to three years, or preferably on an annual basis. Given that the calibration process involves substantial time, effort, and resources, a comprehensive analysis of the required calibration factor update frequency is valuable to the agencies. Accordingly, the objective of this study is to evaluate the HSM's recommendation and determine the required frequency of calibration factor updates. A robust Bayesian estimation procedure is used to assess the variation between calibration factors computed annually, biennially, and triennially using data collected from over 2400 miles of segments and over 700 intersections on urban and suburban facilities in Florida. The Bayesian model yields a posterior distribution of the model parameters that gives credible information to infer whether the difference between calibration factors computed at specified intervals is credibly different from the null value, which represents unaltered calibration factors between the comparison years or, in other words, zero difference. The concept of the null value is extended to include the range of values that are practically equivalent to zero. Bayesian inference shows that calibration factors based on total crash frequency are required to be updated every two years in cases where the variations between calibration factors are not greater than 0.01. When the variations are between 0.01 and 0.05, calibration factors based on total crash frequency could be updated every three years. Copyright © 2016 Elsevier Ltd. All rights reserved.
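
    The idea of comparing a posterior difference against a range of values practically equivalent to zero can be sketched as follows. The abstract does not state the exact decision rule, so this sketch assumes the commonly used convention of comparing a 95% highest-density interval (HDI) with a region of practical equivalence (ROPE). The function name, the ROPE limits, and the simulated posterior draws are all hypothetical.

```python
import math
import random


def rope_decision(samples, rope=(-0.01, 0.01), mass=0.95):
    """Classify a posterior difference with an assumed HDI + ROPE rule.

    samples: posterior draws of the difference between two calibration factors.
    rope:    hypothetical limits treated as practically equivalent to zero.
    """
    ordered = sorted(samples)
    n = len(ordered)
    k = min(n, max(1, math.ceil(mass * n)))  # points the interval must cover
    # The HDI is the narrowest window that contains k consecutive draws.
    start = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    lo, hi = ordered[start], ordered[start + k - 1]
    if hi < rope[0] or lo > rope[1]:
        return "credibly different", (lo, hi)
    if lo >= rope[0] and hi <= rope[1]:
        return "practically equivalent", (lo, hi)
    return "undecided", (lo, hi)


# Hypothetical posterior draws for a difference centred on 0.08.
rng = random.Random(0)
draws = [rng.gauss(0.08, 0.02) for _ in range(2000)]
label, interval = rope_decision(draws)
print(label, interval)
```

    Under this convention a difference is called credibly different only when the whole HDI lies outside the ROPE, and practically equivalent only when the whole HDI lies inside it.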

  18. The limited use of the fluency heuristic: Converging evidence across different procedures.

    PubMed

    Pohl, Rüdiger F; Erdfelder, Edgar; Michalkiewicz, Martha; Castela, Marta; Hilbig, Benjamin E

    2016-10-01

    In paired comparisons based on which of two objects has the larger criterion value, decision makers could use the subjectively experienced difference in retrieval fluency of the objects as a cue. According to the fluency heuristic (FH) theory, decision makers use fluency (as indexed by recognition speed) as the only cue for pairs of recognized objects, and infer that the object retrieved more speedily has the larger criterion value (ignoring all other cues and information). Model-based analyses, however, have previously revealed that only a small portion of such inferences are indeed based on fluency alone. In the majority of cases, other information enters the decision process. However, due to the specific experimental procedures, the estimates of FH use are potentially biased: Some procedures may have led to an overestimated and others to an underestimated, or even to actually reduced, FH use. In the present article, we discuss and test the impacts of such procedural variations by reanalyzing 21 data sets. The results show noteworthy consistency across the procedural variations revealing low FH use. We discuss potential explanations and implications of this finding.

  19. Inferring Characteristics of Sensorimotor Behavior by Quantifying Dynamics of Animal Locomotion

    NASA Astrophysics Data System (ADS)

    Leung, KaWai

    Locomotion is one of the most well-studied topics in animal behavioral studies. Many fundamental and clinical studies make use of the locomotion of an animal model to explore various aspects of sensorimotor behavior. In the past, most of these studies focused on the population average of a specific trait because of limitations in data collection and processing power. With recent advances in computer vision and statistical modeling techniques, it is now possible to track and analyze large amounts of behavioral data. In this thesis, I present two projects that aim to infer the characteristics of sensorimotor behavior by quantifying the dynamics of locomotion of the nematode Caenorhabditis elegans and the fruit fly Drosophila melanogaster, shedding light on the statistical dependence between sensing and behavior. In the first project, I investigate the possibility of inferring noxious sensory information from the behavior of Caenorhabditis elegans. I develop a statistical model to infer the heat stimulus level perceived by individual animals from their stereotyped escape responses after stimulation by an IR laser. The model allows quantification of analgesic-like effects of chemical agents or genetic mutations in the worm. At the same time, the method is able to differentiate perturbations of locomotion behavior that go beyond affecting the sensory system. With this model I propose experimental designs that allow statistically significant identification of analgesic-like effects. In the second project, I investigate the relationship of energy budget and stability of locomotion in determining the walking speed distribution of Drosophila melanogaster during aging. The locomotion stability at different age groups is estimated from video recordings using Floquet theory. I calculate the power consumption at different locomotion speeds using a biomechanics model. In conclusion, the power consumption, not stability, predicts the locomotion speed distribution at different ages.

  20. Towards a Phylogenetic Approach to the Composition of Species Complexes in the North and Central American Triatoma, Vectors of Chagas Disease

    PubMed Central

    de la Rúa, Nicholas M.; Bustamante, Dulce M.; Menes, Marianela; Stevens, Lori; Monroy, Carlota; Kilpatrick, William; Rizzo, Donna; Klotz, Stephen A.; Schmidt, Justin; Axen, Heather J.; Dorn, Patricia L.

    2014-01-01

    Phylogenetic relationships of insect vectors of parasitic diseases are important for understanding the evolution of epidemiologically relevant traits, and may be useful in vector control. The subfamily Triatominae (Hemiptera:Reduviidae) includes ~140 extant species arranged in five tribes comprised of 15 genera. The genus Triatoma is the most species-rich and contains important vectors of Trypanosoma cruzi, the causative agent of Chagas disease. Triatoma species were grouped into complexes originally by morphology and more recently with the addition of information from molecular phylogenetics (the four-complex hypothesis); however, without a strict adherence to monophyly. To date, the validity of proposed species complexes has not been tested by statistical tests of topology. The goal of this study was to clarify the systematics of 19 Triatoma species from North and Central America. We inferred their evolutionary relatedness using two independent data sets: the complete nuclear Internal Transcribed Spacer-2 ribosomal DNA (ITS-2 rDNA) and head morphometrics. In addition, we used the Shimodaira-Hasegawa statistical test of topology to assess the fit of the data to a set of competing systematic hypotheses (topologies). An unconstrained topology inferred from the ITS-2 data was compared to topologies constrained based on the four-complex hypothesis or one inferred from our morphometry results. The unconstrained topology represents a statistically significant better fit of the molecular data than either the four-complex or the morphometric topology. We propose an update to the composition of species complexes in the North and Central American Triatoma, based on a phylogeny inferred from ITS-2 as a first step towards updating the phylogeny of the complexes based on monophyly and statistical tests of topologies. PMID:24681261

  1. BEAGLE: an application programming interface and high-performance computing library for statistical phylogenetics.

    PubMed

    Ayres, Daniel L; Darling, Aaron; Zwickl, Derrick J; Beerli, Peter; Holder, Mark T; Lewis, Paul O; Huelsenbeck, John P; Ronquist, Fredrik; Swofford, David L; Cummings, Michael P; Rambaut, Andrew; Suchard, Marc A

    2012-01-01

    Phylogenetic inference is fundamental to our understanding of most aspects of the origin and evolution of life, and in recent years, there has been a concentration of interest in statistical approaches such as Bayesian inference and maximum likelihood estimation. Yet, for large data sets and realistic or interesting models of evolution, these approaches remain computationally demanding. High-throughput sequencing can yield data for thousands of taxa, but scaling to such problems using serial computing often necessitates the use of nonstatistical or approximate approaches. The recent emergence of graphics processing units (GPUs) provides an opportunity to leverage their excellent floating-point computational performance to accelerate statistical phylogenetic inference. A specialized library for phylogenetic calculation would allow existing software packages to make more effective use of available computer hardware, including GPUs. Adoption of a common library would also make it easier for other emerging computing architectures, such as field programmable gate arrays, to be used in the future. We present BEAGLE, an application programming interface (API) and library for high-performance statistical phylogenetic inference. The API provides a uniform interface for performing phylogenetic likelihood calculations on a variety of compute hardware platforms. The library includes a set of efficient implementations and can currently exploit hardware including GPUs using NVIDIA CUDA, central processing units (CPUs) with Streaming SIMD Extensions and related processor supplementary instruction sets, and multicore CPUs via OpenMP. To demonstrate the advantages of a common API, we have incorporated the library into several popular phylogenetic software packages. The BEAGLE library is free open source software licensed under the Lesser GPL and available from http://beagle-lib.googlecode.com. An example client program is available as public domain software.

  2. BEAGLE: An Application Programming Interface and High-Performance Computing Library for Statistical Phylogenetics

    PubMed Central

    Ayres, Daniel L.; Darling, Aaron; Zwickl, Derrick J.; Beerli, Peter; Holder, Mark T.; Lewis, Paul O.; Huelsenbeck, John P.; Ronquist, Fredrik; Swofford, David L.; Cummings, Michael P.; Rambaut, Andrew; Suchard, Marc A.

    2012-01-01

    Phylogenetic inference is fundamental to our understanding of most aspects of the origin and evolution of life, and in recent years, there has been a concentration of interest in statistical approaches such as Bayesian inference and maximum likelihood estimation. Yet, for large data sets and realistic or interesting models of evolution, these approaches remain computationally demanding. High-throughput sequencing can yield data for thousands of taxa, but scaling to such problems using serial computing often necessitates the use of nonstatistical or approximate approaches. The recent emergence of graphics processing units (GPUs) provides an opportunity to leverage their excellent floating-point computational performance to accelerate statistical phylogenetic inference. A specialized library for phylogenetic calculation would allow existing software packages to make more effective use of available computer hardware, including GPUs. Adoption of a common library would also make it easier for other emerging computing architectures, such as field programmable gate arrays, to be used in the future. We present BEAGLE, an application programming interface (API) and library for high-performance statistical phylogenetic inference. The API provides a uniform interface for performing phylogenetic likelihood calculations on a variety of compute hardware platforms. The library includes a set of efficient implementations and can currently exploit hardware including GPUs using NVIDIA CUDA, central processing units (CPUs) with Streaming SIMD Extensions and related processor supplementary instruction sets, and multicore CPUs via OpenMP. To demonstrate the advantages of a common API, we have incorporated the library into several popular phylogenetic software packages. The BEAGLE library is free open source software licensed under the Lesser GPL and available from http://beagle-lib.googlecode.com. An example client program is available as public domain software. PMID:21963610

  3. Inferring Single Neuron Properties in Conductance Based Balanced Networks

    PubMed Central

    Pool, Román Rossi; Mato, Germán

    2011-01-01

    Balanced states in large networks are a usual hypothesis for explaining the variability of neural activity in cortical systems. In this regime the statistics of the inputs are characterized by static and dynamic fluctuations. The dynamic fluctuations have a Gaussian distribution. Such statistics allow the use of reverse correlation methods, by recording synaptic inputs and the spike trains of ongoing spontaneous activity without any additional input. By using this method, properties of the single neuron dynamics that are masked by the balanced state can be quantified. To show the feasibility of this approach we apply it to large networks of conductance based neurons. The networks are classified as Type I or Type II according to the bifurcations which neurons of the different populations undergo near the firing onset. We also analyze mixed networks, in which each population has a mixture of different neuronal types. We determine under which conditions the intrinsic noise generated by the network can be used to apply reverse correlation methods. We find that under realistic conditions we can ascertain with low error the types of neurons present in the network. We also find that data from neurons with similar firing rates can be combined to perform covariance analysis. We compare the results of these methods (that do not require any external input) to the standard procedure (that requires the injection of Gaussian noise into a single neuron). We find a good agreement between the two procedures. PMID:22016730

  4. The (Ir)Responsibility of (Under)Estimating Missing Data.

    PubMed

    Fernández-García, María P; Vallejo-Seco, Guillermo; Livácic-Rojas, Pablo; Tuero-Herrero, Ellian

    2018-01-01

    It is practically impossible to avoid losing data in the course of an investigation, and it has been proven that the consequences can reach such magnitude that they could even invalidate the results of the study. This paper describes some of the most likely causes of missing data in research in the field of clinical psychology and the consequences they may have on statistical and substantive inferences. When it is necessary to recover the missing information, analyzing the data can become extremely complex. We summarize the experts' recommendations regarding the most powerful procedures for performing this task, the advantages each one has over the others, the elements that can or should influence our choice, and the procedures that are not a recommended option except in very exceptional cases. We conclude by offering four pieces of advice, on which all the experts agree and to which we must attend at all times in order to proceed with the greatest possible success. Finally, we show the pernicious effects produced by missing data on the statistical result and on the substantive or clinical conclusions. For this purpose we deliberately removed data at different percentage rates under two missing-data mechanisms, MCAR and MAR, from the complete data sets of two very different real studies, and we then analyzed the available data using listwise deletion. One study was carried out using a quasi-experimental non-equivalent control group design, and the other using a completely randomized experimental design.
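
    The effect of deliberately removing data under MCAR and MAR mechanisms and then applying listwise deletion can be illustrated with a small simulation. This is only a sketch on synthetic data: the variables, the missingness rates, and the function name are hypothetical and do not reproduce the two studies analyzed in the paper.

```python
import random


def simulate_listwise_deletion(n=20000, seed=1):
    """Compare the mean of y after listwise deletion under MCAR and MAR.

    x is always observed; y depends on x. Under MCAR, y is dropped with a
    fixed probability. Under MAR, the drop probability grows with the
    observed covariate x.
    """
    rng = random.Random(seed)
    full, mcar, mar = [], [], []
    for _ in range(n):
        x = rng.random()               # observed covariate in [0, 1)
        y = x + rng.gauss(0.0, 0.1)    # outcome correlated with x
        full.append(y)
        if rng.random() >= 0.3:        # MCAR: 30% missing, unrelated to x
            mcar.append(y)
        if rng.random() >= 0.8 * x:    # MAR: more missing when x is large
            mar.append(y)

    def mean(values):
        return sum(values) / len(values)

    return mean(full), mean(mcar), mean(mar)


full_mean, mcar_mean, mar_mean = simulate_listwise_deletion()
print(f"complete data: {full_mean:.3f}")
print(f"MCAR + listwise deletion: {mcar_mean:.3f}")
print(f"MAR + listwise deletion: {mar_mean:.3f}")
```

    In this synthetic setting listwise deletion leaves the mean roughly unchanged under MCAR but pulls it downward under MAR, because the cases with large x (and therefore large y) are the ones most often discarded.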

  5. The Brera Multiscale Wavelet ROSAT HRI Source Catalog. I. The Algorithm

    NASA Astrophysics Data System (ADS)

    Lazzati, Davide; Campana, Sergio; Rosati, Piero; Panzera, Maria Rosa; Tagliaferri, Gianpiero

    1999-10-01

    We present a new detection algorithm based on the wavelet transform for the analysis of high-energy astronomical images. The wavelet transform, because of its multiscale structure, is suited to the optimal detection of pointlike as well as extended sources, regardless of any loss of resolution with the off-axis angle. Sources are detected as significant enhancements in the wavelet space, after the subtraction of the nonflat components of the background. Detection thresholds are computed through Monte Carlo simulations in order to establish the expected number of spurious sources per field. The source characterization is performed through a multisource fitting in the wavelet space. The procedure is designed to correctly deal with very crowded fields, allowing for the simultaneous characterization of nearby sources. To obtain a fast and reliable estimate of the source parameters and related errors, we apply a novel decimation technique that, taking into account the correlation properties of the wavelet transform, extracts a subset of almost independent coefficients. We test the performance of this algorithm on synthetic fields, analyzing with particular care the characterization of sources in poor background situations, where the assumption of Gaussian statistics does not hold. In these cases, for which standard wavelet algorithms generally provide underestimated errors, we infer errors through a procedure that relies on robust basic statistics. Our algorithm is well suited to the analysis of images taken with the new generation of X-ray instruments equipped with CCD technology, which will produce images with very low background and/or high source density.

  6. Beyond P Values and Hypothesis Testing: Using the Minimum Bayes Factor to Teach Statistical Inference in Undergraduate Introductory Statistics Courses

    ERIC Educational Resources Information Center

    Page, Robert; Satake, Eiki

    2017-01-01

    While interest in Bayesian statistics has been growing in statistics education, the treatment of the topic is still inadequate in both textbooks and the classroom. Because so many fields of study lead to careers that involve a decision-making process requiring an understanding of Bayesian methods, it is becoming increasingly clear that Bayesian…

  7. Human Inferences about Sequences: A Minimal Transition Probability Model

    PubMed Central

    2016-01-01

    The brain constantly infers the causes of the inputs it receives and uses these inferences to generate statistical expectations about future observations. Experimental evidence for these expectations and their violations include explicit reports, sequential effects on reaction times, and mismatch or surprise signals recorded in electrophysiology and functional MRI. Here, we explore the hypothesis that the brain acts as a near-optimal inference device that constantly attempts to infer the time-varying matrix of transition probabilities between the stimuli it receives, even when those stimuli are in fact fully unpredictable. This parsimonious Bayesian model, with a single free parameter, accounts for a broad range of findings on surprise signals, sequential effects and the perception of randomness. Notably, it explains the pervasive asymmetry between repetitions and alternations encountered in those studies. Our analysis suggests that a neural machinery for inferring transition probabilities lies at the core of human sequence knowledge. PMID:28030543
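
    A minimal sketch of estimating time-varying transition probabilities from a binary sequence is given below. The abstract mentions a single free parameter without naming it; this sketch assumes, purely for illustration, that it acts as a forgetting factor on transition counts. The function name, the `leak` value, and the symmetric Beta prior are hypothetical choices.

```python
import math


def leaky_transition_surprise(sequence, leak=0.9, prior=1.0):
    """Track transition probabilities of a binary sequence with forgetting.

    counts[prev][nxt] holds exponentially decayed counts of observed
    transitions. Predictions use the posterior mean under a symmetric
    Beta(prior, prior) prior. Returns the final counts and the surprise
    (in bits) of each observation given the previous one.
    """
    counts = [[0.0, 0.0], [0.0, 0.0]]
    surprises = []
    for prev, nxt in zip(sequence, sequence[1:]):
        row = counts[prev]
        p_next = (row[nxt] + prior) / (row[0] + row[1] + 2.0 * prior)
        surprises.append(-math.log2(p_next))
        # Forget old evidence, then add the transition just observed.
        for r in counts:
            r[0] *= leak
            r[1] *= leak
        row[nxt] += 1.0
    return counts, surprises


# A strictly alternating sequence followed by one unexpected repetition.
seq = [0, 1] * 20 + [1]
_, surprise = leaky_transition_surprise(seq)
print("first transition:", round(surprise[0], 3), "bits")
print("late alternation:", round(surprise[-2], 3), "bits")
print("final repetition:", round(surprise[-1], 3), "bits")
```

    Surprise falls as the alternation pattern is learned and rises sharply at the final repetition, which is the qualitative behavior of a surprise signal that the abstract refers to.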

  8. 5 CFR 1201.43 - Sanctions.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 5 Administrative Personnel 3 2011-01-01 2011-01-01 false Sanctions. 1201.43 Section 1201.43 Administrative Personnel MERIT SYSTEMS PROTECTION BOARD ORGANIZATION AND PROCEDURES PRACTICES AND PROCEDURES... fails to comply with an order, the judge may: (1) Draw an inference in favor of the requesting party...

  9. 5 CFR 1201.43 - Sanctions.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 5 Administrative Personnel 3 2010-01-01 2010-01-01 false Sanctions. 1201.43 Section 1201.43 Administrative Personnel MERIT SYSTEMS PROTECTION BOARD ORGANIZATION AND PROCEDURES PRACTICES AND PROCEDURES... fails to comply with an order, the judge may: (1) Draw an inference in favor of the requesting party...

  10. Exploring High School Students Beginning Reasoning about Significance Tests with Technology

    ERIC Educational Resources Information Center

    García, Víctor N.; Sánchez, Ernesto

    2017-01-01

    In the present study we analyze how students reason about or make inferences given a particular hypothesis testing problem (without having studied formal methods of statistical inference) when using Fathom. They use Fathom to create an empirical sampling distribution through computer simulation. It is found that most students' reasoning relies on…
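
    Building an empirical sampling distribution by simulation, as the students do in Fathom, can be sketched in a few lines. This example is not Fathom code; it is a hypothetical Python analogue in which the null proportion, sample size, and observed count are invented for illustration.

```python
import random


def simulated_p_value(observed_successes, n_trials, null_p=0.5,
                      reps=2000, seed=0):
    """Estimate a one-sided p-value from an empirical sampling distribution.

    Each replicate draws n_trials Bernoulli outcomes under the null
    proportion and records the number of successes. The p-value is the
    share of replicates at least as large as the observed count.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        successes = sum(1 for _trial in range(n_trials)
                        if rng.random() < null_p)
        if successes >= observed_successes:
            hits += 1
    return hits / reps


# Hypothetical question: is 40 successes out of 50 unusual if p = 0.5?
print(simulated_p_value(40, 50))
# For comparison, a count close to what the null predicts.
print(simulated_p_value(25, 50))
```

    The first call returns a very small proportion and the second a large one, so the simulated distribution alone is enough to judge how unusual an observed result is under the null hypothesis.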

  11. Stan: A Probabilistic Programming Language for Bayesian Inference and Optimization

    ERIC Educational Resources Information Center

    Gelman, Andrew; Lee, Daniel; Guo, Jiqiang

    2015-01-01

    Stan is a free and open-source C++ program that performs Bayesian inference or optimization for arbitrary user-specified models and can be called from the command line, R, Python, Matlab, or Julia and has great promise for fitting large and complex statistical models in many areas of application. We discuss Stan from users' and developers'…

  12. IMNN: Information Maximizing Neural Networks

    NASA Astrophysics Data System (ADS)

    Charnock, Tom; Lavaux, Guilhem; Wandelt, Benjamin D.

    2018-04-01

    This software trains artificial neural networks to find non-linear functionals of data that maximize Fisher information: information maximizing neural networks (IMNNs). As compressing large data sets vastly simplifies both frequentist and Bayesian inference, important information may be inadvertently missed. Likelihood-free inference based on automatically derived IMNN summaries produces summaries that are good approximations to sufficient statistics. IMNNs are robustly capable of automatically finding optimal, non-linear summaries of the data even in cases where linear compression fails: inferring the variance of Gaussian signal in the presence of noise, inferring cosmological parameters from mock simulations of the Lyman-α forest in quasar spectra, and inferring frequency-domain parameters from LISA-like detections of gravitational waveforms. In this final case, the IMNN summary outperforms linear data compression by avoiding the introduction of spurious likelihood maxima.

  13. Anchoring quartet-based phylogenetic distances and applications to species tree reconstruction.

    PubMed

    Sayyari, Erfan; Mirarab, Siavash

    2016-11-11

    Inferring species trees from gene trees using the coalescent-based summary methods has been the subject of much attention, yet new scalable and accurate methods are needed. We introduce DISTIQUE, a new statistically consistent summary method for inferring species trees from gene trees under the coalescent model. We generalize our results to arbitrary phylogenetic inference problems; we show that two arbitrarily chosen leaves, called anchors, can be used to estimate relative distances between all other pairs of leaves by inferring relevant quartet trees. This results in a family of distance-based tree inference methods, with running times ranging from quadratic to quartic in the number of leaves. We show in simulated studies that DISTIQUE has comparable accuracy to leading coalescent-based summary methods and reduced running times.

  14. Hydrological modelling of the Chaohe Basin in China: Statistical model formulation and Bayesian inference

    NASA Astrophysics Data System (ADS)

    Yang, Jing; Reichert, Peter; Abbaspour, Karim C.; Yang, Hong

    2007-07-01

    Calibration of hydrologic models is very difficult because of measurement errors in input and response, errors in model structure, and the large number of non-identifiable parameters of distributed models. The difficulties even increase in arid regions with high seasonal variation of precipitation, where the modelled residuals often exhibit high heteroscedasticity and autocorrelation. On the other hand, support of water management by hydrologic models is important in arid regions, particularly if there is increasing water demand due to urbanization. The use and assessment of model results for this purpose require a careful calibration and uncertainty analysis. Extending earlier work in this field, we developed a procedure to overcome (i) the problem of non-identifiability of distributed parameters by introducing aggregate parameters and using Bayesian inference, (ii) the problem of heteroscedasticity of errors by combining a Box-Cox transformation of results and data with seasonally dependent error variances, (iii) the problems of autocorrelated errors, missing data and outlier omission with a continuous-time autoregressive error model, and (iv) the problem of the seasonal variation of error correlations with seasonally dependent characteristic correlation times. The technique was tested with the calibration of the hydrologic sub-model of the Soil and Water Assessment Tool (SWAT) in the Chaohe Basin in North China. The results demonstrated the good performance of this approach to uncertainty analysis, particularly with respect to the fulfilment of statistical assumptions of the error model. A comparison with an independent error model and with error models that only considered a subset of the suggested techniques clearly showed the superiority of the approach based on all the features (i)-(iv) mentioned above.
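
    As an illustration of point (ii) only, the sketch below applies the standard Box-Cox transformation to observed and simulated values before taking residuals. The function names, the sample values, and the choice of lambda = 0 are assumptions made for this example; they are not taken from the paper, and the seasonal variances and autoregressive error model are not shown.

```python
import math


def box_cox(y: float, lam: float) -> float:
    """Standard Box-Cox transform of a strictly positive value y."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)  # limiting case of (y**lam - 1) / lam as lam -> 0
    return (y ** lam - 1.0) / lam


def transformed_residuals(observed, simulated, lam):
    """Residuals computed after transforming both data and model results."""
    return [box_cox(o, lam) - box_cox(s, lam) for o, s in zip(observed, simulated)]


# Hypothetical flows where the model error grows in proportion to the flow.
observed = [1.0, 10.0, 100.0]
simulated = [1.2, 12.0, 120.0]

raw = [o - s for o, s in zip(observed, simulated)]
print("raw residuals:        ", raw)
print("transformed residuals:", transformed_residuals(observed, simulated, 0.0))
```

    In this toy case the raw residuals grow with the magnitude of the flow, while the transformed residuals are roughly constant, which is the sense in which the transformation reduces heteroscedasticity.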

  15. Data Acquisition and Preprocessing in Studies on Humans: What Is Not Taught in Statistics Classes?

    PubMed

    Zhu, Yeyi; Hernandez, Ladia M; Mueller, Peter; Dong, Yongquan; Forman, Michele R

    2013-01-01

    The aim of this paper is to address issues in research that may be missing from statistics classes and important for (bio-)statistics students. In the context of a case study, we discuss data acquisition and preprocessing steps that fill the gap between research questions posed by subject matter scientists and statistical methodology for formal inference. Issues include participant recruitment, data collection training and standardization, variable coding, data review and verification, data cleaning and editing, and documentation. Despite the critical importance of these details in research, most of these issues are rarely discussed in an applied statistics program. One reason for the lack of more formal training is the difficulty in addressing the many challenges that can possibly arise in the course of a study in a systematic way. This article can help to bridge this gap between research questions and formal statistical inference by using an illustrative case study for a discussion. We hope that reading and discussing this paper and practicing data preprocessing exercises will sensitize statistics students to these important issues and achieve optimal conduct, quality control, analysis, and interpretation of a study.

  16. An Artificial Intelligence Approach to Analyzing Student Errors in Statistics.

    ERIC Educational Resources Information Center

    Sebrechts, Marc M.; Schooler, Lael J.

    1987-01-01

    Describes the development of an artificial intelligence system called GIDE that analyzes student errors in statistics problems by inferring the students' intentions. Learning strategies involved in problem solving are discussed and the inclusion of goal structures is explained. (LRW)

  17. Network inference using informative priors.

    PubMed

    Mukherjee, Sach; Speed, Terence P

    2008-09-23

    Recent years have seen much interest in the study of systems characterized by multiple interacting components. A class of statistical models called graphical models, in which graphs are used to represent probabilistic relationships between variables, provides a framework for formal inference regarding such systems. In many settings, the object of inference is the network structure itself. This problem of "network inference" is well known to be a challenging one. However, in scientific settings there is very often existing information regarding network connectivity. A natural idea then is to take account of such information during inference. This article addresses the question of incorporating prior information into network inference. We focus on directed models called Bayesian networks, and use Markov chain Monte Carlo to draw samples from posterior distributions over network structures. We introduce prior distributions on graphs capable of capturing information regarding network features including edges, classes of edges, degree distributions, and sparsity. We illustrate our approach in the context of systems biology, applying our methods to network inference in cancer signaling.

  18. Regression Analysis of Combined Gene Expression Regulation in Acute Myeloid Leukemia

    PubMed Central

    Li, Yue; Liang, Minggao; Zhang, Zhaolei

    2014-01-01

    Gene expression is a combinatorial function of genetic/epigenetic factors such as copy number variation (CNV), DNA methylation (DM), transcription factors (TF) occupancy, and microRNA (miRNA) post-transcriptional regulation. At the maturity of microarray/sequencing technologies, large amounts of data measuring the genome-wide signals of those factors became available from Encyclopedia of DNA Elements (ENCODE) and The Cancer Genome Atlas (TCGA). However, there is a lack of an integrative model to take full advantage of these rich yet heterogeneous data. To this end, we developed RACER (Regression Analysis of Combined Expression Regulation), which fits the mRNA expression as response using as explanatory variables, the TF data from ENCODE, and CNV, DM, miRNA expression signals from TCGA. Briefly, RACER first infers the sample-specific regulatory activities by TFs and miRNAs, which are then used as inputs to infer specific TF/miRNA-gene interactions. Such a two-stage regression framework circumvents a common difficulty in integrating ENCODE data measured in generic cell-line with the sample-specific TCGA measurements. As a case study, we integrated Acute Myeloid Leukemia (AML) data from TCGA and the related TF binding data measured in K562 from ENCODE. As a proof-of-concept, we first verified our model formalism by 10-fold cross-validation on predicting gene expression. We next evaluated RACER on recovering known regulatory interactions, and demonstrated its superior statistical power over existing methods in detecting known miRNA/TF targets. Additionally, we developed a feature selection procedure, which identified 18 regulators, whose activities clustered consistently with cytogenetic risk groups. One of the selected regulators is miR-548p, whose inferred targets were significantly enriched for leukemia-related pathway, implicating its novel role in AML pathogenesis. Moreover, survival analysis using the inferred activities identified C-Fos as a potential AML prognostic marker. Together, we provided a novel framework that successfully integrated the TCGA and ENCODE data in revealing AML-specific regulatory program at global level. PMID:25340776

  19. Understanding behavior under nonverbal transitive-inference procedures: Stimulus-control-topography analyses.

    PubMed

    Galizio, Ann; Doughty, Adam H; Williams, Dean C; Saunders, Kathryn J

    2017-07-01

    Following training with verbal stimulus relations involving A is greater than B and B is greater than C, verbally-competent individuals reliably select A>C when asked "which is greater, A or C?" (i.e., verbal transitive inference). This result is easy to interpret. Nonhuman animals and humans with and without intellectual disabilities have been exposed to nonverbal transitive-inference procedures involving trained arbitrary stimulus relations. Following the training of A+B-, B+C-, C+D-, and D+E-, B reliably is selected over D (i.e., nonverbal transitive inference). Such findings are more challenging to interpret. The present research explored accounts of nonverbal transitive inference based in transitive inference per se, reinforcement, such as value-transfer theory, and operant stimulus control. In Experiment 1, college students selected B>G following the training of A+B-, B+C-, C+D- /// E+F-, F+G-, and G+H- (where /// signifies the omission of D+E-). In Experiment 2, college students selected B>G following the training of A+B-, B+C-, C+D- /// E+F-, F+G-, and G+X- (where X refers to 10 stimuli that alternated across trials). In Experiment 3, college students selected G>B following the training of Y+B-, B+C-, C+D- /// E+F-, F+G-, and G+X- (where Y and X refer to 10 stimuli, respectively, that alternated across trials). These findings are discussed in the context of operant stimulus control by offering an approach based in stimulus B typically acquiring only a select stimulus control topography. Copyright © 2017 Elsevier B.V. All rights reserved.

  20. Reaction Time in Grade 5: Data Collection within the Practice of Statistics

    ERIC Educational Resources Information Center

    Watson, Jane; English, Lyn

    2017-01-01

    This study reports on a classroom activity for Grade 5 students investigating their reaction times. The investigation was part of a 3-year research project introducing students to informal inference and giving them experience carrying out the practice of statistics. For this activity the focus within the practice of statistics was on introducing…

  1. An Inferentialist Perspective on the Coordination of Actions and Reasons Involved in Making a Statistical Inference

    ERIC Educational Resources Information Center

    Bakker, Arthur; Ben-Zvi, Dani; Makar, Katie

    2017-01-01

    To understand how statistical and other types of reasoning are coordinated with actions to reduce uncertainty, we conducted a case study in vocational education that involved statistical hypothesis testing. We analyzed an intern's research project in a hospital laboratory in which reducing uncertainties was crucial to make a valid statistical…

  2. Causal inference in biology networks with integrated belief propagation.

    PubMed

    Chang, Rui; Karr, Jonathan R; Schadt, Eric E

    2015-01-01

    Inferring causal relationships among molecular and higher order phenotypes is a critical step in elucidating the complexity of living systems. Here we propose a novel method for inferring causality that is no longer constrained by the conditional dependency arguments that limit the ability of statistical causal inference methods to resolve causal relationships within sets of graphical models that are Markov equivalent. Our method utilizes Bayesian belief propagation to infer the responses of perturbation events on molecular traits given a hypothesized graph structure. A distance measure between the inferred response distribution and the observed data is defined to assess the 'fitness' of the hypothesized causal relationships. To test our algorithm, we infer causal relationships within equivalence classes of gene networks in which the form of the possible functional interactions is assumed to be nonlinear, given synthetic microarray and RNA sequencing data. We also apply our method to infer causality in a real metabolic network with a v-structure and a feedback loop. We show that our method can recapitulate the causal structure and recover the feedback loop from steady-state data alone, which conventional methods cannot.

  3. Gene-network inference by message passing

    NASA Astrophysics Data System (ADS)

    Braunstein, A.; Pagnani, A.; Weigt, M.; Zecchina, R.

    2008-01-01

    The inference of gene-regulatory processes from gene-expression data belongs to the major challenges of computational systems biology. Here we address the problem from a statistical-physics perspective and develop a message-passing algorithm which is able to infer sparse, directed and combinatorial regulatory mechanisms. Using the replica technique, the algorithmic performance can be characterized analytically for artificially generated data. The algorithm is applied to genome-wide expression data of baker's yeast under various environmental conditions. We find clear cases of combinatorial control, and enrichment in common functional annotations of regulated genes and their regulators.

  4. The Maryland Refutation Proof Procedure.

    ERIC Educational Resources Information Center

    Minker, Jack; And Others

    The Maryland Refutation Proof Procedure System (MRPPS) is an interactive experimental system intended for studying deductive search methods. Although the work is oriented towards question-answering, MRPPS provides a general problem solving capability. There are three major components within MRPPS. These are: (1) an inference system, (2) a search…

  5. Bayesian multimodel inference for dose-response studies

    USGS Publications Warehouse

    Link, W.A.; Albers, P.H.

    2007-01-01

    Statistical inference in dose-response studies is model-based: The analyst posits a mathematical model of the relation between exposure and response, estimates parameters of the model, and reports conclusions conditional on the model. Such analyses rarely include any accounting for the uncertainties associated with model selection. The Bayesian inferential system provides a convenient framework for model selection and multimodel inference. In this paper we briefly describe the Bayesian paradigm and Bayesian multimodel inference. We then present a family of models for multinomial dose-response data and apply Bayesian multimodel inferential methods to the analysis of data on the reproductive success of American kestrels (Falco sparverius) exposed to various sublethal dietary concentrations of methylmercury.

  6. Statistical Inference for Big Data Problems in Molecular Biophysics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ramanathan, Arvind; Savol, Andrej; Burger, Virginia

    2012-01-01

    We highlight the role of statistical inference techniques in providing biological insights from analyzing long time-scale molecular simulation data. Technological and algorithmic improvements in computation have brought molecular simulations to the forefront of techniques applied to investigating the basis of living systems. While these longer simulations, increasingly complex reaching petabyte scales presently, promise a detailed view into microscopic behavior, teasing out the important information has now become a true challenge on its own. Mining this data for important patterns is critical to automating therapeutic intervention discovery, improving protein design, and fundamentally understanding the mechanistic basis of cellular homeostasis.

  7. Tropical geometry of statistical models.

    PubMed

    Pachter, Lior; Sturmfels, Bernd

    2004-11-16

    This article presents a unified mathematical framework for inference in graphical models, building on the observation that graphical models are algebraic varieties. From this geometric viewpoint, observations generated from a model are coordinates of a point in the variety, and the sum-product algorithm is an efficient tool for evaluating specific coordinates. Here, we address the question of how the solutions to various inference problems depend on the model parameters. The proposed answer is expressed in terms of tropical algebraic geometry. The Newton polytope of a statistical model plays a key role. Our results are applied to the hidden Markov model and the general Markov model on a binary tree.

  8. ddClone: joint statistical inference of clonal populations from single cell and bulk tumour sequencing data.

    PubMed

    Salehi, Sohrab; Steif, Adi; Roth, Andrew; Aparicio, Samuel; Bouchard-Côté, Alexandre; Shah, Sohrab P

    2017-03-01

    Next-generation sequencing (NGS) of bulk tumour tissue can identify constituent cell populations in cancers and measure their abundance. This requires computational deconvolution of allelic counts from somatic mutations, which may be incapable of fully resolving the underlying population structure. Single cell sequencing (SCS) is a more direct method, although its replacement of NGS is impeded by technical noise and sampling limitations. We propose ddClone, which analytically integrates NGS and SCS data, leveraging their complementary attributes through joint statistical inference. We show on real and simulated datasets that ddClone produces more accurate results than can be achieved by either method alone.

  9. The Heuristic Value of p in Inductive Statistical Inference

    PubMed Central

    Krueger, Joachim I.; Heck, Patrick R.

    2017-01-01

    Many statistical methods yield the probability of the observed data – or data more extreme – under the assumption that a particular hypothesis is true. This probability is commonly known as ‘the’ p-value. (Null Hypothesis) Significance Testing ([NH]ST) is the most prominent of these methods. The p-value has been subjected to much speculation, analysis, and criticism. We explore how well the p-value predicts what researchers presumably seek: the probability of the hypothesis being true given the evidence, and the probability of reproducing significant results. We also explore the effect of sample size on inferential accuracy, bias, and error. In a series of simulation experiments, we find that the p-value performs quite well as a heuristic cue in inductive inference, although there are identifiable limits to its usefulness. We conclude that despite its general usefulness, the p-value cannot bear the full burden of inductive inference; it is but one of several heuristic cues available to the data analyst. Depending on the inferential challenge at hand, investigators may supplement their reports with effect size estimates, Bayes factors, or other suitable statistics, to communicate what they think the data say. PMID:28649206
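
    To make the definition in the first sentence concrete, the sketch below computes a one-sided p-value for a coin-tossing experiment: the probability of observing at least the given number of heads if the coin is fair. The function name and the coin example are illustrative assumptions, not material from the article. Note that the result is a probability of data given the hypothesis, not the probability that the hypothesis is true.

```python
from math import comb


def binomial_upper_tail_p(heads: int, tosses: int, p_null: float = 0.5) -> float:
    """P(at least `heads` heads in `tosses` tosses) under the null value p_null."""
    return sum(
        comb(tosses, k) * p_null ** k * (1.0 - p_null) ** (tosses - k)
        for k in range(heads, tosses + 1)
    )


# Eight heads in eight tosses of a supposedly fair coin.
print(binomial_upper_tail_p(8, 8))  # 0.5 ** 8 = 0.00390625
```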

  10. Subjective randomness as statistical inference.

    PubMed

    Griffiths, Thomas L; Daniels, Dylan; Austerweil, Joseph L; Tenenbaum, Joshua B

    2018-06-01

    Some events seem more random than others. For example, when tossing a coin, a sequence of eight heads in a row does not seem very random. Where do these intuitions about randomness come from? We argue that subjective randomness can be understood as the result of a statistical inference assessing the evidence that an event provides for having been produced by a random generating process. We show how this account provides a link to previous work relating randomness to algorithmic complexity, in which random events are those that cannot be described by short computer programs. Algorithmic complexity is both incomputable and too general to capture the regularities that people can recognize, but viewing randomness as statistical inference provides two paths to addressing these problems: considering regularities generated by simpler computing machines, and restricting the set of probability distributions that characterize regularity. Building on previous work exploring these different routes to a more restricted notion of randomness, we define strong quantitative models of human randomness judgments that apply not just to binary sequences - which have been the focus of much of the previous work on subjective randomness - but also to binary matrices and spatial clustering. Copyright © 2018 Elsevier Inc. All rights reserved.
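
    The idea of scoring randomness as evidence for a random generating process can be sketched as a log-likelihood ratio. The example below compares a fair-coin process with one deliberately simple "regular" alternative, a coin of unknown bias with a uniform prior. That alternative is an assumption chosen for brevity and is not one of the models defined in the article; in particular it cannot detect regularities such as strict alternation.

```python
from math import comb, log


def randomness_score(sequence: str) -> float:
    """log P(sequence | fair coin) - log P(sequence | biased coin, uniform prior).

    Positive scores favour the random (fair-coin) process; negative scores
    favour the regular (biased-coin) alternative.
    """
    n = len(sequence)
    k = sequence.count("H")
    log_p_random = -n * log(2)
    # Integrating theta**k * (1 - theta)**(n - k) over a uniform prior on
    # theta gives 1 / ((n + 1) * C(n, k)).
    log_p_regular = -log((n + 1) * comb(n, k))
    return log_p_random - log_p_regular


for seq in ("HHHHHHHH", "HTHHTHTT"):
    print(seq, round(randomness_score(seq), 3))
```

    Under these assumptions the run of eight heads receives a negative score and the mixed sequence a positive one, matching the intuition described in the abstract.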

  11. The Heuristic Value of p in Inductive Statistical Inference.

    PubMed

    Krueger, Joachim I; Heck, Patrick R

    2017-01-01

    Many statistical methods yield the probability of the observed data - or data more extreme - under the assumption that a particular hypothesis is true. This probability is commonly known as 'the' p-value. (Null Hypothesis) Significance Testing ([NH]ST) is the most prominent of these methods. The p-value has been subjected to much speculation, analysis, and criticism. We explore how well the p-value predicts what researchers presumably seek: the probability of the hypothesis being true given the evidence, and the probability of reproducing significant results. We also explore the effect of sample size on inferential accuracy, bias, and error. In a series of simulation experiments, we find that the p-value performs quite well as a heuristic cue in inductive inference, although there are identifiable limits to its usefulness. We conclude that despite its general usefulness, the p-value cannot bear the full burden of inductive inference; it is but one of several heuristic cues available to the data analyst. Depending on the inferential challenge at hand, investigators may supplement their reports with effect size estimates, Bayes factors, or other suitable statistics, to communicate what they think the data say.

  12. Statistical inference of seabed sound-speed structure in the Gulf of Oman Basin.

    PubMed

    Sagers, Jason D; Knobles, David P

    2014-06-01

    Addressed is the statistical inference of the sound-speed depth profile of a thick soft seabed from broadband sound propagation data recorded in the Gulf of Oman Basin in 1977. The acoustic data are in the form of time series signals recorded on a sparse vertical line array and generated by explosive sources deployed along a 280 km track. The acoustic data offer a unique opportunity to study a deep-water bottom-limited thickly sedimented environment because of the large number of time series measurements, very low seabed attenuation, and auxiliary measurements. A maximum entropy method is employed to obtain a conditional posterior probability distribution (PPD) for the sound-speed ratio and the near-surface sound-speed gradient. The multiple data samples allow for a determination of the average error constraint value required to uniquely specify the PPD for each data sample. Two complicating features of the statistical inference study are addressed: (1) the need to develop an error function that can both utilize the measured multipath arrival structure and mitigate the effects of data errors and (2) the effect of small bathymetric slopes on the structure of the bottom interacting arrivals.

  13. Experimental and environmental factors affect spurious detection of ecological thresholds

    USGS Publications Warehouse

    Daily, Jonathan P.; Hitt, Nathaniel P.; Smith, David; Snyder, Craig D.

    2012-01-01

    Threshold detection methods are increasingly popular for assessing nonlinear responses to environmental change, but their statistical performance remains poorly understood. We simulated linear change in stream benthic macroinvertebrate communities and evaluated the performance of commonly used threshold detection methods based on model fitting (piecewise quantile regression [PQR]), data partitioning (nonparametric change point analysis [NCPA]), and a hybrid approach (significant zero crossings [SiZer]). We demonstrated that false detection of ecological thresholds (type I errors) and inferences on threshold locations are influenced by sample size, rate of linear change, and frequency of observations across the environmental gradient (i.e., sample-environment distribution, SED). However, the relative importance of these factors varied among statistical methods and between inference types. False detection rates were influenced primarily by user-selected parameters for PQR (τ) and SiZer (bandwidth) and secondarily by sample size (for PQR) and SED (for SiZer). In contrast, the location of reported thresholds was influenced primarily by SED. Bootstrapped confidence intervals for NCPA threshold locations revealed strong correspondence to SED. We conclude that the choice of statistical methods for threshold detection should be matched to experimental and environmental constraints to minimize false detection rates and avoid spurious inferences regarding threshold location.

  14. Advances in Bayesian Modeling in Educational Research

    ERIC Educational Resources Information Center

    Levy, Roy

    2016-01-01

    In this article, I provide a conceptually oriented overview of Bayesian approaches to statistical inference and contrast them with frequentist approaches that currently dominate conventional practice in educational research. The features and advantages of Bayesian approaches are illustrated with examples spanning several statistical modeling…

  15. Scaling up spike-and-slab models for unsupervised feature learning.

    PubMed

    Goodfellow, Ian J; Courville, Aaron; Bengio, Yoshua

    2013-08-01

    We describe the use of two spike-and-slab models for modeling real-valued data, with an emphasis on their applications to object recognition. The first model, which we call spike-and-slab sparse coding (S3C), is a preexisting model for which we introduce a faster approximate inference algorithm. We introduce a deep variant of S3C, which we call the partially directed deep Boltzmann machine (PD-DBM) and extend our S3C inference algorithm for use on this model. We describe learning procedures for each. We demonstrate that our inference procedure for S3C enables scaling the model to unprecedented large problem sizes, and demonstrate that using S3C as a feature extractor results in very good object recognition performance, particularly when the number of labeled examples is low. We show that the PD-DBM generates better samples than its shallow counterpart, and that unlike DBMs or DBNs, the PD-DBM may be trained successfully without greedy layerwise training.

  16. Statistical Inference for Quality-Adjusted Survival Time

    DTIC Science & Technology

    2003-08-01

    survival functions of QAL. If an influence function for a test statistic exists for complete data case, denoted as ’i, then a test statistic for...the survival function for the censoring variable. Zhao and Tsiatis (2001) proposed a test statistic where O is the influence function of the general...to 1 everywhere until a subject’s death. We have considered other forms of test statistics. One option is to use an influence function 0i that is

  17. Intimate Partner Violence in the United States - 2010

    MedlinePlus

    ... administration; Statistical testing and inference; Additional methodological information; 2. Prevalence and Frequency of Individual ...

  18. Learning Quantitative Sequence-Function Relationships from Massively Parallel Experiments

    NASA Astrophysics Data System (ADS)

    Atwal, Gurinder S.; Kinney, Justin B.

    2016-03-01

    A fundamental aspect of biological information processing is the ubiquity of sequence-function relationships—functions that map the sequence of DNA, RNA, or protein to a biochemically relevant activity. Most sequence-function relationships in biology are quantitative, but only recently have experimental techniques for effectively measuring these relationships been developed. The advent of such "massively parallel" experiments presents an exciting opportunity for the concepts and methods of statistical physics to inform the study of biological systems. After reviewing these recent experimental advances, we focus on the problem of how to infer parametric models of sequence-function relationships from the data produced by these experiments. Specifically, we retrace and extend recent theoretical work showing that inference based on mutual information, not the standard likelihood-based approach, is often necessary for accurately learning the parameters of these models. Closely connected with this result is the emergence of "diffeomorphic modes"—directions in parameter space that are far less constrained by data than likelihood-based inference would suggest. Analogous to Goldstone modes in physics, diffeomorphic modes arise from an arbitrarily broken symmetry of the inference problem. An analytically tractable model of a massively parallel experiment is then described, providing an explicit demonstration of these fundamental aspects of statistical inference. This paper concludes with an outlook on the theoretical and computational challenges currently facing studies of quantitative sequence-function relationships.

  19. Solving the problem of comparing whole bacterial genomes across different sequencing platforms.

    PubMed

    Kaas, Rolf S; Leekitcharoenphon, Pimlapas; Aarestrup, Frank M; Lund, Ole

    2014-01-01

    Whole genome sequencing (WGS) shows great potential for real-time monitoring and identification of infectious disease outbreaks. However, rapid and reliable comparison of data generated in multiple laboratories and using multiple technologies is essential. So far studies have focused on using one technology because each technology has a systematic bias making integration of data generated from different platforms difficult. We developed two different procedures for identifying variable sites and inferring phylogenies in WGS data across multiple platforms. The methods were evaluated on three bacterial data sets and sequenced on three different platforms (Illumina, 454, Ion Torrent). We show that the methods are able to overcome the systematic biases caused by the sequencers and infer the expected phylogenies. It is concluded that the cause of the success of these new procedures is due to a validation of all informative sites that are included in the analysis. The procedures are available as web tools.

  20. Statistical primer: how to deal with missing data in scientific research?

    PubMed

    Papageorgiou, Grigorios; Grant, Stuart W; Takkenberg, Johanna J M; Mokhles, Mostafa M

    2018-05-10

    Missing data are a common challenge encountered in research which can compromise the results of statistical inference when not handled appropriately. This paper aims to introduce basic concepts of missing data to a non-statistical audience, list and compare some of the most popular approaches for handling missing data in practice and provide guidelines and recommendations for dealing with and reporting missing data in scientific research. Complete case analysis and single imputation are simple approaches for handling missing data and are popular in practice; however, in most cases they are not guaranteed to provide valid inferences. Multiple imputation is a robust and general alternative which is appropriate for data missing at random, surpassing the disadvantages of the simpler approaches, but should always be conducted with care. The aforementioned approaches are illustrated and compared in an example application using Cox regression.
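
    The two simple approaches named in the abstract can be sketched on a toy sample. The data values below are invented for illustration, and mean imputation stands in for single imputation generally; the Cox regression application from the paper is not reproduced.

```python
from statistics import mean, variance

# Hypothetical measurements; None marks a missing value.
values = [2.0, 4.0, None, 6.0, None, 8.0]

# Complete case analysis: drop every record with a missing value.
complete_cases = [v for v in values if v is not None]

# Single (mean) imputation: fill each gap with the observed mean.
observed_mean = mean(complete_cases)
imputed = [observed_mean if v is None else v for v in values]

print("complete cases: mean", mean(complete_cases), "variance", variance(complete_cases))
print("mean imputation: mean", mean(imputed), "variance", variance(imputed))
```

    In this sketch the imputed sample keeps the same mean but shows a smaller sample variance, since the filled-in values add observations without adding spread. This is one reason single imputation can give overconfident inferences.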

  1. Is the P-Value Really Dead? Assessing Inference Learning Outcomes for Social Science Students in an Introductory Statistics Course

    ERIC Educational Resources Information Center

    Lane-Getaz, Sharon

    2017-01-01

    In reaction to misuses and misinterpretations of p-values and confidence intervals, a social science journal editor banned p-values from its pages. This study aimed to show that education could address misuse and abuse. This study examines inference-related learning outcomes for social science students in an introductory course supplemented with…

  2. Back to BaySICS: a user-friendly program for Bayesian Statistical Inference from Coalescent Simulations.

    PubMed

    Sandoval-Castellanos, Edson; Palkopoulou, Eleftheria; Dalén, Love

    2014-01-01

    Inference of population demographic history has vastly improved in recent years due to a number of technological and theoretical advances including the use of ancient DNA. Approximate Bayesian computation (ABC) stands among the most promising methods due to its simple theoretical fundament and exceptional flexibility. However, limited availability of user-friendly programs that perform ABC analysis renders it difficult to implement, and hence programming skills are frequently required. In addition, there is limited availability of programs able to deal with heterochronous data. Here we present the software BaySICS: Bayesian Statistical Inference of Coalescent Simulations. BaySICS provides an integrated and user-friendly platform that performs ABC analyses by means of coalescent simulations from DNA sequence data. It estimates historical demographic population parameters and performs hypothesis testing by means of Bayes factors obtained from model comparisons. Although providing specific features that improve inference from datasets with heterochronous data, BaySICS also has several capabilities making it a suitable tool for analysing contemporary genetic datasets. Those capabilities include joint analysis of independent tables, a graphical interface and the implementation of Markov-chain Monte Carlo without likelihoods.
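
    The core idea of approximate Bayesian computation can be shown with a toy rejection sampler: draw a parameter from the prior, simulate data, and keep the draw only if the simulated summary is close to the observed one. The sketch below is not BaySICS and uses no coalescent simulation; it assumes a simple binomial model with a uniform prior, and the function name `abc_rejection` and all numbers are hypothetical.

```python
import random


def abc_rejection(observed, n_trials, n_draws, tolerance, seed=0):
    """Return prior draws whose simulated success count lies within tolerance of observed."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        p = rng.random()  # draw the success probability from a Uniform(0, 1) prior
        simulated = sum(rng.random() < p for _ in range(n_trials))
        if abs(simulated - observed) <= tolerance:
            accepted.append(p)
    return accepted


# Toy data: 6 successes in 20 trials. With tolerance 0 the accepted draws are
# samples from the exact posterior, which here is Beta(7, 15).
posterior = abc_rejection(observed=6, n_trials=20, n_draws=20000, tolerance=0)
posterior_mean = sum(posterior) / len(posterior)
print(len(posterior), round(posterior_mean, 3))
```

    A tolerance of 0 is only workable for toy problems like this one; with richer data a positive tolerance on summary statistics trades accuracy for acceptance rate.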

  3. Multibaseline gravitational wave radiometry

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Talukder, Dipongkar; Bose, Sukanta; Mitra, Sanjit

    2011-03-15

    We present a statistic for the detection of stochastic gravitational wave backgrounds (SGWBs) using radiometry with a network of multiple baselines. We also quantitatively compare the sensitivities of existing baselines and their network to SGWBs. We assess how the measurement accuracy of signal parameters, e.g., the sky position of a localized source, can improve when using a network of baselines, as compared to any of the single participating baselines. The search statistic itself is derived from the likelihood ratio of the cross correlation of the data across all possible baselines in a detector network and is optimal in Gaussian noise. Specifically, it is the likelihood ratio maximized over the strength of the SGWB and is called the maximized-likelihood ratio (MLR). One of the main advantages of using the MLR over past search strategies for inferring the presence or absence of a signal is that the former does not require the deconvolution of the cross correlation statistic. Therefore, it does not suffer from errors inherent to the deconvolution procedure and is especially useful for detecting weak sources. In the limit of a single baseline, it reduces to the detection statistic studied by Ballmer [Classical Quantum Gravity 23, S179 (2006)] and Mitra et al. [Phys. Rev. D 77, 042002 (2008)]. Unlike past studies, here the MLR statistic enables us to compare quantitatively the performances of a variety of baselines searching for an SGWB signal in (simulated) data. Although we use simulated noise and SGWB signals for making these comparisons, our method can be straightforwardly applied to real data.

  4. Assessing NARCCAP climate model effects using spatial confidence regions.

    PubMed

    French, Joshua P; McGinnis, Seth; Schwartzman, Armin

    2017-01-01

    We assess similarities and differences between model effects for the North American Regional Climate Change Assessment Program (NARCCAP) climate models using varying classes of linear regression models. Specifically, we consider how the average temperature effect differs for the various global and regional climate model combinations, including assessment of possible interaction between the effects of global and regional climate models. We use both pointwise and simultaneous inference procedures to identify regions where global and regional climate model effects differ. We also show conclusively that results from pointwise inference are misleading, and that accounting for multiple comparisons is important for making proper inference.

  5. Automatic inference of multicellular regulatory networks using informative priors.

    PubMed

    Sun, Xiaoyun; Hong, Pengyu

    2009-01-01

    To fully understand the mechanisms governing animal development, computational models and algorithms are needed to enable quantitative studies of the underlying regulatory networks. We developed a mathematical model based on dynamic Bayesian networks to model multicellular regulatory networks that govern cell differentiation processes. A machine-learning method was developed to automatically infer such a model from heterogeneous data. We show that the model inference procedure can be greatly improved by incorporating interaction data across species. The proposed approach was applied to C. elegans vulval induction to reconstruct a model capable of simulating C. elegans vulval induction under 73 different genetic conditions.

  6. Evaluating Content Alignment in Computerized Adaptive Testing

    ERIC Educational Resources Information Center

    Wise, Steven L.; Kingsbury, G. Gage; Webb, Norman L.

    2015-01-01

    The alignment between a test and the content domain it measures represents key evidence for the validation of test score inferences. Although procedures have been developed for evaluating the content alignment of linear tests, these procedures are not readily applicable to computerized adaptive tests (CATs), which require large item pools and do…

  7. STATISTICAL METHODOLOGY FOR THE SIMULTANEOUS ANALYSIS OF MULTIPLE TYPES OF OUTCOMES IN NONLINEAR THRESHOLD MODELS.

    EPA Science Inventory

    Multiple outcomes are often measured on each experimental unit in toxicology experiments. These multiple observations typically imply the existence of correlation between endpoints, and a statistical analysis that incorporates it may result in improved inference. When both disc...

  8. The Love of Large Numbers: A Popularity Bias in Consumer Choice.

    PubMed

    Powell, Derek; Yu, Jingqi; DeWolf, Melissa; Holyoak, Keith J

    2017-10-01

    Social learning, the ability to learn from observing the decisions of other people and the outcomes of those decisions, is fundamental to human evolutionary and cultural success. The Internet now provides social evidence on an unprecedented scale. However, properly utilizing this evidence requires a capacity for statistical inference. We examined how people's interpretation of online review scores is influenced by the number of reviews, a potential indicator both of an item's popularity and of the precision of the average review score. Our task was designed to pit statistical information against social information. We modeled the behavior of an "intuitive statistician" using empirical prior information from millions of reviews posted on Amazon.com and then compared the model's predictions with the behavior of experimental participants. Under certain conditions, people preferred a product with more reviews to one with fewer reviews even though the statistical model indicated that the latter was likely to be of higher quality than the former. Overall, participants' judgments suggested that they failed to make meaningful statistical inferences.
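
    One way to see why fewer reviews can signal higher quality is Bayesian shrinkage: an average based on few reviews is pulled more strongly toward the prior mean than one based on many. The sketch below assumes a Normal prior and Normal review noise, which is a simplification and not necessarily the model used in the study; the function name and every number are hypothetical.

```python
def posterior_mean_quality(avg_score, n_reviews, prior_mean, prior_var, score_var):
    """Posterior mean of item quality under a Normal prior and Normal review noise."""
    prior_precision = 1.0 / prior_var
    data_precision = n_reviews / score_var
    return (prior_precision * prior_mean + data_precision * avg_score) / (
        prior_precision + data_precision
    )


# Assumed prior over item quality (mean 4.0 stars) and assumed review noise.
PRIOR_MEAN, PRIOR_VAR, SCORE_VAR = 4.0, 0.25, 1.5

# Two hypothetical items with the same below-average score of 3.2 stars.
few = posterior_mean_quality(3.2, 25, PRIOR_MEAN, PRIOR_VAR, SCORE_VAR)
many = posterior_mean_quality(3.2, 150, PRIOR_MEAN, PRIOR_VAR, SCORE_VAR)
print(f"25 reviews: {few:.3f}   150 reviews: {many:.3f}")
```

    Under these assumptions the item with 25 reviews has the higher estimated quality, because many reviews make a below-average score more credible.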

  9. Learning what to expect (in visual perception)

    PubMed Central

    Seriès, Peggy; Seitz, Aaron R.

    2013-01-01

    Expectations are known to greatly affect our experience of the world. A growing theory in computational neuroscience is that perception can be successfully described using Bayesian inference models and that the brain is “Bayes-optimal” under some constraints. In this context, expectations are particularly interesting, because they can be viewed as prior beliefs in the statistical inference process. A number of questions remain unsolved, however, for example: How fast do priors change over time? Are there limits in the complexity of the priors that can be learned? How do an individual’s priors compare to the true scene statistics? Can we unlearn priors that are thought to correspond to natural scene statistics? Where and what are the neural substrates of priors? Focusing on the perception of visual motion, we here review recent studies from our laboratories and others addressing these issues. We discuss how these data on motion perception fit within the broader literature on perceptual Bayesian priors, perceptual expectations, and statistical and perceptual learning and review the possible neural basis of priors. PMID:24187536

  10. Statistical Inference for Porous Materials using Persistent Homology.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Moon, Chul; Heath, Jason E.; Mitchell, Scott A.

    2017-12-01

    We propose a porous materials analysis pipeline using persistent homology. We first compute persistent homology of binarized 3D images of sampled material subvolumes. For each image we compute sets of homology intervals, which are represented as summary graphics called persistence diagrams. We convert persistence diagrams into image vectors in order to analyze the similarity of the homology of the material images using the mature tools for image analysis. Each image is treated as a vector and we compute its principal components to extract features. We fit a statistical model using the loadings of principal components to estimate material porosity, permeability, anisotropy, and tortuosity. We also propose an adaptive version of the structural similarity index (SSIM), a similarity metric for images, as a measure to determine the statistical representative elementary volumes (sREV) for persistence homology. Thus we provide a capability for making a statistical inference of the fluid flow and transport properties of porous materials based on their geometry and connectivity.

  11. Philosophy and the practice of Bayesian statistics

    PubMed Central

    Gelman, Andrew; Shalizi, Cosma Rohilla

    2015-01-01

    A substantial school in the philosophy of science identifies Bayesian inference with inductive inference and even rationality as such, and seems to be strengthened by the rise and practical success of Bayesian statistics. We argue that the most successful forms of Bayesian statistics do not actually support that particular philosophy but rather accord much better with sophisticated forms of hypothetico-deductivism. We examine the actual role played by prior distributions in Bayesian models, and the crucial aspects of model checking and model revision, which fall outside the scope of Bayesian confirmation theory. We draw on the literature on the consistency of Bayesian updating and also on our experience of applied work in social science. Clarity about these matters should benefit not just philosophy of science, but also statistical practice. At best, the inductivist view has encouraged researchers to fit and compare models without checking them; at worst, theorists have actively discouraged practitioners from performing model checking because it does not fit into their framework. PMID:22364575

  12. Philosophy and the practice of Bayesian statistics.

    PubMed

    Gelman, Andrew; Shalizi, Cosma Rohilla

    2013-02-01

    A substantial school in the philosophy of science identifies Bayesian inference with inductive inference and even rationality as such, and seems to be strengthened by the rise and practical success of Bayesian statistics. We argue that the most successful forms of Bayesian statistics do not actually support that particular philosophy but rather accord much better with sophisticated forms of hypothetico-deductivism. We examine the actual role played by prior distributions in Bayesian models, and the crucial aspects of model checking and model revision, which fall outside the scope of Bayesian confirmation theory. We draw on the literature on the consistency of Bayesian updating and also on our experience of applied work in social science. Clarity about these matters should benefit not just philosophy of science, but also statistical practice. At best, the inductivist view has encouraged researchers to fit and compare models without checking them; at worst, theorists have actively discouraged practitioners from performing model checking because it does not fit into their framework. © 2012 The British Psychological Society.

  13. 78 FR 24138 - Implementing Public Safety Broadband Provisions of the Middle Class Tax Relief and Job Creation...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-04-24

    ... Bureau, Statistical Abstract of the United States: 2011, Table 427 (2007). \\28\\ The 2007 U.S Census data.... (U.S. CENSUS BUREAU, STATISTICAL ABSTRACT OF THE UNITED STATES 2011, Table 428.) The criterion by... Statistical Abstract of the U.S., that inference is further supported by the fact that in both Tables, many...

  14. Local quantum thermal susceptibility

    PubMed Central

    De Pasquale, Antonella; Rossini, Davide; Fazio, Rosario; Giovannetti, Vittorio

    2016-01-01

    Thermodynamics relies on the possibility to describe systems composed of a large number of constituents in terms of few macroscopic variables. Its foundations are rooted into the paradigm of statistical mechanics, where thermal properties originate from averaging procedures which smoothen out local details. While undoubtedly successful, elegant and formally correct, this approach carries over an operational problem, namely determining the precision at which such variables are inferred, when technical/practical limitations restrict our capabilities to local probing. Here we introduce the local quantum thermal susceptibility, a quantifier for the best achievable accuracy for temperature estimation via local measurements. Our method relies on basic concepts of quantum estimation theory, providing an operative strategy to address the local thermal response of arbitrary quantum systems at equilibrium. At low temperatures, it highlights the local distinguishability of the ground state from the excited sub-manifolds, thus providing a method to locate quantum phase transitions. PMID:27681458

  15. Local quantum thermal susceptibility

    NASA Astrophysics Data System (ADS)

    de Pasquale, Antonella; Rossini, Davide; Fazio, Rosario; Giovannetti, Vittorio

    2016-09-01

    Thermodynamics relies on the possibility to describe systems composed of a large number of constituents in terms of few macroscopic variables. Its foundations are rooted into the paradigm of statistical mechanics, where thermal properties originate from averaging procedures which smoothen out local details. While undoubtedly successful, elegant and formally correct, this approach carries over an operational problem, namely determining the precision at which such variables are inferred, when technical/practical limitations restrict our capabilities to local probing. Here we introduce the local quantum thermal susceptibility, a quantifier for the best achievable accuracy for temperature estimation via local measurements. Our method relies on basic concepts of quantum estimation theory, providing an operative strategy to address the local thermal response of arbitrary quantum systems at equilibrium. At low temperatures, it highlights the local distinguishability of the ground state from the excited sub-manifolds, thus providing a method to locate quantum phase transitions.

  16. Is it possible to identify a trend in problem/failure data

    NASA Technical Reports Server (NTRS)

    Church, Curtis K.

    1990-01-01

    One of the major obstacles in identifying and interpreting a trend is the small number of data points. Future trending reports will begin with 1983 data. As the problem/failure data are aggregated by year, there are just seven observations (1983 to 1989) for the 1990 reports. Any statistical inferences with a small amount of data will have a large degree of uncertainty. Consequently, a regression technique approach to identify a trend is limited. Though trend determination by failure mode may be unrealistic, the data may be explored for consistency or stability and the failure rate investigated. Various alternative data analysis procedures are briefly discussed. Techniques that could be used to explore problem/failure data by failure mode are addressed. The data used are taken from Section One, Space Shuttle Main Engine, of the Calspan Quarterly Report dated April 2, 1990.
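
    The effect of having only seven yearly observations can be illustrated with an ordinary least-squares trend line and its 95% confidence interval. The sketch below uses invented yearly counts, not data from the report; the function name is hypothetical, and the Student t critical value is hardcoded for 5 degrees of freedom, so the function only accepts seven points.

```python
from math import sqrt

T_CRIT_DF5 = 2.571  # two-sided 95% Student t critical value, 5 degrees of freedom


def slope_with_interval(years, counts):
    """Least-squares slope and its 95% confidence interval for exactly seven points."""
    n = len(years)
    if n != 7 or len(counts) != 7:
        raise ValueError("the hardcoded critical value assumes seven observations")
    x_bar = sum(years) / n
    y_bar = sum(counts) / n
    sxx = sum((x - x_bar) ** 2 for x in years)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, counts))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residual_ss = sum((y - intercept - slope * x) ** 2 for x, y in zip(years, counts))
    se_slope = sqrt(residual_ss / (n - 2) / sxx)
    half_width = T_CRIT_DF5 * se_slope
    return slope, slope - half_width, slope + half_width


# Invented problem/failure counts aggregated by year, 1983 to 1989.
years = list(range(1983, 1990))
counts = [14, 11, 15, 9, 12, 8, 10]
slope, low, high = slope_with_interval(years, counts)
print(f"slope = {slope:.2f} per year, 95% CI = ({low:.2f}, {high:.2f})")
```

    With these invented counts the fitted slope is negative, yet the interval still spans zero, which is the kind of uncertainty the abstract warns about.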

  17. Modeling error distributions of growth curve models through Bayesian methods.

    PubMed

    Zhang, Zhiyong

    2016-06-01

    Growth curve models are widely used in social and behavioral sciences. However, typical growth curve models often assume that the errors are normally distributed although non-normal data may be even more common than normal data. In order to avoid possible statistical inference problems in blindly assuming normality, a general Bayesian framework is proposed to flexibly model normal and non-normal data through the explicit specification of the error distributions. A simulation study shows that when the distribution of the error is correctly specified, one can avoid the loss in the efficiency of standard error estimates. A real example on the analysis of mathematical ability growth data from the Early Childhood Longitudinal Study, Kindergarten Class of 1998-99 is used to show the application of the proposed methods. Instructions and code on how to conduct growth curve analysis with both normal and non-normal error distributions using the MCMC procedure of SAS are provided.

  18. Sojourning with the Homogeneous Poisson Process.

    PubMed

    Liu, Piaomu; Peña, Edsel A

    2016-01-01

    In this pedagogical article, distributional properties, some surprising, pertaining to the homogeneous Poisson process (HPP), when observed over a possibly random window, are presented. Properties of the gap-time that covered the termination time and the correlations among gap-times of the observed events are obtained. Inference procedures, such as estimation and model validation, based on event occurrence data over the observation window, are also presented. We envision that through the results in this paper, a better appreciation of the subtleties involved in the modeling and analysis of recurrent events data will ensue, since the HPP is arguably one of the simplest among recurrent event models. In addition, the use of the theorem of total probability, Bayes theorem, the iterated rules of expectation, variance and covariance, and the renewal equation could be illustrative when teaching distribution theory, mathematical statistics, and stochastic processes at both the undergraduate and graduate levels. This article is targeted towards both instructors and students.
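
    One of the surprising properties mentioned above is that the gap covering the termination time tends to be longer than an ordinary gap. A small simulation can make this visible. The sketch below assumes a fixed (non-random) window ending at `tau` and a process that starts at time 0; the function name and parameter values are hypothetical. Under these assumptions the expected covering gap is (2 - exp(-rate * tau)) / rate, compared with 1 / rate for an ordinary gap.

```python
import random


def covering_gap(rate, tau, rng):
    """Length of the gap between consecutive events that contains time tau."""
    previous = 0.0  # the process is taken to start at time 0
    while True:
        current = previous + rng.expovariate(rate)
        if current > tau:
            return current - previous
        previous = current


rng = random.Random(1)
rate, tau, reps = 1.0, 10.0, 20000
gaps = [covering_gap(rate, tau, rng) for _ in range(reps)]
mean_covering = sum(gaps) / reps
print(f"mean ordinary gap = {1 / rate:.2f}, mean covering gap = {mean_covering:.2f}")
```

    Longer gaps are more likely to straddle any given time point, so the covering gap is a length-biased sample of the gap-time distribution.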

  19. Circum-Arctic petroleum systems identified using decision-tree chemometrics

    USGS Publications Warehouse

    Peters, K.E.; Ramos, L.S.; Zumberge, J.E.; Valin, Z.C.; Scotese, C.R.; Gautier, D.L.

    2007-01-01

    Source- and age-related biomarker and isotopic data were measured for more than 1000 crude oil samples from wells and seeps collected above approximately 55°N latitude. A unique, multitiered chemometric (multivariate statistical) decision tree was created that allowed automated classification of 31 genetically distinct circum-Arctic oil families based on a training set of 622 oil samples. The method, which we call decision-tree chemometrics, uses principal components analysis and multiple tiers of K-nearest neighbor and SIMCA (soft independent modeling of class analogy) models to classify and assign confidence limits for newly acquired oil samples and source rock extracts. Geochemical data for each oil sample were also used to infer the age, lithology, organic matter input, depositional environment, and identity of its source rock. These results demonstrate the value of large petroleum databases where all samples were analyzed using the same procedures and instrumentation. Copyright © 2007. The American Association of Petroleum Geologists. All rights reserved.

  20. Inferring topological features of proteins from amino acid residue networks

    NASA Astrophysics Data System (ADS)

    Alves, Nelson Augusto; Martinez, Alexandre Souto

    2007-02-01

    Topological properties of native folds are obtained from statistical analysis of 160 low homology proteins covering the four structural classes. This is done by analyzing one-, two- and three-vertex joint distributions of quantities related to the corresponding network of amino acid residues. Emphasis on the amino acid residue hydrophobicity leads to the definition of their center of mass as vertices in this contact network model with interactions represented by edges. The network analysis helps us to interpret experimental results such as hydrophobic scales and fraction of buried accessible surface area in terms of the network connectivity. Moreover, those networks show assortative mixing by degree. To explore the vertex-type dependent correlations, we build a network of hydrophobic and polar vertices. This procedure presents the wiring diagram of the topological structure of globular proteins leading to the following attachment probabilities between hydrophobic-hydrophobic 0.424(5), hydrophobic-polar 0.419(2) and polar-polar 0.157(3) residues.

  1. Earthquake hazard assessment in the Zagros Orogenic Belt of Iran using a fuzzy rule-based model

    NASA Astrophysics Data System (ADS)

    Farahi Ghasre Aboonasr, Sedigheh; Zamani, Ahmad; Razavipour, Fatemeh; Boostani, Reza

    2017-08-01

    Producing accurate seismic hazard maps and predicting hazardous areas are necessary for risk mitigation strategies. In this paper, a fuzzy logic inference system is utilized to estimate the earthquake potential and seismic zoning of the Zagros Orogenic Belt. In addition to their interpretability, fuzzy predictors can capture both nonlinearity and chaotic behavior of data where the amount of data is limited. The earthquake pattern in the Zagros has been assessed for intervals of 10 and 50 years using a fuzzy rule-based model. The Molchan statistical procedure has been used to show that our forecasting model is reliable. The earthquake hazard maps for this area reveal some remarkable features that cannot be observed on the conventional maps. Regarding our achievements, some areas in the southern (Bandar Abbas), southwestern (Bandar Kangan) and western (Kermanshah) parts of Iran display high earthquake severity even though they are geographically far apart.

  2. Augmenting Latent Dirichlet Allocation and Rank Threshold Detection with Ontologies

    DTIC Science & Technology

    2010-03-01

    Probabilistic Latent Semantic Indexing (PLSI) is an automated indexing information retrieval model [20]. It is based on a statistical latent class model which is...uses a statistical foundation that is more accurate in finding hidden semantic relationships [20]. The model uses factor analysis of count data, number...principle of statistical inference which asserts that all of the information in a sample is contained in the likelihood function [20]. The statistical

  3. 78 FR 43002 - Proposed Collection; Comment Request for Revenue Procedure 2004-29

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-07-18

    ... comments concerning statistical sampling in Sec. 274 Context. DATES: Written comments should be received on... INFORMATION: Title: Statistical Sampling in Sec. 274 Context. OMB Number: 1545-1847. Revenue Procedure Number: Revenue Procedure 2004-29. Abstract: Revenue Procedure 2004-29 prescribes the statistical sampling...

  4. Inference Control Mechanism for Statistical Database: Frequency-Imposed Data Distortions.

    ERIC Educational Resources Information Center

    Liew, Chong K.; And Others

    1985-01-01

    Introduces two data distortion methods (Frequency-Imposed Distortion, Frequency-Imposed Probability Distortion) and uses a Monte Carlo study to compare their performance with that of other distortion methods (Point Distortion, Probability Distortion). Indications that data generated by these two methods produce accurate statistics and protect…

  5. Learning planar Ising models

    DOE PAGES

    Johnson, Jason K.; Oyen, Diane Adele; Chertkov, Michael; ...

    2016-12-01

    Inference and learning of graphical models are both well-studied problems in statistics and machine learning that have found many applications in science and engineering. However, exact inference is intractable in general graphical models, which suggests the problem of seeking the best approximation to a collection of random variables within some tractable family of graphical models. In this paper, we focus on the class of planar Ising models, for which exact inference is tractable using techniques of statistical physics. Based on these techniques and recent methods for planarity testing and planar embedding, we propose a greedy algorithm for learning the best planar Ising model to approximate an arbitrary collection of binary random variables (possibly from sample data). Given the set of all pairwise correlations among variables, we select a planar graph and optimal planar Ising model defined on this graph to best approximate that set of correlations. Finally, we demonstrate our method in simulations and for two applications: modeling senate voting records and identifying geo-chemical depth trends from Mars rover data.

  6. Learning planar Ising models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Johnson, Jason K.; Oyen, Diane Adele; Chertkov, Michael

    Inference and learning of graphical models are both well-studied problems in statistics and machine learning that have found many applications in science and engineering. However, exact inference is intractable in general graphical models, which suggests the problem of seeking the best approximation to a collection of random variables within some tractable family of graphical models. In this paper, we focus on the class of planar Ising models, for which exact inference is tractable using techniques of statistical physics. Based on these techniques and recent methods for planarity testing and planar embedding, we propose a greedy algorithm for learning the best planar Ising model to approximate an arbitrary collection of binary random variables (possibly from sample data). Given the set of all pairwise correlations among variables, we select a planar graph and optimal planar Ising model defined on this graph to best approximate that set of correlations. Finally, we demonstrate our method in simulations and for two applications: modeling senate voting records and identifying geo-chemical depth trends from Mars rover data.

  7. An argument for mechanism-based statistical inference in cancer

    PubMed Central

    Ochs, Michael; Price, Nathan D.; Tomasetti, Cristian; Younes, Laurent

    2015-01-01

    Cancer is perhaps the prototypical systems disease, and as such has been the focus of extensive study in quantitative systems biology. However, translating these programs into personalized clinical care remains elusive and incomplete. In this perspective, we argue that realizing this agenda—in particular, predicting disease phenotypes, progression and treatment response for individuals—requires going well beyond standard computational and bioinformatics tools and algorithms. It entails designing global mathematical models over network-scale configurations of genomic states and molecular concentrations, and learning the model parameters from limited available samples of high-dimensional and integrative omics data. As such, any plausible design should accommodate: biological mechanism, necessary for both feasible learning and interpretable decision making; stochasticity, to deal with uncertainty and observed variation at many scales; and a capacity for statistical inference at the patient level. This program, which requires a close, sustained collaboration between mathematicians and biologists, is illustrated in several contexts, including learning bio-markers, metabolism, cell signaling, network inference and tumorigenesis. PMID:25381197

  8. Statistics in biomedical laboratory and clinical science: applications, issues and pitfalls.

    PubMed

    Ludbrook, John

    2008-01-01

    This review is directed at biomedical scientists who want to gain a better understanding of statistics: what tests to use, when, and why. In my view, even during the planning stage of a study it is very important to seek the advice of a qualified biostatistician. When designing and analyzing a study, it is important to construct and test global hypotheses, rather than to make multiple tests on the data. If the latter cannot be avoided, it is essential to control the risk of making false-positive inferences by applying multiple comparison procedures. For comparing two means or two proportions, it is best to use exact permutation tests rather than the better-known, classical ones. For comparing many means, analysis of variance, often of a complex type, is the most powerful approach. The correlation coefficient should never be used to compare the performances of two methods of measurement, or two measures, because it does not detect bias. Instead the Altman-Bland method of differences or least-products linear regression analysis should be preferred. Finally, the educational value to investigators of interaction with a biostatistician, before, during and after a study, cannot be overemphasized. (c) 2007 S. Karger AG, Basel.
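
    An exact permutation test for two means enumerates every way of splitting the pooled observations into two groups of the original sizes and counts how often the difference in means is at least as large as the one observed. The sketch below is one minimal way to do this with the standard library; the function name and the measurements are hypothetical and are not taken from the review.

```python
from itertools import combinations


def exact_permutation_test(x, y):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = list(x) + list(y)
    n_x = len(x)
    n_y = len(pooled) - n_x
    total = sum(pooled)
    observed = abs(sum(x) / n_x - sum(y) / n_y)
    extreme = 0
    count = 0
    # Enumerate every relabelling of the pooled data into groups of size n_x and n_y.
    for idx in combinations(range(len(pooled)), n_x):
        sum_x = sum(pooled[i] for i in idx)
        diff = abs(sum_x / n_x - (total - sum_x) / n_y)
        count += 1
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            extreme += 1
    return extreme / count


# Made-up measurements for two small groups.
treated = [12.1, 14.3, 13.8, 15.0]
control = [9.9, 10.4, 11.2, 10.8]
p_value = exact_permutation_test(treated, control)
print(f"exact two-sided p-value = {p_value:.4f}")
```

    Full enumeration is only practical for small samples, since the number of splits grows combinatorially; larger studies typically sample random relabellings instead.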

  9. Graph-based inductive reasoning.

    PubMed

    Boumans, Marcel

    2016-10-01

    This article discusses methods of inductive inference that are methods of visualization designed in such a way that the "eye" can be employed as a reliable tool for judgment. The term "eye" is used as a stand-in for visual cognition and perceptual processing. In this paper "meaningfulness" has a particular meaning, namely accuracy, which is closeness to truth. Accuracy consists of precision and unbiasedness. Precision is dealt with by statistical methods, but for unbiasedness one needs expert judgment. The common view at the beginning of the twentieth century was to make the most efficient use of this kind of judgment by representing the data in shapes and forms in such a way that the "eye" can function as a reliable judge to reduce bias. Judgment by the "eye" is even more necessary when the background conditions of the observations are heterogeneous. Statistical procedures require a certain minimal level of homogeneity, but the "eye" does not. The "eye" is an adequate tool for assessing topological similarities when, due to heterogeneity of the data, metric assessment is not possible. In fact, graphical assessment precedes measurement, or to put it more forcefully, the graphic method is a necessary prerequisite for measurement. Copyright © 2016 Elsevier Ltd. All rights reserved.

  10. Network inference using informative priors

    PubMed Central

    Mukherjee, Sach; Speed, Terence P.

    2008-01-01

    Recent years have seen much interest in the study of systems characterized by multiple interacting components. A class of statistical models called graphical models, in which graphs are used to represent probabilistic relationships between variables, provides a framework for formal inference regarding such systems. In many settings, the object of inference is the network structure itself. This problem of “network inference” is well known to be a challenging one. However, in scientific settings there is very often existing information regarding network connectivity. A natural idea then is to take account of such information during inference. This article addresses the question of incorporating prior information into network inference. We focus on directed models called Bayesian networks, and use Markov chain Monte Carlo to draw samples from posterior distributions over network structures. We introduce prior distributions on graphs capable of capturing information regarding network features including edges, classes of edges, degree distributions, and sparsity. We illustrate our approach in the context of systems biology, applying our methods to network inference in cancer signaling. PMID:18799736

  11. Bayesian Inference for Functional Dynamics Exploring in fMRI Data.

    PubMed

    Guo, Xuan; Liu, Bing; Chen, Le; Chen, Guantao; Pan, Yi; Zhang, Jing

    2016-01-01

    This paper aims to review state-of-the-art Bayesian-inference-based methods applied to functional magnetic resonance imaging (fMRI) data. Particularly, we focus on one specific long-standing challenge in the computational modeling of fMRI datasets: how to effectively explore typical functional interactions from fMRI time series and the corresponding boundaries of temporal segments. Bayesian inference is a method of statistical inference which has been shown to be a powerful tool to encode dependence relationships among the variables with uncertainty. Here we provide an introduction to a group of Bayesian-inference-based methods for fMRI data analysis, which were designed to detect magnitude or functional connectivity change points and to infer their functional interaction patterns based on corresponding temporal boundaries. We also provide a comparison of three popular Bayesian models, that is, Bayesian Magnitude Change Point Model (BMCPM), Bayesian Connectivity Change Point Model (BCCPM), and Dynamic Bayesian Variable Partition Model (DBVPM), and give a summary of their applications. We envision that more delicate Bayesian inference models will be emerging and play increasingly important roles in modeling brain functions in the years to come.

  12. Modular Spectral Inference Framework Applied to Young Stars and Brown Dwarfs

    NASA Technical Reports Server (NTRS)

    Gully-Santiago, Michael A.; Marley, Mark S.

    2017-01-01

    In practice, synthetic spectral models are imperfect, causing inaccurate estimates of stellar parameters. Using forward modeling and statistical inference, we derive accurate stellar parameters for a given observed spectrum by emulating a grid of precomputed spectra to track uncertainties. Spectral inference as applied to brown dwarfs re: Synthetic spectral models (Marley et al. 1996 and 2014) via the newest grid spans a massive multi-dimensional grid applied to IGRINS spectra, improving atmospheric models for JWST. When applied to young stars (10 Myr) with large starspots, the starspots can be measured spectroscopically, especially in the near-IR with IGRINS.

  13. Inference of neuronal network spike dynamics and topology from calcium imaging data

    PubMed Central

    Lütcke, Henry; Gerhard, Felipe; Zenke, Friedemann; Gerstner, Wulfram; Helmchen, Fritjof

    2013-01-01

    Two-photon calcium imaging enables functional analysis of neuronal circuits by inferring action potential (AP) occurrence (“spike trains”) from cellular fluorescence signals. It remains unclear how experimental parameters such as signal-to-noise ratio (SNR) and acquisition rate affect spike inference and whether additional information about network structure can be extracted. Here we present a simulation framework for quantitatively assessing how well spike dynamics and network topology can be inferred from noisy calcium imaging data. For simulated AP-evoked calcium transients in neocortical pyramidal cells, we analyzed the quality of spike inference as a function of SNR and data acquisition rate using a recently introduced peeling algorithm. Given experimentally attainable values of SNR and acquisition rate, neural spike trains could be reconstructed accurately and with up to millisecond precision. We then applied statistical neuronal network models to explore how remaining uncertainties in spike inference affect estimates of network connectivity and topological features of network organization. We define the experimental conditions suitable for inferring whether the network has a scale-free structure and determine how well hub neurons can be identified. Our findings provide a benchmark for future calcium imaging studies that aim to reliably infer neuronal network properties. PMID:24399936

  14. Cognitive Clozing To Teach Them To Think.

    ERIC Educational Resources Information Center

    Viaggio, Sergio

    A cloze-type procedure can be used effectively to teach interpreters how to anticipate what the speaker will say, inferring communicative intention. The exercise uses a text from which words are deleted, not randomly as in the true cloze procedure, but in significant locations or contexts. The words or groups of words suppressed are progressively…

  15. Type Ia Supernova Light Curve Inference: Hierarchical Models for Nearby SN Ia in the Optical and Near Infrared

    NASA Astrophysics Data System (ADS)

    Mandel, Kaisey; Kirshner, R. P.; Narayan, G.; Wood-Vasey, W. M.; Friedman, A. S.; Hicken, M.

    2010-01-01

    I have constructed a comprehensive statistical model for Type Ia supernova light curves spanning optical through near infrared data simultaneously. The near infrared light curves are found to be excellent standard candles (sigma(MH) = 0.11 +/- 0.03 mag) that are less vulnerable to systematic error from dust extinction, a major confounding factor for cosmological studies. A hierarchical statistical framework incorporates coherently multiple sources of randomness and uncertainty, including photometric error, intrinsic supernova light curve variations and correlations, dust extinction and reddening, peculiar velocity dispersion and distances, for probabilistic inference with Type Ia SN light curves. Inferences are drawn from the full probability density over individual supernovae and the SN Ia and dust populations, conditioned on a dataset of SN Ia light curves and redshifts. To compute probabilistic inferences with hierarchical models, I have developed BayeSN, a Markov Chain Monte Carlo algorithm based on Gibbs sampling. This code explores and samples the global probability density of parameters describing individual supernovae and the population. I have applied this hierarchical model to optical and near infrared data of over 100 nearby Type Ia SN from PAIRITEL, the CfA3 sample, and the literature. Using this statistical model, I find that SN with optical and NIR data have a smaller residual scatter in the Hubble diagram than SN with only optical data. The continued study of Type Ia SN in the near infrared will be important for improving their utility as precise and accurate cosmological distance indicators.

  16. Statistical inference involving binomial and negative binomial parameters.

    PubMed

    García-Pérez, Miguel A; Núñez-Antón, Vicente

    2009-05-01

    Statistical inference about two binomial parameters implies that they are both estimated by binomial sampling. There are occasions in which one aims at testing the equality of two binomial parameters before and after the occurrence of the first success along a sequence of Bernoulli trials. In these cases, the binomial parameter before the first success is estimated by negative binomial sampling whereas that after the first success is estimated by binomial sampling, and both estimates are related. This paper derives statistical tools to test two hypotheses, namely, that both binomial parameters equal some specified value and that both parameters are equal though unknown. Simulation studies are used to show that in small samples both tests are accurate in keeping the nominal Type-I error rates, and also to determine sample size requirements to detect large, medium, and small effects with adequate power. Additional simulations also show that the tests are sufficiently robust to certain violations of their assumptions.

  17. Econophysical visualization of Adam Smith’s invisible hand

    NASA Astrophysics Data System (ADS)

    Cohen, Morrel H.; Eliazar, Iddo I.

    2013-02-01

    Consider a complex system whose macrostate is statistically observable, yet whose operating mechanism is an unknown black-box. In this paper we address the problem of inferring, from the system’s macrostate statistics, the system’s intrinsic force yielding the observed statistics. The inference is established via two diametrically opposite approaches which result in the very same intrinsic force: a top-down approach based on the notion of entropy, and a bottom-up approach based on the notion of Langevin dynamics. The general results established are applied to the problem of visualizing the intrinsic socioeconomic force (Adam Smith’s invisible hand) shaping the distribution of wealth in human societies. Our analysis yields quantitative econophysical representations of figurative socioeconomic forces, quantitative definitions of “poor” and “rich”, and a quantitative characterization of the “poor-get-poorer” and the “rich-get-richer” phenomena.

  18. Sampling and counting genome rearrangement scenarios

    PubMed Central

    2015-01-01

    Background: Even for moderate size inputs, there are a tremendous number of optimal rearrangement scenarios, regardless of what the model is and which specific question is to be answered. Therefore giving one optimal solution might be misleading and cannot be used for statistical inference. Statistically well-founded methods are necessary to sample uniformly from the solution space, and then a small number of samples is sufficient for statistical inference. Contribution: In this paper, we give a mini-review of the state of the art of sampling and counting rearrangement scenarios, focusing on the reversal, DCJ and SCJ models. In addition, we give a Gibbs sampler for sampling the most parsimonious labelings of evolutionary trees under the SCJ model. The method has been implemented and tested on real-life data. The software package together with example data can be downloaded from http://www.renyi.hu/~miklosi/SCJ-Gibbs/ PMID:26452124

  19. Online Updating of Statistical Inference in the Big Data Setting.

    PubMed

    Schifano, Elizabeth D; Wu, Jing; Wang, Chun; Yan, Jun; Chen, Ming-Hui

    2016-01-01

    We present statistical methods for big data arising from online analytical processing, where large amounts of data arrive in streams and require fast analysis without storage/access to the historical data. In particular, we develop iterative estimating algorithms and statistical inferences for linear models and estimating equations that update as new data arrive. These algorithms are computationally efficient, minimally storage-intensive, and allow for possible rank deficiencies in the subset design matrices due to rare-event covariates. Within the linear model setting, the proposed online-updating framework leads to predictive residual tests that can be used to assess the goodness-of-fit of the hypothesized model. We also propose a new online-updating estimator under the estimating equation setting. Theoretical properties of the goodness-of-fit tests and proposed estimators are examined in detail. In simulation studies and real data applications, our estimator compares favorably with competing approaches under the estimating equation setting.
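
    As a rough illustration of the online-updating idea in the abstract above, the following minimal sketch fits a one-covariate linear model from running sums only, so earlier chunks never need to be stored or revisited. It is not the authors' estimator: the class name OnlineSimpleRegression, its methods, and the toy data are hypothetical, and rank deficiency is handled only by raising an error.

```python
class OnlineSimpleRegression:
    """Least-squares fit of y = a + b*x that keeps only running sums.

    Hypothetical helper for illustration; not taken from the cited paper.
    """

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def update(self, xs, ys):
        # Fold a new chunk into the sufficient statistics; the chunk
        # itself can be discarded afterwards.
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y

    def coef(self):
        # Closed-form normal-equation solution from the accumulated sums.
        denom = self.n * self.sxx - self.sx ** 2
        if denom == 0:
            raise ValueError("slope is not identifiable from the data seen so far")
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept, slope


# Toy stream generated from y = 1 + 2x, arriving in two chunks.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]

model = OnlineSimpleRegression()
model.update(xs[:3], ys[:3])
model.update(xs[3:], ys[3:])
intercept, slope = model.coef()
```

    Because the estimate depends on the data only through the accumulated sums, the streamed fit coincides with the fit that would be obtained from all observations at once.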

  20. When mechanism matters: Bayesian forecasting using models of ecological diffusion

    USGS Publications Warehouse

    Hefley, Trevor J.; Hooten, Mevin B.; Russell, Robin E.; Walsh, Daniel P.; Powell, James A.

    2017-01-01

    Ecological diffusion is a theory that can be used to understand and forecast spatio-temporal processes such as dispersal, invasion, and the spread of disease. Hierarchical Bayesian modelling provides a framework to make statistical inference and probabilistic forecasts, using mechanistic ecological models. To illustrate, we show how hierarchical Bayesian models of ecological diffusion can be implemented for large data sets that are distributed densely across space and time. The hierarchical Bayesian approach is used to understand and forecast the growth and geographic spread in the prevalence of chronic wasting disease in white-tailed deer (Odocoileus virginianus). We compare statistical inference and forecasts from our hierarchical Bayesian model to phenomenological regression-based methods that are commonly used to analyse spatial occurrence data. The mechanistic statistical model based on ecological diffusion led to important ecological insights, obviated a commonly ignored type of collinearity, and was the most accurate method for forecasting.

  1. Family farming workers mental health in a microrregion in southern Brazil.

    PubMed

    Poletto, Ângela Regina; Gontijo, Leila Amaral

    2012-01-01

    This research investigates the association between mental health problems and sociodemographic and work process characteristics among family farming workers in the Ituporanga microregion. The sample comprised 447 family farming workers in Ituporanga, i.e., part of the overall population living on the city's 1,578 rural properties (IBGE, 2007). A questionnaire with sociodemographic and work process variables was used for data collection, together with the Self Report Questionnaire (SRQ-20) for mental health problems. Descriptive and inferential statistics, with measures of central tendency and variability, were used for data analysis. Binary logistic regression was used to model the probability of an event, i.e., the presence of mental health problems, as a function of predictor variables. A 5% significance level was adopted in all statistical procedures. The investigation revealed a 33.8% prevalence of mental health problems. Prevalence was higher among women, at 39.7% (n = 91), than among men, at 26.1% (n = 46), and this association was statistically significant (χ² = 8.225, df = 1, p = 0.004, phi = -0.143). Sociodemographic and work process variables were predictors of mental health problems, including sex, age, use of agrochemicals, and working hours outside and during harvest time, with intoxication in the family being the most important. Mental health problems were mostly associated with the use of agrochemicals and with farmers being intoxicated.

  2. On the insufficiency of arbitrarily precise covariance matrices: non-Gaussian weak-lensing likelihoods

    NASA Astrophysics Data System (ADS)

    Sellentin, Elena; Heavens, Alan F.

    2018-01-01

    We investigate whether a Gaussian likelihood, as routinely assumed in the analysis of cosmological data, is supported by simulated survey data. We define test statistics, based on a novel method that first destroys Gaussian correlations in a data set, and then measures the non-Gaussian correlations that remain. This procedure flags pairs of data points that depend on each other in a non-Gaussian fashion, and thereby identifies where the assumption of a Gaussian likelihood breaks down. Using this diagnosis, we find that non-Gaussian correlations in the CFHTLenS cosmic shear correlation functions are significant. With a simple exclusion of the most contaminated data points, the posterior for s8 is shifted without broadening, but we find no significant reduction in the tension with s8 derived from Planck cosmic microwave background data. However, we also show that the one-point distributions of the correlation statistics are noticeably skewed, such that sound weak-lensing data sets are intrinsically likely to lead to a systematically low lensing amplitude being inferred. The detected non-Gaussianities get larger with increasing angular scale such that for future wide-angle surveys such as Euclid or LSST, with their very small statistical errors, the large-scale modes are expected to be increasingly affected. The shifts in posteriors may then not be negligible and we recommend that these diagnostic tests be run as part of future analyses.

  3. Decision tree modeling using R.

    PubMed

    Zhang, Zhongheng

    2016-08-01

    In machine learning, decision tree learners are powerful and easy to interpret. They employ a recursive binary partitioning algorithm that splits the sample on the partitioning variable with the strongest association with the response variable. The process continues until some stopping criteria are met. In the example I focus on the conditional inference tree, which incorporates tree-structured regression models into conditional inference procedures. Because a single tree is sensitive to small changes in the training data, the random forests procedure is introduced to address this problem. The diversity in random forests comes from random sampling and from the restricted set of input variables available for selection. Finally, I introduce R functions to perform model-based recursive partitioning. This method incorporates recursive partitioning into conventional parametric model building.
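
    To make one step of recursive binary partitioning concrete, here is a minimal sketch that searches a single numeric variable for the cut point minimizing the summed squared error of the two resulting groups. Note the swap: conditional inference trees choose splits with permutation-based association tests, whereas this sketch uses a plain squared-error criterion for brevity. The function names sse and best_split and the toy data are hypothetical, and Python is used here rather than the R functions the article covers.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, error) for the best binary split x <= threshold.

    If no split improves on the unsplit sample, threshold is None.
    """
    pairs = sorted(zip(x, y))
    best = (None, sse(list(y)))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut point between identical x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Toy data with an obvious jump in the response between x = 3 and x = 10.
x = [1, 2, 3, 10, 11, 12]
y = [5.0, 5.0, 5.0, 9.0, 9.0, 9.0]
threshold, error = best_split(x, y)
```

    A full tree would apply best_split recursively to the left and right subsets until a stopping rule (minimum node size, no further improvement) is met.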

  4. Bayesian analysis of non-homogeneous Markov chains: application to mental health data.

    PubMed

    Sung, Minje; Soyer, Refik; Nhan, Nguyen

    2007-07-10

    In this paper we present a formal treatment of non-homogeneous Markov chains by introducing a hierarchical Bayesian framework. Our work is motivated by the analysis of correlated categorical data which arise in assessment of psychiatric treatment programs. In our development, we introduce a Markovian structure to describe the non-homogeneity of transition patterns. In doing so, we introduce a logistic regression set-up for Markov chains and incorporate covariates in our model. We present a Bayesian model using Markov chain Monte Carlo methods and develop inference procedures to address issues encountered in the analyses of data from psychiatric treatment programs. Our model and inference procedures are applied to real data from a psychiatric treatment study. Copyright 2006 John Wiley & Sons, Ltd.

  5. Probabilistic Signal Recovery and Random Matrices

    DTIC Science & Technology

    2016-12-08

    applications in statistics, biomedical data analysis, quantization, dimension reduction, and networks science. 1. High-dimensional inference and geometry Our...low-rank approximation, with applications to community detection in networks, Annals of Statistics 44 (2016), 373–400. [7] C. Le, E. Levina, R...approximation, with applications to community detection in networks, Annals of Statistics 44 (2016), 373–400. C. Le, E. Levina, R. Vershynin, Concentration

  6. The Use of a Context-Based Information Retrieval Technique

    DTIC Science & Technology

    2009-07-01

    provided in context. Latent Semantic Analysis (LSA) is a statistical technique for inferring contextual and structural information, and previous studies...WAIS). 10 DSTO-TR-2322 1.4.4 Latent Semantic Analysis LSA, which is also known as latent semantic indexing (LSI), uses a statistical and...1.4.6 Language Models In contrast, natural language models apply algorithms that combine statistical information with semantic information. Semantic

  7. An Introduction to Confidence Intervals for Both Statistical Estimates and Effect Sizes.

    ERIC Educational Resources Information Center

    Capraro, Mary Margaret

    This paper summarizes methods of estimating confidence intervals, including classical intervals and intervals for effect sizes. The recent American Psychological Association (APA) Task Force on Statistical Inference report suggested that confidence intervals should always be reported, and the fifth edition of the APA "Publication Manual"…

  8. Balancing Treatment and Control Groups in Quasi-Experiments: An Introduction to Propensity Scoring

    ERIC Educational Resources Information Center

    Connelly, Brian S.; Sackett, Paul R.; Waters, Shonna D.

    2013-01-01

    Organizational and applied sciences have long struggled with improving causal inference in quasi-experiments. We introduce organizational researchers to propensity scoring, a statistical technique that has become popular in other applied sciences as a means for improving internal validity. Propensity scoring statistically models how individuals in…

  9. Modeling Cross-Situational Word-Referent Learning: Prior Questions

    ERIC Educational Resources Information Center

    Yu, Chen; Smith, Linda B.

    2012-01-01

    Both adults and young children possess powerful statistical computation capabilities--they can infer the referent of a word from highly ambiguous contexts involving many words and many referents by aggregating cross-situational statistical information across contexts. This ability has been explained by models of hypothesis testing and by models of…

  10. Temporal and Statistical Information in Causal Structure Learning

    ERIC Educational Resources Information Center

    McCormack, Teresa; Frosch, Caren; Patrick, Fiona; Lagnado, David

    2015-01-01

    Three experiments examined children's and adults' abilities to use statistical and temporal information to distinguish between common cause and causal chain structures. In Experiment 1, participants were provided with conditional probability information and/or temporal information and asked to infer the causal structure of a 3-variable mechanical…

  11. Secondary Analysis of National Longitudinal Transition Study 2 Data

    ERIC Educational Resources Information Center

    Hicks, Tyler A.; Knollman, Greg A.

    2015-01-01

    This review examines published secondary analyses of National Longitudinal Transition Study 2 (NLTS2) data, with a primary focus upon statistical objectives, paradigms, inferences, and methods. Its primary purpose was to determine which statistical techniques have been common in secondary analyses of NLTS2 data. The review begins with an…

  12. Some General Goals in Teaching Statistics.

    ERIC Educational Resources Information Center

    Blalock, H. M.

    1987-01-01

    States that regardless of the content or level of a statistics course, five goals to reach are: (1) overcoming fears, resistances, and tendencies to memorize; (2) the importance of intellectual honesty and integrity; (3) understanding relationship between deductive and inductive inferences; (4) learning to play role of reasonable critic; and (5)…

  13. Propensity Score Analysis: An Alternative Statistical Approach for HRD Researchers

    ERIC Educational Resources Information Center

    Keiffer, Greggory L.; Lane, Forrest C.

    2016-01-01

    Purpose: This paper aims to introduce matching in propensity score analysis (PSA) as an alternative statistical approach for researchers looking to make causal inferences using intact groups. Design/methodology/approach: An illustrative example demonstrated the varying results of analysis of variance, analysis of covariance and PSA on a heuristic…

  14. Technology Focus: Using Technology to Explore Statistical Inference

    ERIC Educational Resources Information Center

    Garofalo, Joe; Juersivich, Nicole

    2007-01-01

    There is much research that documents what many teachers know, that students struggle with many concepts in probability and statistics. This article presents two sample activities the authors use to help preservice teachers develop ideas about how they can use technology to promote their students' ability to understand mathematics and connect…

  15. The Impact of an Instructional Intervention Designed to Support Development of Stochastic Understanding of Probability Distribution

    ERIC Educational Resources Information Center

    Conant, Darcy Lynn

    2013-01-01

    Stochastic understanding of probability distribution undergirds development of conceptual connections between probability and statistics and supports development of a principled understanding of statistical inference. This study investigated the impact of an instructional course intervention designed to support development of stochastic…

  16. Basic Statistical Concepts and Methods for Earth Scientists

    USGS Publications Warehouse

    Olea, Ricardo A.

    2008-01-01

    INTRODUCTION Statistics is the science of collecting, analyzing, interpreting, modeling, and displaying masses of numerical data primarily for the characterization and understanding of incompletely known systems. Over the years, these objectives have led to a fair amount of analytical work to achieve, substantiate, and guide descriptions and inferences.

  17. Metacontrast Inferred from Reaction Time and Verbal Report: Replication and Comments on the Feher-Biederman Experiment

    ERIC Educational Resources Information Center

    Amundson, Vickie E.; Bernstein, Ira H.

    1973-01-01

    Authors note that Fehrer and Biederman's two statistical tests were not of equal power and that their conclusion could be a statistical artifact of both the lesser power of the verbal report comparison and the insensitivity of their particular verbal report indicator. (Editor)

  18. Optimism bias leads to inconclusive results - an empirical study

    PubMed Central

    Djulbegovic, Benjamin; Kumar, Ambuj; Magazin, Anja; Schroen, Anneke T.; Soares, Heloisa; Hozo, Iztok; Clarke, Mike; Sargent, Daniel; Schell, Michael J.

    2010-01-01

    Objective Optimism bias refers to unwarranted belief in the efficacy of new therapies. We assessed the impact of optimism bias on a proportion of trials that did not answer their research question successfully, and explored whether poor accrual or optimism bias is responsible for inconclusive results. Study Design Systematic review Setting Retrospective analysis of a consecutive series of phase III randomized controlled trials (RCTs) performed under the aegis of National Cancer Institute Cooperative groups. Results 359 trials (374 comparisons) enrolling 150,232 patients were analyzed. 70% (262/374) of the trials generated conclusive results according to the statistical criteria. Investigators made definitive statements related to the treatment preference in 73% (273/374) of studies. Investigators’ judgments and statistical inferences were concordant in 75% (279/374) of trials. Investigators consistently overestimated their expected treatment effects, but to a significantly larger extent for inconclusive trials. The median ratio of expected over observed hazard ratio or odds ratio was 1.34 (range 0.19 – 15.40) in conclusive trials compared to 1.86 (range 1.09 – 12.00) in inconclusive studies (p<0.0001). Only 17% of the trials had treatment effects that matched original researchers’ expectations. Conclusion Formal statistical inference is sufficient to answer the research question in 75% of RCTs. The answers to the other 25% depend mostly on subjective judgments, which at times are in conflict with statistical inference. Optimism bias significantly contributes to inconclusive results. PMID:21163620
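
    The percentages quoted in the abstract above can be checked directly from the reported counts. The short sketch below recomputes them from the figures given in the text (262, 273 and 279 out of 374 comparisons); the dictionary keys are labels chosen here for illustration.

```python
# Counts reported in the abstract, each out of 374 comparisons.
total = 374
counts = {"conclusive": 262, "definitive": 273, "concordant": 279}

# Exact percentages, then rounded to whole numbers as in the abstract.
percent = {name: 100 * count / total for name, count in counts.items()}
rounded = {name: round(value) for name, value in percent.items()}
```

    The rounded values reproduce the 70%, 73% and 75% stated in the abstract.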

  19. Optimism bias leads to inconclusive results-an empirical study.

    PubMed

    Djulbegovic, Benjamin; Kumar, Ambuj; Magazin, Anja; Schroen, Anneke T; Soares, Heloisa; Hozo, Iztok; Clarke, Mike; Sargent, Daniel; Schell, Michael J

    2011-06-01

    Optimism bias refers to unwarranted belief in the efficacy of new therapies. We assessed the impact of optimism bias on a proportion of trials that did not answer their research question successfully and explored whether poor accrual or optimism bias is responsible for inconclusive results. Systematic review. Retrospective analysis of a consecutive series of phase III randomized controlled trials (RCTs) performed under the aegis of National Cancer Institute Cooperative groups. Three hundred fifty-nine trials (374 comparisons) enrolling 150,232 patients were analyzed. Seventy percent (262 of 374) of the trials generated conclusive results according to the statistical criteria. Investigators made definitive statements related to the treatment preference in 73% (273 of 374) of studies. Investigators' judgments and statistical inferences were concordant in 75% (279 of 374) of trials. Investigators consistently overestimated their expected treatment effects but to a significantly larger extent for inconclusive trials. The median ratio of expected and observed hazard ratio or odds ratio was 1.34 (range: 0.19-15.40) in conclusive trials compared with 1.86 (range: 1.09-12.00) in inconclusive studies (P<0.0001). Only 17% of the trials had treatment effects that matched original researchers' expectations. Formal statistical inference is sufficient to answer the research question in 75% of RCTs. The answers to the other 25% depend mostly on subjective judgments, which at times are in conflict with statistical inference. Optimism bias significantly contributes to inconclusive results. Copyright © 2011 Elsevier Inc. All rights reserved.

  20. Forward and backward inference in spatial cognition.

    PubMed

    Penny, Will D; Zeidman, Peter; Burgess, Neil

    2013-01-01

    This paper shows that the various computations underlying spatial cognition can be implemented using statistical inference in a single probabilistic model. Inference is implemented using a common set of 'lower-level' computations involving forward and backward inference over time. For example, to estimate where you are in a known environment, forward inference is used to optimally combine location estimates from path integration with those from sensory input. To decide which way to turn to reach a goal, forward inference is used to compute the likelihood of reaching that goal under each option. To work out which environment you are in, forward inference is used to compute the likelihood of sensory observations under the different hypotheses. For reaching sensory goals that require a chaining together of decisions, forward inference can be used to compute a state trajectory that will lead to that goal, and backward inference to refine the route and estimate control signals that produce the required trajectory. We propose that these computations are reflected in recent findings of pattern replay in the mammalian brain. Specifically, that theta sequences reflect decision making, theta flickering reflects model selection, and remote replay reflects route and motor planning. We also propose a mapping of the above computational processes onto lateral and medial entorhinal cortex and hippocampus.
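
    The localization example in the abstract above, where forward inference combines path integration with sensory input, can be sketched as a discrete Bayes filter on a small circular track. This is a generic textbook-style filter written under assumed numbers, not the authors' model: the function names, the five-cell track, the movement probabilities and the landmark likelihoods are all hypothetical.

```python
def normalize(p):
    """Rescale non-negative weights so they sum to one."""
    total = sum(p)
    return [v / total for v in p]


def predict(belief, move_probs):
    """Path-integration step: shift the belief by a noisy displacement.

    move_probs maps a displacement in cells to its probability; the
    track is circular, so positions wrap around.
    """
    n = len(belief)
    out = [0.0] * n
    for i, b in enumerate(belief):
        for step, p in move_probs.items():
            out[(i + step) % n] += b * p
    return out


def correct(belief, likelihood):
    """Sensory step: weight each cell by the observation likelihood."""
    return normalize([b * l for b, l in zip(belief, likelihood)])


n_cells = 5
belief = [1.0 / n_cells] * n_cells            # start with no location knowledge
move_probs = {1: 0.8, 0: 0.2}                 # assumed: usually advance one cell
landmark_likelihood = [0.1, 0.1, 0.9, 0.1, 0.1]  # assumed: landmark sits in cell 2

belief = correct(belief, landmark_likelihood)  # the landmark is sensed
belief = predict(belief, move_probs)           # then one movement step is taken
most_likely_cell = belief.index(max(belief))
```

    Alternating the two steps over time is the forward pass; a backward pass over the stored beliefs would be needed to refine earlier estimates, which this sketch does not attempt.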

  1. Forward and Backward Inference in Spatial Cognition

    PubMed Central

    Penny, Will D.; Zeidman, Peter; Burgess, Neil

    2013-01-01

    This paper shows that the various computations underlying spatial cognition can be implemented using statistical inference in a single probabilistic model. Inference is implemented using a common set of ‘lower-level’ computations involving forward and backward inference over time. For example, to estimate where you are in a known environment, forward inference is used to optimally combine location estimates from path integration with those from sensory input. To decide which way to turn to reach a goal, forward inference is used to compute the likelihood of reaching that goal under each option. To work out which environment you are in, forward inference is used to compute the likelihood of sensory observations under the different hypotheses. For reaching sensory goals that require a chaining together of decisions, forward inference can be used to compute a state trajectory that will lead to that goal, and backward inference to refine the route and estimate control signals that produce the required trajectory. We propose that these computations are reflected in recent findings of pattern replay in the mammalian brain. Specifically, that theta sequences reflect decision making, theta flickering reflects model selection, and remote replay reflects route and motor planning. We also propose a mapping of the above computational processes onto lateral and medial entorhinal cortex and hippocampus. PMID:24348230

  2. Statistical inference approach to structural reconstruction of complex networks from binary time series

    NASA Astrophysics Data System (ADS)

    Ma, Chuang; Chen, Han-Shuang; Lai, Ying-Cheng; Zhang, Hai-Feng

    2018-02-01

    Complex networks hosting binary-state dynamics arise in a variety of contexts. In spite of previous works, to fully reconstruct the network structure from observed binary data remains challenging. We articulate a statistical inference based approach to this problem. In particular, exploiting the expectation-maximization (EM) algorithm, we develop a method to ascertain the neighbors of any node in the network based solely on binary data, thereby recovering the full topology of the network. A key ingredient of our method is the maximum-likelihood estimation of the probabilities associated with actual or nonexistent links, and we show that the EM algorithm can distinguish the two kinds of probability values without any ambiguity, insofar as the length of the available binary time series is reasonably long. Our method does not require any a priori knowledge of the detailed dynamical processes, is parameter-free, and is capable of accurate reconstruction even in the presence of noise. We demonstrate the method using combinations of distinct types of binary dynamical processes and network topologies, and provide a physical understanding of the underlying reconstruction mechanism. Our statistical inference based reconstruction method contributes an additional piece to the rapidly expanding "toolbox" of data based reverse engineering of complex networked systems.

  3. A statistical model for interpreting computerized dynamic posturography data

    NASA Technical Reports Server (NTRS)

    Feiveson, Alan H.; Metter, E. Jeffrey; Paloski, William H.

    2002-01-01

    Computerized dynamic posturography (CDP) is widely used for assessment of altered balance control. CDP trials are quantified using the equilibrium score (ES), which ranges from zero to 100, as a decreasing function of peak sway angle. The problem of how best to model and analyze ESs from a controlled study is considered. The ES often exhibits a skewed distribution in repeated trials, which can lead to incorrect inference when applying standard regression or analysis of variance models. Furthermore, CDP trials are terminated when a patient loses balance. In these situations, the ES is not observable, but is assigned the lowest possible score--zero. As a result, the response variable has a mixed discrete-continuous distribution, further compromising inference obtained by standard statistical methods. Here, we develop alternative methodology for analyzing ESs under a stochastic model extending the ES to a continuous latent random variable that always exists, but is unobserved in the event of a fall. Loss of balance occurs conditionally, with probability depending on the realized latent ES. After fitting the model by a form of quasi-maximum-likelihood, one may perform statistical inference to assess the effects of explanatory variables. An example is provided, using data from the NIH/NIA Baltimore Longitudinal Study on Aging.

  4. Statistical inference of protein structural alignments using information and compression.

    PubMed

    Collier, James H; Allison, Lloyd; Lesk, Arthur M; Stuckey, Peter J; Garcia de la Banda, Maria; Konagurthu, Arun S

    2017-04-01

    Structural molecular biology depends crucially on computational techniques that compare protein three-dimensional structures and generate structural alignments (the assignment of one-to-one correspondences between subsets of amino acids based on atomic coordinates). Despite its importance, the structural alignment problem has not been formulated, much less solved, in a consistent and reliable way. To overcome these difficulties, we present here a statistical framework for the precise inference of structural alignments, built on the Bayesian and information-theoretic principle of Minimum Message Length (MML). The quality of any alignment is measured by its explanatory power: the amount of lossless compression achieved to explain the protein coordinates using that alignment. We have implemented this approach in MMLigner, the first program able to infer statistically significant structural alignments. We also demonstrate the reliability of MMLigner's alignment results when compared with the state of the art. Importantly, MMLigner can also discover different structural alignments of comparable quality, a challenging problem for oligomers and protein complexes. Source code, binaries and an interactive web version are available at http://lcb.infotech.monash.edu.au/mmligner. Contact: arun.konagurthu@monash.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  5. Statistical inference with quantum measurements: methodologies for nitrogen vacancy centers in diamond

    NASA Astrophysics Data System (ADS)

    Hincks, Ian; Granade, Christopher; Cory, David G.

    2018-01-01

    The analysis of photon count data from the standard nitrogen vacancy (NV) measurement process is treated as a statistical inference problem. This has applications toward gaining better and more rigorous error bars for tasks such as parameter estimation (e.g. magnetometry), tomography, and randomized benchmarking. We start by providing a summary of the standard phenomenological model of the NV optical process in terms of Lindblad jump operators. This model is used to derive random variables describing emitted photons during measurement, to which finite visibility, dark counts, and imperfect state preparation are added. NV spin-state measurement is then stated as an abstract statistical inference problem consisting of an underlying biased coin obstructed by three Poisson rates. Relevant frequentist and Bayesian estimators are provided, discussed, and quantitatively compared. We show numerically that the risk of the maximum likelihood estimator is well approximated by the Cramér-Rao bound, for which we provide a simple formula. Of the estimators, we in particular promote the Bayes estimator, owing to its slightly better risk performance, and straightforward error propagation into more complex experiments. This is illustrated on experimental data, where quantum Hamiltonian learning is performed and cross-validated in a fully Bayesian setting, and compared to a more traditional weighted least squares fit.

  6. Statistical inference approach to structural reconstruction of complex networks from binary time series.

    PubMed

    Ma, Chuang; Chen, Han-Shuang; Lai, Ying-Cheng; Zhang, Hai-Feng

    2018-02-01

    Complex networks hosting binary-state dynamics arise in a variety of contexts. In spite of previous works, to fully reconstruct the network structure from observed binary data remains challenging. We articulate a statistical inference based approach to this problem. In particular, exploiting the expectation-maximization (EM) algorithm, we develop a method to ascertain the neighbors of any node in the network based solely on binary data, thereby recovering the full topology of the network. A key ingredient of our method is the maximum-likelihood estimation of the probabilities associated with actual or nonexistent links, and we show that the EM algorithm can distinguish the two kinds of probability values without any ambiguity, insofar as the length of the available binary time series is reasonably long. Our method does not require any a priori knowledge of the detailed dynamical processes, is parameter-free, and is capable of accurate reconstruction even in the presence of noise. We demonstrate the method using combinations of distinct types of binary dynamical processes and network topologies, and provide a physical understanding of the underlying reconstruction mechanism. Our statistical inference based reconstruction method contributes an additional piece to the rapidly expanding "toolbox" of data based reverse engineering of complex networked systems.

  7. PREFACE: ELC International Meeting on Inference, Computation, and Spin Glasses (ICSG2013)

    NASA Astrophysics Data System (ADS)

    Kabashima, Yoshiyuki; Hukushima, Koji; Inoue, Jun-ichi; Tanaka, Toshiyuki; Watanabe, Osamu

    2013-12-01

    The close relationship between probability-based inference and statistical mechanics of disordered systems has been noted for some time. This relationship has provided researchers with a theoretical foundation in various fields of information processing for analytical performance evaluation and construction of efficient algorithms based on message-passing or Monte Carlo sampling schemes. The ELC International Meeting on 'Inference, Computation, and Spin Glasses (ICSG2013)', was held in Sapporo 28-30 July 2013. The meeting was organized as a satellite meeting of STATPHYS25 in order to offer a forum where concerned researchers can assemble and exchange information on the latest results and newly established methodologies, and discuss future directions of the interdisciplinary studies between statistical mechanics and information sciences. Financial support from Grant-in-Aid for Scientific Research on Innovative Areas, MEXT, Japan 'Exploring the Limits of Computation (ELC)' is gratefully acknowledged. We are pleased to publish 23 papers contributed by invited speakers of ICSG2013 in this volume of Journal of Physics: Conference Series. We hope that this volume will promote further development of this highly vigorous interdisciplinary field between statistical mechanics and information/computer science. Editors and ICSG2013 Organizing Committee: Koji Hukushima Jun-ichi Inoue (Local Chair of ICSG2013) Yoshiyuki Kabashima (Editor-in-Chief) Toshiyuki Tanaka Osamu Watanabe (General Chair of ICSG2013)

  8. Measuring the Number of M Dwarfs per M Dwarf Using Kepler Eclipsing Binaries

    NASA Astrophysics Data System (ADS)

    Shan, Yutong; Johnson, John A.; Morton, Timothy D.

    2015-11-01

    We measure the binarity of detached M dwarfs in the Kepler field with orbital periods in the range of 1-90 days. Kepler’s photometric precision and nearly continuous monitoring of stellar targets over time baselines ranging from 3 months to 4 years make its detection efficiency for eclipsing binaries nearly complete over this period range and for all radius ratios. Our investigation employs a statistical framework akin to that used for inferring planetary occurrence rates from planetary transits. The obvious simplification is that eclipsing binaries have a vastly improved detection efficiency that is limited chiefly by their geometric probabilities to eclipse. For the M-dwarf sample observed by the Kepler Mission, the fractional incidence of eclipsing binaries implies that there are $0.11^{+0.02}_{-0.04}$ close stellar companions per apparently single M dwarf. Our measured binarity is higher than previous inferences of the occurrence rate of close binaries via radial velocity techniques, at roughly the 2σ level. This study represents the first use of eclipsing binary detections from a high quality transiting planet mission to infer binary statistics. Application of this statistical framework to the eclipsing binaries discovered by future transit surveys will establish better constraints on short-period M+M binary rate, as well as binarity measurements for stars of other spectral types.

  9. 28 CFR Appendix D to Part 61 - Office of Justice Assistance, Research, and Statistics Procedures Relating to the Implementation...

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ..., and Statistics Procedures Relating to the Implementation of the National Environmental Policy Act D... Assistance, Research, and Statistics Procedures Relating to the Implementation of the National Environmental... Statistics (OJARS) assists State and local units of government in strengthening and improving law enforcement...

  10. 28 CFR Appendix D to Part 61 - Office of Justice Assistance, Research, and Statistics Procedures Relating to the Implementation...

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ..., and Statistics Procedures Relating to the Implementation of the National Environmental Policy Act D... Assistance, Research, and Statistics Procedures Relating to the Implementation of the National Environmental... Statistics (OJARS) assists State and local units of government in strengthening and improving law enforcement...

  11. Was that part of the story or did i just think so? Age and cognitive status differences in inference and story recognition.

    PubMed

    Bielak, Allison A M; Hultsch, David F; Kadlec, Helena; Strauss, Esther

    2007-01-01

    This study expanded the inference and story recognition literature by investigating differences within the older age range, differences as a result of cognitive impairment, no dementia (CIND), and applying signal detection procedures to the analysis of accuracy data. Old-old adults and those with more severe CIND showed poorer ability to accurately recognize inferences, and less sensitivity in discriminating between statement types. Results support the proposal that participants used two different recognition strategies. Old-old and CIND adults may be less able to recognize that something plausible with an event may not have actually occurred.

  12. Assessing NARCCAP climate model effects using spatial confidence regions

    PubMed Central

    French, Joshua P.; McGinnis, Seth; Schwartzman, Armin

    2017-01-01

    We assess similarities and differences between model effects for the North American Regional Climate Change Assessment Program (NARCCAP) climate models using varying classes of linear regression models. Specifically, we consider how the average temperature effect differs for the various global and regional climate model combinations, including assessment of possible interaction between the effects of global and regional climate models. We use both pointwise and simultaneous inference procedures to identify regions where global and regional climate model effects differ. We also show conclusively that results from pointwise inference are misleading, and that accounting for multiple comparisons is important for making proper inference. PMID:28936474

  13. Cancer Survival Estimates Due to Non-Uniform Loss to Follow-Up and Non-Proportional Hazards

    PubMed

    K M, Jagathnath Krishna; Mathew, Aleyamma; Sara George, Preethi

    2017-06-25

    Background: Cancer survival depends on loss to follow-up (LFU) and non-proportional hazards (non-PH). If LFU is high, survival will be over-estimated. If hazard is non-PH, rank tests will provide biased inference and Cox-model will provide biased hazard-ratio. We assessed the bias due to LFU and non-PH factor in cancer survival and provided alternate methods for unbiased inference and hazard-ratio. Materials and Methods: Kaplan-Meier survival were plotted using a realistic breast cancer (BC) data-set, with >40%, 5-year LFU and compared it using another BC data-set with <15%, 5-year LFU to assess the bias in survival due to high LFU. Age at diagnosis of the latter data set was used to illustrate the bias due to a non-PH factor. Log-rank test was employed to assess the bias in p-value and Cox-model was used to assess the bias in hazard-ratio for the non-PH factor. Schoenfeld statistic was used to test the non-PH of age. For the non-PH factor, we employed Renyi statistic for inference and time dependent Cox-model for hazard-ratio. Results: Five-year BC survival was 69% (SE: 1.1%) vs. 90% (SE: 0.7%) for data with low vs. high LFU respectively. Age (<45, 46-54 & >54 years) was a non-PH factor (p-value: 0.036). However, survival by age was significant (log-rank p-value: 0.026), but not significant using Renyi statistic (p=0.067). Hazard ratio (HR) for age using Cox-model was 1.012 (95%CI: 1.004 -1.019) and the same using time-dependent Cox-model was in the other direction (HR: 0.997; 95% CI: 0.997- 0.998). Conclusion: Over-estimated survival was observed for cancer with high LFU. Log-rank statistic and Cox-model provided biased results for non-PH factor. For data with non-PH factors, Renyi statistic and time dependent Cox-model can be used as alternate methods to obtain unbiased inference and estimates. Creative Commons Attribution License
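    The Kaplan-Meier estimate referred to above can be sketched in a few lines, which also makes visible where loss to follow-up enters the calculation: censored patients leave the risk set without counting as deaths. The follow-up times and event flags below are made up for illustration and are not taken from the study's data sets.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed death.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if censored (e.g. lost to follow-up)
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after time t.
        at_risk -= removed
    return curve


# Hypothetical follow-up data: 6 patients, 2 of them censored.
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
```

    The estimator treats censoring as non-informative. If patients who are lost to follow-up are in fact more likely to have died, the deaths that go unrecorded never reduce the curve, which is one way a high LFU fraction can inflate estimated survival.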

  14. Final Report, DOE Early Career Award: Predictive modeling of complex physical systems: new tools for statistical inference, uncertainty quantification, and experimental design

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Marzouk, Youssef

    Predictive simulation of complex physical systems increasingly rests on the interplay of experimental observations with computational models. Key inputs, parameters, or structural aspects of models may be incomplete or unknown, and must be developed from indirect and limited observations. At the same time, quantified uncertainties are needed to qualify computational predictions in the support of design and decision-making. In this context, Bayesian statistics provides a foundation for inference from noisy and limited data, but at prohibitive computational expense. This project intends to make rigorous predictive modeling feasible in complex physical systems, via accelerated and scalable tools for uncertainty quantification, Bayesian inference, and experimental design. Specific objectives are as follows: 1. Develop adaptive posterior approximations and dimensionality reduction approaches for Bayesian inference in high-dimensional nonlinear systems. 2. Extend accelerated Bayesian methodologies to large-scale sequential data assimilation, fully treating nonlinear models and non-Gaussian state and parameter distributions. 3. Devise efficient surrogate-based methods for Bayesian model selection and the learning of model structure. 4. Develop scalable simulation/optimization approaches to nonlinear Bayesian experimental design, for both parameter inference and model selection. 5. Demonstrate these inferential tools on chemical kinetic models in reacting flow, constructing and refining thermochemical and electrochemical models from limited data. Demonstrate Bayesian filtering on canonical stochastic PDEs and in the dynamic estimation of inhomogeneous subsurface properties and flow fields.

  15. Unraveling multiple changes in complex climate time series using Bayesian inference

    NASA Astrophysics Data System (ADS)

    Berner, Nadine; Trauth, Martin H.; Holschneider, Matthias

    2016-04-01

    Change points in time series are perceived as heterogeneities in the statistical or dynamical characteristics of observations. Unraveling such transitions yields essential information for the understanding of the observed system. The precise detection and basic characterization of underlying changes is therefore of particular importance in environmental sciences. We present a kernel-based Bayesian inference approach to investigate direct as well as indirect climate observations for multiple generic transition events. In order to develop a diagnostic approach designed to capture a variety of natural processes, the basic statistical features of central tendency and dispersion are used to locally approximate a complex time series by a generic transition model. A Bayesian inversion approach is developed to robustly infer the location and the generic patterns of such a transition. To systematically investigate time series for multiple changes occurring at different temporal scales, the Bayesian inversion is extended to a kernel-based inference approach. By introducing basic kernel measures, the kernel inference results are composed into a proxy probability to a posterior distribution of multiple transitions. Thus, based on a generic transition model a probability expression is derived that is capable of indicating multiple changes within a complex time series. We discuss the method's performance by investigating direct and indirect climate observations. The approach is applied to environmental time series (about 100 a), from the weather station in Tuscaloosa, Alabama, and confirms documented instrumentation changes. Moreover, the approach is used to investigate a set of complex terrigenous dust records from the ODP sites 659, 721/722 and 967 interpreted as climate indicators of the African region of the Plio-Pleistocene period (about 5 Ma). The detailed inference unravels multiple transitions underlying the indirect climate observations coinciding with established global climate events.

  16. Bayesian Parameter Inference and Model Selection by Population Annealing in Systems Biology

    PubMed Central

    Murakami, Yohei

    2014-01-01

    Parameter inference and model selection are very important for mathematical modeling in systems biology. Bayesian statistics can be used to conduct both parameter inference and model selection. In particular, the framework named approximate Bayesian computation is often used for parameter inference and model selection in systems biology. However, Monte Carlo methods need to be used to compute Bayesian posterior distributions. In addition, the posterior distributions of parameters are sometimes almost uniform or very similar to their prior distributions. In such cases, it is difficult to choose one specific value of parameter with high credibility as the representative value of the distribution. To overcome the problems, we introduced one of the population Monte Carlo algorithms, population annealing. Although population annealing is usually used in statistical mechanics, we showed that population annealing can be used to compute Bayesian posterior distributions in the approximate Bayesian computation framework. To deal with un-identifiability of the representative values of parameters, we proposed to run the simulations with the parameter ensemble sampled from the posterior distribution, named “posterior parameter ensemble”. We showed that population annealing is an efficient and convenient algorithm to generate posterior parameter ensemble. We also showed that the simulations with the posterior parameter ensemble can not only reproduce the data used for parameter inference, but also capture and predict the data which was not used for parameter inference. Lastly, we introduced the marginal likelihood in the approximate Bayesian computation framework for Bayesian model selection. We showed that population annealing enables us to compute the marginal likelihood in the approximate Bayesian computation framework and conduct model selection depending on the Bayes factor. PMID:25089832
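    To make the approximate Bayesian computation (ABC) framework concrete, the sketch below uses plain rejection ABC, which is the simplest member of the family and is not the population annealing algorithm of the paper. The toy model (a Gaussian with unknown mean), the uniform prior, the tolerance, and all function names are assumptions chosen for illustration. The accepted draws play the role of a posterior parameter ensemble.

```python
import random


def abc_rejection(observed, simulate, prior_sample, distance, eps, n_draws, seed=0):
    """Rejection ABC: keep prior draws whose simulated summary lies within eps of the data."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)          # draw a parameter from the prior
        simulated = simulate(theta, rng)   # run the model forward
        if distance(simulated, observed) <= eps:
            accepted.append(theta)         # approximate posterior sample
    return accepted


def simulate(theta, rng, n=20):
    """Hypothetical model: summary statistic is the mean of n Gaussian draws with sd 1."""
    return sum(rng.gauss(theta, 1.0) for _ in range(n)) / n


observed_mean = 2.0  # made-up observed summary statistic
ensemble = abc_rejection(
    observed=observed_mean,
    simulate=simulate,
    prior_sample=lambda rng: rng.uniform(-5.0, 5.0),
    distance=lambda a, b: abs(a - b),
    eps=0.1,
    n_draws=20000,
)
posterior_mean = sum(ensemble) / len(ensemble)
```

    Rejection ABC wastes most draws when the prior is broad relative to the posterior; sequential and population-based schemes such as the one in the abstract exist largely to reduce that cost.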

  17. Quakefinder: A scalable data mining system for detecting earthquakes from space

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stolorz, P.; Dean, C.

    1996-12-31

    We present an application of novel massively parallel datamining techniques to highly precise inference of important physical processes from remote sensing imagery. Specifically, we have developed and applied a system, Quakefinder, that automatically detects and measures tectonic activity in the earth's crust by examination of satellite data. We have used Quakefinder to automatically map the direction and magnitude of ground displacements due to the 1992 Landers earthquake in Southern California, over a spatial region of several hundred square kilometers, at a resolution of 10 meters, to a (sub-pixel) precision of 1 meter. This is the first calculation that has ever been able to extract area-mapped information about 2D tectonic processes at this level of detail. We outline the architecture of the Quakefinder system, based upon a combination of techniques drawn from the fields of statistical inference, massively parallel computing and global optimization. We confirm the overall correctness of the procedure by comparison of our results with known locations of targeted faults obtained by careful and time-consuming field measurements. The system also performs knowledge discovery by indicating novel unexplained tectonic activity away from the primary faults that has never before been observed. We conclude by discussing the future potential of this data mining system in the broad context of studying subtle spatio-temporal processes within massive image streams.

  18. Three field tests of a gas filter correlation radiometer

    NASA Technical Reports Server (NTRS)

    Campbell, S. A.; Casas, J. C.; Condon, E. P.

    1977-01-01

    Test flights to remotely measure nonurban carbon monoxide (CO) concentrations by gas filter correlation radiometry are discussed. The inferred CO concentrations obtained through use of the Gas Filter Correlation Radiometer (GFCR) agreed with independent measurements obtained by gas chromatography air sample bottle analysis to within 20 percent. The equipment flown on board the aircraft, the flight test procedure, the gas chromatograph direct air sampling procedure, and the GFCR data analysis procedure are reported.

  19. Testing Genetic Pleiotropy with GWAS Summary Statistics for Marginal and Conditional Analyses.

    PubMed

    Deng, Yangqing; Pan, Wei

    2017-12-01

    There is growing interest in testing genetic pleiotropy, which is when a single genetic variant influences multiple traits. Several methods have been proposed; however, these methods have some limitations. First, all the proposed methods are based on the use of individual-level genotype and phenotype data; in contrast, for logistical, and other, reasons, summary statistics of univariate SNP-trait associations are typically only available based on meta- or mega-analyzed large genome-wide association study (GWAS) data. Second, existing tests are based on marginal pleiotropy, which cannot distinguish between direct and indirect associations of a single genetic variant with multiple traits due to correlations among the traits. Hence, it is useful to consider conditional analysis, in which a subset of traits is adjusted for another subset of traits. For example, in spite of substantial lowering of low-density lipoprotein cholesterol (LDL) with statin therapy, some patients still maintain high residual cardiovascular risk, and, for these patients, it might be helpful to reduce their triglyceride (TG) level. For this purpose, in order to identify new therapeutic targets, it would be useful to identify genetic variants with pleiotropic effects on LDL and TG after adjusting the latter for LDL; otherwise, a pleiotropic effect of a genetic variant detected by a marginal model could simply be due to its association with LDL only, given the well-known correlation between the two types of lipids. Here, we develop a new pleiotropy testing procedure based only on GWAS summary statistics that can be applied for both marginal analysis and conditional analysis. Although the main technical development is based on published union-intersection testing methods, care is needed in specifying conditional models to avoid invalid statistical estimation and inference. In addition to the previously used likelihood ratio test, we also propose using generalized estimating equations under the working independence model for robust inference. We provide numerical examples based on both simulated and real data, including two large lipid GWAS summary association datasets based on ∼100,000 and ∼189,000 samples, respectively, to demonstrate the difference between marginal and conditional analyses, as well as the effectiveness of our new approach. Copyright © 2017 by the Genetics Society of America.

  20. Evidence and Clinical Trials.

    NASA Astrophysics Data System (ADS)

    Goodman, Steven N.

    1989-11-01

    This dissertation explores the use of a mathematical measure of statistical evidence, the log likelihood ratio, in clinical trials. The methods and thinking behind the use of an evidential measure are contrasted with traditional methods of analyzing data, which depend primarily on a p-value as an estimate of the statistical strength of an observed data pattern. It is contended that neither the behavioral dictates of Neyman-Pearson hypothesis testing methods, nor the coherency dictates of Bayesian methods are realistic models on which to base inference. The use of the likelihood alone is applied to four aspects of trial design or conduct: the calculation of sample size, the monitoring of data, testing for the equivalence of two treatments, and meta-analysis--the combining of results from different trials. Finally, a more general model of statistical inference, using belief functions, is used to see if it is possible to separate the assessment of evidence from our background knowledge. It is shown that traditional and Bayesian methods can be modeled as two ends of a continuum of structured background knowledge, methods which summarize evidence at the point of maximum likelihood assuming no structure, and Bayesian methods assuming complete knowledge. Both schools are seen to be missing a concept of ignorance--uncommitted belief. This concept provides the key to understanding the problem of sampling to a foregone conclusion and the role of frequency properties in statistical inference. The conclusion is that statistical evidence cannot be defined independently of background knowledge, and that frequency properties of an estimator are an indirect measure of uncommitted belief. Several likelihood summaries need to be used in clinical trials, with the quantitative disparity between summaries being an indirect measure of our ignorance. This conclusion is linked with parallel ideas in the philosophy of science and cognitive psychology.
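    As a small worked illustration of the evidential measure discussed above, the log likelihood ratio between two simple hypotheses about a response rate can be computed directly from binomial data. The trial numbers (14 responders out of 20) and the two hypothesised rates are invented for this sketch and do not come from the dissertation.

```python
import math


def binomial_log_likelihood(successes, n, p):
    """Log likelihood of a response rate p; the binomial coefficient is omitted
    because it cancels in any likelihood ratio."""
    return successes * math.log(p) + (n - successes) * math.log(1.0 - p)


def log_likelihood_ratio(successes, n, p1, p0):
    """Log of L(p1) / L(p0): positive values favour p1, negative values favour p0."""
    return binomial_log_likelihood(successes, n, p1) - binomial_log_likelihood(successes, n, p0)


# Hypothetical trial: 14 responders among 20 patients, comparing rates 0.7 and 0.5.
llr = log_likelihood_ratio(successes=14, n=20, p1=0.7, p0=0.5)
likelihood_ratio = math.exp(llr)
```

    With these made-up numbers the data support a 0.7 response rate over 0.5 by a likelihood ratio of roughly 5. Unlike a p-value, this figure depends only on the observed data and the two hypotheses being compared, not on the stopping rule.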

  1. Ionospheric and Birkeland current distributions inferred from the MAGSAT magnetometer data

    NASA Technical Reports Server (NTRS)

    Zanetti, L. J.; Potemra, T. A.; Baumjohann, W.

    1983-01-01

    Ionospheric and field-aligned sheet current density distributions are presently inferred by means of MAGSAT vector magnetometer data, together with an accurate magnetic field model. By comparing Hall current densities inferred from the MAGSAT data and those inferred from simultaneously recorded ground based data acquired by the Scandinavian magnetometer array, it is determined that the former have previously been underestimated due to high damping of magnetic variations with high spatial wave numbers between the ionosphere and the MAGSAT orbit. Among important results of this study is noted the fact that the Birkeland and electrojet current systems are colocated. The analyses have shown a tendency for triangular rather than constant electrojet current distributions as a function of latitude, consistent with the statistical, uniform regions 1 and 2 Birkeland current patterns.

  2. Inferring Markov chains: Bayesian estimation, model comparison, entropy rate, and out-of-class modeling.

    PubMed

    Strelioff, Christopher C; Crutchfield, James P; Hübler, Alfred W

    2007-07-01

    Markov chains are a natural and well understood tool for describing one-dimensional patterns in time or space. We show how to infer kth order Markov chains, for arbitrary k, from finite data by applying Bayesian methods to both parameter estimation and model-order selection. Extending existing results for multinomial models of discrete data, we connect inference to statistical mechanics through information-theoretic (type theory) techniques. We establish a direct relationship between Bayesian evidence and the partition function which allows for straightforward calculation of the expectation and variance of the conditional relative entropy and the source entropy rate. Finally, we introduce a method that uses finite data-size scaling with model-order comparison to infer the structure of out-of-class processes.
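    A minimal sketch of Bayesian estimation and model-order comparison for a kth order Markov chain follows, assuming an independent symmetric Dirichlet prior (concentration `alpha`) on each context's transition probabilities. That prior choice, the function names, and the toy sequence are assumptions for illustration; the paper's own formulation may differ in detail.

```python
from collections import Counter
from math import lgamma


def transition_counts(sequence, k):
    """Count each (context, next symbol) pair, where a context is the previous k symbols."""
    counts = Counter()
    for i in range(k, len(sequence)):
        context = tuple(sequence[i - k:i])
        counts[(context, sequence[i])] += 1
    return counts


def posterior_mean(counts, context, symbol, alphabet, alpha=1.0):
    """Posterior mean of P(symbol | context) under a symmetric Dirichlet(alpha) prior."""
    total = sum(counts[(context, s)] for s in alphabet)
    return (counts[(context, symbol)] + alpha) / (total + alpha * len(alphabet))


def log_evidence(counts, alphabet, alpha=1.0):
    """Log marginal likelihood of the transitions, conditioning on the first k symbols."""
    contexts = {ctx for (ctx, _) in counts}
    a0 = alpha * len(alphabet)
    total = 0.0
    for ctx in contexts:
        n_ctx = sum(counts[(ctx, s)] for s in alphabet)
        total += lgamma(a0) - lgamma(a0 + n_ctx)
        for s in alphabet:
            total += lgamma(alpha + counts[(ctx, s)]) - lgamma(alpha)
    return total


# Toy data: a strictly alternating binary sequence.
sequence = "0101010101"
alphabet = "01"
counts_k0 = transition_counts(sequence, 0)
counts_k1 = transition_counts(sequence, 1)
p_1_given_0 = posterior_mean(counts_k1, ("0",), "1", alphabet)
evidence_k0 = log_evidence(counts_k0, alphabet)
evidence_k1 = log_evidence(counts_k1, alphabet)
```

    On this toy sequence the first-order model has the higher evidence, as expected for alternating data. Note that each order conditions on a different number of initial symbols, so a careful comparison would score the same set of symbols under every candidate order.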

  3. Model selection in historical biogeography reveals that founder-event speciation is a crucial process in Island Clades.

    PubMed

    Matzke, Nicholas J

    2014-11-01

    Founder-event speciation, where a rare jump dispersal event founds a new genetically isolated lineage, has long been considered crucial by many historical biogeographers, but its importance is disputed within the vicariance school. Probabilistic modeling of geographic range evolution creates the potential to test different biogeographical models against data using standard statistical model choice procedures, as long as multiple models are available. I re-implement the Dispersal-Extinction-Cladogenesis (DEC) model of LAGRANGE in the R package BioGeoBEARS, and modify it to create a new model, DEC + J, which adds founder-event speciation, the importance of which is governed by a new free parameter, [Formula: see text]. The identifiability of DEC and DEC + J is tested on data sets simulated under a wide range of macroevolutionary models where geography evolves jointly with lineage birth/death events. The results confirm that DEC and DEC + J are identifiable even though these models ignore the fact that molecular phylogenies are missing many cladogenesis and extinction events. The simulations also indicate that DEC will have substantially increased errors in ancestral range estimation and parameter inference when the true model includes + J. DEC and DEC + J are compared on 13 empirical data sets drawn from studies of island clades. Likelihood-ratio tests indicate that all clades reject DEC, and AICc model weights show large to overwhelming support for DEC + J, for the first time verifying the importance of founder-event speciation in island clades via statistical model choice. Under DEC + J, ancestral nodes are usually estimated to have ranges occupying only one island, rather than the widespread ancestors often favored by DEC. These results indicate that the assumptions of historical biogeography models can have large impacts on inference and require testing and comparison with statistical methods. © The Author(s) 2014. 
Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
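
    The model-choice step described in this abstract (a likelihood-ratio test plus AICc model weights for two nested models) can be sketched as follows. This is a minimal illustration, not the BioGeoBEARS implementation: the log-likelihoods and sample size are hypothetical, the parameter counts (2 for DEC, 3 for DEC + J) are an assumption, and so is the use of the number of taxa as the AICc sample size.

```python
import math

def aicc(log_lik, k, n):
    """Small-sample corrected AIC for a model with k free parameters and n observations."""
    return -2.0 * log_lik + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)

def akaike_weights(scores):
    """Convert a list of AIC/AICc scores into relative model weights that sum to 1."""
    best = min(scores)
    rel = [math.exp(-0.5 * (s - best)) for s in scores]
    total = sum(rel)
    return [r / total for r in rel]

def lrt_pvalue_1df(log_lik_null, log_lik_alt):
    """Likelihood-ratio test for nested models differing by one free parameter.

    Returns the test statistic and its p-value under a chi-square distribution
    with 1 degree of freedom, whose survival function is erfc(sqrt(x / 2)).
    """
    stat = max(2.0 * (log_lik_alt - log_lik_null), 0.0)
    return stat, math.erfc(math.sqrt(stat / 2.0))

# Hypothetical values for illustration only (not taken from the paper).
lnl_dec, lnl_decj = -40.0, -30.0   # maximized log-likelihoods
n_taxa = 25                        # assumed sample size for the AICc correction

scores = [aicc(lnl_dec, 2, n_taxa), aicc(lnl_decj, 3, n_taxa)]
w_dec, w_decj = akaike_weights(scores)
stat, p = lrt_pvalue_1df(lnl_dec, lnl_decj)

print(f"AICc weights: DEC={w_dec:.4g}, DEC+J={w_decj:.4g}")
print(f"LRT statistic={stat:.2f}, p-value={p:.3g}")
```

    A small p-value rejects the simpler model in favor of the one with the extra parameter, and a weight near 1 indicates that nearly all of the relative support lies with that model.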

  4. Fairness heuristics and substitutability effects: inferring the fairness of outcomes, procedures, and interpersonal treatment when employees lack clear information.

    PubMed

    Qin, Xin; Ren, Run; Zhang, Zhi-Xue; Johnson, Russell E

    2015-05-01

    Employees routinely make judgments of 3 kinds of justice (i.e., distributive, procedural, and interactional), yet they may lack clear information to do so. This research examines how justice judgments are formed when clear information about certain types of justice is unavailable or ambiguous. Drawing from fairness heuristic theory, as well as more general theories of cognitive heuristics, we predict that when information for 1 type of justice is unclear (i.e., low in justice clarity), people infer its fairness based on other types of justice with clear information (i.e., high in justice clarity). Results across 3 studies employing different designs (correlational vs. experimental), samples (employees vs. students), and measures (proxy vs. direct) provided support for the proposed substitutability effects, especially when inferences were based on clear interactional justice information. Moreover, we found that substitutability effects were more likely to occur when employees had high (vs. low) need for cognitive closure. We conclude by discussing the theoretical contributions and practical implications of our findings. (c) 2015 APA, all rights reserved.

  5. Bayesian estimation of Karhunen–Loève expansions; A random subspace approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chowdhary, Kenny; Najm, Habib N.

    One of the most widely used statistical procedures for dimensionality reduction of high-dimensional random fields is Principal Component Analysis (PCA), which is based on the Karhunen-Loève expansion (KLE) of a stochastic process with finite variance. The KLE is analogous to a Fourier series expansion for a random process, where the goal is to find an orthogonal transformation for the data such that the projection of the data onto this orthogonal subspace is optimal in the L2 sense, i.e., it minimizes the mean square error. In practice, this orthogonal transformation is determined by performing an SVD (Singular Value Decomposition) on the sample covariance matrix or on the data matrix itself. Sampling error is typically ignored when quantifying the principal components, or, equivalently, basis functions of the KLE. Furthermore, it is exacerbated when the sample size is much smaller than the dimension of the random field. In this paper, we introduce a Bayesian KLE procedure, allowing one to obtain a probabilistic model on the principal components, which can account for inaccuracies due to limited sample size. The probabilistic model is built via Bayesian inference, from which the posterior becomes the matrix Bingham density over the space of orthonormal matrices. We use a modified Gibbs sampling procedure to sample on this space and then build probabilistic Karhunen-Loève expansions over random subspaces to obtain a set of low-dimensional surrogates of the stochastic process. We illustrate this probabilistic procedure with a finite dimensional stochastic process inspired by Brownian motion.

  6. Bayesian estimation of Karhunen–Loève expansions; A random subspace approach

    DOE PAGES

    Chowdhary, Kenny; Najm, Habib N.

    2016-04-13

    One of the most widely used statistical procedures for dimensionality reduction of high-dimensional random fields is Principal Component Analysis (PCA), which is based on the Karhunen-Loève expansion (KLE) of a stochastic process with finite variance. The KLE is analogous to a Fourier series expansion for a random process, where the goal is to find an orthogonal transformation for the data such that the projection of the data onto this orthogonal subspace is optimal in the L2 sense, i.e., it minimizes the mean square error. In practice, this orthogonal transformation is determined by performing an SVD (Singular Value Decomposition) on the sample covariance matrix or on the data matrix itself. Sampling error is typically ignored when quantifying the principal components, or, equivalently, basis functions of the KLE. Furthermore, it is exacerbated when the sample size is much smaller than the dimension of the random field. In this paper, we introduce a Bayesian KLE procedure, allowing one to obtain a probabilistic model on the principal components, which can account for inaccuracies due to limited sample size. The probabilistic model is built via Bayesian inference, from which the posterior becomes the matrix Bingham density over the space of orthonormal matrices. We use a modified Gibbs sampling procedure to sample on this space and then build probabilistic Karhunen-Loève expansions over random subspaces to obtain a set of low-dimensional surrogates of the stochastic process. We illustrate this probabilistic procedure with a finite dimensional stochastic process inspired by Brownian motion.

  7. [Confidence interval or p-value--similarities and differences between two important methods of statistical inference of quantitative studies].

    PubMed

    Harari, Gil

    2014-01-01

    Statistical significance, expressed as the p-value, and the confidence interval (CI) are common statistical measures and are essential for the statistical analysis of studies in medicine and life sciences. These measures provide complementary information about the statistical probability and conclusions regarding the clinical significance of study findings. This article describes the two methodologies, compares them, assesses their suitability for the different needs of study results analysis, and explains the situations in which each method should be used.

  8. Reliability of a Measure of Institutional Discrimination against Minorities

    DTIC Science & Technology

    1979-12-01

    samples are presented. The first is based upon classical statistical theory and the second derives from a series of computer-generated Monte Carlo...Institutional racism and sexism . Englewood Cliffs, N. J.: Prentice-Hall, Inc., 1978. Hays, W. L. and Winkler, R. L. Statistics : probability, inference... statistical measure of the e of institutional discrimination are discussed. Two methods of dealing with the problem of reliability of the measure in small

  9. Exploratory Causal Analysis in Bivariate Time Series Data

    NASA Astrophysics Data System (ADS)

    McCracken, James M.

    Many scientific disciplines rely on observational data of systems for which it is difficult (or impossible) to implement controlled experiments, and data analysis techniques are required for identifying causal information and relationships directly from observational data. This need has led to the development of many different time series causality approaches and tools including transfer entropy, convergent cross-mapping (CCM), and Granger causality statistics. In this thesis, the existing time series causality method of CCM is extended by introducing a new method called pairwise asymmetric inference (PAI). It is found that CCM may provide counter-intuitive causal inferences for simple dynamics with strong intuitive notions of causality, and the CCM causal inference can be a function of physical parameters that are seemingly unrelated to the existence of a driving relationship in the system. For example, a CCM causal inference might alternate between "voltage drives current" and "current drives voltage" as the frequency of the voltage signal is changed in a series circuit with a single resistor and inductor. PAI is introduced to address both of these limitations. Many of the current approaches in the time series causality literature are not computationally straightforward to apply, do not follow directly from assumptions of probabilistic causality, depend on assumed models for the time series generating process, or rely on embedding procedures. A new approach, called causal leaning, is introduced in this work to avoid these issues. The leaning is found to provide causal inferences that agree with intuition for both simple systems and more complicated empirical examples, including space weather data sets. The leaning may provide a clearer interpretation of the results than those from existing time series causality tools.
    A practicing analyst can explore the literature to find many proposals for identifying drivers and causal connections in time series data sets, but little research exists on how these tools compare to each other in practice. This work introduces and defines exploratory causal analysis (ECA) to address this issue along with the concept of data causality in the taxonomy of causal studies introduced in this work. The motivation is to provide a framework for exploring potential causal structures in time series data sets. ECA is used on several synthetic and empirical data sets, and it is found that all of the tested time series causality tools agree with each other (and intuitive notions of causality) for many simple systems but can provide conflicting causal inferences for more complicated systems. It is proposed that such disagreements between different time series causality tools during ECA might provide deeper insight into the data than could be found otherwise.

  10. 75 FR 38871 - Proposed Collection; Comment Request for Revenue Procedure 2004-29

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-07-06

    ... comments concerning Revenue Procedure 2004-29, Statistical Sampling in Sec. 274 Context. DATES: Written... Internet, at [email protected] . SUPPLEMENTARY INFORMATION: Title: Statistical Sampling in Sec...: Revenue Procedure 2004-29 prescribes the statistical sampling methodology by which taxpayers under...

  11. Bayesian model reduction and empirical Bayes for group (DCM) studies

    PubMed Central

    Friston, Karl J.; Litvak, Vladimir; Oswal, Ashwini; Razi, Adeel; Stephan, Klaas E.; van Wijk, Bernadette C.M.; Ziegler, Gabriel; Zeidman, Peter

    2016-01-01

    This technical note describes some Bayesian procedures for the analysis of group studies that use nonlinear models at the first (within-subject) level – e.g., dynamic causal models – and linear models at subsequent (between-subject) levels. Its focus is on using Bayesian model reduction to finesse the inversion of multiple models of a single dataset or a single (hierarchical or empirical Bayes) model of multiple datasets. These applications of Bayesian model reduction allow one to consider parametric random effects and make inferences about group effects very efficiently (in a few seconds). We provide the relatively straightforward theoretical background to these procedures and illustrate their application using a worked example. This example uses a simulated mismatch negativity study of schizophrenia. We illustrate the robustness of Bayesian model reduction to violations of the (commonly used) Laplace assumption in dynamic causal modelling and show how its recursive application can facilitate both classical and Bayesian inference about group differences. Finally, we consider the application of these empirical Bayesian procedures to classification and prediction. PMID:26569570

  12. Comparative analysis on the selection of number of clusters in community detection

    NASA Astrophysics Data System (ADS)

    Kawamoto, Tatsuro; Kabashima, Yoshiyuki

    2018-02-01

    We conduct a comparative analysis on various estimates of the number of clusters in community detection. An exhaustive comparison requires testing of all possible combinations of frameworks, algorithms, and assessment criteria. In this paper we focus on the framework based on a stochastic block model, and investigate the performance of greedy algorithms, statistical inference, and spectral methods. For the assessment criteria, we consider modularity, map equation, Bethe free energy, prediction errors, and isolated eigenvalues. From the analysis, the tendency of overfit and underfit that the assessment criteria and algorithms have becomes apparent. In addition, we propose that the alluvial diagram is a suitable tool to visualize statistical inference results and can be useful to determine the number of clusters.

  13. Distributed Sensing and Processing for Multi-Camera Networks

    NASA Astrophysics Data System (ADS)

    Sankaranarayanan, Aswin C.; Chellappa, Rama; Baraniuk, Richard G.

    Sensor networks with large numbers of cameras are becoming increasingly prevalent in a wide range of applications, including video conferencing, motion capture, surveillance, and clinical diagnostics. In this chapter, we identify some of the fundamental challenges in designing such systems: robust statistical inference, computational efficiency, and opportunistic and parsimonious sensing. We show that the geometric constraints induced by the imaging process are extremely useful for identifying and designing optimal estimators for object detection and tracking tasks. We also derive pipelined and parallelized implementations of popular tools used for statistical inference in non-linear systems, of which multi-camera systems are examples. Finally, we highlight the use of the emerging theory of compressive sensing in reducing the amount of data sensed and communicated by a camera network.

  14. General intelligence does not help us understand cognitive evolution.

    PubMed

    Shuker, David M; Barrett, Louise; Dickins, Thomas E; Scott-Phillips, Thom C; Barton, Robert A

    2017-01-01

    Burkart et al. conflate the domain-specificity of cognitive processes with the statistical pattern of variance in behavioural measures that partly reflect those processes. General intelligence is a statistical abstraction, not a cognitive trait, and we argue that the former does not warrant inferences about the nature or evolution of the latter.

  15. Exploring Tree Age & Diameter to Illustrate Sample Design & Inference in Observational Ecology

    ERIC Educational Resources Information Center

    Casady, Grant M.

    2015-01-01

    Undergraduate biology labs often explore the techniques of data collection but neglect the statistical framework necessary to express findings. Students can be confused about how to use their statistical knowledge to address specific biological questions. Growth in the area of observational ecology requires that students gain experience in…

  16. Computer-Based Instruction in Statistical Inference; Final Report. Technical Memorandum (TM Series).

    ERIC Educational Resources Information Center

    Rosenbaum, J.; And Others

    A two-year investigation into the development of computer-assisted instruction (CAI) for the improvement of undergraduate training in statistics was undertaken. The first year was largely devoted to designing PLANIT (Programming LANguage for Interactive Teaching) which reduces, or completely eliminates, the need an author of CAI lessons would…

  17. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction

    ERIC Educational Resources Information Center

    Imbens, Guido W.; Rubin, Donald B.

    2015-01-01

    Most questions in social and biomedical sciences are causal in nature: what would happen to individuals, or to groups, if part of their environment were changed? In this groundbreaking text, two world-renowned experts present statistical methods for studying such questions. This book starts with the notion of potential outcomes, each corresponding…

  18. The Co-Emergence of Aggregate and Modelling Reasoning

    ERIC Educational Resources Information Center

    Aridor, Keren; Ben-Zvi, Dani

    2017-01-01

    This article examines how two processes--reasoning with statistical modelling of a real phenomenon and aggregate reasoning--can co-emerge. We focus in this case study on the emergent reasoning of two fifth graders (aged 10) involved in statistical data analysis, informal inference, and modelling activities using TinkerPlots™. We describe nine…

  19. Can Being Scared Cause Tummy Aches? Naive Theories, Ambiguous Evidence, and Preschoolers' Causal Inferences

    ERIC Educational Resources Information Center

    Schulz, Laura E.; Bonawitz, Elizabeth Baraff; Griffiths, Thomas L.

    2007-01-01

    Causal learning requires integrating constraints provided by domain-specific theories with domain-general statistical learning. In order to investigate the interaction between these factors, the authors presented preschoolers with stories pitting their existing theories against statistical evidence. Each child heard 2 stories in which 2 candidate…

  20. Using Informal Inferential Reasoning to Develop Formal Concepts: Analyzing an Activity

    ERIC Educational Resources Information Center

    Weinberg, Aaron; Wiesner, Emilie; Pfaff, Thomas J.

    2010-01-01

    Inferential reasoning is a central component of statistics. Researchers have suggested that students should develop an informal understanding of the ideas that underlie inference before learning the concepts formally. This paper presents a hands-on activity that is designed to help students in an introductory statistics course draw informal…
