Sample records for bayesian posterior probabilities

  1. A Bayesian pick-the-winner design in a randomized phase II clinical trial.

    PubMed

    Chen, Dung-Tsa; Huang, Po-Yu; Lin, Hui-Yi; Chiappori, Alberto A; Gabrilovich, Dmitry I; Haura, Eric B; Antonia, Scott J; Gray, Jhanelle E

    2017-10-24

    Many phase II clinical trials evaluate unique experimental drugs/combinations through multi-arm design to expedite the screening process (early termination of ineffective drugs) and to identify the most effective drug (pick the winner) to warrant a phase III trial. Various statistical approaches have been developed for the pick-the-winner design but have been criticized for lack of objective comparison among the drug agents. We developed a Bayesian pick-the-winner design by integrating a Bayesian posterior probability with Simon two-stage design in a randomized two-arm clinical trial. The Bayesian posterior probability, as the rule to pick the winner, is defined as probability of the response rate in one arm higher than in the other arm. The posterior probability aims to determine the winner when both arms pass the second stage of the Simon two-stage design. When both arms are competitive (i.e., both passing the second stage), the Bayesian posterior probability performs better to correctly identify the winner compared with the Fisher exact test in the simulation study. In comparison to a standard two-arm randomized design, the Bayesian pick-the-winner design has a higher power to determine a clear winner. In application to two studies, the approach is able to perform statistical comparison of two treatment arms and provides a winner probability (Bayesian posterior probability) to statistically justify the winning arm. We developed an integrated design that utilizes Bayesian posterior probability, Simon two-stage design, and randomization into a unique setting. It gives objective comparisons between the arms to determine the winner.
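    As a rough illustration of the winner probability described above (the posterior probability that the response rate in one arm is higher than in the other), the following Python sketch uses Monte Carlo draws from independent Beta posteriors. The uniform Beta(1, 1) priors, the function name `prob_arm_a_better`, and the response counts are assumptions made for illustration; they are not the authors' design or data, and the Simon two-stage gating is not modeled.

```python
import random


def prob_arm_a_better(resp_a, n_a, resp_b, n_b, prior=(1.0, 1.0),
                      draws=100_000, seed=0):
    """Estimate P(p_A > p_B | data) under independent Beta priors.

    Each arm has a binomial likelihood, so its posterior is
    Beta(prior_a + responders, prior_b + non-responders).
    """
    rng = random.Random(seed)
    a0, b0 = prior
    wins = 0
    for _ in range(draws):
        # Draw one plausible response rate per arm from its posterior.
        p_a = rng.betavariate(a0 + resp_a, b0 + n_a - resp_a)
        p_b = rng.betavariate(a0 + resp_b, b0 + n_b - resp_b)
        if p_a > p_b:
            wins += 1
    return wins / draws


# Hypothetical data: 15/30 responders in arm A versus 8/30 in arm B.
print(prob_arm_a_better(15, 30, 8, 30))
```

    A value close to 1 would favor arm A and a value near 0.5 would indicate no clear winner; the cutoff used to declare a winner is a design choice not specified here.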

  2. Hepatitis disease detection using Bayesian theory

    NASA Astrophysics Data System (ADS)

    Maseleno, Andino; Hidayati, Rohmah Zahroh

    2017-02-01

    This paper presents hepatitis disease diagnosis using Bayesian theory, with the aim of a better understanding of the theory. In this research, we used Bayesian theory to detect hepatitis disease and to display the result of the diagnosis process. Bayesian theory was rediscovered and perfected by Laplace; the basic idea is to use the known prior probability and conditional probability density parameters, together with Bayes' theorem, to calculate the corresponding posterior probability, and then to use that posterior probability to infer and make decisions. Bayesian methods combine existing knowledge (prior probabilities) with additional knowledge derived from new data (the likelihood function). The initial symptoms of hepatitis include malaise, fever, and headache, and the quantity of interest is the probability of hepatitis given the presence of malaise, fever, and headache. The results revealed that Bayesian theory successfully identified the existence of hepatitis disease.
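    The calculation described above (prior and conditional probabilities combined through Bayes' theorem into a posterior) can be sketched as follows. This is a minimal example that assumes the symptoms are conditionally independent given disease status; the function name `posterior_disease` and every number are hypothetical and are not taken from the paper.

```python
def posterior_disease(prior, lik_given_disease, lik_given_healthy):
    """Return P(disease | all symptoms present) via Bayes' theorem.

    Assumes symptoms are conditionally independent given disease status
    (a naive Bayes simplification, not necessarily the paper's model).
    """
    joint_disease = prior          # P(disease) * prod P(symptom | disease)
    joint_healthy = 1.0 - prior    # P(healthy) * prod P(symptom | healthy)
    for l_d, l_h in zip(lik_given_disease, lik_given_healthy):
        joint_disease *= l_d
        joint_healthy *= l_h
    return joint_disease / (joint_disease + joint_healthy)


# Hypothetical inputs: 1% prevalence; symptom order is malaise, fever, headache.
print(posterior_disease(0.01, [0.8, 0.7, 0.6], [0.1, 0.1, 0.2]))
```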

  3. Reweighting Data in the Spirit of Tukey: Using Bayesian Posterior Probabilities as Rasch Residuals for Studying Misfit

    ERIC Educational Resources Information Center

    Dardick, William R.; Mislevy, Robert J.

    2016-01-01

    A new variant of the iterative "data = fit + residual" data-analytical approach described by Mosteller and Tukey is proposed and implemented in the context of item response theory psychometric models. Posterior probabilities from a Bayesian mixture model of a Rasch item response theory model and an unscalable latent class are expressed…

  4. Bayesian selection of misspecified models is overconfident and may cause spurious posterior probabilities for phylogenetic trees.

    PubMed

    Yang, Ziheng; Zhu, Tianqi

    2018-02-20

    The Bayesian method is noted to produce spuriously high posterior probabilities for phylogenetic trees in analysis of large datasets, but the precise reasons for this overconfidence are unknown. In general, the performance of Bayesian selection of misspecified models is poorly understood, even though this is of great scientific interest since models are never true in real data analysis. Here we characterize the asymptotic behavior of Bayesian model selection and show that when the competing models are equally wrong, Bayesian model selection exhibits surprising and polarized behaviors in large datasets, supporting one model with full force while rejecting the others. If one model is slightly less wrong than the other, the less wrong model will eventually win when the amount of data increases, but the method may become overconfident before it becomes reliable. We suggest that this extreme behavior may be a major factor for the spuriously high posterior probabilities for evolutionary trees. The philosophical implications of our results to the application of Bayesian model selection to evaluate opposing scientific hypotheses are yet to be explored, as are the behaviors of non-Bayesian methods in similar situations.

  5. Pig Data and Bayesian Inference on Multinomial Probabilities

    ERIC Educational Resources Information Center

    Kern, John C.

    2006-01-01

    Bayesian inference on multinomial probabilities is conducted based on data collected from the game Pass the Pigs[R]. Prior information on these probabilities is readily available from the instruction manual, and is easily incorporated in a Dirichlet prior. Posterior analysis of the scoring probabilities quantifies the discrepancy between empirical…
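    Because the Dirichlet prior is conjugate to the multinomial likelihood, the posterior is obtained by adding the observed counts to the prior parameters. The sketch below shows that update; the outcome labels, prior weights, and counts are hypothetical placeholders, not values from the game's instruction manual or from the article.

```python
def dirichlet_posterior(prior_alpha, counts):
    """Conjugate update: Dirichlet(alpha) prior + multinomial counts.

    Returns the posterior parameters and the posterior mean of each
    category probability.
    """
    post = {k: prior_alpha[k] + counts.get(k, 0) for k in prior_alpha}
    total = sum(post.values())
    mean = {k: v / total for k, v in post.items()}
    return post, mean


# Hypothetical outcome categories, prior pseudo-counts, and observed rolls.
prior = {"side": 6.0, "razorback": 2.0, "trotter": 1.0, "snouter": 1.0}
rolls = {"side": 55, "razorback": 25, "trotter": 15, "snouter": 5}
post, mean = dirichlet_posterior(prior, rolls)
print(post)
print(mean)
```

    Comparing the posterior means with the prior means gives one simple view of how far the observed data pull the estimates away from the prior information.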

  6. Bayesian model checking: A comparison of tests

    NASA Astrophysics Data System (ADS)

    Lucy, L. B.

    2018-06-01

    Two procedures for checking Bayesian models are compared using a simple test problem based on the local Hubble expansion. Over four orders of magnitude, p-values derived from a global goodness-of-fit criterion for posterior probability density functions agree closely with posterior predictive p-values. The former can therefore serve as an effective proxy for the difficult-to-calculate posterior predictive p-values.

  7. Bayesian statistical inference enhances the interpretation of contemporary randomized controlled trials.

    PubMed

    Wijeysundera, Duminda N; Austin, Peter C; Hux, Janet E; Beattie, W Scott; Laupacis, Andreas

    2009-01-01

    Randomized trials generally use "frequentist" statistics based on P-values and 95% confidence intervals. Frequentist methods have limitations that might be overcome, in part, by Bayesian inference. To illustrate these advantages, we re-analyzed randomized trials published in four general medical journals during 2004. We used Medline to identify randomized superiority trials with two parallel arms, individual-level randomization and dichotomous or time-to-event primary outcomes. Studies with P<0.05 in favor of the intervention were deemed "positive"; otherwise, they were "negative." We used several prior distributions and exact conjugate analyses to calculate Bayesian posterior probabilities for clinically relevant effects. Of 88 included studies, 39 were positive using a frequentist analysis. Although the Bayesian posterior probabilities of any benefit (relative risk or hazard ratio<1) were high in positive studies, these probabilities were lower and variable for larger benefits. The positive studies had only moderate probabilities for exceeding the effects that were assumed for calculating the sample size. By comparison, there were moderate probabilities of any benefit in negative studies. Bayesian and frequentist analyses complement each other when interpreting the results of randomized trials. Future reports of randomized trials should include both.

  8. Asking better questions: How presentation formats influence information search.

    PubMed

    Wu, Charley M; Meder, Björn; Filimon, Flavia; Nelson, Jonathan D

    2017-08-01

    While the influence of presentation formats has been widely studied in Bayesian reasoning tasks, we present the first systematic investigation of how presentation formats influence information search decisions. Four experiments were conducted across different probabilistic environments, where subjects (N = 2,858) chose between 2 possible search queries, each with binary probabilistic outcomes, with the goal of maximizing classification accuracy. We studied 14 different numerical and visual formats for presenting information about the search environment, constructed across 6 design features that have been prominently related to improvements in Bayesian reasoning accuracy (natural frequencies, posteriors, complement, spatial extent, countability, and part-to-whole information). The posterior variants of the icon array and bar graph formats led to the highest proportion of correct responses, and were substantially better than the standard probability format. Results suggest that presenting information in terms of posterior probabilities and visualizing natural frequencies using spatial extent (a perceptual feature) were especially helpful in guiding search decisions, although environments with a mixture of probabilistic and certain outcomes were challenging across all formats. Subjects who made more accurate probability judgments did not perform better on the search task, suggesting that simple decision heuristics may be used to make search decisions without explicitly applying Bayesian inference to compute probabilities. We propose a new take-the-difference (TTD) heuristic that identifies the accuracy-maximizing query without explicit computation of posterior probabilities.

  9. Use of Bayesian Inference in Crystallographic Structure Refinement via Full Diffraction Profile Analysis

    PubMed Central

    Fancher, Chris M.; Han, Zhen; Levin, Igor; Page, Katharine; Reich, Brian J.; Smith, Ralph C.; Wilson, Alyson G.; Jones, Jacob L.

    2016-01-01

    A Bayesian inference method for refining crystallographic structures is presented. The distribution of model parameters is stochastically sampled using Markov chain Monte Carlo. Posterior probability distributions are constructed for all model parameters to properly quantify uncertainty by appropriately modeling the heteroskedasticity and correlation of the error structure. The proposed method is demonstrated by analyzing a National Institute of Standards and Technology silicon standard reference material. The results obtained by Bayesian inference are compared with those determined by Rietveld refinement. Posterior probability distributions of model parameters provide both estimates and uncertainties. The new method better estimates the true uncertainties in the model as compared to the Rietveld method. PMID:27550221

  10. Bayesian approach to inverse statistical mechanics.

    PubMed

    Habeck, Michael

    2014-05-01

    Inverse statistical mechanics aims to determine particle interactions from ensemble properties. This article looks at this inverse problem from a Bayesian perspective and discusses several statistical estimators to solve it. In addition, a sequential Monte Carlo algorithm is proposed that draws the interaction parameters from their posterior probability distribution. The posterior probability involves an intractable partition function that is estimated along with the interactions. The method is illustrated for inverse problems of varying complexity, including the estimation of a temperature, the inverse Ising problem, maximum entropy fitting, and the reconstruction of molecular interaction potentials.

  11. Bayesian approach to inverse statistical mechanics

    NASA Astrophysics Data System (ADS)

    Habeck, Michael

    2014-05-01

    Inverse statistical mechanics aims to determine particle interactions from ensemble properties. This article looks at this inverse problem from a Bayesian perspective and discusses several statistical estimators to solve it. In addition, a sequential Monte Carlo algorithm is proposed that draws the interaction parameters from their posterior probability distribution. The posterior probability involves an intractable partition function that is estimated along with the interactions. The method is illustrated for inverse problems of varying complexity, including the estimation of a temperature, the inverse Ising problem, maximum entropy fitting, and the reconstruction of molecular interaction potentials.

  12. Bayesian Posterior Odds Ratios: Statistical Tools for Collaborative Evaluations

    ERIC Educational Resources Information Center

    Hicks, Tyler; Rodríguez-Campos, Liliana; Choi, Jeong Hoon

    2018-01-01

    To begin statistical analysis, Bayesians quantify their confidence in modeling hypotheses with priors. A prior describes the probability of a certain modeling hypothesis apart from the data. Bayesians should be able to defend their choice of prior to a skeptical audience. Collaboration between evaluators and stakeholders could make their choices…

  13. The Estimation of Tree Posterior Probabilities Using Conditional Clade Probability Distributions

    PubMed Central

    Larget, Bret

    2013-01-01

    In this article I introduce the idea of conditional independence of separated subtrees as a principle by which to estimate the posterior probability of trees using conditional clade probability distributions rather than simple sample relative frequencies. I describe an algorithm for these calculations and software which implements these ideas. I show that these alternative calculations are very similar to simple sample relative frequencies for high probability trees but are substantially more accurate for relatively low probability trees. The method allows the posterior probability of unsampled trees to be calculated when these trees contain only clades that are in other sampled trees. Furthermore, the method can be used to estimate the total probability of the set of sampled trees which provides a measure of the thoroughness of a posterior sample. [Bayesian phylogenetics; conditional clade distributions; improved accuracy; posterior probabilities of trees.] PMID:23479066

  14. Little Bayesians or Little Einsteins? Probability and Explanatory Virtue in Children's Inferences

    ERIC Educational Resources Information Center

    Johnston, Angie M.; Johnson, Samuel G. B.; Koven, Marissa L.; Keil, Frank C.

    2017-01-01

    Like scientists, children seek ways to explain causal systems in the world. But are children scientists in the strict Bayesian tradition of maximizing posterior probability? Or do they attend to other explanatory considerations, as laypeople and scientists--such as Einstein--do? Four experiments support the latter possibility. In particular, we…

  15. Ockham's razor and Bayesian analysis. [statistical theory for systems evaluation]

    NASA Technical Reports Server (NTRS)

    Jefferys, William H.; Berger, James O.

    1992-01-01

    'Ockham's razor', the ad hoc principle enjoining the greatest possible simplicity in theoretical explanations, is presently shown to be justifiable as a consequence of Bayesian inference; Bayesian analysis can, moreover, clarify the nature of the 'simplest' hypothesis consistent with the given data. By choosing the prior probabilities of hypotheses, it becomes possible to quantify the scientific judgment that simpler hypotheses are more likely to be correct. Bayesian analysis also shows that a hypothesis with fewer adjustable parameters intrinsically possesses an enhanced posterior probability, due to the clarity of its predictions.
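    A small numerical sketch may help show why a hypothesis with fewer adjustable parameters can gain posterior support. The coin-toss setting below is an assumption chosen for illustration and is not the example used in the report: H0 fixes the heads probability at 0.5 (no free parameter), while H1 lets it vary with a uniform prior on [0, 1]. The Bayes factor is the ratio of the two marginal likelihoods.

```python
from math import comb


def bayes_factor_simple_vs_flexible(heads, tosses):
    """Bayes factor B01 for H0: p = 0.5 against H1: p ~ Uniform(0, 1).

    Under H1, integrating the binomial likelihood over a uniform prior
    gives a marginal likelihood of 1 / (tosses + 1) for every count.
    """
    evidence_h0 = comb(tosses, heads) * 0.5 ** tosses
    evidence_h1 = 1.0 / (tosses + 1)
    return evidence_h0 / evidence_h1


# Data close to the sharp prediction favor the simpler hypothesis (B01 > 1).
print(bayes_factor_simple_vs_flexible(5, 10))
# Data far from it favor the flexible hypothesis (B01 < 1).
print(bayes_factor_simple_vs_flexible(10, 10))
```

    The flexible hypothesis spreads its predictions over all possible counts, so it is penalized when the data fall where the simpler hypothesis concentrated its prediction.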

  16. The utility of Bayesian predictive probabilities for interim monitoring of clinical trials

    PubMed Central

    Connor, Jason T.; Ayers, Gregory D; Alvarez, JoAnn

    2014-01-01

    Background: Bayesian predictive probabilities can be used for interim monitoring of clinical trials to estimate the probability of observing a statistically significant treatment effect if the trial were to continue to its predefined maximum sample size. Purpose: We explore settings in which Bayesian predictive probabilities are advantageous for interim monitoring compared to Bayesian posterior probabilities, p-values, conditional power, or group sequential methods. Results: For interim analyses that address prediction hypotheses, such as futility monitoring and efficacy monitoring with lagged outcomes, only predictive probabilities properly account for the amount of data remaining to be observed in a clinical trial and have the flexibility to incorporate additional information via auxiliary variables. Limitations: Computational burdens limit the feasibility of predictive probabilities in many clinical trial settings. The specification of prior distributions brings additional challenges for regulatory approval. Conclusions: The use of Bayesian predictive probabilities enables the choice of logical interim stopping rules that closely align with the clinical decision making process. PMID:24872363
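    To make the idea of a predictive probability concrete, the sketch below handles a deliberately simplified case: a single-arm trial with a binary outcome, a Beta prior, and "success" defined as reaching a minimum total number of responders at the maximum sample size. That success rule, the Beta(1, 1) prior, and all function names are assumptions for illustration; the article's settings (for example, significance tests at the final analysis or lagged outcomes) are more general.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k, m, a, b):
    """P(k responders among m future patients) when p ~ Beta(a, b)."""
    return comb(m, k) * exp(log_beta(a + k, b + m - k) - log_beta(a, b))


def predictive_prob_success(x, n, n_max, min_total, prior=(1.0, 1.0)):
    """Probability, given x responders in n patients so far, that the
    trial ends with at least min_total responders out of n_max."""
    a = prior[0] + x              # posterior Beta parameters at the interim
    b = prior[1] + n - x
    remaining = n_max - n         # patients still to be observed
    needed = max(0, min_total - x)
    return sum(beta_binomial_pmf(k, remaining, a, b)
               for k in range(needed, remaining + 1))


# Hypothetical interim look: 7 responders in 20 patients, 40 planned,
# with 16 total responders required for success.
print(predictive_prob_success(7, 20, 40, 16))
```

    Unlike a posterior probability computed at the interim, this quantity depends explicitly on how many patients remain, which is the property the abstract emphasizes.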

  17. A new prior for bayesian anomaly detection: application to biosurveillance.

    PubMed

    Shen, Y; Cooper, G F

    2010-01-01

    Bayesian anomaly detection computes posterior probabilities of anomalous events by combining prior beliefs and evidence from data. However, the specification of prior probabilities can be challenging. This paper describes a Bayesian prior in the context of disease outbreak detection. The goal is to provide a meaningful, easy-to-use prior that yields a posterior probability of an outbreak that performs at least as well as a standard frequentist approach. If this goal is achieved, the resulting posterior could be usefully incorporated into a decision analysis about how to act in light of a possible disease outbreak. This paper describes a Bayesian method for anomaly detection that combines learning from data with a semi-informative prior probability over patterns of anomalous events. A univariate version of the algorithm is presented here for ease of illustration of the essential ideas. The paper describes the algorithm in the context of disease-outbreak detection, but it is general and can be used in other anomaly detection applications. For this application, the semi-informative prior specifies that an increased count over baseline is expected for the variable being monitored, such as the number of respiratory chief complaints per day at a given emergency department. The semi-informative prior is derived based on the baseline prior, which is estimated from historical data. The evaluation reported here used semi-synthetic data to evaluate the detection performance of the proposed Bayesian method and a control chart method, which is a standard frequentist algorithm that is closest to the Bayesian method in terms of the type of data it uses. The disease-outbreak detection performance of the Bayesian method was statistically significantly better than that of the control chart method when proper baseline periods were used to estimate the baseline behavior to avoid seasonal effects. When using longer baseline periods, the Bayesian method performed as well as the control chart method. The time complexity of the Bayesian algorithm is linear in the number of the observed events being monitored, due to a novel, closed-form derivation that is introduced in the paper. This paper introduces a novel prior probability for Bayesian outbreak detection that is expressive, easy to apply, computationally efficient, and performs as well as or better than a standard frequentist method.

  18. Bayesian operational modal analysis with asynchronous data, Part II: Posterior uncertainty

    NASA Astrophysics Data System (ADS)

    Zhu, Yi-Chen; Au, Siu-Kui

    2018-01-01

    A Bayesian modal identification method has been proposed in the companion paper that allows the most probable values of modal parameters to be determined using asynchronous ambient vibration data. This paper investigates the identification uncertainty of modal parameters in terms of their posterior covariance matrix. Computational issues are addressed. Analytical expressions are derived to allow the posterior covariance matrix to be evaluated accurately and efficiently. Synthetic, laboratory and field data examples are presented to verify the consistency, investigate potential modelling error and demonstrate practical applications.

  19. Phylogenetic relationships of Malaysia’s long-tailed macaques, Macaca fascicularis, based on cytochrome b sequences

    PubMed Central

    Abdul-Latiff, Muhammad Abu Bakar; Ruslin, Farhani; Fui, Vun Vui; Abu, Mohd-Hashim; Rovie-Ryan, Jeffrine Japning; Abdul-Patah, Pazil; Lakim, Maklarin; Roos, Christian; Yaakop, Salmah; Md-Zain, Badrul Munir

    2014-01-01

    Phylogenetic relationships among Malaysia’s long-tailed macaques have yet to be established, despite abundant genetic studies of the species worldwide. The aims of this study are to examine the phylogenetic relationships of Macaca fascicularis in Malaysia and to test its classification as a morphological subspecies. A total of 25 genetic samples of M. fascicularis yielding 383 bp of Cytochrome b (Cyt b) sequences were used in phylogenetic analysis along with one sample each of M. nemestrina and M. arctoides used as outgroups. Sequence character analysis reveals that Cyt b locus is a highly conserved region with only 23% parsimony informative character detected among ingroups. Further analysis indicates a clear separation between populations originating from different regions; the Malay Peninsula versus Borneo Insular, the East Coast versus West Coast of the Malay Peninsula, and the island versus mainland Malay Peninsula populations. Phylogenetic trees (NJ, MP and Bayesian) portray a consistent clustering paradigm as Borneo’s population was distinguished from Peninsula’s population (99% and 100% bootstrap value in NJ and MP respectively and 1.00 posterior probability in Bayesian trees). The East coast population was separated from other Peninsula populations (64% in NJ, 66% in MP and 0.53 posterior probability in Bayesian). West coast populations were divided into 2 clades: the North-South (47%/54% in NJ, 26/26% in MP and 1.00/0.80 posterior probability in Bayesian) and Island-Mainland (93% in NJ, 90% in MP and 1.00 posterior probability in Bayesian). The results confirm the previous morphological assignment of 2 subspecies, M. f. fascicularis and M. f. argentimembris, in the Malay Peninsula. These populations should be treated as separate genetic entities in order to conserve the genetic diversity of Malaysia’s M. fascicularis. These findings are crucial in aiding the conservation management and translocation process of M. fascicularis populations in Malaysia. PMID:24899832

  20. Phylogenetic relationships of Malaysia's long-tailed macaques, Macaca fascicularis, based on cytochrome b sequences.

    PubMed

    Abdul-Latiff, Muhammad Abu Bakar; Ruslin, Farhani; Fui, Vun Vui; Abu, Mohd-Hashim; Rovie-Ryan, Jeffrine Japning; Abdul-Patah, Pazil; Lakim, Maklarin; Roos, Christian; Yaakop, Salmah; Md-Zain, Badrul Munir

    2014-01-01

    Phylogenetic relationships among Malaysia's long-tailed macaques have yet to be established, despite abundant genetic studies of the species worldwide. The aims of this study are to examine the phylogenetic relationships of Macaca fascicularis in Malaysia and to test its classification as a morphological subspecies. A total of 25 genetic samples of M. fascicularis yielding 383 bp of Cytochrome b (Cyt b) sequences were used in phylogenetic analysis along with one sample each of M. nemestrina and M. arctoides used as outgroups. Sequence character analysis reveals that Cyt b locus is a highly conserved region with only 23% parsimony informative character detected among ingroups. Further analysis indicates a clear separation between populations originating from different regions; the Malay Peninsula versus Borneo Insular, the East Coast versus West Coast of the Malay Peninsula, and the island versus mainland Malay Peninsula populations. Phylogenetic trees (NJ, MP and Bayesian) portray a consistent clustering paradigm as Borneo's population was distinguished from Peninsula's population (99% and 100% bootstrap value in NJ and MP respectively and 1.00 posterior probability in Bayesian trees). The East coast population was separated from other Peninsula populations (64% in NJ, 66% in MP and 0.53 posterior probability in Bayesian). West coast populations were divided into 2 clades: the North-South (47%/54% in NJ, 26/26% in MP and 1.00/0.80 posterior probability in Bayesian) and Island-Mainland (93% in NJ, 90% in MP and 1.00 posterior probability in Bayesian). The results confirm the previous morphological assignment of 2 subspecies, M. f. fascicularis and M. f. argentimembris, in the Malay Peninsula. These populations should be treated as separate genetic entities in order to conserve the genetic diversity of Malaysia's M. fascicularis. These findings are crucial in aiding the conservation management and translocation process of M. fascicularis populations in Malaysia.

  1. Bayesian Retrieval of Complete Posterior PDFs of Oceanic Rain Rate From Microwave Observations

    NASA Technical Reports Server (NTRS)

    Chiu, J. Christine; Petty, Grant W.

    2005-01-01

    This paper presents a new Bayesian algorithm for retrieving surface rain rate from Tropical Rainfall Measurements Mission (TRMM) Microwave Imager (TMI) over the ocean, along with validations against estimates from the TRMM Precipitation Radar (PR). The Bayesian approach offers a rigorous basis for optimally combining multichannel observations with prior knowledge. While other rain rate algorithms have been published that are based at least partly on Bayesian reasoning, this is believed to be the first self-contained algorithm that fully exploits Bayes Theorem to yield not just a single rain rate, but rather a continuous posterior probability distribution of rain rate. To advance our understanding of theoretical benefits of the Bayesian approach, we have conducted sensitivity analyses based on two synthetic datasets for which the true conditional and prior distribution are known. Results demonstrate that even when the prior and conditional likelihoods are specified perfectly, biased retrievals may occur at high rain rates. This bias is not the result of a defect of the Bayesian formalism but rather represents the expected outcome when the physical constraint imposed by the radiometric observations is weak, due to saturation effects. It is also suggested that the choice of the estimators and the prior information are both crucial to the retrieval. In addition, the performance of our Bayesian algorithm is found to be comparable to that of other benchmark algorithms in real-world applications, while having the additional advantage of providing a complete continuous posterior probability distribution of surface rain rate.

  2. DOE Office of Scientific and Technical Information (OSTI.GOV)

    La Russa, D

    Purpose: The purpose of this project is to develop a robust method of parameter estimation for a Poisson-based TCP model using Bayesian inference. Methods: Bayesian inference was performed using the PyMC3 probabilistic programming framework written in Python. A Poisson-based TCP regression model that accounts for clonogen proliferation was fit to observed rates of local relapse as a function of equivalent dose in 2 Gy fractions for a population of 623 stage-I non-small-cell lung cancer patients. The Slice Markov Chain Monte Carlo sampling algorithm was used to sample the posterior distributions, and was initiated using the maximum of the posterior distributions found by optimization. The calculation of TCP with each sample step required integration over the free parameter α, which was performed using an adaptive 24-point Gauss-Legendre quadrature. Convergence was verified via inspection of the trace plot and posterior distribution for each of the fit parameters, as well as with comparisons of the most probable parameter values with their respective maximum likelihood estimates. Results: Posterior distributions for α, the standard deviation of α (σ), the average tumour cell-doubling time (Td), and the repopulation delay time (Tk), were generated assuming α/β = 10 Gy, and a fixed clonogen density of 10^7 cm^-3. Posterior predictive plots generated from samples from these posterior distributions are in excellent agreement with the observed rates of local relapse used in the Bayesian inference. The most probable values of the model parameters also agree well with maximum likelihood estimates. Conclusion: A robust method of performing Bayesian inference of TCP data using a complex TCP model has been established.

  3. Spectral likelihood expansions for Bayesian inference

    NASA Astrophysics Data System (ADS)

    Nagel, Joseph B.; Sudret, Bruno

    2016-03-01

    A spectral approach to Bayesian inference is presented. It pursues the emulation of the posterior probability density. The starting point is a series expansion of the likelihood function in terms of orthogonal polynomials. From this spectral likelihood expansion all statistical quantities of interest can be calculated semi-analytically. The posterior is formally represented as the product of a reference density and a linear combination of polynomial basis functions. Both the model evidence and the posterior moments are related to the expansion coefficients. This formulation avoids Markov chain Monte Carlo simulation and allows one to make use of linear least squares instead. The pros and cons of spectral Bayesian inference are discussed and demonstrated on the basis of simple applications from classical statistics and inverse modeling.

  4. Bayesian analysis of the astrobiological implications of life’s early emergence on Earth

    PubMed Central

    Spiegel, David S.; Turner, Edwin L.

    2012-01-01

    Life arose on Earth sometime in the first few hundred million years after the young planet had cooled to the point that it could support water-based organisms on its surface. The early emergence of life on Earth has been taken as evidence that the probability of abiogenesis is high, if starting from young Earth-like conditions. We revisit this argument quantitatively in a Bayesian statistical framework. By constructing a simple model of the probability of abiogenesis, we calculate a Bayesian estimate of its posterior probability, given the data that life emerged fairly early in Earth’s history and that, billions of years later, curious creatures noted this fact and considered its implications. We find that, given only this very limited empirical information, the choice of Bayesian prior for the abiogenesis probability parameter has a dominant influence on the computed posterior probability. Although terrestrial life's early emergence provides evidence that life might be abundant in the universe if early-Earth-like conditions are common, the evidence is inconclusive and indeed is consistent with an arbitrarily low intrinsic probability of abiogenesis for plausible uninformative priors. Finding a single case of life arising independently of our lineage (on Earth, elsewhere in the solar system, or on an extrasolar planet) would provide much stronger evidence that abiogenesis is not extremely rare in the universe. PMID:22198766

  5. Bayesian analysis of the astrobiological implications of life's early emergence on Earth.

    PubMed

    Spiegel, David S; Turner, Edwin L

    2012-01-10

    Life arose on Earth sometime in the first few hundred million years after the young planet had cooled to the point that it could support water-based organisms on its surface. The early emergence of life on Earth has been taken as evidence that the probability of abiogenesis is high, if starting from young Earth-like conditions. We revisit this argument quantitatively in a bayesian statistical framework. By constructing a simple model of the probability of abiogenesis, we calculate a bayesian estimate of its posterior probability, given the data that life emerged fairly early in Earth's history and that, billions of years later, curious creatures noted this fact and considered its implications. We find that, given only this very limited empirical information, the choice of bayesian prior for the abiogenesis probability parameter has a dominant influence on the computed posterior probability. Although terrestrial life's early emergence provides evidence that life might be abundant in the universe if early-Earth-like conditions are common, the evidence is inconclusive and indeed is consistent with an arbitrarily low intrinsic probability of abiogenesis for plausible uninformative priors. Finding a single case of life arising independently of our lineage (on Earth, elsewhere in the solar system, or on an extrasolar planet) would provide much stronger evidence that abiogenesis is not extremely rare in the universe.

  6. Probabilistic Inference: Task Dependency and Individual Differences of Probability Weighting Revealed by Hierarchical Bayesian Modeling

    PubMed Central

    Boos, Moritz; Seer, Caroline; Lange, Florian; Kopp, Bruno

    2016-01-01

    Cognitive determinants of probabilistic inference were examined using hierarchical Bayesian modeling techniques. A classic urn-ball paradigm served as experimental strategy, involving a factorial two (prior probabilities) by two (likelihoods) design. Five computational models of cognitive processes were compared with the observed behavior. Parameter-free Bayesian posterior probabilities and parameter-free base rate neglect provided inadequate models of probabilistic inference. The introduction of distorted subjective probabilities yielded more robust and generalizable results. A general class of (inverted) S-shaped probability weighting functions had been proposed; however, the possibility of large differences in probability distortions not only across experimental conditions, but also across individuals, seems critical for the model's success. It also seems advantageous to consider individual differences in parameters of probability weighting as being sampled from weakly informative prior distributions of individual parameter values. Thus, the results from hierarchical Bayesian modeling converge with previous results in revealing that probability weighting parameters show considerable task dependency and individual differences. Methodologically, this work exemplifies the usefulness of hierarchical Bayesian modeling techniques for cognitive psychology. Theoretically, human probabilistic inference might be best described as the application of individualized strategic policies for Bayesian belief revision. PMID:27303323

  7. On the use of posterior predictive probabilities and prediction uncertainty to tailor informative sampling for parasitological surveillance in livestock.

    PubMed

    Musella, Vincenzo; Rinaldi, Laura; Lagazio, Corrado; Cringoli, Giuseppe; Biggeri, Annibale; Catelan, Dolores

    2014-09-15

    Model-based geostatistics and Bayesian approaches are appropriate in the context of Veterinary Epidemiology when point data have been collected by valid study designs. The aim is to predict a continuous infection risk surface. Little work has been done on the use of predictive infection probabilities at farm unit level. In this paper we show how to use predictive infection probability and related uncertainty from a Bayesian kriging model to draw informative samples from the 8794 geo-referenced sheep farms of the Campania region (southern Italy). Parasitological data come from a first cross-sectional survey carried out to study the spatial distribution of selected helminths in sheep farms. A grid sampling was performed to select the farms for coprological examinations. Faecal samples were collected for 121 sheep farms and the presence of 21 different helminths was investigated using the FLOTAC technique. The 21 responses are very different in terms of geographical distribution and prevalence of infection. The observed prevalence range is from 0.83% to 96.69%. The distributions of the posterior predictive probabilities for all the 21 parasites are very heterogeneous. We show how the results of the Bayesian kriging model can be used to plan a second wave survey. Several alternatives can be chosen depending on the purposes of the second survey: weight by posterior predictive probabilities, by their uncertainty, or by a combination of both. The proposed Bayesian kriging model is simple, and the proposed sampling strategy represents a useful tool to address targeted infection control treatments and surveillance campaigns. It is easily extendable to other fields of research. Copyright © 2014 Elsevier B.V. All rights reserved.

  8. Screening for SNPs with Allele-Specific Methylation based on Next-Generation Sequencing Data.

    PubMed

    Hu, Bo; Ji, Yuan; Xu, Yaomin; Ting, Angela H

    2013-05-01

    Allele-specific methylation (ASM) has long been studied but mainly documented in the context of genomic imprinting and X chromosome inactivation. Taking advantage of the next-generation sequencing technology, we conduct a high-throughput sequencing experiment with four prostate cell lines to survey the whole genome and identify single nucleotide polymorphisms (SNPs) with ASM. A Bayesian approach is proposed to model the counts of short reads for each SNP conditional on its genotypes of multiple subjects, leading to a posterior probability of ASM. We flag SNPs with high posterior probabilities of ASM by accounting for multiple comparisons based on posterior false discovery rates. Applying the Bayesian approach to the in-house prostate cell line data, we identify 269 SNPs as candidates of ASM. A simulation study is carried out to demonstrate the quantitative performance of the proposed approach.
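
    The flagging step described in this abstract (selecting SNPs with high posterior probabilities while controlling a posterior false discovery rate) might look roughly like the sketch below. The abstract does not give the exact rule, so this assumes a common one: rank items by posterior probability and keep the largest top-ranked set whose mean posterior null probability stays at or below a threshold. The function name `flag_by_posterior_fdr`, the threshold `alpha`, and the six example probabilities are hypothetical.

```python
def flag_by_posterior_fdr(post_probs, alpha=0.05):
    """Return indices of items flagged at posterior expected FDR <= alpha.

    post_probs[i] is the posterior probability that item i is a true signal
    (here: that SNP i shows ASM). The expected FDR of a flagged set is the
    mean of (1 - post_probs[i]) over that set. This rule is an assumption,
    not necessarily the one used in the paper.
    """
    # Rank items from most to least probable.
    order = sorted(range(len(post_probs)), key=lambda i: post_probs[i], reverse=True)
    flagged = []
    null_mass = 0.0  # running sum of posterior null probabilities
    for rank, i in enumerate(order, start=1):
        null_mass += 1.0 - post_probs[i]
        # The running mean never decreases along the ranking, so stop here.
        if null_mass / rank > alpha:
            break
        flagged.append(i)
    return flagged


# Hypothetical posterior probabilities for six SNPs.
probs = [0.99, 0.40, 0.97, 0.80, 0.999, 0.10]
print(flag_by_posterior_fdr(probs, alpha=0.05))  # -> [4, 0, 2]
```

    Adding the fourth-ranked item (posterior probability 0.80) would push the expected FDR of the set above 0.05, which is why only three indices are returned.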

  9. Efficient Posterior Probability Mapping Using Savage-Dickey Ratios

    PubMed Central

    Penny, William D.; Ridgway, Gerard R.

    2013-01-01

    Statistical Parametric Mapping (SPM) is the dominant paradigm for mass-univariate analysis of neuroimaging data. More recently, a Bayesian approach termed Posterior Probability Mapping (PPM) has been proposed as an alternative. PPM offers two advantages: (i) inferences can be made about effect size thus lending a precise physiological meaning to activated regions, (ii) regions can be declared inactive. This latter facility is most parsimoniously provided by PPMs based on Bayesian model comparisons. To date these comparisons have been implemented by an Independent Model Optimization (IMO) procedure which separately fits null and alternative models. This paper proposes a more computationally efficient procedure based on Savage-Dickey approximations to the Bayes factor, and Taylor-series approximations to the voxel-wise posterior covariance matrices. Simulations show the accuracy of this Savage-Dickey-Taylor (SDT) method to be comparable to that of IMO. Results on fMRI data show excellent agreement between SDT and IMO for second-level models, and reasonable agreement for first-level models. This Savage-Dickey test is a Bayesian analogue of the classical SPM-F and allows users to implement model comparison in a truly interactive manner. PMID:23533640
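
    As a rough illustration of the Savage-Dickey idea mentioned above: for a null model nested in an alternative by fixing a parameter at zero, the Bayes factor in favour of the null equals the posterior density at zero divided by the prior density at zero, so only the alternative model needs to be fitted. The sketch below uses a textbook conjugate case (normal mean, known variance), not the voxel-wise approximations of the paper; the function names and the data values are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def savage_dickey_bf01(ys, sigma2, tau2):
    """Bayes factor for theta = 0 versus theta ~ N(0, tau2).

    Data are assumed to be y_i ~ N(theta, sigma2) with sigma2 known.
    Only the alternative model's posterior is needed.
    """
    n = len(ys)
    post_var = 1.0 / (1.0 / tau2 + n / sigma2)
    post_mean = post_var * sum(ys) / sigma2
    return normal_pdf(0.0, post_mean, post_var) / normal_pdf(0.0, 0.0, tau2)


def direct_bf01(ys, sigma2, tau2):
    """Same Bayes factor from the two marginal likelihoods of the sample mean."""
    n = len(ys)
    ybar = sum(ys) / n
    return normal_pdf(ybar, 0.0, sigma2 / n) / normal_pdf(ybar, 0.0, tau2 + sigma2 / n)


ys = [0.3, -0.1, 0.4, 0.2]  # hypothetical effect measurements
print(savage_dickey_bf01(ys, sigma2=1.0, tau2=4.0))
print(direct_bf01(ys, sigma2=1.0, tau2=4.0))
```

    In this conjugate setting the two routes agree exactly; the density-ratio route avoids fitting the null model separately, which is the source of the computational saving the abstract refers to.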

  10. Bayesian Estimation of the DINA Model with Gibbs Sampling

    ERIC Educational Resources Information Center

    Culpepper, Steven Andrew

    2015-01-01

    A Bayesian model formulation of the deterministic inputs, noisy "and" gate (DINA) model is presented. Gibbs sampling is employed to simulate from the joint posterior distribution of item guessing and slipping parameters, subject attribute parameters, and latent class probabilities. The procedure extends concepts in Béguin and Glas,…

  11. A Bayesian Method for Evaluating and Discovering Disease Loci Associations

    PubMed Central

    Jiang, Xia; Barmada, M. Michael; Cooper, Gregory F.; Becich, Michael J.

    2011-01-01

    Background: A genome-wide association study (GWAS) typically involves examining representative SNPs in individuals from some population. A GWAS data set can concern a million SNPs and may soon concern billions. Researchers investigate the association of each SNP individually with a disease, and it is becoming increasingly commonplace to also analyze multi-SNP associations. Techniques for handling so many hypotheses include the Bonferroni correction and recently developed Bayesian methods. These methods can encounter problems. Most importantly, they are not applicable to a complex multi-locus hypothesis which has several competing hypotheses rather than only a null hypothesis. A method that computes the posterior probability of complex hypotheses is a pressing need. Methodology/Findings: We introduce the Bayesian network posterior probability (BNPP) method which addresses the difficulties. The method represents the relationship between a disease and SNPs using a directed acyclic graph (DAG) model, and computes the likelihood of such models using a Bayesian network scoring criterion. The posterior probability of a hypothesis is computed based on the likelihoods of all competing hypotheses. The BNPP can not only be used to evaluate a hypothesis that has previously been discovered or suspected, but also to discover new disease loci associations. The results of experiments using simulated and real data sets are presented. Our results concerning simulated data sets indicate that the BNPP exhibits both better evaluation and discovery performance than does a p-value based method. For the real data sets, previous findings in the literature are confirmed and additional findings are found. Conclusions/Significance: We conclude that the BNPP resolves a pressing problem by providing a way to compute the posterior probability of complex multi-locus hypotheses. A researcher can use the BNPP to determine the expected utility of investigating a hypothesis further. Furthermore, we conclude that the BNPP is a promising method for discovering disease loci associations. PMID:21853025
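
    The statement that the posterior probability of a hypothesis is computed from the likelihoods of all competing hypotheses can be sketched as a simple normalisation of prior times likelihood. This is only the generic Bayes step, not the BNPP scoring criterion itself; the hypothesis labels, the log-likelihood values, and the function name `posterior_over_hypotheses` are hypothetical, and equal priors are assumed unless others are supplied.

```python
import math


def posterior_over_hypotheses(log_likelihoods, priors=None):
    """Normalise prior x likelihood over a finite set of competing hypotheses.

    log_likelihoods maps a hypothesis label to its log marginal likelihood.
    Work is done in log space so that very small likelihoods do not underflow.
    """
    names = list(log_likelihoods)
    if priors is None:
        priors = {h: 1.0 / len(names) for h in names}  # assumed uniform prior
    log_joint = {h: log_likelihoods[h] + math.log(priors[h]) for h in names}
    peak = max(log_joint.values())
    weights = {h: math.exp(v - peak) for h, v in log_joint.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


# Hypothetical log marginal likelihoods for three competing models.
scores = {"no_association": -1050.0, "snp_A": -1046.0, "snp_A_and_snp_B": -1044.5}
posterior = posterior_over_hypotheses(scores)
for name, prob in posterior.items():
    print(f"{name}: {prob:.4f}")
```

    Because every hypothesis appears in the denominator, adding or removing a competitor changes all of the posterior probabilities, which is what distinguishes this from a test against a single null.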

  12. Screening for SNPs with Allele-Specific Methylation based on Next-Generation Sequencing Data

    PubMed Central

    Hu, Bo; Xu, Yaomin

    2013-01-01

    Allele-specific methylation (ASM) has long been studied but mainly documented in the context of genomic imprinting and X chromosome inactivation. Taking advantage of the next-generation sequencing technology, we conduct a high-throughput sequencing experiment with four prostate cell lines to survey the whole genome and identify single nucleotide polymorphisms (SNPs) with ASM. A Bayesian approach is proposed to model the counts of short reads for each SNP conditional on its genotypes of multiple subjects, leading to a posterior probability of ASM. We flag SNPs with high posterior probabilities of ASM by accounting for multiple comparisons based on posterior false discovery rates. Applying the Bayesian approach to the in-house prostate cell line data, we identify 269 SNPs as candidates of ASM. A simulation study is carried out to demonstrate the quantitative performance of the proposed approach. PMID:23710259

  13. Nested Sampling for Bayesian Model Comparison in the Context of Salmonella Disease Dynamics

    PubMed Central

    Dybowski, Richard; McKinley, Trevelyan J.; Mastroeni, Pietro; Restif, Olivier

    2013-01-01

    Understanding the mechanisms underlying the observed dynamics of complex biological systems requires the statistical assessment and comparison of multiple alternative models. Although this has traditionally been done using maximum likelihood-based methods such as Akaike's Information Criterion (AIC), Bayesian methods have gained in popularity because they provide more informative output in the form of posterior probability distributions. However, comparison between multiple models in a Bayesian framework is made difficult by the computational cost of numerical integration over large parameter spaces. A new, efficient method for the computation of posterior probabilities has recently been proposed and applied to complex problems from the physical sciences. Here we demonstrate how nested sampling can be used for inference and model comparison in biological sciences. We present a reanalysis of data from experimental infection of mice with Salmonella enterica showing the distribution of bacteria in liver cells. In addition to confirming the main finding of the original analysis, which relied on AIC, our approach provides: (a) integration across the parameter space, (b) estimation of the posterior parameter distributions (with visualisations of parameter correlations), and (c) estimation of the posterior predictive distributions for goodness-of-fit assessments of the models. The goodness-of-fit results suggest that alternative mechanistic models and a relaxation of the quasi-stationary assumption should be considered. PMID:24376528

  14. Bayesian inference of nonlinear unsteady aerodynamics from aeroelastic limit cycle oscillations

    NASA Astrophysics Data System (ADS)

    Sandhu, Rimple; Poirel, Dominique; Pettit, Chris; Khalil, Mohammad; Sarkar, Abhijit

    2016-07-01

    A Bayesian model selection and parameter estimation algorithm is applied to investigate the influence of nonlinear and unsteady aerodynamic loads on the limit cycle oscillation (LCO) of a pitching airfoil in the transitional Reynolds number regime. At small angles of attack, laminar boundary layer trailing edge separation causes negative aerodynamic damping leading to the LCO. The fluid-structure interaction of the rigid, but elastically mounted, airfoil and nonlinear unsteady aerodynamics is represented by two coupled nonlinear stochastic ordinary differential equations containing uncertain parameters and model approximation errors. Several plausible aerodynamic models with increasing complexity are proposed to describe the aeroelastic system leading to LCO. The likelihood in the posterior parameter probability density function (pdf) is available semi-analytically using the extended Kalman filter for the state estimation of the coupled nonlinear structural and unsteady aerodynamic model. The posterior parameter pdf is sampled using a parallel and adaptive Markov Chain Monte Carlo (MCMC) algorithm. The posterior probability of each model is estimated using the Chib-Jeliazkov method that directly uses the posterior MCMC samples for evidence (marginal likelihood) computation. The Bayesian algorithm is validated through a numerical study and then applied to model the nonlinear unsteady aerodynamic loads using wind-tunnel test data at various Reynolds numbers.

  15. Bayesian inference of nonlinear unsteady aerodynamics from aeroelastic limit cycle oscillations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sandhu, Rimple; Poirel, Dominique; Pettit, Chris

    2016-07-01

    A Bayesian model selection and parameter estimation algorithm is applied to investigate the influence of nonlinear and unsteady aerodynamic loads on the limit cycle oscillation (LCO) of a pitching airfoil in the transitional Reynolds number regime. At small angles of attack, laminar boundary layer trailing edge separation causes negative aerodynamic damping leading to the LCO. The fluid–structure interaction of the rigid, but elastically mounted, airfoil and nonlinear unsteady aerodynamics is represented by two coupled nonlinear stochastic ordinary differential equations containing uncertain parameters and model approximation errors. Several plausible aerodynamic models with increasing complexity are proposed to describe the aeroelastic system leading to LCO. The likelihood in the posterior parameter probability density function (pdf) is available semi-analytically using the extended Kalman filter for the state estimation of the coupled nonlinear structural and unsteady aerodynamic model. The posterior parameter pdf is sampled using a parallel and adaptive Markov Chain Monte Carlo (MCMC) algorithm. The posterior probability of each model is estimated using the Chib–Jeliazkov method that directly uses the posterior MCMC samples for evidence (marginal likelihood) computation. The Bayesian algorithm is validated through a numerical study and then applied to model the nonlinear unsteady aerodynamic loads using wind-tunnel test data at various Reynolds numbers.

  16. On the use of Bayesian Monte-Carlo in evaluation of nuclear data

    NASA Astrophysics Data System (ADS)

    De Saint Jean, Cyrille; Archier, Pascal; Privas, Edwin; Noguere, Gilles

    2017-09-01

    As model parameters, necessary ingredients of theoretical models, are not always predicted by theory, a formal mathematical framework associated with the evaluation work is needed to obtain the best set of parameters (resonance parameters, optical models, fission barrier, average width, multigroup cross sections) with Bayesian statistical inference by comparing theory to experiment. The formal rule related to this methodology is to estimate the posterior probability density function of a set of parameters by solving an equation of the following type: pdf(posterior) ∝ pdf(prior) × a likelihood function. A fitting procedure can be seen as an estimation of the posterior probability density of a set of parameters (referred to as the vector x) knowing prior information on these parameters and a likelihood which gives the probability density function of observing a data set knowing x. To solve this problem, two major paths could be taken: add approximations and hypotheses and obtain an equation to be solved numerically (minimum of a cost function or Generalized Least Squares method, referred to as GLS) or use Monte-Carlo sampling of all prior distributions and estimate the final posterior distribution. Monte Carlo methods are a natural solution for Bayesian inference problems. They avoid approximations (existing in traditional adjustment procedures based on chi-square minimization) and offer alternatives in the choice of probability density distribution for priors and likelihoods. This paper will propose the use of what we are calling Bayesian Monte Carlo (referred to as BMC in the rest of the manuscript) in the whole energy range from thermal, resonance and continuum range for all nuclear reaction models at these energies. Algorithms will be presented based on Monte-Carlo sampling and Markov chain. The objectives of BMC are to propose a reference calculation for validating the GLS calculations and approximations, to test the effects of probability density distributions and to provide a framework for finding the global minimum if several local minima exist. Application to resolved resonance, unresolved resonance and continuum evaluation as well as multigroup cross section data assimilation will be presented.
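
    The Monte-Carlo route described above (sample the prior, then estimate the posterior using the likelihood) can be sketched on a one-parameter toy problem. This is an assumption about the simplest form such a scheme could take, namely weighting prior draws by their likelihood; it is not the authors' algorithm, and the Gaussian prior, the single measurement, and all names and numbers below are hypothetical.

```python
import math
import random


def weighted_posterior_mean(prior_sampler, log_likelihood, n_samples, rng):
    """Estimate a posterior mean by weighting prior draws with the likelihood.

    Implements posterior ∝ prior × likelihood by sampling: draws come from
    the prior, and each draw is weighted by its likelihood.
    """
    draws = [prior_sampler(rng) for _ in range(n_samples)]
    log_w = [log_likelihood(x) for x in draws]
    peak = max(log_w)  # subtract the maximum to avoid underflow
    weights = [math.exp(lw - peak) for lw in log_w]
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, draws)) / total


# Toy problem: parameter x with prior N(0, 1), one measurement of x with
# Gaussian error of standard deviation 0.5 (all values hypothetical).
PRIOR_SD = 1.0
MEASURED = 1.2
MEAS_SD = 0.5


def sample_prior(rng):
    return rng.gauss(0.0, PRIOR_SD)


def log_like(x):
    return -0.5 * ((MEASURED - x) / MEAS_SD) ** 2


estimate = weighted_posterior_mean(sample_prior, log_like, 20000, random.Random(0))
# Closed-form posterior mean for this conjugate toy case, used as a check.
analytic = MEASURED * PRIOR_SD**2 / (PRIOR_SD**2 + MEAS_SD**2)
print(estimate, analytic)
```

    In this toy case a closed-form answer exists, which plays the role that the GLS result plays in the abstract: the sampling estimate serves as a reference against which an approximate method can be checked.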

  17. Using Discrete Loss Functions and Weighted Kappa for Classification: An Illustration Based on Bayesian Network Analysis

    ERIC Educational Resources Information Center

    Zwick, Rebecca; Lenaburg, Lubella

    2009-01-01

    In certain data analyses (e.g., multiple discriminant analysis and multinomial log-linear modeling), classification decisions are made based on the estimated posterior probabilities that individuals belong to each of several distinct categories. In the Bayesian network literature, this type of classification is often accomplished by assigning…

  18. A Bayesian predictive two-stage design for phase II clinical trials.

    PubMed

    Sambucini, Valeria

    2008-04-15

    In this paper, we propose a Bayesian two-stage design for phase II clinical trials, which represents a predictive version of the single threshold design (STD) recently introduced by Tan and Machin. The STD two-stage sample sizes are determined specifying a minimum threshold for the posterior probability that the true response rate exceeds a pre-specified target value and assuming that the observed response rate is slightly higher than the target. Unlike the STD, we do not refer to a fixed experimental outcome, but take into account the uncertainty about future data. In both stages, the design aims to control the probability of getting a large posterior probability that the true response rate exceeds the target value. Such a probability is expressed in terms of prior predictive distributions of the data. The performance of the design is based on the distinction between analysis and design priors, recently introduced in the literature. The properties of the method are studied when all the design parameters vary.
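
    The quantity at the centre of this design, the posterior probability that the true response rate exceeds a target value, can be computed exactly for a Beta prior with integer parameters. The sketch below assumes a Beta-binomial model with a uniform Beta(1, 1) prior by default; the abstract does not state the priors used, and the function name, the trial counts, and the target of 0.4 are hypothetical.

```python
from math import comb


def prob_rate_exceeds(successes, n, target, a=1, b=1):
    """P(response rate > target | data) under a Beta(a, b) prior.

    Requires integer a and b. The posterior is Beta(a + successes,
    b + n - successes), and for integer shapes its upper tail equals a
    binomial CDF: P(Beta(A, B) > t) = P(Binomial(A + B - 1, t) <= A - 1).
    """
    post_a = a + successes
    post_b = b + n - successes
    m = post_a + post_b - 1
    return sum(
        comb(m, j) * target**j * (1.0 - target) ** (m - j) for j in range(post_a)
    )


# Hypothetical stage-one data: 12 responders out of 20 patients, target 0.4.
print(prob_rate_exceeds(12, 20, target=0.4))
```

    A design of the kind described would compare this probability with a pre-specified minimum threshold; the predictive version additionally averages over data not yet observed, which this sketch does not attempt.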

  19. Bayesian randomized clinical trials: From fixed to adaptive design.

    PubMed

    Yin, Guosheng; Lam, Chi Kin; Shi, Haolun

    2017-08-01

    Randomized controlled studies are the gold standard for phase III clinical trials. Using α-spending functions to control the overall type I error rate, group sequential methods are well established and have been dominating phase III studies. Bayesian randomized design, on the other hand, can be viewed as a complement to, rather than a competitor of, the frequentist methods. For the fixed Bayesian design, the hypothesis testing can be cast in the posterior probability or Bayes factor framework, which has a direct link to the frequentist type I error rate. Bayesian group sequential design relies upon Bayesian decision-theoretic approaches based on backward induction, which is often computationally intensive. Compared with the frequentist approaches, Bayesian methods have several advantages. The posterior predictive probability serves as a useful and convenient tool for trial monitoring, and can be updated at any time as the data accrue during the trial. The Bayesian decision-theoretic framework possesses a direct link to the decision making in the practical setting, and can be modeled more realistically to reflect the actual cost-benefit analysis during the drug development process. Other merits include the possibility of hierarchical modeling and the use of informative priors, which would lead to a more comprehensive utilization of information from both historical and longitudinal data. From fixed to adaptive design, we focus on Bayesian randomized controlled clinical trials and make extensive comparisons with frequentist counterparts through numerical studies. Copyright © 2017 Elsevier Inc. All rights reserved.

  20. XID+: Next generation XID development

    NASA Astrophysics Data System (ADS)

    Hurley, Peter

    2017-04-01

    XID+ is a prior-based source extraction tool which carries out photometry in the Herschel SPIRE (Spectral and Photometric Imaging Receiver) maps at the positions of known sources. It uses a probabilistic Bayesian framework, which provides a natural way to include prior information, and uses the Bayesian inference tool Stan to obtain the full posterior probability distribution on flux estimates.

  1. Daniel Goodman’s empirical approach to Bayesian statistics

    USGS Publications Warehouse

    Gerrodette, Tim; Ward, Eric; Taylor, Rebecca L.; Schwarz, Lisa K.; Eguchi, Tomoharu; Wade, Paul; Himes Boor, Gina

    2016-01-01

    Bayesian statistics, in contrast to classical statistics, uses probability to represent uncertainty about the state of knowledge. Bayesian statistics has often been associated with the idea that knowledge is subjective and that a probability distribution represents a personal degree of belief. Dr. Daniel Goodman considered this viewpoint problematic for issues of public policy. He sought to ground his Bayesian approach in data, and advocated the construction of a prior as an empirical histogram of “similar” cases. In this way, the posterior distribution that results from a Bayesian analysis combined comparable previous data with case-specific current data, using Bayes’ formula. Goodman championed such a data-based approach, but he acknowledged that it was difficult in practice. If based on a true representation of our knowledge and uncertainty, Goodman argued that risk assessment and decision-making could be an exact science, despite the uncertainties. In his view, Bayesian statistics is a critical component of this science because a Bayesian analysis produces the probabilities of future outcomes. Indeed, Goodman maintained that the Bayesian machinery, following the rules of conditional probability, offered the best legitimate inference from available data. We give an example of an informative prior in a recent study of Steller sea lion spatial use patterns in Alaska.

  2. Neural Mechanisms for Integrating Prior Knowledge and Likelihood in Value-Based Probabilistic Inference

    PubMed Central

    Ting, Chih-Chung; Yu, Chia-Chen; Maloney, Laurence T.

    2015-01-01

    In Bayesian decision theory, knowledge about the probabilities of possible outcomes is captured by a prior distribution and a likelihood function. The prior reflects past knowledge and the likelihood summarizes current sensory information. The two combined (integrated) form a posterior distribution that allows estimation of the probability of different possible outcomes. In this study, we investigated the neural mechanisms underlying Bayesian integration using a novel lottery decision task in which both prior knowledge and likelihood information about reward probability were systematically manipulated on a trial-by-trial basis. Consistent with Bayesian integration, as sample size increased, subjects tended to weigh likelihood information more compared with prior information. Using fMRI in humans, we found that the medial prefrontal cortex (mPFC) correlated with the mean of the posterior distribution, a statistic that reflects the integration of prior knowledge and likelihood of reward probability. Subsequent analysis revealed that both prior and likelihood information were represented in mPFC and that the neural representations of prior and likelihood in mPFC reflected changes in the behaviorally estimated weights assigned to these different sources of information in response to changes in the environment. Together, these results establish the role of mPFC in prior-likelihood integration and highlight its involvement in representing and integrating these distinct sources of information. PMID:25632152
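
    The behavioural finding that likelihood information is weighted more heavily as sample size grows matches what a simple conjugate model predicts. The sketch below assumes a Beta prior on reward probability and binomial samples, which is an illustrative choice and not necessarily the model fitted in the study; the function name and the numbers are hypothetical.

```python
def posterior_mean_and_weight(prior_a, prior_b, successes, n):
    """Posterior mean of a reward probability and the weight on the sample.

    With a Beta(prior_a, prior_b) prior and `successes` out of `n` draws,
    the posterior mean is a weighted average of the observed rate and the
    prior mean, with weight n / (prior_a + prior_b + n) on the observed rate.
    """
    prior_mean = prior_a / (prior_a + prior_b)
    weight = n / (prior_a + prior_b + n)
    sample_rate = successes / n if n else 0.0
    post_mean = weight * sample_rate + (1.0 - weight) * prior_mean
    return post_mean, weight


# Same observed rate (60%) at increasing sample sizes, prior mean 0.2.
for n in (5, 10, 50):
    mean, w = posterior_mean_and_weight(2, 8, successes=int(0.6 * n), n=n)
    print(n, round(mean, 3), round(w, 3))
```

    The weight on the sample rises from 1/3 at n = 5 to 5/6 at n = 50, so the posterior mean moves from near the prior mean toward the observed rate.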

  3. Probabilistic Damage Characterization Using the Computationally-Efficient Bayesian Approach

    NASA Technical Reports Server (NTRS)

    Warner, James E.; Hochhalter, Jacob D.

    2016-01-01

    This work presents a computationally-efficient approach for damage determination that quantifies uncertainty in the provided diagnosis. Given strain sensor data that are polluted with measurement errors, Bayesian inference is used to estimate the location, size, and orientation of damage. This approach uses Bayes' Theorem to combine any prior knowledge an analyst may have about the nature of the damage with information provided implicitly by the strain sensor data to form a posterior probability distribution over possible damage states. The unknown damage parameters are then estimated based on samples drawn numerically from this distribution using a Markov Chain Monte Carlo (MCMC) sampling algorithm. Several modifications are made to the traditional Bayesian inference approach to provide significant computational speedup. First, an efficient surrogate model is constructed using sparse grid interpolation to replace a costly finite element model that must otherwise be evaluated for each sample drawn with MCMC. Next, the standard Bayesian posterior distribution is modified using a weighted likelihood formulation, which is shown to improve the convergence of the sampling process. Finally, a robust MCMC algorithm, Delayed Rejection Adaptive Metropolis (DRAM), is adopted to sample the probability distribution more efficiently. Numerical examples demonstrate that the proposed framework effectively provides damage estimates with uncertainty quantification and can yield orders of magnitude speedup over standard Bayesian approaches.

  4. Bayesian structural inference for hidden processes.

    PubMed

    Strelioff, Christopher C; Crutchfield, James P

    2014-04-01

    We introduce a Bayesian approach to discovering patterns in structurally complex processes. The proposed method of Bayesian structural inference (BSI) relies on a set of candidate unifilar hidden Markov model (uHMM) topologies for inference of process structure from a data series. We employ a recently developed exact enumeration of topological ε-machines. (A sequel then removes the topological restriction.) This subset of the uHMM topologies has the added benefit that inferred models are guaranteed to be ε-machines, irrespective of estimated transition probabilities. Properties of ε-machines and uHMMs allow for the derivation of analytic expressions for estimating transition probabilities, inferring start states, and comparing the posterior probability of candidate model topologies, despite process internal structure being only indirectly present in data. We demonstrate BSI's effectiveness in estimating a process's randomness, as reflected by the Shannon entropy rate, and its structure, as quantified by the statistical complexity. We also compare using the posterior distribution over candidate models and the single, maximum a posteriori model for point estimation and show that the former more accurately reflects uncertainty in estimated values. We apply BSI to in-class examples of finite- and infinite-order Markov processes, as well as to an out-of-class, infinite-state hidden process.

  5. Bayesian structural inference for hidden processes

    NASA Astrophysics Data System (ADS)

    Strelioff, Christopher C.; Crutchfield, James P.

    2014-04-01

    We introduce a Bayesian approach to discovering patterns in structurally complex processes. The proposed method of Bayesian structural inference (BSI) relies on a set of candidate unifilar hidden Markov model (uHMM) topologies for inference of process structure from a data series. We employ a recently developed exact enumeration of topological ε-machines. (A sequel then removes the topological restriction.) This subset of the uHMM topologies has the added benefit that inferred models are guaranteed to be ε-machines, irrespective of estimated transition probabilities. Properties of ε-machines and uHMMs allow for the derivation of analytic expressions for estimating transition probabilities, inferring start states, and comparing the posterior probability of candidate model topologies, despite process internal structure being only indirectly present in data. We demonstrate BSI's effectiveness in estimating a process's randomness, as reflected by the Shannon entropy rate, and its structure, as quantified by the statistical complexity. We also compare using the posterior distribution over candidate models and the single, maximum a posteriori model for point estimation and show that the former more accurately reflects uncertainty in estimated values. We apply BSI to in-class examples of finite- and infinite-order Markov processes, as well as to an out-of-class, infinite-state hidden process.

  6. Bayesian methods for outliers detection in GNSS time series

    NASA Astrophysics Data System (ADS)

    Qianqian, Zhang; Qingming, Gui

    2013-07-01

    This article is concerned with the problem of detecting outliers in GNSS time series based on Bayesian statistical theory. Firstly, a new model is proposed to simultaneously detect different types of outliers by introducing different types of classification variables corresponding to the different types of outliers; the problem of outlier detection is converted into the computation of the corresponding posterior probabilities, and an algorithm for computing the posterior probabilities based on a standard Gibbs sampler is designed. Secondly, we analyze in depth the causes of masking and swamping when detecting patches of additive outliers; an unmasking Bayesian method for detecting additive outlier patches is proposed based on an adaptive Gibbs sampler. Thirdly, the correctness of the theories and methods proposed above is illustrated by simulated data and then by analyzing real GNSS observations, such as cycle slip detection in carrier phase data. The examples illustrate that the Bayesian methods for outlier detection in GNSS time series proposed in this paper are capable of detecting not only isolated outliers but also additive outlier patches. Furthermore, they can be successfully used to process cycle slips in phase data, which solves the problem of small cycle slips.

  7. Bayesian parameter estimation for chiral effective field theory

    NASA Astrophysics Data System (ADS)

    Wesolowski, Sarah; Furnstahl, Richard; Phillips, Daniel; Klco, Natalie

    2016-09-01

    The low-energy constants (LECs) of a chiral effective field theory (EFT) interaction in the two-body sector are fit to observable data using a Bayesian parameter estimation framework. By using Bayesian prior probability distributions (pdfs), we quantify relevant physical expectations such as LEC naturalness and include them in the parameter estimation procedure. The final result is a posterior pdf for the LECs, which can be used to propagate uncertainty resulting from the fit to data to the final observable predictions. The posterior pdf also allows an empirical test of operator redundancy and other features of the potential. We compare results of our framework with other fitting procedures, interpreting the underlying assumptions in Bayesian probabilistic language. We also compare results from fitting all partial waves of the interaction simultaneously to cross section data compared to fitting to extracted phase shifts, appropriately accounting for correlations in the data. Supported in part by the NSF and DOE.

  8. A Bayesian Approach to Evaluating Consistency between Climate Model Output and Observations

    NASA Astrophysics Data System (ADS)

    Braverman, A. J.; Cressie, N.; Teixeira, J.

    2010-12-01

    Like other scientific and engineering problems that involve physical modeling of complex systems, climate models can be evaluated and diagnosed by comparing their output to observations of similar quantities. Though the global remote sensing data record is relatively short by climate research standards, these data offer opportunities to evaluate model predictions in new ways. For example, remote sensing data are spatially and temporally dense enough to provide distributional information that goes beyond simple moments to allow quantification of temporal and spatial dependence structures. In this talk, we propose a new method for exploiting these rich data sets using a Bayesian paradigm. For a collection of climate models, we calculate the posterior probabilities that its members best represent the physical system each seeks to reproduce. The posterior probability is based on the likelihood that a chosen summary statistic, computed from observations, would be obtained when the model's output is considered as a realization from a stochastic process. By exploring how posterior probabilities change with different statistics, we may paint a more quantitative and complete picture of the strengths and weaknesses of the models relative to the observations. We demonstrate our method using model output from the CMIP archive, and observations from NASA's Atmospheric Infrared Sounder.
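
    The posterior model probabilities described in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the Gaussian sampling distribution for the summary statistic, the equal prior weights, the three made-up models, and the function names. None of it is taken from the talk itself.

```python
import math

def gaussian_log_likelihood(observed, mean, sd):
    """Log density of the observed statistic under a model's assumed Gaussian."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - 0.5 * ((observed - mean) / sd) ** 2

def posterior_model_probabilities(log_likelihoods, priors=None):
    """Normalise prior x likelihood over a finite collection of models."""
    n = len(log_likelihoods)
    if priors is None:
        priors = [1.0 / n] * n  # assumption: equal prior weight per model
    log_post = [ll + math.log(p) for ll, p in zip(log_likelihoods, priors)]
    shift = max(log_post)  # subtract the max for numerical stability
    weights = [math.exp(v - shift) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical numbers: one observed summary statistic and, for each model,
# the (mean, sd) of that statistic implied by the model's output.
observed_stat = 1.2
model_summaries = [(1.0, 0.5), (2.0, 0.5), (0.0, 1.0)]

log_liks = [gaussian_log_likelihood(observed_stat, m, s) for m, s in model_summaries]
posterior = posterior_model_probabilities(log_liks)
print(posterior)
```

    Repeating the calculation with a different summary statistic changes the likelihoods and therefore the ranking, which is the kind of comparison the abstract describes.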

  9. A Bayesian Framework of Uncertainties Integration in 3D Geological Model

    NASA Astrophysics Data System (ADS)

    Liang, D.; Liu, X.

    2017-12-01

    3D geological models can describe complicated geological phenomena in an intuitive way, but their application may be limited by uncertain factors. Great progress has been made over the years, but many studies decompose the uncertainties of a geological model and analyze them source by source, ignoring the comprehensive impact of multi-source uncertainties. To evaluate the synthetical uncertainty, we choose probability distributions to quantify uncertainty and propose a Bayesian framework of uncertainties integration. With this framework, we integrate data errors, spatial randomness, and cognitive information into a posterior distribution to evaluate the synthetical uncertainty of a geological model. Uncertainties propagate and accumulate in the modeling process, and the gradual integration of multi-source uncertainty is a kind of simulation of the uncertainty propagation. Bayesian inference accomplishes uncertainty updating in the modeling process. The maximum entropy principle is effective for estimating the prior probability distribution, ensuring that the prior is subject to the constraints supplied by the given information with minimum prejudice. In the end, we obtain a posterior distribution to evaluate the synthetical uncertainty of the geological model. This posterior distribution represents the synthetical impact of all the uncertain factors on the spatial structure of the geological model. The framework provides a solution for evaluating the synthetical impact of multi-source uncertainties on a geological model and an approach to studying the uncertainty propagation mechanism in geological modeling.

  10. Bayesian enhancement two-stage design for single-arm phase II clinical trials with binary and time-to-event endpoints.

    PubMed

    Shi, Haolun; Yin, Guosheng

    2018-02-21

    Simon's two-stage design is one of the most commonly used methods in phase II clinical trials with binary endpoints. The design tests the null hypothesis that the response rate is less than an uninteresting level, versus the alternative hypothesis that the response rate is greater than a desirable target level. From a Bayesian perspective, we compute the posterior probabilities of the null and alternative hypotheses given that a promising result is declared in Simon's design. Our study reveals that because the frequentist hypothesis testing framework places its focus on the null hypothesis, a potentially efficacious treatment identified by rejecting the null under Simon's design could have only less than 10% posterior probability of attaining the desirable target level. Due to the indifference region between the null and alternative, rejecting the null does not necessarily mean that the drug achieves the desirable response level. To clarify such ambiguity, we propose a Bayesian enhancement two-stage (BET) design, which guarantees a high posterior probability of the response rate reaching the target level, while allowing for early termination and sample size saving in case that the drug's response rate is smaller than the clinically uninteresting level. Moreover, the BET design can be naturally adapted to accommodate survival endpoints. We conduct extensive simulation studies to examine the empirical performance of our design and present two trial examples as applications. © 2018, The International Biometric Society.
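
    The posterior probabilities of the null and alternative regions mentioned above can be sketched with a conjugate Beta-binomial model. This is an illustration only, not the BET design: the uniform Beta(1, 1) prior, the trial counts, the two response-rate levels, and the helper names are all hypothetical.

```python
import math

def beta_pdf(p, a, b):
    """Density of a Beta(a, b) distribution at p, for 0 < p < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p))

def beta_tail_probability(threshold, a, b, steps=20000):
    """P(p > threshold) for p ~ Beta(a, b), using midpoint-rule integration."""
    width = (1.0 - threshold) / steps
    total = sum(beta_pdf(threshold + (i + 0.5) * width, a, b) for i in range(steps))
    return total * width

# Hypothetical trial: 8 responses among 30 patients, uniform prior on the rate.
prior_a, prior_b = 1.0, 1.0
responses, patients = 8, 30
null_rate, target_rate = 0.2, 0.4  # assumed uninteresting and target levels

# Conjugate update: Beta prior + binomial data gives a Beta posterior.
post_a = prior_a + responses
post_b = prior_b + patients - responses

p_above_null = beta_tail_probability(null_rate, post_a, post_b)
p_above_target = beta_tail_probability(target_rate, post_a, post_b)
print(p_above_null, p_above_target)
```

    With these made-up counts the rate is probably above the uninteresting level yet unlikely to reach the target, which mirrors the indifference-region ambiguity the abstract points out.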

  11. Integrating probabilistic models of perception and interactive neural networks: a historical and tutorial review

    PubMed Central

    McClelland, James L.

    2013-01-01

    This article seeks to establish a rapprochement between explicitly Bayesian models of contextual effects in perception and neural network models of such effects, particularly the connectionist interactive activation (IA) model of perception. The article is in part an historical review and in part a tutorial, reviewing the probabilistic Bayesian approach to understanding perception and how it may be shaped by context, and also reviewing ideas about how such probabilistic computations may be carried out in neural networks, focusing on the role of context in interactive neural networks, in which both bottom-up and top-down signals affect the interpretation of sensory inputs. It is pointed out that connectionist units that use the logistic or softmax activation functions can exactly compute Bayesian posterior probabilities when the bias terms and connection weights affecting such units are set to the logarithms of appropriate probabilistic quantities. Bayesian concepts such as the prior, likelihood, (joint and marginal) posterior, probability matching and maximizing, and calculating vs. sampling from the posterior are all reviewed and linked to neural network computations. Probabilistic and neural network models are explicitly linked to the concept of a probabilistic generative model that describes the relationship between the underlying target of perception (e.g., the word intended by a speaker or other source of sensory stimuli) and the sensory input that reaches the perceiver for use in inferring the underlying target. It is shown how a new version of the IA model called the multinomial interactive activation (MIA) model can sample correctly from the joint posterior of a proposed generative model for perception of letters in words, indicating that interactive processing is fully consistent with principled probabilistic computation. Ways in which these computations might be realized in real neural systems are also considered. PMID:23970868
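
    The claim that softmax units compute exact posteriors when biases and weights are logarithms of probabilistic quantities can be checked numerically. The sketch below assumes a single active input feature and two made-up hypotheses; the priors, likelihoods, and variable names are hypothetical and are not the article's model.

```python
import math

def softmax(net_inputs):
    """Softmax activation over a pool of mutually exclusive units."""
    shift = max(net_inputs)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in net_inputs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical generative model: prior over two words and the probability
# that each word produces the single observed feature.
priors = {"cat": 0.6, "car": 0.4}
likelihoods = {"cat": 0.2, "car": 0.7}
hypotheses = list(priors)

# Network view: bias = log prior, weight from the active feature = log likelihood.
net = [math.log(priors[h]) + math.log(likelihoods[h]) for h in hypotheses]
unit_activations = softmax(net)

# Direct application of Bayes' rule for comparison.
evidence = sum(priors[h] * likelihoods[h] for h in hypotheses)
bayes_posterior = [priors[h] * likelihoods[h] / evidence for h in hypotheses]

print(unit_activations, bayes_posterior)
```

    The two lists agree up to floating-point rounding because exponentiating a sum of logs recovers the product of prior and likelihood, and the softmax denominator plays the role of the evidence term.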

  12. Integrating probabilistic models of perception and interactive neural networks: a historical and tutorial review.

    PubMed

    McClelland, James L

    2013-01-01

    This article seeks to establish a rapprochement between explicitly Bayesian models of contextual effects in perception and neural network models of such effects, particularly the connectionist interactive activation (IA) model of perception. The article is in part an historical review and in part a tutorial, reviewing the probabilistic Bayesian approach to understanding perception and how it may be shaped by context, and also reviewing ideas about how such probabilistic computations may be carried out in neural networks, focusing on the role of context in interactive neural networks, in which both bottom-up and top-down signals affect the interpretation of sensory inputs. It is pointed out that connectionist units that use the logistic or softmax activation functions can exactly compute Bayesian posterior probabilities when the bias terms and connection weights affecting such units are set to the logarithms of appropriate probabilistic quantities. Bayesian concepts such as the prior, likelihood, (joint and marginal) posterior, probability matching and maximizing, and calculating vs. sampling from the posterior are all reviewed and linked to neural network computations. Probabilistic and neural network models are explicitly linked to the concept of a probabilistic generative model that describes the relationship between the underlying target of perception (e.g., the word intended by a speaker or other source of sensory stimuli) and the sensory input that reaches the perceiver for use in inferring the underlying target. It is shown how a new version of the IA model called the multinomial interactive activation (MIA) model can sample correctly from the joint posterior of a proposed generative model for perception of letters in words, indicating that interactive processing is fully consistent with principled probabilistic computation. Ways in which these computations might be realized in real neural systems are also considered.

  13. To P or Not to P: Backing Bayesian Statistics.

    PubMed

    Buchinsky, Farrel J; Chadha, Neil K

    2017-12-01

    In biomedical research, it is imperative to differentiate chance variation from truth before we generalize what we see in a sample of subjects to the wider population. For decades, we have relied on null hypothesis significance testing, where we calculate P values for our data to decide whether to reject a null hypothesis. This methodology is subject to substantial misinterpretation and errant conclusions. Instead of working backward by calculating the probability of our data if the null hypothesis were true, Bayesian statistics allow us instead to work forward, calculating the probability of our hypothesis given the available data. This methodology gives us a mathematical means of incorporating our "prior probabilities" from previous study data (if any) to produce new "posterior probabilities." Bayesian statistics tell us how confidently we should believe what we believe. It is time to embrace and encourage their use in our otolaryngology research.

  14. Supernova Cosmology Inference with Probabilistic Photometric Redshifts (SCIPPR)

    NASA Astrophysics Data System (ADS)

    Peters, Christina; Malz, Alex; Hlozek, Renée

    2018-01-01

    The Bayesian Estimation Applied to Multiple Species (BEAMS) framework employs probabilistic supernova type classifications to do photometric SN cosmology. This work extends BEAMS to replace high-confidence spectroscopic redshifts with photometric redshift probability density functions, a capability that will be essential in the era of the Large Synoptic Survey Telescope and other next-generation photometric surveys where it will not be possible to perform spectroscopic follow up on every SN. We present the Supernova Cosmology Inference with Probabilistic Photometric Redshifts (SCIPPR) Bayesian hierarchical model for constraining the cosmological parameters from photometric lightcurves and host galaxy photometry, which includes selection effects and is extensible to uncertainty in the redshift-dependent supernova type proportions. We create a pair of realistic mock catalogs of joint posteriors over supernova type, redshift, and distance modulus informed by photometric supernova lightcurves and over redshift from simulated host galaxy photometry. We perform inference under our model to obtain a joint posterior probability distribution over the cosmological parameters and compare our results with other methods, namely: a spectroscopic subset, a subset of high probability photometrically classified supernovae, and reducing the photometric redshift probability to a single measurement and error bar.

  15. Multiple model cardinalized probability hypothesis density filter

    NASA Astrophysics Data System (ADS)

    Georgescu, Ramona; Willett, Peter

    2011-09-01

    The Probability Hypothesis Density (PHD) filter propagates the first-moment approximation to the multi-target Bayesian posterior distribution while the Cardinalized PHD (CPHD) filter propagates both the posterior likelihood of (an unlabeled) target state and the posterior probability mass function of the number of targets. Extensions of the PHD filter to the multiple model (MM) framework have been published and were implemented either with a Sequential Monte Carlo or a Gaussian Mixture approach. In this work, we introduce the multiple model version of the more elaborate CPHD filter. We present the derivation of the prediction and update steps of the MMCPHD particularized for the case of two target motion models and proceed to show that in the case of a single model, the new MMCPHD equations reduce to the original CPHD equations.

  16. Assessment of accident severity in the construction industry using the Bayesian theorem.

    PubMed

    Alizadeh, Seyed Shamseddin; Mortazavi, Seyed Bagher; Mehdi Sepehri, Mohammad

    2015-01-01

    Construction is a major source of employment in many countries. In construction, workers perform a great diversity of activities, each one with a specific associated risk. The aim of this paper is to identify workers who are at risk of accidents with severe consequences and classify these workers to determine appropriate control measures. We defined 48 groups of workers and used the Bayesian theorem to estimate posterior probabilities about the severity of accidents at the level of individuals in the construction sector. First, the posterior probabilities of injuries based on four variables were provided. Then the probabilities of injury for 48 groups of workers were determined. With regard to marginal frequency of injury, slight injury (0.856), fatal injury (0.086) and severe injury (0.058) had the highest probability of occurrence. It was observed that workers with <1 year's work experience (0.168) had the highest probability of injury occurrence. The first group of workers, who were extensively exposed to risk of severe and fatal accidents, involved workers ≥ 50 years old, married, with 1-5 years' work experience, who had no past accident experience. The findings provide a direction for more effective safety strategies and occupational accident prevention and emergency programmes.

  17. Inference of emission rates from multiple sources using Bayesian probability theory.

    PubMed

    Yee, Eugene; Flesch, Thomas K

    2010-03-01

    The determination of atmospheric emission rates from multiple sources using inversion (regularized least-squares or best-fit technique) is known to be very susceptible to measurement and model errors in the problem, rendering the solution unusable. In this paper, a new perspective is offered for this problem: namely, it is argued that the problem should be addressed as one of inference rather than inversion. Towards this objective, Bayesian probability theory is used to estimate the emission rates from multiple sources. The posterior probability distribution for the emission rates is derived, accounting fully for the measurement errors in the concentration data and the model errors in the dispersion model used to interpret the data. The Bayesian inferential methodology for emission rate recovery is validated against real dispersion data, obtained from a field experiment involving various source-sensor geometries (scenarios) consisting of four synthetic area sources and eight concentration sensors. The recovery of discrete emission rates from three different scenarios obtained using Bayesian inference and singular value decomposition inversion are compared and contrasted.

  18. Comparison of sampling techniques for Bayesian parameter estimation

    NASA Astrophysics Data System (ADS)

    Allison, Rupert; Dunkley, Joanna

    2014-02-01

    The posterior probability distribution for a set of model parameters encodes all that the data have to tell us in the context of a given model; it is the fundamental quantity for Bayesian parameter estimation. In order to infer the posterior probability distribution we have to decide how to explore parameter space. Here we compare three prescriptions for how parameter space is navigated, discussing their relative merits. We consider Metropolis-Hastings sampling, nested sampling and affine-invariant ensemble Markov chain Monte Carlo (MCMC) sampling. We focus on their performance on toy-model Gaussian likelihoods and on a real-world cosmological data set. We outline the sampling algorithms themselves and elaborate on performance diagnostics such as convergence time, scope for parallelization, dimensional scaling, requisite tunings and suitability for non-Gaussian distributions. We find that nested sampling delivers high-fidelity estimates for posterior statistics at low computational cost, and should be adopted in favour of Metropolis-Hastings in many cases. Affine-invariant MCMC is competitive when computing clusters can be utilized for massive parallelization. Affine-invariant MCMC and existing extensions to nested sampling naturally probe multimodal and curving distributions.
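
    Of the three samplers compared, Metropolis-Hastings is the simplest to write down. The following is a minimal random-walk sketch on a one-dimensional toy Gaussian target; the target parameters, step size, chain length, and burn-in are arbitrary assumptions, and the code is not the authors' implementation.

```python
import math
import random

def log_gaussian(x, mean=2.0, sd=0.5):
    """Unnormalised log density of the toy target (assumed mean and sd)."""
    return -0.5 * ((x - mean) / sd) ** 2

def metropolis_hastings(log_target, start, step_size, n_samples, rng):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    samples = []
    current = start
    current_logp = log_target(current)
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step_size)
        proposal_logp = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(current)).
        accept_prob = math.exp(min(0.0, proposal_logp - current_logp))
        if rng.random() < accept_prob:
            current, current_logp = proposal, proposal_logp
        samples.append(current)  # a rejected move repeats the current state
    return samples

rng = random.Random(0)
chain = metropolis_hastings(log_gaussian, start=0.0, step_size=1.0,
                            n_samples=20000, rng=rng)
kept = chain[2000:]  # discard an assumed burn-in period

post_mean = sum(kept) / len(kept)
post_sd = math.sqrt(sum((x - post_mean) ** 2 for x in kept) / len(kept))
print(post_mean, post_sd)
```

    The step size is the "requisite tuning" in this setting: too small and the chain moves slowly, too large and most proposals are rejected.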

  19. Robust Bayesian Experimental Design for Conceptual Model Discrimination

    NASA Astrophysics Data System (ADS)

    Pham, H. V.; Tsai, F. T. C.

    2015-12-01

    A robust Bayesian optimal experimental design under uncertainty is presented to provide firm information for model discrimination, given the least number of pumping wells and observation wells. Firm information is the maximum information about a system that can be guaranteed from an experimental design. The design is based on the Box-Hill expected entropy decrease (EED) before and after the experiment design and the Bayesian model averaging (BMA) framework. A max-min programming is introduced to choose the robust design that maximizes the minimal Box-Hill EED, subject to the constraint that the highest expected posterior model probability satisfies a desired probability threshold. The EED is calculated by the Gauss-Hermite quadrature. The BMA method is used to predict future observations and to quantify future observation uncertainty arising from conceptual and parametric uncertainties in calculating EED. A Monte Carlo approach is adopted to quantify the uncertainty in the posterior model probabilities. The optimal experimental design is tested on a synthetic 5-layer anisotropic confined aquifer. Nine conceptual groundwater models are constructed due to uncertain geological architecture and boundary condition. High-performance computing is used to enumerate all possible design solutions in order to identify the most plausible groundwater model. Results highlight the impacts of scedasticity in future observation data as well as uncertainty sources on potential pumping and observation locations.

  20. Bayesian soft X-ray tomography using non-stationary Gaussian Processes

    NASA Astrophysics Data System (ADS)

    Li, Dong; Svensson, J.; Thomsen, H.; Medina, F.; Werner, A.; Wolf, R.

    2013-08-01

    In this study, a Bayesian based non-stationary Gaussian Process (GP) method for the inference of soft X-ray emissivity distribution along with its associated uncertainties has been developed. For the investigation of equilibrium condition and fast magnetohydrodynamic behaviors in nuclear fusion plasmas, it is of importance to infer, especially in the plasma center, spatially resolved soft X-ray profiles from a limited number of noisy line integral measurements. For this ill-posed inversion problem, Bayesian probability theory can provide a posterior probability distribution over all possible solutions under given model assumptions. Specifically, the use of a non-stationary GP to model the emission allows the model to adapt to the varying length scales of the underlying diffusion process. In contrast to other conventional methods, the prior regularization is realized in a probability form, which enhances the capability of uncertainty analysis; as a consequence, scientists concerned with the reliability of their results will benefit from it. Under the assumption of normally distributed noise, the posterior distribution evaluated at a discrete number of points becomes a multivariate normal distribution whose mean and covariance are analytically available, making inversions and calculation of uncertainty fast. Additionally, the hyper-parameters embedded in the model assumption can be optimized through a Bayesian Occam's Razor formalism and thereby automatically adjust the model complexity. This method is shown to produce convincing reconstructions and good agreements with independently calculated results from the Maximum Entropy and Equilibrium-Based Iterative Tomography Algorithm methods.
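
    The statement that the posterior is a multivariate normal with analytic mean and covariance follows from the standard linear-Gaussian result, written out below. The notation is an assumption introduced here, not the paper's: $f$ is the emissivity at the discrete points with GP prior mean $\mu_f$ and covariance $K$, $A$ is the matrix of line-integral weights, and $\Sigma_d$ is the noise covariance of the measurements $d$.

```latex
% Assumed model: linear measurements with Gaussian noise and a Gaussian (GP) prior
d = A f + \varepsilon, \qquad
\varepsilon \sim \mathcal{N}(0, \Sigma_d), \qquad
f \sim \mathcal{N}(\mu_f, K)

% Resulting posterior over the discretised emissivity
f \mid d \sim \mathcal{N}(\mu_{\mathrm{post}}, \Sigma_{\mathrm{post}})

\mu_{\mathrm{post}} = \mu_f + K A^{\top} \left( A K A^{\top} + \Sigma_d \right)^{-1} \left( d - A \mu_f \right)

\Sigma_{\mathrm{post}} = K - K A^{\top} \left( A K A^{\top} + \Sigma_d \right)^{-1} A K
```

    Both expressions require only one linear solve against a matrix whose size is the number of measurements, which is consistent with the abstract's remark that inversion and uncertainty calculation are fast.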

  1. Bayesian soft X-ray tomography using non-stationary Gaussian Processes.

    PubMed

    Li, Dong; Svensson, J; Thomsen, H; Medina, F; Werner, A; Wolf, R

    2013-08-01

    In this study, a Bayesian based non-stationary Gaussian Process (GP) method for the inference of soft X-ray emissivity distribution along with its associated uncertainties has been developed. For the investigation of equilibrium condition and fast magnetohydrodynamic behaviors in nuclear fusion plasmas, it is of importance to infer, especially in the plasma center, spatially resolved soft X-ray profiles from a limited number of noisy line integral measurements. For this ill-posed inversion problem, Bayesian probability theory can provide a posterior probability distribution over all possible solutions under given model assumptions. Specifically, the use of a non-stationary GP to model the emission allows the model to adapt to the varying length scales of the underlying diffusion process. In contrast to other conventional methods, the prior regularization is realized in a probability form, which enhances the capability of uncertainty analysis; as a consequence, scientists concerned with the reliability of their results will benefit from it. Under the assumption of normally distributed noise, the posterior distribution evaluated at a discrete number of points becomes a multivariate normal distribution whose mean and covariance are analytically available, making inversions and calculation of uncertainty fast. Additionally, the hyper-parameters embedded in the model assumption can be optimized through a Bayesian Occam's Razor formalism and thereby automatically adjust the model complexity. This method is shown to produce convincing reconstructions and good agreements with independently calculated results from the Maximum Entropy and Equilibrium-Based Iterative Tomography Algorithm methods.

  2. Exoplanet Biosignatures: A Framework for Their Assessment.

    PubMed

    Catling, David C; Krissansen-Totton, Joshua; Kiang, Nancy Y; Crisp, David; Robinson, Tyler D; DasSarma, Shiladitya; Rushby, Andrew J; Del Genio, Anthony; Bains, William; Domagal-Goldman, Shawn

    2018-04-20

    Finding life on exoplanets from telescopic observations is an ultimate goal of exoplanet science. Life produces gases and other substances, such as pigments, which can have distinct spectral or photometric signatures. Whether or not life is found with future data must be expressed with probabilities, requiring a framework of biosignature assessment. We present a framework in which we advocate using biogeochemical "Exo-Earth System" models to simulate potential biosignatures in spectra or photometry. Given actual observations, simulations are used to find the Bayesian likelihoods of those data occurring for scenarios with and without life. The latter includes "false positives" wherein abiotic sources mimic biosignatures. Prior knowledge of factors influencing planetary inhabitation, including previous observations, is combined with the likelihoods to give the Bayesian posterior probability of life existing on a given exoplanet. Four components of observation and analysis are necessary. (1) Characterization of stellar (e.g., age and spectrum) and exoplanetary system properties, including "external" exoplanet parameters (e.g., mass and radius), to determine an exoplanet's suitability for life. (2) Characterization of "internal" exoplanet parameters (e.g., climate) to evaluate habitability. (3) Assessment of potential biosignatures within the environmental context (components 1-2), including corroborating evidence. (4) Exclusion of false positives. We propose that resulting posterior Bayesian probabilities of life's existence map to five confidence levels, ranging from "very likely" (90-100%) to "very unlikely" (<10%) inhabited. Key Words: Bayesian statistics-Biosignatures-Drake equation-Exoplanets-Habitability-Planetary science. Astrobiology 18, xxx-xxx.

  3. An agglomerative hierarchical clustering approach to visualisation in Bayesian clustering problems

    PubMed Central

    Dawson, Kevin J.; Belkhir, Khalid

    2009-01-01

    Clustering problems (including the clustering of individuals into outcrossing populations, hybrid generations, full-sib families and selfing lines) have recently received much attention in population genetics. In these clustering problems, the parameter of interest is a partition of the set of sampled individuals: the sample partition. In a fully Bayesian approach to clustering problems of this type, our knowledge about the sample partition is represented by a probability distribution on the space of possible sample partitions. Since the number of possible partitions grows very rapidly with the sample size, we cannot visualise this probability distribution in its entirety, unless the sample is very small. As a solution to this visualisation problem, we recommend using an agglomerative hierarchical clustering algorithm, which we call the exact linkage algorithm. This algorithm is a special case of the maximin clustering algorithm that we introduced previously. The exact linkage algorithm is now implemented in our software package Partition View. The exact linkage algorithm takes the posterior co-assignment probabilities as input, and yields as output a rooted binary tree or, more generally, a forest of such trees. Each node of this forest defines a set of individuals, and the node height is the posterior co-assignment probability of this set. This provides a useful visual representation of the uncertainty associated with the assignment of individuals to categories. It is also a useful starting point for a more detailed exploration of the posterior distribution in terms of the co-assignment probabilities. PMID:19337306
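
    The input to the exact linkage algorithm, the posterior co-assignment probabilities, can be estimated from posterior draws of the sample partition. The sketch below covers only that estimation step and only for pairs of individuals; it is not the exact linkage algorithm or Partition View, and the draws and function name are hypothetical.

```python
from itertools import combinations

def co_assignment_probabilities(sampled_partitions):
    """Fraction of sampled partitions in which each pair shares a cluster."""
    n_samples = len(sampled_partitions)
    n_individuals = len(sampled_partitions[0])
    counts = {pair: 0 for pair in combinations(range(n_individuals), 2)}
    for labels in sampled_partitions:
        for i, j in counts:
            if labels[i] == labels[j]:
                counts[(i, j)] += 1
    return {pair: c / n_samples for pair, c in counts.items()}

# Hypothetical posterior draws for four individuals: each list assigns a
# cluster label to every individual (labels only matter within one draw).
draws = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]

pairwise = co_assignment_probabilities(draws)
for pair, prob in sorted(pairwise.items()):
    print(pair, prob)
```

    Comparing labels within a draw rather than across draws sidesteps label switching, since only co-membership is counted.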

  4. Bayesian feature selection for high-dimensional linear regression via the Ising approximation with applications to genomics.

    PubMed

    Fisher, Charles K; Mehta, Pankaj

    2015-06-01

    Feature selection, identifying a subset of variables that are relevant for predicting a response, is an important and challenging component of many methods in statistics and machine learning. Feature selection is especially difficult and computationally intensive when the number of variables approaches or exceeds the number of samples, as is often the case for many genomic datasets. Here, we introduce a new approach, the Bayesian Ising Approximation (BIA), to rapidly calculate posterior probabilities for feature relevance in L2 penalized linear regression. In the regime where the regression problem is strongly regularized by the prior, we show that computing the marginal posterior probabilities for features is equivalent to computing the magnetizations of an Ising model with weak couplings. Using a mean field approximation, we show it is possible to rapidly compute the feature selection path described by the posterior probabilities as a function of the L2 penalty. We present simulations and analytical results illustrating the accuracy of the BIA on some simple regression problems. Finally, we demonstrate the applicability of the BIA to high-dimensional regression by analyzing a gene expression dataset with nearly 30 000 features. These results also highlight the impact of correlations between features on Bayesian feature selection. An implementation of the BIA in C++, along with data for reproducing our gene expression analyses, are freely available at http://physics.bu.edu/∼pankajm/BIACode. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  5. Bayes factor and posterior probability: Complementary statistical evidence to p-value.

    PubMed

    Lin, Ruitao; Yin, Guosheng

    2015-09-01

    As a convention, a p-value is often computed in hypothesis testing and compared with the nominal level of 0.05 to determine whether to reject the null hypothesis. Although the smaller the p-value, the more significant the statistical test, it is difficult to interpret the p-value on a probability scale and to quantify it as the strength of the data against the null hypothesis. In contrast, the Bayesian posterior probability of the null hypothesis has an explicit interpretation of how strongly the data support the null. We make a comparison of the p-value and the posterior probability by considering a recent clinical trial. The results show that even when we reject the null hypothesis, there is still a substantial probability (around 20%) that the null is true. Not only should we examine whether the data would have rarely occurred under the null hypothesis, but we also need to know whether the data would be rare under the alternative. As a result, the p-value only provides one side of the information, for which the Bayes factor and posterior probability may offer complementary evidence. Copyright © 2015 Elsevier Inc. All rights reserved.
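
    The link between a Bayes factor and the posterior probability of the null is a one-line odds calculation, sketched below. The function name, the equal prior odds, and the Bayes factor of 0.25 are assumptions chosen for illustration; they are not values reported for the clinical trial in the abstract.

```python
def posterior_null_probability(bayes_factor_01, prior_null=0.5):
    """P(H0 | data) from BF01 = P(data | H0) / P(data | H1) and the prior P(H0)."""
    prior_odds = prior_null / (1.0 - prior_null)
    posterior_odds = bayes_factor_01 * prior_odds  # Bayes' rule in odds form
    return posterior_odds / (1.0 + posterior_odds)

# Hypothetical case: the data are four times more probable under the
# alternative than under the null, and both hypotheses start equally likely.
p_null = posterior_null_probability(bayes_factor_01=0.25, prior_null=0.5)
print(round(p_null, 3))
```

    Under these assumed inputs the null keeps a posterior probability of one in five even though the data favour the alternative, the same order of magnitude as the figure the abstract describes as substantial.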

  6. CytoBayesJ: software tools for Bayesian analysis of cytogenetic radiation dosimetry data.

    PubMed

    Ainsbury, Elizabeth A; Vinnikov, Volodymyr; Puig, Pedro; Maznyk, Nataliya; Rothkamm, Kai; Lloyd, David C

    2013-08-30

    A number of authors have suggested that a Bayesian approach may be most appropriate for analysis of cytogenetic radiation dosimetry data. In the Bayesian framework, probability of an event is described in terms of previous expectations and uncertainty. Previously existing, or prior, information is used in combination with experimental results to infer probabilities or the likelihood that a hypothesis is true. It has been shown that the Bayesian approach increases both the accuracy and quality assurance of radiation dose estimates. New software entitled CytoBayesJ has been developed with the aim of bringing Bayesian analysis to cytogenetic biodosimetry laboratory practice. CytoBayesJ takes a number of Bayesian or 'Bayesian like' methods that have been proposed in the literature and presents them to the user in the form of simple user-friendly tools, including testing for the most appropriate model for distribution of chromosome aberrations and calculations of posterior probability distributions. The individual tools are described in detail and relevant examples of the use of the methods and the corresponding CytoBayesJ software tools are given. In this way, the suitability of the Bayesian approach to biological radiation dosimetry is highlighted and its wider application encouraged by providing a user-friendly software interface and manual in English and Russian. Copyright © 2013 Elsevier B.V. All rights reserved.

  7. The estimation of lower refractivity uncertainty from radar sea clutter using the Bayesian—MCMC method

    NASA Astrophysics Data System (ADS)

    Sheng, Zheng

    2013-02-01

    The estimation of lower atmospheric refractivity from radar sea clutter (RFC) is a complicated nonlinear optimization problem. This paper deals with the RFC problem in a Bayesian framework. It uses the unbiased Markov Chain Monte Carlo (MCMC) sampling technique, which can provide accurate posterior probability distributions of the estimated refractivity parameters by using an electromagnetic split-step fast Fourier transform terrain parabolic equation propagation model within a Bayesian inversion framework. In contrast to the global optimization algorithm, the Bayesian-MCMC method can obtain not only the approximate solutions but also the probability distributions of the solutions, that is, uncertainty analyses of the solutions. The Bayesian-MCMC algorithm is applied to simulated radar sea-clutter data and to real radar sea-clutter data. Reference data are assumed to be simulation data and refractivity profiles are obtained using a helicopter. The inversion algorithm is assessed (i) by comparing the estimated refractivity profiles from the assumed simulation and the helicopter sounding data, and (ii) by examining the one-dimensional (1D) and two-dimensional (2D) posterior probability distributions of the solutions.

  8. Identification of transmissivity fields using a Bayesian strategy and perturbative approach

    NASA Astrophysics Data System (ADS)

    Zanini, Andrea; Tanda, Maria Giovanna; Woodbury, Allan D.

    2017-10-01

    The paper deals with the crucial problem of groundwater parameter estimation, which is the basis for efficient modeling and reclamation activities. A hierarchical Bayesian approach is developed: it uses Akaike's Bayesian Information Criterion to estimate the hyperparameters (related to the chosen covariance model) and to quantify the unknown noise variance. The transmissivity identification proceeds in two steps: the first, called empirical Bayesian interpolation, uses Y* (Y = lnT) observations to interpolate Y values on a specified grid; the second, called empirical Bayesian update, improves the previous Y estimate through the addition of hydraulic head observations. The relationship between the head and lnT has been linearized through a perturbative solution of the flow equation. To test the proposed approach, synthetic aquifers from the literature have been considered. The aquifers in question contain a variety of boundary conditions (both Dirichlet and Neumann type) and scales of heterogeneity (σ_Y² = 1.0 and σ_Y² = 5.3). The estimated transmissivity fields were compared to the true one. The joint use of Y* and head measurements improves the estimation of Y for both degrees of heterogeneity. Even if the variance of the strong transmissivity field can be considered high for the application of the perturbative approach, the results show the same order of approximation as the non-linear methods proposed in the literature. The procedure allows the posterior probability distribution of the target quantities to be computed and the uncertainty in the model prediction to be quantified. Bayesian updating has advantages related to both the Monte Carlo (MC) and non-MC approaches: like MC methods, it allows the posterior probability distribution of the target quantities to be computed directly, and like non-MC methods it has computational times on the order of seconds.

  9. A bayesian analysis for identifying DNA copy number variations using a compound poisson process.

    PubMed

    Chen, Jie; Yiğiter, Ayten; Wang, Yu-Ping; Deng, Hong-Wen

    2010-01-01

    To study chromosomal aberrations that may lead to cancer formation or genetic diseases, the array-based Comparative Genomic Hybridization (aCGH) technique is often used for detecting DNA copy number variants (CNVs). Various methods have been developed for obtaining CNV information from aCGH data. However, most of these methods make use of the log-intensity ratios in aCGH data without taking advantage of other information such as the DNA probe (e.g., biomarker) positions/distances contained in the data. Motivated by the specific features of aCGH data, we developed a novel method that takes into account the estimation of a change point or locus of the CNV in aCGH data with its associated biomarker position on the chromosome using a compound Poisson process. We used a Bayesian approach to derive the posterior probability for the estimation of the CNV locus. To detect loci of multiple CNVs in the data, a sliding window process combined with our derived Bayesian posterior probability was proposed. To evaluate the performance of the method in the estimation of the CNV locus, we first performed simulation studies. Finally, we applied our approach to real data from aCGH experiments, demonstrating its applicability.

  10. Traffic Video Image Segmentation Model Based on Bayesian and Spatio-Temporal Markov Random Field

    NASA Astrophysics Data System (ADS)

    Zhou, Jun; Bao, Xu; Li, Dawei; Yin, Yongwen

    2017-10-01

    Traffic video is a dynamic image sequence whose background and foreground change continually, which leads to occlusion. In this case, general-purpose methods have difficulty producing an accurate image segmentation. A segmentation algorithm based on Bayesian inference and a Spatio-Temporal Markov Random Field (ST-MRF) is put forward. It builds energy function models of the observation field and the label field for a motion image sequence with the Markov property. Then, according to Bayes' rule, it uses the interaction of the label field and the observation field, that is, the relationship between the label field's prior probability and the observation field's likelihood, to obtain the maximum a posteriori estimate of the label field's parameters, and uses the ICM model to extract the moving object, which completes the segmentation. Finally, the ST-MRF segmentation method and the Bayesian method combined with ST-MRF were analyzed. Experimental results show that the Bayesian method combined with ST-MRF has a shorter segmentation time and a smaller computing workload than ST-MRF alone, and that it also achieves a better segmentation effect, especially in heavy-traffic dynamic scenes.

  11. A Gibbs sampler for Bayesian analysis of site-occupancy data

    USGS Publications Warehouse

    Dorazio, Robert M.; Rodriguez, Daniel Taylor

    2012-01-01

    1. A Bayesian analysis of site-occupancy data containing covariates of species occurrence and species detection probabilities is usually completed using Markov chain Monte Carlo methods in conjunction with software programs that can implement those methods for any statistical model, not just site-occupancy models. Although these software programs are quite flexible, considerable experience is often required to specify a model and to initialize the Markov chain so that summaries of the posterior distribution can be estimated efficiently and accurately. 2. As an alternative to these programs, we develop a Gibbs sampler for Bayesian analysis of site-occupancy data that include covariates of species occurrence and species detection probabilities. This Gibbs sampler is based on a class of site-occupancy models in which probabilities of species occurrence and detection are specified as probit-regression functions of site- and survey-specific covariate measurements. 3. To illustrate the Gibbs sampler, we analyse site-occupancy data of the blue hawker, Aeshna cyanea (Odonata, Aeshnidae), a common dragonfly species in Switzerland. Our analysis includes a comparison of results based on Bayesian and classical (non-Bayesian) methods of inference. We also provide code (based on the R software program) for conducting Bayesian and classical analyses of site-occupancy data.

  12. Effective Online Bayesian Phylogenetics via Sequential Monte Carlo with Guided Proposals

    PubMed Central

    Fourment, Mathieu; Claywell, Brian C; Dinh, Vu; McCoy, Connor; Matsen IV, Frederick A; Darling, Aaron E

    2018-01-01

    Modern infectious disease outbreak surveillance produces continuous streams of sequence data which require phylogenetic analysis as data arrives. Current software packages for Bayesian phylogenetic inference are unable to quickly incorporate new sequences as they become available, making them less useful for dynamically unfolding evolutionary stories. This limitation can be addressed by applying a class of Bayesian statistical inference algorithms called sequential Monte Carlo (SMC) to conduct online inference, wherein new data can be continuously incorporated to update the estimate of the posterior probability distribution. In this article, we describe and evaluate several different online phylogenetic sequential Monte Carlo (OPSMC) algorithms. We show that proposing new phylogenies with a density similar to the Bayesian prior suffers from poor performance, and we develop “guided” proposals that better match the proposal density to the posterior. Furthermore, we show that the simplest guided proposals can exhibit pathological behavior in some situations, leading to poor results, and that the situation can be resolved by heating the proposal density. The results demonstrate that relative to the widely used MCMC-based algorithm implemented in MrBayes, the total time required to compute a series of phylogenetic posteriors as sequences arrive can be significantly reduced by the use of OPSMC, without incurring a significant loss in accuracy. PMID:29186587

  13. Fast Bayesian approach for modal identification using free vibration data, Part I - Most probable value

    NASA Astrophysics Data System (ADS)

    Zhang, Feng-Liang; Ni, Yan-Chun; Au, Siu-Kui; Lam, Heung-Fai

    2016-03-01

    The identification of modal properties from field testing of civil engineering structures is becoming economically viable, thanks to the advent of modern sensor and data acquisition technology. Its demand is driven by innovative structural designs and increased performance requirements of dynamic-prone structures that call for a close cross-checking or monitoring of their dynamic properties and responses. Existing instrumentation capabilities and modal identification techniques allow structures to be tested under free vibration, forced vibration (known input) or ambient vibration (unknown broadband loading). These tests can be considered complementary rather than competing as they are based on different modeling assumptions in the identification model and have different implications on costs and benefits. Uncertainty arises naturally in the dynamic testing of structures due to measurement noise, sensor alignment error, modeling error, etc. This is especially relevant in field vibration tests because the test condition in the field environment can hardly be controlled. In this work, a Bayesian statistical approach is developed for modal identification using the free vibration response of structures. A frequency domain formulation is proposed that makes statistical inference based on the Fast Fourier Transform (FFT) of the data in a selected frequency band. This significantly simplifies the identification model because only the modes dominating the frequency band need to be included. It also legitimately ignores the information in the excluded frequency bands that are either irrelevant or difficult to model, thereby significantly reducing modeling error risk. The posterior probability density function (PDF) of the modal parameters is derived rigorously from modeling assumptions and Bayesian probability logic. Computational difficulties associated with calculating the posterior statistics, including the most probable value (MPV) and the posterior covariance matrix, are addressed. Fast computational algorithms for determining the MPV are proposed so that the method can be practically implemented. In the companion paper (Part II), analytical formulae are derived for the posterior covariance matrix so that it can be evaluated without resorting to finite difference method. The proposed method is verified using synthetic data. It is also applied to modal identification of full-scale field structures.

  14. Bayesian seismic inversion based on rock-physics prior modeling for the joint estimation of acoustic impedance, porosity and lithofacies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Passos de Figueiredo, Leandro; Grana, Dario; Santos, Marcio

    We propose a Bayesian approach for seismic inversion to estimate acoustic impedance, porosity and lithofacies within the reservoir conditioned to post-stack seismic and well data. The link between elastic and petrophysical properties is given by a joint prior distribution for the logarithm of impedance and porosity, based on a rock-physics model. The well conditioning is performed through a background model obtained by well log interpolation. Two different approaches are presented: in the first approach, the prior is defined by a single Gaussian distribution, whereas in the second approach it is defined by a Gaussian mixture to represent the well data multimodal distribution and link the Gaussian components to different geological lithofacies. The forward model is based on a linearized convolutional model. For the single Gaussian case, we obtain an analytical expression for the posterior distribution, resulting in a fast algorithm to compute the solution of the inverse problem, i.e. the posterior distribution of acoustic impedance and porosity as well as the facies probability given the observed data. For the Gaussian mixture prior, it is not possible to obtain the distributions analytically, hence we propose a Gibbs algorithm to perform the posterior sampling and obtain several reservoir model realizations, allowing an uncertainty analysis of the estimated properties and lithofacies. Both methodologies are applied to a real seismic dataset with three wells to obtain 3D models of acoustic impedance, porosity and lithofacies. The methodologies are validated through a blind well test and compared to a standard Bayesian inversion approach. Using the probability of the reservoir lithofacies, we also compute a 3D isosurface probability model of the main oil reservoir in the studied field.

  15. Bayesian inference based on stationary Fokker-Planck sampling.

    PubMed

    Berrones, Arturo

    2010-06-01

    A novel formalism for Bayesian learning in the context of complex inference models is proposed. The method is based on the use of the stationary Fokker-Planck (SFP) approach to sample from the posterior density. Stationary Fokker-Planck sampling generalizes the Gibbs sampler algorithm for arbitrary and unknown conditional densities. By the SFP procedure, approximate analytical expressions for the conditionals and marginals of the posterior can be constructed. At each stage of SFP, the approximate conditionals are used to define a Gibbs sampling process, which is convergent to the full joint posterior. Using the analytical marginals, efficient learning methods in the context of artificial neural networks are outlined. Offline and incremental Bayesian inference and maximum likelihood estimation from the posterior are performed in classification and regression examples. A comparison of SFP with other Monte Carlo strategies in the general problem of sampling from arbitrary densities is also presented. It is shown that SFP is able to jump large low-probability regions without the need for careful tuning of any step-size parameter. In fact, the SFP method requires only a small set of meaningful parameters that can be selected following clear, problem-independent guidelines. The computation cost of SFP, measured in terms of loss function evaluations, grows linearly with the given model's dimension.

  16. Bayesian Travel Time Inversion adopting Gaussian Process Regression

    NASA Astrophysics Data System (ADS)

    Mauerberger, S.; Holschneider, M.

    2017-12-01

    A major application in seismology is the determination of seismic velocity models. Travel time measurements put an integral constraint on the velocity between source and receiver. We provide insight into travel time inversion from a correlation-based Bayesian point of view. To that end, the concept of Gaussian process regression is adopted to estimate a velocity model. The non-linear travel time integral is approximated by a first-order Taylor expansion. A heuristic covariance describes correlations amongst observations and the a priori model. That approach enables us to assess a proxy of the Bayesian posterior distribution at ordinary computational costs. Neither multi-dimensional numeric integration nor excessive sampling is necessary. Instead of stacking the data, we suggest progressively building the posterior distribution. Incorporating only a single piece of evidence at a time accounts for the deficit of linearization. As a result, the most probable model is given by the posterior mean, whereas uncertainties are described by the posterior covariance. As a proof of concept, a synthetic, purely 1D model is addressed. To that end, a single source accompanied by multiple receivers is considered on top of a model comprising a discontinuity. We consider travel times of both phases, the direct and the reflected wave, corrupted by noise. The regions left and right of the interface are assumed independent, and the squared exponential kernel serves as the covariance.

  17. Mean Field Variational Bayesian Data Assimilation

    NASA Astrophysics Data System (ADS)

    Vrettas, M.; Cornford, D.; Opper, M.

    2012-04-01

    Current data assimilation schemes propose a range of approximate solutions to the classical data assimilation problem, particularly state estimation. Broadly there are three main active research areas: ensemble Kalman filter methods which rely on statistical linearization of the model evolution equations, particle filters which provide a discrete point representation of the posterior filtering or smoothing distribution and 4DVAR methods which seek the most likely posterior smoothing solution. In this paper we present a recent extension to our variational Bayesian algorithm which seeks the most probable posterior distribution over the states, within the family of non-stationary Gaussian processes. Our original work on variational Bayesian approaches to data assimilation sought the best approximating time varying Gaussian process to the posterior smoothing distribution for stochastic dynamical systems. This approach was based on minimising the Kullback-Leibler divergence between the true posterior over paths, and our Gaussian process approximation. So long as the observation density was sufficiently high to bring the posterior smoothing density close to Gaussian the algorithm proved very effective, on lower dimensional systems. However for higher dimensional systems, the algorithm was computationally very demanding. We have been developing a mean field version of the algorithm which treats the state variables at a given time as being independent in the posterior approximation, but still accounts for their relationships between each other in the mean solution arising from the original dynamical system. In this work we present the new mean field variational Bayesian approach, illustrating its performance on a range of classical data assimilation problems. We discuss the potential and limitations of the new approach. We emphasise that the variational Bayesian approach we adopt, in contrast to other variational approaches, provides a bound on the marginal likelihood of the observations given parameters in the model which also allows inference of parameters such as observation errors, and parameters in the model and model error representation, particularly if this is written as a deterministic form with small additive noise. We stress that our approach can address very long time window and weak constraint settings. However, like traditional variational approaches, our Bayesian variational method has the benefit of being posed as an optimisation problem. We finish with a sketch of the future directions for our approach.

  18. Estimation of Model's Marginal likelihood Using Adaptive Sparse Grid Surrogates in Bayesian Model Averaging

    NASA Astrophysics Data System (ADS)

    Zeng, X.

    2015-12-01

    A large number of model executions are required to obtain alternative conceptual models' predictions and their posterior probabilities in Bayesian model averaging (BMA). The posterior model probability is estimated from each model's marginal likelihood and prior probability. The heavy computational burden hinders the implementation of BMA prediction, especially for elaborate marginal likelihood estimators. To overcome the computational burden of BMA, an adaptive sparse grid (SG) stochastic collocation method is used to build surrogates for alternative conceptual models through a numerical experiment with a synthetic groundwater model. BMA predictions depend on model posterior weights (or marginal likelihoods), and this study also evaluated four marginal likelihood estimators: the arithmetic mean estimator (AME), harmonic mean estimator (HME), stabilized harmonic mean estimator (SHME), and thermodynamic integration estimator (TIE). The results demonstrate that TIE is accurate in estimating conceptual models' marginal likelihoods. BMA-TIE has better predictive performance than the other BMA predictions. TIE is highly stable in estimating a conceptual model's marginal likelihood: repeated TIE estimates of a conceptual model's marginal likelihood have significantly less variability than those of the other estimators. In addition, the SG surrogates are efficient in facilitating BMA predictions, especially for BMA-TIE. The number of model executions needed for building surrogates is 4.13%, 6.89%, 3.44%, and 0.43% of the model executions required by BMA-AME, BMA-HME, BMA-SHME, and BMA-TIE, respectively.
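
    The statement that the posterior model probability follows from each model's marginal likelihood and prior probability can be made concrete with a short sketch. It only shows the generic Bayes-rule weighting step of BMA, not the sparse grid surrogates or the four estimators studied in the paper; the function names and the use of log marginal likelihoods as inputs are assumptions made for illustration.

```python
import math


def posterior_model_probs(log_marginal_liks, priors=None):
    """Posterior model probabilities from log marginal likelihoods and priors.

    Works in log space and subtracts the maximum before exponentiating, so
    very negative log marginal likelihoods do not underflow to zero.
    """
    k = len(log_marginal_liks)
    if priors is None:
        priors = [1.0 / k] * k  # assumed default: equal prior model probabilities
    log_post = [lml + math.log(p) for lml, p in zip(log_marginal_liks, priors)]
    shift = max(log_post)
    weights = [math.exp(v - shift) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]


def bma_prediction(model_predictions, probs):
    """BMA point prediction: posterior-probability-weighted average."""
    return sum(pred * w for pred, w in zip(model_predictions, probs))
```

    Because the weights depend exponentially on the log marginal likelihoods, small errors in a marginal likelihood estimator can shift the weights considerably, which is consistent with the abstract's emphasis on estimator accuracy and stability.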

  19. Bayesian Inference and Online Learning in Poisson Neuronal Networks.

    PubMed

    Huang, Yanping; Rao, Rajesh P N

    2016-08-01

    Motivated by the growing evidence for Bayesian computation in the brain, we show how a two-layer recurrent network of Poisson neurons can perform both approximate Bayesian inference and learning for any hidden Markov model. The lower-layer sensory neurons receive noisy measurements of hidden world states. The higher-layer neurons infer a posterior distribution over world states via Bayesian inference from inputs generated by sensory neurons. We demonstrate how such a neuronal network with synaptic plasticity can implement a form of Bayesian inference similar to Monte Carlo methods such as particle filtering. Each spike in a higher-layer neuron represents a sample of a particular hidden world state. The spiking activity across the neural population approximates the posterior distribution over hidden states. In this model, variability in spiking is regarded not as a nuisance but as an integral feature that provides the variability necessary for sampling during inference. We demonstrate how the network can learn the likelihood model, as well as the transition probabilities underlying the dynamics, using a Hebbian learning rule. We present results illustrating the ability of the network to perform inference and learning for arbitrary hidden Markov models.

  20. Bayesian analysis of the flutter margin method in aeroelasticity

    DOE PAGES

    Khalil, Mohammad; Poirel, Dominique; Sarkar, Abhijit

    2016-08-27

    A Bayesian statistical framework is presented for Zimmerman and Weissenburger flutter margin method which considers the uncertainties in aeroelastic modal parameters. The proposed methodology overcomes the limitations of the previously developed least-square based estimation technique which relies on the Gaussian approximation of the flutter margin probability density function (pdf). Using the measured free-decay responses at subcritical (preflutter) airspeeds, the joint non-Gaussian posterior pdf of the modal parameters is sampled using the Metropolis–Hastings (MH) Markov chain Monte Carlo (MCMC) algorithm. The posterior MCMC samples of the modal parameters are then used to obtain the flutter margin pdfs and finally the flutter speed pdf. The usefulness of the Bayesian flutter margin method is demonstrated using synthetic data generated from a two-degree-of-freedom pitch-plunge aeroelastic model. The robustness of the statistical framework is demonstrated using different sets of measurement data. In conclusion, it will be shown that the probabilistic (Bayesian) approach reduces the number of test points required in providing a flutter speed estimate for a given accuracy and precision.
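
    The Metropolis-Hastings sampling step named in this abstract can be sketched on a toy problem. The example below is not the aeroelastic model: it assumes a one-dimensional target, the posterior of a normal mean with known noise level and a wide normal prior, and a symmetric random-walk proposal. All names, the step size and the prior width are hypothetical.

```python
import math
import random


def log_posterior(mu, data, sigma=1.0, prior_sd=10.0):
    """Unnormalised log posterior of a normal mean (toy target, assumed)."""
    log_prior = -0.5 * (mu / prior_sd) ** 2
    log_lik = -0.5 * sum(((x - mu) / sigma) ** 2 for x in data)
    return log_prior + log_lik


def metropolis_hastings(data, n_samples=5000, step=0.5, start=0.0, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    current = start
    current_lp = log_posterior(current, data)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal, data)
        # Symmetric proposal, so the acceptance probability reduces to
        # min(1, posterior(proposal) / posterior(current)).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)  # a rejected proposal repeats the current state
    return samples
```

    In practice the early part of the chain is discarded as burn-in, and derived quantities (in the abstract, the flutter margin and flutter speed) are computed sample by sample to obtain their pdfs.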

  1. Back to Normal! Gaussianizing posterior distributions for cosmological probes

    NASA Astrophysics Data System (ADS)

    Schuhmann, Robert L.; Joachimi, Benjamin; Peiris, Hiranya V.

    2014-05-01

    We present a method to map multivariate non-Gaussian posterior probability densities into Gaussian ones via nonlinear Box-Cox transformations, and generalizations thereof. This is analogous to the search for normal parameters in the CMB, but can in principle be applied to any probability density that is continuous and unimodal. The search for the optimally Gaussianizing transformation amongst the Box-Cox family is performed via a maximum likelihood formalism. We can judge the quality of the found transformation a posteriori: qualitatively via statistical tests of Gaussianity, and more illustratively by how well it reproduces the credible regions. The method permits an analytical reconstruction of the posterior from a sample, e.g. a Markov chain, and simplifies the subsequent joint analysis with other experiments. Furthermore, it permits the characterization of a non-Gaussian posterior in a compact and efficient way. The expression for the non-Gaussian posterior can be employed to find analytic formulae for the Bayesian evidence, and consequently be used for model comparison.
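
    As a rough illustration of the idea, the sketch below applies the basic one-dimensional Box-Cox transformation to a positive sample and picks the parameter that maximises the Gaussian profile likelihood on a grid. The paper treats multivariate posteriors and generalised transformations; this univariate grid search, its function names and the grid itself are simplifying assumptions.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of a positive value x; the log is the lam -> 0 limit."""
    if abs(lam) < 1e-12:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def profile_log_likelihood(samples, lam):
    """Gaussian profile log-likelihood of lam, up to an additive constant.

    Mean and variance of the transformed sample are profiled out; the last
    term is the log Jacobian of the transformation. Samples must be positive.
    """
    n = len(samples)
    y = [box_cox(x, lam) for x in samples]
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n
    log_jacobian = (lam - 1.0) * sum(math.log(x) for x in samples)
    return -0.5 * n * math.log(var) + log_jacobian


def best_lambda(samples, grid=None):
    """Grid search for the most Gaussianizing lam (hypothetical default grid)."""
    if grid is None:
        grid = [i / 20.0 for i in range(-40, 41)]  # -2.0 to 2.0 in steps of 0.05
    return max(grid, key=lambda lam: profile_log_likelihood(samples, lam))
```

    For a sample drawn from a log-normal distribution the selected parameter should lie near zero, which corresponds to the logarithmic transform.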

  2. Value of Weather Information in Cranberry Marketing Decisions.

    NASA Astrophysics Data System (ADS)

    Morzuch, Bernard J.; Willis, Cleve E.

    1982-04-01

    Econometric techniques are used to establish a functional relationship between cranberry yields and important precipitation, temperature, and sunshine variables. Crop forecasts are derived from the model and are used to establish posterior probabilities to be used in a Bayesian decision context pertaining to leasing space for the storage of the berries.

  3. BAT - The Bayesian analysis toolkit

    NASA Astrophysics Data System (ADS)

    Caldwell, Allen; Kollár, Daniel; Kröninger, Kevin

    2009-11-01

    We describe the development of a new toolkit for data analysis. The analysis package is based on Bayes' Theorem, and is realized with the use of Markov Chain Monte Carlo. This gives access to the full posterior probability distribution. Parameter estimation, limit setting and uncertainty propagation are implemented in a straightforward manner.

  4. Detection of mastitis in dairy cattle by use of mixture models for repeated somatic cell scores: a Bayesian approach via Gibbs sampling.

    PubMed

    Odegård, J; Jensen, J; Madsen, P; Gianola, D; Klemetsdal, G; Heringstad, B

    2003-11-01

    The distribution of somatic cell scores could be regarded as a mixture of at least two components depending on a cow's udder health status. A heteroscedastic two-component Bayesian normal mixture model with random effects was developed and implemented via Gibbs sampling. The model was evaluated using datasets consisting of simulated somatic cell score records. Somatic cell score was simulated as a mixture representing two alternative udder health statuses ("healthy" or "diseased"). Animals were assigned randomly to the two components according to the probability of group membership (Pm). Random effects (additive genetic and permanent environment), when included, had identical distributions across mixture components. Posterior probabilities of putative mastitis were estimated for all observations, and model adequacy was evaluated using measures of sensitivity, specificity, and posterior probability of misclassification. Fitting different residual variances in the two mixture components caused some bias in estimation of parameters. When the components were difficult to disentangle, so were their residual variances, causing bias in estimation of Pm and of location parameters of the two underlying distributions. When all variance components were identical across mixture components, the mixture model analyses returned parameter estimates essentially without bias and with a high degree of precision. Including random effects in the model increased the probability of correct classification substantially. No sizable differences in probability of correct classification were found between models in which a single cow effect (ignoring relationships) was fitted and models where this effect was split into genetic and permanent environmental components, utilizing relationship information. When genetic and permanent environmental effects were fitted, the between-replicate variance of estimates of posterior means was smaller because the model accounted for random genetic drift.
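
    The posterior probability of putative mastitis for a single record can be sketched with Bayes' rule for a two-component normal mixture. This ignores the random effects and the Gibbs sampling of the actual model and treats the component parameters as known. The means, standard deviations and the reading of Pm as the prior probability of the "diseased" component are assumptions made for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def posterior_diseased(score, p_m, healthy=(3.0, 1.0), diseased=(5.0, 1.5)):
    """P(diseased | score) in a heteroscedastic two-component mixture.

    p_m is taken here as the prior probability of the diseased component;
    the (mean, sd) pairs are hypothetical values, not estimates from the paper.
    """
    weighted_diseased = p_m * normal_pdf(score, *diseased)
    weighted_healthy = (1.0 - p_m) * normal_pdf(score, *healthy)
    return weighted_diseased / (weighted_diseased + weighted_healthy)
```

    Classifying a record as diseased when this probability exceeds a threshold such as 0.5 yields the sensitivity and specificity measures the abstract uses to judge model adequacy.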

  5. Model selection and Bayesian inference for high-resolution seabed reflection inversion.

    PubMed

    Dettmer, Jan; Dosso, Stan E; Holland, Charles W

    2009-02-01

    This paper applies Bayesian inference, including model selection and posterior parameter inference, to inversion of seabed reflection data to resolve sediment structure at a spatial scale below the pulse length of the acoustic source. A practical approach to model selection is used, employing the Bayesian information criterion to decide on the number of sediment layers needed to sufficiently fit the data while satisfying parsimony to avoid overparametrization. Posterior parameter inference is carried out using an efficient Metropolis-Hastings algorithm for high-dimensional models, and results are presented as marginal-probability depth distributions for sound velocity, density, and attenuation. The approach is applied to plane-wave reflection-coefficient inversion of single-bounce data collected on the Malta Plateau, Mediterranean Sea, which indicate complex fine structure close to the water-sediment interface. This fine structure is resolved in the geoacoustic inversion results in terms of four layers within the upper meter of sediments. The inversion results are in good agreement with parameter estimates from a gravity core taken at the experiment site.
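
    The model selection step can be illustrated with the standard form of the Bayesian information criterion, BIC = k ln(n) - 2 ln(L), where k is the number of parameters, n the number of data and L the maximised likelihood. The sketch assumes the maximised log-likelihood of each candidate layering has already been computed; the function names are hypothetical.

```python
import math


def bic(max_log_likelihood, n_params, n_data):
    """Bayesian information criterion; lower values are preferred."""
    return n_params * math.log(n_data) - 2.0 * max_log_likelihood


def select_model(candidates, n_data):
    """Return the candidate with the lowest BIC and all BIC scores.

    candidates maps a model name to (max_log_likelihood, n_params).
    """
    scores = {name: bic(ll, k, n_data) for name, (ll, k) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores
```

    The k ln(n) term penalises extra layers, so an additional layer is accepted only if it improves the fit by more than its parameter cost, which is how parsimony guards against overparametrization.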

  6. Bayesian ionospheric multi-instrument 3D tomography

    NASA Astrophysics Data System (ADS)

    Norberg, Johannes; Vierinen, Juha; Roininen, Lassi

    2017-04-01

    The tomographic reconstruction of ionospheric electron densities is an inverse problem that cannot be solved without relatively strong regularising additional information. In particular, the vertical electron density profile is determined predominantly by the regularisation. Regularisations often utilised in ionospheric tomography include smoothness constraints and iterative methods with initial ionospheric models. Despite its crucial role, the regularisation is often hidden in the algorithm as a numerical procedure without physical understanding. The Bayesian methodology provides an interpretative approach for the problem, as the regularisation can be given as a physically meaningful and quantifiable prior probability distribution. The prior distribution can be based on ionospheric physics, other available ionospheric measurements and their statistics. Updating the prior with measurements results in the posterior distribution, which carries all the available information combined. From the posterior distribution, the most probable state of the ionosphere can then be solved with the corresponding probability intervals. Altogether, the Bayesian methodology provides understanding of how strong the given regularisation is, what information is gained with the measurements and how reliable the final result is. In addition, the combination of different measurements and temporal development can be taken into account in a very intuitive way. However, a direct implementation of the Bayesian approach requires inversion of large covariance matrices, resulting in computational infeasibility. In the presented method, Gaussian Markov random fields are used to form sparse matrix approximations for the covariances. The approach makes the problem computationally feasible while retaining the probabilistic and physical interpretation. Here, the Bayesian method with Gaussian Markov random fields is applied to ionospheric 3D tomography over Northern Europe. Multi-instrument measurements are utilised from the TomoScand receiver network for Low Earth orbit beacon satellite signals, GNSS receiver networks, as well as from EISCAT ionosondes and incoherent scatter radars. The performance is demonstrated in a three-dimensional spatial domain with temporal development also taken into account.

  7. Stochastic static fault slip inversion from geodetic data with non-negativity and bounds constraints

    NASA Astrophysics Data System (ADS)

    Nocquet, J.-M.

    2018-04-01

    Although surface displacements observed by geodesy are linear combinations of slip at faults in an elastic medium, determining the spatial distribution of fault slip remains an ill-posed inverse problem. A widely used approach to circumvent the ill-posedness of the inversion is to add regularization constraints in terms of smoothing and/or damping so that the linear system becomes invertible. However, the choice of regularization parameters is often arbitrary, and sometimes leads to significantly different results. Furthermore, the resolution analysis is usually empirical and cannot be made independently of the regularization. The stochastic approach of inverse problems (Tarantola & Valette 1982; Tarantola 2005) provides a rigorous framework where the a priori information about the searched parameters is combined with the observations in order to derive posterior probabilities of the unknown parameters. Here, I investigate an approach where the prior probability density function (pdf) is a multivariate Gaussian function, with single truncation to impose positivity of slip or double truncation to impose positivity and upper bounds on slip for interseismic modeling. I show that the joint posterior pdf is similar to the linear untruncated Gaussian case and can be expressed as a Truncated Multi-Variate Normal (TMVN) distribution. The TMVN form can then be used to obtain semi-analytical formulas for the single, two-dimensional or n-dimensional marginal pdf. The semi-analytical formula involves the product of a Gaussian by an integral term that can be evaluated using recent developments in TMVN probabilities calculations (e.g. Genz & Bretz 2009). Posterior mean and covariance can also be efficiently derived. I show that the Maximum Posterior (MAP) can be obtained using a Non-Negative Least-Squares algorithm (Lawson & Hanson 1974) for the single truncated case or using the Bounded-Variable Least-Squares algorithm (Stark & Parker 1995) for the double truncated case. I show that the case of independent uniform priors can be approximated using TMVN. The numerical equivalence to Bayesian inversions using Monte Carlo Markov Chain (MCMC) sampling is shown for a synthetic example and a real case for interseismic modeling in Central Peru. The TMVN method overcomes several limitations of the Bayesian approach using MCMC sampling. First, the need for computing power is greatly reduced. Second, unlike the Bayesian MCMC-based approach, the marginal pdf, mean, variance or covariance are obtained independently of one another. Third, the probability and cumulative density functions can be obtained with any density of points. Finally, determining the Maximum Posterior (MAP) is extremely fast.

  8. Stochastic static fault slip inversion from geodetic data with non-negativity and bound constraints

    NASA Astrophysics Data System (ADS)

    Nocquet, J.-M.

    2018-07-01

    Although surface displacements observed by geodesy are linear combinations of slip at faults in an elastic medium, determining the spatial distribution of fault slip remains an ill-posed inverse problem. A widely used approach to circumvent the ill-posedness of the inversion is to add regularization constraints in terms of smoothing and/or damping so that the linear system becomes invertible. However, the choice of regularization parameters is often arbitrary, and sometimes leads to significantly different results. Furthermore, the resolution analysis is usually empirical and cannot be made independently of the regularization. The stochastic approach of inverse problems provides a rigorous framework where the a priori information about the searched parameters is combined with the observations in order to derive posterior probabilities of the unknown parameters. Here, I investigate an approach where the prior probability density function (pdf) is a multivariate Gaussian function, with single truncation to impose positivity of slip or double truncation to impose positivity and upper bounds on slip for interseismic modelling. I show that the joint posterior pdf is similar to the linear untruncated Gaussian case and can be expressed as a truncated multivariate normal (TMVN) distribution. The TMVN form can then be used to obtain semi-analytical formulae for the single, 2-D or n-D marginal pdf. The semi-analytical formula involves the product of a Gaussian by an integral term that can be evaluated using recent developments in TMVN probabilities calculations. Posterior mean and covariance can also be efficiently derived. I show that the maximum posterior (MAP) can be obtained using a non-negative least-squares algorithm for the single truncated case or using the bounded-variable least-squares algorithm for the double truncated case. I show that the case of independent uniform priors can be approximated using TMVN. The numerical equivalence to Bayesian inversions using Monte Carlo Markov chain (MCMC) sampling is shown for a synthetic example and a real case for interseismic modelling in Central Peru. The TMVN method overcomes several limitations of the Bayesian approach using MCMC sampling. First, the need for computing power is greatly reduced. Second, unlike the Bayesian MCMC-based approach, the marginal pdf, mean, variance or covariance are obtained independently of one another. Third, the probability and cumulative density functions can be obtained with any density of points. Finally, determining the MAP is extremely fast.
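
    The claim that the MAP of a zero-truncated Gaussian prior reduces to a non-negative least-squares problem can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function name `map_nonnegative`, the scalar standard deviations `sigma_d` and `sigma_m` (i.e. diagonal covariances), and the toy matrix `G`. Projected coordinate descent is swapped in for the Lawson & Hanson algorithm named in the abstract, purely to keep the example dependency-free.

```python
def map_nonnegative(G, d, sigma_d, m0, sigma_m, sweeps=200):
    """MAP estimate for a Gaussian likelihood and a Gaussian prior truncated at zero.

    Minimises 0.5*|G m - d|^2/sigma_d^2 + 0.5*|m - m0|^2/sigma_m^2 subject to m >= 0.
    Solver: projected coordinate descent (a stand-in, not Lawson-Hanson NNLS).
    """
    n_obs, n_par = len(G), len(m0)
    wd, wm = 1.0 / sigma_d ** 2, 1.0 / sigma_m ** 2
    # Normal equations H m = b of the untruncated Gaussian posterior.
    H = [[wd * sum(G[i][j] * G[i][k] for i in range(n_obs)) + (wm if j == k else 0.0)
          for k in range(n_par)] for j in range(n_par)]
    b = [wd * sum(G[i][j] * d[i] for i in range(n_obs)) + wm * m0[j]
         for j in range(n_par)]
    m = [max(0.0, v) for v in m0]  # feasible starting point
    for _ in range(sweeps):
        for j in range(n_par):
            rest = sum(H[j][k] * m[k] for k in range(n_par) if k != j)
            # Exact 1-D minimiser for coordinate j, clipped to the constraint.
            m[j] = max(0.0, (b[j] - rest) / H[j][j])
    return m


# Toy problem: the unconstrained solution is [0.4, -0.4]; positivity pins m[1] at 0.
G = [[1.0, 0.5], [0.5, 1.0]]
d = [1.0, -1.0]
m_map = map_nonnegative(G, d, sigma_d=1.0, m0=[0.0, 0.0], sigma_m=1.0)
print(m_map)
```

    In this toy case the second slip parameter would be negative without the constraint, so the truncated-prior MAP sets it to zero and re-balances the first. For realistic problem sizes a library NNLS or bounded least-squares routine would be the more sensible choice.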

  9. Bayesian inference for the genetic control of water deficit tolerance in spring wheat by stochastic search variable selection.

    PubMed

    Safari, Parviz; Danyali, Syyedeh Fatemeh; Rahimi, Mehdi

    2018-06-02

    Drought is the main abiotic stress seriously influencing wheat production. Information about the inheritance of drought tolerance is necessary to determine the most appropriate strategy to develop tolerant cultivars and populations. In this study, generation means analysis to identify the genetic effects controlling grain yield inheritance in water deficit and normal conditions was considered as a model selection problem in a Bayesian framework. Stochastic search variable selection (SSVS) was applied to identify the most important genetic effects and the best fitted models using different generations obtained from two crosses applying two water regimes in two growing seasons. The SSVS is used to evaluate the effect of each variable on the dependent variable via posterior variable inclusion probabilities. The model with the highest posterior probability is selected as the best model. In this study, the grain yield was controlled by the main effects (additive and non-additive effects) and epistatic. The results demonstrate that breeding methods such as recurrent selection and subsequent pedigree method and hybrid production can be useful to improve grain yield.

  10. Entropic Inference

    NASA Astrophysics Data System (ADS)

    Caticha, Ariel

    2011-03-01

    In this tutorial we review the essential arguments behind entropic inference. We focus on the epistemological notion of information and its relation to the Bayesian beliefs of rational agents. The problem of updating from a prior to a posterior probability distribution is tackled through an eliminative induction process that singles out the logarithmic relative entropy as the unique tool for inference. The resulting method of Maximum relative Entropy (ME) includes as special cases both MaxEnt and Bayes' rule, and therefore unifies the two themes of these workshops (the Maximum Entropy and the Bayesian methods) into a single general inference scheme.

  11. Intervals for posttest probabilities: a comparison of 5 methods.

    PubMed

    Mossman, D; Berger, J O

    2001-01-01

    Several medical articles discuss methods of constructing confidence intervals for single proportions and the likelihood ratio, but scant attention has been given to the systematic study of intervals for the posterior odds, or the positive predictive value, of a test. The authors describe 5 methods of constructing confidence intervals for posttest probabilities when estimates of sensitivity, specificity, and the pretest probability of a disorder are derived from empirical data. They then evaluate each method to determine how well the intervals' coverage properties correspond to their nominal value. When the estimates of pretest probabilities, sensitivity, and specificity are derived from more than 80 subjects and are not close to 0 or 1, all methods generate intervals with appropriate coverage properties. When these conditions are not met, however, the best-performing method is an objective Bayesian approach implemented by a simple simulation using a spreadsheet. Physicians and investigators can generate accurate confidence intervals for posttest probabilities in small-sample situations using the objective Bayesian approach.
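
    A simulation of the kind the abstract describes can be sketched in a few lines. This is a minimal illustration, not the authors' spreadsheet: it assumes independent Jeffreys Beta(1/2, 1/2) priors on sensitivity, specificity and pretest probability (the abstract does not say which objective prior was used), and the function name `posttest_interval` and all counts are hypothetical.

```python
import random


def posttest_interval(tp, fn, tn, fp, d_pos, d_neg, draws=20000, level=0.95, seed=0):
    """Simulated interval for the positive predictive value (posttest probability).

    tp/fn: diseased subjects testing positive/negative (sensitivity data)
    tn/fp: healthy subjects testing negative/positive (specificity data)
    d_pos/d_neg: diseased/healthy counts in the sample used for pretest probability
    Assumption: Jeffreys Beta(0.5, 0.5) priors, so each posterior is a Beta.
    """
    rng = random.Random(seed)
    ppv = []
    for _ in range(draws):
        sens = rng.betavariate(tp + 0.5, fn + 0.5)
        spec = rng.betavariate(tn + 0.5, fp + 0.5)
        prev = rng.betavariate(d_pos + 0.5, d_neg + 0.5)
        # Bayes' rule for P(disease | positive test).
        ppv.append(sens * prev / (sens * prev + (1.0 - spec) * (1.0 - prev)))
    ppv.sort()
    lower = ppv[int((1.0 - level) / 2.0 * draws)]
    upper = ppv[int((1.0 + level) / 2.0 * draws) - 1]
    return lower, upper


# Hypothetical data: sensitivity 45/50, specificity 80/100, pretest probability 30/100.
lower, upper = posttest_interval(tp=45, fn=5, tn=80, fp=20, d_pos=30, d_neg=70)
point = 0.9 * 0.3 / (0.9 * 0.3 + 0.2 * 0.7)  # plug-in posttest probability
print(round(lower, 3), round(point, 3), round(upper, 3))
```

    The interval is read off the sorted simulated values, so it needs no normal approximation, which is presumably why this style of method holds up in the small-sample settings the abstract mentions.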

  12. Estimation from incomplete multinomial data. Ph.D. Thesis - Harvard Univ.

    NASA Technical Reports Server (NTRS)

    Credeur, K. R.

    1978-01-01

    The vector of multinomial cell probabilities was estimated from incomplete data, incomplete in that it contains partially classified observations. Each such partially classified observation was observed to fall in one of two or more selected categories but was not classified further into a single category. The data were assumed to be incomplete at random. The estimation criterion was minimization of risk for quadratic loss. The estimators were the classical maximum likelihood estimate, the Bayesian posterior mode, and the posterior mean. An approximation was developed for the posterior mean. The Dirichlet, the conjugate prior for the multinomial distribution, was assumed for the prior distribution.
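
    For orientation, the three estimators compared in the abstract have closed forms when every observation is fully classified. The sketch below covers only that complete-data case; the partially classified observations that are the subject of the thesis are not handled. The function name `dirichlet_estimates` and the counts are illustrative assumptions.

```python
def dirichlet_estimates(counts, alpha):
    """Complete-data estimators for multinomial cell probabilities.

    counts: observed cell counts; alpha: Dirichlet prior parameters.
    Returns (maximum likelihood, posterior mean, posterior mode).
    The mode formula assumes every alpha_i + count_i exceeds 1.
    """
    n = sum(counts)
    post = [a + c for a, c in zip(alpha, counts)]  # conjugate update
    total, k = sum(post), len(post)
    mle = [c / n for c in counts]
    mean = [p / total for p in post]
    mode = [(p - 1.0) / (total - k) for p in post]
    return mle, mean, mode


mle, mean, mode = dirichlet_estimates(counts=[3, 5, 2], alpha=[2.0, 2.0, 2.0])
print(mle, mean, mode)
```

    With a symmetric prior, both Bayesian estimates shrink the raw proportions toward equal cell probabilities, the mean more strongly than the mode.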

  13. MapReduce Based Parallel Bayesian Network for Manufacturing Quality Control

    NASA Astrophysics Data System (ADS)

    Zheng, Mao-Kuan; Ming, Xin-Guo; Zhang, Xian-Yu; Li, Guo-Ming

    2017-09-01

    Increasing complexity of industrial products and manufacturing processes has challenged conventional statistics-based quality management approaches in the circumstances of dynamic production. A Bayesian network and big data analytics integrated approach for manufacturing process quality analysis and control is proposed. Based on the Hadoop distributed architecture and the MapReduce parallel computing model, the large volume and variety of quality-related data generated during the manufacturing process can be dealt with. Artificial intelligence algorithms, including Bayesian network learning, classification and reasoning, are embedded into the Reduce process. Relying on the ability of the Bayesian network to deal with dynamic and uncertain problems and the parallel computing power of MapReduce, a Bayesian network of impact factors on quality is built based on the prior probability distribution and modified with the posterior probability distribution. A case study on hull segment manufacturing precision management for ship and offshore platform building shows that computing speed accelerates almost directly proportionally to the increase of computing nodes. It is also proved that the proposed model is feasible for locating and reasoning of root causes, forecasting of manufacturing outcome, and intelligent decision for precision problem solving. The integration of big data analytics and the BN method offers a whole new perspective in manufacturing quality control.

  14. A three-step Maximum-A-Posterior probability method for InSAR data inversion of coseismic rupture with application to four recent large earthquakes in Asia

    NASA Astrophysics Data System (ADS)

    Sun, J.; Shen, Z.; Burgmann, R.; Liang, F.

    2012-12-01

    We develop a three-step Maximum-A-Posterior probability (MAP) method for coseismic rupture inversion, which aims at maximizing the a posteriori probability density function (PDF) of elastic solutions of earthquake rupture. The method originates from the Fully Bayesian Inversion (FBI) and the Mixed linear-nonlinear Bayesian inversion (MBI) methods, shares the same a posteriori PDF with them and keeps most of their merits, while overcoming their convergence difficulty when large numbers of low quality data are used and improving the convergence rate greatly using optimization procedures. A highly efficient global optimization algorithm, Adaptive Simulated Annealing (ASA), is used to search for the maximum posterior probability in the first step. The non-slip parameters are determined by the global optimization method, and the slip parameters are inverted for using the least squares method without positivity constraint initially, and then damped to physically reasonable range. This first-step MAP inversion brings the inversion close to the 'true' solution quickly and jumps over local maximum regions in high-dimensional parameter space. The second step inversion approaches the 'true' solution further with positivity constraints subsequently applied on slip parameters using the Monte Carlo Inversion (MCI) technique, with all parameters obtained from step one as the initial solution. Then the slip artifacts are eliminated from slip models in the third step MAP inversion with fault geometry parameters fixed. We first used a designed model with 45 degree dipping angle and oblique slip, and corresponding synthetic InSAR data sets to validate the efficiency and accuracy of the method. We then applied the method to four recent large earthquakes in Asia, namely the 2010 Yushu, China earthquake, the 2011 Burma earthquake, the 2011 New Zealand earthquake and the 2008 Qinghai, China earthquake, and compared our results with those from other groups. Our results show the effectiveness of the method in earthquake studies and a number of advantages of it over other methods. The details will be reported at the meeting.

  15. A Variational Bayes Genomic-Enabled Prediction Model with Genotype × Environment Interaction

    PubMed Central

    Montesinos-López, Osval A.; Montesinos-López, Abelardo; Crossa, José; Montesinos-López, José Cricelio; Luna-Vázquez, Francisco Javier; Salinas-Ruiz, Josafhat; Herrera-Morales, José R.; Buenrostro-Mariscal, Raymundo

    2017-01-01

    There are Bayesian and non-Bayesian genomic models that take into account G×E interactions. However, the computational cost of implementing Bayesian models is high, and becomes almost impossible when the number of genotypes, environments, and traits is very large, while, in non-Bayesian models, there are often important and unsolved convergence problems. The variational Bayes method is popular in machine learning, and, by approximating the probability distributions through optimization, it tends to be faster than Markov Chain Monte Carlo methods. For this reason, in this paper, we propose a new genomic variational Bayes version of the Bayesian genomic model with G×E using half-t priors on each standard deviation (SD) term to guarantee highly noninformative and posterior inferences that are not sensitive to the choice of hyper-parameters. We show the complete theoretical derivation of the full conditional and the variational posterior distributions, and their implementations. We used eight experimental genomic maize and wheat data sets to illustrate the new proposed variational Bayes approximation, and compared its predictions and implementation time with a standard Bayesian genomic model with G×E. Results indicated that prediction accuracies are slightly higher in the standard Bayesian model with G×E than in its variational counterpart, but, in terms of computation time, the variational Bayes genomic model with G×E is, in general, 10 times faster than the conventional Bayesian genomic model with G×E. For this reason, the proposed model may be a useful tool for researchers who need to predict and select genotypes in several environments. PMID:28391241

  16. A Variational Bayes Genomic-Enabled Prediction Model with Genotype × Environment Interaction.

    PubMed

    Montesinos-López, Osval A; Montesinos-López, Abelardo; Crossa, José; Montesinos-López, José Cricelio; Luna-Vázquez, Francisco Javier; Salinas-Ruiz, Josafhat; Herrera-Morales, José R; Buenrostro-Mariscal, Raymundo

    2017-06-07

    There are Bayesian and non-Bayesian genomic models that take into account G×E interactions. However, the computational cost of implementing Bayesian models is high, and becomes almost impossible when the number of genotypes, environments, and traits is very large, while, in non-Bayesian models, there are often important and unsolved convergence problems. The variational Bayes method is popular in machine learning, and, by approximating the probability distributions through optimization, it tends to be faster than Markov Chain Monte Carlo methods. For this reason, in this paper, we propose a new genomic variational Bayes version of the Bayesian genomic model with G×E using half-t priors on each standard deviation (SD) term to guarantee highly noninformative and posterior inferences that are not sensitive to the choice of hyper-parameters. We show the complete theoretical derivation of the full conditional and the variational posterior distributions, and their implementations. We used eight experimental genomic maize and wheat data sets to illustrate the new proposed variational Bayes approximation, and compared its predictions and implementation time with a standard Bayesian genomic model with G×E. Results indicated that prediction accuracies are slightly higher in the standard Bayesian model with G×E than in its variational counterpart, but, in terms of computation time, the variational Bayes genomic model with G×E is, in general, 10 times faster than the conventional Bayesian genomic model with G×E. For this reason, the proposed model may be a useful tool for researchers who need to predict and select genotypes in several environments. Copyright © 2017 Montesinos-López et al.

  17. Objectively combining AR5 instrumental period and paleoclimate climate sensitivity evidence

    NASA Astrophysics Data System (ADS)

    Lewis, Nicholas; Grünwald, Peter

    2018-03-01

    Combining instrumental period evidence regarding equilibrium climate sensitivity with largely independent paleoclimate proxy evidence should enable a more constrained sensitivity estimate to be obtained. Previous, subjective Bayesian approaches involved selection of a prior probability distribution reflecting the investigators' beliefs about climate sensitivity. Here a recently developed approach employing two different statistical methods—objective Bayesian and frequentist likelihood-ratio—is used to combine instrumental period and paleoclimate evidence based on data presented and assessments made in the IPCC Fifth Assessment Report. Probabilistic estimates from each source of evidence are represented by posterior probability density functions (PDFs) of physically-appropriate form that can be uniquely factored into a likelihood function and a noninformative prior distribution. The three-parameter form is shown accurately to fit a wide range of estimated climate sensitivity PDFs. The likelihood functions relating to the probabilistic estimates from the two sources are multiplicatively combined and a prior is derived that is noninformative for inference from the combined evidence. A posterior PDF that incorporates the evidence from both sources is produced using a single-step approach, which avoids the order-dependency that would arise if Bayesian updating were used. Results are compared with an alternative approach using the frequentist signed root likelihood ratio method. Results from these two methods are effectively identical, and provide a 5-95% range for climate sensitivity of 1.1-4.05 K (median 1.87 K).

  18. Scheduling structural health monitoring activities for optimizing life-cycle costs and reliability of wind turbines

    NASA Astrophysics Data System (ADS)

    Hanish Nithin, Anu; Omenzetter, Piotr

    2017-04-01

    Optimization of the life-cycle costs and reliability of offshore wind turbines (OWTs) is an area of immense interest due to the widespread increase in wind power generation across the world. Most of the existing studies have used structural reliability and the Bayesian pre-posterior analysis for optimization. This paper proposes an extension to the previous approaches in a framework for probabilistic optimization of the total life-cycle costs and reliability of OWTs by combining the elements of structural reliability/risk analysis (SRA) and the Bayesian pre-posterior analysis with optimization through a genetic algorithm (GA). The SRA techniques are adopted to compute the probabilities of damage occurrence and failure associated with the deterioration model. The probabilities are used in the decision tree and are updated using the Bayesian analysis. The output of this framework would determine the optimal structural health monitoring and maintenance schedules to be implemented during the life span of OWTs while maintaining a trade-off between the life-cycle costs and risk of the structural failure. Numerical illustrations with a generic deterioration model for one monitoring exercise in the life cycle of a system are demonstrated. Two case scenarios, namely to build initially an expensive and robust structure or a cheaper but more quickly deteriorating one, and to adopt an expensive monitoring system, are presented to aid in the decision-making process.

  19. Efficiency of nuclear and mitochondrial markers recovering and supporting known amniote groups.

    PubMed

    Lambret-Frotté, Julia; Perini, Fernando Araújo; de Moraes Russo, Claudia Augusta

    2012-01-01

    We have analysed the efficiency of all mitochondrial protein coding genes and six nuclear markers (Adora3, Adrb2, Bdnf, Irbp, Rag2 and Vwf) in reconstructing and statistically supporting known amniote groups (murines, rodents, primates, eutherians, metatherians, therians). The efficiencies of maximum likelihood, Bayesian inference, maximum parsimony, neighbor-joining and UPGMA were also evaluated, by assessing the number of correct and incorrect recovered groupings. In addition, we have compared support values using the conservative bootstrap test and the Bayesian posterior probabilities. First, no correlation was observed between gene size and marker efficiency in recovering or supporting correct nodes. As expected, tree-building methods performed similarly, even UPGMA that, in some cases, outperformed other most extensively used methods. Bayesian posterior probabilities tend to show much higher support values than the conservative bootstrap test, for correct and incorrect nodes. Our results also suggest that nuclear markers do not necessarily show a better performance than mitochondrial genes. The so-called dependency among mitochondrial markers was not observed comparing genome performances. Finally, the amniote groups with lowest recovery rates were therians and rodents, despite the morphological support for their monophyletic status. We suggest that, regardless of the tree-building method, a few carefully selected genes are able to unfold a detailed and robust scenario of phylogenetic hypotheses, particularly if taxon sampling is increased.

  20. Finite element model updating using the shadow hybrid Monte Carlo technique

    NASA Astrophysics Data System (ADS)

    Boulkaibet, I.; Mthembu, L.; Marwala, T.; Friswell, M. I.; Adhikari, S.

    2015-02-01

    Recent research in the field of finite element model updating (FEM) advocates the adoption of Bayesian analysis techniques to deal with the uncertainties associated with these models. However, Bayesian formulations require the evaluation of the Posterior Distribution Function which may not be available in analytical form. This is the case in FEM updating. In such cases sampling methods can provide good approximations of the Posterior distribution when implemented in the Bayesian context. Markov Chain Monte Carlo (MCMC) algorithms are the most popular sampling tools used to sample probability distributions. However, the efficiency of these algorithms is affected by the complexity of the systems (the size of the parameter space). The Hybrid Monte Carlo (HMC) offers a very important MCMC approach to dealing with higher-dimensional complex problems. The HMC uses the molecular dynamics (MD) steps as the global Monte Carlo (MC) moves to reach areas of high probability where the gradient of the log-density of the Posterior acts as a guide during the search process. However, the acceptance rate of HMC is sensitive to the system size as well as the time step used to evaluate the MD trajectory. To overcome this limitation we propose the use of the Shadow Hybrid Monte Carlo (SHMC) algorithm. The SHMC algorithm is a modified version of the Hybrid Monte Carlo (HMC), designed to improve sampling for large system sizes and time steps. This is done by sampling from a modified Hamiltonian function instead of the normal Hamiltonian function. In this paper, the efficiency and accuracy of the SHMC method are tested on the updating of two real structures, an unsymmetrical H-shaped beam structure and a GARTEUR SM-AG19 structure, and are compared to the application of the HMC algorithm on the same structures.
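
    The molecular-dynamics move that the abstract refers to can be made concrete with a one-dimensional sketch of plain HMC. This is the standard algorithm, not the shadow variant proposed in the paper (no modified Hamiltonian is used), and the Gaussian target, step size and function names are assumptions chosen only so the result is easy to check.

```python
import math
import random


def hmc_sample(log_density, grad_log_density, x0, n_samples,
               step=0.2, n_leapfrog=20, seed=0):
    """Plain Hybrid (Hamiltonian) Monte Carlo for a scalar parameter, unit mass."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)  # fresh momentum each iteration
        x_new, p_new = x, p
        # Leapfrog integration of the MD trajectory.
        p_new += 0.5 * step * grad_log_density(x_new)
        for i in range(n_leapfrog):
            x_new += step * p_new
            if i < n_leapfrog - 1:
                p_new += step * grad_log_density(x_new)
        p_new += 0.5 * step * grad_log_density(x_new)
        # Metropolis test on the change in total energy H = -log pi(x) + p^2/2.
        h_old = -log_density(x) + 0.5 * p * p
        h_new = -log_density(x_new) + 0.5 * p_new * p_new
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            x = x_new
        samples.append(x)
    return samples


# Illustrative target: a Gaussian posterior with mean 3 and standard deviation 2.
def log_density(x):
    return -0.5 * ((x - 3.0) / 2.0) ** 2


def grad_log_density(x):
    return -(x - 3.0) / 4.0


samples = hmc_sample(log_density, grad_log_density, x0=0.0, n_samples=5000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(round(mean, 2), round(var, 2))
```

    The energy error in the acceptance test grows with the step size and the dimension, which is the sensitivity the abstract cites as motivation for sampling from a modified Hamiltonian instead.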

  1. Approximate Bayesian estimation of extinction rate in the Finnish Daphnia magna metapopulation.

    PubMed

    Robinson, John D; Hall, David W; Wares, John P

    2013-05-01

    Approximate Bayesian computation (ABC) is useful for parameterizing complex models in population genetics. In this study, ABC was applied to simultaneously estimate parameter values for a model of metapopulation coalescence and test two alternatives to a strict metapopulation model in the well-studied network of Daphnia magna populations in Finland. The models shared four free parameters: the subpopulation genetic diversity (θS), the rate of gene flow among patches (4Nm), the founding population size (N0) and the metapopulation extinction rate (e) but differed in the distribution of extinction rates across habitat patches in the system. The three models had either a constant extinction rate in all populations (strict metapopulation), one population that was protected from local extinction (i.e. a persistent source), or habitat-specific extinction rates drawn from a distribution with specified mean and variance. Our model selection analysis favoured the model including a persistent source population over the two alternative models. Of the closest 750,000 data sets in Euclidean space, 78% were simulated under the persistent source model (estimated posterior probability = 0.769). This fraction increased to more than 85% when only the closest 150,000 data sets were considered (estimated posterior probability = 0.774). Approximate Bayesian computation was then used to estimate parameter values that might produce the observed set of summary statistics. Our analysis provided posterior distributions for e that included the point estimate obtained from previous data from the Finnish D. magna metapopulation. Our results support the use of ABC and population genetic data for testing the strict metapopulation model and parameterizing complex models of demography. © 2013 Blackwell Publishing Ltd.
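
    The way the abstract turns "the fraction of the closest simulated data sets that came from each model" into an estimated posterior model probability can be sketched with rejection ABC on a toy problem. The coin-flip models, the single summary statistic and the function name `abc_model_posterior` are illustrative assumptions and have nothing to do with the metapopulation models of the study.

```python
import random


def abc_model_posterior(observed_heads, n_flips, n_sims=20000, keep=1000, seed=1):
    """Rejection-ABC model choice between two coin models with equal prior weight.

    'fair': heads probability fixed at 0.5
    'free': heads probability drawn from Uniform(0, 1)
    The posterior probability of each model is estimated as its share of the
    `keep` simulations whose summary statistic is closest to the observation.
    """
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sims):
        model = rng.choice(("fair", "free"))
        p = 0.5 if model == "fair" else rng.random()
        heads = sum(rng.random() < p for _ in range(n_flips))
        # The random second key breaks distance ties without favouring a model.
        sims.append((abs(heads - observed_heads), rng.random(), model))
    sims.sort()
    kept = [model for _, _, model in sims[:keep]]
    return {name: kept.count(name) / keep for name in ("fair", "free")}


posterior = abc_model_posterior(observed_heads=9, n_flips=10)
print(posterior)
```

    As in the abstract, the estimate depends on how many of the closest simulations are retained, so it is worth reporting the result for more than one cutoff.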

  2. Incorporating prior knowledge induced from stochastic differential equations in the classification of stochastic observations.

    PubMed

    Zollanvari, Amin; Dougherty, Edward R

    2016-12-01

    In classification, prior knowledge is incorporated in a Bayesian framework by assuming that the feature-label distribution belongs to an uncertainty class of feature-label distributions governed by a prior distribution. A posterior distribution is then derived from the prior and the sample data. An optimal Bayesian classifier (OBC) minimizes the expected misclassification error relative to the posterior distribution. From an application perspective, prior construction is critical. The prior distribution is formed by mapping a set of mathematical relations among the features and labels, the prior knowledge, into a distribution governing the probability mass across the uncertainty class. In this paper, we consider prior knowledge in the form of stochastic differential equations (SDEs). We consider a vector SDE in integral form involving a drift vector and dispersion matrix. Having constructed the prior, we develop the optimal Bayesian classifier between two models and examine, via synthetic experiments, the effects of uncertainty in the drift vector and dispersion matrix. We apply the theory to a set of SDEs for the purpose of differentiating the evolutionary history between two species.

  3. Bayesian Parameter Inference and Model Selection by Population Annealing in Systems Biology

    PubMed Central

    Murakami, Yohei

    2014-01-01

    Parameter inference and model selection are very important for mathematical modeling in systems biology. Bayesian statistics can be used to conduct both parameter inference and model selection. In particular, the framework named approximate Bayesian computation is often used for parameter inference and model selection in systems biology. However, Monte Carlo methods need to be used to compute Bayesian posterior distributions. In addition, the posterior distributions of parameters are sometimes almost uniform or very similar to their prior distributions. In such cases, it is difficult to choose one specific value of parameter with high credibility as the representative value of the distribution. To overcome the problems, we introduced one of the population Monte Carlo algorithms, population annealing. Although population annealing is usually used in statistical mechanics, we showed that population annealing can be used to compute Bayesian posterior distributions in the approximate Bayesian computation framework. To deal with un-identifiability of the representative values of parameters, we proposed to run the simulations with the parameter ensemble sampled from the posterior distribution, named “posterior parameter ensemble”. We showed that population annealing is an efficient and convenient algorithm to generate the posterior parameter ensemble. We also showed that the simulations with the posterior parameter ensemble can not only reproduce the data used for parameter inference, but also capture and predict the data which was not used for parameter inference. Lastly, we introduced the marginal likelihood in the approximate Bayesian computation framework for Bayesian model selection. We showed that population annealing enables us to compute the marginal likelihood in the approximate Bayesian computation framework and conduct model selection depending on the Bayes factor. PMID:25089832

  4. A Bayesian Hierarchical Model for Glacial Dynamics Based on the Shallow Ice Approximation and its Evaluation Using Analytical Solutions

    NASA Astrophysics Data System (ADS)

    Gopalan, Giri; Hrafnkelsson, Birgir; Aðalgeirsdóttir, Guðfinna; Jarosch, Alexander H.; Pálsson, Finnur

    2018-03-01

    Bayesian hierarchical modeling can assist the study of glacial dynamics and ice flow properties. This approach will allow glaciologists to make fully probabilistic predictions for the thickness of a glacier at unobserved spatio-temporal coordinates, and it will also allow for the derivation of posterior probability distributions for key physical parameters such as ice viscosity and basal sliding. The goal of this paper is to develop a proof of concept for a Bayesian hierarchical model that uses exact analytical solutions for the shallow ice approximation (SIA) introduced by Bueler et al. (2005). A suite of test simulations utilizing these exact solutions suggests that this approach is able to adequately model numerical errors and produce useful physical parameter posterior distributions and predictions. A byproduct of the development of the Bayesian hierarchical model is the derivation of a novel finite difference method for solving the SIA partial differential equation (PDE). An additional novelty of this work is the correction of numerical errors induced through a numerical solution using a statistical model. This error correcting process models numerical errors that accumulate forward in time and spatial variation of numerical errors between the dome, interior, and margin of a glacier.

  5. A Bayesian Approach to Person Fit Analysis in Item Response Theory Models. Research Report.

    ERIC Educational Resources Information Center

    Glas, Cees A. W.; Meijer, Rob R.

    A Bayesian approach to the evaluation of person fit in item response theory (IRT) models is presented. In a posterior predictive check, the observed value on a discrepancy variable is positioned in its posterior distribution. In a Bayesian framework, a Markov Chain Monte Carlo procedure can be used to generate samples of the posterior distribution…

  6. On parametrized cold dense matter equation-of-state inference

    NASA Astrophysics Data System (ADS)

    Riley, Thomas E.; Raaijmakers, Geert; Watts, Anna L.

    2018-07-01

    Constraining the equation of state of cold dense matter in compact stars is a major science goal for observing programmes being conducted using X-ray, radio, and gravitational wave telescopes. We discuss Bayesian hierarchical inference of parametrized dense matter equations of state. In particular, we generalize and examine two inference paradigms from the literature: (i) direct posterior equation-of-state parameter estimation, conditioned on observations of a set of rotating compact stars; and (ii) indirect parameter estimation, via transformation of an intermediary joint posterior distribution of exterior spacetime parameters (such as gravitational masses and coordinate equatorial radii). We conclude that the former paradigm is not only tractable for large-scale analyses, but is principled and flexible from a Bayesian perspective while the latter paradigm is not. The thematic problem of Bayesian prior definition emerges as the crux of the difference between these paradigms. The second paradigm should in general only be considered as an ill-defined approach to the problem of utilizing archival posterior constraints on exterior spacetime parameters; we advocate for an alternative approach whereby such information is repurposed as an approximative likelihood function. We also discuss why conditioning on a piecewise-polytropic equation-of-state model - currently standard in the field of dense matter study - can easily violate conditions required for transformation of a probability density distribution between spaces of exterior (spacetime) and interior (source matter) parameters.

  7. On parametrised cold dense matter equation of state inference

    NASA Astrophysics Data System (ADS)

    Riley, Thomas E.; Raaijmakers, Geert; Watts, Anna L.

    2018-04-01

    Constraining the equation of state of cold dense matter in compact stars is a major science goal for observing programmes being conducted using X-ray, radio, and gravitational wave telescopes. We discuss Bayesian hierarchical inference of parametrised dense matter equations of state. In particular we generalise and examine two inference paradigms from the literature: (i) direct posterior equation of state parameter estimation, conditioned on observations of a set of rotating compact stars; and (ii) indirect parameter estimation, via transformation of an intermediary joint posterior distribution of exterior spacetime parameters (such as gravitational masses and coordinate equatorial radii). We conclude that the former paradigm is not only tractable for large-scale analyses, but is principled and flexible from a Bayesian perspective whilst the latter paradigm is not. The thematic problem of Bayesian prior definition emerges as the crux of the difference between these paradigms. The second paradigm should in general only be considered as an ill-defined approach to the problem of utilising archival posterior constraints on exterior spacetime parameters; we advocate for an alternative approach whereby such information is repurposed as an approximative likelihood function. We also discuss why conditioning on a piecewise-polytropic equation of state model - currently standard in the field of dense matter study - can easily violate conditions required for transformation of a probability density distribution between spaces of exterior (spacetime) and interior (source matter) parameters.

  8. Modeling Dynamic Contrast-Enhanced MRI Data with a Constrained Local AIF.

    PubMed

    Duan, Chong; Kallehauge, Jesper F; Pérez-Torres, Carlos J; Bretthorst, G Larry; Beeman, Scott C; Tanderup, Kari; Ackerman, Joseph J H; Garbow, Joel R

    2018-02-01

    This study aims to develop a constrained local arterial input function (cL-AIF) to improve quantitative analysis of dynamic contrast-enhanced (DCE)-magnetic resonance imaging (MRI) data by accounting for the contrast-agent bolus amplitude error in the voxel-specific AIF. Bayesian probability theory-based parameter estimation and model selection were used to compare tracer kinetic modeling employing either the measured remote-AIF (R-AIF, i.e., the traditional approach) or an inferred cL-AIF against both in silico DCE-MRI data and clinical, cervical cancer DCE-MRI data. When the data model included the cL-AIF, tracer kinetic parameters were correctly estimated from in silico data under contrast-to-noise conditions typical of clinical DCE-MRI experiments. Considering the clinical cervical cancer data, Bayesian model selection was performed for all tumor voxels of the 16 patients (35,602 voxels in total). Among those voxels, a tracer kinetic model that employed the voxel-specific cL-AIF was preferred (i.e., had a higher posterior probability) in 80 % of the voxels compared to the direct use of a single R-AIF. Maps of spatial variation in voxel-specific AIF bolus amplitude and arrival time for heterogeneous tissues, such as cervical cancer, are accessible with the cL-AIF approach. The cL-AIF method, which estimates unique local-AIF amplitude and arrival time for each voxel within the tissue of interest, provides better modeling of DCE-MRI data than the use of a single, measured R-AIF. The Bayesian-based data analysis described herein affords estimates of uncertainties for each model parameter, via posterior probability density functions, and voxel-wise comparison across methods/models, via model selection in data modeling.

  9. Constructive Epistemic Modeling: A Hierarchical Bayesian Model Averaging Method

    NASA Astrophysics Data System (ADS)

    Tsai, F. T. C.; Elshall, A. S.

    2014-12-01

    Constructive epistemic modeling is the idea that our understanding of a natural system through a scientific model is a mental construct that continually develops through learning about and from the model. Using the hierarchical Bayesian model averaging (HBMA) method [1], this study shows that segregating different uncertain model components through a BMA tree of posterior model probabilities, model prediction, within-model variance, between-model variance and total model variance serves as a learning tool [2]. First, the BMA tree of posterior model probabilities permits the comparative evaluation of the candidate propositions of each uncertain model component. Second, systemic model dissection is imperative for understanding the individual contribution of each uncertain model component to the model prediction and variance. Third, the hierarchical representation of the between-model variance facilitates the prioritization of the contribution of each uncertain model component to the overall model uncertainty. We illustrate these concepts using the groundwater modeling of a siliciclastic aquifer-fault system. The sources of uncertainty considered are from geological architecture, formation dip, boundary conditions and model parameters. The study shows that the HBMA analysis helps in advancing knowledge about the model rather than forcing the model to fit a particular understanding or merely averaging several candidate models. [1] Tsai, F. T.-C., and A. S. Elshall (2013), Hierarchical Bayesian model averaging for hydrostratigraphic modeling: Uncertainty segregation and comparative evaluation. Water Resources Research, 49, 5520-5536, doi:10.1002/wrcr.20428. [2] Elshall, A.S., and F. T.-C. Tsai (2014). Constructive epistemic modeling of groundwater flow with geological architecture and boundary condition uncertainty under Bayesian paradigm, Journal of Hydrology, 517, 105-119, doi: 10.1016/j.jhydrol.2014.05.027.
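
    The quantities named in this abstract (posterior model probabilities, model prediction, within-model variance, between-model variance, total variance) are related by the standard Bayesian model averaging decomposition. The sketch below shows that decomposition for a single, flat level of models; it is not the hierarchical BMA tree of the cited papers, and the function name `bma_summary` and all numbers are hypothetical.

```python
def bma_summary(model_probs, means, variances):
    """Combine per-model predictions using posterior model probabilities.

    Returns (averaged mean, within-model variance, between-model variance,
    total variance). Total variance = within + between.
    """
    norm = sum(model_probs)
    weights = [p / norm for p in model_probs]          # make sure weights sum to 1
    mean = sum(w * m for w, m in zip(weights, means))
    within = sum(w * v for w, v in zip(weights, variances))
    between = sum(w * (m - mean) ** 2 for w, m in zip(weights, means))
    return mean, within, between, within + between


# Hypothetical example: three candidate models predicting the same quantity.
mean, within, between, total = bma_summary(
    model_probs=[0.5, 0.3, 0.2],
    means=[1.0, 2.0, 4.0],
    variances=[0.2, 0.1, 0.4],
)
print(mean, within, between, total)
```

    A large between-model term relative to the within-model term indicates that the choice among candidate models, rather than the uncertainty inside any one model, dominates the total uncertainty.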

  10. Determining Protein Complex Structures Based on a Bayesian Model of in Vivo Förster Resonance Energy Transfer (FRET) Data*

    PubMed Central

    Bonomi, Massimiliano; Pellarin, Riccardo; Kim, Seung Joong; Russel, Daniel; Sundin, Bryan A.; Riffle, Michael; Jaschob, Daniel; Ramsden, Richard; Davis, Trisha N.; Muller, Eric G. D.; Sali, Andrej

    2014-01-01

    The use of in vivo Förster resonance energy transfer (FRET) data to determine the molecular architecture of a protein complex in living cells is challenging due to data sparseness, sample heterogeneity, signal contributions from multiple donors and acceptors, unequal fluorophore brightness, photobleaching, flexibility of the linker connecting the fluorophore to the tagged protein, and spectral cross-talk. We addressed these challenges by using a Bayesian approach that produces the posterior probability of a model, given the input data. The posterior probability is defined as a function of the dependence of our FRET metric FRETR on a structure (forward model), a model of noise in the data, as well as prior information about the structure, relative populations of distinct states in the sample, forward model parameters, and data noise. The forward model was validated against kinetic Monte Carlo simulations and in vivo experimental data collected on nine systems of known structure. In addition, our Bayesian approach was validated by a benchmark of 16 protein complexes of known structure. Given the structures of each subunit of the complexes, models were computed from synthetic FRETR data with a distance root-mean-squared deviation error of 14 to 17 Å. The approach is implemented in the open-source Integrative Modeling Platform, allowing us to determine macromolecular structures through a combination of in vivo FRETR data and data from other sources, such as electron microscopy and chemical cross-linking. PMID:25139910

  11. Bayesian truncation errors in chiral effective field theory: model checking and accounting for correlations

    NASA Astrophysics Data System (ADS)

    Melendez, Jordan; Wesolowski, Sarah; Furnstahl, Dick

    2017-09-01

    Chiral effective field theory (EFT) predictions are necessarily truncated at some order in the EFT expansion, which induces an error that must be quantified for robust statistical comparisons to experiment. A Bayesian model yields posterior probability distribution functions for these errors based on expectations of naturalness encoded in Bayesian priors and the observed order-by-order convergence pattern of the EFT. As a general example of a statistical approach to truncation errors, the model was applied to chiral EFT for neutron-proton scattering using various semi-local potentials of Epelbaum, Krebs, and Meißner (EKM). Here we discuss how our model can learn correlation information from the data and how to perform Bayesian model checking to validate that the EFT is working as advertised. Supported in part by NSF PHY-1614460 and DOE NUCLEI SciDAC DE-SC0008533.

  12. Bayesian adaptive phase II screening design for combination trials.

    PubMed

    Cai, Chunyan; Yuan, Ying; Johnson, Valen E

    2013-01-01

    Trials of combination therapies for the treatment of cancer are playing an increasingly important role in the battle against this disease. To more efficiently handle the large number of combination therapies that must be tested, we propose a novel Bayesian phase II adaptive screening design to simultaneously select among possible treatment combinations involving multiple agents. Our design is based on formulating the selection procedure as a Bayesian hypothesis testing problem in which the superiority of each treatment combination is equated to a single hypothesis. During the trial conduct, we use the current values of the posterior probabilities of all hypotheses to adaptively allocate patients to treatment combinations. Simulation studies show that the proposed design substantially outperforms the conventional multiarm balanced factorial trial design. The proposed design yields a significantly higher probability for selecting the best treatment while allocating substantially more patients to efficacious treatments. The proposed design is most appropriate for the trials combining multiple agents and screening out the efficacious combination to be further investigated. The proposed Bayesian adaptive phase II screening design substantially outperformed the conventional complete factorial design. Our design allocates more patients to better treatments while providing higher power to identify the best treatment at the end of the trial.

  13. A Bayesian model averaging approach for estimating the relative risk of mortality associated with heat waves in 105 U.S. cities.

    PubMed

    Bobb, Jennifer F; Dominici, Francesca; Peng, Roger D

    2011-12-01

    Estimating the risks heat waves pose to human health is a critical part of assessing the future impact of climate change. In this article, we propose a flexible class of time series models to estimate the relative risk of mortality associated with heat waves and conduct Bayesian model averaging (BMA) to account for the multiplicity of potential models. Applying these methods to data from 105 U.S. cities for the period 1987-2005, we identify those cities having a high posterior probability of increased mortality risk during heat waves, examine the heterogeneity of the posterior distributions of mortality risk across cities, assess sensitivity of the results to the selection of prior distributions, and compare our BMA results to a model selection approach. Our results show that no single model best predicts risk across the majority of cities, and that for some cities heat-wave risk estimation is sensitive to model choice. Although model averaging leads to posterior distributions with increased variance as compared to statistical inference conditional on a model obtained through model selection, we find that the posterior mean of heat wave mortality risk is robust to accounting for model uncertainty over a broad class of models. © 2011, The International Biometric Society.

  14. A three-step maximum a posteriori probability method for InSAR data inversion of coseismic rupture with application to the 14 April 2010 Mw 6.9 Yushu, China, earthquake

    NASA Astrophysics Data System (ADS)

    Sun, Jianbao; Shen, Zheng-Kang; Bürgmann, Roland; Wang, Min; Chen, Lichun; Xu, Xiwei

    2013-08-01

    We develop a three-step maximum a posteriori probability method for coseismic rupture inversion, which aims at maximizing the a posteriori probability density function (PDF) of elastic deformation solutions of earthquake rupture. The method originates from the fully Bayesian inversion and mixed linear-nonlinear Bayesian inversion methods and shares the same posterior PDF with them, while overcoming difficulties with convergence when large numbers of low-quality data are used and greatly improving the convergence rate using optimization procedures. A highly efficient global optimization algorithm, adaptive simulated annealing, is used to search for the maximum of the a posteriori PDF ("mode" in statistics) in the first step. The second step inversion approaches the "true" solution further using the Monte Carlo inversion technique with positivity constraints, with all parameters obtained from the first step as the initial solution. Then slip artifacts are eliminated from slip models in the third step using the same procedure as the second step, with fixed fault geometry parameters. We first design a fault model with 45° dip angle and oblique slip, and produce corresponding synthetic interferometric synthetic aperture radar (InSAR) data sets to validate the reliability and efficiency of the new method. We then apply this method to InSAR data inversion for the coseismic slip distribution of the 14 April 2010 Mw 6.9 Yushu, China, earthquake. Our preferred slip model is composed of three segments with most of the slip occurring within 15 km depth and the maximum slip reaches 1.38 m at the surface. The seismic moment released is estimated to be 2.32e+19 Nm, consistent with the seismic estimate of 2.50e+19 Nm.

  15. HELP: XID+, the probabilistic de-blender for Herschel SPIRE maps

    NASA Astrophysics Data System (ADS)

    Hurley, P. D.; Oliver, S.; Betancourt, M.; Clarke, C.; Cowley, W. I.; Duivenvoorden, S.; Farrah, D.; Griffin, M.; Lacey, C.; Le Floc'h, E.; Papadopoulos, A.; Sargent, M.; Scudder, J. M.; Vaccari, M.; Valtchanov, I.; Wang, L.

    2017-01-01

    We have developed a new prior-based source extraction tool, XID+, to carry out photometry in the Herschel SPIRE (Spectral and Photometric Imaging Receiver) maps at the positions of known sources. XID+ is developed using a probabilistic Bayesian framework that provides a natural framework in which to include prior information, and uses the Bayesian inference tool Stan to obtain the full posterior probability distribution on flux estimates. In this paper, we discuss the details of XID+ and demonstrate the basic capabilities and performance by running it on simulated SPIRE maps resembling the COSMOS field, and comparing to the current prior-based source extraction tool DESPHOT. Not only do we show that XID+ performs better on metrics such as flux accuracy and flux uncertainty accuracy, but we also illustrate how obtaining the posterior probability distribution can help overcome some of the issues inherent with maximum-likelihood-based source extraction routines. We run XID+ on the COSMOS SPIRE maps from Herschel Multi-Tiered Extragalactic Survey using a 24-μm catalogue as a positional prior, and a uniform flux prior ranging from 0.01 to 1000 mJy. We show the marginalized SPIRE colour-colour plot and marginalized contribution to the cosmic infrared background at the SPIRE wavelengths. XID+ is a core tool arising from the Herschel Extragalactic Legacy Project (HELP) and we discuss how additional work within HELP providing prior information on fluxes can and will be utilized. The software is available at https://github.com/H-E-L-P/XID_plus. We also provide the data product for COSMOS. We believe this is the first time that the full posterior probability of galaxy photometry has been provided as a data product.

  16. Bayesian evidence computation for model selection in non-linear geoacoustic inference problems.

    PubMed

    Dettmer, Jan; Dosso, Stan E; Osler, John C

    2010-12-01

    This paper applies a general Bayesian inference approach, based on Bayesian evidence computation, to geoacoustic inversion of interface-wave dispersion data. Quantitative model selection is carried out by computing the evidence (normalizing constants) for several model parameterizations using annealed importance sampling. The resulting posterior probability density estimate is compared to estimates obtained from Metropolis-Hastings sampling to ensure consistent results. The approach is applied to invert interface-wave dispersion data collected on the Scotian Shelf, off the east coast of Canada for the sediment shear-wave velocity profile. Results are consistent with previous work on these data but extend the analysis to a rigorous approach including model selection and uncertainty analysis. The results are also consistent with core samples and seismic reflection measurements carried out in the area.

  17. Aerosol-type retrieval and uncertainty quantification from OMI data

    NASA Astrophysics Data System (ADS)

    Kauppi, Anu; Kolmonen, Pekka; Laine, Marko; Tamminen, Johanna

    2017-11-01

    We discuss uncertainty quantification for aerosol-type selection in satellite-based atmospheric aerosol retrieval. The retrieval procedure uses precalculated aerosol microphysical models stored in look-up tables (LUTs) and top-of-atmosphere (TOA) spectral reflectance measurements to solve the aerosol characteristics. The forward model approximations cause systematic differences between the modelled and observed reflectance. Acknowledging this model discrepancy as a source of uncertainty allows us to produce more realistic uncertainty estimates and assists the selection of the most appropriate LUTs for each individual retrieval. This paper focuses on the aerosol microphysical model selection and characterisation of uncertainty in the retrieved aerosol type and aerosol optical depth (AOD). The concept of model evidence is used as a tool for model comparison. The method is based on Bayesian inference approach, in which all uncertainties are described as a posterior probability distribution. When there is no single best-matching aerosol microphysical model, we use a statistical technique based on Bayesian model averaging to combine AOD posterior probability densities of the best-fitting models to obtain an averaged AOD estimate. We also determine the shared evidence of the best-matching models of a certain main aerosol type in order to quantify how plausible it is that it represents the underlying atmospheric aerosol conditions. The developed method is applied to Ozone Monitoring Instrument (OMI) measurements using a multiwavelength approach for retrieving the aerosol type and AOD estimate with uncertainty quantification for cloud-free over-land pixels. Several larger pixel set areas were studied in order to investigate the robustness of the developed method. We evaluated the retrieved AOD by comparison with ground-based measurements at example sites. We found that the uncertainty of AOD expressed by posterior probability distribution reflects the difficulty in model selection. The posterior probability distribution can provide a comprehensive characterisation of the uncertainty in this kind of problem for aerosol-type selection. As a result, the proposed method can account for the model error and also include the model selection uncertainty in the total uncertainty budget.

  18. Topics in inference and decision-making with partial knowledge

    NASA Technical Reports Server (NTRS)

    Safavian, S. Rasoul; Landgrebe, David

    1990-01-01

    Two essential elements needed in the process of inference and decision-making are prior probabilities and likelihood functions. When both of these components are known accurately and precisely, the Bayesian approach provides a consistent and coherent solution to the problems of inference and decision-making. In many situations, however, either one or both of the above components may not be known, or at least may not be known precisely. This problem of partial knowledge about prior probabilities and likelihood functions is addressed. There are at least two ways to cope with this lack of precise knowledge: robust methods, and interval-valued methods. First, ways of modeling imprecision and indeterminacies in prior probabilities and likelihood functions are examined; then how imprecision in the above components carries over to the posterior probabilities is examined. Finally, the problem of decision making with imprecise posterior probabilities and the consequences of such actions are addressed. Application areas where the above problems may occur are in statistical pattern recognition problems, for example, the problem of classification of high-dimensional multispectral remote sensing image data.

  19. Inference of reaction rate parameters based on summary statistics from experiments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Khalil, Mohammad; Chowdhary, Kamaljit Singh; Safta, Cosmin

    Here, we present the results of an application of Bayesian inference and maximum entropy methods for the estimation of the joint probability density for the Arrhenius rate parameters of the rate coefficient of the H2/O2-mechanism chain branching reaction H + O2 → OH + O. Available published data is in the form of summary statistics in terms of nominal values and error bars of the rate coefficient of this reaction at a number of temperature values obtained from shock-tube experiments. Our approach relies on generating data, in this case OH concentration profiles, consistent with the given summary statistics, using Approximate Bayesian Computation methods and a Markov Chain Monte Carlo procedure. The approach permits the forward propagation of parametric uncertainty through the computational model in a manner that is consistent with the published statistics. A consensus joint posterior on the parameters is obtained by pooling the posterior parameter densities given each consistent data set. To expedite this process, we construct efficient surrogates for the OH concentration using a combination of Padé and polynomial approximants. These surrogate models adequately represent forward model observables and their dependence on input parameters and are computationally efficient to allow their use in the Bayesian inference procedure. We also utilize Gauss-Hermite quadrature with Gaussian proposal probability density functions for moment computation resulting in orders of magnitude speedup in data likelihood evaluation. Despite the strong non-linearity in the model, the consistent data sets all result in nearly Gaussian conditional parameter probability density functions. The technique also accounts for nuisance parameters in the form of Arrhenius parameters of other rate coefficients with prescribed uncertainty. The resulting pooled parameter probability density function is propagated through stoichiometric hydrogen-air auto-ignition computations to illustrate the need to account for correlation among the Arrhenius rate parameters of one reaction and across rate parameters of different reactions.

  20. Inference of reaction rate parameters based on summary statistics from experiments

    DOE PAGES

    Khalil, Mohammad; Chowdhary, Kamaljit Singh; Safta, Cosmin; ...

    2016-10-15

    Here, we present the results of an application of Bayesian inference and maximum entropy methods for the estimation of the joint probability density for the Arrhenius rate parameters of the rate coefficient of the H2/O2-mechanism chain branching reaction H + O2 → OH + O. Available published data is in the form of summary statistics in terms of nominal values and error bars of the rate coefficient of this reaction at a number of temperature values obtained from shock-tube experiments. Our approach relies on generating data, in this case OH concentration profiles, consistent with the given summary statistics, using Approximate Bayesian Computation methods and a Markov Chain Monte Carlo procedure. The approach permits the forward propagation of parametric uncertainty through the computational model in a manner that is consistent with the published statistics. A consensus joint posterior on the parameters is obtained by pooling the posterior parameter densities given each consistent data set. To expedite this process, we construct efficient surrogates for the OH concentration using a combination of Padé and polynomial approximants. These surrogate models adequately represent forward model observables and their dependence on input parameters and are computationally efficient to allow their use in the Bayesian inference procedure. We also utilize Gauss-Hermite quadrature with Gaussian proposal probability density functions for moment computation resulting in orders of magnitude speedup in data likelihood evaluation. Despite the strong non-linearity in the model, the consistent data sets all result in nearly Gaussian conditional parameter probability density functions. The technique also accounts for nuisance parameters in the form of Arrhenius parameters of other rate coefficients with prescribed uncertainty. The resulting pooled parameter probability density function is propagated through stoichiometric hydrogen-air auto-ignition computations to illustrate the need to account for correlation among the Arrhenius rate parameters of one reaction and across rate parameters of different reactions.

  1. Good fences make for good neighbors but bad science: a review of what improves Bayesian reasoning and why.

    PubMed

    Brase, Gary L; Hill, W Trey

    2015-01-01

    Bayesian reasoning, defined here as the updating of a posterior probability following new information, has historically been problematic for humans. Classic psychology experiments have tested human Bayesian reasoning through the use of word problems and have evaluated each participant's performance against the normatively correct answer provided by Bayes' theorem. The standard finding is of generally poor performance. Over the past two decades, though, progress has been made on how to improve Bayesian reasoning. Most notably, research has demonstrated that the use of frequencies in a natural sampling framework, as opposed to single-event probabilities, can improve participants' Bayesian estimates. Furthermore, pictorial aids and certain individual difference factors also can play significant roles in Bayesian reasoning success. The mechanics of how to build tasks which show these improvements is not under much debate. The explanations for why naturally sampled frequencies and pictures help Bayesian reasoning remain hotly contested, however, with many researchers falling into ingrained "camps" organized around two dominant theoretical perspectives. The present paper evaluates the merits of these theoretical perspectives, including the weight of empirical evidence, theoretical coherence, and predictive power. By these criteria, the ecological rationality approach is clearly better than the heuristics and biases view. Progress in the study of Bayesian reasoning will depend on continued research that honestly, vigorously, and consistently engages across these different theoretical accounts rather than staying "siloed" within one particular perspective. The process of science requires an understanding of competing points of view, with the ultimate goal being integration.
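
    The contrast between single-event probabilities and natural frequencies can be made concrete with a small worked example. The numbers below (a 1% base rate, an 80% hit rate, a 10% false-positive rate, a reference population of 1000) are hypothetical and are not taken from the reviewed paper; the two function names are likewise illustrative. Both routes give the same posterior, but the frequency route reduces to comparing two counts.

```python
from fractions import Fraction


def posterior_from_probabilities(prior, hit_rate, false_positive_rate):
    """Bayes' theorem in the single-event probability format."""
    numerator = prior * hit_rate
    evidence = numerator + (1 - prior) * false_positive_rate
    return numerator / evidence


def posterior_from_frequencies(population, prior, hit_rate, false_positive_rate):
    """Same calculation phrased as counts of people (natural frequencies)."""
    affected = population * prior                        # e.g. 10 of 1000
    true_positives = affected * hit_rate                 # e.g. 8 of those 10
    false_positives = (population - affected) * false_positive_rate  # e.g. 99 of 990
    return true_positives / (true_positives + false_positives)


# Hypothetical illustrative numbers, kept exact with Fraction.
prior = Fraction(1, 100)
hit_rate = Fraction(8, 10)
false_positive_rate = Fraction(1, 10)

p_prob = posterior_from_probabilities(prior, hit_rate, false_positive_rate)
p_freq = posterior_from_frequencies(1000, prior, hit_rate, false_positive_rate)
print(p_prob, p_freq, float(p_prob))
```

    In the frequency framing the answer reads as "8 of the 107 people who test positive are affected", which is the kind of presentation the abstract reports as improving participants' estimates.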

  2. Bayesian model selection: Evidence estimation based on DREAM simulation and bridge sampling

    NASA Astrophysics Data System (ADS)

    Volpi, Elena; Schoups, Gerrit; Firmani, Giovanni; Vrugt, Jasper A.

    2017-04-01

    Bayesian inference has found widespread application in Earth and Environmental Systems Modeling, providing an effective tool for prediction, data assimilation, parameter estimation, uncertainty analysis and hypothesis testing. Under multiple competing hypotheses, the Bayesian approach also provides an attractive alternative to traditional information criteria (e.g. AIC, BIC) for model selection. The key variable for Bayesian model selection is the evidence (or marginal likelihood), which is the normalizing constant in the denominator of Bayes' theorem; while it is fundamental for model selection, the evidence is not required for Bayesian inference. It is computed for each hypothesis (model) by averaging the likelihood function over the prior parameter distribution, rather than maximizing it as information criteria do; the larger a model's evidence, the more support it receives among a collection of hypotheses, as the simulated values assign relatively high probability density to the observed data. Hence, the evidence naturally acts as an Occam's razor, preferring simpler and more constrained models over the over-fitted ones selected by information criteria that incorporate only the likelihood maximum. Since it is not particularly easy to estimate the evidence in practice, Bayesian model selection via the marginal likelihood has not yet found mainstream use. We illustrate here the properties of a new estimator of the Bayesian model evidence, which provides robust and unbiased estimates of the marginal likelihood; the method is coined Gaussian Mixture Importance Sampling (GMIS). GMIS uses multidimensional numerical integration of the posterior parameter distribution via bridge sampling (a generalization of importance sampling) of a mixture distribution fitted to samples of the posterior distribution derived from the DREAM algorithm (Vrugt et al., 2008; 2009). Some illustrative examples are presented to show the robustness and superiority of the GMIS estimator with respect to other commonly used approaches in the literature.
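
    The idea of the evidence as a prior-averaged likelihood can be sketched on a toy problem. The example below is not the GMIS estimator; it assumes a hypothetical coin-flip model with a Beta prior, for which the marginal likelihood is available in closed form, and compares that value with a plain Monte Carlo average of the likelihood over prior draws. All function names are illustrative.

```python
import math
import random


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def exact_evidence(k, n, a=1.0, b=1.0):
    """Closed-form marginal likelihood of k successes in n trials
    under a Beta(a, b) prior on the success probability."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_choose + log_beta(a + k, b + n - k) - log_beta(a, b))


def mc_evidence(k, n, a=1.0, b=1.0, draws=50_000, seed=0):
    """Naive estimate: average the binomial likelihood over prior draws."""
    rng = random.Random(seed)
    choose = math.comb(n, k)
    total = 0.0
    for _ in range(draws):
        p = rng.betavariate(a, b)  # one draw from the prior
        total += choose * p**k * (1.0 - p) ** (n - k)
    return total / draws


exact = exact_evidence(7, 10)
approx = mc_evidence(7, 10)
```

    Averaging over the prior is simple but becomes inefficient when the posterior is much narrower than the prior, which is the situation that importance-sampling and bridge-sampling estimators are designed to handle.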

  3. Bayesian approach to analyzing holograms of colloidal particles.

    PubMed

    Dimiduk, Thomas G; Manoharan, Vinothan N

    2016-10-17

    We demonstrate a Bayesian approach to tracking and characterizing colloidal particles from in-line digital holograms. We model the formation of the hologram using Lorenz-Mie theory. We then use a tempered Markov-chain Monte Carlo method to sample the posterior probability distributions of the model parameters: particle position, size, and refractive index. Compared to least-squares fitting, our approach allows us to more easily incorporate prior information about the parameters and to obtain more accurate uncertainties, which are critical for both particle tracking and characterization experiments. Our approach also eliminates the need to supply accurate initial guesses for the parameters, so it requires little tuning.

  4. Competing risk models in reliability systems, an exponential distribution model with Bayesian analysis approach

    NASA Astrophysics Data System (ADS)

    Iskandar, I.

    2018-03-01

    The exponential distribution is the most widely used distribution in reliability analysis. It is well suited to representing the lifetimes of many kinds of items and has a simple statistical form. Its defining characteristic is a constant hazard rate, and it is the special case of the Weibull distribution with shape parameter equal to one. In this paper we introduce the basic notions of an exponential competing risks model in reliability analysis under a Bayesian approach and present the associated analytic methods. The cases are limited to models with independent causes of failure. A non-informative prior distribution is used in our analysis. We describe the likelihood function, followed by the posterior distribution and the point estimates, interval estimates, hazard function, and reliability. The net probability of failure if only one specific risk is present, the crude probability of failure due to a specific risk in the presence of other causes, and partial crude probabilities are also included.
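
    A minimal sketch of such a model follows. It assumes complete (uncensored) failure data, independent exponential causes, and a prior proportional to 1/rate for each cause; the abstract does not state which non-informative prior the paper uses, so that choice is an assumption. Under it, the posterior for each cause-specific rate is a Gamma distribution with shape equal to the number of failures from that cause and rate equal to the total time on test.

```python
def exponential_competing_risks(times, causes, n_causes, mission_time):
    """Posterior summaries per cause for independent exponential risks.

    times        : failure time of each unit (all units assumed to fail)
    causes       : index (0..n_causes-1) of the cause that failed each unit
    mission_time : time at which the net reliability is evaluated
    Assumes a prior proportional to 1/rate for every cause (an assumption).
    """
    total_time = sum(times)  # total time on test, shared by all causes
    summaries = []
    for j in range(n_causes):
        d_j = sum(1 for c in causes if c == j)  # failures attributed to cause j
        if d_j == 0:
            raise ValueError("posterior is improper for a cause with no failures")
        # Posterior is Gamma(shape=d_j, rate=total_time).
        rate_mean = d_j / total_time
        # Posterior mean of exp(-rate * t): net reliability if only cause j acts.
        net_reliability = (total_time / (total_time + mission_time)) ** d_j
        summaries.append(
            {"failures": d_j, "rate_mean": rate_mean, "net_reliability": net_reliability}
        )
    return summaries


result = exponential_competing_risks(
    times=[2.0, 3.0, 5.0], causes=[0, 1, 0], n_causes=2, mission_time=10.0
)
```

    Because the hazard is constant, the posterior mean of the hazard function for each cause equals the posterior mean rate at every time.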

  5. Bayesian probabilistic approach for inverse source determination from limited and noisy chemical or biological sensor concentration measurements

    NASA Astrophysics Data System (ADS)

    Yee, Eugene

    2007-04-01

    Although a great deal of research effort has been focused on the forward prediction of the dispersion of contaminants (e.g., chemical and biological warfare agents) released into the turbulent atmosphere, much less work has been directed toward the inverse prediction of agent source location and strength from the measured concentration, even though the importance of this problem for a number of practical applications is obvious. In general, the inverse problem of source reconstruction is ill-posed and unsolvable without additional information. It is demonstrated that a Bayesian probabilistic inferential framework provides a natural and logically consistent method for source reconstruction from a limited number of noisy concentration data. In particular, the Bayesian approach permits one to incorporate prior knowledge about the source as well as additional information regarding both model and data errors. The latter enables a rigorous determination of the uncertainty in the inference of the source parameters (e.g., spatial location, emission rate, release time, etc.), hence extending the potential of the methodology as a tool for quantitative source reconstruction. A model (or, source-receptor relationship) that relates the source distribution to the concentration data measured by a number of sensors is formulated, and Bayesian probability theory is used to derive the posterior probability density function of the source parameters. A computationally efficient methodology for determination of the likelihood function for the problem, based on an adjoint representation of the source-receptor relationship, is described. Furthermore, we describe the application of efficient stochastic algorithms based on Markov chain Monte Carlo (MCMC) for sampling from the posterior distribution of the source parameters, the latter of which is required to undertake the Bayesian computation. The Bayesian inferential methodology for source reconstruction is validated against real dispersion data for two cases involving contaminant dispersion in highly disturbed flows over urban and complex environments where the idealizations of horizontal homogeneity and/or temporal stationarity in the flow cannot be applied to simplify the problem. Furthermore, the methodology is applied to the case of reconstruction of multiple sources.

  6. Bayesian Estimation of Small Effects in Exercise and Sports Science.

    PubMed

    Mengersen, Kerrie L; Drovandi, Christopher C; Robert, Christian P; Pyne, David B; Gore, Christopher J

    2016-01-01

    The aim of this paper is to provide a Bayesian formulation of the so-called magnitude-based inference approach to quantifying and interpreting effects, and in a case study example provide accurate probabilistic statements that correspond to the intended magnitude-based inferences. The model is described in the context of a published small-scale athlete study which employed a magnitude-based inference approach to compare the effect of two altitude training regimens (live high-train low (LHTL), and intermittent hypoxic exposure (IHE)) on running performance and blood measurements of elite triathletes. The posterior distributions, and corresponding point and interval estimates, for the parameters and associated effects and comparisons of interest, were estimated using Markov chain Monte Carlo simulations. The Bayesian analysis was shown to provide more direct probabilistic comparisons of treatments and able to identify small effects of interest. The approach avoided asymptotic assumptions and overcame issues such as multiple testing. Bayesian analysis of unscaled effects showed a probability of 0.96 that LHTL yields a substantially greater increase in hemoglobin mass than IHE, a 0.93 probability of a substantially greater improvement in running economy and a greater than 0.96 probability that both IHE and LHTL yield a substantially greater improvement in maximum blood lactate concentration compared to a Placebo. The conclusions are consistent with those obtained using a 'magnitude-based inference' approach that has been promoted in the field. The paper demonstrates that a fully Bayesian analysis is a simple and effective way of analysing small effects, providing a rich set of results that are straightforward to interpret in terms of probabilistic statements.

  7. Credible occurrence probabilities for extreme geophysical events: earthquakes, volcanic eruptions, magnetic storms

    USGS Publications Warehouse

    Love, Jeffrey J.

    2012-01-01

    Statistical analysis is made of rare, extreme geophysical events recorded in historical data -- counting the number of events $k$ with sizes that exceed chosen thresholds during specific durations of time $\tau$. Under transformations that stabilize data and model-parameter variances, the most likely Poisson-event occurrence rate, $k/\tau$, applies for frequentist inference and, also, for Bayesian inference with a Jeffreys prior that ensures posterior invariance under changes of variables. Frequentist confidence intervals and Bayesian (Jeffreys) credibility intervals are approximately the same and easy to calculate: $(1/\tau)[(\sqrt{k} - z/2)^{2},(\sqrt{k} + z/2)^{2}]$, where $z$ is a parameter that specifies the width, $z=1$ ($z=2$) corresponding to $1\sigma$, $68.3\%$ ($2\sigma$, $95.4\%$). If only a few events have been observed, as is usually the case for extreme events, then these "error-bar" intervals might be considered to be relatively wide. From historical records, we estimate most likely long-term occurrence rates, 10-yr occurrence probabilities, and intervals of frequentist confidence and Bayesian credibility for large earthquakes, explosive volcanic eruptions, and magnetic storms.
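
    The interval formula quoted in the abstract is easy to evaluate directly. The sketch below implements it as written and adds the standard Poisson relation for the probability of at least one event in a time window; the paper's exact procedure for the 10-yr probabilities is not given in the abstract, so that second function is an assumption, and the event count and duration are hypothetical.

```python
import math


def poisson_rate_interval(k, tau, z=1.0):
    """Return (rate, lower, upper) for k events observed over duration tau,
    using (1/tau) * [(sqrt(k) - z/2)^2, (sqrt(k) + z/2)^2]."""
    root_k = math.sqrt(k)
    rate = k / tau
    lower = (root_k - z / 2.0) ** 2 / tau
    upper = (root_k + z / 2.0) ** 2 / tau
    return rate, lower, upper


def occurrence_probability(rate, window):
    """Probability of at least one Poisson event within `window`
    (standard Poisson relation, assumed here)."""
    return 1.0 - math.exp(-rate * window)


# Hypothetical record: 4 events in 100 years, 1-sigma interval.
rate, lower, upper = poisson_rate_interval(k=4, tau=100.0, z=1.0)
p10 = occurrence_probability(rate, 10.0)
```

    With these hypothetical inputs the rate is 0.04 per year with an interval of 0.0225 to 0.0625 per year, which illustrates how wide the interval is when only a few events have been observed.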

  8. Variational Gaussian approximation for Poisson data

    NASA Astrophysics Data System (ADS)

    Arridge, Simon R.; Ito, Kazufumi; Jin, Bangti; Zhang, Chen

    2018-02-01

    The Poisson model is frequently employed to describe count data, but in a Bayesian context it leads to an analytically intractable posterior probability distribution. In this work, we analyze a variational Gaussian approximation to the posterior distribution arising from the Poisson model with a Gaussian prior. This is achieved by seeking an optimal Gaussian distribution minimizing the Kullback-Leibler divergence from the posterior distribution to the approximation, or equivalently maximizing the lower bound for the model evidence. We derive an explicit expression for the lower bound, and show the existence and uniqueness of the optimal Gaussian approximation. The lower bound functional can be viewed as a variant of classical Tikhonov regularization that penalizes also the covariance. Then we develop an efficient alternating direction maximization algorithm for solving the optimization problem, and analyze its convergence. We discuss strategies for reducing the computational complexity via low rank structure of the forward operator and the sparsity of the covariance. Further, as an application of the lower bound, we discuss hierarchical Bayesian modeling for selecting the hyperparameter in the prior distribution, and propose a monotonically convergent algorithm for determining the hyperparameter. We present extensive numerical experiments to illustrate the Gaussian approximation and the algorithms.

  9. Sequential bearings-only-tracking initiation with particle filtering method.

    PubMed

    Liu, Bin; Hao, Chengpeng

    2013-01-01

    The tracking initiation problem is examined in the context of autonomous bearings-only-tracking (BOT) of a single appearing/disappearing target in the presence of clutter measurements. In general, this problem suffers from a combinatorial explosion in the number of potential tracks resulting from the uncertainty in the linkage between the target and the measurement (a.k.a. the data association problem). In addition, the nonlinear measurements lead to a non-Gaussian posterior probability density function (pdf) in the optimal Bayesian sequential estimation framework. The consequence of this nonlinear/non-Gaussian context is the absence of a closed-form solution. This paper models the linkage uncertainty and the nonlinear/non-Gaussian estimation problem jointly within a solid Bayesian formalism. A particle filtering (PF) algorithm is derived for estimating the model's parameters in a sequential manner. Numerical results show that the proposed solution provides a significant benefit over the most commonly used methods, IPDA and IMMPDA. The posterior Cramér-Rao bounds are also used for performance evaluation.

  10. Itô-SDE MCMC method for Bayesian characterization of errors associated with data limitations in stochastic expansion methods for uncertainty quantification

    NASA Astrophysics Data System (ADS)

    Arnst, M.; Abello Álvarez, B.; Ponthot, J.-P.; Boman, R.

    2017-11-01

    This paper is concerned with the characterization and the propagation of errors associated with data limitations in polynomial-chaos-based stochastic methods for uncertainty quantification. Such an issue can arise in uncertainty quantification when only a limited amount of data is available. When the available information does not suffice to accurately determine the probability distributions that must be assigned to the uncertain variables, the Bayesian method for assigning these probability distributions becomes attractive because it allows the stochastic model to account explicitly for insufficiency of the available information. In previous work, such applications of the Bayesian method had already been implemented by using the Metropolis-Hastings and Gibbs Markov Chain Monte Carlo (MCMC) methods. In this paper, we present an alternative implementation, which uses an alternative MCMC method built around an Itô stochastic differential equation (SDE) that is ergodic for the Bayesian posterior. We draw together from the mathematics literature a number of formal properties of this Itô SDE that lend support to its use in the implementation of the Bayesian method, and we describe its discretization, including the choice of the free parameters, by using the implicit Euler method. We demonstrate the proposed methodology on a problem of uncertainty quantification in a complex nonlinear engineering application relevant to metal forming.

  11. Inverse modeling of hydrologic parameters using surface flux and runoff observations in the Community Land Model

    NASA Astrophysics Data System (ADS)

    Sun, Y.; Hou, Z.; Huang, M.; Tian, F.; Leung, L. Ruby

    2013-12-01

    This study demonstrates the possibility of inverting hydrologic parameters using surface flux and runoff observations in version 4 of the Community Land Model (CLM4). Previous studies showed that surface flux and runoff calculations are sensitive to major hydrologic parameters in CLM4 over different watersheds, and illustrated the necessity and possibility of parameter calibration. Both deterministic least-square fitting and stochastic Markov-chain Monte Carlo (MCMC)-Bayesian inversion approaches are evaluated by applying them to CLM4 at selected sites with different climate and soil conditions. The unknowns to be estimated include surface and subsurface runoff generation parameters and vadose zone soil water parameters. We find that using model parameters calibrated by the sampling-based stochastic inversion approaches provides significant improvements in the model simulations compared to using default CLM4 parameter values, and that as more information comes in, the predictive intervals (ranges of posterior distributions) of the calibrated parameters become narrower. In general, parameters that are identified to be significant through sensitivity analyses and statistical tests are better calibrated than those with weak or nonlinear impacts on flux or runoff observations. Temporal resolution of observations has larger impacts on the results of inverse modeling using heat flux data than runoff data. Soil and vegetation cover have important impacts on parameter sensitivities, leading to different patterns of posterior distributions of parameters at different sites. Overall, the MCMC-Bayesian inversion approach effectively and reliably improves the simulation of CLM under different climates and environmental conditions. Bayesian model averaging of the posterior estimates with different reference acceptance probabilities can smooth the posterior distribution and provide more reliable parameter estimates, but at the expense of wider uncertainty bounds.

  12. Creation of the BMA ensemble for SST using a parallel processing technique

    NASA Astrophysics Data System (ADS)

    Kim, Kwangjin; Lee, Yang Won

    2013-10-01

    Satellite products that serve the same purpose nevertheless report different values because each carries unavoidable uncertainty. Such products have also been generated over long periods, so they are numerous and varied. Efforts to reduce the uncertainty and to handle very large data volumes are therefore necessary. In this paper, we create an ensemble Sea Surface Temperature (SST) product using MODIS Aqua, MODIS Terra and COMS (Communication Ocean and Meteorological Satellite). We used Bayesian Model Averaging (BMA) as the ensemble method. The principle of BMA is to synthesize the conditional probability density functions (PDFs) using posterior probabilities as weights. The posterior probabilities are estimated using the EM algorithm, and the BMA PDF is obtained as the weighted average. As a result, the ensemble SST showed the lowest RMSE and MAE, which supports the applicability of BMA to satellite data ensembles. As future work, parallel processing techniques using the Hadoop framework will be adopted for more efficient computation of very large satellite data sets.
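
    The weighted-average step of BMA can be sketched as follows. The example assumes Gaussian conditional PDFs centred on each member's value and takes the weights as given; in the study they would come from the EM algorithm, which is not reproduced here. The SST values, weights, and spreads are hypothetical.

```python
import math


def normal_pdf(y, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    zed = (y - mean) / sd
    return math.exp(-0.5 * zed * zed) / (sd * math.sqrt(2.0 * math.pi))


def bma_pdf(y, forecasts, weights, sds):
    """BMA predictive density: weighted sum of member-conditional PDFs."""
    return sum(w * normal_pdf(y, f, s) for f, w, s in zip(forecasts, weights, sds))


def bma_mean(forecasts, weights):
    """BMA point estimate: weighted average of the member values."""
    return sum(w * f for f, w in zip(forecasts, weights))


# Hypothetical SST values (deg C) from three products and assumed weights.
sst = [20.0, 21.0, 19.0]
weights = [0.5, 0.3, 0.2]  # posterior probabilities; must sum to 1
sds = [0.4, 0.6, 0.8]

ensemble_sst = bma_mean(sst, weights)
density_at_20 = bma_pdf(20.0, sst, weights, sds)
```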

  13. A novel Bayesian framework for discriminative feature extraction in Brain-Computer Interfaces.

    PubMed

    Suk, Heung-Il; Lee, Seong-Whan

    2013-02-01

    As there has been a paradigm shift in the learning load from a human subject to a computer, machine learning has been considered as a useful tool for Brain-Computer Interfaces (BCIs). In this paper, we propose a novel Bayesian framework for discriminative feature extraction for motor imagery classification in an EEG-based BCI in which the class-discriminative frequency bands and the corresponding spatial filters are optimized by means of the probabilistic and information-theoretic approaches. In our framework, the problem of simultaneous spatiospectral filter optimization is formulated as the estimation of an unknown posterior probability density function (pdf) that represents the probability that a single-trial EEG of predefined mental tasks can be discriminated in a state. In order to estimate the posterior pdf, we propose a particle-based approximation method by extending a factored-sampling technique with a diffusion process. An information-theoretic observation model is also devised to measure discriminative power of features between classes. From the viewpoint of classifier design, the proposed method naturally allows us to construct a spectrally weighted label decision rule by linearly combining the outputs from multiple classifiers. We demonstrate the feasibility and effectiveness of the proposed method by analyzing the results and its success on three public databases.

  14. Bayesian adaptive phase II screening design for combination trials

    PubMed Central

    Cai, Chunyan; Yuan, Ying; Johnson, Valen E

    2013-01-01

    Background: Trials of combination therapies for the treatment of cancer are playing an increasingly important role in the battle against this disease. To more efficiently handle the large number of combination therapies that must be tested, we propose a novel Bayesian phase II adaptive screening design to simultaneously select among possible treatment combinations involving multiple agents. Methods: Our design is based on formulating the selection procedure as a Bayesian hypothesis testing problem in which the superiority of each treatment combination is equated to a single hypothesis. During the trial conduct, we use the current values of the posterior probabilities of all hypotheses to adaptively allocate patients to treatment combinations. Results: Simulation studies show that the proposed design substantially outperforms the conventional multiarm balanced factorial trial design. The proposed design yields a significantly higher probability for selecting the best treatment while allocating substantially more patients to efficacious treatments. Limitations: The proposed design is most appropriate for the trials combining multiple agents and screening out the efficacious combination to be further investigated. Conclusions: The proposed Bayesian adaptive phase II screening design substantially outperformed the conventional complete factorial design. Our design allocates more patients to better treatments while providing higher power to identify the best treatment at the end of the trial. PMID:23359875

  15. Embedding the results of focussed Bayesian fusion into a global context

    NASA Astrophysics Data System (ADS)

    Sander, Jennifer; Heizmann, Michael

    2014-05-01

    Bayesian statistics offers a well-founded and powerful fusion methodology, including for the fusion of heterogeneous information sources. However, except in special cases, the needed posterior distribution is not analytically derivable. As a consequence, Bayesian fusion may cause unacceptably high computational and storage costs in practice. Local Bayesian fusion approaches aim at reducing the complexity of the Bayesian fusion methodology significantly. This is done by concentrating the actual Bayesian fusion on the potentially most task-relevant parts of the domain of the Properties of Interest. Our research on these approaches is motivated by an analogy to criminal investigations, where criminalists also pursue clues only locally. This publication follows previous publications on a special local Bayesian fusion technique called focussed Bayesian fusion. Here, the actual calculation of the posterior distribution is completely restricted to a suitably chosen local context. As a result, the global posterior distribution is not completely determined. Strategies for using the results of a focussed Bayesian analysis appropriately are needed. In this publication, we primarily contrast different ways of embedding the results of focussed Bayesian fusion explicitly into a global context. To obtain a unique global posterior distribution, we analyze the application of the Maximum Entropy Principle, which has been shown to be successfully applicable in metrology and in various other areas. To address the special need for making further decisions subsequent to the actual fusion task, we further analyze criteria for decision making under partial information.

  16. cosmoabc: Likelihood-free inference for cosmology

    NASA Astrophysics Data System (ADS)

    Ishida, Emille E. O.; Vitenti, Sandro D. P.; Penna-Lima, Mariana; Trindade, Arlindo M.; Cisewski, Jessi; M.; de Souza, Rafael; Cameron, Ewan; Busti, Vinicius C.

    2015-05-01

    Approximate Bayesian Computation (ABC) enables parameter inference for complex physical systems in cases where the true likelihood function is unknown, unavailable, or computationally too expensive. It relies on the forward simulation of mock data and comparison between observed and synthetic catalogs. cosmoabc is a Python Approximate Bayesian Computation (ABC) sampler featuring a Population Monte Carlo variation of the original ABC algorithm, which uses an adaptive importance sampling scheme. The code can be coupled to an external simulator to allow incorporation of arbitrary distance and prior functions. When coupled with the numcosmo library, it has been used to estimate posterior probability distributions over cosmological parameters based on measurements of galaxy clusters number counts without computing the likelihood function.
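
    The basic ABC idea described above (simulate mock data, compare with the observations, keep parameters that give a close match) can be sketched with a plain rejection sampler. This is not the cosmoabc interface and not its Population Monte Carlo scheme; every name below is hypothetical, and the toy problem (inferring the mean of a Gaussian with known unit variance) is chosen only for illustration.

```python
import random
import statistics


def abc_rejection(observed, prior_draw, simulate, distance, eps, n_draws, seed=0):
    """Plain ABC rejection: keep prior draws whose mock data lie within
    `eps` of the observed data according to `distance`."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_draw(rng)      # candidate parameter from the prior
        mock = simulate(theta, rng)  # forward-simulate a mock catalogue
        if distance(observed, mock) <= eps:
            accepted.append(theta)
    return accepted


def prior_draw(rng):
    """Flat prior on the unknown mean (assumed range)."""
    return rng.uniform(-5.0, 5.0)


def simulate(theta, rng, size=50):
    """Mock data: Gaussian samples with mean theta and unit variance."""
    return [rng.gauss(theta, 1.0) for _ in range(size)]


def distance(data_a, data_b):
    """Distance between data sets via their sample means."""
    return abs(statistics.fmean(data_a) - statistics.fmean(data_b))


data_rng = random.Random(42)
observed = [data_rng.gauss(1.5, 1.0) for _ in range(50)]
posterior_sample = abc_rejection(
    observed, prior_draw, simulate, distance, eps=0.2, n_draws=10_000
)
```

    The accepted values approximate the posterior without the likelihood ever being evaluated; shrinking `eps` improves the approximation at the cost of a lower acceptance rate, which is the inefficiency that adaptive schemes aim to reduce.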

  17. Posterior Predictive Bayesian Phylogenetic Model Selection

    PubMed Central

    Lewis, Paul O.; Xie, Wangang; Chen, Ming-Hui; Fan, Yu; Kuo, Lynn

    2014-01-01

    We present two distinctly different posterior predictive approaches to Bayesian phylogenetic model selection and illustrate these methods using examples from green algal protein-coding cpDNA sequences and flowering plant rDNA sequences. The Gelfand–Ghosh (GG) approach allows dissection of an overall measure of model fit into components due to posterior predictive variance (GGp) and goodness-of-fit (GGg), which distinguishes this method from the posterior predictive P-value approach. The conditional predictive ordinate (CPO) method provides a site-specific measure of model fit useful for exploratory analyses and can be combined over sites yielding the log pseudomarginal likelihood (LPML) which is useful as an overall measure of model fit. CPO provides a useful cross-validation approach that is computationally efficient, requiring only a sample from the posterior distribution (no additional simulation is required). Both GG and CPO add new perspectives to Bayesian phylogenetic model selection based on the predictive abilities of models and complement the perspective provided by the marginal likelihood (including Bayes Factor comparisons) based solely on the fit of competing models to observed data. [Bayesian; conditional predictive ordinate; CPO; L-measure; LPML; model selection; phylogenetics; posterior predictive.] PMID:24193892

  18. Parameter estimation of multivariate multiple regression model using bayesian with non-informative Jeffreys’ prior distribution

    NASA Astrophysics Data System (ADS)

    Saputro, D. R. S.; Amalia, F.; Widyaningsih, P.; Affan, R. C.

    2018-05-01

    The Bayesian method can be used to estimate the parameters of a multivariate multiple regression model. It involves two distributions, the prior and the posterior. The posterior distribution is influenced by the choice of prior distribution. Jeffreys’ prior is a kind of non-informative prior distribution, used when no information about the parameters is available. The non-informative Jeffreys’ prior is combined with the sample information to give the posterior distribution, which is then used to estimate the parameters. The purpose of this research is to estimate the parameters of a multivariate regression model using the Bayesian method with the non-informative Jeffreys’ prior distribution. Based on the results and discussion, the estimates of β and Σ are obtained as the expected values of their marginal posterior distributions. The marginal posterior distributions for β and Σ are multivariate normal and inverse Wishart, respectively. However, calculating the expected values involves integrals that are difficult to evaluate. Therefore, an approximation is needed, obtained by generating random samples from the posterior distribution of each parameter using the Markov chain Monte Carlo (MCMC) Gibbs sampling algorithm.

  19. Bayesian sample size calculations in phase II clinical trials using a mixture of informative priors.

    PubMed

    Gajewski, Byron J; Mayo, Matthew S

    2006-08-15

    A number of researchers have discussed phase II clinical trials from a Bayesian perspective. A recent article by Mayo and Gajewski focuses on sample size calculations, which they determine by specifying an informative prior distribution and then calculating a posterior probability that the true response will exceed a prespecified target. In this article, we extend these sample size calculations to include a mixture of informative prior distributions. The mixture comes from several sources of information. For example, consider information from two (or more) clinicians. The first clinician is pessimistic about the drug and the second clinician is optimistic. We tabulate the results for sample size design using the fact that the simple mixture of Betas is a conjugate family for the Beta-Binomial model. We discuss the theoretical framework for these types of Bayesian designs and show that the Bayesian designs in this paper approximate this theoretical framework. Copyright 2006 John Wiley & Sons, Ltd.
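
    The conjugacy of the Beta mixture can be illustrated with a short sketch. Given x responses in n patients, each Beta(a, b) component updates to Beta(a + x, b + n - x), and its weight is rescaled by that component's marginal likelihood. The code below is not the authors' sample size procedure; the prior parameters, data, and target are hypothetical, and the tail probability is obtained by simple midpoint integration.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def update_beta_mixture(components, x, n):
    """components: list of (weight, a, b). Returns the posterior mixture
    after observing x responses in n patients."""
    unnormalised = []
    for w, a, b in components:
        # Marginal likelihood of the data under this component (up to C(n, x)).
        log_ml = log_beta(a + x, b + n - x) - log_beta(a, b)
        unnormalised.append((w * math.exp(log_ml), a + x, b + n - x))
    total = sum(u for u, _, _ in unnormalised)
    return [(u / total, a, b) for u, a, b in unnormalised]


def beta_tail(a, b, target, steps=4000):
    """P(p > target) for Beta(a, b) by midpoint-rule integration."""
    width = (1.0 - target) / steps
    area = 0.0
    for i in range(steps):
        p = target + (i + 0.5) * width
        log_pdf = (a - 1) * math.log(p) + (b - 1) * math.log(1.0 - p) - log_beta(a, b)
        area += math.exp(log_pdf) * width
    return area


def prob_exceeds(components, target):
    """Posterior probability that the response rate exceeds the target."""
    return sum(w * beta_tail(a, b, target) for w, a, b in components)


# Hypothetical priors: a pessimistic and an optimistic clinician, equal weight.
prior = [(0.5, 2.0, 8.0), (0.5, 8.0, 2.0)]
posterior = update_beta_mixture(prior, x=5, n=10)
p_success = prob_exceeds(posterior, target=0.5)
```

    In a design setting, a calculation of this kind would be repeated over candidate sample sizes and possible outcomes to find the smallest trial that meets a chosen posterior probability criterion.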

  20. Technical note: Bayesian calibration of dynamic ruminant nutrition models.

    PubMed

    Reed, K F; Arhonditsis, G B; France, J; Kebreab, E

    2016-08-01

    Mechanistic models of ruminant digestion and metabolism have advanced our understanding of the processes underlying ruminant animal physiology. Deterministic modeling practices ignore the inherent variation within and among individual animals and thus have no way to assess how sources of error influence model outputs. We introduce Bayesian calibration of mathematical models to address the need for robust mechanistic modeling tools that can accommodate error analysis by remaining within the bounds of data-based parameter estimation. For the purpose of prediction, the Bayesian approach generates a posterior predictive distribution that represents the current estimate of the value of the response variable, taking into account both the uncertainty about the parameters and model residual variability. Predictions are expressed as probability distributions, thereby conveying significantly more information than point estimates in regard to uncertainty. Our study illustrates some of the technical advantages of Bayesian calibration and discusses the future perspectives in the context of animal nutrition modeling. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  1. Bayesian analyses of seasonal runoff forecasts

    NASA Astrophysics Data System (ADS)

    Krzysztofowicz, R.; Reese, S.

    1991-12-01

    Forecasts of seasonal snowmelt runoff volume provide indispensable information for rational decision making by water project operators, irrigation district managers, and farmers in the western United States. Bayesian statistical models and communication frames have been researched in order to enhance the forecast information disseminated to the users, and to characterize forecast skill from the decision maker's point of view. Four products are presented: (i) a Bayesian Processor of Forecasts, which provides a statistical filter for calibrating the forecasts, and a procedure for estimating the posterior probability distribution of the seasonal runoff; (ii) the Bayesian Correlation Score, a new measure of forecast skill, which is related monotonically to the ex ante economic value of forecasts for decision making; (iii) a statistical predictor of monthly cumulative runoffs within the snowmelt season, conditional on the total seasonal runoff forecast; and (iv) a framing of the forecast message that conveys the uncertainty associated with the forecast estimates to the users. All analyses are illustrated with numerical examples of forecasts for six gauging stations from the period 1971-1988.

  2. Log-Linear Models for Gene Association

    PubMed Central

    Hu, Jianhua; Joshi, Adarsh; Johnson, Valen E.

    2009-01-01

    We describe a class of log-linear models for the detection of interactions in high-dimensional genomic data. This class of models leads to a Bayesian model selection algorithm that can be applied to data that have been reduced to contingency tables using ranks of observations within subjects, and discretization of these ranks within gene/network components. Many normalization issues associated with the analysis of genomic data are thereby avoided. A prior density based on Ewens’ sampling distribution is used to restrict the number of interacting components assigned high posterior probability, and the calculation of posterior model probabilities is expedited by approximations based on the likelihood ratio statistic. Simulation studies are used to evaluate the efficiency of the resulting algorithm for known interaction structures. Finally, the algorithm is validated in a microarray study for which it was possible to obtain biological confirmation of detected interactions. PMID:19655032

  3. BCM: toolkit for Bayesian analysis of Computational Models using samplers.

    PubMed

    Thijssen, Bram; Dijkstra, Tjeerd M H; Heskes, Tom; Wessels, Lodewyk F A

    2016-10-21

    Computational models in biology are characterized by a large degree of uncertainty. This uncertainty can be analyzed with Bayesian statistics; however, the sampling algorithms that are frequently used for calculating Bayesian statistical estimates are computationally demanding, and each algorithm has unique advantages and disadvantages. It is typically unclear, before starting an analysis, which algorithm will perform well on a given computational model. We present BCM, a toolkit for the Bayesian analysis of Computational Models using samplers. It provides efficient, multithreaded implementations of eleven algorithms for sampling from posterior probability distributions and for calculating marginal likelihoods. BCM includes tools to simplify the process of model specification and scripts for visualizing the results. The flexible architecture allows it to be used on diverse types of biological computational models. In an example inference task using a model of the cell cycle based on ordinary differential equations, BCM is significantly more efficient than existing software packages, allowing more challenging inference problems to be solved. BCM represents an efficient one-stop-shop for computational modelers wishing to use sampler-based Bayesian statistics.

  4. RadVel: General toolkit for modeling Radial Velocities

    NASA Astrophysics Data System (ADS)

    Fulton, Benjamin J.; Petigura, Erik A.; Blunt, Sarah; Sinukoff, Evan

    2018-01-01

    RadVel models Keplerian orbits in radial velocity (RV) time series. The code is written in Python with a fast Kepler's equation solver written in C. It provides a framework for fitting RVs using maximum a posteriori optimization and computing robust confidence intervals by sampling the posterior probability density via Markov Chain Monte Carlo (MCMC). RadVel can perform Bayesian model comparison and produces publication quality plots and LaTeX tables.

  5. Estimating Tree Height-Diameter Models with the Bayesian Method

    PubMed Central

    Duan, Aiguo; Zhang, Jianguo; Xiang, Congwei

    2014-01-01

    Six candidate height-diameter models were used to analyze the height-diameter relationships. The common methods for estimating the height-diameter models have taken the classical (frequentist) approach based on the frequency interpretation of probability, for example, the nonlinear least squares method (NLS) and the maximum likelihood method (ML). The Bayesian method has an exclusive advantage compared with classical method that the parameters to be estimated are regarded as random variables. In this study, the classical and Bayesian methods were used to estimate six height-diameter models, respectively. Both the classical method and Bayesian method showed that the Weibull model was the “best” model using data1. In addition, based on the Weibull model, data2 was used for comparing Bayesian method with informative priors with uninformative priors and classical method. The results showed that the improvement in prediction accuracy with Bayesian method led to narrower confidence bands of predicted value in comparison to that for the classical method, and the credible bands of parameters with informative priors were also narrower than uninformative priors and classical method. The estimated posterior distributions for parameters can be set as new priors in estimating the parameters using data2. PMID:24711733

  6. Estimating tree height-diameter models with the Bayesian method.

    PubMed

    Zhang, Xiongqing; Duan, Aiguo; Zhang, Jianguo; Xiang, Congwei

    2014-01-01

    Six candidate height-diameter models were used to analyze the height-diameter relationships. The common methods for estimating the height-diameter models have taken the classical (frequentist) approach based on the frequency interpretation of probability, for example, the nonlinear least squares method (NLS) and the maximum likelihood method (ML). The Bayesian method has an exclusive advantage compared with classical method that the parameters to be estimated are regarded as random variables. In this study, the classical and Bayesian methods were used to estimate six height-diameter models, respectively. Both the classical method and Bayesian method showed that the Weibull model was the "best" model using data1. In addition, based on the Weibull model, data2 was used for comparing Bayesian method with informative priors with uninformative priors and classical method. The results showed that the improvement in prediction accuracy with Bayesian method led to narrower confidence bands of predicted value in comparison to that for the classical method, and the credible bands of parameters with informative priors were also narrower than uninformative priors and classical method. The estimated posterior distributions for parameters can be set as new priors in estimating the parameters using data2.

  7. A bayesian approach to classification criteria for spectacled eiders

    USGS Publications Warehouse

    Taylor, B.L.; Wade, P.R.; Stehn, R.A.; Cochrane, J.F.

    1996-01-01

    To facilitate decisions to classify species according to risk of extinction, we used Bayesian methods to analyze trend data for the Spectacled Eider, an arctic sea duck. Trend data from three independent surveys of the Yukon-Kuskokwim Delta were analyzed individually and in combination to yield posterior distributions for population growth rates. We used classification criteria developed by the recovery team for Spectacled Eiders that seek to equalize errors of under- or overprotecting the species. We conducted both a Bayesian decision analysis and a frequentist (classical statistical inference) decision analysis. Bayesian decision analyses are computationally easier, yield basically the same results, and yield results that are easier to explain to nonscientists. With the exception of the aerial survey analysis of the 10 most recent years, both Bayesian and frequentist methods indicated that an endangered classification is warranted. The discrepancy between surveys warrants further research. Although the trend data are abundance indices, we used a preliminary estimate of absolute abundance to demonstrate how to calculate extinction distributions using the joint probability distributions for population growth rate and variance in growth rate generated by the Bayesian analysis. Recent apparent increases in abundance highlight the need for models that apply to declining and then recovering species.

  8. Bayesian models based on test statistics for multiple hypothesis testing problems.

    PubMed

    Ji, Yuan; Lu, Yiling; Mills, Gordon B

    2008-04-01

    We propose a Bayesian method for the problem of multiple hypothesis testing that is routinely encountered in bioinformatics research, such as the differential gene expression analysis. Our algorithm is based on modeling the distributions of test statistics under both null and alternative hypotheses. We substantially reduce the complexity of the process of defining posterior model probabilities by modeling the test statistics directly instead of modeling the full data. Computationally, we apply a Bayesian FDR approach to control the number of rejections of null hypotheses. To check if our model assumptions for the test statistics are valid for various bioinformatics experiments, we also propose a simple graphical model-assessment tool. Using extensive simulations, we demonstrate the performance of our models and the utility of the model-assessment tool. In the end, we apply the proposed methodology to an siRNA screening and a gene expression experiment.
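
    The Bayesian FDR step mentioned above can be illustrated with a small sketch. It assumes each test already has a posterior probability of being null, and it uses one common decision rule: reject the largest set of hypotheses whose average posterior null probability stays at or below a target level. The function name, the input values, and the rule itself are assumptions for illustration and are not taken from the authors' paper.

```python
def bayesian_fdr_reject(null_probs, alpha=0.05):
    """Return indices of hypotheses to reject.

    null_probs[i] is the posterior probability that hypothesis i is null.
    The rule rejects the largest set whose average posterior null
    probability (the expected false discovery proportion) is <= alpha.
    """
    # Visit hypotheses from least to most likely to be null.
    order = sorted(range(len(null_probs)), key=lambda i: null_probs[i])
    rejected, running_sum = [], 0.0
    for rank, i in enumerate(order, start=1):
        running_sum += null_probs[i]
        # The running average never decreases, so stop at the first violation.
        if running_sum / rank > alpha:
            break
        rejected.append(i)
    return rejected


# Hypothetical posterior null probabilities for six tests.
probs = [0.01, 0.90, 0.03, 0.40, 0.02, 0.08]
rejected = bayesian_fdr_reject(probs, alpha=0.05)
print(rejected)  # → [0, 4, 2, 5]
```

    Here the four smallest null probabilities average 0.035, and adding the fifth (0.40) would push the average above 0.05, so only four hypotheses are rejected.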

  9. A Bayesian sequential processor approach to spectroscopic portal system decisions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sale, K; Candy, J; Breitfeller, E

    The development of faster, more reliable techniques to detect radioactive contraband in a portal type scenario is an extremely important problem especially in this era of constant terrorist threats. Towards this goal the development of a model-based, Bayesian sequential data processor for the detection problem is discussed. In the sequential processor each datum (detector energy deposit and pulse arrival time) is used to update the posterior probability distribution over the space of model parameters. The nature of the sequential processor approach is that a detection is produced as soon as it is statistically justified by the data rather than waiting for a fixed counting interval before any analysis is performed. In this paper the Bayesian model-based approach, physics and signal processing models and decision functions are discussed along with the first results of our research.
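
    The idea of updating a posterior after every datum and stopping as soon as a decision is justified can be sketched as follows. This is a deliberately simplified illustration, not the processor described in the record: it assumes only two hypotheses (background only, or background plus source), known Poisson rates, and it uses pulse arrival gaps alone while ignoring energy deposits. All names and rate values are hypothetical.

```python
import math
import random


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sequential_detect(arrival_gaps, bkg_rate, src_rate, prior=0.5, threshold=0.99):
    """Update P(source present) after every pulse; stop once it reaches threshold.

    Returns (number_of_pulses_used, posterior_probability).
    """
    log_odds = math.log(prior / (1.0 - prior))
    total_rate = bkg_rate + src_rate
    posterior = prior
    for n, gap in enumerate(arrival_gaps, start=1):
        # Exponential inter-arrival log-likelihood under each hypothesis.
        log_l1 = math.log(total_rate) - total_rate * gap
        log_l0 = math.log(bkg_rate) - bkg_rate * gap
        log_odds += log_l1 - log_l0
        posterior = _sigmoid(log_odds)
        if posterior >= threshold:
            return n, posterior
    return len(arrival_gaps), posterior


# Simulated pulses with a source present (true rate 10 + 5 counts per second).
rng = random.Random(0)
gaps = [rng.expovariate(15.0) for _ in range(500)]
n_used, post = sequential_detect(gaps, bkg_rate=10.0, src_rate=5.0)
print(n_used, round(post, 4))
```

    The number of pulses consumed before the alarm varies from run to run, which is the point of the sequential design: strong evidence ends the count early, weak evidence extends it.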

  10. Quantifying uncertainty in soot volume fraction estimates using Bayesian inference of auto-correlated laser-induced incandescence measurements

    NASA Astrophysics Data System (ADS)

    Hadwin, Paul J.; Sipkens, T. A.; Thomson, K. A.; Liu, F.; Daun, K. J.

    2016-01-01

    Auto-correlated laser-induced incandescence (AC-LII) infers the soot volume fraction (SVF) of soot particles by comparing the spectral incandescence from laser-energized particles to the pyrometrically inferred peak soot temperature. This calculation requires detailed knowledge of model parameters such as the absorption function of soot, which may vary with combustion chemistry, soot age, and the internal structure of the soot. This work presents a Bayesian methodology to quantify such uncertainties. This technique treats the additional "nuisance" model parameters, including the soot absorption function, as stochastic variables and incorporates the current state of knowledge of these parameters into the inference process through maximum entropy priors. While standard AC-LII analysis provides a point estimate of the SVF, Bayesian techniques infer the posterior probability density, which will allow scientists and engineers to better assess the reliability of AC-LII inferred SVFs in the context of environmental regulations and competing diagnostics.

  11. Bayesian tomography and integrated data analysis in fusion diagnostics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Dong, E-mail: lid@swip.ac.cn; Dong, Y. B.; Deng, Wei

    2016-11-15

    In this article, a Bayesian tomography method using non-stationary Gaussian process for a prior has been introduced. The Bayesian formalism allows quantities which bear uncertainty to be expressed in the probabilistic form so that the uncertainty of a final solution can be fully resolved from the confidence interval of a posterior probability. Moreover, a consistency check of that solution can be performed by checking whether the misfits between predicted and measured data are reasonably within an assumed data error. In particular, the accuracy of reconstructions is significantly improved by using the non-stationary Gaussian process that can adapt to the varying smoothness of emission distribution. The implementation of this method to a soft X-ray diagnostics on HL-2A has been used to explore relevant physics in equilibrium and MHD instability modes. This project is carried out within a large size inference framework, aiming at an integrated analysis of heterogeneous diagnostics.

  12. The maximum entropy method of moments and Bayesian probability theory

    NASA Astrophysics Data System (ADS)

    Bretthorst, G. Larry

    2013-08-01

    The problem of density estimation occurs in many disciplines. For example, in MRI it is often necessary to classify the types of tissues in an image. To perform this classification one must first identify the characteristics of the tissues to be classified. These characteristics might be the intensity of a T1 weighted image and in MRI many other types of characteristic weightings (classifiers) may be generated. In a given tissue type there is no single intensity that characterizes the tissue, rather there is a distribution of intensities. Often this distribution can be characterized by a Gaussian, but just as often it is much more complicated. Either way, estimating the distribution of intensities is an inference problem. In the case of a Gaussian distribution, one must estimate the mean and standard deviation. However, in the non-Gaussian case the shape of the density function itself must be inferred. Three common techniques for estimating density functions are binned histograms [1, 2], kernel density estimation [3, 4], and the maximum entropy method of moments [5, 6]. In the introduction, the maximum entropy method of moments will be reviewed. Some of its problems and conditions under which it fails will be discussed. Then in later sections, the functional form of the maximum entropy method of moments probability distribution will be incorporated into Bayesian probability theory. It will be shown that Bayesian probability theory solves all of the problems with the maximum entropy method of moments. One gets posterior probabilities for the Lagrange multipliers, and, finally, one can put error bars on the resulting estimated density function.

  13. Bronchoscopic lung-volume reduction with Exhale airway stents for emphysema (EASE trial): randomised, sham-controlled, multicentre trial.

    PubMed

    Shah, P L; Slebos, D-J; Cardoso, P F G; Cetti, E; Voelker, K; Levine, B; Russell, M E; Goldin, J; Brown, M; Cooper, J D; Sybrecht, G W

    2011-09-10

    Airway bypass is a bronchoscopic lung-volume reduction procedure for emphysema whereby transbronchial passages into the lung are created to release trapped air, supported with paclitaxel-coated stents to ease the mechanics of breathing. The aim of the EASE (Exhale airway stents for emphysema) trial was to evaluate safety and efficacy of airway bypass in people with severe homogeneous emphysema. We undertook a randomised, double-blind, sham-controlled study in 38 specialist respiratory centres worldwide. We recruited 315 patients who had severe hyperinflation (ratio of residual volume [RV] to total lung capacity of ≥0·65). By computer using a random number generator, we randomly allocated participants (in a 2:1 ratio) to either airway bypass (n=208) or sham control (107). We divided investigators into team A (masked), who completed pre-procedure and post-procedure assessments, and team B (unmasked), who only did bronchoscopies without further interaction with patients. Participants were followed up for 12 months. The 6-month co-primary efficacy endpoint required 12% or greater improvement in forced vital capacity (FVC) and 1 point or greater decrease in the modified Medical Research Council dyspnoea score from baseline. The composite primary safety endpoint incorporated five severe adverse events. We did Bayesian analysis to show the posterior probability that airway bypass was superior to sham control (success threshold, 0·965). Analysis was by intention to treat. This study is registered with ClinicalTrials.gov, number NCT00391612. All recruited patients were included in the analysis. At 6 months, no difference between treatment arms was noted with respect to the co-primary efficacy endpoint (30 of 208 for airway bypass vs 12 of 107 for sham control; posterior probability 0·749, below the Bayesian success threshold of 0·965). 
    The 6-month composite primary safety endpoint was 14·4% (30 of 208) for airway bypass versus 11·2% (12 of 107) for sham control (judged non-inferior, with a posterior probability of 1·00 [Bayesian success threshold >0·95]). Although our findings showed safety and transient improvements, no sustainable benefit was recorded with airway bypass in patients with severe homogeneous emphysema. Broncus Technologies. Copyright © 2011 Elsevier Ltd. All rights reserved.
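
    A posterior probability of superiority of the kind reported above can be approximated with a short Monte Carlo sketch. It assumes independent uniform Beta(1, 1) priors on the two response rates and a simple binomial likelihood, which is an illustrative choice and not the trial's pre-specified model, so the result is not expected to reproduce the published 0·749 exactly. The function name, draw count, and seed are hypothetical.

```python
import random


def prob_superior(x_trt, n_trt, x_ctl, n_ctl, draws=100_000, seed=1):
    """Monte Carlo estimate of P(p_trt > p_ctl | data) with Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Conjugate update: Beta(1 + successes, 1 + failures) for each arm.
        p_trt = rng.betavariate(1 + x_trt, 1 + n_trt - x_trt)
        p_ctl = rng.betavariate(1 + x_ctl, 1 + n_ctl - x_ctl)
        wins += p_trt > p_ctl
    return wins / draws


# Responder counts quoted in the abstract for the co-primary efficacy endpoint.
p = prob_superior(30, 208, 12, 107)
print(f"P(bypass rate > sham rate) ~ {p:.3f}; success threshold 0.965 met: {p > 0.965}")
```

    Under these assumptions the estimate falls well short of the 0·965 success threshold, in line with the abstract's conclusion that efficacy was not demonstrated.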

  14. BM-Map: Bayesian Mapping of Multireads for Next-Generation Sequencing Data

    PubMed Central

    Ji, Yuan; Xu, Yanxun; Zhang, Qiong; Tsui, Kam-Wah; Yuan, Yuan; Norris, Clift; Liang, Shoudan; Liang, Han

    2011-01-01

    Summary Next-generation sequencing (NGS) technology generates millions of short reads, which provide valuable information for various aspects of cellular activities and biological functions. A key step in NGS applications (e.g., RNA-Seq) is to map short reads to correct genomic locations within the source genome. While most reads are mapped to a unique location, a significant proportion of reads align to multiple genomic locations with equal or similar numbers of mismatches; these are called multireads. The ambiguity in mapping the multireads may lead to bias in downstream analyses. Currently, most practitioners discard the multireads in their analysis, resulting in a loss of valuable information, especially for the genes with similar sequences. To refine the read mapping, we develop a Bayesian model that computes the posterior probability of mapping a multiread to each competing location. The probabilities are used for downstream analyses, such as the quantification of gene expression. We show through simulation studies and RNA-Seq analysis of real life data that the Bayesian method yields better mapping than the current leading methods. We provide a C++ program for downloading that is being packaged into a user-friendly software. PMID:21517792
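
    To make the notion of a per-location posterior probability concrete, the sketch below applies Bayes' rule to a single multiread. It is not the BM-Map model: it assumes a likelihood of e^m (1 - e)^(L - m) for m mismatches in a read of length L with per-base error rate e, and prior weights proportional to, for example, the number of uniquely mapped reads near each location. Every name and number here is hypothetical.

```python
def multiread_posterior(mismatches, read_length, error_rate, prior_weights):
    """Posterior probability that a multiread came from each candidate location."""
    unnormalized = []
    for m, w in zip(mismatches, prior_weights):
        # Simplified likelihood: m mismatched bases, the rest matched.
        likelihood = (error_rate ** m) * ((1.0 - error_rate) ** (read_length - m))
        unnormalized.append(w * likelihood)
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# A 36-base read that aligns to three locations with 1, 1 and 2 mismatches.
post = multiread_posterior(
    mismatches=[1, 1, 2],
    read_length=36,
    error_rate=0.01,
    prior_weights=[30, 10, 10],
)
print([round(p, 4) for p in post])
```

    The first two locations tie on mismatches, so the prior weights separate them, while the extra mismatch makes the third location far less probable. Downstream quantification could then use these probabilities as fractional read counts instead of discarding the read.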

  15. Iterative updating of model error for Bayesian inversion

    NASA Astrophysics Data System (ADS)

    Calvetti, Daniela; Dunlop, Matthew; Somersalo, Erkki; Stuart, Andrew

    2018-02-01

    In computational inverse problems, it is common that a detailed and accurate forward model is approximated by a computationally less challenging substitute. The model reduction may be necessary to meet constraints in computing time when optimization algorithms are used to find a single estimate, or to speed up Markov chain Monte Carlo (MCMC) calculations in the Bayesian framework. The use of an approximate model introduces a discrepancy, or modeling error, that may have a detrimental effect on the solution of the ill-posed inverse problem, or it may severely distort the estimate of the posterior distribution. In the Bayesian paradigm, the modeling error can be considered as a random variable, and by using an estimate of the probability distribution of the unknown, one may estimate the probability distribution of the modeling error and incorporate it into the inversion. We introduce an algorithm which iterates this idea to update the distribution of the model error, leading to a sequence of posterior distributions that are demonstrated empirically to capture the underlying truth with increasing accuracy. Since the algorithm is not based on rejections, it requires only limited full model evaluations. We show analytically that, in the linear Gaussian case, the algorithm converges geometrically fast with respect to the number of iterations when the data is finite dimensional. For more general models, we introduce particle approximations of the iteratively generated sequence of distributions; we also prove that each element of the sequence converges in the large particle limit under a simplifying assumption. We show numerically that, as in the linear case, rapid convergence occurs with respect to the number of iterations. Additionally, we show through computed examples that point estimates obtained from this iterative algorithm are superior to those obtained by neglecting the model error.

  16. Bayesian functional integral method for inferring continuous data from discrete measurements.

    PubMed

    Heuett, William J; Miller, Bernard V; Racette, Susan B; Holloszy, John O; Chow, Carson C; Periwal, Vipul

    2012-02-08

    Inference of the insulin secretion rate (ISR) from C-peptide measurements as a quantification of pancreatic β-cell function is clinically important in diseases related to reduced insulin sensitivity and insulin action. ISR derived from C-peptide concentration is an example of nonparametric Bayesian model selection where a proposed ISR time-course is considered to be a "model". An inferred value of inaccessible continuous variables from discrete observable data is often problematic in biology and medicine, because it is a priori unclear how robust the inference is to the deletion of data points, and a closely related question, how much smoothness or continuity the data actually support. Predictions weighted by the posterior distribution can be cast as functional integrals as used in statistical field theory. Functional integrals are generally difficult to evaluate, especially for nonanalytic constraints such as positivity of the estimated parameters. We propose a computationally tractable method that uses the exact solution of an associated likelihood function as a prior probability distribution for a Markov-chain Monte Carlo evaluation of the posterior for the full model. As a concrete application of our method, we calculate the ISR from actual clinical C-peptide measurements in human subjects with varying degrees of insulin sensitivity. Our method demonstrates the feasibility of functional integral Bayesian model selection as a practical method for such data-driven inference, allowing the data to determine the smoothing timescale and the width of the prior probability distribution on the space of models. In particular, our model comparison method determines the discrete time-step for interpolation of the unobservable continuous variable that is supported by the data. Attempts to go to finer discrete time-steps lead to less likely models. Copyright © 2012 Biophysical Society. Published by Elsevier Inc. All rights reserved.

  17. VizieR Online Data Catalog: Giant HII regions BOND abundances (Vale Asari+, 2016)

    NASA Astrophysics Data System (ADS)

    Vale Asari, N.; Stasinska, G.; Morisset, C.; Cid Fernandes, R.

    2017-10-01

    BOND determines nitrogen and oxygen gas-phase abundances by using strong and semistrong lines and comparing them to a grid of photoionization models in a Bayesian framework. The code is written in python and its source is publicly available at http://bond.ufsc.br. The grid of models presented here is included in the 3MdB data base (Morisset, Delgado-Inglada & Flores-Fajardo 2015RMxAA..51..103M, see https://sites.google.com/site/mexicanmillionmodels/) under the reference 'BOND'. The Bayesian posterior probability calculated by bond stands on two pillars: our grid of models and our choice of observational constraints (from which we calculate our likelihoods). We discuss each of these in turn. (2 data files).

  18. Bayesian calibration of the Community Land Model using surrogates

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ray, Jaideep; Hou, Zhangshuan; Huang, Maoyi

    2014-02-01

    We present results from the Bayesian calibration of hydrological parameters of the Community Land Model (CLM), which is often used in climate simulations and Earth system models. A statistical inverse problem is formulated for three hydrological parameters, conditional on observations of latent heat surface fluxes over 48 months. Our calibration method uses polynomial and Gaussian process surrogates of the CLM, and solves the parameter estimation problem using a Markov chain Monte Carlo sampler. Posterior probability densities for the parameters are developed for two sites with different soil and vegetation covers. Our method also allows us to examine the structural error in CLM under two error models. We find that surrogate models can be created for CLM in most cases. The posterior distributions are more predictive than the default parameter values in CLM. Climatologically averaging the observations does not modify the parameters' distributions significantly. The structural error model reveals a correlation time-scale which can be used to identify the physical process that could be contributing to it. While the calibrated CLM has a higher predictive skill, the calibration is under-dispersive.

  19. Multinomial mixture model with heterogeneous classification probabilities

    USGS Publications Warehouse

    Holland, M.D.; Gray, B.R.

    2011-01-01

    Royle and Link (Ecology 86(9):2505-2512, 2005) proposed an analytical method that allowed estimation of multinomial distribution parameters and classification probabilities from categorical data measured with error. While useful, we demonstrate algebraically and by simulations that this method yields biased multinomial parameter estimates when the probabilities of correct category classifications vary among sampling units. We address this shortcoming by treating these probabilities as logit-normal random variables within a Bayesian framework. We use Markov chain Monte Carlo to compute Bayes estimates from a simulated sample from the posterior distribution. Based on simulations, this elaborated Royle-Link model yields nearly unbiased estimates of multinomial and correct classification probability estimates when classification probabilities are allowed to vary according to the normal distribution on the logit scale or according to the Beta distribution. The method is illustrated using categorical submersed aquatic vegetation data. © 2010 Springer Science+Business Media, LLC.

  20. Diagnostics for insufficiencies of posterior calculations in Bayesian signal inference.

    PubMed

    Dorn, Sebastian; Oppermann, Niels; Ensslin, Torsten A

    2013-11-01

    We present an error-diagnostic validation method for posterior distributions in Bayesian signal inference, an advancement of a previous work. It transfers deviations from the correct posterior into characteristic deviations from a uniform distribution of a quantity constructed for this purpose. We show that this method is able to reveal and discriminate several kinds of numerical and approximation errors, as well as their impact on the posterior distribution. For this we present four typical analytical examples of posteriors with incorrect variance, skewness, position of the maximum, or normalization. We show further how this test can be applied to multidimensional signals.
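
    One way to picture a uniformity-based posterior check is the sketch below. It assumes the constructed quantity is the posterior cumulative distribution evaluated at the true signal, which is uniform on [0, 1] when the posterior is correct; whether this matches the authors' exact construction is an assumption. The toy model is a scalar Gaussian signal with additive Gaussian noise, and `var_scale` is a hypothetical knob that deliberately mis-scales the posterior variance.

```python
import math
import random


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def posterior_cdf_values(n_trials, var_scale=1.0, prior_var=1.0, noise_var=0.5, seed=0):
    """Posterior CDF at the true signal, for repeated simulated signal/data pairs."""
    rng = random.Random(seed)
    post_var = prior_var * noise_var / (prior_var + noise_var)
    values = []
    for _ in range(n_trials):
        signal = rng.gauss(0.0, math.sqrt(prior_var))
        data = signal + rng.gauss(0.0, math.sqrt(noise_var))
        post_mean = prior_var / (prior_var + noise_var) * data
        # var_scale != 1 mimics a posterior computed with the wrong variance.
        claimed_sd = math.sqrt(var_scale * post_var)
        values.append(normal_cdf((signal - post_mean) / claimed_sd))
    return values


def variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


correct = posterior_cdf_values(20_000, var_scale=1.0)
too_narrow = posterior_cdf_values(20_000, var_scale=0.25)
too_wide = posterior_cdf_values(20_000, var_scale=4.0)
# A uniform distribution on [0, 1] has variance 1/12 (about 0.0833).
print(round(variance(correct), 4), round(variance(too_narrow), 4), round(variance(too_wide), 4))
```

    An over-confident posterior pushes the values toward 0 and 1 (variance above 1/12), while an over-cautious one concentrates them near 0.5 (variance below 1/12), so the shape of the deviation hints at the kind of error.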

  1. A probabilistic model framework for evaluating year-to-year variation in crop productivity

    NASA Astrophysics Data System (ADS)

    Yokozawa, M.; Iizumi, T.; Tao, F.

    2008-12-01

    Most models describing the relation between crop productivity and weather condition have so far been focused on mean changes of crop yield. For keeping stable food supply against abnormal weather as well as climate change, evaluating the year-to-year variations in crop productivity rather than the mean changes is more essential. We here propose a new framework of probabilistic model based on Bayesian inference and Monte Carlo simulation. As an example, we firstly introduce a model on paddy rice production in Japan. It is called PRYSBI (Process-based Regional rice Yield Simulator with Bayesian Inference; Iizumi et al., 2008). The model structure is the same as that of SIMRIW, which was developed and used widely in Japan. The model includes three sub-models describing phenological development, biomass accumulation and maturing of rice crop. These processes are formulated to include response nature of rice plant to weather condition. This model inherently was developed to predict rice growth and yield at plot paddy scale. We applied it to evaluate the large scale rice production with keeping the same model structure. Alternatively, we assumed the parameters as stochastic variables. In order to let the model catch up actual yield at larger scale, model parameters were determined based on agricultural statistical data of each prefecture of Japan together with weather data averaged over the region. The posterior probability distribution functions (PDFs) of parameters included in the model were obtained using Bayesian inference. The MCMC (Markov Chain Monte Carlo) algorithm was conducted to numerically solve the Bayesian theorem. For evaluating the year-to-year changes in rice growth/yield under this framework, we firstly iterate simulations with set of parameter values sampled from the estimated posterior PDF of each parameter and then take the ensemble mean weighted with the posterior PDFs. We will also present another example for maize productivity in China.
    The framework proposed here provides us information on uncertainties, possibilities and limitations on future improvements in crop model as well.

  2. Posterior Predictive Model Checking in Bayesian Networks

    ERIC Educational Resources Information Center

    Crawford, Aaron

    2014-01-01

    This simulation study compared the utility of various discrepancy measures within a posterior predictive model checking (PPMC) framework for detecting different types of data-model misfit in multidimensional Bayesian network (BN) models. The investigated conditions were motivated by an applied research program utilizing an operational complex…

  3. Exact Bayesian p-values for a test of independence in a 2 × 2 contingency table with missing data.

    PubMed

    Lin, Yan; Lipsitz, Stuart R; Sinha, Debajyoti; Fitzmaurice, Garrett; Lipshultz, Steven

    2017-01-01

    Altham (Altham PME. Exact Bayesian analysis of a 2 × 2 contingency table, and Fisher's "exact" significance test. J R Stat Soc B 1969; 31: 261-269) showed that a one-sided p-value from Fisher's exact test of independence in a 2 × 2 contingency table is equal to the posterior probability of negative association in the 2 × 2 contingency table under a Bayesian analysis using an improper prior. We derive an extension of Fisher's exact test p-value in the presence of missing data, assuming the missing data mechanism is ignorable (i.e., missing at random or completely at random). Further, we propose Bayesian p-values for a test of independence in a 2 × 2 contingency table with missing data using alternative priors; we also present results from a simulation study exploring the Type I error rate and power of the proposed exact test p-values. An example, using data on the association between blood pressure and a cardiac enzyme, is presented to illustrate the methods.

  4. Enhanced optical alignment of a digital micro mirror device through Bayesian adaptive exploration

    NASA Astrophysics Data System (ADS)

    Wynne, Kevin B.; Knuth, Kevin H.; Petruccelli, Jonathan

    2017-12-01

    As the use of Digital Micro Mirror Devices (DMDs) becomes more prevalent in optics research, the ability to precisely locate the Fourier "footprint" of an image beam at the Fourier plane becomes a pressing need. In this approach, Bayesian adaptive exploration techniques were employed to characterize the size and position of the beam on a DMD located at the Fourier plane. It couples a Bayesian inference engine with an inquiry engine to implement the search. The inquiry engine explores the DMD by engaging mirrors and recording light intensity values based on the maximization of the expected information gain. Using the data collected from this exploration, the Bayesian inference engine updates the posterior probability describing the beam's characteristics. The process is iterated until the beam is located to within the desired precision. This methodology not only locates the center and radius of the beam with remarkable precision but accomplishes the task in far less time than a brute force search. The employed approach has applications to system alignment for both Fourier processing and coded aperture design.

  5. Bayesian seismic tomography by parallel interacting Markov chains

    NASA Astrophysics Data System (ADS)

    Gesret, Alexandrine; Bottero, Alexis; Romary, Thomas; Noble, Mark; Desassis, Nicolas

    2014-05-01

    The velocity field estimated by first arrival traveltime tomography is commonly used as a starting point for further seismological, mineralogical, tectonic or similar analysis. In order to interpret the results quantitatively, the tomography uncertainty values as well as their spatial distribution are required. The estimated velocity model is obtained through inverse modeling by minimizing an objective function that compares observed and computed traveltimes. This step is often performed by gradient-based optimization algorithms. The major drawback of such local optimization schemes, beyond the possibility of being trapped in a local minimum, is that they do not account for the multiple possible solutions of the inverse problem. They are therefore unable to assess the uncertainties linked to the solution. Within a Bayesian (probabilistic) framework, solving the tomography inverse problem aims at estimating the posterior probability density function of the velocity model using a global sampling algorithm. Markov chain Monte Carlo (MCMC) methods are known to produce samples of virtually any distribution. In such a Bayesian inversion, the total number of simulations we can afford is highly related to the computational cost of the forward model. Although fast algorithms have been recently developed for computing first arrival traveltimes of seismic waves, the complete browsing of the posterior distribution of the velocity model is hardly performed, especially when it is high dimensional and/or multimodal. In the latter case, the chain may even stay stuck in one of the modes. In order to improve the mixing properties of classical single MCMC, we propose to make several Markov chains at different temperatures interact. This method can make efficient use of large CPU clusters, without increasing the global computational cost with respect to classical MCMC and is therefore particularly suited for Bayesian inversion.
The exchanges between the chains allow precise sampling of the high-probability zones of the model space while preventing the chains from becoming stuck at a probability maximum. This approach thus supplies a robust way to analyze the tomography imaging uncertainties. The interacting MCMC approach is illustrated on two synthetic examples of tomography of calibration shots such as those encountered in induced microseismic studies. In the second application, a wavelet-based model parameterization is presented that significantly reduces the dimension of the problem, thus making the algorithm efficient even for a complex velocity model.
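
    The idea of interacting chains at different temperatures can be sketched with a generic parallel-tempering sampler. This is a minimal illustration, not the authors' implementation: the bimodal 1-D target, the temperature ladder, the step size and all function names are assumptions chosen only to show how state exchanges let the cold chain visit both modes.

```python
import numpy as np

def log_target(x):
    # Hypothetical unnormalized bimodal target: equal mixture of
    # N(-4, 0.5^2) and N(+4, 0.5^2).
    return np.logaddexp(-0.5 * ((x + 4.0) / 0.5) ** 2,
                        -0.5 * ((x - 4.0) / 0.5) ** 2)

def parallel_tempering(n_iter=20000, temps=(1.0, 3.0, 9.0, 27.0),
                       step=1.0, seed=0):
    rng = np.random.default_rng(seed)
    temps = np.asarray(temps, dtype=float)
    x = np.zeros(len(temps))      # current state of each chain
    cold = np.empty(n_iter)       # samples kept from the T = 1 chain
    for it in range(n_iter):
        # Random-walk Metropolis update of each chain on pi(x)^(1/T).
        for k, T in enumerate(temps):
            prop = x[k] + step * np.sqrt(T) * rng.standard_normal()
            log_acc = (log_target(prop) - log_target(x[k])) / T
            if np.log(rng.random()) < log_acc:
                x[k] = prop
        # Propose exchanging the states of one random adjacent pair.
        k = rng.integers(len(temps) - 1)
        log_ratio = ((log_target(x[k + 1]) - log_target(x[k]))
                     * (1.0 / temps[k] - 1.0 / temps[k + 1]))
        if np.log(rng.random()) < log_ratio:
            x[k], x[k + 1] = x[k + 1], x[k]
        cold[it] = x[0]
    return cold

samples = parallel_tempering()
print("fraction of cold-chain samples in the right-hand mode:",
      np.mean(samples > 0))
```

    In this toy setting a single chain at T = 1 would tend to stay in whichever mode it reaches first; the hot chains cross between modes easily and pass those states down through the exchanges.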

  6. How Much Can We Learn from a Single Chromatographic Experiment? A Bayesian Perspective.

    PubMed

    Wiczling, Paweł; Kaliszan, Roman

    2016-01-05

    In this work, we proposed and investigated a Bayesian inference procedure to find the desired chromatographic conditions based on known analyte properties (lipophilicity, pKa, and polar surface area) using one preliminary experiment. A previously developed nonlinear mixed effect model was used to specify the prior information about a new analyte with known physicochemical properties. Further, the prior (no preliminary data) and posterior predictive distribution (prior + one experiment) were determined sequentially to search towards the desired separation. The following isocratic high-performance reversed-phase liquid chromatographic conditions were sought: (1) retention time of a single analyte within the range of 4-6 min and (2) baseline separation of two analytes with retention times within the range of 4-10 min. The empirical posterior Bayesian distribution of parameters was estimated using the "slice sampling" Markov Chain Monte Carlo (MCMC) algorithm implemented in Matlab. The simulations with artificial analytes and experimental data of ketoprofen and papaverine were used to test the proposed methodology. The simulation experiment showed that for a single and two randomly selected analytes, there is 97% and 74% probability of obtaining a successful chromatogram using none or one preliminary experiment. The desired separation for ketoprofen and papaverine was established based on a single experiment. It was confirmed that the search for a desired separation rarely requires a large number of chromatographic analyses at least for a simple optimization problem. The proposed Bayesian-based optimization scheme is a powerful method of finding a desired chromatographic separation based on a small number of preliminary experiments.

  7. Topics in Bayesian Hierarchical Modeling and its Monte Carlo Computations

    NASA Astrophysics Data System (ADS)

    Tak, Hyung Suk

    The first chapter addresses a Beta-Binomial-Logit model that is a Beta-Binomial conjugate hierarchical model with covariate information incorporated via a logistic regression. Various researchers in the literature have unknowingly used improper posterior distributions or have given incorrect statements about posterior propriety because checking posterior propriety can be challenging due to the complicated functional form of a Beta-Binomial-Logit model. We derive data-dependent necessary and sufficient conditions for posterior propriety within a class of hyper-prior distributions that encompass those used in previous studies. Frequency coverage properties of several hyper-prior distributions are also investigated to see when and whether Bayesian interval estimates of random effects meet their nominal confidence levels. The second chapter deals with a time delay estimation problem in astrophysics. When the gravitational field of an intervening galaxy between a quasar and the Earth is strong enough to split light into two or more images, the time delay is defined as the difference between their travel times. The time delay can be used to constrain cosmological parameters and can be inferred from the time series of brightness data of each image. To estimate the time delay, we construct a Gaussian hierarchical model based on a state-space representation for irregularly observed time series generated by a latent continuous-time Ornstein-Uhlenbeck process. Our Bayesian approach jointly infers model parameters via a Gibbs sampler. We also introduce a profile likelihood of the time delay as an approximation of its marginal posterior distribution. The last chapter specifies a repelling-attracting Metropolis algorithm, a new Markov chain Monte Carlo method to explore multi-modal distributions in a simple and fast manner. 
This algorithm is essentially a Metropolis-Hastings algorithm with a proposal that consists of a downhill move in density that aims to make local modes repelling, followed by an uphill move in density that aims to make local modes attracting. The downhill move is achieved via a reciprocal Metropolis ratio so that the algorithm prefers downward movement. The uphill move does the opposite using the standard Metropolis ratio which prefers upward movement. This down-up movement in density increases the probability of a proposed move to a different mode.

  8. A Bayesian Analysis of a Randomized Clinical Trial Comparing Antimetabolite Therapies for Non-Infectious Uveitis.

    PubMed

    Browne, Erica N; Rathinam, Sivakumar R; Kanakath, Anuradha; Thundikandy, Radhika; Babu, Manohar; Lietman, Thomas M; Acharya, Nisha R

    2017-02-01

    To conduct a Bayesian analysis of a randomized clinical trial (RCT) for non-infectious uveitis using expert opinion as a subjective prior belief. An RCT was conducted to determine which antimetabolite, methotrexate or mycophenolate mofetil, is more effective as an initial corticosteroid-sparing agent for the treatment of intermediate, posterior, and pan-uveitis. Before the release of trial results, expert opinion on the relative effectiveness of these two medications was collected via online survey. Members of the American Uveitis Society executive committee were invited to provide an estimate for the relative decrease in efficacy with a 95% credible interval (CrI). A prior probability distribution was created from experts' estimates. A Bayesian analysis was performed using the constructed expert prior probability distribution and the trial's primary outcome. A total of 11 of the 12 invited uveitis specialists provided estimates. Eight of 11 experts (73%) believed mycophenolate mofetil is more effective. The group prior belief was that the odds of treatment success for patients taking mycophenolate mofetil were 1.4-fold the odds of those taking methotrexate (95% CrI 0.03-45.0). The odds ratio of treatment success with mycophenolate mofetil compared to methotrexate was 0.4 from the RCT (95% confidence interval 0.1-1.2) and 0.7 (95% CrI 0.2-1.7) from the Bayesian analysis. A Bayesian analysis combining expert belief with the trial's result did not indicate preference for one drug. However, the wide credible interval leaves open the possibility of a substantial treatment effect. This suggests clinical equipoise necessary to allow a larger, more definitive RCT.
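
    How a prior and a trial estimate combine can be sketched with a conjugate normal update on the log odds ratio scale. This is only an illustration under strong assumptions: both the prior and the trial estimate are approximated as normal on the log scale, with standard deviations backed out of the quoted 95% intervals. The abstract does not describe how the expert prior was actually constructed, so this sketch is not expected to reproduce the reported posterior exactly. The function names are hypothetical.

```python
from math import exp, log, sqrt

Z95 = 1.959964  # two-sided 95% standard normal quantile

def normal_from_interval(point, lower, upper):
    """Approximate (mean, sd) on the log scale from a point estimate and 95% interval."""
    return log(point), (log(upper) - log(lower)) / (2 * Z95)

def combine(prior, data):
    """Precision-weighted normal-normal update; returns posterior (mean, sd)."""
    (m0, s0), (m1, s1) = prior, data
    w0, w1 = 1.0 / s0 ** 2, 1.0 / s1 ** 2
    var = 1.0 / (w0 + w1)
    return var * (w0 * m0 + w1 * m1), sqrt(var)

prior = normal_from_interval(1.4, 0.03, 45.0)   # expert prior from the abstract
trial = normal_from_interval(0.4, 0.1, 1.2)     # RCT estimate from the abstract
mean, sd = combine(prior, trial)
print("posterior odds ratio:", exp(mean))
print("95% interval:", exp(mean - Z95 * sd), exp(mean + Z95 * sd))
```

    Because the prior interval is so wide, its precision is small and the posterior under this approximation is pulled only slightly away from the trial estimate.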

  9. Geotechnical parameter spatial distribution stochastic analysis based on multi-precision information assimilation

    NASA Astrophysics Data System (ADS)

    Wang, C.; Rubin, Y.

    2014-12-01

    The spatial distribution of the compression modulus Es, an important geotechnical parameter, contributes considerably to the understanding of the underlying geological processes and to the adequate assessment of the mechanical effects of Es on differential settlement of large continuous structure foundations. These analyses should be derived using an assimilating approach that combines in-situ static cone penetration tests (CPT) with borehole experiments. To this end, the Es distribution of a silty clay stratum in region A of the China Expo Center (Shanghai) is studied using the Bayesian maximum entropy method. This method rigorously and efficiently integrates geotechnical investigations of differing precision and their sources of uncertainty. Single CPT samplings were modeled as a rational probability density curve by maximum entropy theory. The spatial prior multivariate probability density function (PDF) and the likelihood PDF of the CPT positions were built from borehole experiments and the potential value of the prediction point; then, by numerical integration over the CPT probability density curves, the posterior probability density curve of the prediction point was calculated within the Bayesian reverse interpolation framework. The results of Gaussian sequential stochastic simulation and the Bayesian methods were compared. The differences between single CPT samplings modeled as a normal distribution and as a simulated probability density curve based on maximum entropy theory were also discussed. It is shown that the study of Es spatial distributions can be improved by properly incorporating CPT sampling variation into the interpolation process, and that more informative estimates are generated by considering CPT uncertainty at the estimation points. The calculation illustrates the significance of stochastic Es characterization in a stratum and identifies limitations associated with inadequate geostatistical interpolation techniques. 
These characterization results provide a multi-precision information assimilation method for other geotechnical parameters.

  10. Inverse Modeling of Hydrologic Parameters Using Surface Flux and Runoff Observations in the Community Land Model

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sun, Yu; Hou, Zhangshuan; Huang, Maoyi

    2013-12-10

    This study demonstrates the possibility of inverting hydrologic parameters using surface flux and runoff observations in version 4 of the Community Land Model (CLM4). Previous studies showed that surface flux and runoff calculations are sensitive to major hydrologic parameters in CLM4 over different watersheds, and illustrated the necessity and possibility of parameter calibration. Two inversion strategies, the deterministic least-square fitting and stochastic Markov-Chain Monte-Carlo (MCMC) - Bayesian inversion approaches, are evaluated by applying them to CLM4 at selected sites. The unknowns to be estimated include surface and subsurface runoff generation parameters and vadose zone soil water parameters. We find that using model parameters calibrated by the least-square fitting provides little improvement in the model simulations but the sampling-based stochastic inversion approaches are consistent - as more information comes in, the predictive intervals of the calibrated parameters become narrower and the misfits between the calculated and observed responses decrease. In general, parameters that are identified to be significant through sensitivity analyses and statistical tests are better calibrated than those with weak or nonlinear impacts on flux or runoff observations. Temporal resolution of observations has larger impacts on the results of inverse modeling using heat flux data than runoff data. Soil and vegetation cover have important impacts on parameter sensitivities, leading to the different patterns of posterior distributions of parameters at different sites. Overall, the MCMC-Bayesian inversion approach effectively and reliably improves the simulation of CLM under different climates and environmental conditions. 
Bayesian model averaging of the posterior estimates with different reference acceptance probabilities can smooth the posterior distribution and provide more reliable parameter estimates, but at the expense of wider uncertainty bounds.

  11. Hydrologic Model Selection using Markov chain Monte Carlo methods

    NASA Astrophysics Data System (ADS)

    Marshall, L.; Sharma, A.; Nott, D.

    2002-12-01

    Estimation of parameter uncertainty (and in turn model uncertainty) allows assessment of the risk in likely applications of hydrological models. Bayesian statistical inference provides an ideal means of assessing parameter uncertainty whereby prior knowledge about the parameter is combined with information from the available data to produce a probability distribution (the posterior distribution) that describes uncertainty about the parameter and serves as a basis for selecting appropriate values for use in modelling applications. Widespread use of Bayesian techniques in hydrology has been hindered by difficulties in summarizing and exploring the posterior distribution. These difficulties have been largely overcome by recent advances in Markov chain Monte Carlo (MCMC) methods that involve random sampling of the posterior distribution. This study presents an adaptive MCMC sampling algorithm which has characteristics that are well suited to model parameters with a high degree of correlation and interdependence, as is often evident in hydrological models. The MCMC sampling technique is used to compare six alternative configurations of a commonly used conceptual rainfall-runoff model, the Australian Water Balance Model (AWBM), using 11 years of daily rainfall runoff data from the Bass river catchment in Australia. The alternative configurations considered fall into two classes - those that consider model errors to be independent of prior values, and those that model the errors as an autoregressive process. Each such class consists of three formulations that represent increasing levels of complexity (and parameterisation) of the original model structure. The results from this study point both to the importance of using Bayesian approaches in evaluating model performance, as well as the simplicity of the MCMC sampling framework that has the ability to bring such approaches within the reach of the applied hydrological community.
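
    An adaptive sampler of the kind described, suited to strongly correlated parameters, can be sketched as follows. The abstract does not specify the adaptation rule, so this sketch assumes a common choice: a random-walk Metropolis sampler whose Gaussian proposal covariance is periodically re-estimated from the chain history and scaled by 2.38^2/d. The function name, the tuning constants and the correlated Gaussian test target are illustrative assumptions, not the study's rainfall-runoff model.

```python
import numpy as np

def adaptive_metropolis(log_post, x0, n_iter=6000, adapt_start=500,
                        adapt_every=50, eps=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = x.size
    lp = log_post(x)
    chain = np.empty((n_iter, d))
    cov = 0.1 * np.eye(d)            # initial, non-adapted proposal covariance
    scale = 2.38 ** 2 / d            # assumed scaling of the empirical covariance
    for i in range(n_iter):
        # Periodically refresh the proposal covariance from the chain history.
        if i >= adapt_start and i % adapt_every == 0:
            cov = scale * (np.cov(chain[:i].T) + eps * np.eye(d))
        prop = rng.multivariate_normal(x, cov)
        lp_prop = log_post(prop)
        # Standard Metropolis accept/reject step (symmetric proposal).
        if np.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
        chain[i] = x
    return chain

# Hypothetical target: two parameters with posterior correlation 0.9.
precision = np.linalg.inv(np.array([[1.0, 0.9], [0.9, 1.0]]))
chain = adaptive_metropolis(lambda x: -0.5 * x @ precision @ x, np.zeros(2))
print("sample correlation:", np.corrcoef(chain[1000:].T)[0, 1])
```

    Once the proposal covariance has adapted, proposals are stretched along the correlated direction, which is what makes this family of samplers attractive for interdependent model parameters.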

  12. The effect of business improvement districts on the incidence of violent crimes

    PubMed Central

    Golinelli, Daniela; Stokes, Robert J; Bluthenthal, Ricky

    2010-01-01

    Objective: To examine whether business improvement districts (BID) contributed to greater than expected declines in the incidence of violent crimes in affected neighbourhoods. Method: A Bayesian hierarchical model was used to assess the changes in the incidence of violent crimes between 1994 and 2005 and the implementation of 30 BID in Los Angeles neighbourhoods. Results: The implementation of BID was associated with a 12% reduction in the incidence of robbery (95% posterior probability interval −2 to 24) and an 8% reduction in the total incidence of violent crimes (95% posterior probability interval −5 to 21). The strength of the effect of BID on robbery crimes varied by location. Conclusion: These findings indicate that the implementation of BID can reduce the incidence of violent crimes likely to result in injury to individuals. The findings also indicate that the establishment of a BID by itself is not a panacea, and highlight the importance of targeting BID efforts to crime prevention interventions that reduce violence exposure associated with criminal behaviours. PMID:20587814

  13. The effect of business improvement districts on the incidence of violent crimes.

    PubMed

    MacDonald, John; Golinelli, Daniela; Stokes, Robert J; Bluthenthal, Ricky

    2010-10-01

    To examine whether business improvement districts (BID) contributed to greater than expected declines in the incidence of violent crimes in affected neighbourhoods. A Bayesian hierarchical model was used to assess the changes in the incidence of violent crimes between 1994 and 2005 and the implementation of 30 BID in Los Angeles neighbourhoods. The implementation of BID was associated with a 12% reduction in the incidence of robbery (95% posterior probability interval -2 to 24) and an 8% reduction in the total incidence of violent crimes (95% posterior probability interval -5 to 21). The strength of the effect of BID on robbery crimes varied by location. These findings indicate that the implementation of BID can reduce the incidence of violent crimes likely to result in injury to individuals. The findings also indicate that the establishment of a BID by itself is not a panacea, and highlight the importance of targeting BID efforts to crime prevention interventions that reduce violence exposure associated with criminal behaviours.

  14. Basics of Bayesian methods.

    PubMed

    Ghosh, Sujit K

    2010-01-01

    Bayesian methods are rapidly becoming popular tools for making statistical inference in various fields of science including biology, engineering, finance, and genetics. One of the key aspects of Bayesian inferential method is its logical foundation that provides a coherent framework to utilize not only empirical but also scientific information available to a researcher. Prior knowledge arising from scientific background, expert judgment, or previously collected data is used to build a prior distribution which is then combined with current data via the likelihood function to characterize the current state of knowledge using the so-called posterior distribution. Bayesian methods allow the use of models of complex physical phenomena that were previously too difficult to estimate (e.g., using asymptotic approximations). Bayesian methods offer a means of more fully understanding issues that are central to many practical problems by allowing researchers to build integrated models based on hierarchical conditional distributions that can be estimated even with limited amounts of data. Furthermore, advances in numerical integration methods, particularly those based on Monte Carlo methods, have made it possible to compute the optimal Bayes estimators. However, there is a reasonably wide gap between the background of the empirically trained scientists and the full weight of Bayesian statistical inference. Hence, one of the goals of this chapter is to bridge the gap by offering elementary to advanced concepts that emphasize linkages between standard approaches and full probability modeling via Bayesian methods.
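
    The prior-times-likelihood update described above has a closed form in the simplest conjugate case. The sketch below assumes a Beta prior on a success probability and binomial data; the prior parameters and counts are made-up illustration values and the function names are hypothetical.

```python
def beta_binomial_update(a, b, successes, failures):
    """Beta(a, b) prior plus binomial data gives a Beta(a + s, b + f) posterior."""
    return a + successes, b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

prior = (2.0, 2.0)                                    # weak prior centred on 0.5
post = beta_binomial_update(*prior, successes=7, failures=3)
print("posterior parameters:", post)
print("prior mean:", beta_mean(*prior), "posterior mean:", beta_mean(*post))
```

    With these numbers the posterior is Beta(9, 5), whose mean of 9/14 (about 0.64) lies between the prior mean of 0.5 and the observed proportion of 0.7.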

  15. eDNAoccupancy: An R package for multi-scale occupancy modeling of environmental DNA data

    USGS Publications Warehouse

    Dorazio, Robert; Erickson, Richard A.

    2017-01-01

    In this article we describe eDNAoccupancy, an R package for fitting Bayesian, multi-scale occupancy models. These models are appropriate for occupancy surveys that include three, nested levels of sampling: primary sample units within a study area, secondary sample units collected from each primary unit, and replicates of each secondary sample unit. This design is commonly used in occupancy surveys of environmental DNA (eDNA). eDNAoccupancy allows users to specify and fit multi-scale occupancy models with or without covariates, to estimate posterior summaries of occurrence and detection probabilities, and to compare different models using Bayesian model-selection criteria. We illustrate these features by analyzing two published data sets: eDNA surveys of a fungal pathogen of amphibians and eDNA surveys of an endangered fish species.

  16. Bayesian source tracking via focalization and marginalization in an uncertain Mediterranean Sea environment.

    PubMed

    Dosso, Stan E; Wilmut, Michael J; Nielsen, Peter L

    2010-07-01

    This paper applies Bayesian source tracking in an uncertain environment to Mediterranean Sea data, and investigates the resulting tracks and track uncertainties as a function of data information content (number of data time-segments, number of frequencies, and signal-to-noise ratio) and of prior information (environmental uncertainties and source-velocity constraints). To track low-level sources, acoustic data recorded for multiple time segments (corresponding to multiple source positions along the track) are inverted simultaneously. Environmental uncertainty is addressed by including unknown water-column and seabed properties as nuisance parameters in an augmented inversion. Two approaches are considered: Focalization-tracking maximizes the posterior probability density (PPD) over the unknown source and environmental parameters. Marginalization-tracking integrates the PPD over environmental parameters to obtain a sequence of joint marginal probability distributions over source coordinates, from which the most-probable track and track uncertainties can be extracted. Both approaches apply track constraints on the maximum allowable vertical and radial source velocity. The two approaches are applied for towed-source acoustic data recorded at a vertical line array at a shallow-water test site in the Mediterranean Sea where previous geoacoustic studies have been carried out.
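
    The difference between maximizing and integrating the PPD can be seen on a toy grid. The sketch below is an assumption-laden illustration with one source coordinate and one environmental nuisance parameter; the two-peaked density is invented, and the paper's track constraints and multi-segment inversion are omitted. It shows that the joint maximum (focalization) and the maximum of the marginal (marginalization) need not agree when one peak is tall but narrow in the nuisance parameter.

```python
import numpy as np

s = np.linspace(0.0, 10.0, 201)   # source coordinate grid (hypothetical units)
e = np.linspace(0.0, 10.0, 201)   # environmental nuisance-parameter grid
S, E = np.meshgrid(s, e, indexing="ij")

def bump(s0, e0, width_s, width_e):
    # Unnormalized 2-D Gaussian bump centred at (s0, e0).
    return np.exp(-0.5 * (((S - s0) / width_s) ** 2 + ((E - e0) / width_e) ** 2))

# Invented PPD: a tall peak that is narrow in e, and a lower peak that is broad in e.
ppd = 1.0 * bump(3.0, 2.0, 0.3, 0.2) + 0.6 * bump(7.0, 6.0, 0.3, 2.0)
ppd /= ppd.sum()

# Focalization: maximize the joint PPD over source and environment together.
i_max, j_max = np.unravel_index(np.argmax(ppd), ppd.shape)
s_focal = s[i_max]

# Marginalization: integrate out the environment, then inspect the source marginal.
marginal = ppd.sum(axis=1)
s_marg = s[np.argmax(marginal)]

print("focalization estimate:", s_focal)
print("marginalization estimate:", s_marg)
```

    Here focalization picks the source coordinate of the tall narrow peak, while marginalization favours the coordinate carrying more total probability once the environment is integrated out; the marginal also gives an uncertainty distribution for the source coordinate directly.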

  17. Phylogeny of sipunculan worms: A combined analysis of four gene regions and morphology.

    PubMed

    Schulze, Anja; Cutler, Edward B; Giribet, Gonzalo

    2007-01-01

    The intra-phyletic relationships of sipunculan worms were analyzed based on DNA sequence data from four gene regions and 58 morphological characters. Initially we analyzed the data under direct optimization using parsimony as optimality criterion. An implied alignment resulting from the direct optimization analysis was subsequently utilized to perform a Bayesian analysis with mixed models for the different data partitions. For this we applied a doublet model for the stem regions of the 18S rRNA. Both analyses support monophyly of Sipuncula and most of the same clades within the phylum. The analyses differ with respect to the relationships among the major groups but whereas the deep nodes in the direct optimization analysis generally show low jackknife support, they are supported by 100% posterior probability in the Bayesian analysis. Direct optimization has been useful for handling sequences of unequal length and generating conservative phylogenetic hypotheses whereas the Bayesian analysis under mixed models provided high resolution in the basal nodes of the tree.

  18. Model Diagnostics for Bayesian Networks

    ERIC Educational Resources Information Center

    Sinharay, Sandip

    2006-01-01

    Bayesian networks are frequently used in educational assessments primarily for learning about students' knowledge and skills. There is a lack of works on assessing fit of Bayesian networks. This article employs the posterior predictive model checking method, a popular Bayesian model checking tool, to assess fit of simple Bayesian networks. A…

  19. A general Bayesian framework for calibrating and evaluating stochastic models of annual multi-site hydrological data

    NASA Astrophysics Data System (ADS)

    Frost, Andrew J.; Thyer, Mark A.; Srikanthan, R.; Kuczera, George

    2007-07-01

    Multi-site simulations of hydrological data are required for drought risk assessment of large multi-reservoir water supply systems. In this paper, a general Bayesian framework is presented for the calibration and evaluation of multi-site hydrological data at annual timescales. Models included within this framework are the hidden Markov model (HMM) and the widely used lag-1 autoregressive (AR(1)) model. These models are extended by the inclusion of a Box-Cox transformation and a spatial correlation function in a multi-site setting. Parameter uncertainty is evaluated using Markov chain Monte Carlo techniques. Models are evaluated by their ability to reproduce a range of important extreme statistics and compared using Bayesian model selection techniques which evaluate model probabilities. The case study, using multi-site annual rainfall data situated within catchments which contribute to Sydney's main water supply, provided the following results: Firstly, in terms of model probabilities and diagnostics, the inclusion of the Box-Cox transformation was preferred. Secondly, the AR(1) and HMM performed similarly, while some other proposed AR(1)/HMM models with regionally pooled parameters had greater posterior probability than these two models. The practical significance of parameter and model uncertainty was illustrated using a case study involving drought security analysis for urban water supply. It was shown that ignoring parameter uncertainty resulted in a significant overestimate of reservoir yield and an underestimation of system vulnerability to severe drought.

  20. A Bayesian approach to the modelling of α Cen A

    NASA Astrophysics Data System (ADS)

    Bazot, M.; Bourguignon, S.; Christensen-Dalsgaard, J.

    2012-12-01

    Determining the physical characteristics of a star is an inverse problem consisting of estimating the parameters of models for the stellar structure and evolution, and knowing certain observable quantities. We use a Bayesian approach to solve this problem for α Cen A, which allows us to incorporate prior information on the parameters to be estimated, in order to better constrain the problem. Our strategy is based on the use of a Markov chain Monte Carlo (MCMC) algorithm to estimate the posterior probability densities of the stellar parameters: mass, age, initial chemical composition, etc. We use the stellar evolutionary code ASTEC to model the star. To constrain this model both seismic and non-seismic observations were considered. Several different strategies were tested to fit these values, using either two free parameters or five free parameters in ASTEC. We are thus able to show evidence that MCMC methods become efficient with respect to more classical grid-based strategies when the number of parameters increases. The results of our MCMC algorithm allow us to derive estimates for the stellar parameters and robust uncertainties thanks to the statistical analysis of the posterior probability densities. We are also able to compute odds for the presence of a convective core in α Cen A. When using core-sensitive seismic observational constraints, these can rise above ~40 per cent. The comparison of results to previous studies also indicates that these seismic constraints are of critical importance for our knowledge of the structure of this star.

  1. Limitations of polynomial chaos expansions in the Bayesian solution of inverse problems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lu, Fei; Department of Mathematics, University of California, Berkeley; Morzfeld, Matthias, E-mail: mmo@math.lbl.gov

    2015-02-01

    Polynomial chaos expansions are used to reduce the computational cost in the Bayesian solutions of inverse problems by creating a surrogate posterior that can be evaluated inexpensively. We show, by analysis and example, that when the data contain significant information beyond what is assumed in the prior, the surrogate posterior can be very different from the posterior, and the resulting estimates become inaccurate. One can improve the accuracy by adaptively increasing the order of the polynomial chaos, but the cost may increase too fast for this to be cost effective compared to Monte Carlo sampling without a surrogate posterior.

  2. Confident difference criterion: a new Bayesian differentially expressed gene selection algorithm with applications.

    PubMed

    Yu, Fang; Chen, Ming-Hui; Kuo, Lynn; Talbott, Heather; Davis, John S

    2015-08-07

    Recently, Bayesian methods have become more popular for analyzing high dimensional gene expression data, as they allow us to borrow information across different genes and provide powerful estimators for evaluating gene expression levels. It is crucial to develop a simple but efficient gene selection algorithm for detecting differentially expressed (DE) genes based on the Bayesian estimators. In this paper, by extending the two-criterion idea of Chen et al. (Chen M-H, Ibrahim JG, Chi Y-Y. A new class of mixture models for differential gene expression in DNA microarray data. J Stat Plan Inference. 2008;138:387-404), we propose two new gene selection algorithms for general Bayesian models and name these new methods the confident difference criterion methods. One is based on the standardized differences between two mean expression values among genes; the other adds the differences between two variances to it. The proposed confident difference criterion methods first evaluate the posterior probability of a gene having different gene expressions between competitive samples and then declare a gene to be DE if the posterior probability is large. The theoretical connection between the proposed first method based on the means and the Bayes factor approach proposed by Yu et al. (Yu F, Chen M-H, Kuo L. Detecting differentially expressed genes using calibrated Bayes factors. Statistica Sinica. 2008;18:783-802) is established under the normal-normal-model with equal variances between two samples. The empirical performance of the proposed methods is examined and compared to those of several existing methods via several simulations. The results from these simulation studies show that the proposed confident difference criterion methods outperform the existing methods when comparing gene expressions across different conditions for both microarray studies and sequence-based high-throughput studies. A real dataset is used to further demonstrate the proposed methodology. 
In the real data application, the confident difference criterion methods successfully identified more clinically important DE genes than the other methods. The confident difference criterion method proposed in this paper provides a new efficient approach for both microarray studies and sequence-based high-throughput studies to identify differentially expressed genes.

  3. Efficient Bayesian parameter estimation with implicit sampling and surrogate modeling for a vadose zone hydrological problem

    NASA Astrophysics Data System (ADS)

    Liu, Y.; Pau, G. S. H.; Finsterle, S.

    2015-12-01

    Parameter inversion involves inferring the model parameter values based on sparse observations of some observables. To infer the posterior probability distributions of the parameters, Markov chain Monte Carlo (MCMC) methods are typically used. However, the large number of forward simulations needed and limited computational resources limit the complexity of the hydrological model we can use in these methods. In view of this, we studied the implicit sampling (IS) method, an efficient importance sampling technique that generates samples in the high-probability region of the posterior distribution and thus reduces the number of forward simulations that we need to run. For a pilot-point inversion of a heterogeneous permeability field based on a synthetic ponded infiltration experiment simulated with TOUGH2 (a subsurface modeling code), we showed that IS with linear map provides an accurate Bayesian description of the parameterized permeability field at the pilot points with just approximately 500 forward simulations. We further studied the use of surrogate models to improve the computational efficiency of parameter inversion. We implemented two reduced-order models (ROMs) for the TOUGH2 forward model. One is based on polynomial chaos expansion (PCE), of which the coefficients are obtained using the sparse Bayesian learning technique to mitigate the "curse of dimensionality" of the PCE terms. The other model is Gaussian process regression (GPR) for which different covariance, likelihood and inference models are considered. Preliminary results indicate that ROMs constructed based on the prior parameter space perform poorly. It is thus impractical to replace this hydrological model by a ROM directly in an MCMC method. However, the IS method can work with a ROM constructed for parameters in the close vicinity of the maximum a posteriori probability (MAP) estimate. 
We will discuss the accuracy and computational efficiency of using ROMs in the implicit sampling procedure for the hydrological problem considered. This work was supported, in part, by the U.S. Dept. of Energy under Contract No. DE-AC02-05CH11231

  4. How Reliable is Bayesian Model Averaging Under Noisy Data? Statistical Assessment and Implications for Robust Model Selection

    NASA Astrophysics Data System (ADS)

    Schöniger, Anneli; Wöhling, Thomas; Nowak, Wolfgang

    2014-05-01

    Bayesian model averaging ranks the predictive capabilities of alternative conceptual models based on Bayes' theorem. The individual models are weighted with their posterior probability to be the best one in the considered set of models. Finally, their predictions are combined into a robust weighted average and the predictive uncertainty can be quantified. This rigorous procedure does not, however, yet account for possible instabilities due to measurement noise in the calibration data set. This is a major drawback, since posterior model weights may suffer a lack of robustness related to the uncertainty in noisy data, which may compromise the reliability of model ranking. We present a new statistical concept to account for measurement noise as a source of uncertainty for the weights in Bayesian model averaging. Our suggested upgrade reflects the limited information content of data for the purpose of model selection. It allows us to assess the significance of the determined posterior model weights, the confidence in model selection, and the accuracy of the quantified predictive uncertainty. Our approach rests on a brute-force Monte Carlo framework. We determine the robustness of model weights against measurement noise by repeatedly perturbing the observed data with random realizations of measurement error. Then, we analyze the induced variability in posterior model weights and introduce this "weighting variance" as an additional term into the overall prediction uncertainty analysis scheme. We further determine the theoretical upper limit in performance of the model set which is imposed by measurement noise. As an extension to the merely relative model ranking, this analysis provides a measure of absolute model performance. To finally decide whether better data or longer time series are needed to ensure a robust basis for model selection, we resample the measurement time series and assess the convergence of model weights for increasing time series length. We illustrate our suggested approach with an application to model selection between different soil-plant models following up on a study by Wöhling et al. (2013). Results show that measurement noise compromises the reliability of model ranking and causes a significant amount of weighting uncertainty if the calibration data time series is not long enough to compensate for its noisiness. This additional contribution to the overall predictive uncertainty is neglected without our approach. Thus, we strongly advocate including our suggested upgrade in the Bayesian model averaging routine.
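
    The perturbation idea described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: each model is reduced to a fixed vector of predictions (so a Gaussian likelihood stands in for the full model evidence, with no marginalization over parameters), the measurement error is taken as independent Gaussian with known standard deviation, and all function names are hypothetical.

```python
import math
import random


def gaussian_log_likelihood(observed, predicted, sigma):
    """Log-likelihood of the data under independent Gaussian errors (assumed noise model)."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma**2) - (o - p) ** 2 / (2 * sigma**2)
        for o, p in zip(observed, predicted)
    )


def posterior_model_weights(observed, predictions, sigma, priors=None):
    """Posterior model weights via Bayes' theorem, normalized over the model set."""
    k = len(predictions)
    priors = priors or [1.0 / k] * k  # equal prior weights unless given
    log_post = [
        math.log(pr) + gaussian_log_likelihood(observed, pred, sigma)
        for pr, pred in zip(priors, predictions)
    ]
    shift = max(log_post)  # subtract the maximum for numerical stability
    unnorm = [math.exp(lp - shift) for lp in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def weighting_variance(observed, predictions, sigma, n_draws=500, seed=0):
    """Perturb the data with random noise realizations and summarize the weight spread."""
    rng = random.Random(seed)
    k = len(predictions)
    draws = []
    for _ in range(n_draws):
        perturbed = [o + rng.gauss(0.0, sigma) for o in observed]
        draws.append(posterior_model_weights(perturbed, predictions, sigma))
    means = [sum(d[j] for d in draws) / n_draws for j in range(k)]
    variances = [
        sum((d[j] - means[j]) ** 2 for d in draws) / (n_draws - 1) for j in range(k)
    ]
    return means, variances


# Hypothetical data: two candidate models, four observations.
observations = [1.0, 2.0, 3.0, 4.0]
model_predictions = [[1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5]]
print(posterior_model_weights(observations, model_predictions, sigma=0.5))
print(weighting_variance(observations, model_predictions, sigma=0.5))
```

    A large variance relative to the gap between the mean weights would suggest that the ranking is not robust against noise of the assumed size.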

  5. RadVel: The Radial Velocity Modeling Toolkit

    NASA Astrophysics Data System (ADS)

    Fulton, Benjamin J.; Petigura, Erik A.; Blunt, Sarah; Sinukoff, Evan

    2018-04-01

    RadVel is an open-source Python package for modeling Keplerian orbits in radial velocity (RV) timeseries. RadVel provides a convenient framework to fit RVs using maximum a posteriori optimization and to compute robust confidence intervals by sampling the posterior probability density via Markov Chain Monte Carlo (MCMC). RadVel allows users to float or fix parameters, impose priors, and perform Bayesian model comparison. We have implemented real-time MCMC convergence tests to ensure adequate sampling of the posterior. RadVel can output a number of publication-quality plots and tables. Users may interface with RadVel through a convenient command-line interface or directly from Python. The code is object-oriented and thus naturally extensible. We encourage contributions from the community. Documentation is available at http://radvel.readthedocs.io.

  6. Determining X-ray source intensity and confidence bounds in crowded fields

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Primini, F. A.; Kashyap, V. L., E-mail: fap@head.cfa.harvard.edu

    We present a rigorous description of the general problem of aperture photometry in high-energy astrophysics photon-count images, in which the statistical noise model is Poisson, not Gaussian. We compute the full posterior probability density function for the expected source intensity for various cases of interest, including the important cases in which both source and background apertures contain contributions from the source, and when multiple source apertures partially overlap. A Bayesian approach offers the advantages of allowing one to (1) include explicit prior information on source intensities, (2) propagate posterior distributions as priors for future observations, and (3) use Poisson likelihoods, making the treatment valid in the low-counts regime. Elements of this approach have been implemented in the Chandra Source Catalog.
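
    As a rough illustration of a Poisson-likelihood posterior for source intensity, the sketch below treats only the simplest case: one source aperture with expected counts s + b, one source-free background aperture with expected counts r·b (r being the area ratio), and flat priors on s and b. These simplifications, the grid-based marginalization, and all function names are assumptions made for the example; the cases with source contamination of the background aperture or overlapping apertures described in the abstract are not covered.

```python
import math


def poisson_log_pmf(k, mu):
    """Log of the Poisson probability of observing k counts given mean mu > 0."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def source_intensity_posterior(c_src, c_bkg, area_ratio, s_max, b_max, n_grid=200):
    """Posterior density of source intensity s on a midpoint grid, marginalized over b."""
    ds = s_max / n_grid
    db = b_max / n_grid
    s_grid = [(i + 0.5) * ds for i in range(n_grid)]
    b_grid = [(j + 0.5) * db for j in range(n_grid)]
    density = []
    for s in s_grid:
        # Integrate the joint likelihood over the background level (flat prior on b).
        total = sum(
            math.exp(
                poisson_log_pmf(c_src, s + b) + poisson_log_pmf(c_bkg, area_ratio * b)
            )
            for b in b_grid
        )
        density.append(total * db)
    norm = sum(density) * ds
    return s_grid, [d / norm for d in density]


def credible_interval(grid, density, level=0.9):
    """Equal-tailed credible interval read off the cumulative grid posterior."""
    step = grid[1] - grid[0]
    tail = (1.0 - level) / 2.0
    cumulative = 0.0
    lower, upper = grid[0], grid[-1]
    found_lower = False
    for x, d in zip(grid, density):
        cumulative += d * step
        if not found_lower and cumulative >= tail:
            lower, found_lower = x, True
        if cumulative >= 1.0 - tail:
            upper = x
            break
    return lower, upper


# Hypothetical counts: 12 in the source aperture, 20 in a background aperture 10x larger.
grid, dens = source_intensity_posterior(12, 20, 10.0, s_max=40.0, b_max=10.0)
print(credible_interval(grid, dens, level=0.9))
```

    Because the likelihood is Poisson rather than Gaussian, the resulting interval stays non-negative and is asymmetric at low counts.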

  7. Bayesian assessment of uncertainty in aerosol size distributions and index of refraction retrieved from multiwavelength lidar measurements.

    PubMed

    Herman, Benjamin R; Gross, Barry; Moshary, Fred; Ahmed, Samir

    2008-04-01

    We investigate the assessment of uncertainty in the inference of aerosol size distributions from backscatter and extinction measurements that can be obtained from a modern elastic/Raman lidar system with a Nd:YAG laser transmitter. To calculate the uncertainty, an analytic formula for the correlated probability density function (PDF) describing the error for an optical coefficient ratio is derived based on a normally distributed fractional error in the optical coefficients. Assuming a monomodal lognormal particle size distribution of spherical, homogeneous particles with a known index of refraction, we compare the assessment of uncertainty using a more conventional forward Monte Carlo method with that obtained from a Bayesian posterior PDF assuming a uniform prior PDF and show that substantial differences between the two methods exist. In addition, we use the posterior PDF formalism, which was extended to include an unknown refractive index, to find credible sets for a variety of optical measurement scenarios. We find the uncertainty is greatly reduced with the addition of suitable extinction measurements in contrast to the inclusion of extra backscatter coefficients, which we show to have a minimal effect; this finding strengthens similar observations based on numerical regularization methods.

  8. Computational statistics using the Bayesian Inference Engine

    NASA Astrophysics Data System (ADS)

    Weinberg, Martin D.

    2013-09-01

    This paper introduces the Bayesian Inference Engine (BIE), a general parallel, optimized software package for parameter inference and model selection. This package is motivated by the analysis needs of modern astronomical surveys and the need to organize and reuse expensive derived data. The BIE is the first platform for computational statistics designed explicitly to enable Bayesian update and model comparison for astronomical problems. Bayesian update is based on the representation of high-dimensional posterior distributions using metric-ball-tree based kernel density estimation. Among its algorithmic offerings, the BIE emphasizes hybrid tempered Markov chain Monte Carlo schemes that robustly sample multimodal posterior distributions in high-dimensional parameter spaces. Moreover, the BIE implements a full persistence or serialization system that stores the full byte-level image of the running inference and previously characterized posterior distributions for later use. Two new algorithms to compute the marginal likelihood from the posterior distribution, developed for and implemented in the BIE, enable model comparison for complex models and data sets. Finally, the BIE was designed to be a collaborative platform for applying Bayesian methodology to astronomy. It includes an extensible object-oriented and easily extended framework that implements every aspect of the Bayesian inference. By providing a variety of statistical algorithms for all phases of the inference problem, a scientist may explore a variety of approaches with a single model and data implementation. Additional technical details and download details are available from http://www.astro.umass.edu/bie. The BIE is distributed under the GNU General Public License.

  9. A Bayesian Method for Evaluating Passing Scores: The PPoP Curve

    ERIC Educational Resources Information Center

    Wainer, Howard; Wang, X. A.; Skorupski, William P.; Bradlow, Eric T.

    2005-01-01

    In this note, we demonstrate an interesting use of the posterior distributions (and corresponding posterior samples of proficiency) that are yielded by fitting a fully Bayesian test scoring model to a complex assessment. Specifically, we examine the efficacy of the test in combination with the specific passing score that was chosen through expert…

  10. A Bayesian observer replicates convexity context effects in figure-ground perception.

    PubMed

    Goldreich, Daniel; Peterson, Mary A

    2012-01-01

    Peterson and Salvagio (2008) demonstrated convexity context effects in figure-ground perception. Subjects shown displays consisting of unfamiliar alternating convex and concave regions identified the convex regions as foreground objects progressively more frequently as the number of regions increased; this occurred only when the concave regions were homogeneously colored. The origins of these effects have been unclear. Here, we present a two-free-parameter Bayesian observer that replicates convexity context effects. The Bayesian observer incorporates two plausible expectations regarding three-dimensional scenes: (1) objects tend to be convex rather than concave, and (2) backgrounds tend (more than foreground objects) to be homogeneously colored. The Bayesian observer estimates the probability that a depicted scene is three-dimensional, and that the convex regions are figures. It responds stochastically by sampling from its posterior distributions. Like human observers, the Bayesian observer shows convexity context effects only for images with homogeneously colored concave regions. With optimal parameter settings, it performs similarly to the average human subject on the four display types tested. We propose that object convexity and background color homogeneity are environmental regularities exploited by human visual perception; vision achieves figure-ground perception by interpreting ambiguous images in light of these and other expected regularities in natural scenes.

  11. BOP2: Bayesian optimal design for phase II clinical trials with simple and complex endpoints.

    PubMed

    Zhou, Heng; Lee, J Jack; Yuan, Ying

    2017-09-20

    We propose a flexible Bayesian optimal phase II (BOP2) design that is capable of handling simple (e.g., binary) and complicated (e.g., ordinal, nested, and co-primary) endpoints under a unified framework. We use a Dirichlet-multinomial model to accommodate different types of endpoints. At each interim, the go/no-go decision is made by evaluating a set of posterior probabilities of the events of interest, which is optimized to maximize power or minimize the number of patients under the null hypothesis. Unlike other existing Bayesian designs, the BOP2 design explicitly controls the type I error rate, thereby bridging the gap between Bayesian designs and frequentist designs. In addition, the stopping boundary of the BOP2 design can be enumerated prior to the onset of the trial. These features make the BOP2 design accessible to a wide range of users and regulatory agencies and particularly easy to implement in practice. Simulation studies show that the BOP2 design has favorable operating characteristics with higher power and lower risk of incorrectly terminating the trial than some existing Bayesian phase II designs. The software to implement the BOP2 design is freely available at www.trialdesign.org. Copyright © 2017 John Wiley & Sons, Ltd.
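
    To make the idea of an enumerable stopping boundary concrete, the following sketch handles only a binary endpoint, for which the Dirichlet-multinomial model reduces to a beta-binomial one. It is a simplified illustration, not the published software: the Beta(1, 1) prior, the power-function cutoff 1 - lam * (n / n_max) ** gamma, the values of lam and gamma, and the function names are all assumptions chosen for the example, and no optimization or type I error calibration is performed.

```python
from math import comb


def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) at x for positive integers a, b, via the binomial-sum identity."""
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))


def stopping_boundary(interims, n_max, p0, lam=0.9, gamma=1.0):
    """For each interim sample size, the largest response count that triggers a no-go.

    A value of -1 means the trial never stops at that interim.
    """
    bounds = {}
    for n in interims:
        cutoff = 1 - lam * (n / n_max) ** gamma  # assumed illustrative threshold
        stop_at = -1
        for y in range(n + 1):
            # Posterior under a Beta(1, 1) prior is Beta(1 + y, 1 + n - y).
            prob_not_better = beta_cdf_int(p0, 1 + y, 1 + n - y)
            if prob_not_better > cutoff:
                stop_at = y
        bounds[n] = stop_at
    return bounds


# Hypothetical design: null response rate 0.2, looks after 10, 20, 30 and 40 patients.
print(stopping_boundary([10, 20, 30, 40], n_max=40, p0=0.2))
```

    Since the boundary depends only on the design parameters and not on accrued data, the whole table can be printed before the first patient is enrolled.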

  12. Hierarchical Bayesian sparse image reconstruction with application to MRFM.

    PubMed

    Dobigeon, Nicolas; Hero, Alfred O; Tourneret, Jean-Yves

    2009-09-01

    This paper presents a hierarchical Bayesian model to reconstruct sparse images when the observations are obtained from linear transformations and corrupted by an additive white Gaussian noise. Our hierarchical Bayes model is well suited to such naturally sparse image applications as it seamlessly accounts for properties such as sparsity and positivity of the image via appropriate Bayes priors. We propose a prior that is based on a weighted mixture of a positive exponential distribution and a mass at zero. The prior has hyperparameters that are tuned automatically by marginalization over the hierarchical Bayesian model. To overcome the complexity of the posterior distribution, a Gibbs sampling strategy is proposed. The Gibbs samples can be used to estimate the image to be recovered, e.g., by maximizing the estimated posterior distribution. In our fully Bayesian approach, the posteriors of all the parameters are available. Thus, our algorithm provides more information than other previously proposed sparse reconstruction methods that only give a point estimate. The performance of the proposed hierarchical Bayesian sparse reconstruction method is illustrated on synthetic data and real data collected from a tobacco virus sample using a prototype MRFM instrument.

  13. A Bayesian Analysis of a Randomized Clinical Trial Comparing Antimetabolite Therapies for Non-Infectious Uveitis

    PubMed Central

    Browne, Erica N; Rathinam, Sivakumar R; Kanakath, Anuradha; Thundikandy, Radhika; Babu, Manohar; Lietman, Thomas M; Acharya, Nisha R

    2017-01-01

    Purpose: To conduct a Bayesian analysis of a randomized clinical trial (RCT) for non-infectious uveitis using expert opinion as a subjective prior belief. Methods: An RCT was conducted to determine which antimetabolite, methotrexate or mycophenolate mofetil, is more effective as an initial corticosteroid-sparing agent for the treatment of intermediate, posterior, and pan-uveitis. Before the release of trial results, expert opinion on the relative effectiveness of these two medications was collected via online survey. Members of the American Uveitis Society executive committee were invited to provide an estimate for the relative decrease in efficacy with a 95% credible interval (CrI). A prior probability distribution was created from experts’ estimates. A Bayesian analysis was performed using the constructed expert prior probability distribution and the trial’s primary outcome. Results: 11 of 12 invited uveitis specialists provided estimates. Eight of 11 experts (73%) believed mycophenolate mofetil is more effective. The group prior belief was that the odds of treatment success for patients taking mycophenolate mofetil were 1.4-fold the odds of those taking methotrexate (95% CrI 0.03 – 45.0). The odds of treatment success with mycophenolate mofetil compared to methotrexate was 0.4 from the RCT (95% confidence interval 0.1–1.2) and 0.7 (95% CrI 0.2–1.7) from the Bayesian analysis. Conclusions: A Bayesian analysis combining expert belief with the trial’s result did not indicate preference for one drug. However, the wide credible interval leaves open the possibility of a substantial treatment effect. This suggests clinical equipoise necessary to allow a larger, more definitive RCT. PMID:27982726

  14. A computer program for estimation from incomplete multinomial data

    NASA Technical Reports Server (NTRS)

    Credeur, K. R.

    1978-01-01

    Coding is given for maximum likelihood and Bayesian estimation of the vector p of multinomial cell probabilities from incomplete data. Also included is coding to calculate and approximate elements of the posterior mean and covariance matrices. The program is written in FORTRAN 4 language for the Control Data CYBER 170 series digital computer system with network operating system (NOS) 1.1. The program requires approximately 44000 octal locations of core storage. A typical case requires from 72 seconds to 92 seconds on CYBER 175 depending on the value of the prior parameter.

  15. iSEDfit: Bayesian spectral energy distribution modeling of galaxies

    NASA Astrophysics Data System (ADS)

    Moustakas, John

    2017-08-01

    iSEDfit uses Bayesian inference to extract the physical properties of galaxies from their observed broadband photometric spectral energy distribution (SED). In its default mode, the inputs to iSEDfit are the measured photometry (fluxes and corresponding inverse variances) and a measurement of the galaxy redshift. Alternatively, iSEDfit can be used to estimate photometric redshifts from the input photometry alone. After the priors have been specified, iSEDfit calculates the marginalized posterior probability distributions for the physical parameters of interest, including the stellar mass, star-formation rate, dust content, star formation history, and stellar metallicity. iSEDfit also optionally computes K-corrections and produces multiple "quality assurance" (QA) plots at each stage of the modeling procedure to aid in the interpretation of the prior parameter choices and subsequent fitting results. The software is distributed as part of the impro IDL suite.

  16. A Bayesian method for inferring transmission chains in a partially observed epidemic.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Marzouk, Youssef M.; Ray, Jaideep

    2008-10-01

    We present a Bayesian approach for estimating transmission chains and rates in the Abakaliki smallpox epidemic of 1967. The epidemic affected 30 individuals in a community of 74; only the dates of appearance of symptoms were recorded. Our model assumes stochastic transmission of the infections over a social network. Distinct binomial random graphs model intra- and inter-compound social connections, while disease transmission over each link is treated as a Poisson process. Link probabilities and rate parameters are objects of inference. Dates of infection and recovery comprise the remaining unknowns. Distributions for smallpox incubation and recovery periods are obtained from historical data. Using Markov chain Monte Carlo, we explore the joint posterior distribution of the scalar parameters and provide an expected connectivity pattern for the social graph and infection pathway.

  17. An Integrative Framework for Bayesian Variable Selection with Informative Priors for Identifying Genes and Pathways

    PubMed Central

    Ander, Bradley P.; Zhang, Xiaoshuai; Xue, Fuzhong; Sharp, Frank R.; Yang, Xiaowei

    2013-01-01

    The discovery of genetic or genomic markers plays a central role in the development of personalized medicine. A notable challenge exists when dealing with the high dimensionality of the data sets, as thousands of genes or millions of genetic variants are collected on a relatively small number of subjects. Traditional gene-wise selection methods using univariate analyses have difficulty incorporating correlational, structural, or functional structures amongst the molecular measures. For microarray gene expression data, we first summarize solutions in dealing with ‘large p, small n’ problems, and then propose an integrative Bayesian variable selection (iBVS) framework for simultaneously identifying causal or marker genes and regulatory pathways. A novel partial least squares (PLS) g-prior for iBVS is developed to allow the incorporation of prior knowledge on gene-gene interactions or functional relationships. From the point of view of systems biology, iBVS enables the user to directly target the joint effects of multiple genes and pathways in a hierarchical modeling diagram to predict disease status or phenotype. The estimated posterior selection probabilities offer probabilistic and biological interpretations. Both simulated data and a set of microarray data in predicting stroke status are used in validating the performance of iBVS in a Probit model with binary outcomes. iBVS offers a general framework for effective discovery of various molecular biomarkers by combining data-based statistics and knowledge-based priors. Guidelines on making posterior inferences, determining Bayesian significance levels, and improving computational efficiencies are also discussed. PMID:23844055

  18. An integrative framework for Bayesian variable selection with informative priors for identifying genes and pathways.

    PubMed

    Peng, Bin; Zhu, Dianwen; Ander, Bradley P; Zhang, Xiaoshuai; Xue, Fuzhong; Sharp, Frank R; Yang, Xiaowei

    2013-01-01

    The discovery of genetic or genomic markers plays a central role in the development of personalized medicine. A notable challenge exists when dealing with the high dimensionality of the data sets, as thousands of genes or millions of genetic variants are collected on a relatively small number of subjects. Traditional gene-wise selection methods using univariate analyses have difficulty incorporating correlational, structural, or functional structures amongst the molecular measures. For microarray gene expression data, we first summarize solutions in dealing with 'large p, small n' problems, and then propose an integrative Bayesian variable selection (iBVS) framework for simultaneously identifying causal or marker genes and regulatory pathways. A novel partial least squares (PLS) g-prior for iBVS is developed to allow the incorporation of prior knowledge on gene-gene interactions or functional relationships. From the point of view of systems biology, iBVS enables the user to directly target the joint effects of multiple genes and pathways in a hierarchical modeling diagram to predict disease status or phenotype. The estimated posterior selection probabilities offer probabilistic and biological interpretations. Both simulated data and a set of microarray data in predicting stroke status are used in validating the performance of iBVS in a Probit model with binary outcomes. iBVS offers a general framework for effective discovery of various molecular biomarkers by combining data-based statistics and knowledge-based priors. Guidelines on making posterior inferences, determining Bayesian significance levels, and improving computational efficiencies are also discussed.

  19. Bayesian Inference for Generalized Linear Models for Spiking Neurons

    PubMed Central

    Gerwinn, Sebastian; Macke, Jakob H.; Bethge, Matthias

    2010-01-01

    Generalized Linear Models (GLMs) are commonly used statistical methods for modelling the relationship between neural population activity and presented stimuli. When the dimension of the parameter space is large, strong regularization has to be used in order to fit GLMs to datasets of realistic size without overfitting. By imposing properly chosen priors over parameters, Bayesian inference provides an effective and principled approach for achieving regularization. Here we show how the posterior distribution over model parameters of GLMs can be approximated by a Gaussian using the Expectation Propagation algorithm. In this way, we obtain an estimate of the posterior mean and posterior covariance, allowing us to calculate Bayesian confidence intervals that characterize the uncertainty about the optimal solution. From the posterior we also obtain a different point estimate, namely the posterior mean as opposed to the commonly used maximum a posteriori estimate. We systematically compare the different inference techniques on simulated data as well as on multi-electrode recordings of retinal ganglion cells, and explore the effects of the chosen prior and the performance measure used. We find that good performance can be achieved by choosing a Laplace prior together with the posterior mean estimate. PMID:20577627

  20. Bayesian anomaly detection in monitoring data applying relevance vector machine

    NASA Astrophysics Data System (ADS)

    Saito, Tomoo

    2011-04-01

    A method is developed for automatically classifying monitoring data into two categories, normal and anomalous, in order to remove anomalous data from the enormous amount of monitoring data. The method applies the relevance vector machine (RVM) to a probabilistic discriminative model with basis functions and weight parameters, whose posterior PDF (probability density function) conditional on the learning data set is given by Bayes' theorem. The proposed framework is applied to actual monitoring data sets containing some anomalous data collected at two buildings in Tokyo, Japan; the trained models discriminate anomalous data from normal data very clearly, giving high probabilities of being normal to normal data and low probabilities of being normal to anomalous data.

  1. Good fences make for good neighbors but bad science: a review of what improves Bayesian reasoning and why

    PubMed Central

    Brase, Gary L.; Hill, W. Trey

    2015-01-01

    Bayesian reasoning, defined here as the updating of a posterior probability following new information, has historically been problematic for humans. Classic psychology experiments have tested human Bayesian reasoning through the use of word problems and have evaluated each participant’s performance against the normatively correct answer provided by Bayes’ theorem. The standard finding is of generally poor performance. Over the past two decades, though, progress has been made on how to improve Bayesian reasoning. Most notably, research has demonstrated that the use of frequencies in a natural sampling framework—as opposed to single-event probabilities—can improve participants’ Bayesian estimates. Furthermore, pictorial aids and certain individual difference factors also can play significant roles in Bayesian reasoning success. The mechanics of how to build tasks which show these improvements is not under much debate. The explanations for why naturally sampled frequencies and pictures help Bayesian reasoning remain hotly contested, however, with many researchers falling into ingrained “camps” organized around two dominant theoretical perspectives. The present paper evaluates the merits of these theoretical perspectives, including the weight of empirical evidence, theoretical coherence, and predictive power. By these criteria, the ecological rationality approach is clearly better than the heuristics and biases view. Progress in the study of Bayesian reasoning will depend on continued research that honestly, vigorously, and consistently engages across these different theoretical accounts rather than staying “siloed” within one particular perspective. The process of science requires an understanding of competing points of view, with the ultimate goal being integration. PMID:25873904
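
    The contrast between the single-event probability format and the natural frequency format can be shown with a short worked example. The numbers below (1% base rate, 80% hit rate, 9.6% false-alarm rate, a reference population of 1000) are hypothetical textbook-style values, not data from the review, and the function names are illustrative.

```python
def posterior_from_probabilities(base_rate, hit_rate, false_alarm_rate):
    """Bayes' theorem in the single-event probability format."""
    numerator = base_rate * hit_rate
    return numerator / (numerator + (1 - base_rate) * false_alarm_rate)


def posterior_from_natural_frequencies(population, base_rate, hit_rate, false_alarm_rate):
    """The same problem restated as whole-number counts in a reference population."""
    with_condition = round(population * base_rate)
    without_condition = population - with_condition
    true_positives = round(with_condition * hit_rate)
    false_positives = round(without_condition * false_alarm_rate)
    posterior = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, posterior


# Probability format: three conditional probabilities must be combined.
print(posterior_from_probabilities(0.01, 0.8, 0.096))

# Frequency format: "of the people who test positive, how many have the condition?"
print(posterior_from_natural_frequencies(1000, 0.01, 0.8, 0.096))
```

    In the frequency version the answer reduces to a single ratio of two counts, which is the computational simplification that the natural sampling framework is argued to provide; the rounding to whole persons makes the two answers agree only approximately.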

  2. A Bayesian least squares support vector machines based framework for fault diagnosis and failure prognosis

    NASA Astrophysics Data System (ADS)

    Khawaja, Taimoor Saleem

    A high-belief low-overhead Prognostics and Health Management (PHM) system is desired for online real-time monitoring of complex non-linear systems operating in a complex (possibly non-Gaussian) noise environment. This thesis presents a Bayesian Least Squares Support Vector Machine (LS-SVM) based framework for fault diagnosis and failure prognosis in nonlinear non-Gaussian systems. The methodology assumes the availability of real-time process measurements, definition of a set of fault indicators and the existence of empirical knowledge (or historical data) to characterize both nominal and abnormal operating conditions. An efficient yet powerful Least Squares Support Vector Machine (LS-SVM) algorithm, set within a Bayesian Inference framework, not only allows for the development of real-time algorithms for diagnosis and prognosis but also provides a solid theoretical framework to address key concepts related to classification for diagnosis and regression modeling for prognosis. SVM machines are founded on the principle of Structural Risk Minimization (SRM) which tends to find a good trade-off between low empirical risk and small capacity. The key features in SVM are the use of non-linear kernels, the absence of local minima, the sparseness of the solution and the capacity control obtained by optimizing the margin. The Bayesian Inference framework linked with LS-SVMs allows a probabilistic interpretation of the results for diagnosis and prognosis. Additional levels of inference provide the much coveted features of adaptability and tunability of the modeling parameters. The two main modules considered in this research are fault diagnosis and failure prognosis. With the goal of designing an efficient and reliable fault diagnosis scheme, a novel Anomaly Detector is suggested based on the LS-SVM machines. The proposed scheme uses only baseline data to construct a 1-class LS-SVM machine which, when presented with online data, is able to distinguish between normal behavior and any abnormal or novel data during real-time operation. The results of the scheme are interpreted as a posterior probability of health (1 - probability of fault). As shown through two case studies in Chapter 3, the scheme is well suited for diagnosing imminent faults in dynamical non-linear systems. Finally, the failure prognosis scheme is based on an incremental weighted Bayesian LS-SVR machine. It is particularly suited for online deployment given the incremental nature of the algorithm and the quick optimization problem solved in the LS-SVR algorithm. By way of kernelization and a Gaussian Mixture Modeling (GMM) scheme, the algorithm can estimate "possibly" non-Gaussian posterior distributions for complex non-linear systems. An efficient regression scheme associated with the more rigorous core algorithm allows for long-term predictions, fault growth estimation with confidence bounds and remaining useful life (RUL) estimation after a fault is detected. The leading contributions of this thesis are (a) the development of a novel Bayesian Anomaly Detector for efficient and reliable Fault Detection and Identification (FDI) based on Least Squares Support Vector Machines, (b) the development of a data-driven real-time architecture for long-term Failure Prognosis using Least Squares Support Vector Machines, (c) uncertainty representation and management using Bayesian Inference for posterior distribution estimation and hyper-parameter tuning, and finally (d) the statistical characterization of the performance of diagnosis and prognosis algorithms in order to relate the efficiency and reliability of the proposed schemes.

  3. Bayesian multiple-source localization in an uncertain ocean environment.

    PubMed

    Dosso, Stan E; Wilmut, Michael J

    2011-06-01

    This paper considers simultaneous localization of multiple acoustic sources when properties of the ocean environment (water column and seabed) are poorly known. A Bayesian formulation is developed in which the environmental parameters, noise statistics, and locations and complex strengths (amplitudes and phases) of multiple sources are considered to be unknown random variables constrained by acoustic data and prior information. Two approaches are considered for estimating source parameters. Focalization maximizes the posterior probability density (PPD) over all parameters using adaptive hybrid optimization. Marginalization integrates the PPD using efficient Markov-chain Monte Carlo methods to produce joint marginal probability distributions for source ranges and depths, from which source locations are obtained. This approach also provides quantitative uncertainty analysis for all parameters, which can aid in understanding of the inverse problem and may be of practical interest (e.g., source-strength probability distributions). In both approaches, closed-form maximum-likelihood expressions for source strengths and noise variance at each frequency allow these parameters to be sampled implicitly, substantially reducing the dimensionality and difficulty of the inversion. Examples are presented of both approaches applied to single- and multi-frequency localization of multiple sources in an uncertain shallow-water environment, and a Monte Carlo performance evaluation study is carried out. © 2011 Acoustical Society of America

  4. Bayesian block-diagonal variable selection and model averaging

    PubMed Central

    Papaspiliopoulos, O.; Rossell, D.

    2018-01-01

    Summary We propose a scalable algorithmic framework for exact Bayesian variable selection and model averaging in linear models under the assumption that the Gram matrix is block-diagonal, and as a heuristic for exploring the model space for general designs. In block-diagonal designs our approach returns the most probable model of any given size without resorting to numerical integration. The algorithm also provides a novel and efficient solution to the frequentist best subset selection problem for block-diagonal designs. Posterior probabilities for any number of models are obtained by evaluating a single one-dimensional integral, and other quantities of interest such as variable inclusion probabilities and model-averaged regression estimates are obtained by an adaptive, deterministic one-dimensional numerical integration. The overall computational cost scales linearly with the number of blocks, which can be processed in parallel, and exponentially with the block size, rendering it most adequate in situations where predictors are organized in many moderately-sized blocks. For general designs, we approximate the Gram matrix by a block-diagonal matrix using spectral clustering and propose an iterative algorithm that capitalizes on the block-diagonal algorithms to explore efficiently the model space. All methods proposed in this paper are implemented in the R library mombf. PMID:29861501

  5. Robust Bayesian Algorithm for Targeted Compound Screening in Forensic Toxicology.

    PubMed

    Woldegebriel, Michael; Gonsalves, John; van Asten, Arian; Vivó-Truyols, Gabriel

    2016-02-16

    As part of forensic toxicological investigation of cases involving unexpected death of an individual, targeted or untargeted xenobiotic screening of post-mortem samples is normally conducted. To this end, liquid chromatography (LC) coupled to high-resolution mass spectrometry (MS) is typically employed. For data analysis, almost all commonly applied algorithms are threshold-based (frequentist). These algorithms examine the value of a certain measurement (e.g., peak height) to decide whether a certain xenobiotic of interest (XOI) is present/absent, yielding a binary output. Frequentist methods pose a problem when several sources of information [e.g., shape of the chromatographic peak, isotopic distribution, estimated mass-to-charge ratio (m/z), adduct, etc.] need to be combined, requiring the approach to make arbitrary decisions at substep levels of data analysis. We hereby introduce a novel Bayesian probabilistic algorithm for toxicological screening. The method tackles the problem with a different strategy. It is not aimed at reaching a final conclusion regarding the presence of the XOI, but it estimates its probability. The algorithm effectively and efficiently combines all possible pieces of evidence from the chromatogram and calculates the posterior probability of the presence/absence of XOI features. This way, the model can accommodate more information by updating the probability if extra evidence is acquired. The final probabilistic result assists the end user to make a final decision with respect to the presence/absence of the xenobiotic. The Bayesian method was validated and found to perform better (in terms of false positives and false negatives) than the vendor-supplied software package.

  6. Bayesian Model Selection in Geophysics: The evidence

    NASA Astrophysics Data System (ADS)

    Vrugt, J. A.

    2016-12-01

    Bayesian inference has found widespread application and use in science and engineering to reconcile Earth system models with data, including prediction in space (interpolation), prediction in time (forecasting), assimilation of observations and deterministic/stochastic model output, and inference of the model parameters. Per Bayes' theorem, the posterior probability, P(H|D), of a hypothesis, H, given the data, D, is equivalent to the product of its prior probability, P(H), and likelihood, L(H|D), divided by a normalization constant, P(D). In geophysics, the hypothesis, H, often constitutes a description (parameterization) of the subsurface for some entity of interest (e.g. porosity, moisture content). The normalization constant, P(D), is not required for inference of the subsurface structure, yet of great value for model selection. Unfortunately, it is not particularly easy to estimate P(D) in practice. Here, I will introduce the various building blocks of a general purpose method which provides robust and unbiased estimates of the evidence, P(D). This method uses multi-dimensional numerical integration of the posterior (parameter) distribution. I will then illustrate this new estimator by application to three competing subsurface models (hypotheses) using GPR travel time data from the South Oyster Bacterial Transport Site, in Virginia, USA. The three subsurface models differ in their treatment of the porosity distribution and use (a) horizontal layering with fixed layer thicknesses, (b) vertical layering with fixed layer thicknesses and (c) a multi-Gaussian field. The results of the new estimator are compared against the brute force Monte Carlo method, and the Laplace-Metropolis method.
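
    As an illustration of the quantities named in this abstract, the following is a minimal sketch of the brute force Monte Carlo estimate of the evidence, P(D): draw parameters from the prior and average the likelihood. It is not the estimator proposed in the record. The Gaussian toy model, the function names, and the default prior settings are all assumptions chosen so that the result can be checked against a closed-form answer.

```python
import numpy as np


def log_likelihood(theta, data, sigma=1.0):
    """Gaussian log-likelihood of `data` for each candidate mean in `theta`."""
    resid = data[None, :] - theta[:, None]
    norm = data.size * np.log(sigma * np.sqrt(2.0 * np.pi))
    return -0.5 * np.sum(resid**2, axis=1) / sigma**2 - norm


def log_evidence_monte_carlo(data, prior_mean=0.0, prior_sd=2.0,
                             sigma=1.0, n_draws=200_000, seed=0):
    """Brute-force estimate of log P(D): average the likelihood over prior draws."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(prior_mean, prior_sd, size=n_draws)
    log_l = log_likelihood(theta, data, sigma)
    peak = log_l.max()  # log-mean-exp keeps the average numerically stable
    return peak + np.log(np.mean(np.exp(log_l - peak)))


def log_evidence_exact(data, prior_mean=0.0, prior_sd=2.0, sigma=1.0):
    """Closed form for the toy model: D ~ N(prior_mean, sigma^2 I + prior_sd^2 11^T)."""
    n = data.size
    cov = sigma**2 * np.eye(n) + prior_sd**2 * np.ones((n, n))
    diff = data - prior_mean
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (quad + logdet + n * np.log(2.0 * np.pi))
```

    For a one-parameter model the two values agree closely; the cost of the brute force average grows quickly as the posterior occupies a smaller fraction of the prior volume, which is the difficulty the abstract alludes to.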

  7. Efficient Mean Field Variational Algorithm for Data Assimilation (Invited)

    NASA Astrophysics Data System (ADS)

    Vrettas, M. D.; Cornford, D.; Opper, M.

    2013-12-01

    Data assimilation algorithms combine available observations of physical systems with the assumed model dynamics in a systematic manner, to produce better estimates of initial conditions for prediction. Broadly they can be categorized in three main approaches: (a) sequential algorithms, (b) sampling methods and (c) variational algorithms which transform the density estimation problem to an optimization problem. However, given finite computational resources, only a handful of ensemble Kalman filters and 4DVar algorithms have been applied operationally to very high dimensional geophysical applications, such as weather forecasting. In this paper we present a recent extension to our variational Bayesian algorithm which seeks the 'optimal' posterior distribution over the continuous time states, within a family of non-stationary Gaussian processes. Our initial work on variational Bayesian approaches to data assimilation, unlike the well-known 4DVar method which seeks only the most probable solution, computes the best time varying Gaussian process approximation to the posterior smoothing distribution for dynamical systems that can be represented by stochastic differential equations. This approach was based on minimising the Kullback-Leibler divergence, over paths, between the true posterior and our Gaussian process approximation. Whilst the observations were informative enough to keep the posterior smoothing density close to Gaussian, the algorithm proved very effective on low dimensional systems (e.g. O(10)D). However, for higher dimensional systems, the high computational demands make the algorithm prohibitively expensive.
To overcome the difficulties presented in the original framework and make our approach more efficient in higher dimensional systems we have been developing a new mean field version of the algorithm which treats the state variables at any given time as being independent in the posterior approximation, while still accounting for their relationships in the mean solution arising from the original system dynamics. Here we present this new mean field approach, illustrating its performance on a range of benchmark data assimilation problems whose dimensionality varies from O(10) to O(10^3)D. We emphasise that the variational Bayesian approach we adopt, unlike other variational approaches, provides a natural bound on the marginal likelihood of the observations given the model parameters which also allows for inference of (hyper-) parameters such as observational errors, parameters in the dynamical model and model error representation. We also stress that since our approach is intrinsically parallel it can be implemented very efficiently to address very long data assimilation time windows. Moreover, like most traditional variational approaches our Bayesian variational method has the benefit of being posed as an optimisation problem therefore its complexity can be tuned to the available computational resources. We finish with a sketch of possible future directions.

  8. Bayesian approach for three-dimensional aquifer characterization at the Hanford 300 Area

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Murakami, Haruko; Chen, X.; Hahn, Melanie S.

    2010-10-21

    This study presents a stochastic, three-dimensional characterization of a heterogeneous hydraulic conductivity field within DOE's Hanford 300 Area site, Washington, by assimilating large-scale, constant-rate injection test data with small-scale, three-dimensional electromagnetic borehole flowmeter (EBF) measurement data. We first inverted the injection test data to estimate the transmissivity field, using zeroth-order temporal moments of pressure buildup curves. We applied a newly developed Bayesian geostatistical inversion framework, the method of anchored distributions (MAD), to obtain a joint posterior distribution of geostatistical parameters and local log-transmissivities at multiple locations. The unique aspects of MAD that make it suitable for this purpose are its ability to integrate multi-scale, multi-type data within a Bayesian framework and to compute a nonparametric posterior distribution. After we combined the distribution of transmissivities with depth-discrete relative-conductivity profile from EBF data, we inferred the three-dimensional geostatistical parameters of the log-conductivity field, using the Bayesian model-based geostatistics. Such consistent use of the Bayesian approach throughout the procedure enabled us to systematically incorporate data uncertainty into the final posterior distribution. The method was tested in a synthetic study and validated using the actual data that was not part of the estimation. Results showed broader and skewed posterior distributions of geostatistical parameters except for the mean, which suggests the importance of inferring the entire distribution to quantify the parameter uncertainty.

  9. Finite‐fault Bayesian inversion of teleseismic body waves

    USGS Publications Warehouse

    Clayton, Brandon; Hartzell, Stephen; Moschetti, Morgan P.; Minson, Sarah E.

    2017-01-01

    Inverting geophysical data has provided fundamental information about the behavior of earthquake rupture. However, inferring kinematic source model parameters for finite‐fault ruptures is an intrinsically underdetermined problem (the problem of nonuniqueness), because we are restricted to finite noisy observations. Although many studies use least‐squares techniques to make the finite‐fault problem tractable, these methods generally lack the ability to apply non‐Gaussian error analysis and the imposition of nonlinear constraints. However, the Bayesian approach can be employed to find a Gaussian or non‐Gaussian distribution of all probable model parameters, while utilizing nonlinear constraints. We present case studies to quantify the resolving power and associated uncertainties using only teleseismic body waves in a Bayesian framework to infer the slip history for a synthetic case and two earthquakes: the 2011 Mw 7.1 Van, east Turkey, earthquake and the 2010 Mw 7.2 El Mayor–Cucapah, Baja California, earthquake. In implementing the Bayesian method, we further present two distinct solutions to investigate the uncertainties by performing the inversion with and without velocity structure perturbations. We find that the posterior ensemble becomes broader when including velocity structure variability and introduces a spatial smearing of slip. Using the Bayesian framework solely on teleseismic body waves, we find rake is poorly constrained by the observations and rise time is poorly resolved when slip amplitude is low.

  10. A Pragmatic Bayesian Perspective on Correlation Analysis. The exoplanetary gravity - stellar activity case

    NASA Astrophysics Data System (ADS)

    Figueira, P.; Faria, J. P.; Adibekyan, V. Zh.; Oshagh, M.; Santos, N. C.

    2016-11-01

    We apply the Bayesian framework to assess the presence of a correlation between two quantities. To do so, we estimate the probability distribution of the parameter of interest, ρ, characterizing the strength of the correlation. We provide an implementation of these ideas and concepts using the Python programming language and the pyMC module in a very short (~130 lines of code, heavily commented) and user-friendly program. We used this tool to assess the presence and properties of the correlation between planetary surface gravity and stellar activity level as measured by the log(R'_HK) indicator. The results of the Bayesian analysis are qualitatively similar to those obtained via p-value analysis, and support the presence of a correlation in the data. The results are more robust in their derivation and more informative, revealing interesting features such as asymmetric posterior distributions or markedly different credible intervals, and allowing for a deeper exploration. We encourage the reader interested in this kind of problem to apply our code to his/her own scientific problems. The full understanding of what the Bayesian framework is can only be gained through the insight that comes by handling priors, assessing the convergence of Monte Carlo runs, and a multitude of other practical problems. We hope to contribute so that Bayesian analysis becomes a tool in the toolkit of researchers, and they understand by experience its advantages and limitations.

  11. A surrogate-based sensitivity quantification and Bayesian inversion of a regional groundwater flow model

    NASA Astrophysics Data System (ADS)

    Chen, Mingjie; Izady, Azizallah; Abdalla, Osman A.; Amerjeed, Mansoor

    2018-02-01

    Bayesian inference using Markov Chain Monte Carlo (MCMC) provides an explicit framework for stochastic calibration of hydrogeologic models accounting for uncertainties; however, the MCMC sampling entails a large number of model calls, and could easily become computationally unwieldy if the high-fidelity hydrogeologic model simulation is time consuming. This study proposes a surrogate-based Bayesian framework to address this notorious issue, and illustrates the methodology by inverse modeling a regional MODFLOW model. The high-fidelity groundwater model is approximated by a fast statistical model using Bagging Multivariate Adaptive Regression Spline (BMARS) algorithm, and hence the MCMC sampling can be efficiently performed. In this study, the MODFLOW model is developed to simulate the groundwater flow in an arid region of Oman consisting of mountain-coast aquifers, and used to run representative simulations to generate training dataset for BMARS model construction. A BMARS-based Sobol' method is also employed to efficiently calculate input parameter sensitivities, which are used to evaluate and rank their importance for the groundwater flow model system. According to sensitivity analysis, insensitive parameters are screened out of Bayesian inversion of the MODFLOW model, further saving computing efforts. The posterior probability distribution of input parameters is efficiently inferred from the prescribed prior distribution using observed head data, demonstrating that the presented BMARS-based Bayesian framework is an efficient tool to reduce parameter uncertainties of a groundwater system.

  12. Convergence analysis of surrogate-based methods for Bayesian inverse problems

    NASA Astrophysics Data System (ADS)

    Yan, Liang; Zhang, Yuan-Xiang

    2017-12-01

    The major challenges in the Bayesian inverse problems arise from the need for repeated evaluations of the forward model, as required by Markov chain Monte Carlo (MCMC) methods for posterior sampling. Many attempts at accelerating Bayesian inference have relied on surrogates for the forward model, typically constructed through repeated forward simulations that are performed in an offline phase. Although such approaches can be quite effective at reducing computation cost, there has been little analysis of the approximation on posterior inference. In this work, we prove error bounds on the Kullback-Leibler (KL) distance between the true posterior distribution and the approximation based on surrogate models. Our rigorous error analysis shows that if the forward model approximation converges at certain rate in the prior-weighted L^2 norm, then the posterior distribution generated by the approximation converges to the true posterior at least two times faster in the KL sense. The error bound on the Hellinger distance is also provided. To provide concrete examples focusing on the use of the surrogate model based methods, we present an efficient technique for constructing stochastic surrogate models to accelerate the Bayesian inference approach. The Christoffel least squares algorithms, based on generalized polynomial chaos, are used to construct a polynomial approximation of the forward solution over the support of the prior distribution. The numerical strategy and the predicted convergence rates are then demonstrated on the nonlinear inverse problems, involving the inference of parameters appearing in partial differential equations.

  13. Probability modeling of the number of positive cores in a prostate cancer biopsy session, with applications.

    PubMed

    Serfling, Robert; Ogola, Gerald

    2016-02-10

    Among men, prostate cancer (CaP) is the most common newly diagnosed cancer and the second leading cause of death from cancer. A major issue of very large scale is avoiding both over-treatment and under-treatment of CaP cases. The central challenge is deciding clinical significance or insignificance when the CaP biopsy results are positive but only marginally so. A related concern is deciding how to increase the number of biopsy cores for larger prostates. As a foundation for improved choice of number of cores and improved interpretation of biopsy results, we develop a probability model for the number of positive cores found in a biopsy, given the total number of cores, the volumes of the tumor nodules, and - very importantly - the prostate volume. Also, three applications are carried out: guidelines for the number of cores as a function of prostate volume, decision rules for insignificant versus significant CaP using number of positive cores, and, using prior distributions on total tumor size, Bayesian posterior probabilities for insignificant CaP and posterior median CaP. The model-based results have generality of application, take prostate volume into account, and provide attractive tradeoffs of specificity versus sensitivity. Copyright © 2015 John Wiley & Sons, Ltd.

  14. A Bayesian test for Hardy–Weinberg equilibrium of biallelic X-chromosomal markers

    PubMed Central

    Puig, X; Ginebra, J; Graffelman, J

    2017-01-01

    The X chromosome is a relatively large chromosome, harboring a lot of genetic information. Much of the statistical analysis of X-chromosomal information is complicated by the fact that males only have one copy. Recently, frequentist statistical tests for Hardy–Weinberg equilibrium have been proposed specifically for dealing with markers on the X chromosome. Bayesian test procedures for Hardy–Weinberg equilibrium for the autosomes have been described, but Bayesian work on the X chromosome in this context is lacking. This paper gives the first Bayesian approach for testing Hardy–Weinberg equilibrium with biallelic markers at the X chromosome. Marginal and joint posterior distributions for the inbreeding coefficient in females and the male to female allele frequency ratio are computed, and used for statistical inference. The paper gives a detailed account of the proposed Bayesian test, and illustrates it with data from the 1000 Genomes project. In that implementation, a novel approach to tackle multiple testing from a Bayesian perspective through posterior predictive checks is used. PMID:28900292

  15. Inverse Modeling Using Markov Chain Monte Carlo Aided by Adaptive Stochastic Collocation Method with Transformation

    NASA Astrophysics Data System (ADS)

    Zhang, D.; Liao, Q.

    2016-12-01

    The Bayesian inference provides a convenient framework to solve statistical inverse problems. In this method, the parameters to be identified are treated as random variables. The prior knowledge, the system nonlinearity, and the measurement errors can be directly incorporated in the posterior probability density function (PDF) of the parameters. The Markov chain Monte Carlo (MCMC) method is a powerful tool to generate samples from the posterior PDF. However, since the MCMC usually requires thousands or even millions of forward simulations, it can be a computationally intensive endeavor, particularly when faced with large-scale flow and transport models. To address this issue, we construct a surrogate system for the model responses in the form of polynomials by the stochastic collocation method. In addition, we employ interpolation based on the nested sparse grids and take into account the different importance of the parameters, under the condition of high random dimensions in the stochastic space. Furthermore, in case of low regularity such as discontinuous or unsmooth relation between the input parameters and the output responses, we introduce an additional transform process to improve the accuracy of the surrogate model. Once we build the surrogate system, we may evaluate the likelihood with very little computational cost. We analyzed the convergence rate of the forward solution and the surrogate posterior by Kullback-Leibler divergence, which quantifies the difference between probability distributions. The fast convergence of the forward solution implies fast convergence of the surrogate posterior to the true posterior. We also tested the proposed algorithm on water-flooding two-phase flow reservoir examples. The posterior PDF calculated from a very long chain with direct forward simulation is assumed to be accurate.
The posterior PDF calculated using the surrogate model is in reasonable agreement with the reference, revealing a great improvement in terms of computational efficiency.
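
    The workflow described in this abstract (fit a cheap polynomial surrogate to the forward model, then run MCMC against the surrogate) can be sketched in one dimension as follows. This is a hedged illustration only: a plain Chebyshev least-squares fit stands in for the adaptive sparse-grid stochastic collocation of the record, and the `forward` function, the uniform prior on [-1, 1], the noise level, and all names are assumptions made up for the example.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb


def forward(theta):
    """Hypothetical stand-in for an expensive forward simulation."""
    return np.exp(0.8 * theta) + 0.3 * theta


def build_surrogate(degree=8, n_nodes=20):
    """Fit a Chebyshev polynomial to the forward model at Chebyshev nodes on [-1, 1]."""
    k = np.arange(n_nodes)
    nodes = np.cos(np.pi * (k + 0.5) / n_nodes)
    coef = cheb.chebfit(nodes, forward(nodes), degree)
    return lambda theta: cheb.chebval(theta, coef)


def make_log_post(model, y_obs, noise_sd=0.05):
    """Log-posterior (up to a constant): uniform prior on [-1, 1], Gaussian noise."""
    def log_post(theta):
        if abs(theta) > 1.0:
            return -np.inf
        return -0.5 * ((y_obs - model(theta)) / noise_sd) ** 2
    return log_post


def metropolis(log_post, theta0, n_steps=20_000, step=0.08, seed=1):
    """Random-walk Metropolis sampler for a scalar parameter."""
    rng = np.random.default_rng(seed)
    chain = np.empty(n_steps)
    theta, lp = theta0, log_post(theta0)
    for i in range(n_steps):
        proposal = theta + step * rng.normal()
        lp_prop = log_post(proposal)
        # Accept with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = proposal, lp_prop
        chain[i] = theta
    return chain
```

    Every Metropolis step here calls only the polynomial, so the expensive model is evaluated just `n_nodes` times, while building the surrogate. Whether the surrogate posterior is trustworthy depends on how well the polynomial approximates the forward model over the prior support.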

  16. Influence of Averaging Preprocessing on Image Analysis with a Markov Random Field Model

    NASA Astrophysics Data System (ADS)

    Sakamoto, Hirotaka; Nakanishi-Ohno, Yoshinori; Okada, Masato

    2018-02-01

    This paper describes our investigations into the influence of averaging preprocessing on the performance of image analysis. Averaging preprocessing involves a trade-off: image averaging is often undertaken to reduce noise while the number of image data available for image analysis is decreased. We formulated a process of generating image data by using a Markov random field (MRF) model to achieve image analysis tasks such as image restoration and hyper-parameter estimation by a Bayesian approach. According to the notions of Bayesian inference, posterior distributions were analyzed to evaluate the influence of averaging. There are three main results. First, we found that the performance of image restoration with a predetermined value for hyper-parameters is invariant regardless of whether averaging is conducted. We then found that the performance of hyper-parameter estimation deteriorates due to averaging. Our analysis of the negative logarithm of the posterior probability, which is called the free energy based on an analogy with statistical mechanics, indicated that the confidence of hyper-parameter estimation remains higher without averaging. Finally, we found that when the hyper-parameters are estimated from the data, the performance of image restoration worsens as averaging is undertaken. We conclude that averaging adversely influences the performance of image analysis through hyper-parameter estimation.

  17. Exploring the Energy Landscapes of Protein Folding Simulations with Bayesian Computation

    PubMed Central

    Burkoff, Nikolas S.; Várnai, Csilla; Wells, Stephen A.; Wild, David L.

    2012-01-01

    Nested sampling is a Bayesian sampling technique developed to explore probability distributions localized in an exponentially small area of the parameter space. The algorithm provides both posterior samples and an estimate of the evidence (marginal likelihood) of the model. The nested sampling algorithm also provides an efficient way to calculate free energies and the expectation value of thermodynamic observables at any temperature, through a simple post processing of the output. Previous applications of the algorithm have yielded large efficiency gains over other sampling techniques, including parallel tempering. In this article, we describe a parallel implementation of the nested sampling algorithm and its application to the problem of protein folding in a Gō-like force field of empirical potentials that were designed to stabilize secondary structure elements in room-temperature simulations. We demonstrate the method by conducting folding simulations on a number of small proteins that are commonly used for testing protein-folding procedures. A topological analysis of the posterior samples is performed to produce energy landscape charts, which give a high-level description of the potential energy surface for the protein folding simulations. These charts provide qualitative insights into both the folding process and the nature of the model and force field used. PMID:22385859

  18. Exploring the energy landscapes of protein folding simulations with Bayesian computation.

    PubMed

    Burkoff, Nikolas S; Várnai, Csilla; Wells, Stephen A; Wild, David L

    2012-02-22

    Nested sampling is a Bayesian sampling technique developed to explore probability distributions localized in an exponentially small area of the parameter space. The algorithm provides both posterior samples and an estimate of the evidence (marginal likelihood) of the model. The nested sampling algorithm also provides an efficient way to calculate free energies and the expectation value of thermodynamic observables at any temperature, through a simple post processing of the output. Previous applications of the algorithm have yielded large efficiency gains over other sampling techniques, including parallel tempering. In this article, we describe a parallel implementation of the nested sampling algorithm and its application to the problem of protein folding in a Gō-like force field of empirical potentials that were designed to stabilize secondary structure elements in room-temperature simulations. We demonstrate the method by conducting folding simulations on a number of small proteins that are commonly used for testing protein-folding procedures. A topological analysis of the posterior samples is performed to produce energy landscape charts, which give a high-level description of the potential energy surface for the protein folding simulations. These charts provide qualitative insights into both the folding process and the nature of the model and force field used. Copyright © 2012 Biophysical Society. Published by Elsevier Inc. All rights reserved.

  19. National, regional, and global trends in systolic blood pressure since 1980: systematic analysis of health examination surveys and epidemiological studies with 786 country-years and 5·4 million participants.

    PubMed

    Danaei, Goodarz; Finucane, Mariel M; Lin, John K; Singh, Gitanjali M; Paciorek, Christopher J; Cowan, Melanie J; Farzadfar, Farshad; Stevens, Gretchen A; Lim, Stephen S; Riley, Leanne M; Ezzati, Majid

    2011-02-12

    Data for trends in blood pressure are needed to understand the effects of its dietary, lifestyle, and pharmacological determinants; set intervention priorities; and evaluate national programmes. However, few worldwide analyses of trends in blood pressure have been done. We estimated worldwide trends in population mean systolic blood pressure (SBP). We estimated trends and their uncertainties in mean SBP for adults 25 years and older in 199 countries and territories. We obtained data from published and unpublished health examination surveys and epidemiological studies (786 country-years and 5·4 million participants). For each sex, we used a Bayesian hierarchical model to estimate mean SBP by age, country, and year, accounting for whether a study was nationally representative. In 2008, age-standardised mean SBP worldwide was 128·1 mm Hg (95% uncertainty interval 126·7-129·4) in men and 124·4 mm Hg (123·0-125·9) in women. Globally, between 1980 and 2008, SBP decreased by 0·8 mm Hg per decade (-0·4 to 2·2, posterior probability of being a true decline=0·90) in men and 1·0 mm Hg per decade (-0·3 to 2·3, posterior probability=0·93) in women. Female SBP decreased by 3·5 mm Hg or more per decade in western Europe and Australasia (posterior probabilities ≥0·999). Male SBP fell most in high-income North America, by 2·8 mm Hg per decade (1·3-4·5, posterior probability >0·999), followed by Australasia and western Europe where it decreased by more than 2·0 mm Hg per decade (posterior probabilities >0·98). SBP rose in Oceania, east Africa, and south and southeast Asia for both sexes, and in west Africa for women, with the increases ranging 0·8-1·6 mm Hg per decade in men (posterior probabilities 0·72-0·91) and 1·0-2·7 mm Hg per decade for women (posterior probabilities 0·75-0·98). Female SBP was highest in some east and west African countries, with means of 135 mm Hg or greater. 
Male SBP was highest in Baltic and east and west African countries, where mean SBP reached 138 mm Hg or more. Men and women in western Europe had the highest SBP in high-income regions. On average, global population SBP decreased slightly since 1980, but trends varied significantly across regions and countries. SBP is currently highest in low-income and middle-income countries. Effective population-based and personal interventions should be targeted towards low-income and middle-income countries. Funding Bill & Melinda Gates Foundation and WHO. Copyright © 2011 Elsevier Ltd. All rights reserved.
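
    The "posterior probability of being a true decline" reported in this abstract is, in sample-based Bayesian analyses, commonly computed as the share of posterior draws of the trend that fall below zero. The sketch below shows that calculation together with a central uncertainty interval. It assumes posterior draws of the trend (mm Hg per decade, negative meaning a decline) are already available; the function names are hypothetical and no data from the study are used.

```python
import numpy as np


def posterior_probability_of_decline(trend_draws):
    """Fraction of posterior draws in which the trend is negative."""
    draws = np.asarray(trend_draws, dtype=float)
    return float(np.mean(draws < 0.0))


def uncertainty_interval(trend_draws, level=0.95):
    """Central posterior interval taken from the empirical quantiles of the draws."""
    tail = (1.0 - level) / 2.0
    lo, hi = np.quantile(np.asarray(trend_draws, dtype=float), [tail, 1.0 - tail])
    return float(lo), float(hi)
```

    An interval that spans zero and a posterior probability of decline near 0.9 are compatible: the first describes the central 95% of the draws, the second the mass on one side of zero.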

  20. Data analysis in emission tomography using emission-count posteriors

    NASA Astrophysics Data System (ADS)

    Sitek, Arkadiusz

    2012-11-01

    A novel approach to the analysis of emission tomography data using the posterior probability of the number of emissions per voxel (emission count) conditioned on acquired tomographic data is explored. The posterior is derived from the prior and the Poisson likelihood of the emission-count data by marginalizing voxel activities. Based on emission-count posteriors, examples of Bayesian analysis including estimation and classification tasks in emission tomography are provided. The application of the method to computer simulations of 2D tomography is demonstrated. In particular, the minimum-mean-square-error point estimator of the emission count is demonstrated. The process of finding this estimator can be considered as a tomographic image reconstruction technique since the estimates of the number of emissions per voxel divided by voxel sensitivities and acquisition time are the estimates of the voxel activities. As an example of a classification task, a hypothesis stating that some region of interest (ROI) emitted at least or at most r-times the number of events in some other ROI is tested. The ROIs are specified by the user. The analysis described in this work provides new quantitative statistical measures that can be used in decision making in diagnostic imaging using emission tomography.

  1. A comparison of Monte Carlo-based Bayesian parameter estimation methods for stochastic models of genetic networks

    PubMed Central

    Zaikin, Alexey; Míguez, Joaquín

    2017-01-01

    We compare three state-of-the-art Bayesian inference methods for the estimation of the unknown parameters in a stochastic model of a genetic network. In particular, we introduce a stochastic version of the paradigmatic synthetic multicellular clock model proposed by Ullner et al., 2007. By introducing dynamical noise in the model and assuming that the partial observations of the system are contaminated by additive noise, we enable a principled mechanism to represent experimental uncertainties in the synthesis of the multicellular system and pave the way for the design of probabilistic methods for the estimation of any unknowns in the model. Within this setup, we tackle the Bayesian estimation of a subset of the model parameters. Specifically, we compare three Monte Carlo based numerical methods for the approximation of the posterior probability density function of the unknown parameters given a set of partial and noisy observations of the system. The schemes we assess are the particle Metropolis-Hastings (PMH) algorithm, the nonlinear population Monte Carlo (NPMC) method and the approximate Bayesian computation sequential Monte Carlo (ABC-SMC) scheme. We present an extensive numerical simulation study, which shows that while the three techniques can effectively solve the problem there are significant differences both in estimation accuracy and computational efficiency. PMID:28797087

  2. Bayesian Analysis for Risk Assessment of Selected Medical Events in Support of the Integrated Medical Model Effort

    NASA Technical Reports Server (NTRS)

    Gilkey, Kelly M.; Myers, Jerry G.; McRae, Michael P.; Griffin, Elise A.; Kallrui, Aditya S.

    2012-01-01

    The Exploration Medical Capability project is creating a catalog of risk assessments using the Integrated Medical Model (IMM). The IMM is a software-based system intended to assist mission planners in preparing for spaceflight missions by helping them to make informed decisions about medical preparations and supplies needed for combating and treating various medical events using Probabilistic Risk Assessment. The objective is to use statistical analyses to inform the IMM decision tool with estimated probabilities of medical events occurring during an exploration mission. Because data regarding astronaut health are limited, Bayesian statistical analysis is used. Bayesian inference combines prior knowledge, such as data from the general U.S. population, the U.S. Submarine Force, or the analog astronaut population located at the NASA Johnson Space Center, with observed data for the medical condition of interest. The posterior results reflect the best evidence for specific medical events occurring in flight. Bayes theorem provides a formal mechanism for combining available observed data with data from similar studies to support the quantification process. The IMM team performed Bayesian updates on the following medical events: angina, appendicitis, atrial fibrillation, atrial flutter, dental abscess, dental caries, dental periodontal disease, gallstone disease, herpes zoster, renal stones, seizure, and stroke.
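    The kind of Bayesian update described above, combining prior information from an analog population with sparse observed data, can be illustrated with a conjugate beta-binomial sketch. The abstract does not say which prior family or likelihood the IMM team used, so the beta prior, all counts, and the function names below are assumptions made purely for illustration.

```python
def beta_binomial_update(prior_a, prior_b, events, non_events):
    """Return posterior Beta(a, b) parameters after observing binomial data."""
    return prior_a + events, prior_b + non_events


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical prior from an analog population: about 2 events per 100 persons.
prior_a, prior_b = 2.0, 98.0

# Hypothetical observed data for the condition of interest: 3 events in 50 persons.
post_a, post_b = beta_binomial_update(prior_a, prior_b, events=3, non_events=47)

print("prior mean:    ", beta_mean(prior_a, prior_b))
print("posterior mean:", beta_mean(post_a, post_b))
```

    In this sketch the prior acts like 100 pseudo-observations, so the posterior mean falls between the prior rate and the observed rate, weighted by the respective sample sizes.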

  3. Manual hierarchical clustering of regional geochemical data using a Bayesian finite mixture model

    USGS Publications Warehouse

    Ellefsen, Karl J.; Smith, David

    2016-01-01

    Interpretation of regional scale, multivariate geochemical data is aided by a statistical technique called “clustering.” We investigate a particular clustering procedure by applying it to geochemical data collected in the State of Colorado, United States of America. The clustering procedure partitions the field samples for the entire survey area into two clusters. The field samples in each cluster are partitioned again to create two subclusters, and so on. This manual procedure generates a hierarchy of clusters, and the different levels of the hierarchy show geochemical and geological processes occurring at different spatial scales. Although there are many different clustering methods, we use Bayesian finite mixture modeling with two probability distributions, which yields two clusters. The model parameters are estimated with Hamiltonian Monte Carlo sampling of the posterior probability density function, which usually has multiple modes. Each mode has its own set of model parameters; each set is checked to ensure that it is consistent both with the data and with independent geologic knowledge. The set of model parameters that is most consistent with the independent geologic knowledge is selected for detailed interpretation and partitioning of the field samples.

  4. Uncertainties in Parameters Estimated with Neural Networks: Application to Strong Gravitational Lensing

    NASA Astrophysics Data System (ADS)

    Perreault Levasseur, Laurence; Hezaveh, Yashar D.; Wechsler, Risa H.

    2017-11-01

    In Hezaveh et al. we showed that deep learning can be used for model parameter estimation and trained convolutional neural networks to determine the parameters of strong gravitational-lensing systems. Here we demonstrate a method for obtaining the uncertainties of these parameters. We review the framework of variational inference to obtain approximate posteriors of Bayesian neural networks and apply it to a network trained to estimate the parameters of the Singular Isothermal Ellipsoid plus external shear and total flux magnification. We show that the method can capture the uncertainties due to different levels of noise in the input data, as well as training and architecture-related errors made by the network. To evaluate the accuracy of the resulting uncertainties, we calculate the coverage probabilities of marginalized distributions for each lensing parameter. By tuning a single variational parameter, the dropout rate, we obtain coverage probabilities approximately equal to the confidence levels for which they were calculated, resulting in accurate and precise uncertainty estimates. Our results suggest that the application of approximate Bayesian neural networks to astrophysical modeling problems can be a fast alternative to Monte Carlo Markov Chains, allowing orders of magnitude improvement in speed.

  5. Diagnostic accuracy of enzyme-linked immunosorbent assay (ELISA) and immunoblot (IB) for the detection of antibodies against Neospora caninum in milk from dairy cows.

    PubMed

    Chatziprodromidou, I P; Apostolou, T

    2018-04-01

    The aim of the study was to estimate the sensitivity and specificity of enzyme-linked immunosorbent assay (ELISA) and immunoblot (IB) for detecting antibodies of Neospora caninum in dairy cows, in the absence of a gold standard. The study complies with STRADAS-paratuberculosis guidelines for reporting the accuracy of the test. We tried to apply Bayesian models that do not require conditional independence of the tests under evaluation, but as convergence problems appeared, we used a Bayesian methodology that does not assume conditional dependence of the tests. Informative prior probability distributions were constructed, based on scientific inputs regarding sensitivity and specificity of the IB test and the prevalence of disease in the studied populations. IB sensitivity and specificity were estimated to be 98.8% and 91.3%, respectively, while the respective estimates for ELISA were 60% and 96.7%. A sensitivity analysis, in which modified prior probability distributions concerning IB diagnostic accuracy were applied, showed a limited effect on posterior assessments. We concluded that ELISA can be used to screen the bulk milk and, secondly, that IB can be used whenever needed.

  6. Defining Probability in Sex Offender Risk Assessment.

    PubMed

    Elwood, Richard W

    2016-12-01

    There is ongoing debate and confusion over using actuarial scales to predict individuals' risk of sexual recidivism. Much of the debate comes from not distinguishing Frequentist from Bayesian definitions of probability. Much of the confusion comes from applying Frequentist probability to individuals' risk. By definition, only Bayesian probability can be applied to the single case. The Bayesian concept of probability resolves most of the confusion and much of the debate in sex offender risk assessment. Although Bayesian probability is well accepted in risk assessment generally, it has not been widely used to assess the risk of sex offenders. I review the two concepts of probability and show how the Bayesian view alone provides a coherent scheme to conceptualize individuals' risk of sexual recidivism.

  7. SU-G-JeP2-02: A Unifying Multi-Atlas Approach to Electron Density Mapping Using Multi-Parametric MRI for Radiation Treatment Planning

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ren, S; Tianjin University, Tianjin; Hara, W

    Purpose: MRI has a number of advantages over CT as a primary modality for radiation treatment planning (RTP). However, one key bottleneck problem still remains, which is the lack of electron density information in MRI. In this work, a reliable method to map electron density is developed by leveraging the differential contrast of multi-parametric MRI. Methods: We propose a probabilistic Bayesian approach for electron density mapping based on T1 and T2-weighted MRI, using multiple patients as atlases. For each voxel, we compute two conditional probabilities: (1) electron density given its image intensity on T1 and T2-weighted MR images, and (2) electron density given its geometric location in a reference anatomy. The two sources of information (image intensity and spatial location) are combined into a unifying posterior probability density function using the Bayesian formalism. The mean value of the posterior probability density function provides the estimated electron density. Results: We evaluated the method on 10 head and neck patients and performed leave-one-out cross validation (9 patients as atlases and remaining 1 as test). The proposed method significantly reduced the errors in electron density estimation, with a mean absolute HU error of 138, compared with 193 for the T1-weighted intensity approach and 261 without density correction. For bone detection (HU>200), the proposed method had an accuracy of 84% and a sensitivity of 73% at specificity of 90% (AUC = 87%). In comparison, the AUC for bone detection is 73% and 50% using the intensity approach and without density correction, respectively. Conclusion: The proposed unifying method provides accurate electron density estimation and bone detection based on multi-parametric MRI of the head with highly heterogeneous anatomy. This could allow for accurate dose calculation and reference image generation for patient setup in MRI-based radiation treatment planning.

  8. Estimating Bayesian Phylogenetic Information Content

    PubMed Central

    Lewis, Paul O.; Chen, Ming-Hui; Kuo, Lynn; Lewis, Louise A.; Fučíková, Karolina; Neupane, Suman; Wang, Yu-Bo; Shi, Daoyuan

    2016-01-01

    Measuring the phylogenetic information content of data has a long history in systematics. Here we explore a Bayesian approach to information content estimation. The entropy of the posterior distribution compared with the entropy of the prior distribution provides a natural way to measure information content. If the data have no information relevant to ranking tree topologies beyond the information supplied by the prior, the posterior and prior will be identical. Information in data discourages consideration of some hypotheses allowed by the prior, resulting in a posterior distribution that is more concentrated (has lower entropy) than the prior. We focus on measuring information about tree topology using marginal posterior distributions of tree topologies. We show that both the accuracy and the computational efficiency of topological information content estimation improve with use of the conditional clade distribution, which also allows topological information content to be partitioned by clade. We explore two important applications of our method: providing a compelling definition of saturation and detecting conflict among data partitions that can negatively affect analyses of concatenated data. [Bayesian; concatenation; conditional clade distribution; entropy; information; phylogenetics; saturation.] PMID:27155008
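    The idea of comparing posterior entropy with prior entropy can be sketched for a small discrete set of tree topologies. This is only a minimal illustration: expressing the entropy reduction as a fraction of the prior entropy is an assumption made here, the probabilities are invented, and the sketch does not reproduce the conditional clade distribution estimator discussed in the abstract.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def information_content(prior, posterior):
    """Entropy reduction from prior to posterior, as a fraction of prior entropy.

    Returns 0.0 when the posterior equals the prior and 1.0 when the posterior
    puts all of its mass on a single topology.
    """
    h_prior = shannon_entropy(prior)
    h_post = shannon_entropy(posterior)
    return (h_prior - h_post) / h_prior


# Four taxa have three possible unrooted topologies; use a flat prior over them.
prior = [1 / 3, 1 / 3, 1 / 3]
posterior_no_signal = [1 / 3, 1 / 3, 1 / 3]   # data add nothing beyond the prior
posterior_concentrated = [0.98, 0.01, 0.01]   # hypothetical, strongly peaked

print(information_content(prior, posterior_no_signal))
print(information_content(prior, posterior_concentrated))
```

    A posterior identical to the prior yields zero information, while a more concentrated posterior has lower entropy and therefore a value closer to one.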

  9. Using Latent Class Analysis to Model Temperament Types.

    PubMed

    Loken, Eric

    2004-10-01

    Mixture models are appropriate for data that arise from a set of qualitatively different subpopulations. In this study, latent class analysis was applied to observational data from a laboratory assessment of infant temperament at four months of age. The EM algorithm was used to fit the models, and the Bayesian method of posterior predictive checks was used for model selection. Results show at least three types of infant temperament, with patterns consistent with those identified by previous researchers who classified the infants using a theoretically based system. Multiple imputation of group memberships is proposed as an alternative to assigning subjects to the latent class with maximum posterior probability in order to reflect variance due to uncertainty in the parameter estimation. Latent class membership at four months of age predicted longitudinal outcomes at four years of age. The example illustrates issues relevant to all mixture models, including estimation, multi-modality, model selection, and comparisons based on the latent group indicators.

  10. qPR: An adaptive partial-report procedure based on Bayesian inference.

    PubMed

    Baek, Jongsoo; Lesmes, Luis Andres; Lu, Zhong-Lin

    2016-08-01

    Iconic memory is best assessed with the partial report procedure in which an array of letters appears briefly on the screen and a poststimulus cue directs the observer to report the identity of the cued letter(s). Typically, 6-8 cue delays or 600-800 trials are tested to measure the iconic memory decay function. Here we develop a quick partial report, or qPR, procedure based on a Bayesian adaptive framework to estimate the iconic memory decay function with much reduced testing time. The iconic memory decay function is characterized by an exponential function and a joint probability distribution of its three parameters. Starting with a prior of the parameters, the method selects the stimulus to maximize the expected information gain in the next test trial. It then updates the posterior probability distribution of the parameters based on the observer's response using Bayesian inference. The procedure is reiterated until either the total number of trials or the precision of the parameter estimates reaches a certain criterion. Simulation studies showed that only 100 trials were necessary to reach an average absolute bias of 0.026 and a precision of 0.070 (both in terms of probability correct). A psychophysical validation experiment showed that estimates of the iconic memory decay function obtained with 100 qPR trials exhibited good precision (the half width of the 68.2% credible interval = 0.055) and excellent agreement with those obtained with 1,600 trials of the conventional method of constant stimuli procedure (RMSE = 0.063). Quick partial-report relieves the data collection burden in characterizing iconic memory and makes it possible to assess iconic memory in clinical populations.

  11. qPR: An adaptive partial-report procedure based on Bayesian inference

    PubMed Central

    Baek, Jongsoo; Lesmes, Luis Andres; Lu, Zhong-Lin

    2016-01-01

    Iconic memory is best assessed with the partial report procedure in which an array of letters appears briefly on the screen and a poststimulus cue directs the observer to report the identity of the cued letter(s). Typically, 6–8 cue delays or 600–800 trials are tested to measure the iconic memory decay function. Here we develop a quick partial report, or qPR, procedure based on a Bayesian adaptive framework to estimate the iconic memory decay function with much reduced testing time. The iconic memory decay function is characterized by an exponential function and a joint probability distribution of its three parameters. Starting with a prior of the parameters, the method selects the stimulus to maximize the expected information gain in the next test trial. It then updates the posterior probability distribution of the parameters based on the observer's response using Bayesian inference. The procedure is reiterated until either the total number of trials or the precision of the parameter estimates reaches a certain criterion. Simulation studies showed that only 100 trials were necessary to reach an average absolute bias of 0.026 and a precision of 0.070 (both in terms of probability correct). A psychophysical validation experiment showed that estimates of the iconic memory decay function obtained with 100 qPR trials exhibited good precision (the half width of the 68.2% credible interval = 0.055) and excellent agreement with those obtained with 1,600 trials of the conventional method of constant stimuli procedure (RMSE = 0.063). Quick partial-report relieves the data collection burden in characterizing iconic memory and makes it possible to assess iconic memory in clinical populations. PMID:27580045

  12. Unraveling multiple changes in complex climate time series using Bayesian inference

    NASA Astrophysics Data System (ADS)

    Berner, Nadine; Trauth, Martin H.; Holschneider, Matthias

    2016-04-01

    Change points in time series are perceived as heterogeneities in the statistical or dynamical characteristics of observations. Unraveling such transitions yields essential information for the understanding of the observed system. The precise detection and basic characterization of underlying changes is therefore of particular importance in environmental sciences. We present a kernel-based Bayesian inference approach to investigate direct as well as indirect climate observations for multiple generic transition events. In order to develop a diagnostic approach designed to capture a variety of natural processes, the basic statistical features of central tendency and dispersion are used to locally approximate a complex time series by a generic transition model. A Bayesian inversion approach is developed to robustly infer the location and the generic patterns of such a transition. To systematically investigate time series for multiple changes occurring at different temporal scales, the Bayesian inversion is extended to a kernel-based inference approach. By introducing basic kernel measures, the kernel inference results are composed into a proxy probability to a posterior distribution of multiple transitions. Thus, based on a generic transition model, a probability expression is derived that is capable of indicating multiple changes within a complex time series. We discuss the method's performance by investigating direct and indirect climate observations. The approach is applied to environmental time series (about 100 a), from the weather station in Tuscaloosa, Alabama, and confirms documented instrumentation changes. Moreover, the approach is used to investigate a set of complex terrigenous dust records from the ODP sites 659, 721/722 and 967 interpreted as climate indicators of the African region of the Plio-Pleistocene period (about 5 Ma). The detailed inference unravels multiple transitions underlying the indirect climate observations coinciding with established global climate events.

  13. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sigeti, David E.; Pelak, Robert A.

    We present a Bayesian statistical methodology for identifying improvement in predictive simulations, including an analysis of the number of (presumably expensive) simulations that will need to be made in order to establish with a given level of confidence that an improvement has been observed. Our analysis assumes the ability to predict (or postdict) the same experiments with legacy and new simulation codes and uses a simple binomial model for the probability, θ, that, in an experiment chosen at random, the new code will provide a better prediction than the old. This model makes it possible to do statistical analysis with an absolute minimum of assumptions about the statistics of the quantities involved, at the price of discarding some potentially important information in the data. In particular, the analysis depends only on whether or not the new code predicts better than the old in any given experiment, and not on the magnitude of the improvement. We show how the posterior distribution for θ may be used, in a kind of Bayesian hypothesis testing, both to decide if an improvement has been observed and to quantify our confidence in that decision. We quantify the predictive probability that should be assigned, prior to taking any data, to the possibility of achieving a given level of confidence, as a function of sample size. We show how this predictive probability depends on the true value of θ and, in particular, how there will always be a region around θ = 1/2 where it is highly improbable that we will be able to identify an improvement in predictive capability, although the width of this region will shrink to zero as the sample size goes to infinity. We show how the posterior standard deviation may be used, as a kind of 'plan B metric' in the case that the analysis shows that θ is close to 1/2 and argue that such a plan B should generally be part of hypothesis testing. All the analysis presented in the paper is done with a general beta-function prior for θ, enabling sequential analysis in which a small number of new simulations may be done and the resulting posterior for θ used as a prior to inform the next stage of power analysis.
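    The binomial model with a beta prior described above admits a short worked sketch. With a Beta(a, b) prior and a record of experiments in which the new code did or did not predict better, the posterior is again a beta distribution, and the posterior probability that the new code is better more often than not can be evaluated exactly when a and b are integers. The counts, the uniform prior, and the function names below are illustrative assumptions, not values from the report.

```python
from math import comb


def posterior_params(wins, losses, prior_a=1, prior_b=1):
    """Beta posterior parameters after `wins` and `losses` of the new code."""
    return prior_a + wins, prior_b + losses


def prob_exceeds(x, a, b):
    """P(parameter > x) for a Beta(a, b) distribution with positive integer a, b.

    Uses the identity 1 - I_x(a, b) = P(Binomial(a + b - 1, x) <= a - 1),
    where I_x is the regularized incomplete beta function.
    """
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a))


# Hypothetical first batch: the new code predicts better in 8 of 10 experiments.
a1, b1 = posterior_params(wins=8, losses=2)
print("P(new code better more often than not):", prob_exceeds(0.5, a1, b1))

# Sequential analysis: the first posterior serves as the prior for a second batch.
a2, b2 = posterior_params(wins=3, losses=1, prior_a=a1, prior_b=b1)
print("after second batch:", prob_exceeds(0.5, a2, b2))
```

    Only the win or loss in each experiment enters the calculation, which mirrors the abstract's point that the magnitude of any improvement is discarded.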

  14. Linking big models to big data: efficient ecosystem model calibration through Bayesian model emulation

    NASA Astrophysics Data System (ADS)

    Fer, I.; Kelly, R.; Andrews, T.; Dietze, M.; Richardson, A. D.

    2016-12-01

    Our ability to forecast ecosystems is limited by how well we parameterize ecosystem models. Direct measurements for all model parameters are not always possible and inverse estimation of these parameters through Bayesian methods is computationally costly. A solution to computational challenges of Bayesian calibration is to approximate the posterior probability surface using a Gaussian Process that emulates the complex process-based model. Here we report the integration of this method within an ecoinformatics toolbox, Predictive Ecosystem Analyzer (PEcAn), and its application with two ecosystem models: SIPNET and ED2.1. SIPNET is a simple model, allowing application of MCMC methods both to the model itself and to its emulator. We used both approaches to assimilate flux (CO2 and latent heat), soil respiration, and soil carbon data from Bartlett Experimental Forest. This comparison showed that the emulator is reliable in terms of convergence to the posterior distribution. A 10000-iteration MCMC analysis with SIPNET itself required more than two orders of magnitude greater computation time than an MCMC run of the same length with its emulator. This difference would be greater for a more computationally demanding model. Validation of the emulator-calibrated SIPNET against both the assimilated data and out-of-sample data showed improved fit and reduced uncertainty around model predictions. We next applied the validated emulator method to ED2, whose complexity precludes standard Bayesian data assimilation. We used the ED2 emulator to assimilate demographic data from a network of inventory plots. For validation of the calibrated ED2, we compared the model to results from Empirical Succession Mapping (ESM), a novel synthesis of successional patterns in Forest Inventory and Analysis data. Our results revealed that while the pre-assimilation ED2 formulation cannot capture the emergent demographic patterns from ESM analysis, constrained model parameters controlling demographic processes increased their agreement considerably.

  15. A Comparison of a Bayesian and a Maximum Likelihood Tailored Testing Procedure.

    ERIC Educational Resources Information Center

    McKinley, Robert L.; Reckase, Mark D.

    A study was conducted to compare tailored testing procedures based on a Bayesian ability estimation technique and on a maximum likelihood ability estimation technique. The Bayesian tailored testing procedure selected items so as to minimize the posterior variance of the ability estimate distribution, while the maximum likelihood tailored testing…

  16. An empirical investigation into the role of subjective prior probability in searching for potentially missing items

    PubMed Central

    Fanshawe, T. R.

    2015-01-01

    There are many examples from the scientific literature of visual search tasks in which the length, scope and success rate of the search have been shown to vary according to the searcher's expectations of whether the search target is likely to be present. This phenomenon has major practical implications, for instance in cancer screening, when the prevalence of the condition is low and the consequences of a missed disease diagnosis are severe. We consider this problem from an empirical Bayesian perspective to explain how the effect of a low prior probability, subjectively assessed by the searcher, might impact on the extent of the search. We show how the searcher's posterior probability that the target is present depends on the prior probability and the proportion of possible target locations already searched, and also consider the implications of imperfect search, when the probability of false-positive and false-negative decisions is non-zero. The theoretical results are applied to two studies of radiologists' visual assessment of pulmonary lesions on chest radiographs. Further application areas in diagnostic medicine and airport security are also discussed. PMID:26587267
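    The dependence of the posterior probability on the prior and on the proportion of locations already searched follows directly from Bayes' theorem. The sketch below assumes that the target, if present, is equally likely to be at any location, that there are no false-positive detections, and that a target in a searched location is overlooked with a fixed probability. These simplifications, the parameter names, and the example numbers are assumptions for illustration and are not taken from the paper.

```python
def posterior_present(prior, searched_fraction, miss_prob=0.0):
    """P(target present | nothing found so far).

    prior:             P(target present) before searching (must be below 1)
    searched_fraction: proportion of equally likely locations already searched
    miss_prob:         P(overlooking the target when its location is searched)
    """
    # Given that the target is present, it stays unfound if it lies in the
    # unsearched part, or in the searched part but was overlooked.
    not_found_given_present = (1.0 - searched_fraction) + searched_fraction * miss_prob
    numerator = prior * not_found_given_present
    # Given that the target is absent, nothing is found with certainty.
    return numerator / (numerator + (1.0 - prior))


# Hypothetical low-prevalence search, perfect versus imperfect detection.
for fraction in (0.0, 0.5, 0.9):
    perfect = posterior_present(0.05, fraction)
    imperfect = posterior_present(0.05, fraction, miss_prob=0.2)
    print(fraction, round(perfect, 4), round(imperfect, 4))
```

    With a low prior, the posterior drops quickly as the search proceeds, which is consistent with the abstract's concern that low expected prevalence may shorten the search.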

  17. Relevance Vector Machine Learning for Neonate Pain Intensity Assessment Using Digital Imaging

    PubMed Central

    Gholami, Behnood; Tannenbaum, Allen R.

    2011-01-01

    Pain assessment in patients who are unable to verbally communicate is a challenging problem. The fundamental limitations in pain assessment in neonates stem from subjective assessment criteria, rather than quantifiable and measurable data. This often results in poor quality and inconsistent treatment of patient pain management. Recent advancements in pattern recognition techniques using relevance vector machine (RVM) learning techniques can assist medical staff in assessing pain by constantly monitoring the patient and providing the clinician with quantifiable data for pain management. The RVM classification technique is a Bayesian extension of the support vector machine (SVM) algorithm, which achieves comparable performance to SVM while providing posterior probabilities for class memberships and a sparser model. If classes represent “pure” facial expressions (i.e., extreme expressions that an observer can identify with a high degree of confidence), then the posterior probability of the membership of some intermediate facial expression to a class can provide an estimate of the intensity of such an expression. In this paper, we use the RVM classification technique to distinguish pain from nonpain in neonates as well as assess their pain intensity levels. We also correlate our results with the pain intensity assessed by expert and nonexpert human examiners. PMID:20172803

  18. On the Empirical Importance of the Conditional Skewness Assumption in Modelling the Relationship between Risk and Return

    NASA Astrophysics Data System (ADS)

    Pipień, M.

    2008-09-01

    We present the results of an application of Bayesian inference in testing the relation between risk and return on financial instruments. On the basis of the Intertemporal Capital Asset Pricing Model proposed by Merton, we built a general sampling distribution suitable for analysing this relationship. The most important feature of our assumptions is that the skewness of the conditional distribution of returns is used as an alternative source of relation between risk and return. This general specification relates to the Skewed Generalized Autoregressive Conditionally Heteroscedastic-in-Mean model. In order to make the conditional distribution of financial returns skewed, we considered the unified approach based on the inverse probability integral transformation. In particular, we applied the hidden truncation mechanism, inverse scale factors, the order statistics concept, Beta and Bernstein distribution transformations and also a constructive method. Based on the daily excess returns on the Warsaw Stock Exchange Index, we checked the empirical importance of the conditional skewness assumption on the relation between risk and return on the Warsaw Stock Market. We present posterior probabilities of all competing specifications as well as the posterior analysis of the positive sign of the tested relationship.

  19. Model-based Bayesian inference for ROC data analysis

    NASA Astrophysics Data System (ADS)

    Lei, Tianhu; Bae, K. Ty

    2013-03-01

    This paper presents a study of model-based Bayesian inference for Receiver Operating Characteristic (ROC) data. The model is a simple version of the general non-linear regression model. Unlike the Dorfman model, it uses a probit link function with a binary (zero-one) covariate to express binormal distributions in a single formula. The model also includes a scale parameter. Bayesian inference is implemented by the Markov Chain Monte Carlo (MCMC) method carried out by Bayesian inference Using Gibbs Sampling (BUGS). In contrast to classical statistical theory, the Bayesian approach considers model parameters as random variables characterized by prior distributions. With a substantial number of simulated samples generated by the sampling algorithm, posterior distributions of parameters as well as the parameters themselves can be accurately estimated. MCMC-based BUGS adopts the Adaptive Rejection Sampling (ARS) protocol, which requires that the probability density function (pdf) from which samples are drawn be log concave with respect to the targeted parameters. Our study corrects a common misconception and proves that the pdf of this regression model is log concave with respect to its scale parameter. Therefore, ARS's requirement is satisfied and a Gaussian prior, which is conjugate and possesses many analytic and computational advantages, is assigned to the scale parameter. A cohort of 20 simulated data sets and 20 simulations from each data set are used in our study. Output analysis and convergence diagnostics for the MCMC method are assessed with the CODA package. Models and methods using a continuous Gaussian prior and a discrete categorical prior are compared. Intensive simulations and performance measures are given to illustrate our practice in the framework of model-based Bayesian inference using the MCMC method.

  20. A Bayesian hierarchical model for mortality data from cluster-sampling household surveys in humanitarian crises.

    PubMed

    Heudtlass, Peter; Guha-Sapir, Debarati; Speybroeck, Niko

    2018-05-31

    The crude death rate (CDR) is one of the defining indicators of humanitarian emergencies. When data from vital registration systems are not available, it is common practice to estimate the CDR from household surveys with cluster-sampling design. However, sample sizes are often too small to compare mortality estimates to emergency thresholds, at least in a frequentist framework. Several authors have proposed Bayesian methods for health surveys in humanitarian crises. Here, we develop an approach specifically for mortality data and cluster-sampling surveys. We describe a Bayesian hierarchical Poisson-Gamma mixture model with generic (weakly informative) priors that could be used as default in the absence of any specific prior knowledge, and compare Bayesian and frequentist CDR estimates using five different mortality datasets. We provide an interpretation of the Bayesian estimates in the context of an emergency threshold and demonstrate how to interpret parameters at the cluster level and ways in which informative priors can be introduced. With the same set of weakly informative priors, Bayesian CDR estimates are equivalent to frequentist estimates, for all practical purposes. The probability that the CDR surpasses the emergency threshold can be derived directly from the posterior of the mean of the mixing distribution. All observations in the datasets contribute to the estimation of cluster-level estimates, through the hierarchical structure of the model. In a context of sparse data, Bayesian mortality assessments have advantages over frequentist ones already when using only weakly informative priors. More informative priors offer a formal and transparent way of combining new data with existing data and expert knowledge and can help to improve decision-making in humanitarian crises by complementing frequentist estimates.
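    The statement that the probability of exceeding an emergency threshold can be read off the posterior can be illustrated with a much simpler, non-hierarchical stand-in for the paper's model: a single Gamma prior on the death rate with Poisson-distributed death counts. The single-level model, the Gamma(1, 1) prior, the survey counts, and the threshold of 1 death per 10,000 persons per day used below are all assumptions for illustration and are not taken from the abstract.

```python
import math


def gamma_poisson_update(prior_shape, prior_rate, deaths, exposure):
    """Conjugate update: Gamma prior on the death rate, Poisson death counts."""
    return prior_shape + deaths, prior_rate + exposure


def prob_rate_exceeds(threshold, shape, rate):
    """P(death rate > threshold) under a Gamma(shape, rate) posterior.

    Valid for positive integer shape only, where the Gamma survival function
    equals the Poisson CDF: P(Poisson(rate * threshold) <= shape - 1).
    """
    lam = rate * threshold
    term = math.exp(-lam)  # Poisson probability of zero events
    total = 0.0
    for j in range(shape):
        total += term
        term *= lam / (j + 1)  # advance to the Poisson probability of j + 1 events
    return total


# Hypothetical survey: 30 deaths over 200,000 person-days of recall, with
# exposure expressed in units of 10,000 person-days so the rate is per 10,000/day.
shape, rate = gamma_poisson_update(prior_shape=1, prior_rate=1.0, deaths=30, exposure=20.0)

# Assumed emergency threshold of 1 death per 10,000 persons per day.
print("P(CDR > threshold):", prob_rate_exceeds(1.0, shape, rate))
```

    Unlike a frequentist comparison of a point estimate with the threshold, the result is a direct probability statement about the rate, which is the interpretation the abstract emphasizes.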

  1. Star Cluster Properties in Two LEGUS Galaxies Computed with Stochastic Stellar Population Synthesis Models

    NASA Astrophysics Data System (ADS)

    Krumholz, Mark R.; Adamo, Angela; Fumagalli, Michele; Wofford, Aida; Calzetti, Daniela; Lee, Janice C.; Whitmore, Bradley C.; Bright, Stacey N.; Grasha, Kathryn; Gouliermis, Dimitrios A.; Kim, Hwihyun; Nair, Preethi; Ryon, Jenna E.; Smith, Linda J.; Thilker, David; Ubeda, Leonardo; Zackrisson, Erik

    2015-10-01

    We investigate a novel Bayesian analysis method, based on the Stochastically Lighting Up Galaxies (slug) code, to derive the masses, ages, and extinctions of star clusters from integrated light photometry. Unlike many analysis methods, slug correctly accounts for incomplete initial mass function (IMF) sampling, and returns full posterior probability distributions rather than simply probability maxima. We apply our technique to 621 visually confirmed clusters in two nearby galaxies, NGC 628 and NGC 7793, that are part of the Legacy Extragalactic UV Survey (LEGUS). LEGUS provides Hubble Space Telescope photometry in the NUV, U, B, V, and I bands. We analyze the sensitivity of the derived cluster properties to choices of prior probability distribution, evolutionary tracks, IMF, metallicity, treatment of nebular emission, and extinction curve. We find that slug's results for individual clusters are insensitive to most of these choices, but that the posterior probability distributions we derive are often quite broad, and sometimes multi-peaked and quite sensitive to the choice of priors. In contrast, the properties of the cluster population as a whole are relatively robust against all of these choices. We also compare our results from slug to those derived with a conventional non-stochastic fitting code, Yggdrasil. We show that slug's stochastic models are generally a better fit to the observations than the deterministic ones used by Yggdrasil. However, the overall properties of the cluster populations recovered by both codes are qualitatively similar.

  2. Effect of supersaturated oxygen delivery on infarct size after percutaneous coronary intervention in acute myocardial infarction.

    PubMed

    Stone, Gregg W; Martin, Jack L; de Boer, Menko-Jan; Margheri, Massimo; Bramucci, Ezio; Blankenship, James C; Metzger, D Christopher; Gibbons, Raymond J; Lindsay, Barbara S; Weiner, Bonnie H; Lansky, Alexandra J; Krucoff, Mitchell W; Fahy, Martin; Boscardin, W John

    2009-10-01

    Myocardial salvage is often suboptimal after percutaneous coronary intervention in ST-segment elevation myocardial infarction. Posthoc subgroup analysis from a previous trial (AMIHOT I) suggested that intracoronary delivery of supersaturated oxygen (SSO(2)) may reduce infarct size in patients with large ST-segment elevation myocardial infarction treated early. A prospective, multicenter trial was performed in which 301 patients with anterior ST-segment elevation myocardial infarction undergoing percutaneous coronary intervention within 6 hours of symptom onset were randomized to a 90-minute intracoronary SSO(2) infusion in the left anterior descending artery infarct territory (n=222) or control (n=79). The primary efficacy measure was infarct size in the intention-to-treat population (powered for superiority), and the primary safety measure was composite major adverse cardiovascular events at 30 days in the intention-to-treat and per-protocol populations (powered for noninferiority), with Bayesian hierarchical modeling used to allow partial pooling of evidence from AMIHOT I. Among 281 randomized patients with tc-99m-sestamibi single-photon emission computed tomography data in AMIHOT II, median (interquartile range) infarct size was 26.5% (8.5%, 44%) with control compared with 20% (6%, 37%) after SSO(2). The pooled adjusted infarct size was 25% (7%, 42%) with control compared with 18.5% (3.5%, 34.5%) after SSO(2) (P(Wilcoxon)=0.02; Bayesian posterior probability of superiority, 96.9%). The Bayesian pooled 30-day mean (+/-SE) rates of major adverse cardiovascular events were 5.0+/-1.4% for control and 5.9+/-1.4% for SSO(2) by intention-to-treat, and 5.1+/-1.5% for control and 4.7+/-1.5% for SSO(2) by per-protocol analysis (posterior probability of noninferiority, 99.5% and 99.9%, respectively). 
    Among patients with anterior ST-segment elevation myocardial infarction undergoing percutaneous coronary intervention within 6 hours of symptom onset, infusion of SSO(2) into the left anterior descending artery infarct territory results in a significant reduction in infarct size with noninferior rates of major adverse cardiovascular events at 30 days. Clinical Trial Registration- clinicaltrials.gov Identifier: NCT00175058.

  3. Bayesian ensemble refinement by replica simulations and reweighting.

    PubMed

    Hummer, Gerhard; Köfinger, Jürgen

    2015-12-28

    We describe different Bayesian ensemble refinement methods, examine their interrelation, and discuss their practical application. With ensemble refinement, the properties of dynamic and partially disordered (bio)molecular structures can be characterized by integrating a wide range of experimental data, including measurements of ensemble-averaged observables. We start from a Bayesian formulation in which the posterior is a functional that ranks different configuration space distributions. By maximizing this posterior, we derive an optimal Bayesian ensemble distribution. For discrete configurations, this optimal distribution is identical to that obtained by the maximum entropy "ensemble refinement of SAXS" (EROS) formulation. Bayesian replica ensemble refinement enhances the sampling of relevant configurations by imposing restraints on averages of observables in coupled replica molecular dynamics simulations. We show that the strength of the restraints should scale linearly with the number of replicas to ensure convergence to the optimal Bayesian result in the limit of infinitely many replicas. In the "Bayesian inference of ensembles" method, we combine the replica and EROS approaches to accelerate the convergence. An adaptive algorithm can be used to sample directly from the optimal ensemble, without replicas. We discuss the incorporation of single-molecule measurements and dynamic observables such as relaxation parameters. The theoretical analysis of different Bayesian ensemble refinement approaches provides a basis for practical applications and a starting point for further investigations.

  4. Bayesian ensemble refinement by replica simulations and reweighting

    NASA Astrophysics Data System (ADS)

    Hummer, Gerhard; Köfinger, Jürgen

    2015-12-01

    We describe different Bayesian ensemble refinement methods, examine their interrelation, and discuss their practical application. With ensemble refinement, the properties of dynamic and partially disordered (bio)molecular structures can be characterized by integrating a wide range of experimental data, including measurements of ensemble-averaged observables. We start from a Bayesian formulation in which the posterior is a functional that ranks different configuration space distributions. By maximizing this posterior, we derive an optimal Bayesian ensemble distribution. For discrete configurations, this optimal distribution is identical to that obtained by the maximum entropy "ensemble refinement of SAXS" (EROS) formulation. Bayesian replica ensemble refinement enhances the sampling of relevant configurations by imposing restraints on averages of observables in coupled replica molecular dynamics simulations. We show that the strength of the restraints should scale linearly with the number of replicas to ensure convergence to the optimal Bayesian result in the limit of infinitely many replicas. In the "Bayesian inference of ensembles" method, we combine the replica and EROS approaches to accelerate the convergence. An adaptive algorithm can be used to sample directly from the optimal ensemble, without replicas. We discuss the incorporation of single-molecule measurements and dynamic observables such as relaxation parameters. The theoretical analysis of different Bayesian ensemble refinement approaches provides a basis for practical applications and a starting point for further investigations.

  5. Hierarchical Bayesian Spatio–Temporal Analysis of Climatic and Socio–Economic Determinants of Rocky Mountain Spotted Fever

    PubMed Central

    Raghavan, Ram K.; Goodin, Douglas G.; Neises, Daniel; Anderson, Gary A.; Ganta, Roman R.

    2016-01-01

    This study aims to examine the spatio-temporal dynamics of Rocky Mountain spotted fever (RMSF) prevalence in four contiguous states of the Midwestern United States, and to determine the impact of environmental and socio-economic factors associated with this disease. Bayesian hierarchical models were used to quantify space-only and time-only trends and the spatio-temporal interaction effect in the case reports submitted to the state health departments in the region. Various socio-economic, environmental and climatic covariates screened a priori in a bivariate procedure were added to a main-effects Bayesian model in progressive steps to evaluate important drivers of RMSF space-time patterns in the region. Our results show a steady increase in RMSF incidence over the study period to newer geographic areas, and the posterior probabilities of county-specific trends indicate clustering of high risk counties in the central and southern parts of the study region. At the spatial scale of a county, the prevalence of RMSF is influenced by poverty status, average relative humidity, and average land surface temperature (>35°C) in the region, and the relevance of these factors in the context of climate-change impacts on tick-borne diseases is discussed. PMID:26942604

  6. Hierarchical Bayesian Spatio-Temporal Analysis of Climatic and Socio-Economic Determinants of Rocky Mountain Spotted Fever.

    PubMed

    Raghavan, Ram K; Goodin, Douglas G; Neises, Daniel; Anderson, Gary A; Ganta, Roman R

    2016-01-01

    This study aims to examine the spatio-temporal dynamics of Rocky Mountain spotted fever (RMSF) prevalence in four contiguous states of the Midwestern United States, and to determine the impact of environmental and socio-economic factors associated with this disease. Bayesian hierarchical models were used to quantify space-only and time-only trends and the spatio-temporal interaction effect in the case reports submitted to the state health departments in the region. Various socio-economic, environmental and climatic covariates screened a priori in a bivariate procedure were added to a main-effects Bayesian model in progressive steps to evaluate important drivers of RMSF space-time patterns in the region. Our results show a steady increase in RMSF incidence over the study period to newer geographic areas, and the posterior probabilities of county-specific trends indicate clustering of high risk counties in the central and southern parts of the study region. At the spatial scale of a county, the prevalence of RMSF is influenced by poverty status, average relative humidity, and average land surface temperature (>35°C) in the region, and the relevance of these factors in the context of climate-change impacts on tick-borne diseases is discussed.

  7. Quantitative trait nucleotide analysis using Bayesian model selection.

    PubMed

    Blangero, John; Goring, Harald H H; Kent, Jack W; Williams, Jeff T; Peterson, Charles P; Almasy, Laura; Dyer, Thomas D

    2005-10-01

    Although much attention has been given to statistical genetic methods for the initial localization and fine mapping of quantitative trait loci (QTLs), little methodological work has been done to date on the problem of statistically identifying the most likely functional polymorphisms using sequence data. In this paper we provide a general statistical genetic framework, called Bayesian quantitative trait nucleotide (BQTN) analysis, for assessing the likely functional status of genetic variants. The approach requires the initial enumeration of all genetic variants in a set of resequenced individuals. These polymorphisms are then typed in a large number of individuals (potentially in families), and marker variation is related to quantitative phenotypic variation using Bayesian model selection and averaging. For each sequence variant a posterior probability of effect is obtained and can be used to prioritize additional molecular functional experiments. An example of this quantitative nucleotide analysis is provided using the GAW12 simulated data. The results show that the BQTN method may be useful for choosing the most likely functional variants within a gene (or set of genes). We also include instructions on how to use our computer program, SOLAR, for association analysis and BQTN analysis.

  8. The DNA database search controversy revisited: bridging the Bayesian-frequentist gap.

    PubMed

    Storvik, Geir; Egeland, Thore

    2007-09-01

    Two different quantities have been suggested for quantification of evidence in cases where a suspect is found by a search through a database of DNA profiles. The likelihood ratio, typically motivated from a Bayesian setting, is preferred by most experts in the field. The so-called np rule has been motivated through frequentist arguments and has been suggested by the American National Research Council and Stockmarr (1999, Biometrics 55, 671-677). The two quantities differ substantially and have given rise to the DNA database search controversy. Although several authors have criticized the different approaches, a full explanation of why these differences appear is still lacking. In this article we show that a P-value in a frequentist hypothesis setting is approximately equal to the result of the np rule. We argue, however, that a more reasonable procedure in this case is to use conditional testing, in which case a P-value directly related to posterior probabilities and the likelihood ratio is obtained. This way of viewing the problem bridges the gap between the Bayesian and frequentist approaches. At the same time it indicates that the np rule should not be used to quantify evidence.

  9. BASiCS: Bayesian Analysis of Single-Cell Sequencing Data

    PubMed Central

    Vallejos, Catalina A.; Marioni, John C.; Richardson, Sylvia

    2015-01-01

    Single-cell mRNA sequencing can uncover novel cell-to-cell heterogeneity in gene expression levels in seemingly homogeneous populations of cells. However, these experiments are prone to high levels of unexplained technical noise, creating new challenges for identifying genes that show genuine heterogeneous expression within the population of cells under study. BASiCS (Bayesian Analysis of Single-Cell Sequencing data) is an integrated Bayesian hierarchical model where: (i) cell-specific normalisation constants are estimated as part of the model parameters, (ii) technical variability is quantified based on spike-in genes that are artificially introduced to each analysed cell’s lysate and (iii) the total variability of the expression counts is decomposed into technical and biological components. BASiCS also provides an intuitive detection criterion for highly (or lowly) variable genes within the population of cells under study. This is formalised by means of tail posterior probabilities associated to high (or low) biological cell-to-cell variance contributions, quantities that can be easily interpreted by users. We demonstrate our method using gene expression measurements from mouse Embryonic Stem Cells. Cross-validation and meaningful enrichment of gene ontology categories within genes classified as highly (or lowly) variable supports the efficacy of our approach. PMID:26107944

  10. BASiCS: Bayesian Analysis of Single-Cell Sequencing Data.

    PubMed

    Vallejos, Catalina A; Marioni, John C; Richardson, Sylvia

    2015-06-01

    Single-cell mRNA sequencing can uncover novel cell-to-cell heterogeneity in gene expression levels in seemingly homogeneous populations of cells. However, these experiments are prone to high levels of unexplained technical noise, creating new challenges for identifying genes that show genuine heterogeneous expression within the population of cells under study. BASiCS (Bayesian Analysis of Single-Cell Sequencing data) is an integrated Bayesian hierarchical model where: (i) cell-specific normalisation constants are estimated as part of the model parameters, (ii) technical variability is quantified based on spike-in genes that are artificially introduced to each analysed cell's lysate and (iii) the total variability of the expression counts is decomposed into technical and biological components. BASiCS also provides an intuitive detection criterion for highly (or lowly) variable genes within the population of cells under study. This is formalised by means of tail posterior probabilities associated to high (or low) biological cell-to-cell variance contributions, quantities that can be easily interpreted by users. We demonstrate our method using gene expression measurements from mouse Embryonic Stem Cells. Cross-validation and meaningful enrichment of gene ontology categories within genes classified as highly (or lowly) variable supports the efficacy of our approach.

  11. Bayesian approach to estimate AUC, partition coefficient and drug targeting index for studies with serial sacrifice design.

    PubMed

    Wang, Tianli; Baron, Kyle; Zhong, Wei; Brundage, Richard; Elmquist, William

    2014-03-01

    The current study presents a Bayesian approach to non-compartmental analysis (NCA), which provides accurate and precise estimates of AUC(0-∞) and of any AUC(0-∞)-based NCA parameter or derived quantity. In order to assess the performance of the proposed method, 1,000 simulated datasets were generated in different scenarios. A Bayesian method was used to estimate the tissue and plasma AUC(0-∞) values and the tissue-to-plasma AUC(0-∞) ratio. The posterior medians and the coverage of 95% credible intervals for the true parameter values were examined. The method was applied to laboratory data from a mouse brain distribution study with serial sacrifice design for illustration. The Bayesian NCA approach is accurate and precise in point estimation of AUC(0-∞) and the partition coefficient under a serial sacrifice design. It also provides a consistently good variance estimate, even considering the variability of the data and the physiological structure of the pharmacokinetic model. The application in the case study obtained a physiologically reasonable posterior distribution of AUC, with a posterior median close to the value estimated by classic Bailer-type methods. This Bayesian NCA approach for sparse data analysis provides statistical inference on the variability of AUC(0-∞)-based parameters such as partition coefficient and drug targeting index, so that the comparison of these parameters following destructive sampling becomes statistically feasible.

  12. Using Bayesian neural networks to classify forest scenes

    NASA Astrophysics Data System (ADS)

    Vehtari, Aki; Heikkonen, Jukka; Lampinen, Jouko; Juujarvi, Jouni

    1998-10-01

    We present results that compare the performance of Bayesian learning methods for neural networks on the task of classifying forest scenes into trees and background. The classification task is demanding due to the texture richness of the trees, occlusions of the forest scene objects and diverse lighting conditions under operation. This makes it difficult to determine which image features are optimal for the classification. A natural way to proceed is to extract many different types of potentially suitable features, and to evaluate their usefulness in later processing stages. One approach to cope with a large number of features is to use Bayesian methods to control the model complexity. Bayesian learning uses a prior on model parameters, combines this with evidence from training data, and then integrates over the resulting posterior to make predictions. With this method, we can use large networks and many features without fear of overfitting. For this classification task we compare two Bayesian learning methods for multi-layer perceptron (MLP) neural networks: (1) The evidence framework of MacKay uses a Gaussian approximation to the posterior weight distribution and maximizes with respect to hyperparameters. (2) In a Markov Chain Monte Carlo (MCMC) method due to Neal, the posterior distribution of the network parameters is numerically integrated using the MCMC method. As baseline classifiers for comparison we use (3) MLP early stop committee, (4) K-nearest-neighbor and (5) Classification And Regression Tree.

  13. General Metropolis-Hastings jump diffusions for automatic target recognition in infrared scenes

    NASA Astrophysics Data System (ADS)

    Lanterman, Aaron D.; Miller, Michael I.; Snyder, Donald L.

    1997-04-01

    To locate and recognize ground-based targets in forward-looking IR (FLIR) images, 3D faceted models with associated pose parameters are formulated to accommodate the variability found in FLIR imagery. Taking a Bayesian approach, scenes are simulated from the emissive characteristics of the CAD models and compared with the collected data by a likelihood function based on sensor statistics. This likelihood is combined with a prior distribution defined over the set of possible scenes to form a posterior distribution. To accommodate scenes with variable numbers of targets, the posterior distribution is defined over parameter vectors of varying dimension. An inference algorithm based on Metropolis-Hastings jump-diffusion processes empirically samples from the posterior distribution, generating configurations of templates and transformations that match the collected sensor data with high probability. The jumps accommodate the addition and deletion of targets and the estimation of target identities; diffusions refine the hypotheses by drifting along the gradient of the posterior distribution with respect to the orientation and position parameters. Previous results on jump strategies analogous to the Metropolis acceptance/rejection algorithm, with proposals drawn from the prior and accepted based on the likelihood, are extended to encompass general Metropolis-Hastings proposal densities. In particular, the algorithm proposes moves by drawing from the posterior distribution over computationally tractable subsets of the parameter space. The algorithm is illustrated by an implementation on a Silicon Graphics Onyx/Reality Engine.
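
    The acceptance rule mentioned in the abstract (proposals drawn from the prior, accepted based on the likelihood) can be sketched on a toy one-dimensional problem. This is only a fixed-dimension independence sampler, not the jump-diffusion algorithm of the paper; the function name and the Gaussian toy model are assumptions made for illustration.

```python
import math
import random


def prior_proposal_metropolis(log_likelihood, sample_prior, n_steps, seed=0):
    """Metropolis-Hastings sampler whose proposals are drawn from the prior.

    Because the proposal density equals the prior, the prior terms cancel
    in the Metropolis-Hastings ratio and only the likelihood ratio remains.
    """
    rng = random.Random(seed)
    current = sample_prior(rng)
    current_ll = log_likelihood(current)
    chain = []
    for _ in range(n_steps):
        proposal = sample_prior(rng)
        proposal_ll = log_likelihood(proposal)
        # Accept with probability min(1, L(proposal) / L(current)).
        accept_prob = math.exp(min(0.0, proposal_ll - current_ll))
        if rng.random() < accept_prob:
            current, current_ll = proposal, proposal_ll
        chain.append(current)
    return chain


# Toy model (hypothetical): theta ~ N(0, 1), observation y ~ N(theta, 1).
observed = 1.0
chain = prior_proposal_metropolis(
    log_likelihood=lambda theta: -0.5 * (observed - theta) ** 2,
    sample_prior=lambda rng: rng.gauss(0.0, 1.0),
    n_steps=20000,
)
posterior_mean = sum(chain) / len(chain)
print(f"estimated posterior mean: {posterior_mean:.3f}")
```

    For this toy model the exact posterior is Gaussian with mean 0.5 and variance 0.5, which gives a simple check on the sampler. Prior proposals become inefficient when the likelihood is sharply peaked relative to the prior, which is one motivation for more general proposal densities.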

  14. A model to systematically employ professional judgment in the Bayesian Decision Analysis for a semiconductor industry exposure assessment.

    PubMed

    Torres, Craig; Jones, Rachael; Boelter, Fred; Poole, James; Dell, Linda; Harper, Paul

    2014-01-01

    Bayesian Decision Analysis (BDA) uses Bayesian statistics to integrate multiple types of exposure information and classify exposures within the exposure rating categorization scheme promoted in American Industrial Hygiene Association (AIHA) publications. Prior distributions for BDA may be developed from existing monitoring data, mathematical models, or professional judgment. Professional judgments may misclassify exposures. We suggest that a structured qualitative risk assessment (QLRA) method can provide consistency and transparency in professional judgments. In this analysis, we use a structured QLRA method to define prior distributions (priors) for BDA. We applied this approach at three semiconductor facilities in South Korea, and present an evaluation of the performance of structured QLRA for determination of priors, and an evaluation of occupational exposures using BDA. Specifically, the structured QLRA was applied to chemical agents in similar exposure groups to identify provisional risk ratings. Standard priors were developed for each risk rating before review of historical monitoring data. Newly collected monitoring data were used to update priors informed by QLRA or historical monitoring data, and determine the posterior distribution. Exposure ratings were defined by the rating category with the highest probability--i.e., the most likely. We found the most likely exposure rating in the QLRA-informed priors to be consistent with historical and newly collected monitoring data, and the posterior exposure ratings developed with QLRA-informed priors to be equal to or greater than those developed with data-informed priors in 94% of comparisons. Overall, exposures at these facilities are consistent with well-controlled work environments. That is, the 95th percentile of exposure distributions are ≤50% of the occupational exposure limit (OEL) for all chemical-SEG combinations evaluated; and are ≤10% of the limit for 94% of chemical-SEG combinations evaluated.

  15. Survival Bayesian Estimation of Exponential-Gamma Under Linex Loss Function

    NASA Astrophysics Data System (ADS)

    Rizki, S. W.; Mara, M. N.; Sulistianingsih, E.

    2017-06-01

    This paper reports a study of cancer patients after treatment, with censored data, using Bayesian estimation under the Linex loss function for a survival model assumed to follow an exponential distribution. Combining a Gamma prior with the likelihood function produces a Gamma posterior distribution. The posterior distribution is used to find the estimator $\hat{\lambda}_{BL}$ by using the Linex approximation. After obtaining $\hat{\lambda}_{BL}$, the estimators of the hazard function $\hat{h}_{BL}$ and the survival function $\hat{S}_{BL}$ can be found. Finally, we compare the results of Maximum Likelihood Estimation (MLE) and the Linex approximation to find the better method for these data, as judged by the smaller MSE. The results show that the MSEs of the hazard and survival functions are 2.91728E-07 and 0.000309004 under MLE, and 2.8727E-07 and 0.000304131 under Bayesian Linex, respectively. We conclude that Bayesian Linex is better than MLE.
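
    For readers unfamiliar with Linex loss, the following sketch shows the standard closed-form Bayes estimator of an exponential rate under a Gamma posterior. It assumes the loss L(d) = exp(a*d) - a*d - 1 with d the estimation error, a Gamma prior parameterised by shape and rate, and right-censored data summarised by the number of observed events and the total follow-up time. The function name, prior values, and example numbers are hypothetical and are not taken from the paper.

```python
import math


def linex_rate_estimate(events, total_time, a,
                        prior_shape=1.0, prior_rate=1.0):
    """Bayes estimate of an exponential rate under Linex loss.

    Posterior: Gamma(shape = prior_shape + events,
                     rate  = prior_rate + total_time).
    Linex Bayes estimator: -(1/a) * log E[exp(-a * lambda)],
    which for a Gamma posterior equals (shape / a) * log(1 + a / rate).
    """
    if a == 0:
        raise ValueError("a must be non-zero; a -> 0 gives the posterior mean")
    shape = prior_shape + events
    rate = prior_rate + total_time
    if a <= -rate:
        raise ValueError("the posterior expectation is infinite for a <= -rate")
    return (shape / a) * math.log1p(a / rate)


# Hypothetical data: 10 observed events over 100 units of follow-up time.
estimate = linex_rate_estimate(events=10, total_time=100.0, a=1.0)
posterior_mean = (1.0 + 10) / (1.0 + 100.0)
print(f"Linex estimate: {estimate:.6f}, posterior mean: {posterior_mean:.6f}")
```

    With a > 0 the loss penalises overestimation more heavily, so the estimate falls slightly below the posterior mean; as a approaches zero the two coincide. For an exponential model the hazard is constant and equal to the rate.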

  16. A Bayesian approach to tracking patients having changing pharmacokinetic parameters

    NASA Technical Reports Server (NTRS)

    Bayard, David S.; Jelliffe, Roger W.

    2004-01-01

    This paper considers the updating of Bayesian posterior densities for pharmacokinetic models associated with patients having changing parameter values. For estimation purposes it is proposed to use the Interacting Multiple Model (IMM) estimation algorithm, which is currently a popular algorithm in the aerospace community for tracking maneuvering targets. The IMM algorithm is described, and compared to the multiple model (MM) and Maximum A-Posteriori (MAP) Bayesian estimation methods, which are presently used for posterior updating when pharmacokinetic parameters do not change. Both the MM and MAP Bayesian estimation methods are used in their sequential forms, to facilitate tracking of changing parameters. Results indicate that the IMM algorithm is well suited for tracking time-varying pharmacokinetic parameters in acutely ill and unstable patients, incurring only about half of the integrated error compared to the sequential MM and MAP methods on the same example.

  17. Seismic imaging of Q structures by a trans-dimensional coda-wave analysis

    NASA Astrophysics Data System (ADS)

    Takahashi, Tsutomu

    2017-04-01

    Wave scattering and intrinsic attenuation are important processes for describing the incoherent and complex wave trains of high-frequency seismic waves (>1 Hz). The multiple lapse time window analysis (MLTWA) has been used to estimate scattering and intrinsic Q values by assuming constant Q in a study area (e.g., Hoshiba 1993). This study generalizes MLTWA to estimate lateral variations of Q values under the Bayesian framework in a variable-dimension space. The study area is partitioned into small areas by means of Voronoi tessellation. Scattering and intrinsic Q in each small area are constant. We define a misfit function for spatiotemporal variations of wave energy as in the original MLTWA, and maximize the posterior probability by changing not only the Q values but also the number and spatial layout of the Voronoi cells. This maximization is conducted by means of reversible jump Markov chain Monte Carlo (rjMCMC) (Green 1995), since the number of unknown parameters (i.e., the dimension of the posterior probability) is variable. After convergence to the maximum posterior, we estimate Q structures from the ensemble averages of MCMC samples around the maximum posterior probability. Synthetic tests showed stable reconstructions of input structures with reasonable error distributions. We applied this method to seismic waveform data recorded by ocean bottom seismometers at the outer-rise area off Tohoku, and estimated Q values at 4-8 Hz, 8-16 Hz and 16-32 Hz. Intrinsic Q is nearly constant at all frequency bands, and scattering Q shows two distinct strong-scattering regions, at the petit-spot area and the high-seismicity area. This strong scattering is probably related to magma inclusions and fractured structure, respectively. The difference between these two areas becomes clear at high frequencies. This means that the scale dependence of inhomogeneities, or smaller-scale inhomogeneity, is important for discussing medium properties and the origins of structural variations. While the generalized MLTWA is based on classical waveform modeling in a constant-Q medium, this method can be a fundamental basis for Q structure imaging in the crust.

  18. Dimension-independent likelihood-informed MCMC

    DOE PAGES

    Cui, Tiangang; Law, Kody J. H.; Marzouk, Youssef M.

    2015-10-08

    Many Bayesian inference problems require exploring the posterior distribution of high-dimensional parameters that represent the discretization of an underlying function. Our work introduces a family of Markov chain Monte Carlo (MCMC) samplers that can adapt to the particular structure of a posterior distribution over functions. There are two distinct lines of research that intersect in the methods we develop here. First, we introduce a general class of operator-weighted proposal distributions that are well defined on function space, such that the performance of the resulting MCMC samplers is independent of the discretization of the function. Second, by exploiting local Hessian information and any associated low-dimensional structure in the change from prior to posterior distributions, we develop an inhomogeneous discretization scheme for the Langevin stochastic differential equation that yields operator-weighted proposals adapted to the non-Gaussian structure of the posterior. The resulting dimension-independent and likelihood-informed (DILI) MCMC samplers may be useful for a large class of high-dimensional problems where the target probability measure has a density with respect to a Gaussian reference measure. Finally, we use two nonlinear inverse problems in order to demonstrate the efficiency of these DILI samplers: an elliptic PDE coefficient inverse problem and path reconstruction in a conditioned diffusion.

  19. The phylogenetic relationships of known mosquito (Diptera: Culicidae) mitogenomes.

    PubMed

    Chu, Hongliang; Li, Chunxiao; Guo, Xiaoxia; Zhang, Hengduan; Luo, Peng; Wu, Zhonghua; Wang, Gang; Zhao, Tongyan

    2018-01-01

    The known mosquito mitogenomes, representing a total of 34 species belonging to five genera, were collected from GenBank, and the practicality and effectiveness of the variation in the complete mitochondrial DNA genome and in portions of the mitochondrial COI gene were assessed to reconstruct the phylogeny of mosquitoes. Phylogenetic trees were reconstructed on the basis of parsimony, maximum likelihood, and Bayesian (BI) methods. It is concluded that: (1) Both the mitogenomes and the COI gene support the monophyly of the following taxa: Subgenus Nyssorhynchus, Subgenus Cellia, Anopheles albitarsis complex, Anopheles gambiae complex, and Anopheles punctulatus group; (2) Genus Aedes is not monophyletic relative to Ochlerotatus vigilax; (3) The mitogenome results indicate a close relationship between Anopheles epiroticus and Anopheles gambiae complex, Anopheles dirus complex and Anopheles punctulatus group, respectively; (4) The Bayesian posterior probability (BPP) within the phylogenetic tree reconstructed from mitogenomes is higher than in the COI tree. The results show that phylogenetic relationships reconstructed using the mitogenomes were more similar to those based on morphological data.

  20. Information and Entropy

    NASA Astrophysics Data System (ADS)

    Caticha, Ariel

    2007-11-01

    What is information? Is it physical? We argue that in a Bayesian theory the notion of information must be defined in terms of its effects on the beliefs of rational agents. Information is whatever constrains rational beliefs and therefore it is the force that induces us to change our minds. This problem of updating from a prior to a posterior probability distribution is tackled through an eliminative induction process that singles out the logarithmic relative entropy as the unique tool for inference. The resulting method of Maximum relative Entropy (ME), which is designed for updating from arbitrary priors given information in the form of arbitrary constraints, includes as special cases both MaxEnt (which allows arbitrary constraints) and Bayes' rule (which allows arbitrary priors). Thus, ME unifies the two themes of these workshops—the Maximum Entropy and the Bayesian methods—into a single general inference scheme that allows us to handle problems that lie beyond the reach of either of the two methods separately. I conclude with a couple of simple illustrative examples.
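
    For orientation, the logarithmic relative entropy named in the abstract is usually written as below. This is the standard textbook form, in notation that is an assumption here (the abstract gives no formulas): q is the prior, p the candidate posterior, and the selected posterior maximizes S[p, q] subject to the constraints that encode the new information.

```latex
% Relative entropy of a candidate posterior p with respect to the prior q
S[p, q] = -\int dx \, p(x) \, \log \frac{p(x)}{q(x)}

% Bayes' rule, the special case the abstract refers to, for observed data x'
p(\theta \mid x') = \frac{q(\theta) \, q(x' \mid \theta)}{q(x')}
```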

  1. Bayesian inference of physiologically meaningful parameters from body sway measurements.

    PubMed

    Tietäväinen, A; Gutmann, M U; Keski-Vakkuri, E; Corander, J; Hæggström, E

    2017-06-19

    The control of human body sway by the central nervous system, muscles, and conscious brain is of interest since body sway carries information about the physiological status of a person. Several models have been proposed to describe body sway in an upright standing position; however, due to the statistical intractability of the more realistic models, no formal parameter inference has previously been conducted and the expressive power of such models for real human subjects remains unknown. Using the latest advances in Bayesian statistical inference for intractable models, we fitted a nonlinear control model to posturographic measurements, and we showed that it can accurately predict the sway characteristics of both simulated and real subjects. Our method provides a full statistical characterization of the uncertainty related to all model parameters as quantified by posterior probability density functions, which is useful for comparisons across subjects and test settings. The ability to infer intractable control models from sensor data opens new possibilities for monitoring and predicting body status in health applications.

  2. COSMOABC: Likelihood-free inference via Population Monte Carlo Approximate Bayesian Computation

    NASA Astrophysics Data System (ADS)

    Ishida, E. E. O.; Vitenti, S. D. P.; Penna-Lima, M.; Cisewski, J.; de Souza, R. S.; Trindade, A. M. M.; Cameron, E.; Busti, V. C.; COIN Collaboration

    2015-11-01

    Approximate Bayesian Computation (ABC) enables parameter inference for complex physical systems in cases where the true likelihood function is unknown, unavailable, or computationally too expensive. It relies on the forward simulation of mock data and comparison between observed and synthetic catalogues. Here we present COSMOABC, a Python ABC sampler featuring a Population Monte Carlo variation of the original ABC algorithm, which uses an adaptive importance sampling scheme. The code is very flexible and can be easily coupled to an external simulator, while allowing arbitrary distance and prior functions to be incorporated. As an example of practical application, we coupled COSMOABC with the NUMCOSMO library and demonstrate how it can be used to estimate posterior probability distributions over cosmological parameters based on measurements of galaxy cluster number counts without computing the likelihood function. COSMOABC is published under the GPLv3 license on PyPI and GitHub and documentation is available at http://goo.gl/SmB8EX.
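
    The forward-simulate-and-compare idea described above can be illustrated with a minimal sketch of plain ABC rejection sampling. This is not the Population Monte Carlo variant that COSMOABC implements, and it does not use the COSMOABC API; the function name `abc_rejection`, the toy Gaussian simulator, the uniform prior, and the tolerance `eps` are all assumptions chosen for illustration.

```python
import random
import statistics


def abc_rejection(observed, simulate, prior_sample, distance, eps, n_draws, seed=0):
    """Plain ABC rejection: keep prior draws whose mock data fall within eps of the data."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)        # draw a parameter from the prior
        mock = simulate(theta, rng)      # forward-simulate a mock data set
        if distance(mock, observed) <= eps:
            accepted.append(theta)       # no likelihood evaluation is needed
    return accepted


# Toy problem (hypothetical): infer the mean of a unit-variance Gaussian.
data_rng = random.Random(42)
observed = [data_rng.gauss(2.0, 1.0) for _ in range(50)]

posterior = abc_rejection(
    observed,
    simulate=lambda mu, r: [r.gauss(mu, 1.0) for _ in range(50)],
    prior_sample=lambda r: r.uniform(-5.0, 5.0),
    # Distance between summary statistics (here, the sample means).
    distance=lambda a, b: abs(statistics.fmean(a) - statistics.fmean(b)),
    eps=0.1,
    n_draws=5000,
)
```

    The accepted draws in `posterior` approximate the posterior over the mean; shrinking `eps` tightens the approximation at the cost of a lower acceptance rate, which is the inefficiency that adaptive schemes such as Population Monte Carlo aim to reduce.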

  3. A taxonomic monograph of Nearctic Scolytus Geoffroy (Coleoptera, Curculionidae, Scolytinae).

    PubMed

    Smith, Sarah M; Cognato, Anthony I

    2014-01-01

    The Nearctic bark beetle genus Scolytus Geoffroy was revised based in part on a molecular and morphological phylogeny. Monophyly of the native species was tested using mitochondrial (COI) and nuclear (28S, CAD, ArgK) genes and 43 morphological characters in parsimony and Bayesian phylogenetic analyses. Parsimony analyses of molecular and combined datasets provided mixed results while Bayesian analysis recovered most nodes with posterior probabilities >90%. Native hardwood- and conifer-feeding Scolytus species were recovered as paraphyletic. Native Nearctic species were recovered as paraphyletic with hardwood-feeding species sister to Palearctic hardwood-feeding species rather than to native conifer-feeding species. The Nearctic conifer-feeding species were monophyletic. Twenty-five species were recognized. Four new synonyms were discovered: Scolytus praeceps LeConte, 1868 (= Scolytus abietis Blackman, 1934; = Scolytus opacus Blackman, 1934), Scolytus reflexus Blackman, 1934 (= Scolytus virgatus Bright, 1972; = Scolytus wickhami Blackman, 1934). Two species were reinstated: Scolytus fiskei Blackman, 1934 and Scolytus silvaticus Bright, 1972. A diagnosis, description, distribution, host records and images were provided for each species and a key is presented to all species.

  4. Random Partition Distribution Indexed by Pairwise Information

    PubMed Central

    Dahl, David B.; Day, Ryan; Tsai, Jerry W.

    2017-01-01

    We propose a random partition distribution indexed by pairwise similarity information such that partitions compatible with the similarities are given more probability. The use of pairwise similarities, in the form of distances, is common in some clustering algorithms (e.g., hierarchical clustering), but we show how to use this type of information to define a prior partition distribution for flexible Bayesian modeling. A defining feature of the distribution is that it allocates probability among partitions within a given number of subsets, but it does not shift probability among sets of partitions with different numbers of subsets. Our distribution places more probability on partitions that group similar items yet keeps the total probability of partitions with a given number of subsets constant. The distribution of the number of subsets (and its moments) is available in closed-form and is not a function of the similarities. Our formulation has an explicit probability mass function (with a tractable normalizing constant) so the full suite of MCMC methods may be used for posterior inference. We compare our distribution with several existing partition distributions, showing that our formulation has attractive properties. We provide three demonstrations to highlight the features and relative performance of our distribution. PMID:29276318

  5. Optimal Bayesian Adaptive Design for Test-Item Calibration.

    PubMed

    van der Linden, Wim J; Ren, Hao

    2015-06-01

    An optimal adaptive design for test-item calibration based on Bayesian optimality criteria is presented. The design adapts the choice of field-test items to the examinees taking an operational adaptive test using both the information in the posterior distributions of their ability parameters and the current posterior distributions of the field-test parameters. Different criteria of optimality based on the two types of posterior distributions are possible. The design can be implemented using an MCMC scheme with alternating stages of sampling from the posterior distributions of the test takers' ability parameters and the parameters of the field-test items while reusing samples from earlier posterior distributions of the other parameters. Results from a simulation study demonstrated the feasibility of the proposed MCMC implementation for operational item calibration. A comparison of performances for different optimality criteria showed faster calibration of substantial numbers of items for the criterion of D-optimality relative to A-optimality, a special case of c-optimality, and random assignment of items to the test takers.

  6. Joint Model and Parameter Dimension Reduction for Bayesian Inversion Applied to an Ice Sheet Flow Problem

    NASA Astrophysics Data System (ADS)

    Ghattas, O.; Petra, N.; Cui, T.; Marzouk, Y.; Benjamin, P.; Willcox, K.

    2016-12-01

    Model-based projections of the dynamics of the polar ice sheets play a central role in anticipating future sea level rise. However, a number of mathematical and computational challenges place significant barriers on improving predictability of these models. One such challenge is caused by the unknown model parameters (e.g., in the basal boundary conditions) that must be inferred from heterogeneous observational data, leading to an ill-posed inverse problem and the need to quantify uncertainties in its solution. In this talk we discuss the problem of estimating the uncertainty in the solution of (large-scale) ice sheet inverse problems within the framework of Bayesian inference. Computing the general solution of the inverse problem--i.e., the posterior probability density--is intractable with current methods on today's computers, due to the expense of solving the forward model (3D full Stokes flow with nonlinear rheology) and the high dimensionality of the uncertain parameters (which are discretizations of the basal sliding coefficient field). To overcome these twin computational challenges, it is essential to exploit problem structure (e.g., sensitivity of the data to parameters, the smoothing property of the forward model, and correlations in the prior). To this end, we present a data-informed approach that identifies low-dimensional structure in both parameter space and the forward model state space. This approach exploits the fact that the observations inform only a low-dimensional parameter space and allows us to construct a parameter-reduced posterior. Sampling this parameter-reduced posterior still requires multiple evaluations of the forward problem, therefore we also aim to identify a low dimensional state space to reduce the computational cost. 
    To this end, we apply a proper orthogonal decomposition (POD) approach to approximate the state using a low-dimensional manifold constructed using "snapshots" from the parameter-reduced posterior, and the discrete empirical interpolation method (DEIM) to approximate the nonlinearity in the forward problem. We show that using only a limited number of forward solves, the resulting subspaces lead to an efficient method to explore the high-dimensional posterior.

  7. Sparse-grid, reduced-basis Bayesian inversion: Nonaffine-parametric nonlinear equations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, Peng, E-mail: peng@ices.utexas.edu; Schwab, Christoph, E-mail: christoph.schwab@sam.math.ethz.ch

    2016-07-01

    We extend the reduced basis (RB) accelerated Bayesian inversion methods for affine-parametric, linear operator equations which are considered in [16,17] to non-affine, nonlinear parametric operator equations. We generalize the analysis of sparsity of parametric forward solution maps in [20] and of Bayesian inversion in [48,49] to the fully discrete setting, including Petrov–Galerkin high-fidelity (“HiFi”) discretization of the forward maps. We develop adaptive, stochastic collocation based reduction methods for the efficient computation of reduced bases on the parametric solution manifold. The nonaffinity and nonlinearity with respect to (w.r.t.) the distributed, uncertain parameters and the unknown solution are collocated; specifically, by the so-called Empirical Interpolation Method (EIM). For the corresponding Bayesian inversion problems, computational efficiency is enhanced in two ways: first, expectations w.r.t. the posterior are computed by adaptive quadratures with dimension-independent convergence rates proposed in [49]; the present work generalizes [49] to account for the impact of the PG discretization in the forward maps on the convergence rates of the Quantities of Interest (QoI for short). Second, we propose to perform the Bayesian estimation only w.r.t. a parsimonious, RB approximation of the posterior density. Based on the approximation results in [49], the infinite-dimensional parametric, deterministic forward map and operator admit N-term RB and EIM approximations which converge at rates which depend only on the sparsity of the parametric forward map. In several numerical experiments, the proposed algorithms exhibit dimension-independent convergence rates which equal, at least, the currently known rate estimates for N-term approximation. We propose to accelerate Bayesian estimation by first offline construction of reduced basis surrogates of the Bayesian posterior density.
    The parsimonious surrogates can then be employed for online data assimilation and for Bayesian estimation. They also open a perspective for optimal experimental design.

  8. Entropy-Bayesian Inversion of Time-Lapse Tomographic GPR data for Monitoring Dielectric Permittivity and Soil Moisture Variations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hou, Z; Terry, N; Hubbard, S S

    2013-02-12

    In this study, we evaluate the possibility of monitoring soil moisture variation using tomographic ground penetrating radar travel time data through Bayesian inversion, which is integrated with entropy memory function and pilot point concepts, as well as efficient sampling approaches. It is critical to accurately estimate soil moisture content and variations in vadose zone studies. Many studies have illustrated the promise and value of GPR tomographic data for estimating soil moisture and associated changes; however, challenges still exist in the inversion of GPR tomographic data in a manner that quantifies input and predictive uncertainty, incorporates multiple data types, handles non-uniqueness and nonlinearity, and honors time-lapse tomograms collected in a series. To address these challenges, we develop a minimum relative entropy (MRE)-Bayesian based inverse modeling framework that non-subjectively defines prior probabilities, incorporates information from multiple sources, and quantifies uncertainty. The framework enables us to estimate dielectric permittivity at pilot point locations distributed within the tomogram, as well as the spatial correlation range. In the inversion framework, MRE is first used to derive prior probability distribution functions (pdfs) of dielectric permittivity based on prior information obtained from a straight-ray GPR inversion. The probability distributions are then sampled using a Quasi-Monte Carlo (QMC) approach, and the sample sets provide inputs to a sequential Gaussian simulation (SGSim) algorithm that constructs a highly resolved permittivity/velocity field for evaluation with a curved-ray GPR forward model. The likelihood functions are computed as a function of misfits, and posterior pdfs are constructed using a Gaussian kernel. Inversion of subsequent time-lapse datasets combines the Bayesian estimates from the previous inversion (as a memory function) with new data.
    The memory function and pilot point design takes advantage of the spatial-temporal correlation of the state variables. We first apply the inversion framework to a static synthetic example and then to a time-lapse GPR tomographic dataset collected during a dynamic experiment conducted at the Hanford Site in Richland, WA. We demonstrate that the MRE-Bayesian inversion enables us to merge various data types, quantify uncertainty, evaluate nonlinear models, and produce more detailed and better resolved estimates than straight-ray based inversion; therefore, it has the potential to improve estimates of inter-wellbore dielectric permittivity and soil moisture content and to monitor their temporal dynamics more accurately.

  9. Entropy-Bayesian Inversion of Time-Lapse Tomographic GPR data for Monitoring Dielectric Permittivity and Soil Moisture Variations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hou, Zhangshuan; Terry, Neil C.; Hubbard, Susan S.

    2013-02-22

    In this study, we evaluate the possibility of monitoring soil moisture variation using tomographic ground penetrating radar travel time data through Bayesian inversion, which is integrated with entropy memory function and pilot point concepts, as well as efficient sampling approaches. It is critical to accurately estimate soil moisture content and variations in vadose zone studies. Many studies have illustrated the promise and value of GPR tomographic data for estimating soil moisture and associated changes; however, challenges still exist in the inversion of GPR tomographic data in a manner that quantifies input and predictive uncertainty, incorporates multiple data types, handles non-uniqueness and nonlinearity, and honors time-lapse tomograms collected in a series. To address these challenges, we develop a minimum relative entropy (MRE)-Bayesian based inverse modeling framework that non-subjectively defines prior probabilities, incorporates information from multiple sources, and quantifies uncertainty. The framework enables us to estimate dielectric permittivity at pilot point locations distributed within the tomogram, as well as the spatial correlation range. In the inversion framework, MRE is first used to derive prior probability density functions (pdfs) of dielectric permittivity based on prior information obtained from a straight-ray GPR inversion. The probability distributions are then sampled using a Quasi-Monte Carlo (QMC) approach, and the sample sets provide inputs to a sequential Gaussian simulation (SGSIM) algorithm that constructs a highly resolved permittivity/velocity field for evaluation with a curved-ray GPR forward model. The likelihood functions are computed as a function of misfits, and posterior pdfs are constructed using a Gaussian kernel. Inversion of subsequent time-lapse datasets combines the Bayesian estimates from the previous inversion (as a memory function) with new data.
    The memory function and pilot point design takes advantage of the spatial-temporal correlation of the state variables. We first apply the inversion framework to a static synthetic example and then to a time-lapse GPR tomographic dataset collected during a dynamic experiment conducted at the Hanford Site in Richland, WA. We demonstrate that the MRE-Bayesian inversion enables us to merge various data types, quantify uncertainty, evaluate nonlinear models, and produce more detailed and better resolved estimates than straight-ray based inversion; therefore, it has the potential to improve estimates of inter-wellbore dielectric permittivity and soil moisture content and to monitor their temporal dynamics more accurately.

  10. Monte Carlo Bayesian inference on a statistical model of sub-gridcolumn moisture variability using high-resolution cloud observations. Part 1: Method.

    PubMed

    Norris, Peter M; da Silva, Arlindo M

    2016-07-01

    A method is presented to constrain a statistical model of sub-gridcolumn moisture variability using high-resolution satellite cloud data. The method can be used for large-scale model parameter estimation or cloud data assimilation. The gridcolumn model includes assumed probability density function (PDF) intra-layer horizontal variability and a copula-based inter-layer correlation model. The observables used in the current study are Moderate Resolution Imaging Spectroradiometer (MODIS) cloud-top pressure, brightness temperature and cloud optical thickness, but the method should be extensible to direct cloudy radiance assimilation for a small number of channels. The algorithm is a form of Bayesian inference with a Markov chain Monte Carlo (MCMC) approach to characterizing the posterior distribution. This approach is especially useful in cases where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach is not gradient-based and allows jumps into regions of non-zero cloud probability. The current study uses a skewed-triangle distribution for layer moisture. The article also includes a discussion of the Metropolis and multiple-try Metropolis versions of MCMC.
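
    As a reminder of what the basic Metropolis step mentioned above does, here is a minimal random-walk Metropolis sketch. It is a generic illustration under stated assumptions, not the authors' sampler: the standard-normal toy target, the Gaussian proposal, and the names `metropolis` and `log_target` are hypothetical, and the multiple-try variant is not shown.

```python
import math
import random


def metropolis(log_target, x0, step, n_samples, seed=0):
    """Random-walk Metropolis with a symmetric Gaussian proposal.

    x0 should have finite log_target(x0).
    """
    rng = random.Random(seed)
    x, logp = x0, log_target(x0)
    chain = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)       # finite jump, no gradient needed
        logp_new = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(x)).
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            x, logp = proposal, logp_new
        chain.append(x)                           # a rejected move repeats the old state
    return chain


# Toy target (assumption): standard normal, log-density up to a constant.
chain = metropolis(lambda x: -0.5 * x * x, x0=3.0, step=1.0, n_samples=20000)
burned = chain[2000:]                             # discard burn-in
mean = sum(burned) / len(burned)
var = sum((v - mean) ** 2 for v in burned) / len(burned)
```

    Because the proposal is a finite jump rather than an infinitesimal perturbation, the chain can move from a region where the target is nearly zero into one where it is not, which is the property the abstract contrasts with linearized assimilation methods.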

  11. Monte Carlo Bayesian Inference on a Statistical Model of Sub-Gridcolumn Moisture Variability Using High-Resolution Cloud Observations. Part 1: Method

    NASA Technical Reports Server (NTRS)

    Norris, Peter M.; Da Silva, Arlindo M.

    2016-01-01

    A method is presented to constrain a statistical model of sub-gridcolumn moisture variability using high-resolution satellite cloud data. The method can be used for large-scale model parameter estimation or cloud data assimilation. The gridcolumn model includes assumed probability density function (PDF) intra-layer horizontal variability and a copula-based inter-layer correlation model. The observables used in the current study are Moderate Resolution Imaging Spectroradiometer (MODIS) cloud-top pressure, brightness temperature and cloud optical thickness, but the method should be extensible to direct cloudy radiance assimilation for a small number of channels. The algorithm is a form of Bayesian inference with a Markov chain Monte Carlo (MCMC) approach to characterizing the posterior distribution. This approach is especially useful in cases where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach is not gradient-based and allows jumps into regions of non-zero cloud probability. The current study uses a skewed-triangle distribution for layer moisture. The article also includes a discussion of the Metropolis and multiple-try Metropolis versions of MCMC.

  12. Monte Carlo Bayesian inference on a statistical model of sub-gridcolumn moisture variability using high-resolution cloud observations. Part 1: Method

    PubMed Central

    Norris, Peter M.; da Silva, Arlindo M.

    2018-01-01

    A method is presented to constrain a statistical model of sub-gridcolumn moisture variability using high-resolution satellite cloud data. The method can be used for large-scale model parameter estimation or cloud data assimilation. The gridcolumn model includes assumed probability density function (PDF) intra-layer horizontal variability and a copula-based inter-layer correlation model. The observables used in the current study are Moderate Resolution Imaging Spectroradiometer (MODIS) cloud-top pressure, brightness temperature and cloud optical thickness, but the method should be extensible to direct cloudy radiance assimilation for a small number of channels. The algorithm is a form of Bayesian inference with a Markov chain Monte Carlo (MCMC) approach to characterizing the posterior distribution. This approach is especially useful in cases where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach is not gradient-based and allows jumps into regions of non-zero cloud probability. The current study uses a skewed-triangle distribution for layer moisture. The article also includes a discussion of the Metropolis and multiple-try Metropolis versions of MCMC. PMID:29618847

  13. Bayesian-network-based safety risk assessment for steel construction projects.

    PubMed

    Leu, Sou-Sen; Chang, Ching-Miao

    2013-05-01

    There are four primary accident types at steel building construction (SC) projects: falls (tumbles), object falls, object collapse, and electrocution. Several systematic safety risk assessment approaches, such as fault tree analysis (FTA) and failure mode and effect criticality analysis (FMECA), have been used to evaluate safety risks at SC projects. However, these traditional methods ineffectively address dependencies among safety factors at various levels that fail to provide early warnings to prevent occupational accidents. To overcome the limitations of traditional approaches, this study addresses the development of a safety risk-assessment model for SC projects by establishing the Bayesian networks (BN) based on fault tree (FT) transformation. The BN-based safety risk-assessment model was validated against the safety inspection records of six SC building projects and nine projects in which site accidents occurred. The ranks of posterior probabilities from the BN model were highly consistent with the accidents that occurred at each project site. The model accurately provides site safety-management abilities by calculating the probabilities of safety risks and further analyzing the causes of accidents based on their relationships in BNs. In practice, based on the analysis of accident risks and significant safety factors, proper preventive safety management strategies can be established to reduce the occurrence of accidents on SC sites. Copyright © 2013 Elsevier Ltd. All rights reserved.

  14. A flexible Bayesian hierarchical model of preterm birth risk among US Hispanic subgroups in relation to maternal nativity and education

    PubMed Central

    2011-01-01

    Background: Previous research has documented heterogeneity in the effects of maternal education on adverse birth outcomes by nativity and Hispanic subgroup in the United States. In this article, we considered the risk of preterm birth (PTB) using 9 years of vital statistics birth data from New York City. We employed finer categorizations of exposure than used previously and estimated the risk dose-response across the range of education by nativity and ethnicity. Methods: Using Bayesian random effects logistic regression models with restricted quadratic spline terms for years of completed maternal education, we calculated and plotted the estimated posterior probabilities of PTB (gestational age < 37 weeks) for each year of education by ethnic and nativity subgroups adjusted for only maternal age, as well as with more extensive covariate adjustments. We then estimated the posterior risk difference between native and foreign born mothers by ethnicity over the continuous range of education exposures. Results: The risk of PTB varied substantially by education, nativity and ethnicity. Native born groups showed higher absolute risk of PTB and declining risk associated with higher levels of education beyond about 10 years, as did foreign-born Puerto Ricans. For most other foreign born groups, however, risk of PTB was flatter across the education range. For Mexicans, Central Americans, Dominicans, South Americans and "Others", the protective effect of foreign birth diminished progressively across the educational range. Only for Puerto Ricans was there no nativity advantage for the foreign born, although small numbers of foreign born Cubans limited precision of estimates for that group. Conclusions: Using flexible Bayesian regression models with random effects allowed us to estimate absolute risks without strong modeling assumptions. Risk comparisons for any sub-groups at any exposure level were simple to calculate.
    Shrinkage of posterior estimates through the use of random effects allowed for finer categorization of exposures without restricting joint effects to follow a fixed parametric scale. Although foreign born Hispanic women with the least education appeared to generally have low risk, this seems likely to be a marker for unmeasured environmental and behavioral factors, rather than a causally protective effect of low education itself. PMID:21504612

  15. Information-Based Analysis of Data Assimilation (Invited)

    NASA Astrophysics Data System (ADS)

    Nearing, G. S.; Gupta, H. V.; Crow, W. T.; Gong, W.

    2013-12-01

    Data assimilation is defined as the Bayesian conditioning of uncertain model simulations on observations for the purpose of reducing uncertainty about model states. Practical data assimilation methods make the application of Bayes' law tractable either by employing assumptions about the prior, posterior and likelihood distributions (e.g., the Kalman family of filters) or by using resampling methods (e.g., bootstrap filter). We propose to quantify the efficiency of these approximations in an OSSE setting using information theory and, in an OSSE or real-world validation setting, to measure the amount - and more importantly, the quality - of information extracted from observations during data assimilation. To analyze DA assumptions, uncertainty is quantified as the Shannon-type entropy of a discretized probability distribution. The maximum amount of information that can be extracted from observations about model states is the mutual information between states and observations, which is equal to the reduction in entropy in our estimate of the state due to Bayesian filtering. The difference between this potential and the actual reduction in entropy due to Kalman (or other type of) filtering measures the inefficiency of the filter assumptions. Residual uncertainty in DA posterior state estimates can be attributed to three sources: (i) non-injectivity of the observation operator, (ii) noise in the observations, and (iii) filter approximations. The contribution of each of these sources is measurable in an OSSE setting. The amount of information extracted from observations by data assimilation (or system identification, including parameter estimation) can also be measured by Shannon's theory. Since practical filters are approximations of Bayes' law, it is important to know whether the information that is extracted from observations by a filter is reliable.
    We define information as either good or bad, and propose to measure these two types of information using partial Kullback-Leibler divergences. Defined this way, good and bad information sum to total information. This segregation of information into good and bad components requires a validation target distribution; in a DA OSSE setting, this can be the true Bayesian posterior, but in a real-world setting the validation target might be determined by a set of in situ observations.
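
    The entropy and mutual-information quantities described in this abstract can be computed directly for a discretized distribution. The sketch below is a generic illustration and not the authors' code: the 2x2 joint table of state and observation is a made-up example, and the good/bad split by partial Kullback-Leibler divergences is not implemented.

```python
import math


def entropy(p):
    """Shannon entropy in bits of a discrete distribution given as probabilities."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def mutual_information(joint):
    """I(state; obs) = H(state) + H(obs) - H(state, obs), with joint[i][j] = P(state=i, obs=j)."""
    p_state = [sum(row) for row in joint]         # marginal over observations
    p_obs = [sum(col) for col in zip(*joint)]     # marginal over states
    h_joint = entropy([p for row in joint for p in row])
    return entropy(p_state) + entropy(p_obs) - h_joint


# Hypothetical binary state seen through a noisy binary observation.
joint = [[0.4, 0.1],
         [0.1, 0.4]]
prior_entropy = entropy([sum(row) for row in joint])   # uncertainty before observing
max_information = mutual_information(joint)            # best possible entropy reduction
```

    In the abstract's terms, `max_information` is the reduction in state entropy an exact Bayesian filter would achieve; any shortfall of an approximate filter relative to this value measures the cost of its assumptions.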

  16. A flexible Bayesian hierarchical model of preterm birth risk among US Hispanic subgroups in relation to maternal nativity and education.

    PubMed

    Kaufman, Jay S; MacLehose, Richard F; Torrone, Elizabeth A; Savitz, David A

    2011-04-19

    Previous research has documented heterogeneity in the effects of maternal education on adverse birth outcomes by nativity and Hispanic subgroup in the United States. In this article, we considered the risk of preterm birth (PTB) using 9 years of vital statistics birth data from New York City. We employed finer categorizations of exposure than used previously and estimated the risk dose-response across the range of education by nativity and ethnicity. Using Bayesian random effects logistic regression models with restricted quadratic spline terms for years of completed maternal education, we calculated and plotted the estimated posterior probabilities of PTB (gestational age < 37 weeks) for each year of education by ethnic and nativity subgroups adjusted for only maternal age, as well as with more extensive covariate adjustments. We then estimated the posterior risk difference between native and foreign born mothers by ethnicity over the continuous range of education exposures. The risk of PTB varied substantially by education, nativity and ethnicity. Native born groups showed higher absolute risk of PTB and declining risk associated with higher levels of education beyond about 10 years, as did foreign-born Puerto Ricans. For most other foreign born groups, however, risk of PTB was flatter across the education range. For Mexicans, Central Americans, Dominicans, South Americans and "Others", the protective effect of foreign birth diminished progressively across the educational range. Only for Puerto Ricans was there no nativity advantage for the foreign born, although small numbers of foreign born Cubans limited precision of estimates for that group. Using flexible Bayesian regression models with random effects allowed us to estimate absolute risks without strong modeling assumptions. Risk comparisons for any sub-groups at any exposure level were simple to calculate. 
Shrinkage of posterior estimates through the use of random effects allowed for finer categorization of exposures without restricting joint effects to follow a fixed parametric scale. Although foreign born Hispanic women with the least education appeared to generally have low risk, this seems likely to be a marker for unmeasured environmental and behavioral factors, rather than a causally protective effect of low education itself.

  17. Estimation of post-test probabilities by residents: Bayesian reasoning versus heuristics?

    PubMed

    Hall, Stacey; Phang, Sen Han; Schaefer, Jeffrey P; Ghali, William; Wright, Bruce; McLaughlin, Kevin

    2014-08-01

    Although the process of diagnosing invariably begins with a heuristic, we encourage our learners to support their diagnoses by analytical cognitive processes, such as Bayesian reasoning, in an attempt to mitigate the effects of heuristics on diagnosing. There are, however, limited data on the use ± impact of Bayesian reasoning on the accuracy of disease probability estimates. In this study our objective was to explore whether Internal Medicine residents use a Bayesian process to estimate disease probabilities by comparing their disease probability estimates to literature-derived Bayesian post-test probabilities. We gave 35 Internal Medicine residents four clinical vignettes in the form of a referral letter and asked them to estimate the post-test probability of the target condition in each case. We then compared these to literature-derived probabilities. For each vignette the estimated probability was significantly different from the literature-derived probability. For the two cases with low literature-derived probability our participants significantly overestimated the probability of these target conditions being the correct diagnosis, whereas for the two cases with high literature-derived probability the estimated probability was significantly lower than the calculated value. Our results suggest that residents generate inaccurate post-test probability estimates. Possible explanations for this include ineffective application of Bayesian reasoning, attribute substitution whereby a complex cognitive task is replaced by an easier one (e.g., a heuristic), or systematic rater bias, such as central tendency bias. Further studies are needed to identify the reasons for inaccuracy of disease probability estimates and to explore ways of improving accuracy.
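
    The Bayesian post-test probabilities that the residents' estimates were compared against can be illustrated with the odds form of Bayes' theorem. The following is a minimal sketch, not the study's actual calculation; the function name and the sensitivity, specificity and pre-test values are hypothetical.

```python
def post_test_probability(pre_test, sensitivity, specificity, positive=True):
    """Post-test probability of disease from the odds form of Bayes' theorem.

    All inputs are illustrative assumptions, not values from the study.
    """
    if not 0.0 < pre_test < 1.0:
        raise ValueError("pre_test must lie strictly between 0 and 1")
    if positive:
        # Likelihood ratio of a positive test result.
        likelihood_ratio = sensitivity / (1.0 - specificity)
    else:
        # Likelihood ratio of a negative test result.
        likelihood_ratio = (1.0 - sensitivity) / specificity
    pre_odds = pre_test / (1.0 - pre_test)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Hypothetical vignette: 10% pre-test probability, test with 90% sensitivity
# and 80% specificity.
after_positive = post_test_probability(0.10, 0.90, 0.80, positive=True)
after_negative = post_test_probability(0.10, 0.90, 0.80, positive=False)
```

    With these assumed inputs a positive result raises the probability from 10% to roughly one in three, which is lower than intuition often suggests.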

  18. Assessment of phylogenetic sensitivity for reconstructing HIV-1 epidemiological relationships.

    PubMed

    Beloukas, Apostolos; Magiorkinis, Emmanouil; Magiorkinis, Gkikas; Zavitsanou, Asimina; Karamitros, Timokratis; Hatzakis, Angelos; Paraskevis, Dimitrios

    2012-06-01

    Phylogenetic analysis has been extensively used as a tool for the reconstruction of epidemiological relations for research or for forensic purposes. Our objective was to assess the sensitivity of different phylogenetic methods and various phylogenetic programs in reconstructing epidemiological links among HIV-1 infected patients, that is, the probability of revealing a true transmission relationship. Multiple datasets (90) were prepared consisting of HIV-1 sequences in protease (PR) and partial reverse transcriptase (RT) sampled from patients with a documented epidemiological relationship (target population), and from unrelated individuals (control population) belonging to the same HIV-1 subtype as the target population. The datasets varied in the number, the geographic origin and the transmission risk groups of the sequences in the control population. Phylogenetic trees were inferred by neighbor-joining (NJ), maximum likelihood heuristics (hML) and Bayesian methods. All clusters of sequences belonging to the target population were correctly reconstructed by NJ and Bayesian methods, receiving high bootstrap and posterior probability (PP) support, respectively. On the other hand, TreePuzzle failed to reconstruct or provide significant support for several clusters; high puzzling step support was associated with the inclusion of control sequences from the same geographic area as the target population. In contrast, all clusters were correctly reconstructed by hML as implemented in PhyML 3.0, receiving high bootstrap support. We report that under the conditions of our study, hML using PhyML, NJ and Bayesian methods were the most sensitive for the reconstruction of epidemiological links, mostly from sexually infected individuals. Copyright © 2012 Elsevier B.V. All rights reserved.

  19. A computer program for uncertainty analysis integrating regression and Bayesian methods

    USGS Publications Warehouse

    Lu, Dan; Ye, Ming; Hill, Mary C.; Poeter, Eileen P.; Curtis, Gary

    2014-01-01

    This work develops a new functionality in UCODE_2014 to evaluate Bayesian credible intervals using the Markov Chain Monte Carlo (MCMC) method. The MCMC capability in UCODE_2014 is based on the FORTRAN version of the differential evolution adaptive Metropolis (DREAM) algorithm of Vrugt et al. (2009), which estimates the posterior probability density function of model parameters in high-dimensional and multimodal sampling problems. The UCODE MCMC capability provides eleven prior probability distributions and three ways to initialize the sampling process. It evaluates parametric and predictive uncertainties and it has parallel computing capability based on multiple chains to accelerate the sampling process. This paper tests and demonstrates the MCMC capability using a 10-dimensional multimodal mathematical function, a 100-dimensional Gaussian function, and a groundwater reactive transport model. The use of the MCMC capability is made straightforward and flexible by adopting the JUPITER API protocol. With the new MCMC capability, UCODE_2014 can be used to calculate three types of uncertainty intervals, which all can account for prior information: (1) linear confidence intervals which require linearity and Gaussian error assumptions and typically 10s–100s of highly parallelizable model runs after optimization, (2) nonlinear confidence intervals which require a smooth objective function surface and Gaussian observation error assumptions and typically 100s–1,000s of partially parallelizable model runs after optimization, and (3) MCMC Bayesian credible intervals which require few assumptions and commonly 10,000s–100,000s or more partially parallelizable model runs. Ready access allows users to select methods best suited to their work, and to compare methods in many circumstances.
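
    To make the idea of an MCMC-based credible interval concrete, the sketch below uses a plain random-walk Metropolis sampler, which is much simpler than the DREAM algorithm described above and is swapped in purely for illustration. The one-parameter target density, the step size and all function names are assumptions; nothing here reflects the UCODE_2014 interface.

```python
import math
import random


def log_posterior(theta):
    # Hypothetical unnormalized log posterior: a standard normal density.
    return -0.5 * theta * theta


def metropolis(log_post, start, n_samples, step, rng):
    """Random-walk Metropolis sampler for a one-dimensional target."""
    samples = []
    current = start
    current_lp = log_post(current)
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


def credible_interval(samples, level=0.95):
    """Equal-tailed credible interval from empirical quantiles of the draws."""
    ordered = sorted(samples)
    tail = (1.0 - level) / 2.0
    lower = ordered[int(tail * (len(ordered) - 1))]
    upper = ordered[int((1.0 - tail) * (len(ordered) - 1))]
    return lower, upper
```

    A typical use would discard an initial burn-in portion of the chain and pass the remaining draws to `credible_interval`; for the assumed standard normal target the 95% interval should land near (-1.96, 1.96).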

  20. [Determination of wine original regions using information fusion of NIR and MIR spectroscopy].

    PubMed

    Xiang, Ling-Li; Li, Meng-Hua; Li, Jing-Mingz; Li, Jun-Hui; Zhang, Lu-Da; Zhao, Long-Lian

    2014-10-01

    Geographical origins of wine grapes are significant factors affecting wine quality and wine prices. Tasters' evaluation is a good method but has some limitations. It is important to discriminate between wine regions of origin quickly and accurately. The present paper proposes a method to determine wine regions of origin based on Bayesian information fusion of near-infrared (NIR) transmission spectra and mid-infrared (MIR) ATR spectra of wines. This method improved the determination results by expanding the sources of analysis information. NIR spectra and MIR spectra of 153 wine samples from four different grape-growing regions were collected by near-infrared and mid-infrared Fourier transform spectrometers, respectively. These four regions are Huailai, Yantai, Gansu and Changli, which are all typical geographical origins of Chinese wines. NIR and MIR discriminant models for wine regions were established using partial least squares discriminant analysis (PLS-DA) based on NIR spectra and MIR spectra separately. In PLS-DA, the regions of the wine samples are represented as groups of binary codes. There are four wine regions in this paper, so four output nodes stand for the categorical variable. The output node values for each sample in the NIR and MIR models were first normalized. These values stand for the probabilities of each sample belonging to each category. They served as the input to the Bayesian discriminant formula as prior probability values. The probabilities were substituted into the Bayesian formula to obtain posterior probabilities, by which the class of each sample can be judged. To account for the stability of the PLS-DA models, all the wine samples were randomly divided into calibration sets and validation sets ten times. 
The results of the NIR and MIR discriminant models for the four wine regions were as follows: the average accuracy rates of the calibration sets were 78.21% (NIR) and 82.57% (MIR), and the average accuracy rates of the validation sets were 82.50% (NIR) and 81.98% (MIR). With the method proposed in this paper, the accuracy rates of calibration and validation changed to 87.11% and 90.87%, respectively, both better than those obtained with either spectroscopy alone. These results suggest that Bayesian information fusion of NIR and MIR spectra is feasible for fast identification of wine regions of origin.
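
    The abstract does not give the exact fusion formula, so the following is only a sketch of one common way to combine two normalized class-probability vectors with Bayes' rule: one vector is treated as the prior, the other as a likelihood, and the two sources are assumed conditionally independent. The function name and the example numbers are hypothetical.

```python
def fuse_probabilities(p_nir, p_mir):
    """Combine two class-probability vectors by Bayes' rule.

    Assumes (for illustration only) that p_nir acts as the prior over classes
    and p_mir as a likelihood that is conditionally independent of the NIR data.
    """
    if len(p_nir) != len(p_mir):
        raise ValueError("both vectors must cover the same classes")
    joint = [a * b for a, b in zip(p_nir, p_mir)]
    total = sum(joint)
    if total == 0.0:
        raise ValueError("the two sources assign zero joint probability to every class")
    # Normalizing the products gives the posterior probability of each class.
    return [value / total for value in joint]


# Hypothetical normalized outputs for the four regions
# (Huailai, Yantai, Gansu, Changli) from the two PLS-DA models.
nir = [0.5, 0.3, 0.1, 0.1]
mir = [0.4, 0.4, 0.1, 0.1]
posterior = fuse_probabilities(nir, mir)
predicted_class = posterior.index(max(posterior))
```

    In this made-up case the MIR model alone cannot separate the first two regions, but the fused posterior favours the first one because the NIR evidence breaks the tie.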

  1. Dynamic Bayesian wavelet transform: New methodology for extraction of repetitive transients

    NASA Astrophysics Data System (ADS)

    Wang, Dong; Tsui, Kwok-Leung

    2017-05-01

    Building on recent research, this short communication proposes the dynamic Bayesian wavelet transform as a new methodology for extraction of repetitive transients, in order to reveal fault signatures hidden in rotating machines. The main idea of the dynamic Bayesian wavelet transform is to iteratively estimate posterior parameters of the wavelet transform via artificial observations and dynamic Bayesian inference. First, a prior wavelet parameter distribution can be established by one of many fast detection algorithms, such as the fast kurtogram, the improved kurtogram, the enhanced kurtogram, the sparsogram, the infogram, continuous wavelet transform, discrete wavelet transform, wavelet packets, multiwavelets, empirical wavelet transform, empirical mode decomposition, local mean decomposition, etc. Second, artificial observations can be constructed based on one of many metrics, such as kurtosis, the sparsity measurement, entropy, approximate entropy, the smoothness index, a synthesized criterion, etc., which are able to quantify repetitive transients. Finally, given the artificial observations, the prior wavelet parameter distribution can be updated to a posterior over iterations by using dynamic Bayesian inference. More importantly, the proposed methodology can be extended to establish the optimal parameters required by many other signal processing methods for extraction of repetitive transients.

  2. Bayesian posterior distributions without Markov chains.

    PubMed

    Cole, Stephen R; Chu, Haitao; Greenland, Sander; Hamra, Ghassan; Richardson, David B

    2012-03-01

    Bayesian posterior parameter distributions are often simulated using Markov chain Monte Carlo (MCMC) methods. However, MCMC methods are not always necessary and do not help the uninitiated understand Bayesian inference. As a bridge to understanding Bayesian inference, the authors illustrate a transparent rejection sampling method. In example 1, they illustrate rejection sampling using 36 cases and 198 controls from a case-control study (1976-1983) assessing the relation between residential exposure to magnetic fields and the development of childhood cancer. Results from rejection sampling (odds ratio (OR) = 1.69, 95% posterior interval (PI): 0.57, 5.00) were similar to MCMC results (OR = 1.69, 95% PI: 0.58, 4.95) and approximations from data-augmentation priors (OR = 1.74, 95% PI: 0.60, 5.06). In example 2, the authors apply rejection sampling to a cohort study of 315 human immunodeficiency virus seroconverters (1984-1998) to assess the relation between viral load after infection and 5-year incidence of acquired immunodeficiency syndrome, adjusting for (continuous) age at seroconversion and race. In this more complex example, rejection sampling required a notably longer run time than MCMC sampling but remained feasible and again yielded similar results. The transparency of the proposed approach comes at a price of being less broadly applicable than MCMC.
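
    The rejection sampling idea can be shown on a much simpler problem than the case-control and cohort examples above. The sketch below draws from the posterior of a single binomial proportion under a uniform prior; the data (7 events in 20 trials) and the function name are hypothetical, and the model is swapped in for illustration rather than taken from the paper.

```python
import random


def rejection_sample_posterior(successes, trials, n_draws, rng):
    """Posterior draws for a binomial proportion under a Uniform(0, 1) prior.

    Candidates come from the prior and are accepted with probability
    likelihood(theta) / max likelihood, so accepted values follow the posterior.
    """
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")

    def likelihood(theta):
        return theta ** successes * (1.0 - theta) ** (trials - successes)

    # The likelihood is maximized at the observed proportion.
    max_likelihood = likelihood(successes / trials)
    accepted = []
    while len(accepted) < n_draws:
        theta = rng.random()  # candidate drawn from the uniform prior
        if rng.random() < likelihood(theta) / max_likelihood:
            accepted.append(theta)
    return accepted


# Hypothetical data: 7 events in 20 trials.
draws = rejection_sample_posterior(7, 20, 5000, random.Random(1))
posterior_mean = sum(draws) / len(draws)
```

    For this conjugate case the exact posterior is Beta(8, 14), so the sample mean can be checked against the analytic value 8/22. The same accept/reject logic applies when no closed form exists, at the cost of many rejected candidates.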

  3. Objectified quantification of uncertainties in Bayesian atmospheric inversions

    NASA Astrophysics Data System (ADS)

    Berchet, A.; Pison, I.; Chevallier, F.; Bousquet, P.; Bonne, J.-L.; Paris, J.-D.

    2015-05-01

    Classical Bayesian atmospheric inversions process atmospheric observations and prior emissions, the two being connected by an observation operator picturing mainly the atmospheric transport. These inversions rely on prescribed errors in the observations, the prior emissions and the observation operator. When data are sparse, inversion results are very sensitive to the prescribed error distributions, which are not accurately known. The classical Bayesian framework experiences difficulties in quantifying the impact of mis-specified error distributions on the optimized fluxes. In order to cope with this issue, we rely on recent research results to enhance the classical Bayesian inversion framework through a marginalization on a large set of plausible errors that can be prescribed in the system. The marginalization consists in computing inversions for all possible error distributions weighted by the probability of occurrence of the error distributions. The posterior distribution of the fluxes calculated by the marginalization is not explicitly describable. As a consequence, we carry out a Monte Carlo sampling based on an approximation of the probability of occurrence of the error distributions. This approximation is deduced from the well-tested method of the maximum likelihood estimation. Thus, the marginalized inversion relies on an automatic objectified diagnosis of the error statistics, without any prior knowledge about the matrices. It robustly accounts for the uncertainties on the error distributions, contrary to what is classically done with frozen expert-knowledge error statistics. Some expert knowledge is still used in the method for the choice of an emission aggregation pattern and of a sampling protocol in order to reduce the computation cost. The relevance and the robustness of the method are tested on a case study: the inversion of methane surface fluxes at the mesoscale with virtual observations on a realistic network in Eurasia. 
Observing system simulation experiments are carried out with different transport patterns, flux distributions and total prior amounts of emitted methane. The method proves to consistently reproduce the known "truth" in most cases, with satisfactory tolerance intervals. Additionally, the method explicitly provides influence scores and posterior correlation matrices. An in-depth interpretation of the inversion results is then possible. The more objective quantification of the influence of the observations on the fluxes proposed here allows us to evaluate the impact of the observation network on the characterization of the surface fluxes. The explicit correlations between emission aggregates reveal the mis-separated regions, hence the typical temporal and spatial scales the inversion can analyse. These scales are consistent with the chosen aggregation patterns.

  4. An approach to quantifying the efficiency of a Bayesian filter

    USDA-ARS's Scientific Manuscript database

    Data assimilation is defined as the Bayesian conditioning of uncertain model simulations on observations for the purpose of reducing uncertainty about model states. Practical data assimilation applications require that simplifying assumptions be made about the prior and posterior state distributions...

  5. INFERRING THE ECCENTRICITY DISTRIBUTION

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hogg, David W.; Bovy, Jo; Myers, Adam D.

    2010-12-20

    Standard maximum-likelihood estimators for binary-star and exoplanet eccentricities are biased high, in the sense that the estimated eccentricity tends to be larger than the true eccentricity. As with most non-trivial observables, a simple histogram of estimated eccentricities is not a good estimate of the true eccentricity distribution. Here, we develop and test a hierarchical probabilistic method for performing the relevant meta-analysis, that is, inferring the true eccentricity distribution, taking as input the likelihood functions for the individual star eccentricities, or samplings of the posterior probability distributions for the eccentricities (under a given, uninformative prior). The method is a simple implementation of a hierarchical Bayesian model; it can also be seen as a kind of heteroscedastic deconvolution. It can be applied to any quantity measured with finite precision (other orbital parameters, or indeed any astronomical measurements of any kind, including magnitudes, distances, or photometric redshifts), so long as the measurements have been communicated as a likelihood function or a posterior sampling.

  6. A comparison of two worlds: How does Bayes hold up to the status quo for the analysis of clinical trials?

    PubMed

    Pressman, Alice R; Avins, Andrew L; Hubbard, Alan; Satariano, William A

    2011-07-01

    There is a paucity of literature comparing Bayesian analytic techniques with traditional approaches for analyzing clinical trials using real trial data. We compared Bayesian and frequentist group sequential methods using data from two published clinical trials. We chose two widely accepted frequentist rules, O'Brien-Fleming and Lan-DeMets, and conjugate Bayesian priors. Using the nonparametric bootstrap, we estimated a sampling distribution of stopping times for each method. Because current practice dictates the preservation of an experiment-wise false positive rate (Type I error), we approximated these error rates for our Bayesian and frequentist analyses with the posterior probability of detecting an effect in a simulated null sample. Thus for the data-generated distribution represented by these trials, we were able to compare the relative performance of these techniques. No final outcomes differed from those of the original trials. However, the timing of trial termination differed substantially by method and varied by trial. For one trial, group sequential designs of either type dictated early stopping of the study. In the other, stopping times were dependent upon the choice of spending function and prior distribution. Results indicate that trialists ought to consider Bayesian methods in addition to traditional approaches for analysis of clinical trials. Though findings from this small sample did not demonstrate either method to consistently outperform the other, they did suggest the need to replicate these comparisons using data from varied clinical trials in order to determine the conditions under which the different methods would be most efficient. Copyright © 2011 Elsevier Inc. All rights reserved.
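
    As a rough illustration of how a conjugate prior yields the posterior probability of an effect at an interim look, the sketch below uses a normal prior with a normal likelihood. This is an assumed model chosen for simplicity; the abstract does not specify the priors or outcome types used, and the numbers and function names are hypothetical.

```python
import math


def normal_posterior(prior_mean, prior_var, estimate, estimate_var):
    """Conjugate update: normal prior combined with a normal likelihood."""
    precision = 1.0 / prior_var + 1.0 / estimate_var
    post_var = 1.0 / precision
    # The posterior mean is a precision-weighted average of prior and data.
    post_mean = post_var * (prior_mean / prior_var + estimate / estimate_var)
    return post_mean, post_var


def prob_effect_positive(post_mean, post_var):
    """Posterior probability that the treatment effect exceeds zero."""
    z = post_mean / math.sqrt(post_var)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical interim analysis: sceptical prior centred on no effect,
# observed effect estimate 0.5 with variance 1.0.
mean, var = normal_posterior(0.0, 1.0, 0.5, 1.0)
p_benefit = prob_effect_positive(mean, var)
```

    A Bayesian stopping rule could then compare `p_benefit` with a pre-specified threshold at each look; the choice of threshold and prior is what drives the differences in stopping times discussed above.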

  7. A comparison of two worlds: How does Bayes hold up to the status quo for the analysis of clinical trials?

    PubMed Central

    Pressman, Alice R.; Avins, Andrew L.; Hubbard, Alan; Satariano, William A.

    2014-01-01

    Background: There is a paucity of literature comparing Bayesian analytic techniques with traditional approaches for analyzing clinical trials using real trial data. Methods: We compared Bayesian and frequentist group sequential methods using data from two published clinical trials. We chose two widely accepted frequentist rules, O'Brien–Fleming and Lan–DeMets, and conjugate Bayesian priors. Using the nonparametric bootstrap, we estimated a sampling distribution of stopping times for each method. Because current practice dictates the preservation of an experiment-wise false positive rate (Type I error), we approximated these error rates for our Bayesian and frequentist analyses with the posterior probability of detecting an effect in a simulated null sample. Thus for the data-generated distribution represented by these trials, we were able to compare the relative performance of these techniques. Results: No final outcomes differed from those of the original trials. However, the timing of trial termination differed substantially by method and varied by trial. For one trial, group sequential designs of either type dictated early stopping of the study. In the other, stopping times were dependent upon the choice of spending function and prior distribution. Conclusions: Results indicate that trialists ought to consider Bayesian methods in addition to traditional approaches for analysis of clinical trials. Though findings from this small sample did not demonstrate either method to consistently outperform the other, they did suggest the need to replicate these comparisons using data from varied clinical trials in order to determine the conditions under which the different methods would be most efficient. PMID:21453792

  8. A Novel Discrete Optimal Transport Method for Bayesian Inverse Problems

    NASA Astrophysics Data System (ADS)

    Bui-Thanh, T.; Myers, A.; Wang, K.; Thiery, A.

    2017-12-01

    We present the Augmented Ensemble Transform (AET) method for generating approximate samples from a high-dimensional posterior distribution as a solution to Bayesian inverse problems. Solving large-scale inverse problems is critical for some of the most relevant and impactful scientific endeavors of our time. Therefore, constructing novel methods for solving the Bayesian inverse problem in more computationally efficient ways can have a profound impact on the science community. This research derives the novel AET method for exploring a posterior by solving a sequence of linear programming problems, resulting in a series of transport maps which map prior samples to posterior samples, allowing for the computation of moments of the posterior. We show both theoretical and numerical results, indicating this method can offer superior computational efficiency when compared to other SMC methods. Most of this efficiency is derived from matrix scaling methods to solve the linear programming problem and derivative-free optimization for particle movement. We use this method to determine inter-well connectivity in a reservoir and the associated uncertainty related to certain parameters. The attached file shows the difference between the true parameter and the AET parameter in an example 3D reservoir problem. The error is within the Morozov discrepancy allowance with lower computational cost than other particle methods.

  9. Feasibility study of direct spectra measurements for Thomson scattered signals for KSTAR fusion-grade plasmas

    NASA Astrophysics Data System (ADS)

    Park, K.-R.; Kim, K.-h.; Kwak, S.; Svensson, J.; Lee, J.; Ghim, Y.-c.

    2017-11-01

    A feasibility study of direct spectra measurements of Thomson scattered photons for fusion-grade plasmas is performed based on a forward model of the KSTAR Thomson scattering system. Expected spectra in the forward model are calculated based on the Selden function, including the relativistic polarization correction. Noise in the signal is modeled with photon noise and Gaussian electrical noise. Electron temperature and density are inferred using Bayesian probability theory. Based on the bias error, full width at half maximum and entropy of the posterior distributions, spectral measurements are found to be feasible. Comparisons between spectrometer-based and polychromator-based Thomson scattering systems are performed with varying quantum efficiency and electrical noise levels.

  10. Bayesian generalized least squares regression with application to log Pearson type 3 regional skew estimation

    NASA Astrophysics Data System (ADS)

    Reis, D. S.; Stedinger, J. R.; Martins, E. S.

    2005-10-01

    This paper develops a Bayesian approach to analysis of a generalized least squares (GLS) regression model for regional analyses of hydrologic data. The new approach allows computation of the posterior distributions of the parameters and the model error variance using a quasi-analytic approach. Two regional skew estimation studies illustrate the value of the Bayesian GLS approach for regional statistical analysis of a shape parameter and demonstrate that regional skew models can be relatively precise with effective record lengths in excess of 60 years. With Bayesian GLS the marginal posterior distribution of the model error variance and the corresponding mean and variance of the parameters can be computed directly, thereby providing a simple but important extension of the regional GLS regression procedures popularized by Tasker and Stedinger (1989), which is sensitive to the likely values of the model error variance when it is small relative to the sampling error in the at-site estimator.

  11. The Efficacy of Consensus Tree Methods for Summarizing Phylogenetic Relationships from a Posterior Sample of Trees Estimated from Morphological Data.

    PubMed

    O'Reilly, Joseph E; Donoghue, Philip C J

    2018-03-01

    Consensus trees are required to summarize trees obtained through MCMC sampling of a posterior distribution, providing an overview of the distribution of estimated parameters such as topology, branch lengths, and divergence times. Numerous consensus tree construction methods are available, each presenting a different interpretation of the tree sample. The rise of morphological clock and sampled-ancestor methods of divergence time estimation, in which times and topology are coestimated, has increased the popularity of the maximum clade credibility (MCC) consensus tree method. The MCC method assumes that the sampled, fully resolved topology with the highest clade credibility is an adequate summary of the most probable clades, with parameter estimates from compatible sampled trees used to obtain the marginal distributions of parameters such as clade ages and branch lengths. Using both simulated and empirical data, we demonstrate that MCC trees, and trees constructed using the similar maximum a posteriori (MAP) method, often include poorly supported and incorrect clades when summarizing diffuse posterior samples of trees. We demonstrate that the paucity of information in morphological data sets contributes to the inability of MCC and MAP trees to accurately summarise the posterior distribution. Conversely, majority-rule consensus (MRC) trees represent a lower proportion of incorrect nodes when summarizing the same posterior samples of trees. Thus, we advocate the use of MRC trees, in place of MCC or MAP trees, in attempts to summarize the results of Bayesian phylogenetic analyses of morphological data.
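
    The core of a majority-rule consensus is a count of how often each clade appears in the posterior sample. The sketch below assumes a deliberately simple, hypothetical representation in which each sampled tree is a set of clades and each clade is a frozenset of taxon names; real software works on tree objects and also assembles the retained clades into a tree.

```python
from collections import Counter


def majority_rule_clades(trees, threshold=0.5):
    """Return clades whose posterior frequency exceeds the threshold.

    `trees` is an iterable of sets of clades; each clade is a frozenset of
    taxon names (an assumed representation, chosen for illustration).
    """
    trees = list(trees)
    if not trees:
        raise ValueError("need at least one sampled tree")
    # Count each clade once per tree in which it occurs.
    counts = Counter(clade for tree in trees for clade in set(tree))
    n_trees = len(trees)
    return {
        clade: count / n_trees
        for clade, count in counts.items()
        if count / n_trees > threshold
    }


# Hypothetical posterior sample of three trees on taxa A, B, C, D.
sample = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AC"), frozenset("ABC")},
]
support = majority_rule_clades(sample)
```

    Here the clade {A, C} appears in only one of three trees and is dropped, which is how a majority-rule summary avoids reporting poorly supported groupings that a single fully resolved tree would have to include.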

  12. The Efficacy of Consensus Tree Methods for Summarizing Phylogenetic Relationships from a Posterior Sample of Trees Estimated from Morphological Data

    PubMed Central

    O’Reilly, Joseph E; Donoghue, Philip C J

    2018-01-01

    Abstract: Consensus trees are required to summarize trees obtained through MCMC sampling of a posterior distribution, providing an overview of the distribution of estimated parameters such as topology, branch lengths, and divergence times. Numerous consensus tree construction methods are available, each presenting a different interpretation of the tree sample. The rise of morphological clock and sampled-ancestor methods of divergence time estimation, in which times and topology are coestimated, has increased the popularity of the maximum clade credibility (MCC) consensus tree method. The MCC method assumes that the sampled, fully resolved topology with the highest clade credibility is an adequate summary of the most probable clades, with parameter estimates from compatible sampled trees used to obtain the marginal distributions of parameters such as clade ages and branch lengths. Using both simulated and empirical data, we demonstrate that MCC trees, and trees constructed using the similar maximum a posteriori (MAP) method, often include poorly supported and incorrect clades when summarizing diffuse posterior samples of trees. We demonstrate that the paucity of information in morphological data sets contributes to the inability of MCC and MAP trees to accurately summarise the posterior distribution. Conversely, majority-rule consensus (MRC) trees represent a lower proportion of incorrect nodes when summarizing the same posterior samples of trees. Thus, we advocate the use of MRC trees, in place of MCC or MAP trees, in attempts to summarize the results of Bayesian phylogenetic analyses of morphological data. PMID:29106675

  13. Understanding seasonal variability of uncertainty in hydrological prediction

    NASA Astrophysics Data System (ADS)

    Li, M.; Wang, Q. J.

    2012-04-01

    Understanding uncertainty in hydrological prediction can be highly valuable for improving the reliability of streamflow prediction. In this study, a monthly water balance model, WAPABA, is combined with error models in a Bayesian joint probability framework to investigate the seasonal dependency of the prediction error structure. A seasonal invariant error model, analogous to traditional time series analysis, uses constant parameters for model error and accounts for no seasonal variation. In contrast, a seasonal variant error model uses a different set of parameters for bias, variance and autocorrelation for each individual calendar month. Potential connections amongst model parameters from similar months are not considered within the seasonal variant model, which could result in over-fitting and over-parameterization. A hierarchical error model further applies some distributional restrictions on model parameters within a Bayesian hierarchical framework. An iterative algorithm is implemented to expedite the maximum a posteriori (MAP) estimation of the hierarchical error model. The three error models are applied to forecasting streamflow at a catchment in southeast Australia in a cross-validation analysis. This study also presents a number of statistical measures and graphical tools to compare the predictive skills of the different error models. From probability integral transform histograms and other diagnostic graphs, the hierarchical error model conforms better to reliability when compared to the seasonal invariant error model. The hierarchical error model also generally provides the most accurate mean prediction in terms of the Nash-Sutcliffe model efficiency coefficient and the best probabilistic prediction in terms of the continuous ranked probability score (CRPS). The model parameters of the seasonal variant error model are very sensitive to each cross validation, while the hierarchical error model produces much more robust and reliable model parameters. 
Furthermore, the results of the hierarchical error model show that most of the model parameters, except for error bias, do not vary seasonally. The seasonal variant error model is likely to use more parameters than necessary to maximize the posterior likelihood. The model flexibility and robustness indicate that the hierarchical error model has great potential for future streamflow predictions.

  14. Variable selection models for genomic selection using whole-genome sequence data and singular value decomposition.

    PubMed

    Meuwissen, Theo H E; Indahl, Ulf G; Ødegård, Jørgen

    2017-12-27

    Non-linear Bayesian genomic prediction models such as BayesA/B/C/R involve iteration and mostly Markov chain Monte Carlo (MCMC) algorithms, which are computationally expensive, especially when whole-genome sequence (WGS) data are analyzed. Singular value decomposition (SVD) of the genotype matrix can facilitate genomic prediction in large datasets, and can be used to estimate marker effects and their prediction error variances (PEV) in a computationally efficient manner. Here, we developed, implemented, and evaluated a direct, non-iterative method for the estimation of marker effects for the BayesC genomic prediction model. The BayesC model assumes a priori that markers have normally distributed effects with probability π and no effect with probability (1 - π). Marker effects and their PEV are estimated by using SVD and the posterior probability of the marker having a non-zero effect is calculated. These posterior probabilities are used to obtain marker-specific effect variances, which are subsequently used to approximate BayesC estimates of marker effects in a linear model. A computer simulation study was conducted to compare alternative genomic prediction methods, where a single reference generation was used to estimate marker effects, which were subsequently used for 10 generations of forward prediction, for which accuracies were evaluated. SVD-based posterior probabilities of markers having non-zero effects were generally lower than MCMC-based posterior probabilities, but for some regions the opposite occurred, resulting in clear signals for QTL-rich regions. The accuracies of breeding values estimated using SVD- and MCMC-based BayesC analyses were similar across the 10 generations of forward prediction. 
For an intermediate number of generations (2 to 5) of forward prediction, accuracies obtained with the BayesC model tended to be slightly higher than accuracies obtained using the best linear unbiased prediction of SNP effects (SNP-BLUP model). When reducing marker density from WGS data to 30 K, SNP-BLUP tended to yield the highest accuracies, at least in the short term. Based on SVD of the genotype matrix, we developed a direct method for the calculation of BayesC estimates of marker effects. Although SVD- and MCMC-based marker effects differed slightly, their prediction accuracies were similar. Assuming that the SVD of the marker genotype matrix is already performed for other reasons (e.g. for SNP-BLUP), computation times for the BayesC predictions were comparable to those of SNP-BLUP.

  15. Uncertainty Quantification of Hypothesis Testing for the Integrated Knowledge Engine

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cuellar, Leticia

    2012-05-31

    The Integrated Knowledge Engine (IKE) is a tool for Bayesian analysis, based on Bayesian Belief Networks, or Bayesian networks for short. A Bayesian network is a graphical model (directed acyclic graph) that allows representing the probabilistic structure of many variables assuming a localized type of dependency called the Markov property. The Markov property in this instance makes any node or random variable independent of any non-descendant node given information about its parents. A direct consequence of this property is that it is relatively easy to incorporate new evidence and derive the appropriate consequences, which in general is not an easy or feasible task. Typically we use Bayesian networks as predictive models for a small subset of the variables, either the leaf nodes or the root nodes. In IKE, since most applications deal with diagnostics, we are interested in predicting the likelihood of the root nodes given new observations on any of the child nodes. The root nodes represent the various possible outcomes of the analysis, and an important problem is to determine when we have gathered enough evidence to lean toward one of these particular outcomes. This document presents criteria to decide when the evidence gathered is sufficient to draw a particular conclusion or decide in favor of a particular outcome by quantifying the uncertainty in the conclusions that are drawn from the data. The material in this document is organized as follows: Section 2 briefly presents a forensics Bayesian network, and we explore evaluating the information provided by new evidence by looking first at the posterior distribution of the nodes of interest, and then at the corresponding posterior odds ratios. Section 3 presents a third alternative: Bayes factors.
In section 4 we finish by showing the relation between the posterior odds ratios and Bayes factors and showing examples of these cases, and in section 5 we conclude by providing clear guidelines on how to use these for the type of Bayesian networks used in IKE.
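
    The relation between posterior odds ratios and Bayes factors mentioned above can be written out in its standard textbook form. The symbols H_1, H_2 (two competing root-node outcomes) and E (the observed evidence) are notation assumed here for illustration and are not taken from the report itself:

```latex
\[
\underbrace{\frac{P(H_1 \mid E)}{P(H_2 \mid E)}}_{\text{posterior odds}}
=
\underbrace{\frac{P(E \mid H_1)}{P(E \mid H_2)}}_{\text{Bayes factor}}
\times
\underbrace{\frac{P(H_1)}{P(H_2)}}_{\text{prior odds}}
\]
```

    Under equal prior odds the posterior odds ratio and the Bayes factor coincide, which is one reason the two criteria are often discussed together.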

  16. The pharmacokinetics of dexmedetomidine during long-term infusion in critically ill pediatric patients. A Bayesian approach with informative priors.

    PubMed

    Wiczling, Paweł; Bartkowska-Śniatkowska, Alicja; Szerkus, Oliwia; Siluk, Danuta; Rosada-Kurasińska, Jowita; Warzybok, Justyna; Borsuk, Agnieszka; Kaliszan, Roman; Grześkowiak, Edmund; Bienert, Agnieszka

    2016-06-01

    The purpose of this study was to assess the pharmacokinetics of dexmedetomidine in the ICU setting during prolonged infusion and to compare it with existing literature data using Bayesian population modeling with literature-based informative priors. Thirty-eight patients were included in the analysis, with concentration measurements obtained on two occasions: first from 0 to 24 h after infusion initiation and second from 0 to 8 h after infusion end. Data analysis was conducted using WinBUGS software. The prior information on dexmedetomidine pharmacokinetics was elicited from a literature study pooling results from a relatively large group of 95 children. A two-compartment PK model, with allometrically scaled parameters, maturation of clearance, and a Student's t residual distribution on a log scale, was used to describe the data. The incorporation of time-dependent (different between the two occasions) PK parameters improved the model. The volume of distribution was 1.5-fold higher during the second occasion. There was also evidence of increased (1.3-fold) clearance for the second occasion, with a posterior probability of 62%. This work demonstrated the usefulness of Bayesian modeling with informative priors in analyzing pharmacokinetic data and comparing it with existing literature knowledge.

  17. Bayesian evidence for non-zero θ13 and CP-violation in neutrino oscillations

    NASA Astrophysics Data System (ADS)

    Bergström, Johannes

    2012-08-01

    We present the Bayesian method for evaluating the evidence for a non-zero value of the leptonic mixing angle θ13 and CP-violation in neutrino oscillation experiments. This is an application of the well-established method of Bayesian model selection, of which we give a concise and pedagogical overview. When comparing the hypothesis θ13 = 0 with hypotheses where θ13 > 0 using global data but excluding the recent reactor measurements, we obtain only a weak preference for a non-zero θ13, even though the significance is over 3σ. We then add the reactor measurements one by one and show how the evidence for θ13 > 0 quickly increases. When including the Double Chooz, Daya Bay, and RENO data, the evidence becomes overwhelming with a posterior probability of the hypothesis θ13 = 0 below 10^-11. Owing to the small amount of information on the CP-phase δ, very similar evidences are obtained for the CP-conserving and CP-violating hypotheses. Hence, there is, not unexpectedly, neither evidence for nor against leptonic CP-violation. However, when future experiments aiming to search for CP-violation have started taking data, this question will be of great importance and the method described here can be used as an important complement to standard analyses.

  18. Quantifying temporal trends in fisheries abundance using Bayesian dynamic linear models: A case study of riverine Smallmouth Bass populations

    USGS Publications Warehouse

    Schall, Megan K.; Blazer, Vicki S.; Lorantas, Robert M.; Smith, Geoffrey; Mullican, John E.; Keplinger, Brandon J.; Wagner, Tyler

    2018-01-01

    Detecting temporal changes in fish abundance is an essential component of fisheries management. Because of the need to understand short‐term and nonlinear changes in fish abundance, traditional linear models may not provide adequate information for management decisions. This study highlights the utility of Bayesian dynamic linear models (DLMs) as a tool for quantifying temporal dynamics in fish abundance. To achieve this goal, we quantified temporal trends of Smallmouth Bass Micropterus dolomieu catch per effort (CPE) from rivers in the mid‐Atlantic states, and we calculated annual probabilities of decline from the posterior distributions of annual rates of change in CPE. We were interested in annual declines because of recent concerns about fish health in portions of the study area. In general, periods of decline were greatest within the Susquehanna River basin, Pennsylvania. The declines in CPE began in the late 1990s—prior to observations of fish health problems—and began to stabilize toward the end of the time series (2011). In contrast, many of the other rivers investigated did not have the same magnitude or duration of decline in CPE. Bayesian DLMs provide information about annual changes in abundance that can inform management and are easily communicated with managers and stakeholders.
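
    As a rough illustration of how an annual probability of decline can be read off a posterior distribution, the sketch below counts the fraction of posterior draws of the rate of change that fall below zero. The function name and the simulated draws are hypothetical and stand in for MCMC output from a fitted DLM; they are not the study's data or code.

```python
import random


def probability_of_decline(rate_draws):
    """Fraction of posterior draws of the annual rate of change that are negative."""
    draws = list(rate_draws)
    return sum(1 for r in draws if r < 0) / len(draws)


# Hypothetical posterior draws for one year's rate of change in CPE
# (simulated here only so the example runs; real draws would come from the DLM fit).
rng = random.Random(0)
draws = [rng.gauss(-0.05, 0.04) for _ in range(10_000)]
p_decline = probability_of_decline(draws)
```

    Repeating this for each year's draws gives a time series of decline probabilities, which is the kind of summary the abstract describes as easy to communicate to managers.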

  19. Probabilistic techniques for obtaining accurate patient counts in Clinical Data Warehouses

    PubMed Central

    Myers, Risa B.; Herskovic, Jorge R.

    2011-01-01

    Proposal and execution of clinical trials, computation of quality measures and discovery of correlation between medical phenomena are all applications where an accurate count of patients is needed. However, existing sources of this type of patient information, including Clinical Data Warehouses (CDW), may be incomplete or inaccurate. This research explores applying probabilistic techniques, supported by the MayBMS probabilistic database, to obtain accurate patient counts from a clinical data warehouse containing synthetic patient data. We present a synthetic clinical data warehouse (CDW), and populate it with simulated data using a custom patient data generation engine. We then implement, evaluate and compare different techniques for obtaining patient counts. We model billing as a test for the presence of a condition. We compute billing’s sensitivity and specificity both by conducting a “Simulated Expert Review” where a representative sample of records is reviewed and labeled by experts, and by obtaining the ground truth for every record. In the first method, we compute the posterior probability of a patient having a condition through a “Bayesian Chain”, using Bayes’ Theorem to calculate the probability of a patient having a condition after each visit. The second method is a “one-shot” approach that computes the probability of a patient having a condition based on whether the patient is ever billed for the condition. Our results demonstrate the utility of probabilistic approaches, which improve on the accuracy of raw counts. In particular, the simulated review paired with a single application of Bayes’ Theorem produces the best results, with an average error rate of 2.1% compared to 43.7% for the straightforward billing counts. Overall, this research demonstrates that Bayesian probabilistic approaches improve patient counts on simulated patient populations.
We believe that total patient counts based on billing data are one of the many possible applications of our Bayesian framework. Use of these probabilistic techniques will enable more accurate patient counts and better results for applications requiring this metric. PMID:21986292
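
    The two approaches described in this abstract can be sketched as follows, treating each billing event as a diagnostic test with a given sensitivity and specificity. This is a minimal illustration under assumed values: the function names, the prior of 0.1, and the sensitivity and specificity figures are hypothetical and are not taken from the paper or from MayBMS.

```python
def bayes_update(prior, billed, sensitivity, specificity):
    """One application of Bayes' theorem for a binary test result."""
    if billed:
        like_cond, like_no_cond = sensitivity, 1.0 - specificity
    else:
        like_cond, like_no_cond = 1.0 - sensitivity, specificity
    numerator = like_cond * prior
    return numerator / (numerator + like_no_cond * (1.0 - prior))


def bayesian_chain(prior, visits, sensitivity, specificity):
    """Update after every visit; each posterior becomes the next prior."""
    p = prior
    for billed in visits:
        p = bayes_update(p, billed, sensitivity, specificity)
    return p


def one_shot(prior, visits, sensitivity, specificity):
    """Single update based on whether the patient was ever billed.

    Here sensitivity/specificity are assumed to describe the
    'ever billed' test rather than a single visit.
    """
    return bayes_update(prior, any(visits), sensitivity, specificity)


# Hypothetical inputs: 10% prevalence, billing sensitivity 0.8, specificity 0.95.
visits = [True, True]
p_chain = bayesian_chain(0.1, visits, 0.8, 0.95)
p_once = one_shot(0.1, visits, 0.8, 0.95)
```

    Summing the per-patient posterior probabilities over a population then gives an expected patient count, in place of a raw count of billed patients.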

  20. Prediction of road accidents: A Bayesian hierarchical approach.

    PubMed

    Deublein, Markus; Schubert, Matthias; Adey, Bryan T; Köhler, Jochen; Faber, Michael H

    2013-03-01

    In this paper a novel methodology for the prediction of the occurrence of road accidents is presented. The methodology utilizes a combination of three statistical methods: (1) gamma-updating of the occurrence rates of injury accidents and injured road users, (2) hierarchical multivariate Poisson-lognormal regression analysis taking into account correlations amongst multiple dependent model response variables and effects of discrete accident count data e.g. over-dispersion, and (3) Bayesian inference algorithms, which are applied by means of data mining techniques supported by Bayesian Probabilistic Networks in order to represent non-linearity between risk indicating and model response variables, as well as different types of uncertainties which might be present in the development of the specific models. Prior Bayesian Probabilistic Networks are first established by means of multivariate regression analysis of the observed frequencies of the model response variables, e.g. the occurrence of an accident, and observed values of the risk indicating variables, e.g. degree of road curvature. Subsequently, parameter learning is done using updating algorithms, to determine the posterior predictive probability distributions of the model response variables, conditional on the values of the risk indicating variables. The methodology is illustrated through a case study using data of the Austrian rural motorway network. In the case study, on randomly selected road segments the methodology is used to produce a model to predict the expected number of accidents in which an injury has occurred and the expected number of light, severe and fatally injured road users. Additionally, the methodology is used for geo-referenced identification of road sections with increased occurrence probabilities of injury accident events on a road link between two Austrian cities. 
It is shown that the proposed methodology can be used to develop models to estimate the occurrence of road accidents for any road network provided that the required data are available. Copyright © 2012 Elsevier Ltd. All rights reserved.
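
    Step (1) of the methodology, gamma-updating of occurrence rates, is commonly done with the conjugate Gamma-Poisson update, sketched below. The abstract does not give the authors' exact parameterization, so the shape/rate convention, the function names, and the numbers are assumptions made for illustration only.

```python
def gamma_poisson_update(shape, rate, count, exposure):
    """Conjugate update of a Gamma(shape, rate) prior on a Poisson rate.

    count:    number of accidents observed
    exposure: amount of exposure over which they were observed
              (e.g. segment-years or vehicle-kilometres)
    """
    return shape + count, rate + exposure


def gamma_mean(shape, rate):
    """Mean of a Gamma distribution in the shape/rate parameterization."""
    return shape / rate


# Hypothetical prior and data for one road segment.
prior_shape, prior_rate = 2.0, 4.0          # prior mean 0.5 accidents per unit exposure
post_shape, post_rate = gamma_poisson_update(prior_shape, prior_rate, count=3, exposure=2.0)
posterior_mean = gamma_mean(post_shape, post_rate)
```

    The posterior mean lies between the prior mean and the observed rate (count divided by exposure), with more exposure pulling it toward the observed rate.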

  1. An adaptive sparse-grid high-order stochastic collocation method for Bayesian inference in groundwater reactive transport modeling

    NASA Astrophysics Data System (ADS)

    Zhang, Guannan; Lu, Dan; Ye, Ming; Gunzburger, Max; Webster, Clayton

    2013-10-01

    Bayesian analysis has become vital to uncertainty quantification in groundwater modeling, but its application has been hindered by the computational cost associated with the numerous model executions required to explore the posterior probability density function (PPDF) of model parameters. This is particularly the case when the PPDF is estimated using Markov Chain Monte Carlo (MCMC) sampling. In this study, a new approach is developed to improve the computational efficiency of Bayesian inference by constructing a surrogate of the PPDF, using an adaptive sparse-grid high-order stochastic collocation (aSG-hSC) method. Unlike previous works using a first-order hierarchical basis, this paper utilizes a compactly supported higher-order hierarchical basis to construct the surrogate system, resulting in a significant reduction in the number of required model executions. In addition, using the hierarchical surplus as an error indicator allows locally adaptive refinement of sparse grids in the parameter space, which further improves computational efficiency. To efficiently build the surrogate system for a PPDF with multiple significant modes, optimization techniques are used to identify the modes, for which high-probability regions are defined and components of the aSG-hSC approximation are constructed. After the surrogate is determined, the PPDF can be evaluated by sampling the surrogate system directly without model execution, resulting in improved efficiency of the surrogate-based MCMC compared with conventional MCMC. The developed method is evaluated using two synthetic groundwater reactive transport models. The first example involves coupled linear reactions and demonstrates the accuracy of our high-order hierarchical basis approach in approximating high-dimensional posterior distributions.
The second example is highly nonlinear because of the reactions of uranium surface complexation, and demonstrates how the iterative aSG-hSC method is able to capture multimodal and non-Gaussian features of PPDF caused by model nonlinearity. Both experiments show that aSG-hSC is an effective and efficient tool for Bayesian inference.

  2. Exact posterior computation in non-conjugate Gaussian location-scale parameters models

    NASA Astrophysics Data System (ADS)

    Andrade, J. A. A.; Rathie, P. N.

    2017-12-01

    In Bayesian analysis, the class of conjugate models allows exact posterior distributions to be obtained; however, this class is quite restrictive in the sense that it involves only a few distributions. In fact, most practical applications involve non-conjugate models, so approximate methods, such as MCMC algorithms, are required. Although these methods can deal with quite complex structures, some practical problems can make their application quite time-demanding. For example, when we use heavy-tailed distributions, convergence may be difficult, and the Metropolis-Hastings algorithm can become very slow, in addition to the extra work inevitably required in choosing efficient candidate generator distributions. In this work, we draw attention to special functions as tools for Bayesian computation, and we propose an alternative method for obtaining the posterior distribution in Gaussian non-conjugate models in an exact form. We use complex integration methods based on the H-function in order to obtain the posterior distribution and some of its posterior quantities in an explicit computable form. Two examples are provided in order to illustrate the theory.

  3. Robust Bayesian Factor Analysis

    ERIC Educational Resources Information Center

    Hayashi, Kentaro; Yuan, Ke-Hai

    2003-01-01

    Bayesian factor analysis (BFA) assumes the normal distribution of the current sample conditional on the parameters. Practical data in social and behavioral sciences typically have significant skewness and kurtosis. If the normality assumption is not attainable, the posterior analysis will be inaccurate, although the BFA depends less on the current…

  4. Bayesian Semiparametric Structural Equation Models with Latent Variables

    ERIC Educational Resources Information Center

    Yang, Mingan; Dunson, David B.

    2010-01-01

    Structural equation models (SEMs) with latent variables are widely useful for sparse covariance structure modeling and for inferring relationships among latent variables. Bayesian SEMs are appealing in allowing for the incorporation of prior information and in providing exact posterior distributions of unknowns, including the latent variables. In…

  5. Systematic influences of gamma-ray spectrometry data near the decision threshold for radioactivity measurements in the environment.

    PubMed

    Zorko, Benjamin; Korun, Matjaž; Mora Canadas, Juan Carlos; Nicoulaud-Gouin, Valerie; Chyly, Pavol; Blixt Buhr, Anna Maria; Lager, Charlotte; Aquilonius, Karin; Krajewski, Pawel

    2016-07-01

    Several methods for reporting outcomes of gamma-ray spectrometric measurements of environmental samples for dose calculations are presented and discussed. The measurement outcomes can be reported as primary measurement results, primary measurement results modified according to the quantification limit, best estimates obtained by the Bayesian posterior (ISO 11929), best estimates obtained by the probability density distribution resembling shifting, and the procedure recommended by the European Commission (EC). The annual dose is calculated from the arithmetic average using any of these five procedures. It was shown that the primary measurement results modified according to the quantification limit could lead to an underestimation of the annual dose. On the other hand, the best estimates lead to an overestimation of the annual dose. The annual doses calculated from the measurement outcomes obtained according to the EC's recommended procedure, which does not cope with the uncertainties, fluctuate between under- and overestimation, depending on the frequency of the measurement results that are larger than the limit of detection. In the extreme case, when no measurement results above the detection limit occur, the average over primary measurement results modified according to the quantification limit underestimates the average over primary measurement results by about 80%. The average over best estimates calculated according to the procedure resembling shifting overestimates the average over primary measurement results by 35%, the average obtained by the Bayesian posterior by 85% and the treatment according to the EC recommendation by 89%. Copyright © 2016 Elsevier Ltd. All rights reserved.

  6. Modeling stream fish distributions using interval-censored detection times.

    PubMed

    Ferreira, Mário; Filipe, Ana Filipa; Bardos, David C; Magalhães, Maria Filomena; Beja, Pedro

    2016-08-01

    Controlling for imperfect detection is important for developing species distribution models (SDMs). Occupancy-detection models based on the time needed to detect a species can be used to address this problem, but this is hindered when times to detection are not known precisely. Here, we extend the time-to-detection model to deal with detections recorded in time intervals and illustrate the method using a case study on stream fish distribution modeling. We collected electrofishing samples of six fish species across a Mediterranean watershed in Northeast Portugal. Based on a Bayesian hierarchical framework, we modeled the probability of water presence in stream channels, and the probability of species occupancy conditional on water presence, in relation to environmental and spatial variables. We also modeled time-to-first detection conditional on occupancy in relation to local factors, using modified interval-censored exponential survival models. Posterior distributions of occupancy probabilities derived from the models were used to produce species distribution maps. Simulations indicated that the modified time-to-detection model provided unbiased parameter estimates despite interval-censoring. There was a tendency for spatial variation in detection rates to be primarily influenced by depth and, to a lesser extent, stream width. Species occupancies were consistently affected by stream order, elevation, and annual precipitation. Bayesian P-values and AUCs indicated that all models had adequate fit and high discrimination ability, respectively. Mapping of predicted occupancy probabilities showed widespread distribution by most species, but uncertainty was generally higher in tributaries and upper reaches. The interval-censored time-to-detection model provides a practical solution to model occupancy-detection when detections are recorded in time intervals. 
This modeling framework is useful for developing SDMs while controlling for variation in detection rates, as it uses simple data that can be readily collected by field ecologists.

  7. Does Aggressive Phototherapy Increase Mortality while Decreasing Profound Impairment among the Smallest and Sickest Newborns?

    PubMed Central

    Tyson, Jon E; Pedroza, Claudia; Langer, John; Green, Charles; Morris, Brenda; Stevenson, David; Van Meurs, Krisa P.; Oh, William; Phelps, Dale; O’Shea, Michael; McDavid, Georgia E.; Grisby, Cathy; Higgins, Rose

    2013-01-01

    Objective: Aggressive phototherapy (AgPT) is widely used and assumed to be safe and effective for even the most immature infants. We assessed whether the benefits and hazards for the smallest and sickest infants differed from those for other extremely low birth weight (ELBW; ≤1000 g) infants in our Neonatal Research Network trial, the only large trial of AgPT. Study Design: ELBW infants (n=1974) were randomized to AgPT or conservative phototherapy at age 12–36 hours. The effect of AgPT on outcomes (death; impairment; profound impairment; death or impairment [primary outcome], and death or profound impairment) at 18–22 months corrected age was related to BW stratum (501–750 g; 751–1000 g) and baseline severity of illness using multilevel regression equations. The probability of benefit and of harm was directly assessed with Bayesian analyses. Results: Baseline illness severity was well characterized using mechanical ventilation and FiO2 at 24 hours age. Among mechanically ventilated infants ≤750 g BW (n=684), a reduction in impairment and in profound impairment was offset by higher mortality (p for interaction <0.05) with no significant effect on composite outcomes. Conservative Bayesian analyses of this subgroup identified a 99% (posterior) probability that AgPT increased mortality, a 97% probability that AgPT reduced impairment, and a 99% probability that AgPT reduced profound impairment. Conclusions: Findings from the only large trial of AgPT suggest that AgPT may increase mortality while reducing impairment and profound impairment among the smallest and sickest infants. New approaches to reduce their serum bilirubin need development and rigorous testing. PMID:22652561

  8. Ancestral sequence reconstruction in primate mitochondrial DNA: compositional bias and effect on functional inference.

    PubMed

    Krishnan, Neeraja M; Seligmann, Hervé; Stewart, Caro-Beth; De Koning, A P Jason; Pollock, David D

    2004-10-01

    Reconstruction of ancestral DNA and amino acid sequences is an important means of inferring information about past evolutionary events. Such reconstructions suggest changes in molecular function and evolutionary processes over the course of evolution and are used to infer adaptation and convergence. Maximum likelihood (ML) is generally thought to provide relatively accurate reconstructed sequences compared to parsimony, but both methods lead to the inference of multiple directional changes in nucleotide frequencies in primate mitochondrial DNA (mtDNA). To better understand this surprising result, as well as to better understand how parsimony and ML differ, we constructed a series of computationally simple "conditional pathway" methods that differed in the number of substitutions allowed per site along each branch, and we also evaluated the entire Bayesian posterior frequency distribution of reconstructed ancestral states. We analyzed primate mitochondrial cytochrome b (Cyt-b) and cytochrome oxidase subunit I (COI) genes and found that ML reconstructs ancestral frequencies that are often more different from tip sequences than are parsimony reconstructions. In contrast, frequency reconstructions based on the posterior ensemble more closely resemble extant nucleotide frequencies. Simulations indicate that these differences in ancestral sequence inference are probably due to deterministic bias caused by high uncertainty in the optimization-based ancestral reconstruction methods (parsimony, ML, Bayesian maximum a posteriori). In contrast, ancestral nucleotide frequencies based on an average of the Bayesian set of credible ancestral sequences are much less biased. 
The methods involving simpler conditional pathway calculations have slightly reduced likelihood values compared to full likelihood calculations, but they can provide fairly unbiased nucleotide reconstructions and may be useful in more complex phylogenetic analyses than considered here due to their speed and flexibility. To determine whether biased reconstructions using optimization methods might affect inferences of functional properties, ancestral primate mitochondrial tRNA sequences were inferred and helix-forming propensities for conserved pairs were evaluated in silico. For ambiguously reconstructed nucleotides at sites with high base composition variability, ancestral tRNA sequences from Bayesian analyses were more compatible with canonical base pairing than were those inferred by other methods. Thus, nucleotide bias in reconstructed sequences apparently can lead to serious bias and inaccuracies in functional predictions.

  9. Markov Chain Monte Carlo Inference of Parametric Dictionaries for Sparse Bayesian Approximations

    PubMed Central

    Chaspari, Theodora; Tsiartas, Andreas; Tsilifis, Panagiotis; Narayanan, Shrikanth

    2016-01-01

    Parametric dictionaries can increase the ability of sparse representations to meaningfully capture and interpret the underlying signal information, such as that encountered in biomedical problems. Given a mapping function from the atom parameter space to the actual atoms, we propose a sparse Bayesian framework for learning the atom parameters, because of its ability to provide full posterior estimates, take uncertainty into account and generalize to unseen data. Inference is performed with Markov Chain Monte Carlo, which uses block sampling to generate the variables of the Bayesian problem. Since the parameterization of dictionary atoms results in posteriors that cannot be analytically computed, we use a Metropolis-Hastings-within-Gibbs framework, in which variables with closed-form posteriors are generated with the Gibbs sampler, while the remaining ones are generated with Metropolis-Hastings from appropriate candidate-generating densities. We further show that the corresponding Markov Chain is uniformly ergodic, ensuring its convergence to a stationary distribution independently of the initial state. Results on synthetic data and real biomedical signals indicate that our approach offers advantages in terms of signal reconstruction compared to previously proposed Steepest Descent and Equiangular Tight Frame methods. This paper demonstrates the ability of Bayesian learning to generate parametric dictionaries that can reliably represent the exemplar data and provides the foundation towards inferring the entire variable set of the sparse approximation problem for signal denoising, adaptation and other applications. PMID:28649173
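
    The general Metropolis-Hastings-within-Gibbs pattern named in this abstract can be illustrated on a toy problem. The sketch below is not the authors' dictionary-learning sampler: it assumes a simple Normal model in which the mean has a closed-form conditional (Gibbs step) and the standard deviation, given a half-Cauchy prior, does not (random-walk Metropolis-Hastings step). All names, priors, and tuning values are hypothetical.

```python
import math
import random


def log_cond_sigma(sigma, mu, y):
    """log p(sigma | mu, y) up to a constant: Normal likelihood x half-Cauchy(1) prior."""
    if sigma <= 0:
        return -math.inf
    sse = sum((v - mu) ** 2 for v in y)
    return -len(y) * math.log(sigma) - sse / (2 * sigma**2) - math.log(1 + sigma**2)


def mh_within_gibbs(y, n_iter=3000, step=0.3, seed=0):
    rng = random.Random(seed)
    n, ybar = len(y), sum(y) / len(y)
    mu, sigma = 0.0, 1.0
    prior_var = 100.0  # assumed N(0, 100) prior on mu
    samples = []
    for _ in range(n_iter):
        # Gibbs step: mu | sigma, y is Normal in closed form.
        post_var = 1.0 / (n / sigma**2 + 1.0 / prior_var)
        post_mean = post_var * (n * ybar / sigma**2)
        mu = rng.gauss(post_mean, math.sqrt(post_var))
        # MH step: symmetric random-walk proposal for sigma.
        proposal = sigma + rng.gauss(0.0, step)
        log_ratio = log_cond_sigma(proposal, mu, y) - log_cond_sigma(sigma, mu, y)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            sigma = proposal
        samples.append((mu, sigma))
    return samples


# Simulated data (true mean 2.0, true sd 1.5), used only to exercise the sampler.
data_rng = random.Random(1)
y = [data_rng.gauss(2.0, 1.5) for _ in range(200)]
draws = mh_within_gibbs(y)[1000:]  # discard burn-in
mu_hat = sum(m for m, _ in draws) / len(draws)
sigma_hat = sum(s for _, s in draws) / len(draws)
```

    The design point is that each variable is updated with the cheapest valid move available: exact conditional draws where a closed form exists, and an accept/reject step only where it does not.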

  10. Variations on Bayesian Prediction and Inference

    DTIC Science & Technology

    2016-05-09

    There are a number of statistical inference problems that are not generally formulated via a full probability model...the problem of inference about an unknown parameter, the Bayesian approach requires a full probability model/likelihood which can be an obstacle

  11. Development of uncertainty-based work injury model using Bayesian structural equation modelling.

    PubMed

    Chatterjee, Snehamoy

    2014-01-01

    This paper proposed a Bayesian method-based structural equation model (SEM) of miners' work injury for an underground coal mine in India. The environmental and behavioural variables for work injury were identified and causal relationships were developed. For Bayesian modelling, prior distributions of SEM parameters are necessary to develop the model. In this paper, two approaches were adopted to obtain prior distributions for the factor loading parameters and structural parameters of the SEM. In the first approach, the prior distributions were taken to be fixed distribution functions with specific parameter values, whereas in the second approach, prior distributions of the parameters were generated from experts' opinions. The posterior distributions of these parameters were obtained by applying Bayes' rule. Markov Chain Monte Carlo sampling in the form of Gibbs sampling was applied for sampling from the posterior distribution. The results revealed that all coefficients of structural and measurement model parameters are statistically significant with experts' opinion-based priors, whereas two coefficients are not statistically significant when fixed prior-based distributions are applied. The error statistics reveal that the Bayesian structural model provides a reasonably good fit of work injury, with a high coefficient of determination (0.91) and a lower mean squared error compared to traditional SEM.

  12. Declining Abundance of Beaked Whales (Family Ziphiidae) in the California Current Large Marine Ecosystem

    PubMed Central

    Moore, Jeffrey E.; Barlow, Jay P.

    2013-01-01

    Beaked whales are among the most diverse yet least understood groups of marine mammals. A diverse set of mostly anthropogenic threats necessitates improvement in our ability to assess population status for this cryptic group. The Southwest Fisheries Science Center (NOAA) conducted six ship line-transect cetacean abundance surveys in the California Current off the contiguous western United States between 1991 and 2008. We used a Bayesian hidden-process modeling approach to estimate abundance and population trends of beaked whales using sightings data from these surveys. We also compiled records of beaked whale stranding events (3 genera, at least 8 species) on adjacent beaches from 1900 to 2012, to help assess population status of beaked whales in the northern part of the California Current. Bayesian posterior summaries for trend parameters provide strong evidence of declining beaked whale abundance in the study area. The probability of negative trend for Cuvier's beaked whale (Ziphius cavirostris) during 1991–2008 was 0.84, with 1991 and 2008 estimates of 10771 (CV = 0.51) and ≈7550 (CV = 0.55), respectively. The probability of decline for Mesoplodon spp. (pooled across species) was 0.96, with 1991 and 2008 estimates of 2206 (CV = 0.46) and 811 (CV = 0.65). The mean posterior estimates for average rate of decline were 2.9% and 7.0% per year. There was no evidence of abundance trend for Baird's beaked whale (Berardius bairdii), for which annual abundance estimates in the survey area ranged from ≈900 to 1300 (CV≈1.3). Stranding data were consistent with the survey results. Causes of apparent declines are unknown. Direct impacts of fisheries (bycatch) can be ruled out, but impacts of anthropogenic sound (e.g., naval active sonar) and ecosystem change are plausible hypotheses that merit investigation. PMID:23341907

  13. Effects of Green Tea Gargling on the Prevention of Influenza Infection: An Analysis Using Bayesian Approaches.

    PubMed

    Ide, Kazuki; Kawasaki, Yohei; Akutagawa, Maiko; Yamada, Hiroshi

    2017-02-01

    The aim of this study is to analyze the data obtained from a randomized trial on the prevention of influenza by gargling with green tea, which gave nonsignificant results based on frequentist approaches, by using Bayesian approaches. The posterior proportion, with 95% credible interval (CrI), of influenza in each group was calculated. The Bayesian index θ is the probability that a hypothesis is true. In this case, θ is the probability that the hypothesis that green tea gargling reduced influenza compared with water gargling is true. Univariate and multivariate logistic regression analyses were also performed by using the Markov chain Monte Carlo method. The full analysis set included 747 participants. During the study period, influenza occurred in 44 participants (5.9%). The difference between the two independent binomial proportions was -0.019 (95% CrI, -0.054 to 0.015; θ = 0.87). The partial regression coefficients in the univariate analysis were -0.35 (95% CrI, -1.00 to 0.24) with use of a uniform prior and -0.34 (95% CrI, -0.96 to 0.27) with use of a Jeffreys prior. In the multivariate analysis, the values were -0.37 (95% CrI, -0.96 to 0.30) and -0.36 (95% CrI, -1.03 to 0.21), respectively. The difference between the two independent binomial proportions was less than 0, and θ was greater than 0.85. Therefore, green tea gargling may slightly reduce influenza compared with water gargling. This analysis suggests that green tea gargling can be an additional preventive measure for use with other pharmaceutical and nonpharmaceutical measures and indicates the need for additional studies to confirm the effect of green tea gargling.
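    An index like θ, the posterior probability that one group's event proportion is lower than the other's, can be approximated by Monte Carlo when each proportion is given a Beta posterior. The sketch below assumes independent uniform Beta(1, 1) priors; the function name `prob_lower_rate` and the counts are hypothetical, since the abstract does not report per-group counts, and the result is not expected to reproduce θ = 0.87.

```python
import random

def prob_lower_rate(events_a, n_a, events_b, n_b, draws=20000, seed=0):
    """Estimate P(p_a < p_b | data) under independent Beta(1, 1) priors."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        # Conjugate update: Beta(1 + events, 1 + non-events) for each group.
        p_a = rng.betavariate(1 + events_a, 1 + n_a - events_a)
        p_b = rng.betavariate(1 + events_b, 1 + n_b - events_b)
        if p_a < p_b:
            hits += 1
    return hits / draws

# Hypothetical counts chosen for illustration, NOT the trial's per-group data.
theta = prob_lower_rate(events_a=10, n_a=200, events_b=20, n_b=200)
print(f"theta ~ {theta:.2f}")
```

    With identical data in both groups the estimate should sit near 0.5, which is a quick sanity check on the sampler.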

  14. Clinical judgment to estimate pretest probability in the diagnosis of Cushing's syndrome under a Bayesian perspective.

    PubMed

    Cipoli, Daniel E; Martinez, Edson Z; Castro, Margaret de; Moreira, Ayrton C

    2012-12-01

    The aim was to estimate the pretest probability of Cushing's syndrome (CS) diagnosis by a Bayesian approach using intuitive clinical judgment. Physicians were requested, in seven endocrinology meetings, to answer three questions: "Based on your personal expertise, after obtaining clinical history and physical examination, without using laboratorial tests, what is your probability of diagnosing Cushing's Syndrome?"; "For how long have you been practicing Endocrinology?"; and "Where do you work?". A Bayesian beta regression, using the WinBUGS software, was employed. We obtained 294 questionnaires. The mean pretest probability of CS diagnosis was 51.6% (95%CI: 48.7-54.3). The probability was directly related to experience in endocrinology, but not to the place of work. Pretest probability of CS diagnosis was estimated using a Bayesian methodology. Although pretest likelihood can be context-dependent, experience based on years of practice may help the practitioner to diagnose CS.

  15. A nonparametric method to generate synthetic populations to adjust for complex sampling design features.

    PubMed

    Dong, Qi; Elliott, Michael R; Raghunathan, Trivellore E

    2014-06-01

    Outside of the survey sampling literature, samples are often assumed to be generated by a simple random sampling process that produces independent and identically distributed (IID) samples. Many statistical methods are developed largely in this IID world. Application of these methods to data from complex sample surveys without making allowance for the survey design features can lead to erroneous inferences. Hence, much time and effort have been devoted to developing statistical methods to analyze complex survey data and account for the sample design. This issue is particularly important when generating synthetic populations using finite population Bayesian inference, as is often done in missing data or disclosure risk settings, or when combining data from multiple surveys. By extending previous work in the finite population Bayesian bootstrap literature, we propose a method to generate synthetic populations from a posterior predictive distribution in a fashion that inverts the complex sampling design features and generates simple random samples from a superpopulation point of view, making adjustments to the complex data so that they can be analyzed as simple random samples. We consider a simulation study with a stratified, clustered unequal-probability of selection sample design, and use the proposed nonparametric method to generate synthetic populations for the 2006 National Health Interview Survey (NHIS), and the Medical Expenditure Panel Survey (MEPS), which are stratified, clustered unequal-probability of selection sample designs.

  16. A nonparametric method to generate synthetic populations to adjust for complex sampling design features

    PubMed Central

    Dong, Qi; Elliott, Michael R.; Raghunathan, Trivellore E.

    2017-01-01

    Outside of the survey sampling literature, samples are often assumed to be generated by a simple random sampling process that produces independent and identically distributed (IID) samples. Many statistical methods are developed largely in this IID world. Application of these methods to data from complex sample surveys without making allowance for the survey design features can lead to erroneous inferences. Hence, much time and effort have been devoted to developing statistical methods to analyze complex survey data and account for the sample design. This issue is particularly important when generating synthetic populations using finite population Bayesian inference, as is often done in missing data or disclosure risk settings, or when combining data from multiple surveys. By extending previous work in the finite population Bayesian bootstrap literature, we propose a method to generate synthetic populations from a posterior predictive distribution in a fashion that inverts the complex sampling design features and generates simple random samples from a superpopulation point of view, making adjustments to the complex data so that they can be analyzed as simple random samples. We consider a simulation study with a stratified, clustered unequal-probability of selection sample design, and use the proposed nonparametric method to generate synthetic populations for the 2006 National Health Interview Survey (NHIS), and the Medical Expenditure Panel Survey (MEPS), which are stratified, clustered unequal-probability of selection sample designs. PMID:29200608

  17. A Bayesian Approach to Genome/Linguistic Relationships in Native South Americans

    PubMed Central

    Amorim, Carlos Eduardo Guerra; Bisso-Machado, Rafael; Ramallo, Virginia; Bortolini, Maria Cátira; Bonatto, Sandro Luis; Salzano, Francisco Mauro; Hünemeier, Tábita

    2013-01-01

    The relationship between the evolution of genes and languages has been studied for over three decades. These studies rely on the assumption that languages, as many other cultural traits, evolve in a gene-like manner, accumulating heritable diversity through time and being subjected to evolutionary mechanisms of change. In the present work we used genetic data to evaluate South American linguistic classifications. We compared discordant models of language classifications to the current Native American genome-wide variation using realistic demographic models analyzed under an Approximate Bayesian Computation (ABC) framework. Data on 381 STRs spread along the autosomes were gathered from the literature for populations representing the five main South Amerindian linguistic groups: Andean, Arawakan, Chibchan-Paezan, Macro-Jê, and Tupí. The results indicated a higher posterior probability for the classification proposed by J.H. Greenberg in 1987, although L. Campbell's 1997 classification cannot be ruled out. Based on Greenberg's classification, it was possible to date the time of Tupí-Arawakan divergence (2.8 kya), and the time of emergence of the structure between present day major language groups in South America (3.1 kya). PMID:23696865

  18. Bayesian averaging over Decision Tree models for trauma severity scoring.

    PubMed

    Schetinin, V; Jakaite, L; Krzanowski, W

    2018-01-01

    Health care practitioners analyse possible risks of misleading decisions and need to estimate and quantify uncertainty in predictions. We have examined the "gold" standard of screening a patient's conditions for predicting survival probability, based on logistic regression modelling, which is used in trauma care for clinical purposes and quality audit. This methodology is based on theoretical assumptions about data and uncertainties. Models induced within such an approach have exposed a number of problems, providing unexplained fluctuation of predicted survival and low accuracy of estimating uncertainty intervals within which predictions are made. Bayesian method, which in theory is capable of providing accurate predictions and uncertainty estimates, has been adopted in our study using Decision Tree models. Our approach has been tested on a large set of patients registered in the US National Trauma Data Bank and has outperformed the standard method in terms of prediction accuracy, thereby providing practitioners with accurate estimates of the predictive posterior densities of interest that are required for making risk-aware decisions. Copyright © 2017 Elsevier B.V. All rights reserved.

  19. A bayesian approach to genome/linguistic relationships in native South Americans.

    PubMed

    Amorim, Carlos Eduardo Guerra; Bisso-Machado, Rafael; Ramallo, Virginia; Bortolini, Maria Cátira; Bonatto, Sandro Luis; Salzano, Francisco Mauro; Hünemeier, Tábita

    2013-01-01

    The relationship between the evolution of genes and languages has been studied for over three decades. These studies rely on the assumption that languages, as many other cultural traits, evolve in a gene-like manner, accumulating heritable diversity through time and being subjected to evolutionary mechanisms of change. In the present work we used genetic data to evaluate South American linguistic classifications. We compared discordant models of language classifications to the current Native American genome-wide variation using realistic demographic models analyzed under an Approximate Bayesian Computation (ABC) framework. Data on 381 STRs spread along the autosomes were gathered from the literature for populations representing the five main South Amerindian linguistic groups: Andean, Arawakan, Chibchan-Paezan, Macro-Jê, and Tupí. The results indicated a higher posterior probability for the classification proposed by J.H. Greenberg in 1987, although L. Campbell's 1997 classification cannot be ruled out. Based on Greenberg's classification, it was possible to date the time of Tupí-Arawakan divergence (2.8 kya), and the time of emergence of the structure between present day major language groups in South America (3.1 kya).

  20. Adaptive design optimization: a mutual information-based approach to model discrimination in cognitive science.

    PubMed

    Cavagnaro, Daniel R; Myung, Jay I; Pitt, Mark A; Kujala, Janne V

    2010-04-01

    Discriminating among competing statistical models is a pressing issue for many experimentalists in the field of cognitive science. Resolving this issue begins with designing maximally informative experiments. To this end, the problem to be solved in adaptive design optimization is identifying experimental designs under which one can infer the underlying model in the fewest possible steps. When the models under consideration are nonlinear, as is often the case in cognitive science, this problem can be impossible to solve analytically without simplifying assumptions. However, as we show in this letter, a full solution can be found numerically with the help of a Bayesian computational trick derived from the statistics literature, which recasts the problem as a probability density simulation in which the optimal design is the mode of the density. We use a utility function based on mutual information and give three intuitive interpretations of the utility function in terms of Bayesian posterior estimates. As a proof of concept, we offer a simple example application to an experiment on memory retention.

  1. Variational Bayesian Inversion of Quasi-Localized Seismic Attributes for the Spatial Distribution of Geological Facies

    NASA Astrophysics Data System (ADS)

    Nawaz, Muhammad Atif; Curtis, Andrew

    2018-04-01

    We introduce a new Bayesian inversion method that estimates the spatial distribution of geological facies from attributes of seismic data, by showing how the usual probabilistic inverse problem can be solved using an optimization framework still providing full probabilistic results. Our mathematical model consists of seismic attributes as observed data, which are assumed to have been generated by the geological facies. The method infers the post-inversion (posterior) probability density of the facies plus some other unknown model parameters, from the seismic attributes and geological prior information. Most previous research in this domain is based on the localized likelihoods assumption, whereby the seismic attributes at a location are assumed to depend on the facies only at that location. Such an assumption is unrealistic because of imperfect seismic data acquisition and processing, and fundamental limitations of seismic imaging methods. In this paper, we relax this assumption: we allow probabilistic dependence between seismic attributes at a location and the facies in any neighbourhood of that location through a spatial filter. We term such likelihoods quasi-localized.

  2. A taxonomic monograph of Nearctic Scolytus Geoffroy (Coleoptera, Curculionidae, Scolytinae)

    PubMed Central

    Smith, Sarah M.; Cognato, Anthony I.

    2014-01-01

    Abstract The Nearctic bark beetle genus Scolytus Geoffroy was revised based in part on a molecular and morphological phylogeny. Monophyly of the native species was tested using mitochondrial (COI) and nuclear (28S, CAD, ArgK) genes and 43 morphological characters in parsimony and Bayesian phylogenetic analyses. Parsimony analyses of molecular and combined datasets provided mixed results while Bayesian analysis recovered most nodes with posterior probabilities >90%. Native hardwood- and conifer-feeding Scolytus species were recovered as paraphyletic. Native Nearctic species were recovered as paraphyletic with hardwood-feeding species sister to Palearctic hardwood-feeding species rather than to native conifer-feeding species. The Nearctic conifer-feeding species were monophyletic. Twenty-five species were recognized. Four new synonyms were discovered: Scolytus praeceps LeConte, 1868 (= Scolytus abietis Blackman, 1934; = Scolytus opacus Blackman, 1934), Scolytus reflexus Blackman, 1934 (= Scolytus virgatus Bright, 1972; = Scolytus wickhami Blackman, 1934). Two species were reinstated: Scolytus fiskei Blackman, 1934 and Scolytus silvaticus Bright, 1972. A diagnosis, description, distribution, host records and images were provided for each species and a key is presented to all species. PMID:25408617

  3. A Bayesian inversion for slip distribution of 1 Apr 2007 Mw8.1 Solomon Islands Earthquake

    NASA Astrophysics Data System (ADS)

    Chen, T.; Luo, H.

    2013-12-01

    On 1 Apr 2007 the megathrust Mw8.1 Solomon Islands earthquake occurred in the southeast Pacific along the New Britain subduction zone. 102 vertical displacement measurements over the southeastern end of the rupture zone from two field surveys after this event provide a unique constraint for slip distribution inversion. In conventional inversion methods (such as bounded variable least squares), the smoothing parameter that determines the relative weight placed on fitting the data versus smoothing the slip distribution is often subjectively selected at the bend of the trade-off curve. Here a fully probabilistic inversion method [Fukuda, 2008] is applied to estimate the distributed slip and the smoothing parameter objectively. The joint posterior probability density function of the distributed slip and the smoothing parameter is formulated under a Bayesian framework and sampled with a Markov chain Monte Carlo method. We estimate the spatial distribution of dip slip associated with the 1 Apr 2007 Solomon Islands earthquake with this method. Early results show a shallower dip angle than previous studies and highly variable dip slip both along-strike and down-dip.

  4. SOMBI: Bayesian identification of parameter relations in unstructured cosmological data

    NASA Astrophysics Data System (ADS)

    Frank, Philipp; Jasche, Jens; Enßlin, Torsten A.

    2016-11-01

    This work describes the implementation and application of a correlation determination method based on self organizing maps and Bayesian inference (SOMBI). SOMBI aims to automatically identify relations between different observed parameters in unstructured cosmological or astrophysical surveys by automatically identifying data clusters in high-dimensional datasets via the self organizing map neural network algorithm. Parameter relations are then revealed by means of a Bayesian inference within respective identified data clusters. Specifically such relations are assumed to be parametrized as a polynomial of unknown order. The Bayesian approach results in a posterior probability distribution function for respective polynomial coefficients. To decide which polynomial order suffices to describe correlation structures in data, we include a method for model selection, the Bayesian information criterion, to the analysis. The performance of the SOMBI algorithm is tested with mock data. As illustration we also provide applications of our method to cosmological data. In particular, we present results of a correlation analysis between galaxy and active galactic nucleus (AGN) properties provided by the SDSS catalog with the cosmic large-scale-structure (LSS). The results indicate that the combined galaxy and LSS dataset indeed is clustered into several sub-samples of data with different average properties (for example different stellar masses or web-type classifications). The majority of data clusters appear to have a similar correlation structure between galaxy properties and the LSS. In particular we revealed a positive and linear dependency between the stellar mass, the absolute magnitude and the color of a galaxy with the corresponding cosmic density field. A remaining subset of data shows inverted correlations, which might be an artifact of non-linear redshift distortions.

  5. Bayesian Probability Theory

    NASA Astrophysics Data System (ADS)

    von der Linden, Wolfgang; Dose, Volker; von Toussaint, Udo

    2014-06-01

    Preface; Part I. Introduction: 1. The meaning of probability; 2. Basic definitions; 3. Bayesian inference; 4. Combinatorics; 5. Random walks; 6. Limit theorems; 7. Continuous distributions; 8. The central limit theorem; 9. Poisson processes and waiting times; Part II. Assigning Probabilities: 10. Transformation invariance; 11. Maximum entropy; 12. Qualified maximum entropy; 13. Global smoothness; Part III. Parameter Estimation: 14. Bayesian parameter estimation; 15. Frequentist parameter estimation; 16. The Cramer-Rao inequality; Part IV. Testing Hypotheses: 17. The Bayesian way; 18. The frequentist way; 19. Sampling distributions; 20. Bayesian vs frequentist hypothesis tests; Part V. Real World Applications: 21. Regression; 22. Inconsistent data; 23. Unrecognized signal contributions; 24. Change point problems; 25. Function estimation; 26. Integral equations; 27. Model selection; 28. Bayesian experimental design; Part VI. Probabilistic Numerical Techniques: 29. Numerical integration; 30. Monte Carlo methods; 31. Nested sampling; Appendixes; References; Index.

  6. Beginning Bayes

    ERIC Educational Resources Information Center

    Erickson, Tim

    2017-01-01

    Understanding a Bayesian perspective demands comfort with conditional probability and with probabilities that appear to change as we acquire additional information. This paper suggests a simple context in conditional probability that helps develop the understanding students would need for a successful introduction to Bayesian reasoning.

  7. What Is the Probability You Are a Bayesian?

    ERIC Educational Resources Information Center

    Wulff, Shaun S.; Robinson, Timothy J.

    2014-01-01

    Bayesian methodology continues to be widely used in statistical applications. As a result, it is increasingly important to introduce students to Bayesian thinking at early stages in their mathematics and statistics education. While many students in upper level probability courses can recite the differences in the Frequentist and Bayesian…

  8. Evidence that multiple genetic variants of MC4R play a functional role in the regulation of energy expenditure and appetite in Hispanic children

    PubMed Central

    Cole, Shelley A; Voruganti, V Saroja; Cai, Guowen; Haack, Karin; Kent, Jack W; Blangero, John; Comuzzie, Anthony G; McPherson, John D; Gibbs, Richard A

    2010-01-01

    Background: Melanocortin-4-receptor (MC4R) haploinsufficiency is the most common form of monogenic obesity; however, the frequency of MC4R variants and their functional effects in general populations remain uncertain. Objective: The aim was to identify and characterize the effects of MC4R variants in Hispanic children. Design: MC4R was resequenced in 376 parents, and the identified single nucleotide polymorphisms (SNPs) were genotyped in 613 parents and 1016 children from the Viva la Familia cohort. Measured genotype analysis (MGA) tested associations between SNPs and phenotypes. Bayesian quantitative trait nucleotide (BQTN) analysis was used to infer the most likely functional polymorphisms influencing obesity-related traits. Results: Seven rare SNPs in coding and 18 SNPs in flanking regions of MC4R were identified. MGA showed suggestive associations between MC4R variants and body size, adiposity, glucose, insulin, leptin, ghrelin, energy expenditure, physical activity, and food intake. BQTN analysis identified SNP 1704 in a predicted micro-RNA target sequence in the downstream flanking region of MC4R as a strong, probable functional variant influencing total, sedentary, and moderate activities with posterior probabilities of 1.0. SNP 2132 was identified as a variant with a high probability (1.0) of exerting a functional effect on total energy expenditure and sleeping metabolic rate. SNP rs34114122 was selected as having likely functional effects on the appetite hormone ghrelin, with a posterior probability of 0.81. Conclusion: This comprehensive investigation provides strong evidence that MC4R genetic variants are likely to play a functional role in the regulation of weight, not only through energy intake but through energy expenditure. PMID:19889825

  9. Sequential Designs Based on Bayesian Uncertainty Quantification in Sparse Representation Surrogate Modeling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, Ray -Bing; Wang, Weichung; Jeff Wu, C. F.

    A numerical method, called OBSM, was recently proposed which employs overcomplete basis functions to achieve sparse representations. While the method can handle non-stationary response without the need of inverting large covariance matrices, it lacks the capability to quantify uncertainty in predictions. We address this issue by proposing a Bayesian approach which first imposes a normal prior on the large space of linear coefficients, then applies the MCMC algorithm to generate posterior samples for predictions. From these samples, Bayesian credible intervals can then be obtained to assess prediction uncertainty. A key application for the proposed method is the efficient construction of sequential designs. Several sequential design procedures with different infill criteria are proposed based on the generated posterior samples. As a result, numerical studies show that the proposed schemes are capable of solving problems of positive point identification, optimization, and surrogate fitting.

  10. Approximate Bayesian computation for spatial SEIR(S) epidemic models.

    PubMed

    Brown, Grant D; Porter, Aaron T; Oleson, Jacob J; Hinman, Jessica A

    2018-02-01

    Approximate Bayesian Computation (ABC) provides an attractive approach to estimation in complex Bayesian inferential problems for which evaluation of the kernel of the posterior distribution is impossible or computationally expensive. These highly parallelizable techniques have been successfully applied to many fields, particularly in cases where more traditional approaches such as Markov chain Monte Carlo (MCMC) are impractical. In this work, we demonstrate the application of approximate Bayesian inference to spatially heterogeneous Susceptible-Exposed-Infectious-Removed (SEIR) stochastic epidemic models. These models have a tractable posterior distribution; however, MCMC techniques nevertheless become computationally infeasible for moderately sized problems. We discuss the practical implementation of these techniques via the open source ABSEIR package for R. The performance of ABC relative to traditional MCMC methods in a small problem is explored under simulation, as well as in the spatially heterogeneous context of the 2014 epidemic of Chikungunya in the Americas. Copyright © 2017 Elsevier Ltd. All rights reserved.
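    The basic idea behind ABC can be illustrated with plain rejection sampling: draw a parameter from the prior, simulate data, and keep the draw only if the simulated data fall close to the observed data. The sketch below is a toy binomial example, assuming a Uniform(0, 1) prior and an absolute-difference distance; the function name `abc_rejection` and all numbers are hypothetical, and it does not use or represent the ABSEIR package or the SEIR models described above.

```python
import random

def abc_rejection(observed, n_trials, n_accept=500, tolerance=1, seed=0):
    """Approximate posterior draws of a binomial success probability by ABC rejection."""
    rng = random.Random(seed)
    accepted = []
    while len(accepted) < n_accept:
        p = rng.random()  # candidate drawn from the Uniform(0, 1) prior
        # Simulate a data set from the model given the candidate parameter.
        simulated = sum(rng.random() < p for _ in range(n_trials))
        # Keep the candidate only if simulated and observed counts are close.
        if abs(simulated - observed) <= tolerance:
            accepted.append(p)
    return accepted

# Hypothetical data: 7 successes in 20 trials.
samples = abc_rejection(observed=7, n_trials=20)
posterior_mean = sum(samples) / len(samples)
print(f"approximate posterior mean: {posterior_mean:.3f}")
```

    Each candidate is evaluated independently of the others, which is why schemes of this kind parallelize easily; a smaller tolerance gives a closer approximation at the cost of more rejected draws.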

  11. Sequential Designs Based on Bayesian Uncertainty Quantification in Sparse Representation Surrogate Modeling

    DOE PAGES

    Chen, Ray -Bing; Wang, Weichung; Jeff Wu, C. F.

    2017-04-12

    A numerical method, called OBSM, was recently proposed which employs overcomplete basis functions to achieve sparse representations. While the method can handle non-stationary response without the need of inverting large covariance matrices, it lacks the capability to quantify uncertainty in predictions. We address this issue by proposing a Bayesian approach which first imposes a normal prior on the large space of linear coefficients, then applies the MCMC algorithm to generate posterior samples for predictions. From these samples, Bayesian credible intervals can then be obtained to assess prediction uncertainty. A key application for the proposed method is the efficient construction of sequential designs. Several sequential design procedures with different infill criteria are proposed based on the generated posterior samples. As a result, numerical studies show that the proposed schemes are capable of solving problems of positive point identification, optimization, and surrogate fitting.
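    Obtaining a credible interval from posterior samples, as mentioned above, usually amounts to taking empirical quantiles of the draws. The following is a minimal sketch of an equal-tailed interval, assuming a simple nearest-rank quantile rule; the function name `credible_interval` is hypothetical and this is not the OBSM authors' implementation.

```python
def credible_interval(samples, level=0.95):
    """Equal-tailed credible interval from posterior draws (nearest-rank quantiles)."""
    ordered = sorted(samples)
    n = len(ordered)
    # Positions of the lower and upper tail quantiles in the sorted draws.
    lower_index = round((1 - level) / 2 * (n - 1))
    upper_index = round((1 + level) / 2 * (n - 1))
    return ordered[lower_index], ordered[upper_index]

# Hypothetical draws: the integers 0..100 stand in for posterior predictions at one input.
low, high = credible_interval(list(range(101)), level=0.9)
print(low, high)
```

    A wide interval at a candidate input signals high prediction uncertainty, which is the kind of quantity an infill criterion could use when choosing the next design point.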

  12. Analytic posteriors for Pearson's correlation coefficient.

    PubMed

    Ly, Alexander; Marsman, Maarten; Wagenmakers, Eric-Jan

    2018-02-01

    Pearson's correlation is one of the most common measures of linear dependence. Recently, Bernardo (11th International Workshop on Objective Bayes Methodology, 2015) introduced a flexible class of priors to study this measure in a Bayesian setting. For this large class of priors, we show that the (marginal) posterior for Pearson's correlation coefficient and all of the posterior moments are analytic. Our results are available in the open-source software package JASP.

  13. Bayesian bivariate meta-analysis of correlated effects: Impact of the prior distributions on the between-study correlation, borrowing of strength, and joint inferences

    PubMed Central

    Bujkiewicz, Sylwia; Riley, Richard D

    2016-01-01

    Multivariate random-effects meta-analysis allows the joint synthesis of correlated results from multiple studies, for example, for multiple outcomes or multiple treatment groups. In a Bayesian univariate meta-analysis of one endpoint, the importance of specifying a sensible prior distribution for the between-study variance is well understood. However, in multivariate meta-analysis, there is little guidance about the choice of prior distributions for the variances or, crucially, the between-study correlation, ρB; for the latter, researchers often use a Uniform(−1,1) distribution assuming it is vague. In this paper, an extensive simulation study and a real illustrative example are used to examine the impact of various (realistically) vague prior distributions for ρB and the between-study variances within a Bayesian bivariate random-effects meta-analysis of two correlated treatment effects. A range of diverse scenarios are considered, including complete and missing data, to examine the impact of the prior distributions on posterior results (for treatment effect and between-study correlation), amount of borrowing of strength, and joint predictive distributions of treatment effectiveness in new studies. Two key recommendations are identified to improve the robustness of multivariate meta-analysis results. First, the routine use of a Uniform(−1,1) prior distribution for ρB should be avoided, if possible, as it is not necessarily vague. Instead, researchers should identify a sensible prior distribution, for example, by restricting values to be positive or negative as indicated by prior knowledge. Second, it remains critical to use sensible (e.g. empirically based) prior distributions for the between-study variances, as an inappropriate choice can adversely impact the posterior distribution for ρB, which may then adversely affect inferences such as joint predictive probabilities. These recommendations are especially important with a small number of studies and missing data. PMID:26988929

  14. Integrated survival analysis using an event-time approach in a Bayesian framework

    USGS Publications Warehouse

    Walsh, Daniel P.; Dreitz, VJ; Heisey, Dennis M.

    2015-01-01

    Event-time or continuous-time statistical approaches have been applied throughout the biostatistical literature and have led to numerous scientific advances. However, these techniques have traditionally relied on knowing failure times. This has limited application of these analyses, particularly, within the ecological field where fates of marked animals may be unknown. To address these limitations, we developed an integrated approach within a Bayesian framework to estimate hazard rates in the face of unknown fates. We combine failure/survival times from individuals whose fates are known and times of which are interval-censored with information from those whose fates are unknown, and model the process of detecting animals with unknown fates. This provides the foundation for our integrated model and permits necessary parameter estimation. We provide the Bayesian model, its derivation, and use simulation techniques to investigate the properties and performance of our approach under several scenarios. Lastly, we apply our estimation technique using a piece-wise constant hazard function to investigate the effects of year, age, chick size and sex, sex of the tending adult, and nesting habitat on mortality hazard rates of the endangered mountain plover (Charadrius montanus) chicks. Traditional models were inappropriate for this analysis because fates of some individual chicks were unknown due to failed radio transmitters. Simulations revealed biases of posterior mean estimates were minimal (≤ 4.95%), and posterior distributions behaved as expected with RMSE of the estimates decreasing as sample sizes, detection probability, and survival increased. We determined mortality hazard rates for plover chicks were highest at <5 days old and were lower for chicks with larger birth weights and/or whose nest was within agricultural habitats. Based on its performance, our approach greatly expands the range of problems for which event-time analyses can be used by eliminating the need for having completely known fate data.
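    A piece-wise constant hazard function, as used in the analysis above, implies a survival function S(t) = exp(-H(t)), where the cumulative hazard H(t) is the sum of each interval's rate multiplied by the time spent in that interval. The sketch below shows only that relationship; the function name `survival`, the breakpoints, and the daily hazard rates are hypothetical and are not estimates from the plover study, and the sketch does not include the authors' handling of unknown fates.

```python
import math

def survival(t, breakpoints, hazards):
    """S(t) = exp(-cumulative hazard) for a piecewise-constant hazard.

    breakpoints: increasing interval start times, beginning at 0.
    hazards: one constant hazard rate per interval; the last interval is open-ended.
    """
    cumulative = 0.0
    for i, rate in enumerate(hazards):
        start = breakpoints[i]
        end = breakpoints[i + 1] if i + 1 < len(breakpoints) else math.inf
        if t <= start:
            break
        # Add the hazard accumulated over the part of this interval before t.
        cumulative += rate * (min(t, end) - start)
    return math.exp(-cumulative)

# Hypothetical daily hazard rates: higher before day 5, lower afterwards.
s10 = survival(10, breakpoints=[0, 5], hazards=[0.10, 0.02])
print(f"S(10) = {s10:.3f}")
```

    Here the cumulative hazard at day 10 is 0.10 × 5 + 0.02 × 5 = 0.6, so the survival probability is exp(-0.6).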

  15. Integrated survival analysis using an event-time approach in a Bayesian framework.

    PubMed

    Walsh, Daniel P; Dreitz, Victoria J; Heisey, Dennis M

    2015-02-01

    Event-time or continuous-time statistical approaches have been applied throughout the biostatistical literature and have led to numerous scientific advances. However, these techniques have traditionally relied on knowing failure times. This has limited application of these analyses, particularly within the ecological field where fates of marked animals may be unknown. To address these limitations, we developed an integrated approach within a Bayesian framework to estimate hazard rates in the face of unknown fates. We combine failure/survival times from individuals whose fates are known and times of which are interval-censored with information from those whose fates are unknown, and model the process of detecting animals with unknown fates. This provides the foundation for our integrated model and permits necessary parameter estimation. We provide the Bayesian model, its derivation, and use simulation techniques to investigate the properties and performance of our approach under several scenarios. Lastly, we apply our estimation technique using a piece-wise constant hazard function to investigate the effects of year, age, chick size and sex, sex of the tending adult, and nesting habitat on mortality hazard rates of the endangered mountain plover (Charadrius montanus) chicks. Traditional models were inappropriate for this analysis because fates of some individual chicks were unknown due to failed radio transmitters. Simulations revealed biases of posterior mean estimates were minimal (≤ 4.95%), and posterior distributions behaved as expected with RMSE of the estimates decreasing as sample sizes, detection probability, and survival increased. We determined mortality hazard rates for plover chicks were highest at <5 days old and were lower for chicks with larger birth weights and/or whose nest was within agricultural habitats. Based on its performance, our approach greatly expands the range of problems for which event-time analyses can be used by eliminating the need for having completely known fate data.

  16. 2D Bayesian automated tilted-ring fitting of disc galaxies in large H I galaxy surveys: 2DBAT

    NASA Astrophysics Data System (ADS)

    Oh, Se-Heon; Staveley-Smith, Lister; Spekkens, Kristine; Kamphuis, Peter; Koribalski, Bärbel S.

    2018-01-01

    We present a novel algorithm based on a Bayesian method for 2D tilted-ring analysis of disc galaxy velocity fields. Compared to the conventional algorithms based on a chi-squared minimization procedure, this new Bayesian-based algorithm suffers less from local minima of the model parameters even with highly multimodal posterior distributions. Moreover, the Bayesian analysis, implemented via Markov Chain Monte Carlo sampling, only requires broad ranges of posterior distributions of the parameters, which makes the fitting procedure fully automated. This feature will be essential when performing kinematic analysis on the large number of resolved galaxies expected to be detected in neutral hydrogen (H I) surveys with the Square Kilometre Array and its pathfinders. The so-called 2D Bayesian Automated Tilted-ring fitter (2DBAT) implements Bayesian fits of 2D tilted-ring models in order to derive rotation curves of galaxies. We explore 2DBAT performance on (a) artificial H I data cubes built based on representative rotation curves of intermediate-mass and massive spiral galaxies, and (b) Australia Telescope Compact Array H I data from the Local Volume H I Survey. We find that 2DBAT works best for well-resolved galaxies with intermediate inclinations (20° < i < 70°), complementing 3D techniques better suited to modelling inclined galaxies.

  17. Bayesian inference of Earth's radial seismic structure from body-wave traveltimes using neural networks

    NASA Astrophysics Data System (ADS)

    de Wit, Ralph W. L.; Valentine, Andrew P.; Trampert, Jeannot

    2013-10-01

    How do body-wave traveltimes constrain the Earth's radial (1-D) seismic structure? Existing 1-D seismological models underpin 3-D seismic tomography and earthquake location algorithms. It is therefore crucial to assess the quality of such 1-D models, yet quantifying uncertainties in seismological models is challenging and thus often ignored. Ideally, quality assessment should be an integral part of the inverse method. Our aim in this study is twofold: (i) we show how to solve a general Bayesian non-linear inverse problem and quantify model uncertainties, and (ii) we investigate the constraint on spherically symmetric P-wave velocity (VP) structure provided by body-wave traveltimes from the EHB bulletin (phases Pn, P, PP and PKP). Our approach is based on artificial neural networks, which are very common in pattern recognition problems and can be used to approximate an arbitrary function. We use a Mixture Density Network to obtain 1-D marginal posterior probability density functions (pdfs), which provide a quantitative description of our knowledge on the individual Earth parameters. No linearization or model damping is required, which allows us to infer a model which is constrained purely by the data. We present 1-D marginal posterior pdfs for the 22 VP parameters and seven discontinuity depths in our model. P-wave velocities in the inner core, outer core and lower mantle are resolved well, with standard deviations of ~0.2 to 1 per cent with respect to the mean of the posterior pdfs. The maximum likelihoods of VP are in general similar to the corresponding ak135 values, which lie within one or two standard deviations from the posterior means, thus providing an independent validation of ak135 in this part of the radial model. Conversely, the data contain little or no information on P-wave velocity in the D'' layer, the upper mantle and the homogeneous crustal layers. Further, the data do not constrain the depth of the discontinuities in our model. Using additional phases available in the ISC bulletin, such as PcP, PKKP and the converted phases SP and ScP, may enhance the resolvability of these parameters. Finally, we show how the method can be extended to obtain a posterior pdf for a multidimensional model space. This enables us to investigate correlations between model parameters.

  18. The Chandra Source Catalog: X-ray Aperture Photometry

    NASA Astrophysics Data System (ADS)

    Kashyap, Vinay; Primini, F. A.; Glotfelty, K. J.; Anderson, C. S.; Bonaventura, N. R.; Chen, J. C.; Davis, J. E.; Doe, S. M.; Evans, I. N.; Evans, J. D.; Fabbiano, G.; Galle, E. C.; Gibbs, D. G., II; Grier, J. D.; Hain, R.; Hall, D. M.; Harbo, P. N.; He, X.; Houck, J. C.; Karovska, M.; Lauer, J.; McCollough, M. L.; McDowell, J. C.; Miller, J. B.; Mitschang, A. W.; Morgan, D. L.; Nichols, J. S.; Nowak, M. A.; Plummer, D. A.; Refsdal, B. L.; Rots, A. H.; Siemiginowska, A. L.; Sundheim, B. A.; Tibbetts, M. S.; van Stone, D. W.; Winkelman, S. L.; Zografou, P.

    2009-09-01

    The Chandra Source Catalog (CSC) represents a reanalysis of the entire ACIS and HRC imaging observations over the 9-year Chandra mission. We describe here the method by which fluxes are measured for detected sources. Source detection is carried out on a uniform basis, using the CIAO tool wavdetect. Source fluxes are estimated post-facto using a Bayesian method that accounts for background, spatial resolution effects, and contamination from nearby sources. We use gamma-function prior distributions, which could be either non-informative, or in case there exist previous observations of the same source, strongly informative. The current implementation is however limited to non-informative priors. The resulting posterior probability density functions allow us to report the flux and a robust credible range on it.

  19. What are hierarchical models and how do we analyze them?

    USGS Publications Warehouse

    Royle, Andy

    2016-01-01

    In this chapter we provide a basic definition of hierarchical models and introduce the two canonical hierarchical models in this book: site occupancy and N-mixture models. The former is a hierarchical extension of logistic regression and the latter is a hierarchical extension of Poisson regression. We introduce basic concepts of probability modeling and statistical inference including likelihood and Bayesian perspectives. We go through the mechanics of maximizing the likelihood and characterizing the posterior distribution by Markov chain Monte Carlo (MCMC) methods. We give a general perspective on topics such as model selection and assessment of model fit, although we demonstrate these topics in practice in later chapters (especially Chapters 5, 6, 7, and 10).

  20. Probabilistic objective functions for sensor management

    NASA Astrophysics Data System (ADS)

    Mahler, Ronald P. S.; Zajic, Tim R.

    2004-08-01

    This paper continues the investigation of a foundational and yet potentially practical basis for control-theoretic sensor management, using a comprehensive, intuitive, system-level Bayesian paradigm based on finite-set statistics (FISST). In this paper we report our most recent progress, focusing on multistep look-ahead -- i.e., allocation of sensor resources throughout an entire future time-window. We determine future sensor states in the time-window using a "probabilistically natural" sensor management objective function, the posterior expected number of targets (PENT). This objective function is constructed using a new "maxi-PIMS" optimization strategy that hedges against unknowable future observation-collections. PENT is used in conjunction with approximate multitarget filters: the probability hypothesis density (PHD) filter or the multi-hypothesis correlator (MHC) filter.

  1. Application of Bayesian Approach in Cancer Clinical Trial

    PubMed Central

    Bhattacharjee, Atanu

    2014-01-01

    The Bayesian approach is becoming more useful than classical methods in clinical trials. It is beneficial from the design phase through the analysis phase. It allows straightforward statements about the treatment effect of a drug. Complex computational problems are simple to handle with Bayesian techniques. The technique is feasible only when prior information about the data is available. Inference is established through posterior estimates. However, the method has some limitations. The objective of this work was to explore the merits and demerits of the Bayesian approach in cancer research. This review should help clinical researchers in oncology explore the limitations and power of Bayesian techniques. PMID:29147387

  2. Propagation of population pharmacokinetic information using a Bayesian approach: comparison with meta-analysis.

    PubMed

    Dokoumetzidis, Aristides; Aarons, Leon

    2005-08-01

    We investigated the propagation of population pharmacokinetic information across clinical studies by applying Bayesian techniques. The aim was to summarize the population pharmacokinetic estimates of a study in appropriate statistical distributions in order to use them as Bayesian priors in subsequent population pharmacokinetic analyses. Various data sets of simulated and real clinical data were fitted with WinBUGS, with and without informative priors. The posterior estimates of fittings with non-informative priors were used to build parametric informative priors and the whole procedure was carried on in a consecutive manner. The posterior distributions of the fittings with informative priors were compared to those of the meta-analysis fittings of the respective combinations of data sets. Good agreement was found for the simulated and experimental datasets when the populations were exchangeable, with the posterior distributions from the fittings with the prior being nearly identical to the ones estimated with meta-analysis. However, when populations were not exchangeable, an alternative parametric form for the prior, the natural conjugate prior, had to be used in order to have consistent results. In conclusion, the results of a population pharmacokinetic analysis may be summarized in Bayesian prior distributions that can be used consecutively with other analyses. The procedure is an alternative to meta-analysis and gives comparable results. It has the advantage that it is faster than the meta-analysis, due to the large datasets used with the latter, and can be performed when the data included in the prior are not actually available.

  3. A Bayesian Assessment of Seismic Semi-Periodicity Forecasts

    NASA Astrophysics Data System (ADS)

    Nava, F.; Quinteros, C.; Glowacka, E.; Frez, J.

    2016-01-01

    Among the schemes for earthquake forecasting, the search for semi-periodicity during large earthquakes in a given seismogenic region plays an important role. When considering earthquake forecasts based on semi-periodic sequence identification, the Bayesian formalism is a useful tool for: (1) assessing how well a given earthquake satisfies a previously made forecast; (2) re-evaluating the semi-periodic sequence probability; and (3) testing other prior estimations of the sequence probability. A comparison of Bayesian estimates with updated estimates of semi-periodic sequences that incorporate new data not used in the original estimates shows extremely good agreement, indicating that: (1) the probability that a semi-periodic sequence is not due to chance is an appropriate estimate for the prior sequence probability estimate; and (2) the Bayesian formalism does a very good job of estimating corrected semi-periodicity probabilities, using slightly less data than that used for updated estimates. The Bayesian approach is exemplified explicitly by its application to the Parkfield semi-periodic forecast, and results are given for its application to other forecasts in Japan and Venezuela.

  4. Coupling Self-Organizing Maps with a Naïve Bayesian classifier: A case study for classifying Vermont streams using geomorphic, habitat and biological assessment data

    NASA Astrophysics Data System (ADS)

    Fytilis, N.; Rizzo, D. M.

    2012-12-01

    Environmental managers are increasingly required to forecast the long-term effects and the resilience or vulnerability of biophysical systems to human-generated stresses. Mitigation strategies for hydrological and environmental systems need to be assessed in the presence of uncertainty. An important aspect of such complex systems is the assessment of variable uncertainty on the model response outputs. We develop a new classification tool that couples a Naïve Bayesian Classifier with a modified Kohonen Self-Organizing Map to tackle this challenge. For proof-of-concept, we use rapid geomorphic and reach-scale habitat assessment data from over 2500 Vermont stream reaches (~1371 stream miles) assessed by the Vermont Agency of Natural Resources (VTANR). In addition, the Vermont Department of Environmental Conservation (VTDEC) estimates stream habitat biodiversity indices (macro-invertebrates and fish) and a variety of water quality data. Our approach fully utilizes the existing VTANR and VTDEC data sets to improve classification of stream-reach habitat and biological integrity. The combined SOM-Naïve Bayesian architecture is sufficiently flexible to allow for continual updates and increased accuracy associated with acquiring new data. The Kohonen Self-Organizing Map (SOM) is an unsupervised artificial neural network that autonomously analyzes properties inherent in a given set of data. It is typically used to cluster data vectors into similar categories when a priori classes do not exist. The ability of the SOM to convert nonlinear, high dimensional data to some user-defined lower dimension and mine large amounts of data types (i.e., discrete or continuous, biological or geomorphic data) makes it ideal for characterizing the sensitivity of river networks in a variety of contexts. The procedure is data-driven, and therefore does not require the development of site-specific, process-based classification stream models, or sets of if-then-else rules associated with expert systems. This has the potential to save time and resources, while enabling a truly adaptive management approach using existing knowledge (expressed as prior probabilities) and new information (expressed as likelihood functions) to update estimates (i.e., in this case, improved stream classifications expressed as posterior probabilities). The distribution parameters of these posterior probabilities are used to quantify uncertainty associated with environmental data. Since classification plays a leading role in the future development of data-enabled science and engineering, such a computational tool is applicable to a variety of engineering applications. The ability of the new classification neural network to characterize streams with high environmental risk is essential for a proactive adaptive watershed management approach.

  5. Bayesian design criteria: computation, comparison, and application to a pharmacokinetic and a pharmacodynamic model.

    PubMed

    Merlé, Y; Mentré, F

    1995-02-01

    In this paper 3 criteria to design experiments for Bayesian estimation of the parameters of nonlinear models with respect to their parameters, when a prior distribution is available, are presented: the determinant of the Bayesian information matrix, the determinant of the pre-posterior covariance matrix, and the expected information provided by an experiment. A procedure to simplify the computation of these criteria is proposed in the case of continuous prior distributions and is compared with the criterion obtained from a linearization of the model about the mean of the prior distribution for the parameters. This procedure is applied to two models commonly encountered in the area of pharmacokinetics and pharmacodynamics: the one-compartment open model with bolus intravenous single-dose injection and the Emax model. They both involve two parameters. Additive as well as multiplicative Gaussian measurement errors are considered with normal prior distributions. Various combinations of the variances of the prior distribution and of the measurement error are studied. Our attention is restricted to designs with limited numbers of measurements (1 or 2 measurements). This situation often occurs in practice when Bayesian estimation is performed. The optimal Bayesian designs that result vary with the variances of the parameter distribution and with the measurement error. The two-point optimal designs sometimes differ from the D-optimal designs for the mean of the prior distribution and may consist of replicating measurements. For the studied cases, the determinant of the Bayesian information matrix and its linearized form lead to the same optimal designs. In some cases, the pre-posterior covariance matrix can be far from its lower bound, namely, the inverse of the Bayesian information matrix, especially for the Emax model and a multiplicative measurement error. The expected information provided by the experiment and the determinant of the pre-posterior covariance matrix generally lead to the same designs except for the Emax model and the multiplicative measurement error. Results show that these criteria can be easily computed and that they could be incorporated in modules for designing experiments.

  6. Municipal mortality due to thyroid cancer in Spain

    PubMed Central

    Lope, Virginia; Pollán, Marina; Pérez-Gómez, Beatriz; Aragonés, Nuria; Ramis, Rebeca; Gómez-Barroso, Diana; López-Abente, Gonzalo

    2006-01-01

    Background: Thyroid cancer is a tumor with a low but growing incidence in Spain. This study sought to depict its spatial municipal mortality pattern, using the classic model proposed by Besag, York and Mollié. Methods: It was possible to compile and ascertain the posterior distribution of relative risk on the basis of a single Bayesian spatial model covering all of Spain's 8077 municipal areas. Maps were plotted depicting standardized mortality ratios, smoothed relative risk (RR) estimates, and the posterior probability that RR > 1. Results: From 1989 to 1998 a total of 2,538 thyroid cancer deaths were registered in 1,041 municipalities. The highest relative risks were mostly situated in the Canary Islands, the province of Lugo, the east of La Coruña (Corunna) and western areas of Asturias and Orense. Conclusion: The observed mortality pattern coincides with areas in Spain where goiter has been declared endemic. The higher frequency in these same areas of undifferentiated, more aggressive carcinomas could be reflected in the mortality figures. Other unknown genetic or environmental factors could also play a role in the etiology of this tumor. PMID:17173668

  7. Bayesian assessment of overtriage and undertriage at a level I trauma centre.

    PubMed

    DiDomenico, Paul B; Pietzsch, Jan B; Paté-Cornell, M Elisabeth

    2008-07-13

    We analysed the trauma triage system at a specific level I trauma centre to assess rates of over- and undertriage and to support recommendations for system improvements. The triage process is designed to estimate the severity of patient injury and allocate resources accordingly, with potential errors of overestimation (overtriage) consuming excess resources and underestimation (undertriage) potentially leading to medical errors. We first modelled the overall trauma system using risk analysis methods to understand interdependencies among the actions of the participants. We interviewed six experienced trauma surgeons to obtain their expert opinion of the over- and undertriage rates occurring in the trauma centre. We then assessed actual over- and undertriage rates in a random sample of 86 trauma cases collected over a six-week period at the same centre. We employed Bayesian analysis to quantitatively combine the data with the prior probabilities derived from expert opinion in order to obtain posterior distributions. The results were estimates of overtriage and undertriage in 16.1 and 4.9% of patients, respectively. This Bayesian approach, which provides a quantitative assessment of the error rates using both case data and expert opinion, provides a rational means of obtaining a best estimate of the system's performance. The overall approach that we describe in this paper can be employed more widely to analyse complex health care delivery systems, with the objective of reduced errors, patient risk and excess costs.
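
    Combining an expert-derived prior with observed case counts can be sketched with a conjugate Beta-binomial update. The abstract does not state which prior family or elicited values the authors used, so the Beta prior, its parameters, and the event count below are hypothetical; only the sample size of 86 comes from the abstract. The sketch is not intended to reproduce the reported 16.1% or 4.9%.

```python
def beta_binomial_update(prior_a, prior_b, events, trials):
    """Return the (a, b) parameters of the Beta posterior for an error rate.

    A Beta(prior_a, prior_b) prior combined with `events` errors observed
    in `trials` cases gives a Beta(prior_a + events, prior_b + non-events)
    posterior (standard conjugate result).
    """
    if not 0 <= events <= trials:
        raise ValueError("events must lie between 0 and trials")
    return prior_a + events, prior_b + (trials - events)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical expert prior: Beta(2, 8), i.e. a prior mean rate of 0.20.
prior_a, prior_b = 2.0, 8.0
# Hypothetical count of mis-triaged cases among the 86 sampled cases.
events, trials = 14, 86

post_a, post_b = beta_binomial_update(prior_a, prior_b, events, trials)
print("posterior parameters:", post_a, post_b)
print("posterior mean rate:", beta_mean(post_a, post_b))
```

    With a weak prior such as this one, the posterior mean sits close to the observed proportion; a more concentrated expert prior would pull it further toward the prior mean.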

  8. Simple summation rule for optimal fixation selection in visual search.

    PubMed

    Najemnik, Jiri; Geisler, Wilson S

    2009-06-01

    When searching for a known target in a natural texture, practiced humans achieve near-optimal performance compared to a Bayesian ideal searcher constrained with the human map of target detectability across the visual field [Najemnik, J., & Geisler, W. S. (2005). Optimal eye movement strategies in visual search. Nature, 434, 387-391]. To do so, humans must be good at choosing where to fixate during the search [Najemnik, J., & Geisler, W.S. (2008). Eye movement statistics in humans are consistent with an optimal strategy. Journal of Vision, 8(3), 1-14. 4]; however, it seems unlikely that a biological nervous system would implement the computations for the Bayesian ideal fixation selection because of their complexity. Here we derive and test a simple heuristic for optimal fixation selection that appears to be a much better candidate for implementation within a biological nervous system. Specifically, we show that the near-optimal fixation location is the maximum of the current posterior probability distribution for target location after the distribution is filtered by (convolved with) the square of the retinotopic target detectability map. We term the model that uses this strategy the entropy limit minimization (ELM) searcher. We show that when constrained with human-like retinotopic map of target detectability and human search error rates, the ELM searcher performs as well as the Bayesian ideal searcher, and produces fixation statistics similar to human.
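
    The fixation rule described in this abstract (take the maximum of the posterior after convolving it with the squared detectability map) can be sketched in one dimension. This is a simplified illustration, not the authors' implementation: the 1-D grid, the shift-invariant symmetric kernel, the zero treatment of locations outside the grid, and all numeric values are assumptions, and the name `elm_next_fixation` is hypothetical.

```python
def elm_next_fixation(posterior, dprime_kernel):
    """Pick the next fixation index under the summation rule (1-D sketch).

    posterior     : probabilities that the target is at each location
    dprime_kernel : detectability d' as a function of offset from fixation,
                    odd length, centred on the fovea (assumed shift-invariant)
    """
    total = sum(posterior)
    p = [x / total for x in posterior]          # normalise defensively
    d2 = [d * d for d in dprime_kernel]         # squared detectability map
    half = len(d2) // 2
    filtered = []
    for i in range(len(p)):
        acc = 0.0
        for m, w in enumerate(d2):
            j = i + half - m                    # discrete convolution index
            if 0 <= j < len(p):                 # outside the grid counts as 0
                acc += p[j] * w
        filtered.append(acc)
    best = max(range(len(p)), key=filtered.__getitem__)
    return best, filtered


# Hypothetical posterior with two equally likely peaks (indices 2 and 4).
posterior = [0.05, 0.05, 0.30, 0.05, 0.30, 0.05, 0.20]
# Hypothetical detectability falling off with distance from the fovea.
dprime = [1.0, 2.0, 3.0, 2.0, 1.0]

best, filtered = elm_next_fixation(posterior, dprime)
print("next fixation index:", best)
```

    In this toy example the rule selects index 4 rather than index 2, even though the two locations are equally probable, because fixating at index 4 also covers the probability mass at index 6 with useful detectability.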

  9. Bayesian estimation of multicomponent relaxation parameters in magnetic resonance fingerprinting.

    PubMed

    McGivney, Debra; Deshmane, Anagha; Jiang, Yun; Ma, Dan; Badve, Chaitra; Sloan, Andrew; Gulani, Vikas; Griswold, Mark

    2018-07-01

    To estimate multiple components within a single voxel in magnetic resonance fingerprinting when the number and types of tissues comprising the voxel are not known a priori. Multiple tissue components within a single voxel are potentially separable with magnetic resonance fingerprinting as a result of differences in signal evolutions of each component. The Bayesian framework for inverse problems provides a natural and flexible setting for solving this problem when the tissue composition per voxel is unknown. Assuming that only a few entries from the dictionary contribute to a mixed signal, sparsity-promoting priors can be placed upon the solution. An iterative algorithm is applied to compute the maximum a posteriori estimator of the posterior probability density to determine the magnetic resonance fingerprinting dictionary entries that contribute most significantly to mixed or pure voxels. Simulation results show that the algorithm is robust in finding the component tissues of mixed voxels. Preliminary in vivo data confirm this result, and show good agreement in voxels containing pure tissue. The Bayesian framework and algorithm shown provide accurate solutions for the partial-volume problem in magnetic resonance fingerprinting. The flexibility of the method will allow further study into different priors and hyperpriors that can be applied in the model. Magn Reson Med 80:159-170, 2018. © 2017 International Society for Magnetic Resonance in Medicine.

  10. Uncertainty quantification for nuclear density functional theory and information content of new measurements.

    PubMed

    McDonnell, J D; Schunck, N; Higdon, D; Sarich, J; Wild, S M; Nazarewicz, W

    2015-03-27

    Statistical tools of uncertainty quantification can be used to assess the information content of measured observables with respect to present-day theoretical models, to estimate model errors and thereby improve predictive capability, to extrapolate beyond the regions reached by experiment, and to provide meaningful input to applications and planned measurements. To showcase new opportunities offered by such tools, we make a rigorous analysis of theoretical statistical uncertainties in nuclear density functional theory using Bayesian inference methods. By considering the recent mass measurements from the Canadian Penning Trap at Argonne National Laboratory, we demonstrate how the Bayesian analysis and a direct least-squares optimization, combined with high-performance computing, can be used to assess the information content of the new data with respect to a model based on the Skyrme energy density functional approach. Employing the posterior probability distribution computed with a Gaussian process emulator, we apply the Bayesian framework to propagate theoretical statistical uncertainties in predictions of nuclear masses, two-neutron dripline, and fission barriers. Overall, we find that the new mass measurements do not impose a constraint that is strong enough to lead to significant changes in the model parameters. The example discussed in this study sets the stage for quantifying and maximizing the impact of new measurements with respect to current modeling and guiding future experimental efforts, thus enhancing the experiment-theory cycle in the scientific method.

  11. Analysis of statistical and standard algorithms for detecting muscle onset with surface electromyography.

    PubMed

    Tenan, Matthew S; Tweedell, Andrew J; Haynes, Courtney A

    2017-01-01

    The timing of muscle activity is a commonly applied analytic method to understand how the nervous system controls movement. This study systematically evaluates six classes of standard and statistical algorithms to determine muscle onset in both experimental surface electromyography (EMG) and simulated EMG with a known onset time. Eighteen participants had EMG collected from the biceps brachii and vastus lateralis while performing a biceps curl or knee extension, respectively. Three established methods and three statistical methods for EMG onset were evaluated. Linear envelope, Teager-Kaiser energy operator + linear envelope and sample entropy were the established methods evaluated while general time series mean/variance, sequential and batch processing of parametric and nonparametric tools, and Bayesian changepoint analysis were the statistical techniques used. Visual EMG onset (experimental data) and objective EMG onset (simulated data) were compared with algorithmic EMG onset via root mean square error and linear regression models for stepwise elimination of inferior algorithms. The top algorithms for both data types were analyzed for their mean agreement with the gold standard onset and evaluation of 95% confidence intervals. The top algorithms were all Bayesian changepoint analysis iterations where the parameter of the prior (p0) was zero. The best performing Bayesian algorithms were p0 = 0 and a posterior probability for onset determination at 60-90%. While existing algorithms performed reasonably, the Bayesian changepoint analysis methodology provides greater reliability and accuracy when determining the singular onset of EMG activity in a time series. Further research is needed to determine if this class of algorithms performs equally well when the time series has multiple bursts of muscle activity.

  12. Scalable posterior approximations for large-scale Bayesian inverse problems via likelihood-informed parameter and state reduction

    NASA Astrophysics Data System (ADS)

    Cui, Tiangang; Marzouk, Youssef; Willcox, Karen

    2016-06-01

    Two major bottlenecks to the solution of large-scale Bayesian inverse problems are the scaling of posterior sampling algorithms to high-dimensional parameter spaces and the computational cost of forward model evaluations. Yet incomplete or noisy data, the state variation and parameter dependence of the forward model, and correlations in the prior collectively provide useful structure that can be exploited for dimension reduction in this setting, both in the parameter space of the inverse problem and in the state space of the forward model. To this end, we show how to jointly construct low-dimensional subspaces of the parameter space and the state space in order to accelerate the Bayesian solution of the inverse problem. As a byproduct of state dimension reduction, we also show how to identify low-dimensional subspaces of the data in problems with high-dimensional observations. These subspaces enable approximation of the posterior as a product of two factors: (i) a projection of the posterior onto a low-dimensional parameter subspace, wherein the original likelihood is replaced by an approximation involving a reduced model; and (ii) the marginal prior distribution on the high-dimensional complement of the parameter subspace. We present and compare several strategies for constructing these subspaces using only a limited number of forward and adjoint model simulations. The resulting posterior approximations can rapidly be characterized using standard sampling techniques, e.g., Markov chain Monte Carlo. Two numerical examples demonstrate the accuracy and efficiency of our approach: inversion of an integral equation in atmospheric remote sensing, where the data dimension is very high; and the inference of a heterogeneous transmissivity field in a groundwater system, which involves a partial differential equation forward model with high dimensional state and parameters.

  13. Bayesian Factor Analysis When Only a Sample Covariance Matrix Is Available

    ERIC Educational Resources Information Center

    Hayashi, Kentaro; Arav, Marina

    2006-01-01

    In traditional factor analysis, the variance-covariance matrix or the correlation matrix has often been a form of inputting data. In contrast, in Bayesian factor analysis, the entire data set is typically required to compute the posterior estimates, such as Bayes factor loadings and Bayes unique variances. We propose a simple method for computing…

  14. Three Insights from a Bayesian Interpretation of the One-Sided "P" Value

    ERIC Educational Resources Information Center

    Marsman, Maarten; Wagenmakers, Eric-Jan

    2017-01-01

    P values have been critiqued on several grounds but remain entrenched as the dominant inferential method in the empirical sciences. In this article, we elaborate on the fact that in many statistical models, the one-sided "P" value has a direct Bayesian interpretation as the approximate posterior mass for values lower than zero. The…
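
    The correspondence described above can be illustrated with a minimal sketch. It assumes a normal model with known standard deviation and a flat prior on the mean; these modelling choices, the function names, and the example numbers are illustrative assumptions, not taken from the article. Under those assumptions the one-sided p-value for H0: mu <= 0 coincides with the posterior mass below zero.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def one_sided_p_value(xbar, sigma, n):
    """P(sample mean >= xbar) at mu = 0, for H0: mu <= 0 versus H1: mu > 0."""
    standard_error = sigma / math.sqrt(n)
    return 1.0 - normal_cdf(xbar / standard_error)


def posterior_mass_below_zero(xbar, sigma, n):
    """P(mu < 0 | data) when the posterior is N(xbar, sigma^2 / n) (flat prior)."""
    standard_error = sigma / math.sqrt(n)
    return normal_cdf((0.0 - xbar) / standard_error)


# Hypothetical data summary: mean 0.5 from 16 observations with sigma = 1.
p_value = one_sided_p_value(xbar=0.5, sigma=1.0, n=16)
posterior_mass = posterior_mass_below_zero(xbar=0.5, sigma=1.0, n=16)
print(p_value, posterior_mass)
```

    With an informative prior the two quantities would generally differ, which is consistent with the abstract's wording that the posterior mass is approximate.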

  15. Piéron’s Law and Optimal Behavior in Perceptual Decision-Making

    PubMed Central

    van Maanen, Leendert; Grasman, Raoul P. P. P.; Forstmann, Birte U.; Wagenmakers, Eric-Jan

    2012-01-01

    Piéron’s Law is a psychophysical regularity in signal detection tasks that states that mean response times decrease as a power function of stimulus intensity. In this article, we extend Piéron’s Law to perceptual two-choice decision-making tasks, and demonstrate that the law holds as the discriminability between two competing choices is manipulated, even though the stimulus intensity remains constant. This result is consistent with predictions from a Bayesian ideal observer model. The model assumes that in order to respond optimally in a two-choice decision-making task, participants continually update the posterior probability of each response alternative, until the probability of one alternative crosses a criterion value. In addition to predictions for two-choice decision-making tasks, we extend the ideal observer model to predict Piéron’s Law in signal detection tasks. We conclude that Piéron’s Law is a general phenomenon that may be caused by optimality constraints. PMID:22232572
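
    The updating rule described above (accumulate evidence until the posterior probability of one alternative crosses a criterion) can be sketched as follows. This is only an illustration under stated assumptions: Gaussian evidence samples, equal prior probabilities, and the names `decide`, `mean_a`, `mean_b` and `criterion` are all hypothetical and not taken from the article's model.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def decide(observations, mean_a, mean_b, sd=1.0, criterion=0.95):
    """Update P(A | evidence) after each sample and stop at the criterion.

    Returns (choice, number_of_samples_used, final_posterior_of_A);
    choice is None if the evidence runs out before a decision.
    """
    post_a = 0.5  # assumed equal prior probability for both alternatives
    step = 0
    for step, x in enumerate(observations, start=1):
        like_a = gaussian_pdf(x, mean_a, sd)
        like_b = gaussian_pdf(x, mean_b, sd)
        # Bayes' rule for two alternatives.
        post_a = post_a * like_a / (post_a * like_a + (1.0 - post_a) * like_b)
        if post_a >= criterion:
            return "A", step, post_a
        if 1.0 - post_a >= criterion:
            return "B", step, post_a
    return None, step, post_a


# Same evidence stream, two levels of discriminability between the alternatives.
evidence = [1.0] * 20
hard = decide(evidence, mean_a=0.5, mean_b=-0.5)
easy = decide(evidence, mean_a=1.0, mean_b=-1.0)
print(hard, easy)
```

    In this toy setting the more discriminable pair reaches the criterion in fewer samples, which is the qualitative pattern the abstract attributes to the ideal observer.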

  16. A Bayesian approach to microwave precipitation profile retrieval

    NASA Technical Reports Server (NTRS)

    Evans, K. Franklin; Turk, Joseph; Wong, Takmeng; Stephens, Graeme L.

    1995-01-01

    A multichannel passive microwave precipitation retrieval algorithm is developed. Bayes theorem is used to combine statistical information from numerical cloud models with forward radiative transfer modeling. A multivariate lognormal prior probability distribution contains the covariance information about hydrometeor distribution that resolves the nonuniqueness inherent in the inversion process. Hydrometeor profiles are retrieved by maximizing the posterior probability density for each vector of observations. The hydrometeor profile retrieval method is tested with data from the Advanced Microwave Precipitation Radiometer (10, 19, 37, and 85 GHz) of convection over ocean and land in Florida. The CP-2 multiparameter radar data are used to verify the retrieved profiles. The results show that the method can retrieve approximate hydrometeor profiles, with larger errors over land than water. There is considerably greater accuracy in the retrieval of integrated hydrometeor contents than of profiles. Many of the retrieval errors are traced to problems with the cloud model microphysical information, and future improvements to the algorithm are suggested.

  17. Selecting Summary Statistics in Approximate Bayesian Computation for Calibrating Stochastic Models

    PubMed Central

    Burr, Tom

    2013-01-01

    Approximate Bayesian computation (ABC) is an approach for using measurement data to calibrate stochastic computer models, which are common in biology applications. ABC is becoming the “go-to” option when the data and/or parameter dimension is large because it relies on user-chosen summary statistics rather than the full data and is therefore computationally feasible. One technical challenge with ABC is that the quality of the approximation to the posterior distribution of model parameters depends on the user-chosen summary statistics. In this paper, the user requirement to choose effective summary statistics in order to accurately estimate the posterior distribution of model parameters is investigated and illustrated by example, using a model and corresponding real data of mitochondrial DNA population dynamics. We show that for some choices of summary statistics, the posterior distribution of model parameters is closely approximated and for other choices of summary statistics, the posterior distribution is not closely approximated. A strategy to choose effective summary statistics is suggested in cases where the stochastic computer model can be run at many trial parameter settings, as in the example. PMID:24288668
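
    The dependence on summary statistics can be seen in a small rejection-ABC sketch. Everything here is an illustrative assumption rather than the paper's model: a Gaussian simulator with unknown mean, a uniform prior on [-5, 5], a fixed tolerance, and toy observed data. The simple rejection scheme stands in for whatever ABC variant the paper uses.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Hypothetical stochastic model: n draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def abc_rejection(observed, summary, n_draws=5000, tolerance=0.1, seed=0):
    """Keep prior draws whose simulated summary lies near the observed summary."""
    rng = random.Random(seed)
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)  # assumed uniform prior
        s_sim = summary(simulate(theta, len(observed), rng))
        if abs(s_sim - s_obs) < tolerance:
            accepted.append(theta)
    return accepted


observed = [1.5, 2.5] * 10  # toy data with sample mean 2.0

# Informative summary: the sample mean.
good = abc_rejection(observed, statistics.fmean)
# Poor summary: only the first observation.
poor = abc_rejection(observed, lambda data: data[0])

print(len(good), statistics.fmean(good), statistics.stdev(good))
print(len(poor), statistics.fmean(poor), statistics.stdev(poor))
```

    The accepted draws approximate the posterior; the poor summary discards most of the information in the data, so its approximation is much more diffuse.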

  18. Selecting summary statistics in approximate Bayesian computation for calibrating stochastic models.

    PubMed

    Burr, Tom; Skurikhin, Alexei

    2013-01-01

    Approximate Bayesian computation (ABC) is an approach for using measurement data to calibrate stochastic computer models, which are common in biology applications. ABC is becoming the "go-to" option when the data and/or parameter dimension is large because it relies on user-chosen summary statistics rather than the full data and is therefore computationally feasible. One technical challenge with ABC is that the quality of the approximation to the posterior distribution of model parameters depends on the user-chosen summary statistics. In this paper, the user requirement to choose effective summary statistics in order to accurately estimate the posterior distribution of model parameters is investigated and illustrated by example, using a model and corresponding real data of mitochondrial DNA population dynamics. We show that for some choices of summary statistics, the posterior distribution of model parameters is closely approximated and for other choices of summary statistics, the posterior distribution is not closely approximated. A strategy to choose effective summary statistics is suggested in cases where the stochastic computer model can be run at many trial parameter settings, as in the example.

  19. Bayesian inference of uncertainties in precipitation-streamflow modeling in a snow affected catchment

    NASA Astrophysics Data System (ADS)

    Koskela, J. J.; Croke, B. W. F.; Koivusalo, H.; Jakeman, A. J.; Kokkonen, T.

    2012-11-01

    Bayesian inference is used to study the effect of precipitation and model structural uncertainty on estimates of model parameters and confidence limits of predictive variables in a conceptual rainfall-runoff model in the snow-fed Rudbäck catchment (142 ha) in southern Finland. The IHACRES model is coupled with a simple degree day model to account for snow accumulation and melt. The posterior probability distribution of the model parameters is sampled by using the Differential Evolution Adaptive Metropolis (DREAM(ZS)) algorithm and the generalized likelihood function. Precipitation uncertainty is taken into account by introducing additional latent variables that were used as multipliers for individual storm events. Results suggest that occasional snow water equivalent (SWE) observations together with daily streamflow observations do not contain enough information to simultaneously identify model parameters, precipitation uncertainty and model structural uncertainty in the Rudbäck catchment. The addition of an autoregressive component to account for model structure error and latent variables having uniform priors to account for input uncertainty lead to dubious posterior distributions of model parameters. Thus our hypothesis that informative priors for latent variables could be replaced by additional SWE data could not be confirmed. The model was found to work adequately in 1-day-ahead simulation mode, but the results were poor in the simulation batch mode. This was caused by the interaction of parameters that were used to describe different sources of uncertainty. The findings may have lessons for other cases where parameterizations are similarly high in relation to available prior information.

  20. Evaluating marginal likelihood with thermodynamic integration method and comparison with several other numerical methods

    DOE PAGES

    Liu, Peigui; Elshall, Ahmed S.; Ye, Ming; ...

    2016-02-05

    Evaluating marginal likelihood is the most critical and computationally expensive task when conducting Bayesian model averaging to quantify parametric and model uncertainties. The evaluation is commonly done by using Laplace approximations to evaluate semianalytical expressions of the marginal likelihood or by using Monte Carlo (MC) methods to evaluate arithmetic or harmonic mean of a joint likelihood function. This study introduces a new MC method, i.e., thermodynamic integration, which has not been attempted in environmental modeling. Instead of using samples only from prior parameter space (as in arithmetic mean evaluation) or posterior parameter space (as in harmonic mean evaluation), the thermodynamic integration method uses samples generated gradually from the prior to posterior parameter space. This is done through a path sampling that conducts Markov chain Monte Carlo simulation with different power coefficient values applied to the joint likelihood function. The thermodynamic integration method is evaluated using three analytical functions by comparing the method with two variants of the Laplace approximation method and three MC methods, including the nested sampling method that was recently introduced into environmental modeling. The thermodynamic integration method outperforms the other methods in terms of accuracy, convergence, and consistency. The thermodynamic integration method is also applied to a synthetic case of groundwater modeling with four alternative models. The application shows that model probabilities obtained using the thermodynamic integration method improve predictive performance of Bayesian model averaging. As a result, the thermodynamic integration method is mathematically rigorous, and its MC implementation is computationally general for a wide range of environmental problems.
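
    The path from prior to posterior can be made concrete with the standard thermodynamic integration identity, log Z = integral over beta in [0, 1] of E_beta[log L], where the expectation is taken under the power posterior proportional to prior times L^beta. The sketch below is a toy illustration, not the paper's implementation: it assumes a conjugate model (y_i ~ Normal(theta, 1), theta ~ Normal(0, 1)) so that the power-posterior expectation is available in closed form and replaces the MCMC step, and it uses a uniform beta grid with the trapezoid rule. All names and data are hypothetical.

```python
import math


def expected_log_likelihood(y, beta):
    """E[log L(theta)] under the power posterior for the toy conjugate model.

    The power posterior is Normal(mean, variance) with
    variance = 1 / (1 + beta * n) and mean = beta * sum(y) * variance.
    """
    n = len(y)
    variance = 1.0 / (1.0 + beta * n)
    mean = beta * sum(y) * variance
    expected_sq = sum((yi - mean) ** 2 for yi in y) + n * variance
    return -0.5 * n * math.log(2.0 * math.pi) - 0.5 * expected_sq


def log_evidence_ti(y, n_steps=1000):
    """Trapezoid-rule integration of E_beta[log L] over beta in [0, 1]."""
    betas = [k / n_steps for k in range(n_steps + 1)]
    values = [expected_log_likelihood(y, b) for b in betas]
    return sum(
        0.5 * (values[k] + values[k + 1]) * (betas[k + 1] - betas[k])
        for k in range(n_steps)
    )


def log_evidence_exact(y):
    """Closed-form log marginal likelihood: y ~ Normal(0, I + 1 1^T)."""
    n = len(y)
    total = sum(y)
    quad = sum(yi * yi for yi in y) - total ** 2 / (1.0 + n)
    return -0.5 * n * math.log(2.0 * math.pi) - 0.5 * math.log(1.0 + n) - 0.5 * quad


y = [0.8, 1.2, 0.5, 1.5]  # toy data
print(log_evidence_ti(y), log_evidence_exact(y))
```

    In a realistic model the closed-form expectation would be replaced by an average of log-likelihood values over MCMC samples drawn at each power coefficient, as the abstract describes.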

  1. Structural Information from Single-molecule FRET Experiments Using the Fast Nano-positioning System

    PubMed Central

    Röcker, Carlheinz; Nagy, Julia; Michaelis, Jens

    2017-01-01

    Single-molecule Förster Resonance Energy Transfer (smFRET) can be used to obtain structural information on biomolecular complexes in real-time. Thereby, multiple smFRET measurements are used to localize an unknown dye position inside a protein complex by means of trilateration. In order to obtain quantitative information, the Nano-Positioning System (NPS) uses probabilistic data analysis to combine structural information from X-ray crystallography with single-molecule fluorescence data to calculate not only the most probable position but the complete three-dimensional probability distribution, termed posterior, which indicates the experimental uncertainty. The concept was generalized for the analysis of smFRET networks containing numerous dye molecules. The latest version of NPS, Fast-NPS, features a new algorithm using Bayesian parameter estimation based on Markov Chain Monte Carlo sampling and parallel tempering that allows for the analysis of large smFRET networks in a comparably short time. Moreover, Fast-NPS allows the calculation of the posterior by choosing one of five different models for each dye, that account for the different spatial and orientational behavior exhibited by the dye molecules due to their local environment. Here we present a detailed protocol for obtaining smFRET data and applying the Fast-NPS. We provide detailed instructions for the acquisition of the three input parameters of Fast-NPS: the smFRET values, as well as the quantum yield and anisotropy of the dye molecules. Recently, the NPS has been used to elucidate the architecture of an archaeal open promotor complex. This data is used to demonstrate the influence of the five different dye models on the posterior distribution. PMID:28287526

  2. Structural Information from Single-molecule FRET Experiments Using the Fast Nano-positioning System.

    PubMed

    Dörfler, Thilo; Eilert, Tobias; Röcker, Carlheinz; Nagy, Julia; Michaelis, Jens

    2017-02-09

    Single-molecule Förster Resonance Energy Transfer (smFRET) can be used to obtain structural information on biomolecular complexes in real-time. Thereby, multiple smFRET measurements are used to localize an unknown dye position inside a protein complex by means of trilateration. In order to obtain quantitative information, the Nano-Positioning System (NPS) uses probabilistic data analysis to combine structural information from X-ray crystallography with single-molecule fluorescence data to calculate not only the most probable position but the complete three-dimensional probability distribution, termed posterior, which indicates the experimental uncertainty. The concept was generalized for the analysis of smFRET networks containing numerous dye molecules. The latest version of NPS, Fast-NPS, features a new algorithm using Bayesian parameter estimation based on Markov Chain Monte Carlo sampling and parallel tempering that allows for the analysis of large smFRET networks in a comparably short time. Moreover, Fast-NPS allows the calculation of the posterior by choosing one of five different models for each dye, that account for the different spatial and orientational behavior exhibited by the dye molecules due to their local environment. Here we present a detailed protocol for obtaining smFRET data and applying the Fast-NPS. We provide detailed instructions for the acquisition of the three input parameters of Fast-NPS: the smFRET values, as well as the quantum yield and anisotropy of the dye molecules. Recently, the NPS has been used to elucidate the architecture of an archaeal open promotor complex. This data is used to demonstrate the influence of the five different dye models on the posterior distribution.

  3. Bayesian analysis of time-series data under case-crossover designs: posterior equivalence and inference.

    PubMed

    Li, Shi; Mukherjee, Bhramar; Batterman, Stuart; Ghosh, Malay

    2013-12-01

    Case-crossover designs are widely used to study short-term exposure effects on the risk of acute adverse health events. While the frequentist literature on this topic is vast, there is no Bayesian work in this general area. The contribution of this paper is twofold. First, the paper establishes Bayesian equivalence results that require characterization of the set of priors under which the posterior distributions of the risk ratio parameters based on a case-crossover and time-series analysis are identical. Second, the paper studies inferential issues under case-crossover designs in a Bayesian framework. Traditionally, a conditional logistic regression is used for inference on risk-ratio parameters in case-crossover studies. We consider instead a more general full likelihood-based approach which makes less restrictive assumptions on the risk functions. Formulation of a full likelihood leads to growth in the number of parameters proportional to the sample size. We propose a semi-parametric Bayesian approach using a Dirichlet process prior to handle the random nuisance parameters that appear in a full likelihood formulation. We carry out a simulation study to compare the Bayesian methods based on full and conditional likelihood with the standard frequentist approaches for case-crossover and time-series analysis. The proposed methods are illustrated through the Detroit Asthma Morbidity, Air Quality and Traffic study, which examines the association between acute asthma risk and ambient air pollutant concentrations. © 2013, The International Biometric Society.

  4. Bayesian analyses of time-interval data for environmental radiation monitoring.

    PubMed

    Luo, Peng; Sharp, Julia L; DeVol, Timothy A

    2013-01-01

    Time-interval (time difference between two consecutive pulses) analysis based on the principles of Bayesian inference was investigated for online radiation monitoring. Using experimental and simulated data, Bayesian analysis of time-interval data [Bayesian (ti)] was compared with Bayesian and a conventional frequentist analysis of counts in a fixed count time [Bayesian (cnt) and single interval test (SIT), respectively]. The performances of the three methods were compared in terms of average run length (ARL) and detection probability for several simulated detection scenarios. Experimental data were acquired with a DGF-4C system in list mode. Simulated data were obtained using Monte Carlo techniques to obtain a random sampling of the Poisson distribution. All statistical algorithms were developed using the R Project for statistical computing. Bayesian analysis of time-interval information provided a similar detection probability as Bayesian analysis of count information, but the authors were able to make a decision with fewer pulses at relatively higher radiation levels. In addition, for the cases with very short presence of the source (< count time), time-interval information is more sensitive to detect a change than count information since the source data is averaged by the background data over the entire count time. The relationships of the source time, change points, and modifications to the Bayesian approach for increasing detection probability are presented.

  5. An ensemble-based dynamic Bayesian averaging approach for discharge simulations using multiple global precipitation products and hydrological models

    NASA Astrophysics Data System (ADS)

    Qi, Wei; Liu, Junguo; Yang, Hong; Sweetapple, Chris

    2018-03-01

    Global precipitation products are very important datasets in flow simulations, especially in poorly gauged regions. Uncertainties resulting from precipitation products, hydrological models and their combinations vary with time and data magnitude, and undermine their application to flow simulations. However, previous studies have not quantified these uncertainties individually and explicitly. This study developed an ensemble-based dynamic Bayesian averaging approach (e-Bay) for deterministic discharge simulations using multiple global precipitation products and hydrological models. In this approach, the joint probability of precipitation products and hydrological models being correct is quantified based on uncertainties in maximum and mean estimation, posterior probability is quantified as functions of the magnitude and timing of discharges, and the law of total probability is implemented to calculate expected discharges. Six global fine-resolution precipitation products and two hydrological models of different complexities are included in an illustrative application. e-Bay can effectively quantify uncertainties and therefore generate better deterministic discharges than traditional approaches (weighted average methods with equal and varying weights and maximum likelihood approach). The mean Nash-Sutcliffe Efficiency values of e-Bay are up to 0.97 and 0.85 in training and validation periods respectively, which are at least 0.06 and 0.13 higher than traditional approaches. In addition, with increased training data, assessment criteria values of e-Bay show smaller fluctuations than traditional approaches and its performance becomes outstanding. The proposed e-Bay approach bridges the gap between global precipitation products and their pragmatic applications to discharge simulations, and is beneficial to water resources management in ungauged or poorly gauged regions across the world.

  6. A Dirichlet-Multinomial Bayes Classifier for Disease Diagnosis with Microbial Compositions.

    PubMed

    Gao, Xiang; Lin, Huaiying; Dong, Qunfeng

    2017-01-01

    Dysbiosis of microbial communities is associated with various human diseases, raising the possibility of using microbial compositions as biomarkers for disease diagnosis. We have developed a Bayes classifier by modeling microbial compositions with Dirichlet-multinomial distributions, which are widely used to model multicategorical count data with extra variation. The parameters of the Dirichlet-multinomial distributions are estimated from training microbiome data sets based on maximum likelihood. The posterior probability of a microbiome sample belonging to a disease or healthy category is calculated based on Bayes' theorem, using the likelihood values computed from the estimated Dirichlet-multinomial distribution, as well as a prior probability estimated from the training microbiome data set or previously published information on disease prevalence. When tested on real-world microbiome data sets, our method, called DMBC (for Dirichlet-multinomial Bayes classifier), shows better classification accuracy than the only existing Bayesian microbiome classifier based on a Dirichlet-multinomial mixture model and the popular random forest method. The advantage of DMBC is its built-in automatic feature selection, capable of identifying a subset of microbial taxa with the best classification accuracy between different classes of samples based on cross-validation. This unique ability enables DMBC to maintain and even improve its accuracy at modeling species-level taxa. The R package for DMBC is freely available at https://github.com/qunfengdong/DMBC. IMPORTANCE By incorporating prior information on disease prevalence, Bayes classifiers have the potential to estimate disease probability better than other common machine-learning methods. Thus, it is important to develop Bayes classifiers specifically tailored for microbiome data. 
    Our method shows higher classification accuracy than the only existing Bayesian classifier and the popular random forest method, and thus provides an alternative option for using microbial compositions for disease diagnosis.
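
    The classification step described above (Bayes' theorem applied to Dirichlet-multinomial likelihoods and a prevalence-based prior) can be sketched in a few lines. This is not the DMBC package's code or API: the function names, the alpha parameters, the taxon counts, and the prior values are hypothetical, and in the paper the parameters are estimated from training data by maximum likelihood rather than supplied by hand.

```python
import math


def dm_log_likelihood(counts, alpha):
    """Log probability of taxon counts under a Dirichlet-multinomial distribution."""
    n = sum(counts)
    alpha_sum = sum(alpha)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    log_ratio = math.lgamma(alpha_sum) - math.lgamma(n + alpha_sum)
    log_terms = sum(
        math.lgamma(x + a) - math.lgamma(a) for x, a in zip(counts, alpha)
    )
    return log_coef + log_ratio + log_terms


def class_posteriors(counts, alphas_by_class, priors):
    """Posterior probability of each class via Bayes' theorem."""
    log_joint = {
        label: math.log(priors[label]) + dm_log_likelihood(counts, alpha)
        for label, alpha in alphas_by_class.items()
    }
    # Subtract the maximum before exponentiating for numerical stability.
    biggest = max(log_joint.values())
    unnormalized = {label: math.exp(v - biggest) for label, v in log_joint.items()}
    total = sum(unnormalized.values())
    return {label: u / total for label, u in unnormalized.items()}


# Hypothetical parameters for three taxa and a low assumed disease prevalence.
alphas = {"disease": [8.0, 1.0, 1.0], "healthy": [1.0, 1.0, 8.0]}
priors = {"disease": 0.1, "healthy": 0.9}
posterior = class_posteriors([30, 5, 2], alphas, priors)
print(posterior)
```

    Working in log space with `math.lgamma` avoids overflow for realistic read counts, which can run to thousands per sample.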

  7. Bayesian Analysis of Item Response Curves. Research Report 84-1. Mathematical Sciences Technical Report No. 132.

    ERIC Educational Resources Information Center

    Tsutakawa, Robert K.; Lin, Hsin Ying

    Item response curves for a set of binary responses are studied from a Bayesian viewpoint of estimating the item parameters. For the two-parameter logistic model with normally distributed ability, restricted bivariate beta priors are used to illustrate the computation of the posterior mode via the EM algorithm. The procedure is illustrated by data…

  8. Bayesian Regression with Network Prior: Optimal Bayesian Filtering Perspective

    PubMed Central

    Qian, Xiaoning; Dougherty, Edward R.

    2017-01-01

    The recently introduced intrinsically Bayesian robust filter (IBRF) provides fully optimal filtering relative to a prior distribution over an uncertainty class of joint random process models, whereas formerly the theory was limited to model-constrained Bayesian robust filters, for which optimization was limited to the filters that are optimal for models in the uncertainty class. This paper extends the IBRF theory to the situation where there are both a prior on the uncertainty class and sample data. The result is optimal Bayesian filtering (OBF), where optimality is relative to the posterior distribution derived from the prior and the data. The IBRF theories for effective characteristics and canonical expansions extend to the OBF setting. A salient focus of the present work is to demonstrate the advantages of Bayesian regression within the OBF setting over the classical Bayesian approach in the context of linear Gaussian models. PMID:28824268

  9. An introduction to using Bayesian linear regression with clinical data.

    PubMed

    Baldwin, Scott A; Larson, Michael J

    2017-11-01

    Statistical training in psychology focuses on frequentist methods. Bayesian methods are an alternative to standard frequentist methods. This article provides researchers with an introduction to fundamental ideas in Bayesian modeling. We use data from an electroencephalogram (EEG) and anxiety study to illustrate Bayesian models. Specifically, the models examine the relationship between error-related negativity (ERN), a particular event-related potential, and trait anxiety. Methodological topics covered include: how to set up a regression model in a Bayesian framework, specifying priors, examining convergence of the model, visualizing and interpreting posterior distributions, interval estimates, expected and predicted values, and model comparison tools. We also discuss situations where Bayesian methods can outperform frequentist methods as well as how to specify more complicated regression models. Finally, we conclude with recommendations about reporting guidelines for those using Bayesian methods in their own research. We provide data and R code for replicating our analyses. Copyright © 2017 Elsevier Ltd. All rights reserved.
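
    As a rough illustration of setting up a regression in a Bayesian framework, the sketch below computes the closed-form posterior for a single slope. It is far simpler than the models in the article and is not the authors' R code: it assumes a regression through the origin, a known noise standard deviation, a Normal(0, prior_sd^2) prior on the slope, and toy data. All names are hypothetical.

```python
import math


def slope_posterior(x, y, noise_sd=1.0, prior_sd=1.0):
    """Posterior mean and sd of b in y = b * x + noise, with b ~ Normal(0, prior_sd^2)."""
    precision = 1.0 / prior_sd ** 2 + sum(xi * xi for xi in x) / noise_sd ** 2
    mean = (sum(xi * yi for xi, yi in zip(x, y)) / noise_sd ** 2) / precision
    return mean, math.sqrt(1.0 / precision)


def credible_interval(mean, sd, z=1.96):
    """Central interval of a normal posterior (z = 1.96 gives roughly 95%)."""
    return mean - z * sd, mean + z * sd


# Toy data, not from the EEG study described above.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
post_mean, post_sd = slope_posterior(x, y)
print(post_mean, post_sd, credible_interval(post_mean, post_sd))
```

    The prior pulls the estimate toward zero: the least-squares slope for these data is 2, while the posterior mean is 28/15. A wider prior (larger `prior_sd`) moves the posterior mean back toward the least-squares value.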

  10. Approximate Bayesian Computation by Subset Simulation using hierarchical state-space models

    NASA Astrophysics Data System (ADS)

    Vakilzadeh, Majid K.; Huang, Yong; Beck, James L.; Abrahamsson, Thomas

    2017-02-01

    A new multi-level Markov Chain Monte Carlo algorithm for Approximate Bayesian Computation, ABC-SubSim, has recently appeared that exploits the Subset Simulation method for efficient rare-event simulation. ABC-SubSim adaptively creates a nested decreasing sequence of data-approximating regions in the output space that correspond to increasingly closer approximations of the observed output vector in this output space. At each level, multiple samples of the model parameter vector are generated by a component-wise Metropolis algorithm so that the predicted output corresponding to each parameter value falls in the current data-approximating region. Theoretically, if continued to the limit, the sequence of data-approximating regions would converge on to the observed output vector and the approximate posterior distributions, which are conditional on the data-approximation region, would become exact, but this is not practically feasible. In this paper we study the performance of the ABC-SubSim algorithm for Bayesian updating of the parameters of dynamical systems using a general hierarchical state-space model. We note that the ABC methodology gives an approximate posterior distribution that actually corresponds to an exact posterior where a uniformly distributed combined measurement and modeling error is added. We also note that ABC algorithms have a problem with learning the uncertain error variances in a stochastic state-space model and so we treat them as nuisance parameters and analytically integrate them out of the posterior distribution. In addition, the statistical efficiency of the original ABC-SubSim algorithm is improved by developing a novel strategy to regulate the proposal variance for the component-wise Metropolis algorithm at each level. 
    We demonstrate that Self-regulated ABC-SubSim is well suited for Bayesian system identification by first applying it successfully to model updating of a two degree-of-freedom linear structure for three cases: globally, locally and un-identifiable model classes, and then to model updating of a two degree-of-freedom nonlinear structure with Duffing nonlinearities in its interstory force-deflection relationship.

  11. Bayesian microsaccade detection

    PubMed Central

    Mihali, Andra; van Opheusden, Bas; Ma, Wei Ji

    2017-01-01

    Microsaccades are high-velocity fixational eye movements, with special roles in perception and cognition. The default microsaccade detection method is to determine when the smoothed eye velocity exceeds a threshold. We have developed a new method, Bayesian microsaccade detection (BMD), which performs inference based on a simple statistical model of eye positions. In this model, a hidden state variable changes between drift and microsaccade states at random times. The eye position is a biased random walk with different velocity distributions for each state. BMD generates samples from the posterior probability distribution over the eye state time series given the eye position time series. Applied to simulated data, BMD recovers the “true” microsaccades with fewer errors than alternative algorithms, especially at high noise. Applied to EyeLink eye tracker data, BMD detects almost all the microsaccades detected by the default method, but also apparent microsaccades embedded in high noise—although these can also be interpreted as false positives. Next we apply the algorithms to data collected with a Dual Purkinje Image eye tracker, whose higher precision justifies defining the inferred microsaccades as ground truth. When we add artificial measurement noise, the inferences of all algorithms degrade; however, at noise levels comparable to EyeLink data, BMD recovers the “true” microsaccades with 54% fewer errors than the default algorithm. Though unsuitable for online detection, BMD has other advantages: It returns probabilities rather than binary judgments, and it can be straightforwardly adapted as the generative model is refined. We make our algorithm available as a software package. PMID:28114483

  12. Multivariate Bayesian analysis of Gaussian, right censored Gaussian, ordered categorical and binary traits using Gibbs sampling

    PubMed Central

    Korsgaard, Inge Riis; Lund, Mogens Sandø; Sorensen, Daniel; Gianola, Daniel; Madsen, Per; Jensen, Just

    2003-01-01

    A fully Bayesian analysis using Gibbs sampling and data augmentation in a multivariate model of Gaussian, right censored, and grouped Gaussian traits is described. The grouped Gaussian traits are either ordered categorical traits (with more than two categories) or binary traits, where the grouping is determined via thresholds on the underlying Gaussian scale, the liability scale. Allowances are made for unequal models, unknown covariance matrices and missing data. Having outlined the theory, strategies for implementation are reviewed. These include joint sampling of location parameters; efficient sampling from the fully conditional posterior distribution of augmented data, a multivariate truncated normal distribution; and sampling from the conditional inverse Wishart distribution, the fully conditional posterior distribution of the residual covariance matrix. Finally, a simulated dataset was analysed to illustrate the methodology. This paper concentrates on a model where residuals associated with liabilities of the binary traits are assumed to be independent. A Bayesian analysis using Gibbs sampling is outlined for the model where this assumption is relaxed. PMID:12633531

  13. Inference of epidemiological parameters from household stratified data

    PubMed Central

    Walker, James N.; Ross, Joshua V.

    2017-01-01

    We consider a continuous-time Markov chain model of SIR disease dynamics with two levels of mixing. For this so-called stochastic households model, we provide two methods for inferring the model parameters—governing within-household transmission, recovery, and between-household transmission—from data of the day upon which each individual became infectious and the household in which each infection occurred, as might be available from First Few Hundred studies. Each method is a form of Bayesian Markov Chain Monte Carlo that allows us to calculate a joint posterior distribution for all parameters and hence the household reproduction number and the early growth rate of the epidemic. The first method performs exact Bayesian inference using a standard data-augmentation approach; the second performs approximate Bayesian inference based on a likelihood approximation derived from branching processes. These methods are compared for computational efficiency and posteriors from each are compared. The branching process is shown to be a good approximation and remains computationally efficient as the amount of data is increased. PMID:29045456

  14. Classical and Bayesian Seismic Yield Estimation: The 1998 Indian and Pakistani Tests

    NASA Astrophysics Data System (ADS)

    Shumway, R. H.

    2001-10-01

    The nuclear tests in May, 1998, in India and Pakistan have stimulated a renewed interest in yield estimation, based on limited data from uncalibrated test sites. We study here the problem of estimating yields using classical and Bayesian methods developed by Shumway (1992), utilizing calibration data from the Semipalatinsk test site and measured magnitudes for the 1998 Indian and Pakistani tests given by Murphy (1998). Calibration is done using multivariate classical or Bayesian linear regression, depending on the availability of measured magnitude-yield data and prior information. Confidence intervals for the classical approach are derived applying an extension of Fieller's method suggested by Brown (1982). In the case where prior information is available, the posterior predictive magnitude densities are inverted to give posterior intervals for yield. Intervals obtained using the joint distribution of magnitudes are comparable to the single-magnitude estimates produced by Murphy (1998) and reinforce the conclusion that the announced yields of the Indian and Pakistani tests were too high.

  16. Validation of the thermal challenge problem using Bayesian Belief Networks.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McFarland, John; Swiler, Laura Painton

    The thermal challenge problem has been developed at Sandia National Laboratories as a testbed for demonstrating various types of validation approaches and prediction methods. This report discusses one particular methodology to assess the validity of a computational model given experimental data. This methodology is based on Bayesian Belief Networks (BBNs) and can incorporate uncertainty in experimental measurements, in physical quantities, and model uncertainties. The approach uses the prior and posterior distributions of model output to compute a validation metric based on Bayesian hypothesis testing (a Bayes' factor). This report discusses various aspects of the BBN, specifically in the context of the thermal challenge problem. A BBN is developed for a given set of experimental data in a particular experimental configuration. The development of the BBN and the method for "solving" the BBN to develop the posterior distribution of model output through Monte Carlo Markov Chain sampling is discussed in detail. The use of the BBN to compute a Bayes' factor is demonstrated.

  17. The Chandra Source Catalog: X-ray Aperture Photometry

    NASA Astrophysics Data System (ADS)

    Kashyap, Vinay; Primini, F. A.; Glotfelty, K. J.; Anderson, C. S.; Bonaventura, N. R.; Chen, J. C.; Davis, J. E.; Doe, S. M.; Evans, I. N.; Evans, J. D.; Fabbiano, G.; Galle, E.; Gibbs, D. G.; Grier, J. D.; Hain, R.; Hall, D. M.; Harbo, P. N.; He, X.; Houck, J. C.; Karovska, M.; Lauer, J.; McCollough, M. L.; McDowell, J. C.; Miller, J. B.; Mitschang, A. W.; Morgan, D. L.; Nichols, J. S.; Nowak, M. A.; Plummer, D. A.; Refsdal, B. L.; Rots, A. H.; Siemiginowska, A. L.; Sundheim, B. A.; Tibbetts, M. S.; Van Stone, D. W.; Winkelman, S. L.; Zografou, P.

    2009-01-01

    The Chandra Source Catalog represents a reanalysis of the entire ACIS and HRC imaging observations over the 9-year Chandra mission. Source detection is carried out on a uniform basis, using the CIAO tool wavdetect, and source fluxes are estimated post-facto using a Bayesian method that accounts for background, spatial resolution effects, and contamination from nearby sources. We use gamma-function prior distributions, which could be either non-informative, or in case there exist previous observations of the same source, strongly informative. The resulting posterior probability density functions allow us to report the flux and a robust credible range on it. We also determine limiting sensitivities at arbitrary locations in the field using the same formulation. This work was supported by CXC NASA contracts NAS8-39073 (VK) and NAS8-03060 (CSC).
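
    To make the gamma-prior flux estimate described in this record concrete, the following is a minimal sketch, not the catalog's actual pipeline. It assumes a single aperture whose counts are Poisson with mean (source rate x exposure + a known background), and it ignores the spatial-resolution and neighbour-contamination terms the abstract mentions. The function names, the grid evaluation, and the default prior parameters are all hypothetical.

```python
import math

def flux_posterior(counts, exposure, bkg_counts, alpha=1.0, beta=0.0,
                   n_grid=2000, s_max=None):
    """Grid posterior for a source rate s.

    Assumed model (illustrative only):
      counts ~ Poisson(s * exposure + bkg_counts)
      s      ~ Gamma(shape=alpha, rate=beta); alpha=1, beta=0 is a flat prior.
    Returns (grid, density, step) with the density normalised on the grid.
    """
    if s_max is None:
        # Generous upper limit so the grid covers essentially all the mass.
        s_max = (counts + 10.0 * math.sqrt(counts + 1.0) + 10.0) / exposure
    ds = s_max / n_grid
    grid = [(i + 0.5) * ds for i in range(n_grid)]  # midpoints, so s > 0
    log_post = []
    for s in grid:
        mu = s * exposure + bkg_counts
        log_like = counts * math.log(mu) - mu          # Poisson, up to a constant
        log_prior = (alpha - 1.0) * math.log(s) - beta * s
        log_post.append(log_like + log_prior)
    shift = max(log_post)                               # avoid underflow
    weights = [math.exp(lp - shift) for lp in log_post]
    total = sum(weights) * ds
    return grid, [w / total for w in weights], ds

def credible_interval(grid, dens, ds, level=0.68):
    """Equal-tailed credible interval read off the cumulative grid posterior."""
    lo_p = (1.0 - level) / 2.0
    hi_p = 1.0 - lo_p
    cum, lo, hi, found_lo = 0.0, grid[0], grid[-1], False
    for s, d in zip(grid, dens):
        cum += d * ds
        if not found_lo and cum >= lo_p:
            lo, found_lo = s, True
        if cum >= hi_p:
            hi = s
            break
    return lo, hi
```

    An informative prior from an earlier observation could be expressed in this sketch by passing that observation's counts and exposure as `alpha` and `beta`, which is the usual conjugate reading of a gamma prior on a Poisson rate.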

  18. Uncertainty quantification in LES of channel flow

    DOE PAGES

    Safta, Cosmin; Blaylock, Myra; Templeton, Jeremy; ...

    2016-07-12

    In this paper, we present a Bayesian framework for estimating joint densities for large eddy simulation (LES) sub-grid scale model parameters based on canonical forced isotropic turbulence direct numerical simulation (DNS) data. The framework accounts for noise in the independent variables, and we present alternative formulations for accounting for discrepancies between model and data. To generate probability densities for flow characteristics, posterior densities for sub-grid scale model parameters are propagated forward through LES of channel flow and compared with DNS data. Synthesis of the calibration and prediction results demonstrates that model parameters have an explicit filter width dependence and are highly correlated. Discrepancies between DNS and calibrated LES results point to additional model form inadequacies that need to be accounted for.

  19. A feature-based developmental model of the infant brain in structural MRI.

    PubMed

    Toews, Matthew; Wells, William M; Zöllei, Lilla

    2012-01-01

    In this paper, anatomical development is modeled as a collection of distinctive image patterns localized in space and time. A Bayesian posterior probability is defined over a random variable of subject age, conditioned on data in the form of scale-invariant image features. The model is automatically learned from a large set of images exhibiting significant variation, used to discover anatomical structure related to age and development, and fit to new images to predict age. The model is applied to a set of 230 infant structural MRIs of 92 subjects acquired at multiple sites over an age range of 8-590 days. Experiments demonstrate that the model can be used to identify age-related anatomical structure, and to predict the age of new subjects with an average error of 72 days.

  20. Community Detection Algorithm Combining Stochastic Block Model and Attribute Data Clustering

    NASA Astrophysics Data System (ADS)

    Kataoka, Shun; Kobayashi, Takuto; Yasuda, Muneki; Tanaka, Kazuyuki

    2016-11-01

    We propose a new algorithm to detect the community structure in a network that utilizes both the network structure and vertex attribute data. Suppose we have the network structure together with the vertex attribute data, that is, the information assigned to each vertex associated with the community to which it belongs. The problem addressed in this paper is the detection of the community structure from the information of both the network structure and the vertex attribute data. Our approach is based on the Bayesian approach that models the posterior probability distribution of the community labels. The detection of the community structure in our method is achieved by using belief propagation and an EM algorithm. We numerically verified the performance of our method using computer-generated networks and real-world networks.

  1. Soft sensor modeling based on variable partition ensemble method for nonlinear batch processes

    NASA Astrophysics Data System (ADS)

    Wang, Li; Chen, Xiangguang; Yang, Kai; Jin, Huaiping

    2017-01-01

    Batch processes are characterized by nonlinearity and system uncertainty; therefore, a conventional single model may be ill-suited. A local-learning soft sensor based on a variable partition ensemble method is developed for quality prediction in nonlinear and non-Gaussian batch processes. A collection of input variable sets is obtained by bootstrapping and the PMI criterion. Multiple local GPR models are then developed, one for each local input variable set. When a new test sample arrives, the posterior probability of each best-performing local model is estimated by Bayesian inference and used to combine the local GPR models into the final prediction. The proposed soft sensor is demonstrated on an industrial fed-batch chlortetracycline fermentation process.

  2. Directional data analysis under the general projected normal distribution

    PubMed Central

    Wang, Fangpo; Gelfand, Alan E.

    2013-01-01

    The projected normal distribution is an under-utilized model for explaining directional data. In particular, the general version provides flexibility, e.g., asymmetry and possible bimodality along with convenient regression specification. Here, we clarify the properties of this general class. We also develop fully Bayesian hierarchical models for analyzing circular data using this class. We show how they can be fit using MCMC methods with suitable latent variables. We show how posterior inference for distributional features such as the angular mean direction and concentration can be implemented as well as how prediction within the regression setting can be handled. With regard to model comparison, we argue for an out-of-sample approach using both a predictive likelihood scoring loss criterion and a cumulative rank probability score criterion. PMID:24046539

  3. The estimation of tree posterior probabilities using conditional clade probability distributions.

    PubMed

    Larget, Bret

    2013-07-01

    In this article I introduce the idea of conditional independence of separated subtrees as a principle by which to estimate the posterior probability of trees using conditional clade probability distributions rather than simple sample relative frequencies. I describe an algorithm for these calculations and software which implements these ideas. I show that these alternative calculations are very similar to simple sample relative frequencies for high probability trees but are substantially more accurate for relatively low probability trees. The method allows the posterior probability of unsampled trees to be calculated when these trees contain only clades that are in other sampled trees. Furthermore, the method can be used to estimate the total probability of the set of sampled trees which provides a measure of the thoroughness of a posterior sample.
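
    The conditional clade idea in this record can be sketched as follows. This is a minimal illustration for rooted binary trees, assuming trees are given as nested 2-tuples of leaf-name strings; it is not the software the article describes, and all function names are hypothetical. Each clade's split is treated as conditionally independent given the clade, so a tree's estimated probability is the product, over its clades, of (times that split was sampled) / (times that clade was sampled).

```python
from collections import Counter

def clade_splits(tree):
    """Return (leaf set, [(clade, split), ...]) for a rooted binary tree.

    A tree is either a leaf name (string) or a 2-tuple of subtrees.
    A split is the unordered pair of child clades of a clade.
    """
    if not isinstance(tree, tuple):
        return frozenset([tree]), []
    left, right = tree
    lset, lsplits = clade_splits(left)
    rset, rsplits = clade_splits(right)
    clade = lset | rset
    split = frozenset([lset, rset])
    return clade, lsplits + rsplits + [(clade, split)]

def fit_ccd(sampled_trees):
    """Count how often each clade and each (clade, split) pair is sampled."""
    clade_counts, split_counts = Counter(), Counter()
    for tree in sampled_trees:
        _, splits = clade_splits(tree)
        for clade, split in splits:
            clade_counts[clade] += 1
            split_counts[(clade, split)] += 1
    return clade_counts, split_counts

def tree_probability(tree, clade_counts, split_counts):
    """Product of conditional split frequencies; 0.0 if any split is unseen."""
    prob = 1.0
    _, splits = clade_splits(tree)
    for clade, split in splits:
        seen = split_counts[(clade, split)]
        if seen == 0:
            return 0.0
        prob *= seen / clade_counts[clade]
    return prob
```

    In this sketch a tree that never appeared in the sample still receives positive probability whenever every one of its clade splits appeared in some sampled tree, which is the behaviour the abstract describes for unsampled trees.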

  4. Bayesian Inference in the Modern Design of Experiments

    NASA Technical Reports Server (NTRS)

    DeLoach, Richard

    2008-01-01

    This paper provides an elementary tutorial overview of Bayesian inference and its potential for application in aerospace experimentation in general and wind tunnel testing in particular. Bayes Theorem is reviewed and examples are provided to illustrate how it can be applied to objectively revise prior knowledge by incorporating insights subsequently obtained from additional observations, resulting in new (posterior) knowledge that combines information from both sources. A logical merger of Bayesian methods and certain aspects of Response Surface Modeling is explored. Specific applications to wind tunnel testing, computational code validation, and instrumentation calibration are discussed.
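
    As a small illustration of the prior-to-posterior revision this tutorial record describes, the sketch below applies the standard conjugate update for a normal mean with known measurement noise. It is not taken from the paper; the function name and the idea of applying it to a calibration coefficient are assumptions made for the example.

```python
def normal_update(prior_mean, prior_sd, observations, noise_sd):
    """Posterior mean and sd of a quantity with a normal prior,
    after independent normal observations with known noise sd."""
    n = len(observations)
    prior_prec = 1.0 / prior_sd ** 2          # precision = 1 / variance
    data_prec = n / noise_sd ** 2
    post_prec = prior_prec + data_prec        # precisions add
    sample_mean = sum(observations) / n
    # Posterior mean is the precision-weighted average of prior and data.
    post_mean = (prior_prec * prior_mean + data_prec * sample_mean) / post_prec
    return post_mean, post_prec ** -0.5
```

    With few or noisy observations the posterior stays close to the prior; as observations accumulate, the data term dominates and the posterior standard deviation shrinks.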

  5. Part 2. Development of Enhanced Statistical Methods for Assessing Health Effects Associated with an Unknown Number of Major Sources of Multiple Air Pollutants.

    PubMed

    Park, Eun Sug; Symanski, Elaine; Han, Daikwon; Spiegelman, Clifford

    2015-06-01

    A major difficulty with assessing source-specific health effects is that source-specific exposures cannot be measured directly; rather, they need to be estimated by a source-apportionment method such as multivariate receptor modeling. The uncertainty in source apportionment (uncertainty in source-specific exposure estimates and model uncertainty due to the unknown number of sources and identifiability conditions) has been largely ignored in previous studies. Also, spatial dependence of multipollutant data collected from multiple monitoring sites has not yet been incorporated into multivariate receptor modeling. The objectives of this project are (1) to develop a multipollutant approach that incorporates both sources of uncertainty in source-apportionment into the assessment of source-specific health effects and (2) to develop enhanced multivariate receptor models that can account for spatial correlations in the multipollutant data collected from multiple sites. We employed a Bayesian hierarchical modeling framework consisting of multivariate receptor models, health-effects models, and a hierarchical model on latent source contributions. For the health model, we focused on the time-series design in this project. Each combination of number of sources and identifiability conditions (additional constraints on model parameters) defines a different model. We built a set of plausible models with extensive exploratory data analyses and with information from previous studies, and then computed posterior model probability to estimate model uncertainty. Parameter estimation and model uncertainty estimation were implemented simultaneously by Markov chain Monte Carlo (MCMC) methods. We validated the methods using simulated data. We illustrated the methods using PM2.5 (particulate matter ≤ 2.5 μm in aerodynamic diameter) speciation data and mortality data from Phoenix, Arizona, and Houston, Texas.
The Phoenix data included counts of cardiovascular deaths and daily PM2.5 speciation data from 1995-1997. The Houston data included respiratory mortality data and 24-hour PM2.5 speciation data sampled every six days from a region near the Houston Ship Channel in years 2002-2005. We also developed a Bayesian spatial multivariate receptor modeling approach that, while simultaneously dealing with the unknown number of sources and identifiability conditions, incorporated spatial correlations in the multipollutant data collected from multiple sites into the estimation of source profiles and contributions based on the discrete process convolution model for multivariate spatial processes. This new modeling approach was applied to 24-hour ambient air concentrations of 17 volatile organic compounds (VOCs) measured at nine monitoring sites in Harris County, Texas, during years 2000 to 2005. Simulation results indicated that our methods were accurate in identifying the true model and estimated parameters were close to the true values. The results from our methods agreed in general with previous studies on the source apportionment of the Phoenix data in terms of estimated source profiles and contributions. However, we had a greater number of statistically insignificant findings, which was likely a natural consequence of incorporating uncertainty in the estimated source contributions into the health-effects parameter estimation. For the Houston data, a model with five sources (that seemed to be Sulfate-Rich Secondary Aerosol, Motor Vehicles, Industrial Combustion, Soil/Crustal Matter, and Sea Salt) showed the highest posterior model probability among the candidate models considered when fitted simultaneously to the PM2.5 and mortality data. There was a statistically significant positive association between respiratory mortality and same-day PM2.5 concentrations attributed to one of the sources (probably industrial combustion). 
The Bayesian spatial multivariate receptor modeling approach applied to the VOC data led to a highest posterior model probability for a model with five sources (that seemed to be refinery, petrochemical production, gasoline evaporation, natural gas, and vehicular exhaust) among several candidate models, with the number of sources varying between three and seven and with different identifiability conditions. Our multipollutant approach assessing source-specific health effects is more advantageous than a single-pollutant approach in that it can estimate total health effects from multiple pollutants and can also identify emission sources that are responsible for adverse health effects. Our Bayesian approach can incorporate not only uncertainty in the estimated source contributions, but also model uncertainty that has not been addressed in previous studies on assessing source-specific health effects. The new Bayesian spatial multivariate receptor modeling approach enables predictions of source contributions at unmonitored sites, minimizing exposure misclassification and providing improved exposure estimates along with their uncertainty estimates, as well as accounting for uncertainty in the number of sources and identifiability conditions.

  6. Effect of Therapeutic Hypothermia Initiated After 6 Hours of Age on Death or Disability Among Newborns With Hypoxic-Ischemic Encephalopathy: A Randomized Clinical Trial.

    PubMed

    Laptook, Abbot R; Shankaran, Seetha; Tyson, Jon E; Munoz, Breda; Bell, Edward F; Goldberg, Ronald N; Parikh, Nehal A; Ambalavanan, Namasivayam; Pedroza, Claudia; Pappas, Athina; Das, Abhik; Chaudhary, Aasma S; Ehrenkranz, Richard A; Hensman, Angelita M; Van Meurs, Krisa P; Chalak, Lina F; Khan, Amir M; Hamrick, Shannon E G; Sokol, Gregory M; Walsh, Michele C; Poindexter, Brenda B; Faix, Roger G; Watterberg, Kristi L; Frantz, Ivan D; Guillet, Ronnie; Devaskar, Uday; Truog, William E; Chock, Valerie Y; Wyckoff, Myra H; McGowan, Elisabeth C; Carlton, David P; Harmon, Heidi M; Brumbaugh, Jane E; Cotten, C Michael; Sánchez, Pablo J; Hibbs, Anna Maria; Higgins, Rosemary D

    2017-10-24

    IMPORTANCE Hypothermia initiated at less than 6 hours after birth reduces death or disability for infants with hypoxic-ischemic encephalopathy at 36 weeks' or later gestation. To our knowledge, hypothermia trials have not been performed in infants presenting after 6 hours. OBJECTIVE To estimate the probability that hypothermia initiated at 6 to 24 hours after birth reduces the risk of death or disability at 18 months among infants with hypoxic-ischemic encephalopathy. DESIGN, SETTING, AND PARTICIPANTS A randomized clinical trial was conducted between April 2008 and June 2016 among infants at 36 weeks' or later gestation with moderate or severe hypoxic-ischemic encephalopathy enrolled at 6 to 24 hours after birth. Twenty-one US Neonatal Research Network centers participated. Bayesian analyses were prespecified given the anticipated limited sample size. INTERVENTIONS Targeted esophageal temperature was used in 168 infants. Eighty-three hypothermic infants were maintained at 33.5°C (acceptable range, 33°C-34°C) for 96 hours and then rewarmed. Eighty-five noncooled infants were maintained at 37.0°C (acceptable range, 36.5°C-37.3°C). MAIN OUTCOMES AND MEASURES The composite of death or disability (moderate or severe) at 18 to 22 months adjusted for level of encephalopathy and age at randomization. RESULTS Hypothermic and noncooled infants were term (mean [SD], 39 [2] and 39 [1] weeks' gestation, respectively), and 47 of 83 (57%) and 55 of 85 (65%) were male, respectively. Both groups were acidemic at birth, predominantly transferred to the treating center with moderate encephalopathy, and were randomized at a mean (SD) of 16 (5) and 15 (5) hours for hypothermic and noncooled groups, respectively. The primary outcome occurred in 19 of 78 hypothermic infants (24.4%) and 22 of 79 noncooled infants (27.9%) (absolute difference, 3.5%; 95% CI, -1% to 17%).
Bayesian analysis using a neutral prior indicated a 76% posterior probability of reduced death or disability with hypothermia relative to the noncooled group (adjusted posterior risk ratio, 0.86; 95% credible interval, 0.58-1.29). The probability that death or disability in cooled infants was at least 1%, 2%, or 3% less than noncooled infants was 71%, 64%, and 56%, respectively. CONCLUSIONS AND RELEVANCE Among term infants with hypoxic-ischemic encephalopathy, hypothermia initiated at 6 to 24 hours after birth compared with noncooling resulted in a 76% probability of any reduction in death or disability, and a 64% probability of at least 2% less death or disability at 18 to 22 months. Hypothermia initiated at 6 to 24 hours after birth may have benefit but there is uncertainty in its effectiveness. TRIAL REGISTRATION clinicaltrials.gov Identifier: NCT00614744.
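
    The kind of statement reported in this record, a posterior probability that one arm has lower risk than the other by at least some margin, can be illustrated with a simple Monte Carlo sketch. The sketch is deliberately cruder than the trial's analysis: it assumes independent Beta priors on each arm's event probability (uniform by default) and uses the raw counts, with no adjustment for level of encephalopathy or age at randomization. It should therefore not be expected to reproduce the published 76%. The function name and defaults are hypothetical.

```python
import random

def prob_lower_risk(events_a, n_a, events_b, n_b, margin=0.0,
                    prior=(1.0, 1.0), draws=50_000, seed=0):
    """Estimate P(p_b - p_a > margin | data) under independent Beta priors.

    Arm A is the arm hoped to have fewer events; margin is an absolute
    risk difference (0.02 means "at least 2 percentage points lower").
    """
    rng = random.Random(seed)
    a0, b0 = prior
    hits = 0
    for _ in range(draws):
        # Beta posterior for each arm: prior + (events, non-events).
        p_a = rng.betavariate(a0 + events_a, b0 + n_a - events_a)
        p_b = rng.betavariate(a0 + events_b, b0 + n_b - events_b)
        if p_b - p_a > margin:
            hits += 1
    return hits / draws

# Unadjusted primary-outcome counts quoted in the abstract above.
p_any = prob_lower_risk(19, 78, 22, 79)
p_two_points = prob_lower_risk(19, 78, 22, 79, margin=0.02)
```

    Raising `margin` can only lower the estimated probability, which mirrors the decreasing sequence of probabilities reported for reductions of at least 1%, 2%, and 3%.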

  7. Effect of Therapeutic Hypothermia Initiated After 6 Hours of Age on Death or Disability Among Newborns With Hypoxic-Ischemic Encephalopathy

    PubMed Central

    Laptook, Abbot R.; Shankaran, Seetha; Tyson, Jon E.; Munoz, Breda; Bell, Edward F.; Goldberg, Ronald N.; Parikh, Nehal A.; Ambalavanan, Namasivayam; Pedroza, Claudia; Pappas, Athina; Das, Abhik; Chaudhary, Aasma S.; Ehrenkranz, Richard A.; Hensman, Angelita M.; Van Meurs, Krisa P.; Chalak, Lina F.; Hamrick, Shannon E. G.; Sokol, Gregory M.; Walsh, Michele C.; Poindexter, Brenda B.; Faix, Roger G.; Watterberg, Kristi L.; Frantz, Ivan D.; Guillet, Ronnie; Devaskar, Uday; Truog, William E.; Chock, Valerie Y.; Wyckoff, Myra H.; McGowan, Elisabeth C.; Carlton, David P.; Harmon, Heidi M.; Brumbaugh, Jane E.; Cotten, C. Michael; Sánchez, Pablo J.; Hibbs, Anna Maria; Higgins, Rosemary D.

    2018-01-01

    IMPORTANCE Hypothermia initiated at less than 6 hours after birth reduces death or disability for infants with hypoxic-ischemic encephalopathy at 36 weeks’ or later gestation. To our knowledge, hypothermia trials have not been performed in infants presenting after 6 hours. OBJECTIVE To estimate the probability that hypothermia initiated at 6 to 24 hours after birth reduces the risk of death or disability at 18 months among infants with hypoxic-ischemic encephalopathy. DESIGN, SETTING, AND PARTICIPANTS A randomized clinical trial was conducted between April 2008 and June 2016 among infants at 36 weeks’ or later gestation with moderate or severe hypoxic-ischemic encephalopathy enrolled at 6 to 24 hours after birth. Twenty-one US Neonatal Research Network centers participated. Bayesian analyses were prespecified given the anticipated limited sample size. INTERVENTIONS Targeted esophageal temperature was used in 168 infants. Eighty-three hypothermic infants were maintained at 33.5°C (acceptable range, 33°C–34°C) for 96 hours and then rewarmed. Eighty-five noncooled infants were maintained at 37.0°C (acceptable range, 36.5°C–37.3°C). MAIN OUTCOMES AND MEASURES The composite of death or disability (moderate or severe) at 18 to 22 months adjusted for level of encephalopathy and age at randomization. RESULTS Hypothermic and noncooled infants were term (mean [SD], 39 [2] and 39 [1] weeks’ gestation, respectively), and 47 of 83 (57%) and 55 of 85 (65%) were male, respectively. Both groups were acidemic at birth, predominantly transferred to the treating center with moderate encephalopathy, and were randomized at a mean (SD) of 16 (5) and 15 (5) hours for hypothermic and noncooled groups, respectively. The primary outcome occurred in 19 of 78 hypothermic infants (24.4%) and 22 of 79 noncooled infants (27.9%) (absolute difference, 3.5%; 95% CI, −1% to 17%). 
Bayesian analysis using a neutral prior indicated a 76% posterior probability of reduced death or disability with hypothermia relative to the noncooled group (adjusted posterior risk ratio, 0.86; 95% credible interval, 0.58–1.29). The probability that death or disability in cooled infants was at least 1%, 2%, or 3% less than noncooled infants was 71%, 64%, and 56%, respectively. CONCLUSIONS AND RELEVANCE Among term infants with hypoxic-ischemic encephalopathy, hypothermia initiated at 6 to 24 hours after birth compared with noncooling resulted in a 76% probability of any reduction in death or disability, and a 64% probability of at least 2% less death or disability at 18 to 22 months. Hypothermia initiated at 6 to 24 hours after birth may have benefit but there is uncertainty in its effectiveness. TRIAL REGISTRATION clinicaltrials.gov Identifier: NCT00614744 PMID:29067428

  8. Validation of Bayesian analysis of compartmental kinetic models in medical imaging.

    PubMed

    Sitek, Arkadiusz; Li, Quanzheng; El Fakhri, Georges; Alpert, Nathaniel M

    2016-10-01

    Kinetic compartmental analysis is frequently used to compute physiologically relevant quantitative values from time series of images. In this paper, a new approach based on Bayesian analysis to obtain information about these parameters is presented and validated. The closed form of the posterior distribution of kinetic parameters is derived with a hierarchical prior to model the standard deviation of normally distributed noise. Markov chain Monte Carlo methods are used for numerical estimation of the posterior distribution. Computer simulations of the kinetics of F18-fluorodeoxyglucose (FDG) are used to demonstrate drawing statistical inferences about kinetic parameters and to validate the theory and implementation. Additionally, point estimates of kinetic parameters and covariance of those estimates are determined using the classical non-linear least squares approach. Posteriors obtained using methods proposed in this work are accurate as no significant deviation from the expected shape of the posterior was found (one-sided P>0.08). It is demonstrated that the results obtained by the standard non-linear least squares methods fail to provide accurate estimation of uncertainty for the same data set (P<0.0001). The results of this work validate the new methods for computer simulations of FDG kinetics. Results show that in situations where the classical approach fails in accurate estimation of uncertainty, Bayesian estimation provides accurate information about the uncertainties in the parameters. Although a particular example of FDG kinetics was used in the paper, the methods can be extended for different pharmaceuticals and imaging modalities. Copyright © 2016 Associazione Italiana di Fisica Medica. Published by Elsevier Ltd. All rights reserved.

  9. Bayes to the Rescue: Continuous Positive Airway Pressure Has Less Mortality Than High-Flow Oxygen.

    PubMed

    Modesto I Alapont, Vicent; Khemani, Robinder G; Medina, Alberto; Del Villar Guerra, Pablo; Molina Cambra, Alfred

    2017-02-01

    The merits of high-flow nasal cannula oxygen versus bubble continuous positive airway pressure are debated in children with pneumonia, with suggestions that randomized controlled trials are needed. In light of a previous randomized controlled trial showing a trend for lower mortality with bubble continuous positive airway pressure, we sought to determine the probability that a new randomized controlled trial would find high-flow nasal cannula oxygen superior to bubble continuous positive airway pressure through a "robust" Bayesian analysis. Sample data were extracted from the trial by Chisti et al, and requisite to "robust" Bayesian analysis, we specified three prior distributions to represent clinically meaningful assumptions. These priors (reference, pessimistic, and optimistic) were used to generate three scenarios to represent the range of possible hypotheses. 1) "Reference": we believe bubble continuous positive airway pressure and high-flow nasal cannula oxygen are equally effective with the same uninformative reference priors; 2) "Sceptic on high-flow nasal cannula oxygen": we believe that bubble continuous positive airway pressure is better than high-flow nasal cannula oxygen (bubble continuous positive airway pressure has an optimistic prior and high-flow nasal cannula oxygen has a pessimistic prior); and 3) "Enthusiastic on high-flow nasal cannula oxygen": we believe that high-flow nasal cannula oxygen is better than bubble continuous positive airway pressure (high-flow nasal cannula oxygen has an optimistic prior and bubble continuous positive airway pressure has a pessimistic prior). Finally, posterior empiric Bayesian distributions were obtained through 100,000 Markov Chain Monte Carlo simulations. 
In all three scenarios, there was a high probability for more death from high-flow nasal cannula oxygen compared with bubble continuous positive airway pressure (reference, 0.98; sceptic on high-flow nasal cannula oxygen, 0.982; enthusiastic on high-flow nasal cannula oxygen, 0.742). The posterior 95% credible interval on the difference in mortality identified a future randomized controlled trial would be extremely unlikely to find a mortality benefit for high-flow nasal cannula oxygen over bubble continuous positive airway pressure, regardless of the scenario. Interpreting these findings using the "range of practical equivalence" framework would recommend rejecting the hypothesis that high-flow nasal cannula oxygen is superior to bubble continuous positive airway pressure for these children. For children younger than 5 years with pneumonia, high-flow nasal cannula oxygen has higher mortality than bubble continuous positive airway pressure. A future randomized controlled trial in this population is unlikely to find high-flow nasal cannula oxygen superior to bubble continuous positive airway pressure.

  10. Utility-based designs for randomized comparative trials with categorical outcomes

    PubMed Central

    Murray, Thomas A.; Thall, Peter F.; Yuan, Ying

    2016-01-01

    A general utility-based testing methodology for design and conduct of randomized comparative clinical trials with categorical outcomes is presented. Numerical utilities of all elementary events are elicited to quantify their desirabilities. These numerical values are used to map the categorical outcome probability vector of each treatment to a mean utility, which is used as a one-dimensional criterion for constructing comparative tests. Bayesian tests are presented, including fixed sample and group sequential procedures, assuming Dirichlet-multinomial models for the priors and likelihoods. Guidelines are provided for establishing priors, eliciting utilities, and specifying hypotheses. Efficient posterior computation is discussed, and algorithms are provided for jointly calibrating test cutoffs and sample size to control overall type I error and achieve specified power. Asymptotic approximations for the power curve are used to initialize the algorithms. The methodology is applied to re-design a completed trial that compared two chemotherapy regimens for chronic lymphocytic leukemia, in which an ordinal efficacy outcome was dichotomized and toxicity was ignored to construct the trial’s design. The Bayesian tests also are illustrated by several types of categorical outcomes arising in common clinical settings. Freely available computer software for implementation is provided. PMID:27189672
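
    The mapping from a categorical outcome vector to a mean utility, and the resulting Bayesian comparison, can be sketched as follows. This is only an illustration of the Dirichlet-multinomial idea named in the abstract, not the authors' software: it assumes a symmetric Dirichlet(1, ..., 1) prior by default, a fixed-sample comparison with no sequential looks, and utilities supplied by the caller. All names and any numbers passed in are hypothetical.

```python
import random

def sample_dirichlet(rng, alphas):
    """Draw one probability vector from Dirichlet(alphas) via gamma variates."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]

def prob_higher_mean_utility(counts_a, counts_b, utilities,
                             prior=None, draws=20_000, seed=0):
    """Estimate P(mean utility of A > mean utility of B | data).

    counts_* are observed outcome counts per category; utilities gives
    the elicited numerical utility of each category, in the same order.
    """
    rng = random.Random(seed)
    if prior is None:
        prior = [1.0] * len(utilities)
    # Conjugacy: Dirichlet prior + multinomial counts -> Dirichlet posterior.
    post_a = [p + c for p, c in zip(prior, counts_a)]
    post_b = [p + c for p, c in zip(prior, counts_b)]
    wins = 0
    for _ in range(draws):
        theta_a = sample_dirichlet(rng, post_a)
        theta_b = sample_dirichlet(rng, post_b)
        u_a = sum(u * t for u, t in zip(utilities, theta_a))
        u_b = sum(u * t for u, t in zip(utilities, theta_b))
        if u_a > u_b:
            wins += 1
    return wins / draws
```

    A test in this style would compare the returned probability with a cutoff; the abstract notes that such cutoffs and the sample size are calibrated jointly to control type I error and power, a step this sketch does not attempt.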

  11. Spatiotemporal hurdle models for zero-inflated count data: Exploring trends in emergency department visits.

    PubMed

    Neelon, Brian; Chang, Howard H; Ling, Qiang; Hastings, Nicole S

    2016-12-01

    Motivated by a study exploring spatiotemporal trends in emergency department use, we develop a class of two-part hurdle models for the analysis of zero-inflated areal count data. The models consist of two components: one for the probability of any emergency department use and one for the number of emergency department visits given use. Through a hierarchical structure, the models incorporate both patient- and region-level predictors, as well as spatially and temporally correlated random effects for each model component. The random effects are assigned multivariate conditionally autoregressive priors, which induce dependence between the components and provide spatial and temporal smoothing across adjacent spatial units and time periods, resulting in improved inferences. To accommodate potential overdispersion, we consider a range of parametric specifications for the positive counts, including truncated negative binomial and generalized Poisson distributions. We adopt a Bayesian inferential approach, and posterior computation is handled conveniently within standard Bayesian software. Our results indicate that the negative binomial and generalized Poisson hurdle models vastly outperform the Poisson hurdle model, demonstrating that overdispersed hurdle models provide a useful approach to analyzing zero-inflated spatiotemporal data. © The Author(s) 2014.
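
    The two-part structure of a hurdle model can be written down compactly. The sketch below gives the log probability mass function of the plain Poisson hurdle model, the baseline the abstract compares against; it omits the covariates, random effects, and overdispersed count distributions of the full models. The function and parameter names are hypothetical.

```python
import math

def hurdle_poisson_logpmf(y, pi_use, lam):
    """Log pmf of a Poisson hurdle model.

    pi_use : probability of any use (crossing the hurdle), 0 < pi_use < 1
    lam    : rate of the zero-truncated Poisson for positive counts
    """
    if y == 0:
        return math.log1p(-pi_use)                  # log(1 - pi_use)
    log_pois = y * math.log(lam) - lam - math.lgamma(y + 1)
    log_positive = math.log1p(-math.exp(-lam))      # log P(Y > 0) under Poisson
    # Positive part: hurdle probability times the zero-truncated Poisson pmf.
    return math.log(pi_use) + log_pois - log_positive
```

    Because the zero probability is set by `pi_use` alone, the model can fit any proportion of zeros, independently of the distribution chosen for the positive counts.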

  12. Bayesian inference of metal oxide ultrathin film structure based on crystal truncation rod measurements

    PubMed Central

    Anada, Masato; Nakanishi-Ohno, Yoshinori; Okada, Masato; Kimura, Tsuyoshi; Wakabayashi, Yusuke

    2017-01-01

    Monte Carlo (MC)-based refinement software to analyze the atomic arrangements of perovskite oxide ultrathin films from the crystal truncation rod intensity is developed on the basis of Bayesian inference. The advantages of the MC approach are (i) it is applicable to multi-domain structures, (ii) it provides the posterior probability of structures through Bayes’ theorem, which allows one to evaluate the uncertainty of estimated structural parameters, and (iii) one can involve any information provided by other experiments and theories. The simulated annealing procedure efficiently searches for the optimum model owing to its stochastic updates, regardless of the initial values, without being trapped by local optima. The performance of the software is examined with a five-unit-cell-thick LaAlO3 film fabricated on top of SrTiO3. The software successfully found the global optima from an initial model prepared by a small grid search calculation. The standard deviations of the atomic positions derived from a dataset taken at a second-generation synchrotron are ±0.02 Å for metal sites and ±0.03 Å for oxygen sites. PMID:29217989

  13. A note on the efficiencies of sampling strategies in two-stage Bayesian regional fine mapping of a quantitative trait.

    PubMed

    Chen, Zhijian; Craiu, Radu V; Bull, Shelley B

    2014-11-01

    In focused studies designed to follow up associations detected in a genome-wide association study (GWAS), investigators can proceed to fine-map a genomic region by targeted sequencing or dense genotyping of all variants in the region, aiming to identify a functional sequence variant. For the analysis of a quantitative trait, we consider a Bayesian approach to fine-mapping study design that incorporates stratification according to a promising GWAS tag SNP in the same region. Improved cost-efficiency can be achieved when the fine-mapping phase incorporates a two-stage design, with identification of a smaller set of more promising variants in a subsample taken in stage 1, followed by their evaluation in an independent stage 2 subsample. To avoid the potential negative impact of genetic model misspecification on inference we incorporate genetic model selection based on posterior probabilities for each competing model. Our simulation study shows that, compared to simple random sampling that ignores genetic information from GWAS, tag-SNP-based stratified sample allocation methods reduce the number of variants continuing to stage 2 and are more likely to promote the functional sequence variant into confirmation studies.

  14. Bayesian Mapping Reveals That Attention Boosts Neural Responses to Predicted and Unpredicted Stimuli.

    PubMed

    Garrido, Marta I; Rowe, Elise G; Halász, Veronika; Mattingley, Jason B

    2018-05-01

    Predictive coding posits that the human brain continually monitors the environment for regularities and detects inconsistencies. It is unclear, however, what effect attention has on expectation processes, as there have been relatively few studies and the results of these have yielded contradictory findings. Here, we employed Bayesian model comparison to adjudicate between 2 alternative computational models. The "Opposition" model states that attention boosts neural responses equally to predicted and unpredicted stimuli, whereas the "Interaction" model assumes that attentional boosting of neural signals depends on the level of predictability. We designed a novel, audiospatial attention task that orthogonally manipulated attention and prediction by playing oddball sequences in either the attended or unattended ear. We observed sensory prediction error responses, with electroencephalography, across all attentional manipulations. Crucially, posterior probability maps revealed that, overall, the Opposition model better explained scalp and source data, suggesting that attention boosts responses to predicted and unpredicted stimuli equally. Furthermore, Dynamic Causal Modeling showed that these Opposition effects were expressed in plastic changes within the mismatch negativity network. Our findings provide empirical evidence for a computational model of the opposing interplay of attention and expectation in the brain.

  15. Significance testing - are we ready yet to abandon its use?

    PubMed

    The, Bertram

    2011-11-01

    Understanding of the damaging effects of significance testing has steadily grown. Reporting p values without dichotomizing the result to be significant or not, is not the solution. Confidence intervals are better, but are troubled by a non-intuitive interpretation, and are often misused just to see whether the null value lies within the interval. Bayesian statistics provide an alternative which solves most of these problems. Although criticized for relying on subjective models, the interpretation of a Bayesian posterior probability is more intuitive than the interpretation of a p value, and seems to be closest to intuitive patterns of human decision making. Another alternative could be using confidence interval functions (or p value functions) to display a continuum of intervals at different levels of confidence around a point estimate. Thus, better alternatives to significance testing exist. The reluctance to abandon this practice might be both preference of clinging to old habits as well as the unfamiliarity with better methods. Authors might question if using less commonly exercised, though superior, techniques will be well received by the editors, reviewers and the readership. A joint effort will be needed to abandon significance testing in clinical research in the future.

  16. Stan: A Probabilistic Programming Language

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Carpenter, Bob; Gelman, Andrew; Hoffman, Matthew D.

    Stan is a probabilistic programming language for specifying statistical models. A Stan program imperatively defines a log probability function over parameters conditioned on specified data and constants. As of version 2.14.0, Stan provides full Bayesian inference for continuous-variable models through Markov chain Monte Carlo methods such as the No-U-Turn sampler, an adaptive form of Hamiltonian Monte Carlo sampling. Penalized maximum likelihood estimates are calculated using optimization methods such as the limited memory Broyden-Fletcher-Goldfarb-Shanno algorithm. Stan is also a platform for computing log densities and their gradients and Hessians, which can be used in alternative algorithms such as variational Bayes, expectation propagation, and marginal inference using approximate integration. To this end, Stan is set up so that the densities, gradients, and Hessians, along with intermediate quantities of the algorithm such as acceptance probabilities, are easily accessible. Stan can also be called from the command line using the cmdstan package, through R using the rstan package, and through Python using the pystan package. All three interfaces support sampling and optimization-based inference with diagnostics and posterior analysis. rstan and pystan also provide access to log probabilities, gradients, Hessians, parameter transforms, and specialized plotting.

  17. Stan: A Probabilistic Programming Language

    DOE PAGES

    Carpenter, Bob; Gelman, Andrew; Hoffman, Matthew D.; ...

    2017-01-01

    Stan is a probabilistic programming language for specifying statistical models. A Stan program imperatively defines a log probability function over parameters conditioned on specified data and constants. As of version 2.14.0, Stan provides full Bayesian inference for continuous-variable models through Markov chain Monte Carlo methods such as the No-U-Turn sampler, an adaptive form of Hamiltonian Monte Carlo sampling. Penalized maximum likelihood estimates are calculated using optimization methods such as the limited memory Broyden-Fletcher-Goldfarb-Shanno algorithm. Stan is also a platform for computing log densities and their gradients and Hessians, which can be used in alternative algorithms such as variational Bayes, expectation propagation, and marginal inference using approximate integration. To this end, Stan is set up so that the densities, gradients, and Hessians, along with intermediate quantities of the algorithm such as acceptance probabilities, are easily accessible. Stan can also be called from the command line using the cmdstan package, through R using the rstan package, and through Python using the pystan package. All three interfaces support sampling and optimization-based inference with diagnostics and posterior analysis. rstan and pystan also provide access to log probabilities, gradients, Hessians, parameter transforms, and specialized plotting.

  18. Sex differences in the development of neuroanatomical functional connectivity underlying intelligence found using Bayesian connectivity analysis.

    PubMed

    Schmithorst, Vincent J; Holland, Scott K

    2007-03-01

    A Bayesian method for functional connectivity analysis was adapted to investigate between-group differences. This method was applied in a large cohort of almost 300 children to investigate differences in boys and girls in the relationship between intelligence and functional connectivity for the task of narrative comprehension. For boys, a greater association was shown between intelligence and the functional connectivity linking Broca's area to auditory processing areas, including Wernicke's areas and the right posterior superior temporal gyrus. For girls, a greater association was shown between intelligence and the functional connectivity linking the left posterior superior temporal gyrus to Wernicke's areas bilaterally. A developmental effect was also seen, with girls displaying a positive correlation with age in the association between intelligence and the functional connectivity linking the right posterior superior temporal gyrus to Wernicke's areas bilaterally. Our results demonstrate a sexual dimorphism in the relationship of functional connectivity to intelligence in children and an increasing reliance on inter-hemispheric connectivity in girls with age.

  19. Internal Medicine residents use heuristics to estimate disease probability.

    PubMed

    Phang, Sen Han; Ravani, Pietro; Schaefer, Jeffrey; Wright, Bruce; McLaughlin, Kevin

    2015-01-01

    Training in Bayesian reasoning may have limited impact on accuracy of probability estimates. In this study, our goal was to explore whether residents previously exposed to Bayesian reasoning use heuristics rather than Bayesian reasoning to estimate disease probabilities. We predicted that if residents use heuristics then post-test probability estimates would be increased by non-discriminating clinical features or a high anchor for a target condition. We randomized 55 Internal Medicine residents to different versions of four clinical vignettes and asked them to estimate probabilities of target conditions. We manipulated the clinical data for each vignette to be consistent with either 1) using a representative heuristic, by adding non-discriminating prototypical clinical features of the target condition, or 2) using anchoring with adjustment heuristic, by providing a high or low anchor for the target condition. When presented with additional non-discriminating data the odds of diagnosing the target condition were increased (odds ratio (OR) 2.83, 95% confidence interval [1.30, 6.15], p = 0.009). Similarly, the odds of diagnosing the target condition were increased when a high anchor preceded the vignette (OR 2.04, [1.09, 3.81], p = 0.025). Our findings suggest that despite previous exposure to the use of Bayesian reasoning, residents use heuristics, such as the representative heuristic and anchoring with adjustment, to estimate probabilities. Potential reasons for attribute substitution include the relative cognitive ease of heuristics vs. Bayesian reasoning or perhaps residents in their clinical practice use gist traces rather than precise probability estimates when diagnosing.
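    For contrast with the heuristics described above, the Bayesian update that the residents were trained in can be written in odds form: post-test odds equal pre-test odds times the likelihood ratio of the finding. The sketch below is a generic illustration, not the study's instrument, and the numbers are hypothetical. A non-discriminating feature has a likelihood ratio of 1 and should leave the estimate unchanged.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a disease probability with Bayes' theorem in odds form."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# A non-discriminating feature (LR = 1) leaves the probability where it was.
print(post_test_probability(0.20, 1.0))
# A discriminating feature (hypothetical LR = 3) raises it.
print(post_test_probability(0.50, 3.0))
```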

  20. Randomized path optimization for the mitigated counter detection of UAVs

    DTIC Science & Technology

    2017-06-01

    using Bayesian filtering. The KL divergence is used to compare the probability density of aircraft termination to a normal distribution around the...Bayesian filtering. The KL divergence is used to compare the probability density of aircraft termination to a normal distribution around the true terminal...algorithm’s success. A recursive Bayesian filtering scheme is used to assimilate noisy measurements of the UAV's position to predict its terminal location. We

  1. Efficient implementation of the Metropolis-Hastings algorithm, with application to the Cormack-Jolly-Seber model

    USGS Publications Warehouse

    Link, W.A.; Barker, R.J.

    2008-01-01

    Judicious choice of candidate generating distributions improves efficiency of the Metropolis-Hastings algorithm. In Bayesian applications, it is sometimes possible to identify an approximation to the target posterior distribution; this approximate posterior distribution is a good choice for candidate generation. These observations are applied to analysis of the Cormack-Jolly-Seber model and its extensions.
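    The idea of generating candidates from an approximate posterior can be sketched as an independence Metropolis-Hastings sampler. The example below is not the authors' Cormack-Jolly-Seber implementation. It assumes a toy target (a binomial proportion under a flat prior, requiring at least one success and one failure) and a normal approximation with a hypothetical 1.5× inflation of the standard deviation so that the candidate tails cover the target.

```python
import math
import random


def log_target(p, successes, failures):
    """Unnormalized log posterior for a binomial proportion under a flat prior."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return successes * math.log(p) + failures * math.log(1.0 - p)


def log_proposal(x, mu, sd):
    """Log density of the normal candidate distribution, up to an additive constant."""
    return -0.5 * ((x - mu) / sd) ** 2


def independence_mh(successes, failures, n_iter=20000, seed=1):
    """Return posterior draws and the acceptance rate."""
    rng = random.Random(seed)
    n = successes + failures
    mu = successes / n                          # approximate posterior centre
    sd = 1.5 * math.sqrt(mu * (1.0 - mu) / n)   # inflated approximate posterior sd
    x = mu
    samples, accepted = [], 0
    for _ in range(n_iter):
        y = rng.gauss(mu, sd)                   # candidate does not depend on x
        # Hastings ratio for an independence sampler: pi(y) q(x) / (pi(x) q(y)).
        log_ratio = (log_target(y, successes, failures)
                     - log_target(x, successes, failures)
                     + log_proposal(x, mu, sd) - log_proposal(y, mu, sd))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = y
            accepted += 1
        samples.append(x)
    return samples, accepted / n_iter


draws, rate = independence_mh(12, 8)
print(sum(draws) / len(draws), rate)
```

    The closer the candidate distribution is to the target, the closer the acceptance rate gets to 1, which is the efficiency gain the abstract refers to.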

  2. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bertholon, François; Harant, Olivier; Bourlon, Bertrand

    This article introduces a joint Bayesian estimation of gas samples issued from a gas chromatography (GC) column coupled with a NEMS sensor, based on the Giddings-Eyring microscopic molecular stochastic model. The posterior distribution is sampled using Markov chain Monte Carlo with Gibbs sampling. Parameters are estimated using the posterior mean. This estimation scheme is finally applied to simulated and real datasets using this molecular stochastic forward model.

  3. Distribution of Marburg virus in Africa: An evolutionary approach.

    PubMed

    Zehender, Gianguglielmo; Sorrentino, Chiara; Veo, Carla; Fiaschi, Lisa; Gioffrè, Sonia; Ebranati, Erika; Tanzi, Elisabetta; Ciccozzi, Massimo; Lai, Alessia; Galli, Massimo

    2016-10-01

    The aim of this study was to investigate the origin and geographical dispersion of Marburg virus, the first member of the Filoviridae family to be discovered. Seventy-three complete genome sequences of Marburg virus isolated from animals and humans were retrieved from public databases and analysed using a Bayesian phylogeographical framework. The phylogenetic tree of the Marburg virus data set showed two significant evolutionary lineages: Ravn virus (RAVV) and Marburg virus (MARV). MARV divided into two main clades; clade A included isolates from Uganda (five from the European epidemic in 1967), Kenya (1980) and Angola (from the epidemic of 2004-2005); clade B included most of the isolates obtained during the 1999-2000 epidemic in the Democratic Republic of the Congo (DRC) and a group of Ugandan isolates obtained in 2007-2009. The estimated mean evolutionary rate of the whole genome was 3.3×10^-4 substitutions/site/year (credibility interval 2.0-4.8). The MARV strain had a mean root time of the most recent common ancestor of 177.9 years ago (YA) (95% highest posterior density 87-284), thus indicating that it probably originated in the mid-19th century, whereas the RAVV strain had a later origin dating back to a mean 33.8 YA. The most probable location of the MARV ancestor was Uganda (state posterior probability, spp=0.41), whereas that of the RAVV ancestor was Kenya (spp=0.71). There were significant migration rates from Uganda to the DRC (Bayes Factor, BF=42.0) and in the opposite direction (BF=5.7). Our data suggest that Uganda may have been the cradle of Marburg virus in Africa.

  4. Analysis of statistical and standard algorithms for detecting muscle onset with surface electromyography

    PubMed Central

    Tweedell, Andrew J.; Haynes, Courtney A.

    2017-01-01

    The timing of muscle activity is a commonly applied analytic method to understand how the nervous system controls movement. This study systematically evaluates six classes of standard and statistical algorithms to determine muscle onset in both experimental surface electromyography (EMG) and simulated EMG with a known onset time. Eighteen participants had EMG collected from the biceps brachii and vastus lateralis while performing a biceps curl or knee extension, respectively. Three established methods and three statistical methods for EMG onset were evaluated. Linear envelope, Teager-Kaiser energy operator + linear envelope and sample entropy were the established methods evaluated while general time series mean/variance, sequential and batch processing of parametric and nonparametric tools, and Bayesian changepoint analysis were the statistical techniques used. Visual EMG onset (experimental data) and objective EMG onset (simulated data) were compared with algorithmic EMG onset via root mean square error and linear regression models for stepwise elimination of inferior algorithms. The top algorithms for both data types were analyzed for their mean agreement with the gold standard onset and evaluation of 95% confidence intervals. The top algorithms were all Bayesian changepoint analysis iterations where the parameter of the prior (p0) was zero. The best performing Bayesian algorithms were p0 = 0 and a posterior probability for onset determination at 60–90%. While existing algorithms performed reasonably, the Bayesian changepoint analysis methodology provides greater reliability and accuracy when determining the singular onset of EMG activity in a time series. Further research is needed to determine if this class of algorithms perform equally well when the time series has multiple bursts of muscle activity. PMID:28489897

  5. Genomic selection and complex trait prediction using a fast EM algorithm applied to genome-wide markers

    PubMed Central

    2010-01-01

    Background: The information provided by dense genome-wide markers using high throughput technology is of considerable potential in human disease studies and livestock breeding programs. Genome-wide association studies relate individual single nucleotide polymorphisms (SNP) from dense SNP panels to individual measurements of complex traits, with the underlying assumption being that any association is caused by linkage disequilibrium (LD) between SNP and quantitative trait loci (QTL) affecting the trait. Often SNP are in genomic regions of no trait variation. Whole genome Bayesian models are an effective way of incorporating this and other important prior information into modelling. However a full Bayesian analysis is often not feasible due to the large computational time involved. Results: This article proposes an expectation-maximization (EM) algorithm called emBayesB which allows only a proportion of SNP to be in LD with QTL and incorporates prior information about the distribution of SNP effects. The posterior probability of being in LD with at least one QTL is calculated for each SNP along with estimates of the hyperparameters for the mixture prior. A simulated example of genomic selection from an international workshop is used to demonstrate the features of the EM algorithm. The accuracy of prediction is comparable to a full Bayesian analysis but the EM algorithm is considerably faster. The EM algorithm was accurate in locating QTL which explained more than 1% of the total genetic variation. A computational algorithm for very large SNP panels is described. Conclusions: emBayesB is a fast and accurate EM algorithm for implementing genomic selection and predicting complex traits by mapping QTL in genome-wide dense SNP marker data. Its accuracy is similar to Bayesian methods but it takes only a fraction of the time. PMID:20969788

  6. Sequential Probability Ratio Test for Collision Avoidance Maneuver Decisions

    NASA Technical Reports Server (NTRS)

    Carpenter, J. Russell; Markley, F. Landis

    2010-01-01

    When facing a conjunction between space objects, decision makers must choose whether to maneuver for collision avoidance or not. We apply a well-known decision procedure, the sequential probability ratio test, to this problem. We propose two approaches to the problem solution, one based on a frequentist method, and the other on a Bayesian method. The frequentist method does not require any prior knowledge concerning the conjunction, while the Bayesian method assumes knowledge of prior probability densities. Our results show that both methods achieve desired missed detection rates, but the frequentist method's false alarm performance is inferior to the Bayesian method's.
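    The sequential probability ratio test named above can be sketched in its classical Wald form. This is a generic illustration, not the authors' conjunction-assessment formulation: it assumes independent Gaussian observations with known `sigma` and two simple hypotheses about the mean (`mu0`, `mu1`), all of which are hypothetical stand-ins. The thresholds use Wald's approximations for a target false alarm rate `alpha` and missed detection rate `beta`.

```python
import math


def sprt_gaussian(observations, mu0, mu1, sigma, alpha=0.05, beta=0.05):
    """Wald's SPRT for H0: mean = mu0 versus H1: mean = mu1, known sigma.

    Returns (decision, number_of_observations_used).
    """
    upper = math.log((1.0 - beta) / alpha)   # cross upward: accept H1
    lower = math.log(beta / (1.0 - alpha))   # cross downward: accept H0
    llr = 0.0
    for n, x in enumerate(observations, start=1):
        # Log-likelihood ratio increment for one Gaussian observation.
        llr += (mu1 - mu0) / sigma ** 2 * (x - 0.5 * (mu0 + mu1))
        if llr >= upper:
            return "accept_h1", n
        if llr <= lower:
            return "accept_h0", n
    return "continue", len(observations)


print(sprt_gaussian([1.0] * 10, mu0=0.0, mu1=1.0, sigma=1.0))
```

    A Bayesian variant would start the running sum at the log prior odds instead of zero, which is one way prior probability densities can enter the decision.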

  7. Bayesian calibration of mechanistic aquatic biogeochemical models and benefits for environmental management

    NASA Astrophysics Data System (ADS)

    Arhonditsis, George B.; Papantou, Dimitra; Zhang, Weitao; Perhar, Gurbir; Massos, Evangelia; Shi, Molu

    2008-09-01

    Aquatic biogeochemical models have been an indispensable tool for addressing pressing environmental issues, e.g., understanding oceanic response to climate change, elucidation of the interplay between plankton dynamics and atmospheric CO2 levels, and examination of alternative management schemes for eutrophication control. Their ability to form the scientific basis for environmental management decisions can be undermined by the underlying structural and parametric uncertainty. In this study, we outline how we can attain realistic predictive links between management actions and ecosystem response through a probabilistic framework that accommodates rigorous uncertainty analysis of a variety of error sources, i.e., measurement error, parameter uncertainty, discrepancy between model and natural system. Because model uncertainty analysis essentially aims to quantify the joint probability distribution of model parameters and to make inference about this distribution, we believe that the iterative nature of Bayes' Theorem is a logical means to incorporate existing knowledge and update the joint distribution as new information becomes available. The statistical methodology begins with the characterization of parameter uncertainty in the form of probability distributions, then water quality data are used to update the distributions, and yield posterior parameter estimates along with predictive uncertainty bounds. Our illustration is based on a six state variable (nitrate, ammonium, dissolved organic nitrogen, phytoplankton, zooplankton, and bacteria) ecological model developed for gaining insight into the mechanisms that drive plankton dynamics in a coastal embayment; the Gulf of Gera, Island of Lesvos, Greece. The lack of analytical expressions for the posterior parameter distributions was overcome using Markov chain Monte Carlo simulations; a convenient way to obtain representative samples of parameter values. The Bayesian calibration resulted in realistic reproduction of the key temporal patterns of the system, offered insights into the degree of information the data contain about model inputs, and also allowed the quantification of the dependence structure among the parameter estimates. Finally, our study uses two synthetic datasets to examine the ability of the updated model to provide estimates of predictive uncertainty for water quality variables of environmental management interest.

  8. Estimation of flock/herd-level true Mycobacterium avium subspecies paratuberculosis prevalence on sheep, beef cattle and deer farms in New Zealand using a novel Bayesian model.

    PubMed

    Verdugo, Cristobal; Jones, Geoff; Johnson, Wes; Wilson, Peter; Stringer, Lesley; Heuer, Cord

    2014-12-01

    The study aimed to estimate the national- and island-level flock/herd true prevalence (HTP) of Mycobacterium avium subsp. paratuberculosis (MAP) infection in pastoral farmed sheep, beef cattle and deer in New Zealand. A random sample of 238 single- or multi-species farms was selected from a postal surveyed population of 1940 farms. The sample included 162 sheep flocks, 116 beef cattle and 99 deer herds from seven of 16 geographical regions. Twenty animals from each species present on farm were randomly selected for blood and faecal sampling. Pooled faecal culture testing was conducted using a single pool (sheep flocks) or two pools (beef cattle/deer herds) of 20 and 10 samples per pool, respectively. To increase flock/herd-level sensitivity, sera from all 20 animals from culture negative flocks/herds were individually tested by Pourquier® ELISA (sheep and cattle) or Paralisa™ (deer). Results were adjusted for sensitivity and specificity of diagnostic tests using a novel Bayesian latent class model. Outcomes were adjusted by their sampling fractions to obtain HTP estimates at national level. For each species, the posterior probability (POPR) of HTP differences between New Zealand North (NI) and South (SI) Islands was obtained. Across all species, 69% of farms had at least one species test positive. Sheep flocks had the highest HTP estimate (76%, posterior probability interval (PPI) 70-81%), followed by deer (46%, PPI 38-55%) and beef herds (42%, PPI 35-50%). Differences were observed between the two main islands of New Zealand, with higher HTP in sheep and beef cattle flocks/herds in the NI. Sheep flock HTP was 80% in the NI compared with 70% (POPR=0.96) in the SI, while the HTP for beef cattle was 44% in the NI and 38% in the SI (POPR=0.80). Conversely, deer HTP was higher in the SI (54%) than the NI (33%, POPR=0.99). Infection with MAP is endemic at high prevalence in sheep, beef cattle and deer flocks/herds across New Zealand.

  9. Baseline predictors of sputum culture conversion in pulmonary tuberculosis: importance of cavities, smoking, time to detection and W-Beijing genotype.

    PubMed

    Visser, Marianne E; Stead, Michael C; Walzl, Gerhard; Warren, Rob; Schomaker, Michael; Grewal, Harleen M S; Swart, Elizabeth C; Maartens, Gary

    2012-01-01

    Time to detection (TTD) on automated liquid mycobacterial cultures is an emerging biomarker of tuberculosis outcomes. The M. tuberculosis W-Beijing genotype is spreading globally, indicating a selective advantage. There is a paucity of data on the association between baseline TTD and W-Beijing genotype and tuberculosis outcomes. To assess baseline predictors of failure of sputum culture conversion, within the first 2 months of antitubercular therapy, in participants with pulmonary tuberculosis. Between May 2005 and August 2008 we conducted a prospective cohort study of time to sputum culture conversion in ambulatory participants with first episodes of smear and culture positive pulmonary tuberculosis attending two primary care clinics in Cape Town, South Africa. Rifampicin resistance (diagnosed on phenotypic susceptibility testing) was an exclusion criterion. Sputum was collected weekly for 8 weeks for mycobacterial culture on liquid media (BACTEC MGIT 960). Due to missing data, multiple imputation was performed. Time to sputum culture conversion was analysed using a Cox proportional hazards model. Bayesian model averaging determined the posterior effect probability for each variable. 113 participants were enrolled (30.1% female, 10.5% HIV-infected, 44.2% W-Beijing genotype, and 89% cavities). On Kaplan-Meier analysis 50.4% of participants underwent sputum culture conversion by 8 weeks. The following baseline factors were associated with slower sputum culture conversion: TTD (adjusted hazard ratio (aHR) = 1.11, 95% CI 1.02; 1.2), lung cavities (aHR = 0.13, 95% CI 0.02; 0.95), ever smoking (aHR = 0.32, 95% CI 0.1; 1.02) and the W-Beijing genotype (aHR = 0.51, 95% CI 0.25; 1.07). On Bayesian model averaging, posterior probability effects were strong for TTD, lung cavitation and smoking and moderate for W-Beijing genotype. We found that baseline TTD, smoking, cavities and W-Beijing genotype were associated with delayed 2-month sputum culture conversion. Larger studies are needed to confirm the relationship between the W-Beijing genotype and sputum culture conversion.

  10. A Bayesian method for using simulator data to enhance human error probabilities assigned by existing HRA methods

    DOE PAGES

    Groth, Katrina M.; Smith, Curtis L.; Swiler, Laura P.

    2014-04-05

    In the past several years, several international agencies have begun to collect data on human performance in nuclear power plant simulators [1]. This data provides a valuable opportunity to improve human reliability analysis (HRA), but these improvements will not be realized without implementation of Bayesian methods. Bayesian methods are widely used to incorporate sparse data into models in many parts of probabilistic risk assessment (PRA), but Bayesian methods have not been adopted by the HRA community. In this article, we provide a Bayesian methodology to formally use simulator data to refine the human error probabilities (HEPs) assigned by existing HRA methods. We demonstrate the methodology with a case study, wherein we use simulator data from the Halden Reactor Project to update the probability assignments from the SPAR-H method. The case study demonstrates the ability to use performance data, even sparse data, to improve existing HRA methods. Furthermore, this paper also serves as a demonstration of the value of Bayesian methods to improve the technical basis of HRA.

  11. Wave-height hazard analysis in Eastern Coast of Spain - Bayesian approach using generalized Pareto distribution

    NASA Astrophysics Data System (ADS)

    Egozcue, J. J.; Pawlowsky-Glahn, V.; Ortego, M. I.

    2005-03-01

    Standard practice of wave-height hazard analysis often pays little attention to the uncertainty of assessed return periods and occurrence probabilities. This fact favors the opinion that, when large events happen, the hazard assessment should change accordingly. However, uncertainty of the hazard estimates is normally able to hide the effect of those large events. This is illustrated using data from the Mediterranean coast of Spain, where recent years have been extremely disastrous. Thus, it is possible to compare the hazard assessment based on data prior to those years with the analysis including them. With our approach, no significant change is detected when the statistical uncertainty is taken into account. The hazard analysis is carried out with a standard model. Time-occurrence of events is assumed to be Poisson distributed. The wave-height of each event is modelled as a random variable whose upper tail follows a Generalized Pareto Distribution (GPD). Moreover, wave-heights are assumed independent from event to event and also independent of their occurrence in time. A threshold for excesses is assessed empirically. The other three parameters (Poisson rate, shape and scale parameters of GPD) are jointly estimated using Bayes' theorem. The prior distribution accounts for physical features of ocean waves in the Mediterranean Sea and experience with these phenomena. The posterior distribution of the parameters makes it possible to obtain posterior distributions of other derived parameters such as occurrence probabilities and return periods. Predictive distributions are also available. Computations are carried out using the program BGPE v2.0.
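    The derived quantities mentioned above follow directly from the three model parameters. The sketch below is a generic Poisson-GPD calculation, not the BGPE program: given a Poisson `rate` of threshold exceedances per year and the GPD `scale` and `shape`, it computes the return period of a wave height and the probability of at least one exceedance within a time horizon. All parameter values in the example are hypothetical.

```python
import math


def gpd_survival(x, threshold, scale, shape):
    """P(height > x | height > threshold) under a Generalized Pareto tail."""
    z = (x - threshold) / scale
    if z <= 0.0:
        return 1.0
    if abs(shape) < 1e-12:          # exponential limit as shape -> 0
        return math.exp(-z)
    base = 1.0 + shape * z
    if base <= 0.0:                 # beyond the finite upper bound (shape < 0)
        return 0.0
    return base ** (-1.0 / shape)


def return_period(x, rate, threshold, scale, shape):
    """Mean time in years between events exceeding x."""
    annual_rate = rate * gpd_survival(x, threshold, scale, shape)
    return math.inf if annual_rate == 0.0 else 1.0 / annual_rate


def prob_exceed_within(x, years, rate, threshold, scale, shape):
    """Probability of at least one event exceeding x within the given horizon."""
    annual_rate = rate * gpd_survival(x, threshold, scale, shape)
    return 1.0 - math.exp(-annual_rate * years)


# Hypothetical values: 2 events/year above a 3 m threshold.
print(return_period(7.0, rate=2.0, threshold=3.0, scale=2.0, shape=0.5))
```

    Evaluating these functions on each posterior draw of (rate, scale, shape) would give the posterior distribution of the return period rather than a single point estimate, which is the uncertainty the abstract emphasizes.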

  12. On the applicability of surrogate-based Markov chain Monte Carlo-Bayesian inversion to the Community Land Model: Case studies at flux tower sites

    NASA Astrophysics Data System (ADS)

    Huang, Maoyi; Ray, Jaideep; Hou, Zhangshuan; Ren, Huiying; Liu, Ying; Swiler, Laura

    2016-07-01

    The Community Land Model (CLM) has been widely used in climate and Earth system modeling. Accurate estimation of model parameters is needed for reliable model simulations and predictions under current and future conditions. In our previous work, a subset of hydrological parameters was identified as having a significant impact on surface energy fluxes at selected flux tower sites based on parameter screening and sensitivity analysis, which indicates that the parameters could potentially be estimated from surface flux observations at the towers. To date, such estimates do not exist. In this paper, we assess the feasibility of applying a Bayesian model calibration technique to estimate CLM parameters at selected flux tower sites under various site conditions. The parameters are estimated as a joint probability density function (PDF) that provides estimates of uncertainty of the parameters being inverted, conditional on climatologically average latent heat fluxes derived from observations. We find that the simulated mean latent heat fluxes from CLM using the calibrated parameters are generally improved at all sites when compared to those obtained with CLM simulations using default parameter sets. Further, our calibration method also results in credibility bounds around the simulated mean fluxes which bracket the measured data. The modes (or maximum a posteriori values) and 95% credibility intervals of the site-specific posterior PDFs are tabulated as suggested parameter values for each site. Analysis of relationships between the posterior PDFs and site conditions suggests that the parameter values are likely correlated with the plant functional type, which needs to be confirmed in future studies by extending the approach to more sites.

  13. Uncertainty quantification for nuclear density functional theory and information content of new measurements

    DOE PAGES

    McDonnell, J. D.; Schunck, N.; Higdon, D.; ...

    2015-03-24

    Statistical tools of uncertainty quantification can be used to assess the information content of measured observables with respect to present-day theoretical models, to estimate model errors and thereby improve predictive capability, to extrapolate beyond the regions reached by experiment, and to provide meaningful input to applications and planned measurements. To showcase new opportunities offered by such tools, we make a rigorous analysis of theoretical statistical uncertainties in nuclear density functional theory using Bayesian inference methods. By considering the recent mass measurements from the Canadian Penning Trap at Argonne National Laboratory, we demonstrate how the Bayesian analysis and a direct least-squares optimization, combined with high-performance computing, can be used to assess the information content of the new data with respect to a model based on the Skyrme energy density functional approach. Employing the posterior probability distribution computed with a Gaussian process emulator, we apply the Bayesian framework to propagate theoretical statistical uncertainties in predictions of nuclear masses, two-neutron dripline, and fission barriers. Overall, we find that the new mass measurements do not impose a constraint that is strong enough to lead to significant changes in the model parameters. In addition, the example discussed in this study sets the stage for quantifying and maximizing the impact of new measurements with respect to current modeling and guiding future experimental efforts, thus enhancing the experiment-theory cycle in the scientific method.

  14. Bayesian analysis of multiple direct detection experiments

    NASA Astrophysics Data System (ADS)

    Arina, Chiara

    2014-12-01

    Bayesian methods offer a coherent and efficient framework for implementing uncertainties into induction problems. In this article, we review how this approach applies to the analysis of dark matter direct detection experiments. In particular we discuss the exclusion limit of XENON100 and the debated hints of detection under the hypothesis of a WIMP signal. Within parameter inference, marginalizing consistently over uncertainties to extract robust posterior probability distributions, we find that the claimed tension between XENON100 and the other experiments can be partially alleviated in the isospin-violating scenario, while the elastic scattering model appears to be compatible with the frequentist statistical approach. We then move to model comparison, for which Bayesian methods are particularly well suited. Firstly, we investigate the annual modulation seen in CoGeNT data, finding that there is weak evidence for a modulation. Modulation models due to other physics compare unfavorably with the WIMP models, paying the price for their excessive complexity. Secondly, we confront several coherent scattering models to determine the current best physical scenario compatible with the experimental hints. We find that exothermic and inelastic dark matter are moderately disfavored against the elastic scenario, while the isospin-violating model has a similar evidence. Lastly, the Bayes factor gives inconclusive evidence for an incompatibility between the data sets of XENON100 and the hints of detection. The same question assessed with goodness of fit would indicate a 2σ discrepancy. This suggests that more data are needed to settle this question.

  15. Uncertainty quantification for nuclear density functional theory and information content of new measurements

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McDonnell, J. D.; Schunck, N.; Higdon, D.

    2015-03-24

    Statistical tools of uncertainty quantification can be used to assess the information content of measured observables with respect to present-day theoretical models, to estimate model errors and thereby improve predictive capability, to extrapolate beyond the regions reached by experiment, and to provide meaningful input to applications and planned measurements. To showcase new opportunities offered by such tools, we make a rigorous analysis of theoretical statistical uncertainties in nuclear density functional theory using Bayesian inference methods. By considering the recent mass measurements from the Canadian Penning Trap at Argonne National Laboratory, we demonstrate how the Bayesian analysis and a direct least-squares optimization, combined with high-performance computing, can be used to assess the information content of the new data with respect to a model based on the Skyrme energy density functional approach. Employing the posterior probability distribution computed with a Gaussian process emulator, we apply the Bayesian framework to propagate theoretical statistical uncertainties in predictions of nuclear masses, two-neutron dripline, and fission barriers. Overall, we find that the new mass measurements do not impose a constraint that is strong enough to lead to significant changes in the model parameters. As a result, the example discussed in this study sets the stage for quantifying and maximizing the impact of new measurements with respect to current modeling and guiding future experimental efforts, thus enhancing the experiment-theory cycle in the scientific method.

  16. Bayesian methods for the design and interpretation of clinical trials in very rare diseases

    PubMed Central

    Hampson, Lisa V; Whitehead, John; Eleftheriou, Despina; Brogan, Paul

    2014-01-01

    This paper considers the design and interpretation of clinical trials comparing treatments for conditions so rare that worldwide recruitment efforts are likely to yield total sample sizes of 50 or fewer, even when patients are recruited over several years. For such studies, the sample size needed to meet a conventional frequentist power requirement is clearly infeasible. Rather, the expectation of any such trial has to be limited to the generation of an improved understanding of treatment options. We propose a Bayesian approach for the conduct of rare-disease trials comparing an experimental treatment with a control where patient responses are classified as a success or failure. A systematic elicitation from clinicians of their beliefs concerning treatment efficacy is used to establish Bayesian priors for unknown model parameters. The process of determining the prior is described, including the possibility of formally considering results from related trials. As sample sizes are small, it is possible to compute all possible posterior distributions of the two success rates. A number of allocation ratios between the two treatment groups can be considered with a view to maximising the prior probability that the trial concludes recommending the new treatment when in fact it is non-inferior to control. Consideration of the extent to which opinion can be changed, even by data from the best feasible design, can help to determine whether such a trial is worthwhile. © 2014 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd. PMID:24957522

  17. On the Origins of Suboptimality in Human Probabilistic Inference

    PubMed Central

    Acerbi, Luigi; Vijayakumar, Sethu; Wolpert, Daniel M.

    2014-01-01

    Humans have been shown to combine noisy sensory information with previous experience (priors), in qualitative and sometimes quantitative agreement with the statistically-optimal predictions of Bayesian integration. However, when the prior distribution becomes more complex than a simple Gaussian, such as skewed or bimodal, training takes much longer and performance appears suboptimal. It is unclear whether such suboptimality arises from an imprecise internal representation of the complex prior, or from additional constraints in performing probabilistic computations on complex distributions, even when accurately represented. Here we probe the sources of suboptimality in probabilistic inference using a novel estimation task in which subjects are exposed to an explicitly provided distribution, thereby removing the need to remember the prior. Subjects had to estimate the location of a target given a noisy cue and a visual representation of the prior probability density over locations, which changed on each trial. Different classes of priors were examined (Gaussian, unimodal, bimodal). Subjects' performance was in qualitative agreement with the predictions of Bayesian Decision Theory although generally suboptimal. The degree of suboptimality was modulated by statistical features of the priors but was largely independent of the class of the prior and level of noise in the cue, suggesting that suboptimality in dealing with complex statistical features, such as bimodality, may be due to a problem of acquiring the priors rather than computing with them. We performed a factorial model comparison across a large set of Bayesian observer models to identify additional sources of noise and suboptimality. Our analysis rejects several models of stochastic behavior, including probability matching and sample-averaging strategies. Instead we show that subjects' response variability was mainly driven by a combination of a noisy estimation of the parameters of the priors, and by variability in the decision process, which we represent as a noisy or stochastic posterior. PMID:24945142

  18. Bayesian analysis of a mastitis control plan to investigate the influence of veterinary prior beliefs on clinical interpretation.

    PubMed

    Green, M J; Browne, W J; Green, L E; Bradley, A J; Leach, K A; Breen, J E; Medley, G F

    2009-10-01

    The fundamental objective for health research is to determine whether changes should be made to clinical decisions. Decisions made by veterinary surgeons in the light of new research evidence are known to be influenced by their prior beliefs, especially their initial opinions about the plausibility of possible results. In this paper, clinical trial results for a bovine mastitis control plan were evaluated within a Bayesian context, to incorporate a community of prior distributions that represented a spectrum of clinical prior beliefs. The aim was to quantify the effect of veterinary surgeons' initial viewpoints on the interpretation of the trial results. A Bayesian analysis was conducted using Markov chain Monte Carlo procedures. Stochastic models included a financial cost attributed to a change in clinical mastitis following implementation of the control plan. Prior distributions were incorporated that covered a realistic range of possible clinical viewpoints, including scepticism, enthusiasm and uncertainty. Posterior distributions revealed important differences in the financial gain that clinicians with different starting viewpoints would anticipate from the mastitis control plan, given the actual research results. For example, a severe skeptic would ascribe a probability of 0.50 for a return of < 5 UK pounds per cow in an average herd that implemented the plan, whereas an enthusiast would ascribe this probability for a return of > 20 UK pounds per cow. Simulations using increased trial sizes indicated that if the original study was four times as large, an initial skeptic would be more convinced about the efficacy of the control plan but would still anticipate less financial return than an initial enthusiast would anticipate after the original study. In conclusion, it is possible to estimate how clinicians' prior beliefs influence their interpretation of research evidence. Further research on the extent to which different interpretations of evidence result in changes to clinical practice would be worthwhile.

  19. Prediction-error variance in Bayesian model updating: a comparative study

    NASA Astrophysics Data System (ADS)

    Asadollahi, Parisa; Li, Jian; Huang, Yong

    2017-04-01

    In Bayesian model updating, the likelihood function is commonly formulated by stochastic embedding, in which the maximum information entropy probability model of prediction error variances plays an important role; it is a Gaussian distribution subject to the first two moments as constraints. The selection of prediction error variances can be formulated as a model class selection problem, which automatically involves a trade-off between the average data-fit of the model class and the information it extracts from the data. Therefore, it is critical for robustness in updating the structural model, especially in the presence of modeling errors. To date, three ways of considering prediction error variances have been seen in the literature: 1) setting constant values empirically, 2) estimating them based on the goodness-of-fit of the measured data, and 3) updating them as uncertain parameters by applying Bayes' Theorem at the model class level. In this paper, the effect of different strategies to deal with the prediction error variances on the model updating performance is investigated explicitly. A six-story shear building model with six uncertain stiffness parameters is employed as an illustrative example. Transitional Markov Chain Monte Carlo is used to draw samples of the posterior probability density function of the structure model parameters as well as the uncertain prediction variances. The different levels of modeling uncertainty and complexity are modeled through three FE models, including a true model, a model with more complexity, and a model with modeling error. Bayesian updating is performed for the three FE models considering the three aforementioned treatments of the prediction error variances. The effect of the number of measurements on the model updating performance is also examined in the study. The results are compared based on model class assessment and indicate that updating the prediction error variances as uncertain parameters at the model class level produces more robust results, especially when the number of measurements is small.

  20. SU-F-R-44: Modeling Lung SBRT Tumor Response Using Bayesian Network Averaging

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Diamant, A; Ybarra, N; Seuntjens, J

    2016-06-15

    Purpose: The prediction of tumor control after a patient receives lung SBRT (stereotactic body radiation therapy) has proven to be challenging, due to the complex interactions between an individual’s biology and dose-volume metrics. Many of these variables have predictive power when combined, a feature that we exploit using a graph modeling approach based on Bayesian networks. This provides a probabilistic framework that allows for accurate and visually intuitive predictive modeling. The aim of this study is to uncover possible interactions between an individual patient’s characteristics and generate a robust model capable of predicting said patient’s treatment outcome. Methods: We investigated a cohort of 32 prospective patients from multiple institutions who had received curative SBRT to the lung. The number of patients exhibiting tumor failure was observed to be 7 (event rate of 22%). The serum concentration of 5 biomarkers previously associated with NSCLC (non-small cell lung cancer) was measured pre-treatment. A total of 21 variables were analyzed including: dose-volume metrics with BED (biologically effective dose) correction and clinical variables. A Markov Chain Monte Carlo technique estimated the posterior probability distribution of the potential graphical structures. The probability of tumor failure was then estimated by averaging the top 100 graphs and applying Bayes' rule. Results: The optimal Bayesian model generated throughout this study incorporated the PTV volume, the serum concentration of the biomarker EGFR (epidermal growth factor receptor) and prescription BED. This predictive model recorded an area under the receiver operating characteristic curve of 0.94(1), providing better performance compared to competing methods in other literature. Conclusion: The use of biomarkers in conjunction with dose-volume metrics allows for the generation of a robust predictive model. The preliminary results of this report demonstrate that it is possible to accurately model the prognosis of an individual lung SBRT patient’s treatment.

  1. Bayesian calibration of terrestrial ecosystem models: a study of advanced Markov chain Monte Carlo methods

    NASA Astrophysics Data System (ADS)

    Lu, Dan; Ricciuto, Daniel; Walker, Anthony; Safta, Cosmin; Munger, William

    2017-09-01

    Calibration of terrestrial ecosystem models is important but challenging. Bayesian inference implemented by Markov chain Monte Carlo (MCMC) sampling provides a comprehensive framework to estimate model parameters and associated uncertainties using their posterior distributions. The effectiveness and efficiency of the method strongly depend on the MCMC algorithm used. In this work, a differential evolution adaptive Metropolis (DREAM) algorithm is used to estimate posterior distributions of 21 parameters for the data assimilation linked ecosystem carbon (DALEC) model using 14 years of daily net ecosystem exchange data collected at the Harvard Forest Environmental Measurement Site eddy-flux tower. The calibration of DREAM results in a better model fit and predictive performance compared to the popular adaptive Metropolis (AM) scheme. Moreover, DREAM indicates that two parameters controlling autumn phenology have multiple modes in their posterior distributions while AM only identifies one mode. The application suggests that DREAM is very suitable to calibrate complex terrestrial ecosystem models, where the uncertain parameter size is usually large and existence of local optima is always a concern. In addition, this effort justifies the assumptions of the error model used in Bayesian calibration according to the residual analysis. The result indicates that a heteroscedastic, correlated, Gaussian error model is appropriate for the problem, and the consequent constructed likelihood function can alleviate the underestimation of parameter uncertainty that is usually caused by using uncorrelated error models.

  2. A Bayesian-frequentist two-stage single-arm phase II clinical trial design.

    PubMed

    Dong, Gaohong; Shih, Weichung Joe; Moore, Dirk; Quan, Hui; Marcella, Stephen

    2012-08-30

    It is well-known that both frequentist and Bayesian clinical trial designs have their own advantages and disadvantages. To have better properties inherited from these two types of designs, we developed a Bayesian-frequentist two-stage single-arm phase II clinical trial design. This design allows both early acceptance and rejection of the null hypothesis (H0). The measures (for example probability of trial early termination, expected sample size, etc.) of the design properties under both frequentist and Bayesian settings are derived. Moreover, under the Bayesian setting, the upper and lower boundaries are determined with predictive probability of trial success outcome. Given a beta prior and a sample size for stage I, based on the marginal distribution of the responses at stage I, we derived Bayesian Type I and Type II error rates. By controlling both frequentist and Bayesian error rates, the Bayesian-frequentist two-stage design has special features compared with other two-stage designs. Copyright © 2012 John Wiley & Sons, Ltd.
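
    A predictive probability of trial success under a beta prior can be sketched with the beta-binomial distribution. This is only an illustration, not the authors' design: the success rule (at least r responders in total after both stages) and all function and parameter names are assumptions made for the example.

```python
from math import comb, exp, lgamma

def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binom_pmf(k, n, a, b):
    """P(k responders among n future patients) when the rate is Beta(a, b)."""
    return comb(n, k) * exp(log_beta(k + a, n - k + b) - log_beta(a, b))

def predictive_prob_success(x1, n1, n2, r, a=1.0, b=1.0):
    """Predictive probability that total responders reach r after stage II.

    x1 of n1 stage I patients responded; n2 more patients are planned.
    The prior on the response rate is Beta(a, b).
    """
    a_post, b_post = a + x1, b + (n1 - x1)  # posterior after stage I
    still_needed = max(0, r - x1)
    return sum(beta_binom_pmf(k, n2, a_post, b_post)
               for k in range(still_needed, n2 + 1))
```

    In a design of this kind, a trial might stop early for futility when this probability falls below a lower boundary, or for efficacy when it rises above an upper one; the boundary values themselves are not given in the abstract.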

  3. State-space modeling to support management of brucellosis in the Yellowstone bison population

    USGS Publications Warehouse

    Hobbs, N. Thompson; Geremia, Chris; Treanor, John; Wallen, Rick; White, P.J.; Hooten, Mevin B.; Rhyan, Jack C.

    2015-01-01

    The bison (Bison bison) of the Yellowstone ecosystem, USA, exemplify the difficulty of conserving large mammals that migrate across the boundaries of conservation areas. Bison are infected with brucellosis (Brucella abortus) and their seasonal movements can expose livestock to infection. Yellowstone National Park has embarked on a program of adaptive management of bison, which requires a model that assimilates data to support management decisions. We constructed a Bayesian state-space model to reveal the influence of brucellosis on the Yellowstone bison population. A frequency-dependent model of brucellosis transmission was superior to a density-dependent model in predicting out-of-sample observations of horizontal transmission probability. A mixture model including both transmission mechanisms converged on frequency dependence. Conditional on the frequency-dependent model, brucellosis median transmission rate was 1.87 yr⁻¹. The median of the posterior distribution of the basic reproductive ratio (R0) was 1.75. Seroprevalence of adult females varied around 60% over two decades, but only 9.6 of 100 adult females were infectious. Brucellosis depressed recruitment; estimated population growth rate λ averaged 1.07 for an infected population and 1.11 for a healthy population. We used five-year forecasting to evaluate the ability of different actions to meet management goals relative to no action. Annually removing 200 seropositive female bison increased by 30-fold the probability of reducing seroprevalence below 40% and increased by a factor of 120 the probability of achieving a 50% reduction in transmission probability relative to no action. Annually vaccinating 200 seronegative animals increased the likelihood of a 50% reduction in transmission probability by fivefold over no action. However, including uncertainty in the ability to implement management by representing stochastic variation in the number of accessible bison dramatically reduced the probability of achieving goals using interventions relative to no action. Because the width of the posterior predictive distributions of future population states expands rapidly with increases in the forecast horizon, managers must accept high levels of uncertainty. These findings emphasize the necessity of iterative, adaptive management with relatively short-term commitment to action and frequent reevaluation in response to new data and model forecasts. We believe our approach has broad applications.

  4. Uncertainty plus prior equals rational bias: an intuitive Bayesian probability weighting function.

    PubMed

    Fennell, John; Baddeley, Roland

    2012-10-01

    Empirical research has shown that when making choices based on probabilistic options, people behave as if they overestimate small probabilities, underestimate large probabilities, and treat positive and negative outcomes differently. These distortions have been modeled using a nonlinear probability weighting function, which is found in several nonexpected utility theories, including rank-dependent models and prospect theory; here, we propose a Bayesian approach to the probability weighting function and, with it, a psychological rationale. In the real world, uncertainty is ubiquitous and, accordingly, the optimal strategy is to combine probability statements with prior information using Bayes' rule. First, we show that any reasonable prior on probabilities leads to 2 of the observed effects; overweighting of low probabilities and underweighting of high probabilities. We then investigate 2 plausible kinds of priors: informative priors based on previous experience and uninformative priors of ignorance. Individually, these priors potentially lead to large problems of bias and inefficiency, respectively; however, when combined using Bayesian model comparison methods, both forms of prior can be applied adaptively, gaining the efficiency of empirical priors and the robustness of ignorance priors. We illustrate this for the simple case of generic good and bad options, using Internet blogs to estimate the relevant priors of inference. Given this combined ignorant/informative prior, the Bayesian probability weighting function is not only robust and efficient but also matches all of the major characteristics of the distortions found in empirical research. PsycINFO Database Record (c) 2012 APA, all rights reserved.

  5. Uncertainty estimation of Intensity-Duration-Frequency relationships: A regional analysis

    NASA Astrophysics Data System (ADS)

    Mélèse, Victor; Blanchet, Juliette; Molinié, Gilles

    2018-03-01

    We propose in this article a regional study of uncertainties in IDF curves derived from point-rainfall maxima. We develop two generalized extreme value models based on the simple scaling assumption, first in the frequentist framework and second in the Bayesian framework. Within the frequentist framework, uncertainties are obtained i) from the Gaussian density stemming from the asymptotic normality theorem of the maximum likelihood and ii) with a bootstrap procedure. Within the Bayesian framework, uncertainties are obtained from the posterior densities. We confront these two frameworks on the same database covering a large region of 100,000 km² in southern France with contrasted rainfall regimes, in order to be able to draw conclusions that are not specific to the data. The two frameworks are applied to 405 hourly stations with data back to the 1980s, accumulated in the range 3 h-120 h. We show that i) the Bayesian framework is more robust than the frequentist one to the starting point of the estimation procedure, ii) the posterior and the bootstrap densities are able to better adjust uncertainty estimation to the data than the Gaussian density, and iii) the bootstrap density gives unreasonable confidence intervals, in particular for return levels associated with large return periods. Therefore our recommendation goes towards the use of the Bayesian framework to compute uncertainty.

  6. An adaptive Gaussian process-based method for efficient Bayesian experimental design in groundwater contaminant source identification problems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Jiangjiang; Li, Weixuan; Zeng, Lingzao

    Surrogate models are commonly used in Bayesian approaches such as Markov Chain Monte Carlo (MCMC) to avoid repetitive CPU-demanding model evaluations. However, the approximation error of a surrogate may lead to biased estimations of the posterior distribution. This bias can be corrected by constructing a very accurate surrogate or implementing MCMC in a two-stage manner. Since the two-stage MCMC requires extra original model evaluations, the computational cost is still high. If the information of measurement is incorporated, a locally accurate approximation of the original model can be adaptively constructed with low computational cost. Based on this idea, we propose a Gaussian process (GP) surrogate-based Bayesian experimental design and parameter estimation approach for groundwater contaminant source identification problems. A major advantage of the GP surrogate is that it provides a convenient estimation of the approximation error, which can be incorporated in the Bayesian formula to avoid over-confident estimation of the posterior distribution. The proposed approach is tested with a numerical case study. Without sacrificing the estimation accuracy, the new approach achieves a speed-up of about 200 times compared to our previous work using two-stage MCMC.

  7. [The survival prediction model of advanced gallbladder cancer based on Bayesian network: a multi-institutional study].

    PubMed

    Tang, Z H; Geng, Z M; Chen, C; Si, S B; Cai, Z Q; Song, T Q; Gong, P; Jiang, L; Qiu, Y H; He, Y; Zhai, W L; Li, S P; Zhang, Y C; Yang, Y

    2018-05-01

    Objective: To investigate the clinical value of Bayesian network in predicting survival of patients with advanced gallbladder cancer (GBC) who underwent curative intent surgery. Methods: The clinical data of patients with advanced GBC who underwent curative intent surgery in 9 institutions from January 2010 to December 2015 were analyzed retrospectively. A median survival time model based on a tree augmented naïve Bayes algorithm was established by Bayesia Lab software. The survival time, number of metastatic lymph nodes (NMLN), T stage, pathological grade, margin, jaundice, liver invasion, age, sex and tumor morphology were included in this model. Confusion matrix, the receiver operating characteristic curve and area under the curve were used to evaluate the accuracy of the model. A priori statistical analysis of these 10 variables and a posterior analysis (survival time as the target variable, the remaining factors as the attribute variables) were performed. The importance ranking of each variable was calculated with the polymorphic Birnbaum importance calculation based on the posterior analysis results. The survival probability forecast table was constructed based on the top 4 prognosis factors. The survival curve was drawn by the Kaplan-Meier method, and differences in survival curves were compared using the log-rank test. Results: A total of 316 patients were enrolled, including 109 males and 207 females. The ratio of male to female was 1.0:1.9, and the age was (62.0±10.8) years. There were 298 cases (94.3%) of R0 resection and 18 cases (5.7%) of R1 resection. T staging: 287 cases (90.8%) T3 and 29 cases (9.2%) T4. The median survival time (MST) was 23.77 months, and the 1-, 3- and 5-year survival rates were 67.4%, 40.8% and 32.0%, respectively. For the Bayesian model, the number of correctly predicted cases was 121 (≤23.77 months) and 115 (>23.77 months), respectively, leading to a 74.86% accuracy of this model. The prior probability of survival time was 0.5032 (≤23.77 months) and 0.4968 (>23.77 months). The importance ranking showed that NMLN (0.3666), margin (0.3501), T stage (0.3192) and pathological grade (0.2589) were the top 4 prognosis factors influencing the postoperative MST. These four factors were taken as observation variables to get the probability of patients in different survival periods. Based on these results, a survival prediction score system including NMLN, margin, T stage and pathological grade was designed; the median survival times (months) for 4-9 points were 66.8, 42.4, 26.0, 9.0, 7.5 and 2.3, respectively, and there was a statistically significant difference among the different point scores (P < 0.01). Conclusions: The survival prediction model of GBC based on Bayesian network has high accuracy. NMLN, margin, T staging and pathological grade are the top 4 risk factors affecting the survival of patients with advanced GBC who underwent curative resection. The survival prediction score system based on these four factors could be used to predict the survival and to guide the decision making of patients with advanced GBC.

  8. Rediscovery of Good-Turing estimators via Bayesian nonparametrics.

    PubMed

    Favaro, Stefano; Nipoti, Bernardo; Teh, Yee Whye

    2016-03-01

    The problem of estimating discovery probabilities originated in the context of statistical ecology, and in recent years it has become popular due to its frequent appearance in challenging applications arising in genetics, bioinformatics, linguistics, designs of experiments, machine learning, etc. A full range of statistical approaches, parametric and nonparametric as well as frequentist and Bayesian, has been proposed for estimating discovery probabilities. In this article, we investigate the relationships between the celebrated Good-Turing approach, which is a frequentist nonparametric approach developed in the 1940s, and a Bayesian nonparametric approach recently introduced in the literature. Specifically, under the assumption of a two parameter Poisson-Dirichlet prior, we show that Bayesian nonparametric estimators of discovery probabilities are asymptotically equivalent, for a large sample size, to suitably smoothed Good-Turing estimators. As a by-product of this result, we introduce and investigate a methodology for deriving exact and asymptotic credible intervals to be associated with the Bayesian nonparametric estimators of discovery probabilities. The proposed methodology is illustrated through a comprehensive simulation study and the analysis of Expressed Sequence Tags data generated by sequencing a benchmark complementary DNA library. © 2015, The International Biometric Society.
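
    As a point of reference for the abstract above, the following is a minimal sketch of the classical, unsmoothed Good-Turing estimate, not the authors' Bayesian nonparametric estimator. It assumes the textbook form (l + 1) * m_{l+1} / n for the probability that the next observation is a species seen exactly l times, where m_k is the number of species seen k times and n is the sample size. The function name and the toy sample are illustrative assumptions.

```python
from collections import Counter

def good_turing_discovery(sample, l=0):
    """Good-Turing estimate of the probability that the next draw is a
    species observed exactly `l` times so far: (l + 1) * m_{l+1} / n."""
    n = len(sample)
    species_counts = Counter(sample)                 # species -> frequency
    freq_of_freq = Counter(species_counts.values())  # frequency -> number of species
    return (l + 1) * freq_of_freq.get(l + 1, 0) / n

# Toy sample (illustrative only): 10 observations, 3 singletons (c, d, e).
sample = list("aaaabbbcde")
p_new = good_turing_discovery(sample, l=0)  # probability of a new species
```

    With l = 0 the estimate reduces to the fraction of singletons in the sample, which is the usual "missing mass" estimate.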

  9. Internal Medicine residents use heuristics to estimate disease probability

    PubMed Central

    Phang, Sen Han; Ravani, Pietro; Schaefer, Jeffrey; Wright, Bruce; McLaughlin, Kevin

    2015-01-01

    Background: Training in Bayesian reasoning may have limited impact on accuracy of probability estimates. In this study, our goal was to explore whether residents previously exposed to Bayesian reasoning use heuristics rather than Bayesian reasoning to estimate disease probabilities. We predicted that if residents use heuristics then post-test probability estimates would be increased by non-discriminating clinical features or a high anchor for a target condition. Method: We randomized 55 Internal Medicine residents to different versions of four clinical vignettes and asked them to estimate probabilities of target conditions. We manipulated the clinical data for each vignette to be consistent with either 1) using a representative heuristic, by adding non-discriminating prototypical clinical features of the target condition, or 2) using anchoring with adjustment heuristic, by providing a high or low anchor for the target condition. Results: When presented with additional non-discriminating data the odds of diagnosing the target condition were increased (odds ratio (OR) 2.83, 95% confidence interval [1.30, 6.15], p = 0.009). Similarly, the odds of diagnosing the target condition were increased when a high anchor preceded the vignette (OR 2.04, [1.09, 3.81], p = 0.025). Conclusions: Our findings suggest that despite previous exposure to the use of Bayesian reasoning, residents use heuristics, such as the representative heuristic and anchoring with adjustment, to estimate probabilities. Potential reasons for attribute substitution include the relative cognitive ease of heuristics vs. Bayesian reasoning or perhaps residents in their clinical practice use gist traces rather than precise probability estimates when diagnosing. PMID:27004080
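
    The normative calculation that heuristic estimates are compared against can be sketched as follows. This assumes the standard odds form of Bayes' rule (post-test odds = pre-test odds × likelihood ratio); the function name and the numeric values are illustrative assumptions and are not taken from the study. A non-discriminating finding has a likelihood ratio of 1 and should leave the probability unchanged.

```python
def post_test_probability(pre_test_prob, likelihood_ratio):
    """Update a disease probability with Bayes' rule in odds form."""
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)

# Illustrative values only (not taken from the study).
unchanged = post_test_probability(0.20, 1.0)  # non-discriminating finding, LR = 1
raised = post_test_probability(0.20, 4.0)     # discriminating finding, LR = 4
```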

  10. Finding Useful Questions: On Bayesian Diagnosticity, Probability, Impact, and Information Gain

    ERIC Educational Resources Information Center

    Nelson, Jonathan D.

    2005-01-01

    Several norms for how people should assess a question's usefulness have been proposed, notably Bayesian diagnosticity, information gain (mutual information), Kullback-Leibler distance, probability gain (error minimization), and impact (absolute change). Several probabilistic models of previous experiments on categorization, covariation assessment,…

  11. Testing the molecular clock using mechanistic models of fossil preservation and molecular evolution

    PubMed Central

    2017-01-01

    Molecular sequence data provide information about relative times only, and fossil-based age constraints are the ultimate source of information about absolute times in molecular clock dating analyses. Thus, fossil calibrations are critical to molecular clock dating, but competing methods are difficult to evaluate empirically because the true evolutionary time scale is never known. Here, we combine mechanistic models of fossil preservation and sequence evolution in simulations to evaluate different approaches to constructing fossil calibrations and their impact on Bayesian molecular clock dating, and the relative impact of fossil versus molecular sampling. We show that divergence time estimation is impacted by the model of fossil preservation, sampling intensity and tree shape. The addition of sequence data may improve molecular clock estimates, but accuracy and precision are dominated by the quality of the fossil calibrations. Posterior means and medians are poor representatives of true divergence times; posterior intervals provide a much more accurate estimate of divergence times, though they may be wide and often do not have high coverage probability. Our results highlight the importance of increased fossil sampling and improved statistical approaches to generating calibrations, which should incorporate the non-uniform nature of ecological and temporal fossil species distributions. PMID:28637852

  12. Quantification of Nonproteolytic Clostridium botulinum Spore Loads in Food Materials.

    PubMed

    Barker, Gary C; Malakar, Pradeep K; Plowman, June; Peck, Michael W

    2016-01-04

    We have produced data and developed analysis to build representations for the concentration of spores of nonproteolytic Clostridium botulinum in materials that are used during the manufacture of minimally processed chilled foods in the United Kingdom. Food materials are categorized into homogenous groups which include meat, fish, shellfish, cereals, fresh plant material, dairy liquid, dairy nonliquid, mushroom and fungi, and dried herbs and spices. Models are constructed in a Bayesian framework and represent a combination of information from a literature survey of spore loads from positive-control experiments that establish a detection limit and from dedicated microbiological tests for real food materials. The detection of nonproteolytic C. botulinum employed an optimized protocol that combines selective enrichment culture with multiplex PCR, and the majority of tests on food materials were negative. Posterior beliefs about spore loads center on a concentration range of 1 to 10 spores kg⁻¹. Posterior beliefs for larger spore loads were most significant for dried herbs and spices and were most sensitive to the detailed results from control experiments. Probability distributions for spore loads are represented in a convenient form that can be used for numerical analysis and risk assessments. Copyright © 2016 Barker et al.

  13. Joint time/frequency-domain inversion of reflection data for seabed geoacoustic profiles and uncertainties.

    PubMed

    Dettmer, Jan; Dosso, Stan E; Holland, Charles W

    2008-03-01

    This paper develops a joint time/frequency-domain inversion for high-resolution single-bounce reflection data, with the potential to resolve fine-scale profiles of sediment velocity, density, and attenuation over small seafloor footprints (approximately 100 m). The approach utilizes sequential Bayesian inversion of time- and frequency-domain reflection data, employing ray-tracing inversion for reflection travel times and a layer-packet stripping method for spherical-wave reflection-coefficient inversion. Posterior credibility intervals from the travel-time inversion are passed on as prior information to the reflection-coefficient inversion. Within the reflection-coefficient inversion, parameter information is passed from one layer packet inversion to the next in terms of marginal probability distributions rotated into principal components, providing an efficient approach to (partially) account for multi-dimensional parameter correlations with one-dimensional, numerical distributions. Quantitative geoacoustic parameter uncertainties are provided by a nonlinear Gibbs sampling approach employing full data error covariance estimation (including nonstationary effects) and accounting for possible biases in travel-time picks. Posterior examination of data residuals shows the importance of including data covariance estimates in the inversion. The joint inversion is applied to data collected on the Malta Plateau during the SCARAB98 experiment.

  14. Quantification of Nonproteolytic Clostridium botulinum Spore Loads in Food Materials

    PubMed Central

    Barker, Gary C.; Malakar, Pradeep K.; Plowman, June

    2016-01-01

    We have produced data and developed analysis to build representations for the concentration of spores of nonproteolytic Clostridium botulinum in materials that are used during the manufacture of minimally processed chilled foods in the United Kingdom. Food materials are categorized into homogenous groups which include meat, fish, shellfish, cereals, fresh plant material, dairy liquid, dairy nonliquid, mushroom and fungi, and dried herbs and spices. Models are constructed in a Bayesian framework and represent a combination of information from a literature survey of spore loads from positive-control experiments that establish a detection limit and from dedicated microbiological tests for real food materials. The detection of nonproteolytic C. botulinum employed an optimized protocol that combines selective enrichment culture with multiplex PCR, and the majority of tests on food materials were negative. Posterior beliefs about spore loads center on a concentration range of 1 to 10 spores kg⁻¹. Posterior beliefs for larger spore loads were most significant for dried herbs and spices and were most sensitive to the detailed results from control experiments. Probability distributions for spore loads are represented in a convenient form that can be used for numerical analysis and risk assessments. PMID:26729721

  15. Multiclass Posterior Probability Twin SVM for Motor Imagery EEG Classification.

    PubMed

    She, Qingshan; Ma, Yuliang; Meng, Ming; Luo, Zhizeng

    2015-01-01

    Motor imagery electroencephalography is widely used in brain-computer interface systems. Due to the inherent characteristics of electroencephalography signals, accurate and real-time multiclass classification is always challenging. To address this problem, this paper proposes a multiclass posterior probability solution for twin SVM based on ranking continuous output and pairwise coupling. First, a two-class posterior probability model is constructed to approximate the posterior probability using the ranking continuous output technique and Platt's estimating method. Second, a solution for multiclass probabilistic outputs of twin SVM is provided by combining every pair of class probabilities according to the method of pairwise coupling. Finally, the proposed method is compared with multiclass SVM and twin SVM via voting, and with multiclass posterior probability SVM using different coupling approaches. The efficacy of the proposed method in terms of classification accuracy and time complexity has been demonstrated on both the UCI benchmark datasets and real-world EEG data from BCI Competition IV Dataset 2a.

  16. Reasoning and choice in the Monty Hall Dilemma (MHD): implications for improving Bayesian reasoning

    PubMed Central

    Tubau, Elisabet; Aguilar-Lleyda, David; Johnson, Eric D.

    2015-01-01

    The Monty Hall Dilemma (MHD) is a two-step decision problem involving counterintuitive conditional probabilities. The first choice is made among three equally probable options, whereas the second choice takes place after the elimination of one of the non-selected options which does not hide the prize. Differing from most Bayesian problems, statistical information in the MHD has to be inferred, either by learning outcome probabilities or by reasoning from the presented sequence of events. This often leads to suboptimal decisions and erroneous probability judgments. Specifically, decision makers commonly develop a wrong intuition that final probabilities are equally distributed, together with a preference for their first choice. Several studies have shown that repeated practice enhances sensitivity to the different reward probabilities, but does not facilitate correct Bayesian reasoning. However, modest improvements in probability judgments have been observed after guided explanations. To explain these dissociations, the present review focuses on two types of causes producing the observed biases: Emotional-based choice biases and cognitive limitations in understanding probabilistic information. Among the latter, we identify a crucial cause for the universal difficulty in overcoming the equiprobability illusion: Incomplete representation of prior and conditional probabilities. We conclude that repeated practice and/or high incentives can be effective for overcoming choice biases, but promoting an adequate partitioning of possibilities seems to be necessary for overcoming cognitive illusions and improving Bayesian reasoning. PMID:25873906
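
    The partitioning of prior and conditional probabilities that the review identifies as crucial can be made explicit with a short exact calculation. This is a minimal sketch under the standard assumptions of the puzzle: the prize is equally likely behind each door, the host never opens the contestant's door or the prize door, and the host chooses uniformly when two doors are available. The function name and door labels are illustrative.

```python
from fractions import Fraction

def monty_hall_posterior(pick=0, opened=2, doors=(0, 1, 2)):
    """Posterior probability of the prize location given the host's action."""
    prior = {d: Fraction(1, 3) for d in doors}
    likelihood = {}
    for prize in doors:
        # Doors the host may open: not the contestant's pick, not the prize.
        allowed = [d for d in doors if d != pick and d != prize]
        likelihood[prize] = Fraction(1, len(allowed)) if opened in allowed else Fraction(0)
    evidence = sum(prior[d] * likelihood[d] for d in doors)
    return {d: prior[d] * likelihood[d] / evidence for d in doors}

# Contestant picks door 0, host opens door 2.
posterior = monty_hall_posterior()
```

    Under these assumptions the first choice keeps probability 1/3 and the remaining closed door has probability 2/3, so the two final options are not equiprobable.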

  17. Quantum mechanics: The Bayesian theory generalized to the space of Hermitian matrices

    NASA Astrophysics Data System (ADS)

    Benavoli, Alessio; Facchini, Alessandro; Zaffalon, Marco

    2016-10-01

    We consider the problem of gambling on a quantum experiment and enforce rational behavior by a few rules. These rules yield, in the classical case, the Bayesian theory of probability via duality theorems. In our quantum setting, they yield the Bayesian theory generalized to the space of Hermitian matrices. This very theory is quantum mechanics: in fact, we derive all its four postulates from the generalized Bayesian theory. This implies that quantum mechanics is self-consistent. It also leads us to reinterpret the main operations in quantum mechanics as probability rules: Bayes' rule (measurement), marginalization (partial tracing), independence (tensor product). To say it with a slogan, we obtain that quantum mechanics is the Bayesian theory in the complex numbers.

  18. A Feature-based Developmental Model of the Infant Brain in Structural MRI

    PubMed Central

    Toews, Matthew; Wells, William M.; Zöllei, Lilla

    2014-01-01

    In this paper, anatomical development is modeled as a collection of distinctive image patterns localized in space and time. A Bayesian posterior probability is defined over a random variable of subject age, conditioned on data in the form of scale-invariant image features. The model is automatically learned from a large set of images exhibiting significant variation, used to discover anatomical structure related to age and development, and fit to new images to predict age. The model is applied to a set of 230 infant structural MRIs of 92 subjects acquired at multiple sites over an age range of 8-590 days. Experiments demonstrate that the model can be used to identify age-related anatomical structure, and to predict the age of new subjects with an average error of 72 days. PMID:23286050

  19. Stochastic Inversion of 2D Magnetotelluric Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, Jinsong

    2010-07-01

    The algorithm is developed to invert 2D magnetotelluric (MT) data based on sharp boundary parametrization using a Bayesian framework. Within the algorithm, we consider the locations and the resistivity of regions formed by the interfaces as unknowns. We use a parallel, adaptive finite-element algorithm to forward simulate frequency-domain MT responses of 2D conductivity structure. Those unknown parameters are spatially correlated and are described by a geostatistical model. The joint posterior probability distribution function is explored by Markov Chain Monte Carlo (MCMC) sampling methods. The developed stochastic model is effective for estimating the interface locations and resistivity. Most importantly, it provides detailed uncertainty information on each unknown parameter. Hardware requirements: PC, Supercomputer, Multi-platform, Workstation. Software requirements: C and Fortran. Operating systems: Linux/Unix or Windows.

  20. Bayesian performance metrics of binary sensors in homeland security applications

    NASA Astrophysics Data System (ADS)

    Jannson, Tomasz P.; Forrester, Thomas C.

    2008-04-01

    Bayesian performance metrics, based on such parameters as prior probability, probability of detection (or accuracy), false alarm rate, and positive predictive value, characterize the performance of binary sensors, i.e., sensors that have only a binary response: true target/false target. Such binary sensors, very common in Homeland Security, produce an alarm that can be true or false. They include X-ray airport inspection, IED inspections, product quality control, cancer medical diagnosis, part of ATR, and many others. In this paper, we analyze direct and inverse conditional probabilities in the context of Bayesian inference and binary sensors, using X-ray luggage inspection statistical results as a guideline.
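
    The relation between the direct probabilities (detection, false alarm) and the inverse probability (positive predictive value) can be sketched with Bayes' theorem. The function name and the numbers below are illustrative assumptions, not results from the paper.

```python
def positive_predictive_value(prior, p_detect, p_false_alarm):
    """P(target | alarm) from Bayes' theorem for a binary sensor."""
    true_alarm = prior * p_detect                  # P(alarm and target)
    false_alarm = (1.0 - prior) * p_false_alarm    # P(alarm and no target)
    return true_alarm / (true_alarm + false_alarm)

# Illustrative values only: rare targets, a sensitive sensor, 1% false alarms.
ppv = positive_predictive_value(prior=0.001, p_detect=0.99, p_false_alarm=0.01)
```

    With these assumed numbers only about 9% of alarms correspond to a true target, which shows how a low prior probability dominates the inverse conditional probability even for a sensor with high detection probability.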

  1. Evidence reasoning method for constructing conditional probability tables in a Bayesian network of multimorbidity.

    PubMed

    Du, Yuanwei; Guo, Yubin

    2015-01-01

    The intrinsic mechanism of multimorbidity is difficult to recognize, and prediction and diagnosis are accordingly difficult to carry out. Bayesian networks can help to diagnose multimorbidity in health care, but it is difficult to obtain the conditional probability table (CPT) because of the lack of clinical statistical data. Today, expert knowledge and experience are increasingly used in training Bayesian networks in order to help predict or diagnose diseases, but the CPT in Bayesian networks is usually irrational or ineffective because realistic constraints are ignored, especially in multimorbidity. In order to solve these problems, an evidence reasoning (ER) approach is employed to extract and fuse inference data from experts using a belief distribution and a recursive ER algorithm, based on which an evidence reasoning method for constructing conditional probability tables in a Bayesian network of multimorbidity is presented step by step. A multimorbidity numerical example is used to demonstrate the method and prove its feasibility and application. The Bayesian network can be determined as long as each expert provides an inference assessment according to his/her knowledge or experience. Our method is more effective than existing methods at extracting expert inference data accurately and fusing them effectively to construct CPTs in a Bayesian network of multimorbidity.

  2. Semiparametric Bayesian classification with longitudinal markers

    PubMed Central

    De la Cruz-Mesía, Rolando; Quintana, Fernando A.; Müller, Peter

    2013-01-01

    Summary We analyse data from a study involving 173 pregnant women. The data are observed values of the β human chorionic gonadotropin hormone measured during the first 80 days of gestational age, including from one up to six longitudinal responses for each woman. The main objective in this study is to predict normal versus abnormal pregnancy outcomes from data that are available at the early stages of pregnancy. We achieve the desired classification with a semiparametric hierarchical model. Specifically, we consider a Dirichlet process mixture prior for the distribution of the random effects in each group. The unknown random-effects distributions are allowed to vary across groups but are made dependent by using a design vector to select different features of a single underlying random probability measure. The resulting model is an extension of the dependent Dirichlet process model, with an additional probability model for group classification. The model is shown to perform better than an alternative model which is based on independent Dirichlet processes for the groups. Relevant posterior distributions are summarized by using Markov chain Monte Carlo methods. PMID:24368871

  3. Model inversion via multi-fidelity Bayesian optimization: a new paradigm for parameter estimation in haemodynamics, and beyond.

    PubMed

    Perdikaris, Paris; Karniadakis, George Em

    2016-05-01

    We present a computational framework for model inversion based on multi-fidelity information fusion and Bayesian optimization. The proposed methodology targets the accurate construction of response surfaces in parameter space, and the efficient pursuit to identify global optima while keeping the number of expensive function evaluations at a minimum. We train families of correlated surrogates on available data using Gaussian processes and auto-regressive stochastic schemes, and exploit the resulting predictive posterior distributions within a Bayesian optimization setting. This enables a smart adaptive sampling procedure that uses the predictive posterior variance to balance the exploration versus exploitation trade-off, and is a key enabler for practical computations under limited budgets. The effectiveness of the proposed framework is tested on three parameter estimation problems. The first two involve the calibration of outflow boundary conditions of blood flow simulations in arterial bifurcations using multi-fidelity realizations of one- and three-dimensional models, whereas the last one aims to identify the forcing term that generated a particular solution to an elliptic partial differential equation. © 2016 The Author(s).

  4. Model inversion via multi-fidelity Bayesian optimization: a new paradigm for parameter estimation in haemodynamics, and beyond

    PubMed Central

    Perdikaris, Paris; Karniadakis, George Em

    2016-01-01

    We present a computational framework for model inversion based on multi-fidelity information fusion and Bayesian optimization. The proposed methodology targets the accurate construction of response surfaces in parameter space, and the efficient pursuit to identify global optima while keeping the number of expensive function evaluations at a minimum. We train families of correlated surrogates on available data using Gaussian processes and auto-regressive stochastic schemes, and exploit the resulting predictive posterior distributions within a Bayesian optimization setting. This enables a smart adaptive sampling procedure that uses the predictive posterior variance to balance the exploration versus exploitation trade-off, and is a key enabler for practical computations under limited budgets. The effectiveness of the proposed framework is tested on three parameter estimation problems. The first two involve the calibration of outflow boundary conditions of blood flow simulations in arterial bifurcations using multi-fidelity realizations of one- and three-dimensional models, whereas the last one aims to identify the forcing term that generated a particular solution to an elliptic partial differential equation. PMID:27194481

  5. Bayesian methods for characterizing unknown parameters of material models

    DOE PAGES

    Emery, J. M.; Grigoriu, M. D.; Field Jr., R. V.

    2016-02-04

    A Bayesian framework is developed for characterizing the unknown parameters of probabilistic models for material properties. In this framework, the unknown parameters are viewed as random and described by their posterior distributions obtained from prior information and measurements of quantities of interest that are observable and depend on the unknown parameters. The proposed Bayesian method is applied to characterize an unknown spatial correlation of the conductivity field in the definition of a stochastic transport equation and to solve this equation by Monte Carlo simulation and stochastic reduced order models (SROMs). As a result, the Bayesian method is also employed to characterize unknown parameters of material properties for laser welds from measurements of peak forces sustained by these welds.

  6. Bayesian methods for characterizing unknown parameters of material models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Emery, J. M.; Grigoriu, M. D.; Field Jr., R. V.

    A Bayesian framework is developed for characterizing the unknown parameters of probabilistic models for material properties. In this framework, the unknown parameters are viewed as random and described by their posterior distributions obtained from prior information and measurements of quantities of interest that are observable and depend on the unknown parameters. The proposed Bayesian method is applied to characterize an unknown spatial correlation of the conductivity field in the definition of a stochastic transport equation and to solve this equation by Monte Carlo simulation and stochastic reduced order models (SROMs). As a result, the Bayesian method is also employed to characterize unknown parameters of material properties for laser welds from measurements of peak forces sustained by these welds.

  7. Numerical Demons in Monte Carlo Estimation of Bayesian Model Evidence with Application to Soil Respiration Models

    NASA Astrophysics Data System (ADS)

    Elshall, A. S.; Ye, M.; Niu, G. Y.; Barron-Gafford, G.

    2016-12-01

    Bayesian multimodel inference is increasingly being used in hydrology. Estimating Bayesian model evidence (BME) is of central importance in many Bayesian multimodel analyses such as Bayesian model averaging and model selection. BME is the overall probability of the model in reproducing the data, accounting for the trade-off between the goodness-of-fit and the model complexity. Yet estimating BME is challenging, especially for high dimensional problems with complex sampling space. Estimating BME using the Monte Carlo numerical methods is preferred, as the methods yield higher accuracy than semi-analytical solutions (e.g. Laplace approximations, BIC, KIC, etc.). However, numerical methods are prone to numerical demons arising from underflow and round-off errors. Although a few studies alluded to this issue, to our knowledge this is the first study that illustrates these numerical demons. We show that finite arithmetic precision can act as a threshold on likelihood values and the Metropolis acceptance ratio, which results in trimming parameter regions (when the likelihood function is less than the smallest floating-point number that a computer can represent) and corrupting the empirical measures of the random states of the MCMC sampler (when using the log-likelihood function). We consider two of the most powerful numerical estimators of BME: the path sampling method of thermodynamic integration (TI) and the importance sampling method of steppingstone sampling (SS). We also consider the two most widely used numerical estimators, which are the prior sampling arithmetic mean (AM) and posterior sampling harmonic mean (HM). We investigate the vulnerability of these four estimators to the numerical demons. Interestingly, the most biased estimator, namely the HM, turned out to be the least vulnerable. While it is generally assumed that AM is a bias-free estimator that will always approximate the true BME by investing in computational effort, we show that arithmetic underflow can hamper AM, resulting in severe underestimation of BME. TI turned out to be the most vulnerable, resulting in BME overestimation. Finally, we show how SS can be largely invariant to rounding errors, yielding the most accurate and computationally efficient results. These research results are useful for MC simulations to estimate Bayesian model evidence.
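
    The underflow problem described for the arithmetic mean estimator can be illustrated with a minimal sketch. It assumes that the estimator averages likelihood values of samples drawn from the prior and compares a naive average with the standard log-sum-exp rearrangement; this is a generic numerical remedy, not necessarily the one used in the study, and the log-likelihood values are hypothetical.

```python
import math

def log_mean_exp(log_values):
    """Numerically stable log of the arithmetic mean of exp(log_values)."""
    m = max(log_values)  # factor out the largest term so exp() stays in range
    return m + math.log(sum(math.exp(v - m) for v in log_values)) - math.log(len(log_values))

# Hypothetical log-likelihoods of three samples drawn from the prior.
log_liks = [-1000.0, -1001.0, -1002.0]

# Naive arithmetic mean: every exp() underflows to 0.0 in double precision.
naive_mean = sum(math.exp(v) for v in log_liks) / len(log_liks)

# Stable estimate of the log evidence from the same samples.
stable_log_bme = log_mean_exp(log_liks)
```

    The naive average is exactly zero, so the evidence would be reported as zero, whereas the rearranged form returns a finite log evidence close to the largest log-likelihood.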

  8. Association of climate drivers with rainfall in New South Wales, Australia, using Bayesian Model Averaging

    NASA Astrophysics Data System (ADS)

    Duc, Hiep Nguyen; Rivett, Kelly; MacSween, Katrina; Le-Anh, Linh

    2017-01-01

    Rainfall in New South Wales (NSW), located in the southeast of the Australian continent, is known to be influenced by four major climate drivers: the El Niño/Southern Oscillation (ENSO), the Interdecadal Pacific Oscillation (IPO), the Southern Annular Mode (SAM) and the Indian Ocean Dipole (IOD). Many studies have shown the influences of ENSO, IPO modulation, SAM and IOD on rainfall in Australia and on southeast Australia in particular. However, only limited work has been undertaken using a multiple regression framework to examine the extent of the combined effect of these climate drivers on rainfall. This paper analysed the role of these combined climate drivers and their interaction on the rainfall in NSW using Bayesian Model Averaging (BMA) to account for model uncertainty. BMA considers each of the linear models across the whole model space, which is the set of all possible combinations of predictors, to find the model posterior probabilities and their expected predictor coefficients. Using BMA for linear regression models, we are able to corroborate and confirm the results from many previous studies. In addition, the method gives the ranking order of importance and the probability of the association of each of the climate drivers and their interaction on the rainfall at a site. The ability to quantify the relative contribution of the climate drivers offers the key to understanding the complex interaction of drivers on rainfall, or lack of rainfall in a region, such as the three big droughts in southeastern Australia whose causes have recently been the subject of discussion and debate.

  9. Bayesian Hypothesis Testing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Andrews, Stephen A.; Sigeti, David E.

    This is a set of slides about Bayesian hypothesis testing, where many hypotheses are tested. The conclusions are the following: The value of the Bayes factor obtained when using the median of the posterior marginal is almost the minimum value of the Bayes factor. The value of τ² which minimizes the Bayes factor is a reasonable choice for this parameter. This allows a likelihood ratio to be computed which is the least favorable to H0.

  10. Accelerating Approximate Bayesian Computation with Quantile Regression: application to cosmological redshift distributions

    NASA Astrophysics Data System (ADS)

    Kacprzak, T.; Herbel, J.; Amara, A.; Réfrégier, A.

    2018-02-01

    Approximate Bayesian Computation (ABC) is a method to obtain a posterior distribution without a likelihood function, using simulations and a set of distance metrics. For that reason, it has recently been gaining popularity as an analysis tool in cosmology and astrophysics. Its drawback, however, is a slow convergence rate. We propose a novel method, which we call qABC, to accelerate ABC with Quantile Regression. In this method, we create a model of quantiles of the distance measure as a function of input parameters. This model is trained on a small number of simulations and estimates which regions of the prior space are likely to be accepted into the posterior. Other regions are then immediately rejected. This procedure is then repeated as more simulations become available. We apply it to the practical problem of estimating the redshift distribution of cosmological samples, using forward modelling developed in previous work. The qABC method converges to nearly the same posterior as the basic ABC. It uses, however, only 20% of the number of simulations compared to basic ABC, achieving a fivefold gain in execution time for our problem. For other problems the acceleration rate may vary; it depends on how close the prior is to the final posterior. We discuss possible improvements and extensions to this method.
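
    The basic ABC that qABC is compared against can be sketched as a rejection sampler. The following is a minimal illustration, not the authors' qABC: it assumes a toy problem (unknown mean of a unit-variance normal), a uniform prior on [-5, 5], the sample mean as the summary statistic, and a fixed tolerance. All names and values are illustrative assumptions.

```python
import random
import statistics

def abc_rejection(observed, n_draws, tolerance, rng):
    """Basic ABC rejection: keep prior draws whose simulated summary
    statistic lies within `tolerance` of the observed one."""
    obs_mean = statistics.fmean(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)                         # draw from the prior
        simulated = [rng.gauss(theta, 1.0) for _ in observed]  # forward simulation
        distance = abs(statistics.fmean(simulated) - obs_mean)
        if distance <= tolerance:
            accepted.append(theta)                             # approximate posterior draw
    return accepted

rng = random.Random(0)                                # fixed seed for reproducibility
observed = [rng.gauss(2.0, 1.0) for _ in range(50)]   # synthetic "data"
posterior_sample = abc_rejection(observed, n_draws=5000, tolerance=0.1, rng=rng)
```

    Most prior draws are rejected in this sketch, which is the inefficiency the abstract describes: every rejected draw still costs one full forward simulation.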

  11. An evaluation of the Bayesian approach to fitting the N-mixture model for use with pseudo-replicated count data

    USGS Publications Warehouse

    Toribo, S.G.; Gray, B.R.; Liang, S.

    2011-01-01

    The N-mixture model proposed by Royle in 2004 may be used to approximate the abundance and detection probability of animal species in a given region. In 2006, Royle and Dorazio discussed the advantages of using a Bayesian approach in modelling animal abundance and occurrence using a hierarchical N-mixture model. N-mixture models assume replication on sampling sites, an assumption that may be violated when the site is not closed to changes in abundance during the survey period or when nominal replicates are defined spatially. In this paper, we studied the robustness of a Bayesian approach to fitting the N-mixture model for pseudo-replicated count data. Our simulation results showed that the Bayesian estimates for abundance and detection probability are slightly biased when the actual detection probability is small and are sensitive to the presence of extra variability within local sites.

  12. A Dynamic Bayesian Network Model for the Production and Inventory Control

    NASA Astrophysics Data System (ADS)

    Shin, Ji-Sun; Takazaki, Noriyuki; Lee, Tae-Hong; Kim, Jin-Il; Lee, Hee-Hyol

    In general, production quantities and delivered goods change randomly, so the total stock also changes randomly. This paper deals with production and inventory control using a Dynamic Bayesian Network. A Bayesian Network is a probabilistic model that represents the qualitative dependence between two or more random variables by a graph structure and indicates the quantitative relations between individual variables by conditional probabilities. The probability distribution of the total stock is calculated through the propagation of probability on the network. Moreover, a rule is shown for adjusting the production quantities so that the probabilities of the total stock reaching its lower limit and its ceiling are held at certain values.

  13. PyMC: Bayesian Stochastic Modelling in Python

    PubMed Central

    Patil, Anand; Huard, David; Fonnesbeck, Christopher J.

    2010-01-01

    This user guide describes a Python package, PyMC, that allows users to efficiently code a probabilistic model and draw samples from its posterior distribution using Markov chain Monte Carlo techniques. PMID:21603108

  14. Efficient marginalization to compute protein posterior probabilities from shotgun mass spectrometry data

    PubMed Central

    Serang, Oliver; MacCoss, Michael J.; Noble, William Stafford

    2010-01-01

    The problem of identifying proteins from a shotgun proteomics experiment has not been definitively solved. Identifying the proteins in a sample requires ranking them, ideally with interpretable scores. In particular, “degenerate” peptides, which map to multiple proteins, have made such a ranking difficult to compute. The problem of computing posterior probabilities for the proteins, which can be interpreted as confidence in a protein’s presence, has been especially daunting. Previous approaches have either ignored the peptide degeneracy problem completely, addressed it by computing a heuristic set of proteins or heuristic posterior probabilities, or by estimating the posterior probabilities with sampling methods. We present a probabilistic model for protein identification in tandem mass spectrometry that recognizes peptide degeneracy. We then introduce graph-transforming algorithms that facilitate efficient computation of protein probabilities, even for large data sets. We evaluate our identification procedure on five different well-characterized data sets and demonstrate our ability to efficiently compute high-quality protein posteriors. PMID:20712337

  15. Variational learning and bits-back coding: an information-theoretic view to Bayesian learning.

    PubMed

    Honkela, Antti; Valpola, Harri

    2004-07-01

    The bits-back coding first introduced by Wallace in 1990 and later by Hinton and van Camp in 1993 provides an interesting link between Bayesian learning and information-theoretic minimum-description-length (MDL) learning approaches. The bits-back coding allows interpreting the cost function used in the variational Bayesian method called ensemble learning as a code length in addition to the Bayesian view of misfit of the posterior approximation and a lower bound of model evidence. Combining these two viewpoints provides interesting insights to the learning process and the functions of different parts of the model. In this paper, the problem of variational Bayesian learning of hierarchical latent variable models is used to demonstrate the benefits of the two views. The code-length interpretation provides new views to many parts of the problem such as model comparison and pruning and helps explain many phenomena occurring in learning.

  16. Bayesian calibration of terrestrial ecosystem models: a study of advanced Markov chain Monte Carlo methods

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lu, Dan; Ricciuto, Daniel M.; Walker, Anthony P.

    Calibration of terrestrial ecosystem models is important but challenging. Bayesian inference implemented by Markov chain Monte Carlo (MCMC) sampling provides a comprehensive framework to estimate model parameters and associated uncertainties using their posterior distributions. The effectiveness and efficiency of the method strongly depend on the MCMC algorithm used. In this work, a differential evolution adaptive Metropolis (DREAM) algorithm is used to estimate posterior distributions of 21 parameters for the data assimilation linked ecosystem carbon (DALEC) model using 14 years of daily net ecosystem exchange data collected at the Harvard Forest Environmental Measurement Site eddy-flux tower. The calibration of DREAM results in a better model fit and predictive performance compared to the popular adaptive Metropolis (AM) scheme. Moreover, DREAM indicates that two parameters controlling autumn phenology have multiple modes in their posterior distributions while AM only identifies one mode. The application suggests that DREAM is very suitable to calibrate complex terrestrial ecosystem models, where the uncertain parameter size is usually large and existence of local optima is always a concern. In addition, this effort justifies the assumptions of the error model used in Bayesian calibration according to the residual analysis. Here, the result indicates that a heteroscedastic, correlated, Gaussian error model is appropriate for the problem, and the consequent constructed likelihood function can alleviate the underestimation of parameter uncertainty that is usually caused by using uncorrelated error models.

  17. Bayesian calibration of terrestrial ecosystem models: a study of advanced Markov chain Monte Carlo methods

    DOE PAGES

    Lu, Dan; Ricciuto, Daniel M.; Walker, Anthony P.; ...

    2017-09-27

    Calibration of terrestrial ecosystem models is important but challenging. Bayesian inference implemented by Markov chain Monte Carlo (MCMC) sampling provides a comprehensive framework to estimate model parameters and associated uncertainties using their posterior distributions. The effectiveness and efficiency of the method strongly depend on the MCMC algorithm used. In this work, a differential evolution adaptive Metropolis (DREAM) algorithm is used to estimate posterior distributions of 21 parameters for the data assimilation linked ecosystem carbon (DALEC) model using 14 years of daily net ecosystem exchange data collected at the Harvard Forest Environmental Measurement Site eddy-flux tower. The calibration of DREAM results in a better model fit and predictive performance compared to the popular adaptive Metropolis (AM) scheme. Moreover, DREAM indicates that two parameters controlling autumn phenology have multiple modes in their posterior distributions while AM only identifies one mode. The application suggests that DREAM is very suitable to calibrate complex terrestrial ecosystem models, where the uncertain parameter size is usually large and existence of local optima is always a concern. In addition, this effort justifies the assumptions of the error model used in Bayesian calibration according to the residual analysis. Here, the result indicates that a heteroscedastic, correlated, Gaussian error model is appropriate for the problem, and the consequent constructed likelihood function can alleviate the underestimation of parameter uncertainty that is usually caused by using uncorrelated error models.

  18. A hierarchical Bayesian GEV model for improving local and regional flood quantile estimates

    NASA Astrophysics Data System (ADS)

    Lima, Carlos H. R.; Lall, Upmanu; Troy, Tara; Devineni, Naresh

    2016-10-01

    We estimate local and regional Generalized Extreme Value (GEV) distribution parameters for flood frequency analysis in a multilevel, hierarchical Bayesian framework, to explicitly model and reduce uncertainties. As prior information for the model, we assume that the GEV location and scale parameters for each site come from independent log-normal distributions, whose mean parameter scales with the drainage area. From empirical and theoretical arguments, the shape parameter for each site is shrunk towards a common mean. Non-informative prior distributions are assumed for the hyperparameters and the MCMC method is used to sample from the joint posterior distribution. The model is tested using annual maximum series from 20 streamflow gauges located in an 83,000 km² flood prone basin in Southeast Brazil. The results show a significant reduction of uncertainty estimates of flood quantile estimates over the traditional GEV model, particularly for sites with shorter records. For return periods within the range of the data (around 50 years), the Bayesian credible intervals for the flood quantiles tend to be narrower than the classical confidence limits based on the delta method. As the return period increases beyond the range of the data, the confidence limits from the delta method become unreliable and the Bayesian credible intervals provide a way to estimate satisfactory confidence bands for the flood quantiles considering parameter uncertainties and regional information. In order to evaluate the applicability of the proposed hierarchical Bayesian model for regional flood frequency analysis, we estimate flood quantiles for three randomly chosen out-of-sample sites and compare with classical estimates using the index flood method. The posterior distributions of the scaling law coefficients are used to define the predictive distributions of the GEV location and scale parameters for the out-of-sample sites given only their drainage areas and the posterior distribution of the average shape parameter is taken as the regional predictive distribution for this parameter. While the index flood method does not provide a straightforward way to consider the uncertainties in the index flood and in the regional parameters, the results obtained here show that the proposed Bayesian method is able to produce adequate credible intervals for flood quantiles that are in accordance with empirical estimates.

  19. Bayesian inference for disease prevalence using negative binomial group testing

    PubMed Central

    Pritchard, Nicholas A.; Tebbs, Joshua M.

    2011-01-01

    Group testing, also known as pooled testing, and inverse sampling are both widely used methods of data collection when the goal is to estimate a small proportion. Taking a Bayesian approach, we consider the new problem of estimating disease prevalence from group testing when inverse (negative binomial) sampling is used. Using different distributions to incorporate prior knowledge of disease incidence and different loss functions, we derive closed form expressions for posterior distributions and resulting point and credible interval estimators. We then evaluate our new estimators, on Bayesian and classical grounds, and apply our methods to a West Nile Virus data set. PMID:21259308
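
    The setting in this abstract can be illustrated numerically. The sketch below is not the authors' closed-form derivation: it evaluates the posterior of the prevalence p on a grid, assuming a perfect test, pools of equal size k tested until r positive pools are seen (with y negative pools observed along the way), and a Beta(a, b) prior on p. Under those assumptions a pool is positive with probability 1 - (1 - p)^k, so the likelihood is proportional to (1 - (1 - p)^k)^r (1 - p)^(k y). The function names and the example counts are hypothetical.

```python
import math


def grid_posterior(r, y, k, a=1.0, b=1.0, n_grid=20000):
    """Grid posterior for prevalence p under inverse (negative binomial) group testing."""
    grid = [(i + 0.5) / n_grid for i in range(n_grid)]
    log_post = []
    for p in grid:
        pool_positive = 1.0 - (1.0 - p) ** k            # chance a pool of k tests positive
        log_like = r * math.log(pool_positive) + y * k * math.log(1.0 - p)
        log_prior = (a - 1.0) * math.log(p) + (b - 1.0) * math.log(1.0 - p)
        log_post.append(log_like + log_prior)
    peak = max(log_post)                                 # subtract the peak for stability
    weights = [math.exp(v - peak) for v in log_post]
    total = sum(weights)
    return grid, [w / total for w in weights]


def summarize(grid, probs, level=0.95):
    """Posterior mean and equal-tailed credible interval from grid weights."""
    mean = sum(p * w for p, w in zip(grid, probs))
    tail = (1.0 - level) / 2.0
    cum, lower, upper, found_lower = 0.0, grid[0], grid[-1], False
    for p, w in zip(grid, probs):
        cum += w
        if not found_lower and cum >= tail:
            lower, found_lower = p, True
        if cum >= 1.0 - tail:
            upper = p
            break
    return mean, lower, upper


# Hypothetical data: stop at the 3rd positive pool, after 47 negative pools of size 10.
grid, probs = grid_posterior(r=3, y=47, k=10)
mean, lower, upper = summarize(grid, probs)
print(mean, lower, upper)
```

    The posterior mean corresponds to a point estimate under squared-error loss; other loss functions, as considered in the abstract, would give different summaries of the same posterior.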

  20. Toward accurate and precise estimates of lion density.

    PubMed

    Elliot, Nicholas B; Gopalaswamy, Arjun M

    2017-08-01

    Reliable estimates of animal density are fundamental to understanding ecological processes and population dynamics. Furthermore, their accuracy is vital to conservation because wildlife authorities rely on estimates to make decisions. However, it is notoriously difficult to accurately estimate density for wide-ranging carnivores that occur at low densities. In recent years, significant progress has been made in density estimation of Asian carnivores, but the methods have not been widely adapted to African carnivores, such as lions (Panthera leo). Although abundance indices for lions may produce poor inferences, they continue to be used to estimate density and inform management and policy. We used sighting data from a 3-month survey and adapted a Bayesian spatially explicit capture-recapture (SECR) model to estimate spatial lion density in the Maasai Mara National Reserve and surrounding conservancies in Kenya. Our unstructured spatial capture-recapture sampling design incorporated search effort to explicitly estimate detection probability and density on a fine spatial scale, making our approach robust in the context of varying detection probabilities. Overall posterior mean lion density was estimated to be 17.08 (posterior SD 1.310) lions >1 year old/100 km², and the sex ratio was estimated at 2.2 females to 1 male. Our modeling framework and narrow posterior SD demonstrate that SECR methods can produce statistically rigorous and precise estimates of population parameters, and we argue that they should be favored over less reliable abundance indices. Furthermore, our approach is flexible enough to incorporate different data types, which enables robust population estimates over relatively short survey periods in a variety of systems. Trend analyses are essential to guide conservation decisions but are frequently based on surveys of differing reliability. We therefore call for a unified framework to assess lion numbers in key populations to improve management and policy decisions. © 2016 Society for Conservation Biology.

  1. Within-herd prevalence and clinical incidence distributions of Mycobacterium avium subspecies paratuberculosis infection on dairy herds in Chile.

    PubMed

    Verdugo, Cristobal; Valdes, Maria Francisca; Salgado, Miguel

    2018-06-01

    This study aimed to estimate the distributions of the within-herd true prevalence (TP) and the annual clinical incidence proportion (CIp) of Mycobacterium avium subsp. paratuberculosis (MAP) infection in dairy cattle herds in Chile. Forty-two commercial herds with antecedents of MAP infection were randomly selected to participate in the study. In small herds (≤30 cows), serum samples were collected from all animals present, whereas in larger herds, milk or serum samples were collected from all milking cows with 2 or more parities. Samples were analysed using the Pourquier® ELISA PARATUBERCULOSIS (Institut Pourquier, France) test. Moreover, a questionnaire gathering information on management practices and the frequency of clinical cases compatible with paratuberculosis (in the previous 12 months) was applied on the sampling date. A Bayesian latent class analysis was used to obtain TP and clinical incidence posterior distributions. The model adjusts for uncertainty in test sensitivity (serum or milk) and specificity, and prior TP and CIp estimates. A total of 4963 animals were tested, with an average contribution of 124 samples per herd. A mean apparent prevalence of 6.3% (95% confidence interval: 4.0-8.0%) was observed. Model outputs indicated an overall TP posterior distribution, across herds, with a median of 13.1% (95% posterior probability interval (PPI): 3.2-38.1%). A high TP variability was observed between herds. CIp presented a posterior median of 1.1% (95% PPI: 0.2-4.6%). Model results complement information missing from previously conducted epidemiological studies in the sector, and they could be used for further assessment of the disease impact and planning of control programs. Copyright © 2018 Elsevier B.V. All rights reserved.
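
    The gap between apparent and true prevalence reported above follows from imperfect test accuracy: the probability of a positive result is TP × Se + (1 − TP) × (1 − Sp). The sketch below is a deliberately simplified single-herd illustration, not the study's latent class model: it fixes sensitivity and specificity at hypothetical values (0.45 and 0.99) instead of placing priors on them, assumes a uniform prior on TP, and uses a made-up count of 8 positives among 124 tested animals.

```python
import math


def true_prevalence_posterior(positives, tested, sensitivity, specificity, n_grid=2000):
    """Grid posterior for true prevalence with a uniform prior and fixed test accuracy."""
    grid = [(i + 0.5) / n_grid for i in range(n_grid)]
    log_post = []
    for tp in grid:
        # Probability that a randomly chosen animal tests positive.
        apparent = tp * sensitivity + (1.0 - tp) * (1.0 - specificity)
        log_post.append(
            positives * math.log(apparent)
            + (tested - positives) * math.log(1.0 - apparent)
        )
    peak = max(log_post)
    weights = [math.exp(v - peak) for v in log_post]
    total = sum(weights)
    return grid, [w / total for w in weights]


def posterior_quantile(grid, probs, q):
    """Smallest grid value whose cumulative posterior mass reaches q."""
    cum = 0.0
    for value, weight in zip(grid, probs):
        cum += weight
        if cum >= q:
            return value
    return grid[-1]


# Hypothetical herd: 8 positives out of 124 animals; Se and Sp are assumed values.
tp_grid, tp_probs = true_prevalence_posterior(8, 124, sensitivity=0.45, specificity=0.99)
tp_median = posterior_quantile(tp_grid, tp_probs, 0.5)
tp_lower = posterior_quantile(tp_grid, tp_probs, 0.025)
tp_upper = posterior_quantile(tp_grid, tp_probs, 0.975)
print(tp_median, tp_lower, tp_upper)
```

    With a low-sensitivity test, the posterior for true prevalence sits well above the raw proportion of positives, which is the qualitative pattern the abstract describes.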

  2. Functional mechanisms of probabilistic inference in feature- and space-based attentional systems.

    PubMed

    Dombert, Pascasie L; Kuhns, Anna; Mengotti, Paola; Fink, Gereon R; Vossel, Simone

    2016-11-15

    Humans flexibly attend to features or locations and these processes are influenced by the probability of sensory events. We combined computational modeling of response times with fMRI to compare the functional correlates of (re-)orienting, and the modulation by probabilistic inference in spatial and feature-based attention systems. Twenty-four volunteers performed two task versions with spatial or color cues. Percentage of cue validity changed unpredictably. A hierarchical Bayesian model was used to derive trial-wise estimates of probability-dependent attention, entering the fMRI analysis as parametric regressors. Attentional orienting activated a dorsal frontoparietal network in both tasks, without significant parametric modulation. Spatially invalid trials activated a bilateral frontoparietal network and the precuneus, while invalid feature trials activated the left intraparietal sulcus (IPS). Probability-dependent attention modulated activity in the precuneus, left posterior IPS, middle occipital gyrus, and right temporoparietal junction for spatial attention, and in the left anterior IPS for feature-based and spatial attention. These findings provide novel insights into the generality and specificity of the functional basis of attentional control. They suggest that probabilistic inference can distinctively affect each attentional subsystem, but that there is an overlap in the left IPS, which responds to both spatial and feature-based expectancy violations. Copyright © 2016 Elsevier Inc. All rights reserved.

  3. Model weights and the foundations of multimodel inference

    USGS Publications Warehouse

    Link, W.A.; Barker, R.J.

    2006-01-01

    Statistical thinking in wildlife biology and ecology has been profoundly influenced by the introduction of AIC (Akaike's information criterion) as a tool for model selection and as a basis for model averaging. In this paper, we advocate the Bayesian paradigm as a broader framework for multimodel inference, one in which model averaging and model selection are naturally linked, and in which the performance of AIC-based tools is naturally evaluated. Prior model weights implicitly associated with the use of AIC are seen to highly favor complex models: in some cases, all but the most highly parameterized models in the model set are virtually ignored a priori. We suggest the usefulness of the weighted BIC (Bayesian information criterion) as a computationally simple alternative to AIC, based on explicit selection of prior model probabilities rather than acceptance of default priors associated with AIC. We note, however, that both procedures are only approximate to the use of exact Bayes factors. We discuss and illustrate technical difficulties associated with Bayes factors, and suggest approaches to avoiding these difficulties in the context of model selection for a logistic regression. Our example highlights the predisposition of AIC weighting to favor complex models and suggests a need for caution in using the BIC for computing approximate posterior model weights.
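
    The model weights discussed in this abstract are commonly computed as exp(−Δ/2), normalized over the model set, where Δ is each model's criterion value minus the smallest one. The sketch below applies that standard formula to both AIC and BIC and lets explicit prior model probabilities enter the BIC weights. It is an illustration only: the function names, the three log-likelihoods, the parameter counts, and the sample size are hypothetical, not values from the paper.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike's information criterion."""
    return -2.0 * log_likelihood + 2.0 * n_params


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def information_weights(scores, prior=None):
    """Normalized exp(-delta/2) weights, optionally multiplied by prior model probabilities."""
    if prior is None:
        prior = [1.0 / len(scores)] * len(scores)   # equal prior weight on every model
    best = min(scores)
    raw = [math.exp(-0.5 * (s - best)) * q for s, q in zip(scores, prior)]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical model set: (maximized log-likelihood, number of parameters).
models = [(-120.0, 2), (-118.5, 4), (-118.0, 7)]
n_obs = 100

aic_weights = information_weights([aic(ll, k) for ll, k in models])
bic_weights = information_weights([bic(ll, k, n_obs) for ll, k in models])

print(aic_weights)
print(bic_weights)
```

    In this toy set, the most heavily parameterized model receives more weight under AIC than under BIC with equal prior model probabilities, consistent with the abstract's remark that AIC weighting tends to favor complex models.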

  4. The improved business valuation model for RFID company based on the community mining method.

    PubMed

    Li, Shugang; Yu, Zhaoxu

    2017-01-01

    Nowadays, the appetite for investment and mergers and acquisitions (M&A) activity in RFID companies is growing rapidly. Although a huge number of papers have addressed business valuation models based on statistical methods or neural network methods, only a few are dedicated to constructing a general framework for business valuation that improves the performance with network graph (NG) and the corresponding community mining (CM) method. In this study, an NG based business valuation model is proposed, where a real options approach (ROA) integrating the CM method is designed to predict the company's net profit as well as estimate the company value. Three improvements are made in the proposed valuation model: Firstly, our model figures out the credibility of the node belonging to each community and clusters the network according to the evolutionary Bayesian method. Secondly, the improved bacterial foraging optimization algorithm (IBFOA) is adopted to calculate the optimized Bayesian posterior probability function. Finally, in IBFOA, a bi-objective method is used to assess the accuracy of prediction, and these two objectives are combined into one objective function using a new Pareto boundary method. The proposed method returns lower forecasting error than 10 well-known forecasting models on 3 different time interval valuing tasks for the real-life simulation of RFID companies.

  5. The complete mitochondrial genomes of five Eimeria species infecting domestic rabbits.

    PubMed

    Liu, Guo-Hua; Tian, Si-Qin; Cui, Ping; Fang, Su-Fang; Wang, Chun-Ren; Zhu, Xing-Quan

    2015-12-01

    Rabbit coccidiosis caused by members of the genus Eimeria can cause enormous economic impact worldwide, but the genetics, epidemiology and biology of these parasites remain poorly understood. In the present study, we sequenced and annotated the complete mitochondrial (mt) genomes of five Eimeria species that commonly infect domestic rabbits. The complete mt genomes of Eimeria intestinalis, Eimeria flavescens, Eimeria media, Eimeria vejdovskyi and Eimeria irresidua were 6261 bp, 6258 bp, 6168 bp, 6254 bp and 6259 bp in length, respectively. All of the mt genomes consist of 3 genes for proteins (cytb, cox1, and cox3), 14 gene fragments for the large subunit (LSU) rRNA and 11 gene fragments for the small subunit (SSU) rRNA, but no transfer RNA (tRNA) genes. The gene order of the mt genomes is similar to that of Plasmodium, but distinct from Haemosporida and Theileria. Phylogenetic analyses based on full nucleotide sequences using Bayesian analysis revealed that the monophyly of the Eimeria of rabbits was strongly supported by Bayesian posterior probabilities. These data provide novel mtDNA markers for studying the population genetics and molecular epidemiology of the Eimeria species, and should have implications for the molecular diagnosis, prevention and control of coccidiosis in rabbits. Copyright © 2015 Elsevier Inc. All rights reserved.

  6. Improved Bayesian Infrasonic Source Localization for regional infrasound

    DOE PAGES

    Blom, Philip S.; Marcillo, Omar; Arrowsmith, Stephen J.

    2015-10-20

    The Bayesian Infrasonic Source Localization (BISL) methodology and the mathematical framework used therein are examined and simplified, providing a generalized method of estimating the source location and time for an infrasonic event. The likelihood function describing an infrasonic detection used in BISL has been redefined to include the von Mises distribution developed in directional statistics and propagation-based, physically derived celerity-range and azimuth deviation models. Frameworks for constructing propagation-based celerity-range and azimuth deviation statistics are presented to demonstrate how stochastic propagation modelling methods can be used to improve the precision and accuracy of the posterior probability density function describing the source localization. Infrasonic signals recorded at a number of arrays in the western United States produced by rocket motor detonations at the Utah Test and Training Range are used to demonstrate the application of the new mathematical framework and to quantify the improvement obtained by using the stochastic propagation modelling methods. Moreover, using propagation-based priors, the spatial and temporal confidence bounds of the source decreased by more than 40 per cent in all cases and by as much as 80 per cent in one case. Further, the accuracy of the estimates remained high, keeping the ground truth within the 99 per cent confidence bounds for all cases.

  7. The improved business valuation model for RFID company based on the community mining method

    PubMed Central

    Li, Shugang; Yu, Zhaoxu

    2017-01-01

    Nowadays, the appetite for investment and mergers and acquisitions (M&A) activity in RFID companies is growing rapidly. Although a huge number of papers have addressed business valuation models based on statistical methods or neural network methods, only a few are dedicated to constructing a general framework for business valuation that improves the performance with network graph (NG) and the corresponding community mining (CM) method. In this study, an NG based business valuation model is proposed, where a real options approach (ROA) integrating the CM method is designed to predict the company’s net profit as well as estimate the company value. Three improvements are made in the proposed valuation model: Firstly, our model figures out the credibility of the node belonging to each community and clusters the network according to the evolutionary Bayesian method. Secondly, the improved bacterial foraging optimization algorithm (IBFOA) is adopted to calculate the optimized Bayesian posterior probability function. Finally, in IBFOA, a bi-objective method is used to assess the accuracy of prediction, and these two objectives are combined into one objective function using a new Pareto boundary method. The proposed method returns lower forecasting error than 10 well-known forecasting models on 3 different time interval valuing tasks for the real-life simulation of RFID companies. PMID:28459815

  8. Hydrologic drought prediction under climate change: Uncertainty modeling with Dempster-Shafer and Bayesian approaches

    NASA Astrophysics Data System (ADS)

    Raje, Deepashree; Mujumdar, P. P.

    2010-09-01

    Representation and quantification of uncertainty in climate change impact studies is a difficult task. Several sources of uncertainty arise in studies of hydrologic impacts of climate change, such as those due to choice of general circulation models (GCMs), scenarios and downscaling methods. Recently, much work has focused on uncertainty quantification and modeling in regional climate change impacts. In this paper, an uncertainty modeling framework is evaluated, which uses a generalized uncertainty measure to combine GCM, scenario and downscaling uncertainties. The Dempster-Shafer (D-S) evidence theory is used for representing and combining uncertainty from various sources. A significant advantage of the D-S framework over the traditional probabilistic approach is that it allows for the allocation of a probability mass to sets or intervals, and can hence handle both aleatory or stochastic uncertainty, and epistemic or subjective uncertainty. This paper shows how the D-S theory can be used to represent beliefs in some hypotheses such as hydrologic drought or wet conditions, describe uncertainty and ignorance in the system, and give a quantitative measurement of belief and plausibility in results. The D-S approach has been used in this work for information synthesis using various evidence combination rules having different conflict modeling approaches. A case study is presented for hydrologic drought prediction using downscaled streamflow in the Mahanadi River at Hirakud in Orissa, India. Projections of n most likely monsoon streamflow sequences are obtained from a conditional random field (CRF) downscaling model, using an ensemble of three GCMs for three scenarios, which are converted to monsoon standardized streamflow index (SSFI-4) series. This range is used to specify the basic probability assignment (bpa) for a Dempster-Shafer structure, which represents uncertainty associated with each of the SSFI-4 classifications. These uncertainties are then combined across GCMs and scenarios using various evidence combination rules given by the D-S theory. A Bayesian approach is also presented for this case study, which models the uncertainty in projected frequencies of SSFI-4 classifications by deriving a posterior distribution for the frequency of each classification, using an ensemble of GCMs and scenarios. Results from the D-S and Bayesian approaches are compared, and relative merits of each approach are discussed. Both approaches show an increasing probability of extreme, severe and moderate droughts and decreasing probability of normal and wet conditions in Orissa as a result of climate change.

  9. Resolution analysis of marine seismic full waveform data by Bayesian inversion

    NASA Astrophysics Data System (ADS)

    Ray, A.; Sekar, A.; Hoversten, G. M.; Albertin, U.

    2015-12-01

    The Bayesian posterior density function (PDF) of earth models that fit full waveform seismic data conveys information on the uncertainty with which the elastic model parameters are resolved. In this work, we apply the trans-dimensional reversible jump Markov Chain Monte Carlo method (RJ-MCMC) for the 1D inversion of noisy synthetic full-waveform seismic data in the frequency-wavenumber domain. While seismic full waveform inversion (FWI) is a powerful method for characterizing subsurface elastic parameters, the uncertainty in the inverted models has remained poorly known, if at all, and is highly dependent on the initial model. The Bayesian method we use is trans-dimensional in that the number of model layers is not fixed, and flexible such that the layer boundaries are free to move around. The resulting parameterization does not require regularization to stabilize the inversion. Depth resolution is traded off with the number of layers, providing an estimate of uncertainty in elastic parameters (compressional and shear velocities Vp and Vs as well as density) with depth. We find that in the absence of additional constraints, Bayesian inversion can result in a wide range of posterior PDFs on Vp, Vs and density. These PDFs range from being clustered around the true model, to those that contain little resolution of any particular features other than those in the near surface, depending on the particular data and target geometry. We present results for a suite of different frequencies and offset ranges, examining the differences in the posterior model densities thus derived. Though these results are for a 1D earth, they are applicable to areas with simple, layered geology and provide valuable insight into the resolving capabilities of FWI, as well as highlight the challenges in solving a highly non-linear problem. The RJ-MCMC method also presents a tantalizing possibility for extension to 2D and 3D Bayesian inversion of full waveform seismic data in the future, as it objectively tackles the problem of model selection (i.e., the number of layers or cells for parameterization), which could ease the computational burden of evaluating forward models with many parameters.

  10. The Psychology of Bayesian Reasoning

    DTIC Science & Technology

    2014-10-21

    The psychology of Bayesian reasoning David R. Mandel* Socio-Cognitive Systems Section, Defence Research and Development Canada and Department...belief revision, subjective probability, human judgment, psychological methods. Most psychological research on Bayesian reasoning since the 1970s has...attention to some important problems with the conventional approach to studying Bayesian reasoning in psychology that has been dominant since the

  11. The Scientific Method, Diagnostic Bayes, and How to Detect Epistemic Errors

    NASA Astrophysics Data System (ADS)

    Vrugt, J. A.

    2015-12-01

    In the past decades, Bayesian methods have found widespread application and use in environmental systems modeling. Bayes' theorem states that the posterior probability, P(H|D), of a hypothesis, H, is proportional to the product of the prior probability, P(H), of this hypothesis and the likelihood, L(H|\hat{D}), of the same hypothesis given the new/incoming observations, \hat{D}. In science and engineering, H often constitutes some numerical simulation model, D = F(x,.), which summarizes using algebraic, empirical, and differential equations, state variables and fluxes, all our theoretical and/or practical knowledge of the system of interest, and x are the d unknown parameters which are subject to inference using some data, \hat{D}, of the observed system response. The Bayesian approach is intimately related to the scientific method and uses an iterative cycle of hypothesis formulation (model), experimentation and data collection, and theory/hypothesis refinement to elucidate the rules that govern the natural world. Unfortunately, model refinement has proven to be very difficult in large part because of the poor diagnostic power of residual based likelihood functions \citep{gupta2008}. This has inspired \citet{vrugt2013} to advocate the use of 'likelihood-free' inference using approximate Bayesian computation (ABC). This approach uses one or more summary statistics, S(\hat{D}), of the original data, \hat{D}, designed ideally to be sensitive only to one particular process in the model. Any mismatch between the observed and simulated summary metrics is then easily linked to a specific model component. A recurrent issue with the application of ABC is self-sufficiency of the summary statistics. In theory, S(.) should contain as much information as the original data itself, yet complex systems rarely admit sufficient statistics. In this article, we propose to combine the ideas of ABC and regular Bayesian inference to guarantee that no information is lost in diagnostic model evaluation. This hybrid approach, coined diagnostic Bayes, uses the summary metrics as prior distribution and original data in the likelihood function, or P(x|\hat{D}) ∝ P(x|S(\hat{D})) L(x|\hat{D}). A case study illustrates the ability of the proposed methodology to diagnose epistemic errors and provide guidance on model refinement.

  12. Optimal observation network design for conceptual model discrimination and uncertainty reduction

    NASA Astrophysics Data System (ADS)

    Pham, Hai V.; Tsai, Frank T.-C.

    2016-02-01

    This study expands the Box-Hill discrimination function to design an optimal observation network to discriminate conceptual models and, in turn, identify a most favored model. The Box-Hill discrimination function measures the expected decrease in Shannon entropy (for model identification) before and after the optimal design for one additional observation. This study modifies the discrimination function to account for multiple future observations that are assumed spatiotemporally independent and Gaussian-distributed. Bayesian model averaging (BMA) is used to incorporate existing observation data and quantify future observation uncertainty arising from conceptual and parametric uncertainties in the discrimination function. In addition, the BMA method is adopted to predict future observation data in a statistical sense. The design goal is to find optimal locations and least data via maximizing the Box-Hill discrimination function value subject to a posterior model probability threshold. The optimal observation network design is illustrated using a groundwater study in Baton Rouge, Louisiana, to collect additional groundwater heads from USGS wells. The sources of uncertainty creating multiple groundwater models are geological architecture, boundary condition, and fault permeability architecture. Impacts of considering homoscedastic and heteroscedastic future observation data and the sources of uncertainties on potential observation areas are analyzed. Results show that heteroscedasticity should be considered in the design procedure to account for various sources of future observation uncertainty. After the optimal design is obtained and the corresponding data are collected for model updating, total variances of head predictions can be significantly reduced by identifying a model with a superior posterior model probability.

  13. New insights into galaxy structure from GALPHAT- I. Motivation, methodology and benchmarks for Sérsic models

    NASA Astrophysics Data System (ADS)

    Yoon, Ilsang; Weinberg, Martin D.; Katz, Neal

    2011-06-01

    We introduce a new galaxy image decomposition tool, GALPHAT (GALaxy PHotometric ATtributes), which is a front-end application of the Bayesian Inference Engine (BIE), a parallel Markov chain Monte Carlo package, to provide full posterior probability distributions and reliable confidence intervals for all model parameters. The BIE relies on GALPHAT to compute the likelihood function. GALPHAT generates scale-free cumulative image tables for the desired model family with precise error control. Interpolation of this table yields accurate pixellated images with any centre, scale and inclination angle. GALPHAT then rotates the image by position angle using a Fourier shift theorem, yielding high-speed, accurate likelihood computation. We benchmark this approach using an ensemble of simulated Sérsic model galaxies over a wide range of observational conditions: the signal-to-noise ratio S/N, the ratio of galaxy size to the point spread function (PSF) and the image size, and errors in the assumed PSF; and a range of structural parameters: the half-light radius r_e and the Sérsic index n. We characterize the strength of parameter covariance in the Sérsic model, which increases with S/N and n, and the results strongly motivate the need for the full posterior probability distribution in galaxy morphology analyses and later inferences. The test results for simulated galaxies successfully demonstrate that, with a careful choice of Markov chain Monte Carlo algorithms and fast model image generation, GALPHAT is a powerful analysis tool for reliably inferring morphological parameters from a large ensemble of galaxies over a wide range of different observational conditions.

  14. A gradient-based model parametrization using Bernstein polynomials in Bayesian inversion of surface wave dispersion

    NASA Astrophysics Data System (ADS)

    Gosselin, Jeremy M.; Dosso, Stan E.; Cassidy, John F.; Quijano, Jorge E.; Molnar, Sheri; Dettmer, Jan

    2017-10-01

    This paper develops and applies a Bernstein-polynomial parametrization to efficiently represent general, gradient-based profiles in nonlinear geophysical inversion, with application to ambient-noise Rayleigh-wave dispersion data. Bernstein polynomials provide a stable parametrization in that small perturbations to the model parameters (basis-function coefficients) result in only small perturbations to the geophysical parameter profile. A fully nonlinear Bayesian inversion methodology is applied to estimate shear wave velocity (VS) profiles and uncertainties from surface wave dispersion data extracted from ambient seismic noise. The Bayesian information criterion is used to determine the appropriate polynomial order consistent with the resolving power of the data. Data error correlations are accounted for in the inversion using a parametric autoregressive model. The inversion solution is defined in terms of marginal posterior probability profiles for VS as a function of depth, estimated using Metropolis-Hastings sampling with parallel tempering. This methodology is applied to synthetic dispersion data as well as data processed from passive array recordings collected on the Fraser River Delta in British Columbia, Canada. Results from this work are in good agreement with previous studies, as well as with co-located invasive measurements. The approach considered here is better suited than 'layered' modelling approaches in applications where smooth gradients in geophysical parameters are expected, such as soil/sediment profiles. Further, the Bernstein polynomial representation is more general than smooth models based on a fixed choice of gradient type (e.g. power-law gradient) because the form of the gradient is determined objectively by the data, rather than by a subjective parametrization choice.

  15. Towards Breaking the Histone Code – Bayesian Graphical Models for Histone Modifications

    PubMed Central

    Mitra, Riten; Müller, Peter; Liang, Shoudan; Xu, Yanxun; Ji, Yuan

    2013-01-01

    Background: Histones are proteins around which DNA wraps to form small spherical structures called nucleosomes. Histone modifications (HMs) refer to the post-translational modifications to the histone tails. At a particular genomic locus, each of these HMs can either be present or absent, and the combinatory patterns of the presence or absence of multiple HMs, or the ‘histone codes,’ are believed to co-regulate important biological processes. We aim to use raw data on HM markers at different genomic loci to (1) decode the complex biological network of HMs in a single region and (2) demonstrate how the HM networks differ in different regulatory regions. We suggest that these differences in network attributes form a significant link between histones and genomic functions. Methods and Results: We develop a powerful graphical model under the Bayesian paradigm. Posterior inference is fully probabilistic, allowing us to compute the probabilities of distinct dependence patterns of the HMs using graphs. Furthermore, our model-based framework allows for easy but important extensions for inference on differential networks under various conditions, such as the different annotations of the genomic locations (e.g., promoters versus insulators). We applied these models to ChIP-Seq data based on CD4+ T lymphocytes. The results confirmed many existing findings and provided a unified tool to generate various promising hypotheses. Differential network analyses revealed new insights on co-regulation of HMs of transcriptional activities in different genomic regions. Conclusions: The use of Bayesian graphical models and borrowing strength across different conditions provide high power to infer histone networks and their differences. PMID:23748248

  16. Semiparametric Bayesian analysis of gene-environment interactions with error in measurement of environmental covariates and missing genetic data.

    PubMed

    Lobach, Iryna; Mallick, Bani; Carroll, Raymond J

    2011-01-01

    Case-control studies are widely used to detect gene-environment interactions in the etiology of complex diseases. Many variables that are of interest to biomedical researchers are difficult to measure on an individual level, e.g. nutrient intake, cigarette smoking exposure, long-term toxic exposure. Measurement error causes bias in parameter estimates, thus masking key features of data and leading to loss of power and spurious/masked associations. We develop a Bayesian methodology for analysis of case-control studies for the case when measurement error is present in an environmental covariate and the genetic variable has missing data. This approach offers several advantages. It allows prior information to enter the model to make estimation and inference more precise. The environmental covariates measured exactly are modeled completely nonparametrically. Further, information about the probability of disease can be incorporated in the estimation procedure to improve the quality of parameter estimates, which cannot be done in conventional case-control studies. A unique feature of the procedure under investigation is that the analysis is based on a pseudo-likelihood function; therefore, conventional Bayesian techniques may not be technically correct. We propose an approach using Markov Chain Monte Carlo sampling as well as a computationally simple method based on an asymptotic posterior distribution. Simulation experiments demonstrated that our method produced parameter estimates that are nearly unbiased even for small sample sizes. An application of our method is illustrated using a population-based case-control study of the association between calcium intake and the risk of colorectal adenoma development.

  17. Overcoming the effects of rogue taxa: Evolutionary relationships of the bee flies

    PubMed Central

    Trautwein, Michelle D.; Wiegmann, Brian M.; Yeates, David K

    2011-01-01

    Bombyliidae (5000 sp.), or bee flies, are a lower brachyceran family of flower-visiting flies that, as larvae, act as parasitoids of other insects. The evolutionary relationships are known from a morphological analysis that yielded minimal support for higher-level groupings. We use the protein-coding gene CAD and 28S rDNA to determine phylogeny and to test the monophyly of existing subfamilies, the division Tomophthalmae, and ‘the sand chamber subfamilies’. Additionally, we demonstrate that consensus networks can be used to identify rogue taxa in a Bayesian framework. Pruning rogue taxa post-analysis from the final tree distribution results in increased posterior probabilities. We find 8 subfamilies to be monophyletic and the subfamilies Heterotropinae and Mythicomyiinae to be the earliest diverging lineages. The large subfamily Bombyliinae is found to be polyphyletic and our data do not provide evidence for the monophyly of Tomophthalmae or the ‘sand chamber subfamilies’. PMID:21686308

  18. Behavioral and Molecular Genetics of Reading-Related AM and FM Detection Thresholds.

    PubMed

    Bruni, Matthew; Flax, Judy F; Buyske, Steven; Shindhelm, Amber D; Witton, Caroline; Brzustowicz, Linda M; Bartlett, Christopher W

    2017-03-01

    Auditory detection thresholds for certain frequencies of both amplitude modulated (AM) and frequency modulated (FM) dynamic auditory stimuli are associated with reading in typically developing and dyslexic readers. We present the first behavioral and molecular genetic characterization of these two auditory traits. Two extant extended family datasets were given reading tasks and psychoacoustic tasks to determine FM 2 Hz and AM 20 Hz sensitivity thresholds. Univariate heritabilities were significant for both AM (h² = 0.20) and FM (h² = 0.29). Bayesian posterior probability of linkage (PPL) analysis found loci for AM (12q, PPL = 81%) and FM (10p, PPL = 32%; 20q, PPL = 65%). Bivariate heritability analyses revealed that FM is genetically correlated with reading, while AM was not. Bivariate PPL analysis indicates that FM loci (10p, 20q) are not also associated with reading.

  19. On the statistical properties of viral misinformation in online social media

    NASA Astrophysics Data System (ADS)

    Bessi, Alessandro

    2017-03-01

    The massive diffusion of online social media allows for the rapid and uncontrolled spreading of conspiracy theories, hoaxes, unsubstantiated claims, and false news. Such an impressive amount of misinformation can influence policy preferences and encourage behaviors strongly divergent from recommended practices. In this paper, we study the statistical properties of viral misinformation in online social media. By means of methods belonging to Extreme Value Theory, we show that the number of extremely viral posts over time follows a homogeneous Poisson process, and that the interarrival times between such posts are independent and identically distributed, following an exponential distribution. Moreover, we characterize the uncertainty around the rate parameter of the Poisson process through Bayesian methods. Finally, we are able to derive the predictive posterior probability distribution of the number of posts exceeding a certain threshold of shares over a finite interval of time.
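
    The step from a Poisson rate posterior to a predictive distribution of counts can be sketched as follows. This is a minimal illustration, not the paper's procedure: it assumes a conjugate Gamma prior on the rate (the abstract does not state which prior was used), and the counts, prior parameters and function names are hypothetical. Under that assumption the posterior is again a Gamma distribution and the predictive count over a future window is negative binomial.

```python
import math

def gamma_posterior(alpha0, beta0, n_events, exposure):
    """Conjugate update of a Gamma(alpha0, beta0) prior (rate parametrization)
    after observing n_events in `exposure` units of time."""
    return alpha0 + n_events, beta0 + exposure

def predictive_pmf(k, alpha, beta, horizon):
    """Posterior predictive P(N = k) for the number of events in `horizon`
    time units; with a Gamma(alpha, beta) posterior this is negative binomial."""
    p = beta / (beta + horizon)
    log_pmf = (math.lgamma(alpha + k) - math.lgamma(alpha) - math.lgamma(k + 1)
               + alpha * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)

# Hypothetical data: 12 extremely viral posts observed in 30 days,
# with an assumed Gamma(1, 1) prior on the daily rate.
alpha, beta = gamma_posterior(1.0, 1.0, n_events=12, exposure=30.0)
posterior_mean_rate = alpha / beta

# Predictive probability of seeing at least 5 such posts in the next 7 days.
p_at_least_5 = 1.0 - sum(predictive_pmf(k, alpha, beta, 7.0) for k in range(5))
print(posterior_mean_rate, p_at_least_5)
```

    The conjugate prior keeps everything in closed form; with a non-conjugate prior the same predictive distribution would have to be obtained by sampling.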

  20. BANYAN_Sigma: Bayesian classifier for members of young stellar associations

    NASA Astrophysics Data System (ADS)

    Gagné, Jonathan; Mamajek, Eric E.; Malo, Lison; Riedel, Adric; Rodriguez, David; Lafrenière, David; Faherty, Jacqueline K.; Roy-Loubier, Olivier; Pueyo, Laurent; Robin, Annie C.; Doyon, René

    2018-01-01

    BANYAN_Sigma calculates the membership probability that a given astrophysical object belongs to one of the currently known 27 young associations within 150 pc of the Sun, using Bayesian inference. This tool uses the sky position and proper motion measurements of an object, with optional radial velocity (RV) and distance (D) measurements, to derive a Bayesian membership probability. By default, the priors are adjusted such that a probability threshold of 90% will recover 50%, 68%, 82% or 90% of true association members depending on what observables are input (only sky position and proper motion, with RV, with D, with both RV and D, respectively). The algorithm is implemented in a Python package, in IDL, and is also implemented as an interactive web page.

  1. Bayesian learning

    NASA Technical Reports Server (NTRS)

    Denning, Peter J.

    1989-01-01

    In 1983 and 1984, the Infrared Astronomical Satellite (IRAS) detected 5,425 stellar objects and measured their infrared spectra. In 1987 a program called AUTOCLASS used Bayesian inference methods to discover the classes present in these data and determine the most probable class of each object, revealing unknown phenomena in astronomy. AUTOCLASS has rekindled the old debate on the suitability of Bayesian methods, which are computationally intensive, interpret probabilities as plausibility measures rather than frequencies, and appear to depend on a subjective assessment of the probability of a hypothesis before the data were collected. Modern statistical methods have, however, recently been shown to also depend on subjective elements. These debates bring into question the whole tradition of scientific objectivity and offer scientists a new way to take responsibility for their findings and conclusions.

  2. Quantum-Bayesian coherence

    NASA Astrophysics Data System (ADS)

    Fuchs, Christopher A.; Schack, Rüdiger

    2013-10-01

    In the quantum-Bayesian interpretation of quantum theory (or QBism), the Born rule cannot be interpreted as a rule for setting measurement-outcome probabilities from an objective quantum state. But if not, what is the role of the rule? In this paper, the argument is given that it should be seen as an empirical addition to Bayesian reasoning itself. Particularly, it is shown how to view the Born rule as a normative rule in addition to usual Dutch-book coherence. It is a rule that takes into account how one should assign probabilities to the consequences of various intended measurements on a physical system, but explicitly in terms of prior probabilities for and conditional probabilities consequent upon the imagined outcomes of a special counterfactual reference measurement. This interpretation is exemplified by representing quantum states in terms of probabilities for the outcomes of a fixed, fiducial symmetric informationally complete measurement. The extent to which the general form of the new normative rule implies the full state-space structure of quantum mechanics is explored.

  3. Uncertainty Analysis Based on Sparse Grid Collocation and Quasi-Monte Carlo Sampling with Application in Groundwater Modeling

    NASA Astrophysics Data System (ADS)

    Zhang, G.; Lu, D.; Ye, M.; Gunzburger, M.

    2011-12-01

    Markov Chain Monte Carlo (MCMC) methods have been widely used in many fields of uncertainty analysis to estimate the posterior distributions of parameters and credible intervals of predictions in the Bayesian framework. However, in practice, MCMC may be computationally unaffordable due to slow convergence and the excessive number of forward model executions required, especially when the forward model is expensive to compute. Both disadvantages arise from the curse of dimensionality, i.e., the posterior distribution is usually a multivariate function of parameters. Recently, the sparse grid method has been demonstrated to be an effective technique for coping with high-dimensional interpolation or integration problems. Thus, in order to accelerate the forward model and avoid the slow convergence of MCMC, we propose a new method for uncertainty analysis based on sparse grid interpolation and quasi-Monte Carlo sampling. First, we construct a polynomial approximation of the forward model in the parameter space by using sparse grid interpolation. This approximation then defines an accurate surrogate posterior distribution that can be evaluated repeatedly at minimal computational cost. Second, instead of using MCMC, a quasi-Monte Carlo method is applied to draw samples in the parameter space. Then, the desired probability density function of each prediction is approximated by accumulating the posterior density values of all the samples according to the prediction values. Our method has the following advantages: (1) the polynomial approximation of the forward model on the sparse grid provides a very efficient evaluation of the surrogate posterior distribution; (2) the quasi-Monte Carlo method retains the same accuracy in approximating the PDF of predictions but avoids all disadvantages of MCMC. The proposed method is applied to a controlled numerical experiment of groundwater flow modeling. The results show that our method attains the same accuracy much more efficiently than traditional MCMC.

  4. Bayesian Statistics for Biological Data: Pedigree Analysis

    ERIC Educational Resources Information Center

    Stanfield, William D.; Carlton, Matthew A.

    2004-01-01

    Bayes' formula is applied to the biological problem of pedigree analysis to show that Bayes' formula and non-Bayesian or "classical" methods of probability calculation give different answers. First-year college students of biology can be introduced to Bayesian statistics.
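
    A worked pedigree calculation makes the difference concrete. The scenario below is a standard textbook-style example chosen for illustration and is not taken from the article: a woman whose mother is a known carrier of an X-linked recessive allele has a prior carrier probability of 1/2, and she then has two unaffected sons. New mutations are ignored, and the function name is hypothetical.

```python
from fractions import Fraction

def carrier_posterior(prior_carrier, n_unaffected_sons):
    """Posterior probability that a woman is a carrier of an X-linked
    recessive allele, given that all of her sons are unaffected."""
    # If she is a carrier, each son is unaffected with probability 1/2.
    like_carrier = Fraction(1, 2) ** n_unaffected_sons
    # If she is not a carrier, every son is unaffected (mutation ignored).
    like_noncarrier = Fraction(1)
    numerator = prior_carrier * like_carrier
    evidence = numerator + (1 - prior_carrier) * like_noncarrier
    return numerator / evidence

prior = Fraction(1, 2)                   # from Mendelian segregation alone
posterior = carrier_posterior(prior, 2)  # after two unaffected sons
print(prior, posterior)                  # prints: 1/2 1/5
```

    A calculation that uses only Mendelian segregation stops at 1/2, while conditioning on the two unaffected sons lowers the carrier probability to 1/5.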

  5. Melanoma Cell Colony Expansion Parameters Revealed by Approximate Bayesian Computation

    PubMed Central

    Vo, Brenda N.; Drovandi, Christopher C.; Pettitt, Anthony N.; Pettet, Graeme J.

    2015-01-01

    In vitro studies and mathematical models are now being widely used to study the underlying mechanisms driving the expansion of cell colonies. This can improve our understanding of cancer formation and progression. Although much progress has been made in terms of developing and analysing mathematical models, far less progress has been made in terms of understanding how to estimate model parameters using experimental in vitro image-based data. To address this issue, a new approximate Bayesian computation (ABC) algorithm is proposed to estimate key parameters governing the expansion of melanoma cell (MM127) colonies, including cell diffusivity, D, cell proliferation rate, λ, and cell-to-cell adhesion, q, in two experimental scenarios, namely with and without a chemical treatment to suppress cell proliferation. Even when little prior biological knowledge about the parameters is assumed, all parameters are precisely inferred with a small posterior coefficient of variation, approximately 2–12%. The ABC analyses reveal that the posterior distributions of D and q depend on the experimental elapsed time, whereas the posterior distribution of λ does not. The posterior mean values of D and q are in the ranges 226–268 µm² h⁻¹, 311–351 µm² h⁻¹ and 0.23–0.39, 0.32–0.61 for the experimental periods of 0–24 h and 24–48 h, respectively. Furthermore, we found that the posterior distribution of q also depends on the initial cell density, whereas the posterior distributions of D and λ do not. The ABC approach also enables information from the two experiments to be combined, resulting in greater precision for all estimates of D and λ. PMID:26642072
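
    The basic idea behind ABC can be shown with plain rejection sampling on a toy problem. This sketch is not the algorithm proposed in the paper, which works with image-based summaries of a cell colony model; here the simulator is a hypothetical coin-flip experiment, the summary statistic is the number of successes, and all names and numbers are assumptions made for illustration.

```python
import random

def abc_rejection(observed_heads, n_trials, n_draws, tolerance, rng):
    """Rejection ABC: keep prior draws whose simulated summary statistic
    lies within `tolerance` of the observed one."""
    accepted = []
    for _ in range(n_draws):
        theta = rng.random()  # draw from a Uniform(0, 1) prior
        # Simulate the experiment; no likelihood is ever evaluated.
        simulated = sum(rng.random() < theta for _ in range(n_trials))
        if abs(simulated - observed_heads) <= tolerance:
            accepted.append(theta)
    return accepted

rng = random.Random(0)
# Hypothetical observation: 35 successes in 50 trials.
samples = abc_rejection(observed_heads=35, n_trials=50,
                        n_draws=20000, tolerance=1, rng=rng)
posterior_mean = sum(samples) / len(samples)
print(len(samples), posterior_mean)
```

    For this toy model the exact posterior is Beta(36, 16) with mean 36/52, so the accepted draws can be checked against a known answer. Tightening the tolerance reduces the approximation error at the cost of fewer accepted samples.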

  6. Bayesian nonparametric regression with varying residual density

    PubMed Central

    Pati, Debdeep; Dunson, David B.

    2013-01-01

    We consider the problem of robust Bayesian inference on the mean regression function allowing the residual density to change flexibly with predictors. The proposed class of models is based on a Gaussian process prior for the mean regression function and mixtures of Gaussians for the collection of residual densities indexed by predictors. Initially considering the homoscedastic case, we propose priors for the residual density based on probit stick-breaking (PSB) scale mixtures and symmetrized PSB (sPSB) location-scale mixtures. Both priors restrict the residual density to be symmetric about zero, with the sPSB prior more flexible in allowing multimodal densities. We provide sufficient conditions to ensure strong posterior consistency in estimating the regression function under the sPSB prior, generalizing existing theory focused on parametric residual distributions. The PSB and sPSB priors are generalized to allow residual densities to change nonparametrically with predictors through incorporating Gaussian processes in the stick-breaking components. This leads to a robust Bayesian regression procedure that automatically down-weights outliers and influential observations in a locally-adaptive manner. Posterior computation relies on an efficient data augmentation exact block Gibbs sampler. The methods are illustrated using simulated and real data applications. PMID:24465053

  7. Bayesian analysis of rare events

    NASA Astrophysics Data System (ADS)

    Straub, Daniel; Papaioannou, Iason; Betz, Wolfgang

    2016-06-01

    In many areas of engineering and science there is an interest in predicting the probability of rare events, in particular in applications related to safety and security. Increasingly, such predictions are made through computer models of physical systems in an uncertainty quantification framework. Additionally, with advances in IT, monitoring and sensor technology, an increasing amount of data on the performance of the systems is collected. This data can be used to reduce uncertainty, improve the probability estimates and consequently enhance the management of rare events and associated risks. Bayesian analysis is the ideal method to include the data into the probabilistic model. It ensures a consistent probabilistic treatment of uncertainty, which is central in the prediction of rare events, where extrapolation from the domain of observation is common. We present a framework for performing Bayesian updating of rare event probabilities, termed BUS. It is based on a reinterpretation of the classical rejection-sampling approach to Bayesian analysis, which enables the use of established methods for estimating probabilities of rare events. By drawing upon these methods, the framework makes use of their computational efficiency. These methods include the First-Order Reliability Method (FORM), tailored importance sampling (IS) methods and Subset Simulation (SuS). In this contribution, we briefly review these methods in the context of the BUS framework and investigate their applicability to Bayesian analysis of rare events in different settings. We find that, for some applications, FORM can be highly efficient and is surprisingly accurate, enabling Bayesian analysis of rare events with just a few model evaluations. In a general setting, BUS implemented through IS and SuS is more robust and flexible.
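
    The rejection-sampling view of Bayesian updating that BUS builds on can be sketched with crude Monte Carlo. The example below is only an illustration under stated assumptions: a standard normal prior on a single parameter, one noisy observation with unit noise variance, and a hypothetical failure event defined by a threshold. It does not use FORM, importance sampling or Subset Simulation, which is where the framework gains its efficiency for truly rare events.

```python
import math
import random

def bus_rejection(n_draws, y_obs, threshold, rng):
    """Estimate P(theta > threshold | y_obs) by accepting prior samples
    with probability c * L(theta), where c * L(theta) <= 1."""
    accepted = 0
    failures = 0
    for _ in range(n_draws):
        theta = rng.gauss(0.0, 1.0)  # sample from the N(0, 1) prior
        # Gaussian likelihood with unit noise; its maximum is 1, so c = 1.
        likelihood = math.exp(-0.5 * (y_obs - theta) ** 2)
        if rng.random() <= likelihood:   # the observation event
            accepted += 1
            if theta > threshold:        # the failure event
                failures += 1
    return failures / accepted, accepted / n_draws

# Hypothetical setting: observation y = 1.0, failure when theta > 2.0.
p_fail, acceptance_rate = bus_rejection(200000, 1.0, 2.0, random.Random(1))
print(p_fail, acceptance_rate)
```

    Because accepted samples follow the posterior, the updated failure probability is the share of accepted samples that fall in the failure domain. In this conjugate toy case the posterior is N(0.5, 0.5), which gives an exact value to compare against.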

  8. Bayesian statistics in radionuclide metrology: measurement of a decaying source

    NASA Astrophysics Data System (ADS)

    Bochud, François O.; Bailat, Claude J.; Laedermann, Jean-Pascal

    2007-08-01

    The most intuitive way of defining a probability is perhaps through the frequency at which it appears when a large number of trials are realized in identical conditions. The probability derived from the obtained histogram characterizes the so-called frequentist or conventional statistical approach. In this sense, probability is defined as a physical property of the observed system. By contrast, in Bayesian statistics, a probability is not a physical property or a directly observable quantity, but a degree of belief or an element of inference. The goal of this paper is to show how Bayesian statistics can be used in radionuclide metrology and what its advantages and disadvantages are compared with conventional statistics. This is performed through the example of an yttrium-90 source typically encountered in environmental surveillance measurement. Because of the very low activity of this kind of source and the small half-life of the radionuclide, this measurement takes several days, during which the source decays significantly. Several methods are proposed to compute simultaneously the number of unstable nuclei at a given reference time, the decay constant and the background. Asymptotically, all approaches give the same result. However, Bayesian statistics produces coherent estimates and confidence intervals in a much smaller number of measurements. Apart from the conceptual understanding of statistics, the main difficulty that could deter radionuclide metrologists from using Bayesian statistics is the complexity of the computation.

  9. Annealed Importance Sampling for Neural Mass Models

    PubMed Central

    Penny, Will; Sengupta, Biswa

    2016-01-01

    Neural Mass Models provide a compact description of the dynamical activity of cell populations in neocortical regions. Moreover, models of regional activity can be connected together into networks, and inferences made about the strength of connections, using M/EEG data and Bayesian inference. To date, however, Bayesian methods have been largely restricted to the Variational Laplace (VL) algorithm which assumes that the posterior distribution is Gaussian and finds model parameters that are only locally optimal. This paper explores the use of Annealed Importance Sampling (AIS) to address these restrictions. We implement AIS using proposals derived from Langevin Monte Carlo (LMC) which uses local gradient and curvature information for efficient exploration of parameter space. In terms of the estimation of Bayes factors, VL and AIS agree about which model is best but report different degrees of belief. Additionally, AIS finds better model parameters and we find evidence of non-Gaussianity in their posterior distribution. PMID:26942606

  10. A Bayesian inverse modeling approach to estimate soil hydraulic properties of a toposequence in southeastern Amazonia.

    NASA Astrophysics Data System (ADS)

    Stucchi Boschi, Raquel; Qin, Mingming; Gimenez, Daniel; Cooper, Miguel

    2016-04-01

    Modeling is an important tool for better understanding and assessing land use impacts on landscape processes. A key point for environmental modeling is the knowledge of soil hydraulic properties. However, direct determination of soil hydraulic properties is difficult and costly, particularly in vast and remote regions such as one constituting the Amazon Biome. One way to overcome this problem is to extrapolate accurately estimated data to pedologically similar sites. The van Genuchten (VG) parametric equation is the most commonly used for modeling SWRC. The use of a Bayesian approach in combination with the Markov chain Monte Carlo to estimate the VG parameters has several advantages compared to the widely used global optimization techniques. The Bayesian approach provides posterior distributions of parameters that are independent from the initial values and allow for uncertainty analyses. The main objectives of this study were: i) to estimate hydraulic parameters from data of pasture and forest sites by the Bayesian inverse modeling approach; and ii) to investigate the extrapolation of the estimated VG parameters to a nearby toposequence with pedologically similar soils to those used for its estimate. The parameters were estimated from volumetric water content and tension observations obtained after rainfall events during a 207-day period from pasture and forest sites located in the southeastern Amazon region. These data were used to run HYDRUS-1D under a Differential Evolution Adaptive Metropolis (DREAM) scheme 10,000 times, and only the last 2,500 times were used to calculate the posterior distributions of each hydraulic parameter along with 95% confidence intervals (CI) of volumetric water content and tension time series. Then, the posterior distributions were used to generate hydraulic parameters for two nearby toposequences composed by six soil profiles, three are under forest and three are under pasture. 
    The parameters of the nearby site were accepted when the predicted tension time series fell within the 95% CI derived from the calibration site using the DREAM scheme.
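
    As a rough illustration of the kind of estimation this abstract describes, the sketch below fits two van Genuchten parameters to synthetic retention data and keeps the last 2,500 of 10,000 iterations. It is a plain random-walk Metropolis sampler in pure Python, not DREAM, and it is not coupled to HYDRUS-1D. The fixed residual and saturated water contents, the prior bounds, the noise level, the step sizes, the starting values, and the data are all assumptions made for the example.

```python
import math
import random

THETA_R, THETA_S = 0.05, 0.45  # assumed fixed residual and saturated water contents


def vg_theta(h, alpha, n):
    """van Genuchten water content at tension h >= 0 (h in the length unit of 1/alpha)."""
    m = 1.0 - 1.0 / n
    return THETA_R + (THETA_S - THETA_R) / (1.0 + (alpha * h) ** n) ** m


# Synthetic, noise-free "observations" generated with alpha = 0.05 and n = 1.8.
TENSIONS = [10.0, 30.0, 100.0, 300.0, 1000.0, 3000.0]
OBSERVED = [vg_theta(h, 0.05, 1.8) for h in TENSIONS]


def log_posterior(alpha, n, sigma=0.01):
    """Gaussian log-likelihood plus a flat prior on an assumed box."""
    if not (0.001 < alpha < 1.0 and 1.05 < n < 5.0):
        return -math.inf
    sse = sum((vg_theta(h, alpha, n) - y) ** 2 for h, y in zip(TENSIONS, OBSERVED))
    return -0.5 * sse / sigma ** 2


def metropolis(n_iter=10_000, keep=2_500, seed=0):
    rng = random.Random(seed)
    alpha, n = 0.1, 2.0  # arbitrary starting values
    log_p = log_posterior(alpha, n)
    chain = []
    for _ in range(n_iter):
        alpha_new = alpha + rng.gauss(0.0, 0.002)
        n_new = n + rng.gauss(0.0, 0.05)
        log_p_new = log_posterior(alpha_new, n_new)
        # Metropolis rule: accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            alpha, n, log_p = alpha_new, n_new, log_p_new
        chain.append((alpha, n))
    return chain[-keep:]  # discard the early iterations as burn-in


def interval_95(values):
    """Equal-tailed 95% interval from posterior draws."""
    ordered = sorted(values)
    return ordered[int(0.025 * len(ordered))], ordered[int(0.975 * len(ordered)) - 1]
```

    Calling `interval_95([a for a, _ in metropolis()])` would give an interval for the assumed `alpha` parameter; a multi-chain scheme such as DREAM replaces the fixed Gaussian step with proposals built from the other chains.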

  11. Bayesian prediction of future ice sheet volume using local approximation Markov chain Monte Carlo methods

    NASA Astrophysics Data System (ADS)

    Davis, A. D.; Heimbach, P.; Marzouk, Y.

    2017-12-01

    We develop a Bayesian inverse modeling framework for predicting future ice sheet volume with associated formal uncertainty estimates. Marine ice sheets are drained by fast-flowing ice streams, which we simulate using a flowline model. Flowline models depend on geometric parameters (e.g., basal topography), parameterized physical processes (e.g., calving laws and basal sliding), and climate parameters (e.g., surface mass balance), most of which are unknown or uncertain. Given observations of ice surface velocity and thickness, we define a Bayesian posterior distribution over static parameters, such as basal topography. We also define a parameterized distribution over variable parameters, such as future surface mass balance, which we assume are not informed by the data. Hyperparameters are used to represent climate change scenarios, and sampling their distributions mimics internal variation. For example, a warming climate corresponds to increasing mean surface mass balance but an individual sample may have periods of increasing or decreasing surface mass balance. We characterize the predictive distribution of ice volume by evaluating the flowline model given samples from the posterior distribution and the distribution over variable parameters. Finally, we determine the effect of climate change on future ice sheet volume by investigating how changing the hyperparameters affects the predictive distribution. We use state-of-the-art Bayesian computation to address computational feasibility. Characterizing the posterior distribution (using Markov chain Monte Carlo), sampling the full range of variable parameters and evaluating the predictive model is prohibitively expensive. Furthermore, the required resolution of the inferred basal topography may be very high, which is often challenging for sampling methods. 
    Instead, we leverage regularity in the predictive distribution to build a computationally cheaper surrogate over the low dimensional quantity of interest (future ice sheet volume). Continual surrogate refinement guarantees asymptotic sampling from the predictive distribution. Directly characterizing the predictive distribution in this way allows us to assess the ice sheet's sensitivity to climate variability and change.

  12. Estimation of sensitivity and specificity of pregnancy diagnosis using transrectal ultrasonography and ELISA for pregnancy-associated glycoprotein in dairy cows using a Bayesian latent class model.

    PubMed

    Shephard, R W; Morton, J M

    2018-01-01

    To determine the sensitivity (Se) and specificity (Sp) of pregnancy diagnosis using transrectal ultrasonography and an ELISA for pregnancy-associated glycoprotein (PAG) in milk, in lactating dairy cows in seasonally calving herds approximately 85-100 days after the start of the herd's breeding period. Paired results were used from pregnancy diagnosis using transrectal ultrasonography and ELISA for PAG in milk carried out approximately 85 and 100 days after the start of the breeding period, respectively, from 879 cows from four herds in Victoria, Australia. A Bayesian latent class model was used to estimate the proportion of cows pregnant, the Se and Sp of each test, and covariances between test results in pregnant and non-pregnant cows. Prior probability estimates were defined using beta distributions for the expected proportion of cows pregnant, Se and Sp for each test, and covariances between tests. Markov Chain Monte Carlo iterations identified posterior distributions for each of the unknown variables. Posterior distributions for each parameter were described using medians and 95% probability (i.e. credible) intervals (PrI). The posterior median estimates for Se and Sp for each test were used to estimate positive predictive and negative predictive values across a range of pregnancy proportions. The estimate for proportion pregnant was 0.524 (95% PrI = 0.485-0.562). For pregnancy diagnosis using transrectal ultrasonography, Se and Sp were 0.939 (95% PrI = 0.890-0.974) and 0.943 (95% PrI = 0.885-0.984), respectively; for ELISA, Se and Sp were 0.963 (95% PrI = 0.919-0.990) and 0.870 (95% PrI = 0.806-0.931), respectively. The estimated covariance between test results was 0.033 (95% PrI = 0.008-0.046) and 0.035 (95% PrI = 0.018-0.078) for pregnant and non-pregnant cows, respectively. 
    Pregnancy diagnosis results using transrectal ultrasonography had a higher positive predictive value but lower negative predictive value than results from the ELISA across the range of pregnancy proportions assessed. Pregnancy diagnosis using transrectal ultrasonography and ELISA for PAG in milk had similar Se but differed in predictive values. Pregnancy diagnosis in seasonally calving herds around 85-100 days after the start of the breeding period using the ELISA is expected to result in a higher negative predictive value but lower positive predictive value than pregnancy diagnosis using transrectal ultrasonography. Thus, with the ELISA, a higher proportion of the cows with negative results will be non-pregnant, relative to results from transrectal ultrasonography, but a lower proportion of cows with positive results will be pregnant.
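
    The step from Se and Sp to predictive values can be reproduced with Bayes' rule. The sketch below plugs in the posterior medians quoted in the abstract; the function name and the grid of pregnancy proportions are assumptions chosen for illustration, and the calculation uses point estimates only, so it does not carry the posterior uncertainty of the latent class model.

```python
def predictive_values(se, sp, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity and proportion pregnant."""
    true_pos = se * prevalence
    false_pos = (1.0 - sp) * (1.0 - prevalence)
    true_neg = sp * (1.0 - prevalence)
    false_neg = (1.0 - se) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# Posterior medians (Se, Sp) reported in the abstract.
TESTS = {"ultrasonography": (0.939, 0.943), "ELISA": (0.963, 0.870)}

# The grid of pregnancy proportions is an assumption; 0.524 is the reported estimate.
for prevalence in (0.3, 0.4, 0.524, 0.6, 0.7):
    for name, (se, sp) in TESTS.items():
        ppv, npv = predictive_values(se, sp, prevalence)
        print(f"p={prevalence:.3f} {name:15s} PPV={ppv:.3f} NPV={npv:.3f}")
```

    At the reported proportion pregnant of 0.524 this gives a PPV of about 0.95 for ultrasonography against about 0.89 for the ELISA, and an NPV of about 0.93 against about 0.96, the same direction as stated in the abstract.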

  13. Probabilistic Model for Untargeted Peak Detection in LC-MS Using Bayesian Statistics.

    PubMed

    Woldegebriel, Michael; Vivó-Truyols, Gabriel

    2015-07-21

    We introduce a novel Bayesian probabilistic peak detection algorithm for liquid chromatography-mass spectroscopy (LC-MS). The final probabilistic result allows the user to make a final decision about which points in a chromatogram are affected by a chromatographic peak and which ones are only affected by noise. The use of probabilities contrasts with the traditional method in which a binary answer is given, relying on a threshold. By contrast, with the Bayesian peak detection presented here, the values of probability can be further propagated into other preprocessing steps, which will increase (or decrease) the importance of chromatographic regions into the final results. The present work is based on the use of the statistical overlap theory of component overlap from Davis and Giddings (Davis, J. M.; Giddings, J. Anal. Chem. 1983, 55, 418-424) as prior probability in the Bayesian formulation. The algorithm was tested on LC-MS Orbitrap data and was able to successfully distinguish chemical noise from actual peaks without any data preprocessing.
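
    To make the contrast between a probability and a thresholded yes/no answer concrete, the toy sketch below computes a per-point posterior probability of "peak" against "noise only" with Bayes' rule. It is not the algorithm of the paper: the Gaussian likelihoods, the prior value, and all numbers are placeholders assumed for illustration, whereas the paper derives its prior from statistical overlap theory.

```python
import math


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def peak_probability(intensity, prior_peak, noise_sd, peak_mean, peak_sd):
    """Posterior probability that a point is affected by a peak (two-hypothesis Bayes rule)."""
    like_peak = normal_pdf(intensity, peak_mean, peak_sd)   # assumed signal model
    like_noise = normal_pdf(intensity, 0.0, noise_sd)       # assumed noise model
    weighted_peak = prior_peak * like_peak
    return weighted_peak / (weighted_peak + (1.0 - prior_peak) * like_noise)


# Hypothetical baseline-corrected intensities along a chromatogram.
for intensity in (0.0, 1.0, 2.0, 4.0, 8.0):
    p = peak_probability(intensity, prior_peak=0.1, noise_sd=1.0, peak_mean=10.0, peak_sd=3.0)
    print(f"intensity={intensity:4.1f}  P(peak | data)={p:.4f}")
```

    Unlike a fixed cut-off, the resulting values between 0 and 1 can be passed on as weights to later preprocessing steps.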

  14. A Tutorial in Bayesian Potential Outcomes Mediation Analysis.

    PubMed

    Miočević, Milica; Gonzalez, Oscar; Valente, Matthew J; MacKinnon, David P

    2018-01-01

    Statistical mediation analysis is used to investigate intermediate variables in the relation between independent and dependent variables. Causal interpretation of mediation analyses is challenging because randomization of subjects to levels of the independent variable does not rule out the possibility of unmeasured confounders of the mediator to outcome relation. Furthermore, commonly used frequentist methods for mediation analysis compute the probability of the data given the null hypothesis, which is not the probability of a hypothesis given the data as in Bayesian analysis. Under certain assumptions, applying the potential outcomes framework to mediation analysis allows for the computation of causal effects, and statistical mediation in the Bayesian framework gives indirect effects probabilistic interpretations. This tutorial combines causal inference and Bayesian methods for mediation analysis so the indirect and direct effects have both causal and probabilistic interpretations. Steps in Bayesian causal mediation analysis are shown in the application to an empirical example.
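
    A probabilistic interpretation of the indirect effect can be sketched as follows. Assuming linear models without interaction, so that the indirect effect is the product of the path from the independent variable to the mediator (`a`) and the path from the mediator to the outcome (`b`), posterior draws of the two paths are multiplied and summarized. The independent normal posteriors and their means and standard deviations are hypothetical stand-ins for draws that an MCMC fit would supply.

```python
import random


def indirect_effect_summary(a_mean, a_sd, b_mean, b_sd, n_draws=20_000, seed=1):
    """Summarize the posterior of the indirect effect a*b from simulated draws."""
    rng = random.Random(seed)
    # Stand-in for MCMC output: independent normal draws for the a and b paths.
    draws = sorted(rng.gauss(a_mean, a_sd) * rng.gauss(b_mean, b_sd) for _ in range(n_draws))
    prob_positive = sum(d > 0 for d in draws) / n_draws
    lower = draws[int(0.025 * n_draws)]
    upper = draws[int(0.975 * n_draws) - 1]
    return prob_positive, (lower, upper)


prob_positive, interval = indirect_effect_summary(0.5, 0.1, 0.4, 0.1)
print(f"P(indirect effect > 0 | data) = {prob_positive:.3f}")
print(f"95% credible interval: ({interval[0]:.3f}, {interval[1]:.3f})")
```

    The first number is a direct statement about the hypothesis given the data, which is the quantity that frequentist tests of the indirect effect do not provide.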

  15. Bayesian calibration of terrestrial ecosystem models: A study of advanced Markov chain Monte Carlo methods

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lu, Dan; Ricciuto, Daniel; Walker, Anthony

    Calibration of terrestrial ecosystem models is important but challenging. Bayesian inference implemented by Markov chain Monte Carlo (MCMC) sampling provides a comprehensive framework to estimate model parameters and associated uncertainties using their posterior distributions. The effectiveness and efficiency of the method strongly depend on the MCMC algorithm used. In this study, a Differential Evolution Adaptive Metropolis (DREAM) algorithm was used to estimate posterior distributions of 21 parameters for the data assimilation linked ecosystem carbon (DALEC) model using 14 years of daily net ecosystem exchange data collected at the Harvard Forest Environmental Measurement Site eddy-flux tower. The DREAM is a multi-chain method and uses differential evolution technique for chain movement, allowing it to be efficiently applied to high-dimensional problems, and can reliably estimate heavy-tailed and multimodal distributions that are difficult for single-chain schemes using a Gaussian proposal distribution. The results were evaluated against the popular Adaptive Metropolis (AM) scheme. DREAM indicated that two parameters controlling autumn phenology have multiple modes in their posterior distributions while AM only identified one mode. The calibration of DREAM resulted in a better model fit and predictive performance compared to the AM. DREAM provides means for a good exploration of the posterior distributions of model parameters. Lastly, it reduces the risk of false convergence to a local optimum and potentially improves the predictive performance of the calibrated model.

  16. Bayesian calibration of terrestrial ecosystem models: A study of advanced Markov chain Monte Carlo methods

    DOE PAGES

    Lu, Dan; Ricciuto, Daniel; Walker, Anthony; ...

    2017-02-22

    Calibration of terrestrial ecosystem models is important but challenging. Bayesian inference implemented by Markov chain Monte Carlo (MCMC) sampling provides a comprehensive framework to estimate model parameters and associated uncertainties using their posterior distributions. The effectiveness and efficiency of the method strongly depend on the MCMC algorithm used. In this study, a Differential Evolution Adaptive Metropolis (DREAM) algorithm was used to estimate posterior distributions of 21 parameters for the data assimilation linked ecosystem carbon (DALEC) model using 14 years of daily net ecosystem exchange data collected at the Harvard Forest Environmental Measurement Site eddy-flux tower. The DREAM is a multi-chain method and uses differential evolution technique for chain movement, allowing it to be efficiently applied to high-dimensional problems, and can reliably estimate heavy-tailed and multimodal distributions that are difficult for single-chain schemes using a Gaussian proposal distribution. The results were evaluated against the popular Adaptive Metropolis (AM) scheme. DREAM indicated that two parameters controlling autumn phenology have multiple modes in their posterior distributions while AM only identified one mode. The calibration of DREAM resulted in a better model fit and predictive performance compared to the AM. DREAM provides means for a good exploration of the posterior distributions of model parameters. Lastly, it reduces the risk of false convergence to a local optimum and potentially improves the predictive performance of the calibrated model.
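
    The idea of moving each chain with a difference of two other chains can be sketched on a toy bimodal target. The code below is a simplified differential evolution Markov chain sampler, the basic building block that DREAM extends; it is not DREAM itself and has none of its adaptive features. The one-dimensional mixture target, the chain count, the noise scale, and the choice of a unit jump factor on every tenth sweep are assumptions for illustration.

```python
import math
import random


def log_target(x):
    """Log density (up to a constant) of an equal mixture of N(-3, 0.5^2) and N(3, 0.5^2)."""
    a = -0.5 * ((x + 3.0) / 0.5) ** 2
    b = -0.5 * ((x - 3.0) / 0.5) ** 2
    top = max(a, b)
    return top + math.log(math.exp(a - top) + math.exp(b - top))


def de_mc(n_chains=8, n_iter=4000, seed=0):
    rng = random.Random(seed)
    # Chains start evenly spread over [-6, 6].
    states = [-6.0 + 12.0 * i / (n_chains - 1) for i in range(n_chains)]
    log_ps = [log_target(x) for x in states]
    gamma_default = 2.38 / math.sqrt(2.0)  # common jump scaling for one dimension
    samples = []
    for it in range(n_iter):
        # A unit jump factor every tenth sweep lets a chain hop between modes.
        gamma = 1.0 if it % 10 == 0 else gamma_default
        for i in range(n_chains):
            r1, r2 = rng.sample([j for j in range(n_chains) if j != i], 2)
            proposal = states[i] + gamma * (states[r1] - states[r2]) + rng.gauss(0.0, 0.05)
            log_p_new = log_target(proposal)
            # Symmetric proposal, so the plain Metropolis acceptance rule applies.
            if rng.random() < math.exp(min(0.0, log_p_new - log_ps[i])):
                states[i], log_ps[i] = proposal, log_p_new
        if it >= n_iter // 2:  # keep the second half of the run
            samples.extend(states)
    return samples
```

    Because the jump size comes from the current spread of the chains, a chain in one mode can be proposed a move of roughly the distance between modes, which a single chain with a narrow Gaussian proposal would rarely attempt.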

  17. On Bayesian Testing of Additive Conjoint Measurement Axioms Using Synthetic Likelihood.

    PubMed

    Karabatsos, George

    2018-06-01

    This article introduces a Bayesian method for testing the axioms of additive conjoint measurement. The method is based on an importance sampling algorithm that performs likelihood-free, approximate Bayesian inference using a synthetic likelihood to overcome the analytical intractability of this testing problem. This new method improves upon previous methods because it provides an omnibus test of the entire hierarchy of cancellation axioms, beyond double cancellation. It does so while accounting for the posterior uncertainty that is inherent in the empirical orderings that are implied by these axioms, together. The new method is illustrated through a test of the cancellation axioms on a classic survey data set, and through the analysis of simulated data.

  18. A local approach for focussed Bayesian fusion

    NASA Astrophysics Data System (ADS)

    Sander, Jennifer; Heizmann, Michael; Goussev, Igor; Beyerer, Jürgen

    2009-04-01

    Local Bayesian fusion approaches aim to reduce the high storage and computational costs of Bayesian fusion that is detached from fixed modeling assumptions. Using the small world formalism, we argue why this procedure conforms to Bayesian theory. Then, we concentrate on the realization of local Bayesian fusion by focusing the fusion process solely on local regions that are task relevant with a high probability. The resulting local models then correspond to restricted versions of the original one. In a previous publication, we used bounds for the probability of misleading evidence to show the validity of the pre-evaluation of task specific knowledge and prior information which we perform to build local models. In this paper, we prove the validity of this procedure using information theoretic arguments. For additional efficiency, local Bayesian fusion can be realized in a distributed manner. Here, several local Bayesian fusion tasks are evaluated and unified after the actual fusion process. Software agents are well suited to the practical realization of distributed local Bayesian fusion. There is a natural analogy between the resulting agent based architecture and criminal investigations in real life. We show how this analogy can be used to further improve the efficiency of distributed local Bayesian fusion. Using a landscape model, we present an experimental study of distributed local Bayesian fusion in the field of reconnaissance, which highlights its high potential.

  19. Testing the molecular clock using mechanistic models of fossil preservation and molecular evolution.

    PubMed

    Warnock, Rachel C M; Yang, Ziheng; Donoghue, Philip C J

    2017-06-28

    Molecular sequence data provide information about relative times only, and fossil-based age constraints are the ultimate source of information about absolute times in molecular clock dating analyses. Thus, fossil calibrations are critical to molecular clock dating, but competing methods are difficult to evaluate empirically because the true evolutionary time scale is never known. Here, we combine mechanistic models of fossil preservation and sequence evolution in simulations to evaluate different approaches to constructing fossil calibrations and their impact on Bayesian molecular clock dating, and the relative impact of fossil versus molecular sampling. We show that divergence time estimation is impacted by the model of fossil preservation, sampling intensity and tree shape. The addition of sequence data may improve molecular clock estimates, but accuracy and precision are dominated by the quality of the fossil calibrations. Posterior means and medians are poor representatives of true divergence times; posterior intervals provide a much more accurate estimate of divergence times, though they may be wide and often do not have high coverage probability. Our results highlight the importance of increased fossil sampling and improved statistical approaches to generating calibrations, which should incorporate the non-uniform nature of ecological and temporal fossil species distributions. © 2017 The Authors.

  20. A pitfall of piecewise-polytropic equation of state inference

    NASA Astrophysics Data System (ADS)

    Raaijmakers, Geert; Riley, Thomas E.; Watts, Anna L.

    2018-05-01

    The only messenger radiation in the Universe which one can use to statistically probe the Equation of State (EOS) of cold dense matter is that originating from the near-field vicinities of compact stars. Constraining gravitational masses and equatorial radii of rotating compact stars is a major goal for current and future telescope missions, with a primary purpose of constraining the EOS. From a Bayesian perspective it is necessary to carefully discuss prior definition; in this context a complicating issue is that in practice there exist pathologies in the general relativistic mapping between spaces of local (interior source matter) and global (exterior spacetime) parameters. In a companion paper, these issues were raised on a theoretical basis. In this study we reproduce a probability transformation procedure from the literature in order to map a joint posterior distribution of Schwarzschild gravitational masses and radii into a joint posterior distribution of EOS parameters. We demonstrate computationally that EOS parameter inferences are sensitive to the choice to define a prior on a joint space of these masses and radii, instead of on a joint space of interior source matter parameters. We focus on the piecewise-polytropic EOS model, which is currently standard in the field of astrophysical dense matter study. We discuss the implications of this issue for the field.
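
    Why the choice of prior space matters can be seen from the general change-of-variables rule for densities, written out below. This is a generic identity, not the specific transformation procedure reproduced in the paper; the symbols $\theta$ (interior source matter parameters), $x$ (exterior parameters such as masses and radii), and the map $g$ are notation introduced here, and the map is assumed to be one-to-one, differentiable, and between spaces of equal dimension.

```latex
% Assumes a differentiable one-to-one map g from interior parameters \theta
% to exterior parameters x = g(\theta) of the same dimension.
\[
  p_{\theta}(\theta) \;=\; p_{x}\bigl(g(\theta)\bigr)\,
  \left|\det \frac{\partial g(\theta)}{\partial \theta}\right| .
\]
% A prior that is flat in x is therefore generally not flat in \theta:
\[
  p_{x}(x) \propto 1
  \quad\Longrightarrow\quad
  p_{\theta}(\theta) \propto \left|\det \frac{\partial g(\theta)}{\partial \theta}\right| .
\]
```

    Under these assumptions, a prior that looks uninformative on masses and radii implies a structured, Jacobian-weighted prior on the EOS parameters, so the two choices can lead to different posteriors.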
