Molitor, John
2012-03-01
Bayesian methods have seen an increase in popularity in a wide variety of scientific fields, including epidemiology. One of the main reasons for their widespread application is the power of the Markov chain Monte Carlo (MCMC) techniques generally used to fit these models. As a result, researchers often implicitly associate Bayesian models with MCMC estimation procedures. However, Bayesian models do not always require Markov-chain-based methods for parameter estimation. This is important, as MCMC estimation methods, while generally quite powerful, are complex and computationally expensive and suffer from convergence problems related to the manner in which they generate correlated samples used to estimate probability distributions for parameters of interest. In this issue of the Journal, Cole et al. (Am J Epidemiol. 2012;175(5):368-375) present an interesting paper that discusses non-Markov-chain-based approaches to fitting Bayesian models. These methods, though limited, can overcome some of the problems associated with MCMC techniques and promise to provide simpler approaches to fitting Bayesian models. Applied researchers will find these estimation approaches intuitively appealing and will gain a deeper understanding of Bayesian models through their use. However, readers should be aware that other non-Markov-chain-based methods are currently in active development and have been widely published in other fields.
Wu, Xiao-Lin; Sun, Chuanyu; Beissinger, Timothy M; Rosa, Guilherme Jm; Weigel, Kent A; Gatti, Natalia de Leon; Gianola, Daniel
2012-09-25
Most Bayesian models for the analysis of complex traits are not analytically tractable, and inferences are based on computationally intensive techniques. This is true of Bayesian models for genome-enabled selection, which uses whole-genome molecular data to predict the genetic merit of candidate animals for breeding purposes. In this regard, parallel computing can overcome the bottlenecks that arise from serial computing. Hence, a major goal of the present study is to bridge the gap to high-performance Bayesian computation in the context of animal breeding and genetics. Parallel Markov chain Monte Carlo algorithms and strategies are described in the context of animal breeding and genetics. Parallel Monte Carlo algorithms are introduced as a starting point, including their applications to computing single-parameter and certain multiple-parameter models. Then, two basic approaches for parallel Markov chain Monte Carlo are described: one aims at parallelization within a single chain; the other is based on running multiple chains, and some variants of each are discussed as well. Features and strategies of parallel Markov chain Monte Carlo are illustrated using real data, including a large beef cattle dataset with 50K SNP genotypes. Parallel Markov chain Monte Carlo algorithms are useful for computing complex Bayesian models; they not only lead to a dramatic speedup in computation but can also be used to optimize model parameters in complex Bayesian models. Hence, we anticipate that the use of parallel Markov chain Monte Carlo will have a profound impact on the computational tools for genomic selection programs.
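As an editorial illustration of the multiple-chains strategy described above, the following sketch runs four independent random-walk Metropolis chains in parallel with Python's multiprocessing and pools their draws. The normal-mean target, chain length, and step size are illustrative assumptions, not the authors' setup.

```python
# Minimal sketch of the "multiple chains" parallelization strategy: several
# independent Metropolis chains, one per worker process, pooled afterwards.
# Toy target: posterior of a normal mean with unit variance and a flat prior.
import numpy as np
from multiprocessing import Pool

DATA = np.array([1.2, 0.8, 1.5, 0.9, 1.1])

def log_post(mu):
    return -0.5 * np.sum((DATA - mu) ** 2)  # flat prior; unit variance

def run_chain(seed, n_iter=5000, step=0.5):
    rng = np.random.default_rng(seed)
    mu = rng.normal()                # dispersed starting point per chain
    lp = log_post(mu)
    out = np.empty(n_iter)
    for i in range(n_iter):
        prop = mu + step * rng.normal()
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            mu, lp = prop, lp_prop
        out[i] = mu
    return out[n_iter // 2:]         # discard burn-in

if __name__ == "__main__":
    with Pool(4) as pool:
        chains = pool.map(run_chain, range(4))  # one chain per worker
    samples = np.concatenate(chains)
    print(samples.mean(), samples.std())
```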
Bayesian analysis of non-homogeneous Markov chains: application to mental health data.
Sung, Minje; Soyer, Refik; Nhan, Nguyen
2007-07-10
In this paper we present a formal treatment of non-homogeneous Markov chains by introducing a hierarchical Bayesian framework. Our work is motivated by the analysis of correlated categorical data which arise in the assessment of psychiatric treatment programs. In our development, we introduce a Markovian structure to describe the non-homogeneity of transition patterns. In doing so, we introduce a logistic regression set-up for Markov chains and incorporate covariates in our model. We present a Bayesian model using Markov chain Monte Carlo methods and develop inference procedures to address issues encountered in the analysis of data from psychiatric treatment programs. Our model and inference procedures are applied to real data from a psychiatric treatment study.
Strelioff, Christopher C; Crutchfield, James P; Hübler, Alfred W
2007-07-01
Markov chains are a natural and well understood tool for describing one-dimensional patterns in time or space. We show how to infer kth order Markov chains, for arbitrary k, from finite data by applying Bayesian methods to both parameter estimation and model-order selection. Extending existing results for multinomial models of discrete data, we connect inference to statistical mechanics through information-theoretic (type theory) techniques. We establish a direct relationship between Bayesian evidence and the partition function which allows for straightforward calculation of the expectation and variance of the conditional relative entropy and the source entropy rate. Finally, we introduce a method that uses finite data-size scaling with model-order comparison to infer the structure of out-of-class processes.
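A minimal sketch of the Dirichlet-multinomial machinery underlying this kind of inference, restricted to the first-order case (the paper treats arbitrary k and adds Bayesian model-order selection, which this sketch omits):

```python
# Bayesian inference for a first-order Markov chain: each row of the
# transition matrix gets an independent Dirichlet prior, so the posterior
# is again Dirichlet with transition counts added to the prior parameters.
import numpy as np

def posterior_transition(seq, n_states, alpha=1.0):
    """Return posterior Dirichlet parameters and posterior-mean matrix."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(seq[:-1], seq[1:]):
        counts[a, b] += 1
    post = counts + alpha                           # Dirichlet(alpha) prior per row
    mean = post / post.sum(axis=1, keepdims=True)   # posterior mean transition matrix
    return post, mean

seq = [0, 1, 1, 0, 2, 1, 0, 0, 1, 2, 2, 1, 0]
post, mean = posterior_transition(seq, n_states=3)
print(mean)
```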
Metis: A Pure Metropolis Markov Chain Monte Carlo Bayesian Inference Library
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bates, Cameron Russell; Mckigney, Edward Allen
The use of Bayesian inference in data analysis has become the standard for large scientific experiments [1, 2]. The Monte Carlo Codes Group (XCP-3) at Los Alamos has developed a simple set of algorithms, currently implemented in C++ and Python, to easily perform flat-prior Markov Chain Monte Carlo Bayesian inference with pure Metropolis sampling. These implementations are designed to be user friendly and extensible for customization based on specific application requirements. This document describes the algorithmic choices made and presents two use cases.
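A toy sketch of the algorithm the library implements: pure random-walk Metropolis with a flat prior, so the acceptance ratio reduces to a likelihood ratio. This illustrates the method only; it is not the Metis API, and the straight-line model and tuning values are assumptions.

```python
# Pure Metropolis with a symmetric (random-walk) proposal and a flat prior:
# the flat prior cancels, leaving a likelihood-ratio acceptance test.
import numpy as np

def metropolis(log_like, x0, step, n_iter, rng):
    x = np.asarray(x0, float)
    ll = log_like(x)
    chain, accepted = [], 0
    for _ in range(n_iter):
        prop = x + step * rng.normal(size=x.shape)  # symmetric proposal
        ll_prop = log_like(prop)
        if np.log(rng.uniform()) < ll_prop - ll:    # flat prior cancels
            x, ll = prop, ll_prop
            accepted += 1
        chain.append(x.copy())
    return np.array(chain), accepted / n_iter

# Example: straight-line fit with known unit noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 20)
y = 2.0 * t + 1.0 + rng.normal(size=t.size)
log_like = lambda p: -0.5 * np.sum((y - (p[0] * t + p[1])) ** 2)
chain, rate = metropolis(log_like, [0.0, 0.0], 0.2, 20000, rng)
print(chain[10000:].mean(axis=0), rate)
```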
NASA Astrophysics Data System (ADS)
Ren, Huiying; Ray, Jaideep; Hou, Zhangshuan; Huang, Maoyi; Bao, Jie; Swiler, Laura
2017-12-01
In this study we developed an efficient Bayesian inversion framework for interpreting marine seismic Amplitude Versus Angle and Controlled-Source Electromagnetic data for marine reservoir characterization. The framework uses a multi-chain Markov-chain Monte Carlo sampler, which is a hybrid of DiffeRential Evolution Adaptive Metropolis and Adaptive Metropolis samplers. The inversion framework is tested by estimating reservoir-fluid saturations and porosity based on marine seismic and Controlled-Source Electromagnetic data. The multi-chain Markov-chain Monte Carlo is scalable in terms of the number of chains, and is useful for computationally demanding Bayesian model calibration in scientific and engineering problems. As a demonstration, the approach is used to efficiently and accurately estimate the porosity and saturations in a representative layered synthetic reservoir. The results indicate that the seismic Amplitude Versus Angle and Controlled-Source Electromagnetic joint inversion provides better estimation of reservoir saturations than the seismic Amplitude Versus Angle only inversion, especially for the parameters in deep layers. The performance of the inversion approach for various levels of noise in observational data was evaluated - reasonable estimates can be obtained with noise levels up to 25%. Sampling efficiency due to the use of multiple chains was also checked and was found to have almost linear scalability.
Teaching Markov Chain Monte Carlo: Revealing the Basic Ideas behind the Algorithm
ERIC Educational Resources Information Center
Stewart, Wayne; Stewart, Sepideh
2014-01-01
For many scientists, researchers and students Markov chain Monte Carlo (MCMC) simulation is an important and necessary tool to perform Bayesian analyses. The simulation is often presented as a mathematical algorithm and then translated into an appropriate computer program. However, this can result in overlooking the fundamental and deeper…
Markov Chain Monte Carlo Methods for Bayesian Data Analysis in Astronomy
NASA Astrophysics Data System (ADS)
Sharma, Sanjib
2017-08-01
Markov Chain Monte Carlo based Bayesian data analysis has now become the method of choice for analyzing and interpreting data in almost all disciplines of science. In astronomy, over the last decade, we have also seen a steady increase in the number of papers that employ Monte Carlo based Bayesian analysis. New, efficient Monte Carlo based methods are continuously being developed and explored. In this review, we first explain the basics of Bayesian theory and discuss how to set up data analysis problems within this framework. Next, we provide an overview of various Monte Carlo based methods for performing Bayesian data analysis. Finally, we discuss advanced ideas that enable us to tackle complex problems and thus hold great promise for the future. We also distribute downloadable computer software (available at https://github.com/sanjibs/bmcmc/ ) that implements some of the algorithms and examples discussed here.
NASA Astrophysics Data System (ADS)
Wang, Hongrui; Wang, Cheng; Wang, Ying; Gao, Xiong; Yu, Chen
2017-06-01
This paper presents a Bayesian approach using the Metropolis-Hastings Markov Chain Monte Carlo algorithm and applies it to daily river flow rate forecasting and uncertainty quantification for the Zhujiachuan River, using data collected from Qiaotoubao Gage Station and 13 other gage stations in the Zhujiachuan watershed in China. The proposed method is also compared with conventional maximum likelihood estimation (MLE) for parameter estimation and quantification of the associated uncertainties. While the Bayesian method performs similarly in estimating the mean daily flow rate, it outperforms the conventional MLE method in uncertainty quantification, providing a narrower credible interval than the MLE confidence interval, and thus a more precise estimate, by using the related information from regional gage stations. The Bayesian MCMC method might therefore be more favorable for uncertainty analysis and risk management.
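The interval comparison described above can be illustrated in miniature. The sketch below contrasts a large-sample MLE confidence interval with a percentile credible interval read off posterior samples, for a normal mean rather than the authors' river-flow model; in practice the posterior draws would come from the Metropolis-Hastings chain.

```python
# Toy comparison: 95% MLE confidence interval vs. 95% Bayesian credible
# interval for a normal mean. Posterior draws are generated directly here;
# in a real analysis they would be the output of an MCMC sampler.
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(3.0, 2.0, size=50)

# MLE and its normal-approximation 95% confidence interval
mle = data.mean()
se = data.std(ddof=1) / np.sqrt(data.size)
ci = (mle - 1.96 * se, mle + 1.96 * se)

# Posterior draws for the mean (flat prior, known-variance approximation)
post = rng.normal(mle, se, size=100_000)
cred = tuple(np.percentile(post, [2.5, 97.5]))
print("MLE CI:", ci)
print("95% credible interval:", cred)
```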
ERIC Educational Resources Information Center
Kieftenbeld, Vincent; Natesan, Prathiba
2012-01-01
Markov chain Monte Carlo (MCMC) methods enable a fully Bayesian approach to parameter estimation of item response models. In this simulation study, the authors compared the recovery of graded response model parameters using marginal maximum likelihood (MML) and Gibbs sampling (MCMC) under various latent trait distributions, test lengths, and…
El Yazid Boudaren, Mohamed; Monfrini, Emmanuel; Pieczynski, Wojciech; Aïssani, Amar
2014-11-01
Hidden Markov chains have been shown to be inadequate for data modeling under some complex conditions. In this work, we address the problem of statistical modeling of phenomena involving two heterogeneous system states. Such phenomena may arise in biology or communications, among other fields. Namely, we consider that a sequence of meaningful words is to be searched for within a whole observation that also contains arbitrary one-by-one symbols. Moreover, a word may be interrupted at some site and carried on later. Applying plain hidden Markov chains to such data, while ignoring their specificity, yields unsatisfactory results. The phasic triplet Markov chain, proposed in this paper, overcomes this difficulty by means of an auxiliary underlying process, in accordance with triplet Markov chain theory. Related Bayesian restoration techniques and parameter estimation procedures for the new model are then described. Finally, to assess the performance of the proposed model against the conventional hidden Markov chain model, experiments are conducted on synthetic and real data.
NASA Astrophysics Data System (ADS)
Astuti, Ani Budi; Iriawan, Nur; Irhamah; Kuswanto, Heri
2017-12-01
Bayesian mixture modeling requires identifying the most appropriate number of mixture components, so that the resulting mixture model fits the data in a data-driven way. Reversible Jump Markov Chain Monte Carlo (RJMCMC) combines the reversible jump (RJ) concept with Markov Chain Monte Carlo (MCMC) and has been used by several researchers to identify the number of mixture components when that number is not known with certainty. In its application, RJMCMC uses birth/death and split-merge concepts with six types of moves: w updating, θ updating, z updating, hyperparameter β updating, split-merge of components, and birth/death of empty components. The RJMCMC algorithm must be developed according to the case under observation. The purpose of this study is to assess the performance of the developed RJMCMC algorithm in identifying the unknown number of mixture components in Bayesian mixture modeling of microarray data from Indonesia. The results show that the developed RJMCMC algorithm is able to properly identify the number of mixture components in the Bayesian normal mixture model for the Indonesian microarray data, where that number is not known in advance.
ERIC Educational Resources Information Center
Kim, Jee-Seon; Bolt, Daniel M.
2007-01-01
The purpose of this ITEMS module is to provide an introduction to Markov chain Monte Carlo (MCMC) estimation for item response models. A brief description of Bayesian inference is followed by an overview of the various facets of MCMC algorithms, including discussion of prior specification, sampling procedures, and methods for evaluating chain…
MC3: Multi-core Markov-chain Monte Carlo code
NASA Astrophysics Data System (ADS)
Cubillos, Patricio; Harrington, Joseph; Lust, Nate; Foster, AJ; Stemm, Madison; Loredo, Tom; Stevenson, Kevin; Campo, Chris; Hardin, Matt; Hardy, Ryan
2016-10-01
MC3 (Multi-core Markov-chain Monte Carlo) is a Bayesian statistics tool that can be executed from the shell prompt or interactively through the Python interpreter with single- or multiple-CPU parallel computing. It offers Markov-chain Monte Carlo (MCMC) posterior-distribution sampling for several algorithms, Levenberg-Marquardt least-squares optimization, and uniform non-informative, Jeffreys non-informative, or Gaussian-informative priors. MC3 can share the same value among multiple parameters and fix the value of parameters to constant values, and offers Gelman-Rubin convergence testing and correlated-noise estimation with time-averaging or wavelet-based likelihood estimation methods.
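For reference, a minimal version of the Gelman-Rubin diagnostic mentioned above (single-parameter, no chain splitting; MC3's own implementation may differ in details):

```python
# Gelman-Rubin potential scale reduction factor: compares between-chain and
# within-chain variance; values near 1 suggest the chains have converged.
import numpy as np

def gelman_rubin(chains):
    """chains: array of shape (m, n) -- m chains, n samples each."""
    chains = np.asarray(chains, float)
    m, n = chains.shape
    means = chains.mean(axis=1)
    B = n * means.var(ddof=1)               # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()   # within-chain variance
    var_hat = (n - 1) / n * W + B / n       # pooled posterior variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(2)
chains = rng.normal(size=(4, 1000))         # well-mixed chains -> R-hat near 1
print(gelman_rubin(chains))
```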
Markov Chain Monte Carlo Inference of Parametric Dictionaries for Sparse Bayesian Approximations
Chaspari, Theodora; Tsiartas, Andreas; Tsilifis, Panagiotis; Narayanan, Shrikanth
2016-01-01
Parametric dictionaries can increase the ability of sparse representations to meaningfully capture and interpret the underlying signal information, such as encountered in biomedical problems. Given a mapping function from the atom parameter space to the actual atoms, we propose a sparse Bayesian framework for learning the atom parameters, because of its ability to provide full posterior estimates, take uncertainty into account and generalize on unseen data. Inference is performed with Markov Chain Monte Carlo, which uses block sampling to generate the variables of the Bayesian problem. Since the parameterization of dictionary atoms results in posteriors that cannot be analytically computed, we use a Metropolis-Hastings-within-Gibbs framework, in which variables with closed-form posteriors are generated with the Gibbs sampler, while the remaining ones are generated with Metropolis-Hastings steps from appropriate candidate-generating densities. We further show that the corresponding Markov Chain is uniformly ergodic, ensuring its convergence to a stationary distribution independently of the initial state. Results on synthetic data and real biomedical signals indicate that our approach offers advantages in terms of signal reconstruction compared to previously proposed Steepest Descent and Equiangular Tight Frame methods. This paper demonstrates the ability of Bayesian learning to generate parametric dictionaries that can reliably represent the exemplar data and provides the foundation towards inferring the entire variable set of the sparse approximation problem for signal denoising, adaptation and other applications. PMID:28649173
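A compact sketch of the Metropolis-within-Gibbs pattern described above, on a toy normal model rather than the paper's dictionary-learning posterior: the location has a closed-form conditional (Gibbs draw), while the scale is updated with a random-walk Metropolis step. Priors and tuning are illustrative assumptions.

```python
# Metropolis-within-Gibbs: alternate an exact conditional draw for mu with
# a Metropolis-Hastings update for sigma, whose conditional has no closed form.
import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(2.0, 1.5, size=40)
n, ybar = y.size, y.mean()

def log_post_sigma(sigma, mu):
    if sigma <= 0:
        return -np.inf
    # likelihood times a half-Cauchy(1) prior on sigma
    return (-n * np.log(sigma) - 0.5 * np.sum((y - mu) ** 2) / sigma ** 2
            - np.log(1 + sigma ** 2))

mu, sigma = 0.0, 1.0
samples = []
for _ in range(20000):
    # Gibbs step: mu | sigma, y ~ N(ybar, sigma^2 / n) under a flat prior on mu
    mu = rng.normal(ybar, sigma / np.sqrt(n))
    # Metropolis step for sigma with a random-walk proposal
    prop = sigma + 0.2 * rng.normal()
    if np.log(rng.uniform()) < log_post_sigma(prop, mu) - log_post_sigma(sigma, mu):
        sigma = prop
    samples.append((mu, sigma))
print(np.mean(samples[10000:], axis=0))
```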
Modelling maximum river flow by using Bayesian Markov Chain Monte Carlo
NASA Astrophysics Data System (ADS)
Cheong, R. Y.; Gabda, D.
2017-09-01
Analysis of flood trends is vital, since flooding threatens human life and livelihoods in financial, environmental, and security terms. Data on annual maximum river flows in Sabah were fitted to the generalized extreme value (GEV) distribution. The maximum likelihood estimator (MLE) arises naturally when working with the GEV distribution. However, previous research has shown that MLE provides unstable results, especially for small sample sizes. In this study, we used different Bayesian Markov Chain Monte Carlo (MCMC) samplers based on the Metropolis-Hastings algorithm to estimate the GEV parameters. The Bayesian MCMC method performs statistical inference by estimating parameters from the posterior distribution given by Bayes' theorem. The Metropolis-Hastings algorithm is used to cope with the high-dimensional state space that plain Monte Carlo methods face. This approach also accounts for more of the uncertainty in parameter estimation, which yields better predictions of maximum river flow in Sabah.
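A minimal sketch of the approach described above, fitting GEV parameters to synthetic annual maxima with a random-walk Metropolis-Hastings sampler via scipy; the priors and proposal scales are illustrative assumptions, not those of the study.

```python
# Metropolis-Hastings for GEV parameters (mu, log sigma, xi). Note scipy's
# genextreme uses shape c = -xi relative to the common GEV convention.
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(4)
maxima = genextreme.rvs(c=-0.1, loc=100, scale=20, size=60, random_state=rng)

def log_post(theta):
    mu, log_sig, xi = theta
    lp = -0.5 * (xi / 0.5) ** 2   # N(0, 0.5^2) prior on xi; flat on mu, log sigma
    ll = genextreme.logpdf(maxima, c=-xi, loc=mu, scale=np.exp(log_sig)).sum()
    return lp + ll if np.isfinite(ll) else -np.inf

theta = np.array([np.median(maxima), np.log(maxima.std()), 0.0])
lp = log_post(theta)
chain = []
for _ in range(20000):
    prop = theta + rng.normal(scale=[2.0, 0.05, 0.05])
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    chain.append(theta.copy())
print(np.mean(chain[10000:], axis=0))  # posterior means of (mu, log sigma, xi)
```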
Geostatistical models are appropriate for spatially distributed data measured at irregularly spaced locations. We propose an efficient Markov chain Monte Carlo (MCMC) algorithm for fitting Bayesian geostatistical models with substantial numbers of unknown parameters to sizable...
2017-09-01
This dissertation explores the efficacy of statistical post-processing methods downstream of dynamical model components, using a hierarchical multivariate Bayesian approach. Keywords: Bayesian hierarchical modeling, Markov chain Monte Carlo methods, Metropolis algorithm, machine learning, atmospheric prediction.
A Bayesian Approach to Person Fit Analysis in Item Response Theory Models. Research Report.
ERIC Educational Resources Information Center
Glas, Cees A. W.; Meijer, Rob R.
A Bayesian approach to the evaluation of person fit in item response theory (IRT) models is presented. In a posterior predictive check, the observed value on a discrepancy variable is positioned in its posterior distribution. In a Bayesian framework, a Markov Chain Monte Carlo procedure can be used to generate samples of the posterior distribution…
Bayesian models: A statistical primer for ecologists
Hobbs, N. Thompson; Hooten, Mevin B.
2015-01-01
Bayesian modeling has become an indispensable tool for ecological research because it is uniquely suited to deal with complexity in a statistically coherent way. This textbook provides a comprehensive and accessible introduction to the latest Bayesian methods, in language ecologists can understand. Unlike other books on the subject, this one emphasizes the principles behind the computations, giving ecologists a big-picture understanding of how to implement this powerful statistical approach. Bayesian Models is an essential primer for non-statisticians. It begins with a definition of probability and develops a step-by-step sequence of connected ideas, including basic distribution theory, network diagrams, hierarchical models, Markov chain Monte Carlo, and inference from single and multiple models. This unique book places less emphasis on computer coding, favoring instead a concise presentation of the mathematical statistics needed to understand how and why Bayesian analysis works. It also explains how to write out properly formulated hierarchical Bayesian models and use them in computing, research papers, and proposals. This primer enables ecologists to understand the statistical principles behind Bayesian modeling and apply them to research, teaching, policy, and management. The book presents the mathematical and statistical foundations of Bayesian modeling in language accessible to non-statisticians; covers basic distribution theory, network diagrams, hierarchical models, Markov chain Monte Carlo, and more; deemphasizes computer coding in favor of basic principles; and explains how to write out properly factored statistical expressions representing Bayesian models.
Bayesian Estimation of the Logistic Positive Exponent IRT Model
ERIC Educational Resources Information Center
Bolfarine, Heleno; Bazan, Jorge Luis
2010-01-01
A Bayesian inference approach using Markov Chain Monte Carlo (MCMC) is developed for the logistic positive exponent (LPE) model proposed by Samejima and for a new skewed Logistic Item Response Theory (IRT) model, named Reflection LPE model. Both models lead to asymmetric item characteristic curves (ICC) and can be appropriate because a symmetric…
A baker's dozen of new particle flows for nonlinear filters, Bayesian decisions and transport
NASA Astrophysics Data System (ADS)
Daum, Fred; Huang, Jim
2015-05-01
We describe a baker's dozen of new particle flows to compute Bayes' rule for nonlinear filters, Bayesian decisions and learning as well as transport. Several of these new flows were inspired by transport theory, but others were inspired by physics or statistics or Markov chain Monte Carlo methods.
Burroughs, N J; Pillay, D; Mutimer, D
1999-01-01
Bayesian analysis using a virus dynamics model is demonstrated to facilitate hypothesis testing of patterns in clinical time-series. Our Markov chain Monte Carlo implementation demonstrates that the viraemia time-series observed in two sets of hepatitis B patients on antiviral (lamivudine) therapy, chronic carriers and liver transplant patients, are significantly different, overcoming clinical trial design differences that question the validity of non-parametric tests. We show that lamivudine-resistant mutants grow faster in transplant patients than in chronic carriers, which probably explains the differences in emergence times and failure rates between these two sets of patients. Incorporation of dynamic models into Bayesian parameter analysis is of general applicability in medical statistics. PMID:10643081
A Gibbs sampler for Bayesian analysis of site-occupancy data
Dorazio, Robert M.; Rodriguez, Daniel Taylor
2012-01-01
1. A Bayesian analysis of site-occupancy data containing covariates of species occurrence and species detection probabilities is usually completed using Markov chain Monte Carlo methods in conjunction with software programs that can implement those methods for any statistical model, not just site-occupancy models. Although these software programs are quite flexible, considerable experience is often required to specify a model and to initialize the Markov chain so that summaries of the posterior distribution can be estimated efficiently and accurately. 2. As an alternative to these programs, we develop a Gibbs sampler for Bayesian analysis of site-occupancy data that include covariates of species occurrence and species detection probabilities. This Gibbs sampler is based on a class of site-occupancy models in which probabilities of species occurrence and detection are specified as probit-regression functions of site- and survey-specific covariate measurements. 3. To illustrate the Gibbs sampler, we analyse site-occupancy data of the blue hawker, Aeshna cyanea (Odonata, Aeshnidae), a common dragonfly species in Switzerland. Our analysis includes a comparison of results based on Bayesian and classical (non-Bayesian) methods of inference. We also provide code (based on the R software program) for conducting Bayesian and classical analyses of site-occupancy data.
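The core ingredient of such a probit-regression Gibbs sampler is Albert-Chib data augmentation. The sketch below shows it for plain probit regression; the actual site-occupancy sampler layers a latent occupancy state on top of this, which is omitted here. The simulated data and prior are assumptions.

```python
# Albert-Chib Gibbs sampler for probit regression: latent normal variables z
# are drawn truncated by the observed 0/1 responses, after which beta has a
# closed-form multivariate normal conditional.
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(5)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.3, 1.0])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(float)

B0_inv = np.eye(2) / 100.0             # N(0, 100 I) prior on beta
V = np.linalg.inv(B0_inv + X.T @ X)    # posterior covariance (unit latent variance)
L = np.linalg.cholesky(V)

beta = np.zeros(2)
draws = []
for _ in range(5000):
    # 1. z_i ~ N(x_i beta, 1), truncated to (0, inf) if y_i = 1, (-inf, 0) if y_i = 0
    mu = X @ beta
    lo = np.where(y == 1, 0.0, -np.inf)
    hi = np.where(y == 1, np.inf, 0.0)
    z = truncnorm.rvs(lo - mu, hi - mu, loc=mu, scale=1.0, random_state=rng)
    # 2. beta | z ~ N(V X'z, V)
    beta = V @ (X.T @ z) + L @ rng.normal(size=2)
    draws.append(beta.copy())
print(np.mean(draws[2500:], axis=0), beta_true)
```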
Observation uncertainty in reversible Markov chains.
Metzner, Philipp; Weber, Marcus; Schütte, Christof
2010-09-01
In many applications one is interested in finding a simplified model which captures the essential dynamical behavior of a real life process. If the essential dynamics can be assumed to be (approximately) memoryless then a reasonable choice for a model is a Markov model whose parameters are estimated by means of Bayesian inference from an observed time series. We propose an efficient Monte Carlo Markov chain framework to assess the uncertainty of the Markov model and related observables. The derived Gibbs sampler allows for sampling distributions of transition matrices subject to reversibility and/or sparsity constraints. The performance of the suggested sampling scheme is demonstrated and discussed for a variety of model examples. The uncertainty analysis of functions of the Markov model under investigation is discussed in application to the identification of conformations of the trialanine molecule via Robust Perron Cluster Analysis (PCCA+).
Fuzzy Markov random fields versus chains for multispectral image segmentation.
Salzenstein, Fabien; Collet, Christophe
2006-11-01
This paper deals with a comparison of recent statistical models based on fuzzy Markov random fields and chains for multispectral image segmentation. The fuzzy scheme takes into account discrete and continuous classes which model the imprecision of the hidden data. In this framework, we assume the dependence between bands and we express the general model for the covariance matrix. A fuzzy Markov chain model is developed in an unsupervised way. This method is compared with the fuzzy Markovian field model previously proposed by one of the authors. The segmentation task is processed with Bayesian tools, such as the well-known MPM (Mode of Posterior Marginals) criterion. Our goal is to compare the robustness and rapidity for both methods (fuzzy Markov fields versus fuzzy Markov chains). Indeed, such fuzzy-based procedures seem to be a good answer, e.g., for astronomical observations when the patterns present diffuse structures. Moreover, these approaches allow us to process missing data in one or several spectral bands which correspond to specific situations in astronomy. To validate both models, we perform and compare the segmentation on synthetic images and raw multispectral astronomical data.
Inference of epidemiological parameters from household stratified data
Walker, James N.; Ross, Joshua V.
2017-01-01
We consider a continuous-time Markov chain model of SIR disease dynamics with two levels of mixing. For this so-called stochastic households model, we provide two methods for inferring the model parameters—governing within-household transmission, recovery, and between-household transmission—from data of the day upon which each individual became infectious and the household in which each infection occurred, as might be available from First Few Hundred studies. Each method is a form of Bayesian Markov Chain Monte Carlo that allows us to calculate a joint posterior distribution for all parameters and hence the household reproduction number and the early growth rate of the epidemic. The first method performs exact Bayesian inference using a standard data-augmentation approach; the second performs approximate Bayesian inference based on a likelihood approximation derived from branching processes. These methods are compared for computational efficiency and posteriors from each are compared. The branching process is shown to be a good approximation and remains computationally efficient as the amount of data is increased. PMID:29045456
Probability, statistics, and computational science.
Beerenwinkel, Niko; Siebourg, Juliane
2012-01-01
In this chapter, we review basic concepts from probability theory and computational statistics that are fundamental to evolutionary genomics. We provide a very basic introduction to statistical modeling and discuss general principles, including maximum likelihood and Bayesian inference. Markov chains, hidden Markov models, and Bayesian network models are introduced in more detail as they occur frequently and in many variations in genomics applications. In particular, we discuss efficient inference algorithms and methods for learning these models from partially observed data. Several simple examples are given throughout the text, some of which point to models that are discussed in more detail in subsequent chapters.
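As a concrete example of the hidden-Markov-model inference algorithms this chapter surveys, here is a minimal scaled forward algorithm, which computes the likelihood of an observation sequence by summing over hidden state paths (toy parameters assumed):

```python
# Scaled forward algorithm for a discrete HMM: running per-step normalization
# avoids numerical underflow while accumulating the log-likelihood.
import numpy as np

def forward_loglik(pi, A, B, obs):
    """pi: initial probs (K,), A: transition matrix (K,K), B: emissions (K,M)."""
    alpha = pi * B[:, obs[0]]
    c = alpha.sum()
    loglik = np.log(c)
    alpha /= c
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        c = alpha.sum()              # scaling factor
        loglik += np.log(c)
        alpha /= c
    return loglik

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(forward_loglik(pi, A, B, [0, 0, 1, 0, 1]))
```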
Bayesian inference based on dual generalized order statistics from the exponentiated Weibull model
NASA Astrophysics Data System (ADS)
Al Sobhi, Mashail M.
2015-02-01
Bayesian estimates of the two parameters and the reliability function of the exponentiated Weibull model are obtained based on dual generalized order statistics (DGOS). Also, Bayesian prediction bounds for future DGOS from the exponentiated Weibull model are obtained. Both symmetric and asymmetric loss functions are considered for the Bayesian computations. Markov chain Monte Carlo (MCMC) methods are used for computing the Bayes estimates and prediction bounds. The results have been specialized to the lower record values. Comparisons are made between Bayesian and maximum likelihood estimators via Monte Carlo simulation.
Model-based Clustering of Categorical Time Series with Multinomial Logit Classification
NASA Astrophysics Data System (ADS)
Frühwirth-Schnatter, Sylvia; Pamminger, Christoph; Winter-Ebmer, Rudolf; Weber, Andrea
2010-09-01
A common problem in many areas of applied statistics is to identify groups of similar time series in a panel of time series. However, distance-based clustering methods cannot easily be extended to time series data, where an appropriate distance measure is rather difficult to define, particularly for discrete-valued time series. Markov chain clustering, proposed by Pamminger and Frühwirth-Schnatter [6], is an approach for clustering discrete-valued time series obtained by observing a categorical variable with several states. This model-based clustering method is based on finite mixtures of first-order time-homogeneous Markov chain models. In order to further explain group membership, we present an extension to the approach of Pamminger and Frühwirth-Schnatter [6] by formulating a probabilistic model for the latent group indicators within the Bayesian classification rule, using a multinomial logit model. The parameters are estimated for a fixed number of clusters within a Bayesian framework using a Markov chain Monte Carlo (MCMC) sampling scheme representing a (full) Gibbs-type sampler which involves only draws from standard distributions. Finally, an application to a panel of Austrian wage-mobility data is presented, which leads to an interesting segmentation of the Austrian labour market.
A General and Flexible Approach to Estimating the Social Relations Model Using Bayesian Methods
ERIC Educational Resources Information Center
Ludtke, Oliver; Robitzsch, Alexander; Kenny, David A.; Trautwein, Ulrich
2013-01-01
The social relations model (SRM) is a conceptual, methodological, and analytical approach that is widely used to examine dyadic behaviors and interpersonal perception within groups. This article introduces a general and flexible approach to estimating the parameters of the SRM that is based on Bayesian methods using Markov chain Monte Carlo…
PyMC: Bayesian Stochastic Modelling in Python
Patil, Anand; Huard, David; Fonnesbeck, Christopher J.
2010-01-01
This user guide describes a Python package, PyMC, that allows users to efficiently code a probabilistic model and draw samples from its posterior distribution using Markov chain Monte Carlo techniques. PMID:21603108
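A minimal example in the spirit of the package described above. Note that the syntax shown is that of modern PyMC (v4/v5), which differs from the PyMC 2 API documented in this 2010 guide; the model and data are illustrative.

```python
# Fit a normal model with unknown mean and scale, sampling the posterior
# with PyMC's default MCMC machinery.
import numpy as np
import pymc as pm

data = np.random.default_rng(6).normal(1.0, 2.0, size=100)

with pm.Model():
    mu = pm.Normal("mu", mu=0.0, sigma=10.0)
    sigma = pm.HalfNormal("sigma", sigma=10.0)
    pm.Normal("obs", mu=mu, sigma=sigma, observed=data)
    idata = pm.sample(1000, tune=1000, chains=2)

print(float(idata.posterior["mu"].mean()))
```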
Markov Chain Monte Carlo Bayesian Learning for Neural Networks
NASA Technical Reports Server (NTRS)
Goodrich, Michael S.
2011-01-01
Conventional training methods for neural networks involve starting at a random location in the solution space of the network weights, navigating an error hypersurface to reach a minimum, and sometimes using stochastic techniques (e.g., genetic algorithms) to avoid entrapment in a local minimum. It is further typically necessary to preprocess the data (e.g., normalization) to keep the training algorithm on course. Conversely, Bayesian learning is an epistemological approach concerned with formally updating the plausibility of competing candidate hypotheses, thereby obtaining a posterior distribution for the network weights conditioned on the available data and a prior distribution. In this paper, we develop a powerful methodology for estimating the full residual uncertainty in network weights, and therefore in network predictions, by using a modified Jeffreys prior combined with a Metropolis Markov Chain Monte Carlo method.
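A toy sketch of the idea in the abstract: treat all weights of a small network as one parameter vector and sample their posterior with random-walk Metropolis. A Gaussian prior stands in here for the paper's modified Jeffreys prior, and the architecture, noise level, and step size are assumptions.

```python
# Bayesian learning for a tiny one-hidden-layer network: Metropolis sampling
# of the weight posterior yields a predictive uncertainty band, not a single
# point estimate of the weights.
import numpy as np

rng = np.random.default_rng(7)
x = np.linspace(-2, 2, 40).reshape(-1, 1)
y = np.sin(x) + 0.1 * rng.normal(size=x.shape)

H = 5  # hidden units; weights = [W1 (1xH), b1 (H), w2 (H), b2] -> 3H+1 params
def unpack(w):
    return w[:H].reshape(1, H), w[H:2 * H], w[2 * H:3 * H], w[3 * H]

def predict(w):
    W1, b1, w2, b2 = unpack(w)
    return np.tanh(x @ W1 + b1) @ w2 + b2

def log_post(w, noise=0.1, prior_sd=2.0):
    resid = y.ravel() - predict(w)
    return (-0.5 * np.sum(resid ** 2) / noise ** 2      # Gaussian likelihood
            - 0.5 * np.sum(w ** 2) / prior_sd ** 2)     # Gaussian prior on weights

w = 0.1 * rng.normal(size=3 * H + 1)
lp = log_post(w)
preds = []
for i in range(30000):
    prop = w + 0.02 * rng.normal(size=w.size)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        w, lp = prop, lp_prop
    if i > 15000 and i % 50 == 0:
        preds.append(predict(w))
band = np.percentile(preds, [5, 95], axis=0)   # pointwise predictive band
print(band.shape)
```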
NASA Astrophysics Data System (ADS)
Graham, Eleanor; Cuore Collaboration
2017-09-01
The CUORE experiment is a large-scale bolometric detector seeking to observe the never-before-seen process of neutrinoless double beta decay. Predictions of CUORE's sensitivity to neutrinoless double beta decay allow for an understanding of the half-life ranges that the detector can probe and an evaluation of the relative importance of different detector parameters. Currently, CUORE uses a Bayesian analysis implemented in BAT, which uses Metropolis-Hastings Markov Chain Monte Carlo, for its sensitivity studies. My work evaluates the viability and potential improvements of switching the Bayesian analysis to Hamiltonian Monte Carlo, realized through the program Stan and its Morpho interface. I demonstrate that the BAT study can be successfully recreated in Stan, and perform a detailed comparison between the results and computation times of the two methods.
Bayesian Analysis of Biogeography when the Number of Areas is Large
Landis, Michael J.; Matzke, Nicholas J.; Moore, Brian R.; Huelsenbeck, John P.
2013-01-01
Historical biogeography is increasingly studied from an explicitly statistical perspective, using stochastic models to describe the evolution of species range as a continuous-time Markov process of dispersal between and extinction within a set of discrete geographic areas. The main constraint of these methods is the computational limit on the number of areas that can be specified. We propose a Bayesian approach for inferring biogeographic history that extends the application of biogeographic models to the analysis of more realistic problems that involve a large number of areas. Our solution is based on a “data-augmentation” approach, in which we first populate the tree with a history of biogeographic events that is consistent with the observed species ranges at the tips of the tree. We then calculate the likelihood of a given history by adopting a mechanistic interpretation of the instantaneous-rate matrix, which specifies both the exponential waiting times between biogeographic events and the relative probabilities of each biogeographic change. We develop this approach in a Bayesian framework, marginalizing over all possible biogeographic histories using Markov chain Monte Carlo (MCMC). Besides dramatically increasing the number of areas that can be accommodated in a biogeographic analysis, our method allows the parameters of a given biogeographic model to be estimated and different biogeographic models to be objectively compared. Our approach is implemented in the program, BayArea. [ancestral area analysis; Bayesian biogeographic inference; data augmentation; historical biogeography; Markov chain Monte Carlo.] PMID:23736102
Markov chain Monte Carlo estimation of quantum states
NASA Astrophysics Data System (ADS)
Diguglielmo, James; Messenger, Chris; Fiurášek, Jaromír; Hage, Boris; Samblowski, Aiko; Schmidt, Tabea; Schnabel, Roman
2009-03-01
We apply a Bayesian data analysis scheme known as the Markov chain Monte Carlo to the tomographic reconstruction of quantum states. This method yields a vector, known as the Markov chain, which contains the full statistical information concerning all reconstruction parameters including their statistical correlations with no a priori assumptions as to the form of the distribution from which it has been obtained. From this vector we can derive, e.g., the marginal distributions and uncertainties of all model parameters, and also of other quantities such as the purity of the reconstructed state. We demonstrate the utility of this scheme by reconstructing the Wigner function of phase-diffused squeezed states. These states possess non-Gaussian statistics and therefore represent a nontrivial case of tomographic reconstruction. We compare our results to those obtained through pure maximum-likelihood and Fisher information approaches.
Honest Importance Sampling with Multiple Markov Chains
Tan, Aixin; Doss, Hani; Hobert, James P.
2017-01-01
Importance sampling is a classical Monte Carlo technique in which a random sample from one probability density, π1, is used to estimate an expectation with respect to another, π. The importance sampling estimator is strongly consistent and, as long as two simple moment conditions are satisfied, it obeys a central limit theorem (CLT). Moreover, there is a simple consistent estimator for the asymptotic variance in the CLT, which makes for routine computation of standard errors. Importance sampling can also be used in the Markov chain Monte Carlo (MCMC) context. Indeed, if the random sample from π1 is replaced by a Harris ergodic Markov chain with invariant density π1, then the resulting estimator remains strongly consistent. There is a price to be paid however, as the computation of standard errors becomes more complicated. First, the two simple moment conditions that guarantee a CLT in the iid case are not enough in the MCMC context. Second, even when a CLT does hold, the asymptotic variance has a complex form and is difficult to estimate consistently. In this paper, we explain how to use regenerative simulation to overcome these problems. Actually, we consider a more general set up, where we assume that Markov chain samples from several probability densities, π1, …, πk, are available. We construct multiple-chain importance sampling estimators for which we obtain a CLT based on regeneration. We show that if the Markov chains converge to their respective target distributions at a geometric rate, then under moment conditions similar to those required in the iid case, the MCMC-based importance sampling estimator obeys a CLT. Furthermore, because the CLT is based on a regenerative process, there is a simple consistent estimator of the asymptotic variance. We illustrate the method with two applications in Bayesian sensitivity analysis. The first concerns one-way random effects models under different priors. The second involves Bayesian variable selection in linear regression, and for this application, importance sampling based on multiple chains enables an empirical Bayes approach to variable selection. PMID:28701855
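The self-normalized importance sampling estimator at the heart of the paper can be sketched as follows. For clarity the draws from π1 are iid rather than a Harris ergodic Markov chain, and both densities are deliberately known only up to constants; the regeneration-based standard errors are omitted.

```python
# Self-normalized importance sampling: samples from pi_1 are reweighted to
# estimate an expectation under pi, with both densities unnormalized, so the
# unknown constants cancel in the weight ratio.
import numpy as np

rng = np.random.default_rng(8)

q1 = lambda x: np.exp(-0.5 * x ** 2)           # pi_1 proportional to N(0,1)
q  = lambda x: np.exp(-0.5 * (x - 1.0) ** 2)   # pi   proportional to N(1,1)
h  = lambda x: x ** 2

x = rng.normal(size=100_000)        # draws from pi_1
w = q(x) / q1(x)                    # unnormalized importance weights
est = np.sum(w * h(x)) / np.sum(w)  # self-normalized estimator
print(est, "(exact value: 2.0)")    # E_pi[x^2] = 1 + 1^2 = 2
```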
Of bugs and birds: Markov Chain Monte Carlo for hierarchical modeling in wildlife research
Link, W.A.; Cam, E.; Nichols, J.D.; Cooch, E.G.
2002-01-01
Markov chain Monte Carlo (MCMC) is a statistical innovation that allows researchers to fit far more complex models to data than is feasible using conventional methods. Despite its widespread use in a variety of scientific fields, MCMC appears to be underutilized in wildlife applications. This may be due to a misconception that MCMC requires the adoption of a subjective Bayesian analysis, or perhaps simply to its lack of familiarity among wildlife researchers. We introduce the basic ideas of MCMC and software BUGS (Bayesian inference using Gibbs sampling), stressing that a simple and satisfactory intuition for MCMC does not require extraordinary mathematical sophistication. We illustrate the use of MCMC with an analysis of the association between latent factors governing individual heterogeneity in breeding and survival rates of kittiwakes (Rissa tridactyla). We conclude with a discussion of the importance of individual heterogeneity for understanding population dynamics and designing management plans.
Population forecasts for Bangladesh, using a Bayesian methodology.
Mahsin, Md; Hossain, Syed Shahadat
2012-12-01
Population projection for many developing countries can be quite a challenging task for demographers, mostly due to the lack of sufficient reliable data. The objective of this paper is to present an overview of the existing methods for population forecasting and to propose an alternative based on Bayesian statistics, which combines formal inference with expert judgement. The analysis has been made using the Markov Chain Monte Carlo (MCMC) technique for Bayesian methodology available with the software WinBUGS. Convergence diagnostic techniques available with the WinBUGS software have been applied to ensure the convergence of the chains necessary for the implementation of MCMC. The Bayesian approach allows for the use of observed data and expert judgements by means of appropriate priors, and more realistic population forecasts, along with the associated uncertainty, have been obtained.
NASA Astrophysics Data System (ADS)
Gbedo, Yémalin Gabin; Mangin-Brinet, Mariane
2017-07-01
We present a new procedure to determine parton distribution functions (PDFs), based on Markov chain Monte Carlo (MCMC) methods. The aim of this paper is to show that we can replace the standard χ2 minimization by procedures grounded in statistical methods, and in Bayesian inference in particular, thus offering additional insight into the rich field of PDF determination. After a basic introduction to these techniques, we introduce the algorithm we have chosen to implement, namely Hybrid (or Hamiltonian) Monte Carlo. This algorithm, initially developed for lattice QCD, turns out to be very interesting when applied to PDF determination by global analyses; we show that it allows us to circumvent the difficulties due to the high dimensionality of the problem, in particular concerning the acceptance rate. A first feasibility study is performed and presented, which indicates that Markov chain Monte Carlo can successfully be applied to the extraction of PDFs and of their uncertainties.
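A minimal sketch of the Hamiltonian (hybrid) Monte Carlo update described above, on a toy correlated Gaussian target rather than a PDF global analysis: leapfrog integration of Hamiltonian dynamics followed by a Metropolis accept/reject test. Step size and trajectory length are illustrative.

```python
# One HMC step: resample momenta, integrate Hamiltonian dynamics with the
# leapfrog scheme, then accept or reject based on the change in total energy.
import numpy as np

rng = np.random.default_rng(9)
Sigma_inv = np.linalg.inv(np.array([[1.0, 0.8], [0.8, 1.0]]))
log_post = lambda q: -0.5 * q @ Sigma_inv @ q
grad = lambda q: -Sigma_inv @ q

def hmc_step(q, eps=0.15, n_leap=20):
    p = rng.normal(size=q.size)               # fresh momenta
    q_new, p_new = q.copy(), p.copy()
    p_new += 0.5 * eps * grad(q_new)          # initial half step
    for _ in range(n_leap - 1):
        q_new += eps * p_new
        p_new += eps * grad(q_new)
    q_new += eps * p_new
    p_new += 0.5 * eps * grad(q_new)          # final half step
    # Metropolis correction for numerical integration error
    dH = (log_post(q_new) - 0.5 * p_new @ p_new) - (log_post(q) - 0.5 * p @ p)
    return q_new if np.log(rng.uniform()) < dH else q

q = np.zeros(2)
chain = []
for _ in range(5000):
    q = hmc_step(q)
    chain.append(q)
print(np.cov(np.array(chain[1000:]).T))   # should approach the target covariance
```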
Bayesian tomography by interacting Markov chains
NASA Astrophysics Data System (ADS)
Romary, T.
2017-12-01
In seismic tomography, we seek to determine the velocity of the underground from noisy first-arrival travel time observations. In most situations, this is an ill-posed inverse problem that admits several imperfect solutions. Given an a priori distribution over the parameters of the velocity model, the Bayesian formulation allows us to state this problem as a probabilistic one, with a solution in the form of a posterior distribution. The posterior distribution is generally high dimensional and may exhibit multimodality. Moreover, as it is known only up to a constant, the only sensible way to address this problem is to try to generate simulations from the posterior. The natural tools to perform these simulations are Markov chain Monte Carlo (MCMC) methods. Classical implementations of MCMC algorithms generally suffer from slow mixing: the generated states are slow to enter the stationary regime, that is, to fit the observations, and when one mode of the posterior is eventually identified, it may become difficult to visit others. Using a varying temperature parameter that relaxes the constraint on the data may help to enter the stationary regime. Besides, the sequential nature of MCMC makes these methods ill fitted to parallel implementation. Running a large number of chains in parallel may be suboptimal, as the information gathered by each chain is not mutualized. Parallel tempering (PT) can be seen as a first attempt to make parallel chains at different temperatures communicate, but only exchanging information between current states. In this talk, I will show that PT actually belongs to a general class of interacting Markov chain algorithms. I will also show that this class enables the design of interacting schemes that can take advantage of the whole history of the chain, by authorizing exchanges toward already visited states. The algorithms will be illustrated with toy examples and an application to first-arrival traveltime tomography.
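A toy sketch of the interacting-chains idea in its simplest form, parallel tempering on a bimodal one-dimensional target: chains at several temperatures make local Metropolis moves and periodically propose state swaps, letting the cold chain escape local modes. The temperature ladder and target are assumptions.

```python
# Parallel tempering: local Metropolis moves at each temperature plus swap
# proposals between adjacent temperatures, accepted with the standard ratio.
import numpy as np

rng = np.random.default_rng(10)
log_pi = lambda x: np.logaddexp(-0.5 * (x + 3) ** 2, -0.5 * (x - 3) ** 2)
temps = [1.0, 2.0, 4.0, 8.0]          # temperature ladder
x = np.zeros(len(temps))
cold = []
for it in range(50000):
    # local Metropolis move for each tempered chain
    for j, T in enumerate(temps):
        prop = x[j] + rng.normal()
        if np.log(rng.uniform()) < (log_pi(prop) - log_pi(x[j])) / T:
            x[j] = prop
    # propose a swap between a random adjacent pair
    j = rng.integers(len(temps) - 1)
    a = (1 / temps[j] - 1 / temps[j + 1]) * (log_pi(x[j + 1]) - log_pi(x[j]))
    if np.log(rng.uniform()) < a:
        x[j], x[j + 1] = x[j + 1], x[j]
    cold.append(x[0])
print(np.mean(np.array(cold[10000:]) > 0))   # near 0.5 if both modes are visited
```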
ERIC Educational Resources Information Center
Chung, Hwan; Anthony, James C.
2013-01-01
This article presents a multiple-group latent class-profile analysis (LCPA) by taking a Bayesian approach in which a Markov chain Monte Carlo simulation is employed to achieve more robust estimates for latent growth patterns. This article describes and addresses a label-switching problem that involves the LCPA likelihood function, which has…
Bayesian posterior distributions without Markov chains.
Cole, Stephen R; Chu, Haitao; Greenland, Sander; Hamra, Ghassan; Richardson, David B
2012-03-01
Bayesian posterior parameter distributions are often simulated using Markov chain Monte Carlo (MCMC) methods. However, MCMC methods are not always necessary and do not help the uninitiated understand Bayesian inference. As a bridge to understanding Bayesian inference, the authors illustrate a transparent rejection sampling method. In example 1, they illustrate rejection sampling using 36 cases and 198 controls from a case-control study (1976-1983) assessing the relation between residential exposure to magnetic fields and the development of childhood cancer. Results from rejection sampling (odds ratio (OR) = 1.69, 95% posterior interval (PI): 0.57, 5.00) were similar to MCMC results (OR = 1.69, 95% PI: 0.58, 4.95) and approximations from data-augmentation priors (OR = 1.74, 95% PI: 0.60, 5.06). In example 2, the authors apply rejection sampling to a cohort study of 315 human immunodeficiency virus seroconverters (1984-1998) to assess the relation between viral load after infection and 5-year incidence of acquired immunodeficiency syndrome, adjusting for (continuous) age at seroconversion and race. In this more complex example, rejection sampling required a notably longer run time than MCMC sampling but remained feasible and again yielded similar results. The transparency of the proposed approach comes at a price of being less broadly applicable than MCMC.
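The rejection sampler the authors describe can be sketched on a toy binomial problem (the counts below are illustrative, not the study's data): draw parameters from the prior and accept each with probability proportional to its likelihood.

```python
# Transparent rejection sampling for a posterior: sample theta from the
# prior, accept with probability likelihood(theta) / max likelihood.
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(11)
y, n = 7, 20                                   # illustrative counts

theta = rng.uniform(0, 1, size=200_000)        # draws from a flat prior
like = binom.pmf(y, n, theta)
accept = rng.uniform(size=theta.size) < like / like.max()
post = theta[accept]                           # draws from the posterior
print(post.mean(), np.percentile(post, [2.5, 97.5]))
```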
Bayesian seismic tomography by parallel interacting Markov chains
NASA Astrophysics Data System (ADS)
Gesret, Alexandrine; Bottero, Alexis; Romary, Thomas; Noble, Mark; Desassis, Nicolas
2014-05-01
The velocity field estimated by first-arrival traveltime tomography is commonly used as a starting point for further seismological, mineralogical, tectonic or similar analysis. In order to interpret the results quantitatively, the tomography uncertainty values as well as their spatial distribution are required. The estimated velocity model is obtained through inverse modeling by minimizing an objective function that compares observed and computed traveltimes. This step is often performed by gradient-based optimization algorithms. The major drawback of such local optimization schemes, beyond the possibility of being trapped in a local minimum, is that they do not account for the multiple possible solutions of the inverse problem. They are therefore unable to assess the uncertainties linked to the solution. Within a Bayesian (probabilistic) framework, solving the tomography inverse problem amounts to estimating the posterior probability density function of the velocity model using a global sampling algorithm. Markov chain Monte Carlo (MCMC) methods are known to produce samples of virtually any distribution. In such a Bayesian inversion, the total number of simulations we can afford is closely tied to the computational cost of the forward model. Although fast algorithms have recently been developed for computing first-arrival traveltimes of seismic waves, complete exploration of the posterior distribution of the velocity model is hardly feasible, especially when it is high dimensional and/or multimodal. In the latter case, the chain may even stay stuck in one of the modes. In order to improve the mixing properties of a classical single MCMC chain, we propose to make several Markov chains at different temperatures interact. This method can make efficient use of large CPU clusters without increasing the global computational cost with respect to classical MCMC, and is therefore particularly suited for Bayesian inversion. The exchanges between the chains allow a precise sampling of the high-probability zones of the model space while preventing the chains from ending up stuck in a probability maximum. This approach thus supplies a robust way to analyze the tomography imaging uncertainties. The interacting MCMC approach is illustrated on two synthetic examples of tomography of calibration shots, such as encountered in induced microseismic studies. In the second application, a wavelet-based model parameterization is presented that allows a significant reduction of the dimension of the problem, making the algorithm efficient even for a complex velocity model.
Bayesian linkage and segregation analysis: factoring the problem.
Matthysse, S
2000-01-01
Complex segregation analysis and linkage methods are mathematical techniques for the genetic dissection of complex diseases. They are used to delineate complex modes of familial transmission and to localize putative disease susceptibility loci to specific chromosomal locations. The computational problem of Bayesian linkage and segregation analysis is one of integration in high-dimensional spaces. In this paper, three available techniques for Bayesian linkage and segregation analysis are discussed: Markov Chain Monte Carlo (MCMC), importance sampling, and exact calculation. The contribution of each to the overall integration will be explicitly discussed.
Dorazio, R.M.; Johnson, F.A.
2003-01-01
Bayesian inference and decision theory may be used in the solution of relatively complex problems of natural resource management, owing to recent advances in statistical theory and computing. In particular, Markov chain Monte Carlo algorithms provide a computational framework for fitting models of adequate complexity and for evaluating the expected consequences of alternative management actions. We illustrate these features using an example based on management of waterfowl habitat.
NASA Technical Reports Server (NTRS)
Buntine, Wray L.
1995-01-01
Intelligent systems require software incorporating probabilistic reasoning and, often, learning. Networks provide a framework and methodology for creating this kind of software. This paper introduces network models based on chain graphs with deterministic nodes. Chain graphs are defined as a hierarchical combination of Bayesian and Markov networks. To model learning, plates on chain graphs are introduced to represent independent samples. The paper concludes by discussing various operations that can be performed on chain graphs with plates, either as a simplification process or to generate learning algorithms.
Bayesian analysis of physiologically based toxicokinetic and toxicodynamic models.
Hack, C Eric
2006-04-17
Physiologically based toxicokinetic (PBTK) and toxicodynamic (TD) models of bromate in animals and humans would improve our ability to estimate accurately the toxic doses in humans based on available animal studies. These mathematical models are often highly parameterized and must be calibrated in order for the model predictions of internal dose to fit the experimentally measured doses adequately. Highly parameterized models are difficult to calibrate, and it is difficult to obtain accurate estimates of uncertainty or variability in model parameters with commonly used frequentist calibration methods, such as maximum likelihood estimation (MLE) or least-squares error approaches. Bayesian approaches implemented through Markov chain Monte Carlo (MCMC) analysis can be used to calibrate these complex models successfully. Prior knowledge about the biological system and associated model parameters is easily incorporated in this approach in the form of prior parameter distributions, and the distributions are refined or updated using experimental data to generate posterior distributions of parameter estimates. The goal of this paper is to give the non-mathematician a brief description of the Bayesian approach and Markov chain Monte Carlo analysis, how this technique is used in risk assessment, and the issues associated with this approach.
Bayesian Analysis for Exponential Random Graph Models Using the Adaptive Exchange Sampler.
Jin, Ick Hoon; Yuan, Ying; Liang, Faming
2013-10-01
Exponential random graph models have been widely used in social network analysis. However, these models are extremely difficult to handle from a statistical viewpoint, because of the intractable normalizing constant and model degeneracy. In this paper, we consider a fully Bayesian analysis for exponential random graph models using the adaptive exchange sampler, which solves the intractable normalizing constant and model degeneracy issues encountered in Markov chain Monte Carlo (MCMC) simulations. The adaptive exchange sampler can be viewed as an MCMC extension of the exchange algorithm; it generates auxiliary networks via an importance sampling procedure from an auxiliary Markov chain running in parallel. The convergence of this algorithm is established under mild conditions. The adaptive exchange sampler is illustrated using a few social networks, including the Florentine business network, molecule synthetic network, and dolphins network. The results indicate that the adaptive exchange algorithm can produce more accurate estimates than approximate exchange algorithms, while maintaining the same computational efficiency.
Searching for efficient Markov chain Monte Carlo proposal kernels
Yang, Ziheng; Rodríguez, Carlos E.
2013-01-01
Markov chain Monte Carlo (MCMC) or the Metropolis–Hastings algorithm is a simulation algorithm that has made modern Bayesian statistical inference possible. Nevertheless, the efficiency of different Metropolis–Hastings proposal kernels has rarely been studied except for the Gaussian proposal. Here we propose a unique class of Bactrian kernels, which avoid proposing values that are very close to the current value, and compare their efficiency with a number of proposals for simulating different target distributions, with efficiency measured by the asymptotic variance of a parameter estimate. The uniform kernel is found to be more efficient than the Gaussian kernel, whereas the Bactrian kernel is even better. When optimal scales are used for both, the Bactrian kernel is at least 50% more efficient than the Gaussian. Implementation in a Bayesian program for molecular clock dating confirms the general applicability of our results to generic MCMC algorithms. Our results refute a previous claim that all proposals had nearly identical performance and will prompt further research into efficient MCMC proposals. PMID:24218600
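The Bactrian proposal itself is simple to implement: a symmetric two-humped mixture of normals that rarely proposes values close to the current state. The sketch below compares lag-1 autocorrelation against a Gaussian kernel on a standard normal target; the single common scale is an illustrative choice (the paper tunes each kernel to its own optimal scale and uses asymptotic variance as the efficiency measure):

```python
import numpy as np

rng = np.random.default_rng(42)

def mh(proposal, n=200_000):
    # Metropolis sampling of a standard normal target; the kernels below
    # are symmetric, so the acceptance ratio involves the target only
    x, out = 0.0, np.empty(n)
    for i in range(n):
        y = x + proposal()
        if np.log(rng.uniform()) < 0.5*(x*x - y*y):
            x = y
        out[i] = x
    return out

m, sigma = 0.95, 2.5   # Bactrian shape parameter and a common scale
gauss = lambda: rng.normal(0, sigma)
# Bactrian step: mixture of N(+m, 1-m^2) and N(-m, 1-m^2), scaled by sigma
bactrian = lambda: sigma * rng.normal(m*rng.choice([-1, 1]), np.sqrt(1 - m*m))

for name, prop in [("gaussian", gauss), ("bactrian", bactrian)]:
    s = mh(prop)
    rho1 = np.corrcoef(s[:-1], s[1:])[0, 1]
    print(name, "lag-1 autocorrelation:", round(rho1, 3))
```

Lower autocorrelation means more nearly independent draws per iteration, which is the intuition behind the reported efficiency gains.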
McNally, Kevin; Cotton, Richard; Cocker, John; Jones, Kate; Bartels, Mike; Rick, David; Price, Paul; Loizou, George
2012-01-01
There are numerous biomonitoring programs, both recent and ongoing, to evaluate environmental exposure of humans to chemicals. Due to the lack of exposure and kinetic data, the correlation of biomarker levels with exposure concentrations leads to difficulty in utilizing biomonitoring data for biological guidance values. Exposure reconstruction or reverse dosimetry is the retrospective interpretation of external exposure consistent with biomonitoring data. We investigated the integration of physiologically based pharmacokinetic modelling, global sensitivity analysis, Bayesian inference, and Markov chain Monte Carlo simulation to obtain a population estimate of inhalation exposure to m-xylene. We used exhaled breath and venous blood m-xylene and urinary 3-methylhippuric acid measurements from a controlled human volunteer study in order to evaluate the ability of our computational framework to predict known inhalation exposures. We also investigated the importance of model structure and dimensionality with respect to its ability to reconstruct exposure. PMID:22719759
Slice sampling technique in Bayesian extreme of gold price modelling
NASA Astrophysics Data System (ADS)
Rostami, Mohammad; Adam, Mohd Bakri; Ibrahim, Noor Akma; Yahya, Mohamed Hisham
2013-09-01
In this paper, a simulation study of Bayesian extreme-value modelling using Markov chain Monte Carlo via the slice sampling algorithm is implemented. We compared the accuracy of slice sampling with that of other methods for a Gumbel model. The study revealed that the slice sampling algorithm offers more accurate and closer estimates, with a smaller RMSE, than the other methods. Finally, we successfully employed this procedure to estimate the parameters of extreme gold prices in Malaysia from 2000 to 2011.
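A minimal sketch of one-dimensional slice sampling with stepping-out and shrinkage moves (Neal's construction), applied here to a stand-in Gumbel(0, 1) target rather than the gold-price posterior:

```python
import numpy as np

rng = np.random.default_rng(7)

def logf(x):
    # Unnormalized log-density; a Gumbel(0, 1) stands in for the target
    return -(x + np.exp(-x))

def slice_sample(x0, n, w=1.0):
    # One-dimensional slice sampler with stepping-out and shrinkage
    xs, x = [], x0
    for _ in range(n):
        logy = logf(x) + np.log(rng.uniform())       # vertical slice level
        L = x - w*rng.uniform()
        R = L + w
        while logf(L) > logy: L -= w                 # step out left
        while logf(R) > logy: R += w                 # step out right
        while True:                                  # shrink until accepted
            x1 = rng.uniform(L, R)
            if logf(x1) > logy:
                x = x1
                break
            if x1 < x: L = x1
            else:      R = x1
        xs.append(x)
    return np.array(xs)

s = slice_sample(0.0, 20_000)
# The Gumbel(0, 1) mean is the Euler-Mascheroni constant, about 0.577
print("sample mean:", s.mean().round(3))
```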
spMC: an R-package for 3D lithological reconstructions based on spatial Markov chains
NASA Astrophysics Data System (ADS)
Sartore, Luca; Fabbri, Paolo; Gaetan, Carlo
2016-09-01
The paper presents the spatial Markov Chains (spMC) R-package and a case study of subsoil simulation/prediction at a plain site in Northeastern Italy. spMC is a fairly complete collection of advanced methods for data inspection; in addition, it implements Markov chain models to estimate experimental transition probabilities of categorical lithological data. Simulation methods based on well-known prediction approaches (such as indicator kriging and co-kriging) are also implemented in the package, along with more advanced simulation methods, e.g. path methods and Bayesian procedures that exploit maximum entropy. Because spMC was developed for intensive geostatistical computations, part of the code is implemented for parallel computation via OpenMP constructs. A final analysis compares the computational efficiency of the simulation/prediction algorithms across different numbers of CPU cores, using the example dataset of the case study included in the package.
Comparing Three Estimation Methods for the Three-Parameter Logistic IRT Model
ERIC Educational Resources Information Center
Lamsal, Sunil
2015-01-01
Different estimation procedures have been developed for the unidimensional three-parameter item response theory (IRT) model. These techniques include the marginal maximum likelihood estimation, the fully Bayesian estimation using Markov chain Monte Carlo simulation techniques, and the Metropolis-Hastings Robbins-Monro estimation. With each…
Bayesian conditional-independence modeling of the AIDS epidemic in England and Wales
NASA Astrophysics Data System (ADS)
Gilks, Walter R.; De Angelis, Daniela; Day, Nicholas E.
We describe the use of conditional-independence modeling, Bayesian inference and Markov chain Monte Carlo to model and project the HIV/AIDS epidemic in homosexual/bisexual males in England and Wales. Complexity in this analysis arises through selectively missing data, indirectly observed underlying processes, and measurement error. Our emphasis is on presentation and discussion of the concepts, not on the technicalities of this analysis, which can be found elsewhere [D. De Angelis, W.R. Gilks, N.E. Day, Bayesian projection of the acquired immune deficiency syndrome epidemic (with discussion), Applied Statistics, in press].
Dynamic Modeling Using MCSim and R (SOT 2016 Biological Modeling Webinar Series)
MCSim is a stand-alone software package for simulating and analyzing dynamic models, with a focus on Bayesian analysis using Markov Chain Monte Carlo. While it is an extremely powerful package, it is somewhat inflexible, and offers only a limited range of analysis options, with n...
BAT - The Bayesian analysis toolkit
NASA Astrophysics Data System (ADS)
Caldwell, Allen; Kollár, Daniel; Kröninger, Kevin
2009-11-01
We describe the development of a new toolkit for data analysis. The analysis package is based on Bayes' Theorem, and is realized with the use of Markov Chain Monte Carlo. This gives access to the full posterior probability distribution. Parameter estimation, limit setting and uncertainty propagation are implemented in a straightforward manner.
Bayesian clustering of DNA sequences using Markov chains and a stochastic partition model.
Jääskinen, Väinö; Parkkinen, Ville; Cheng, Lu; Corander, Jukka
2014-02-01
In many biological applications it is necessary to cluster DNA sequences into groups that represent underlying organismal units, such as named species or genera. In metagenomics this grouping typically needs to be achieved on the basis of relatively short sequences that contain different types of errors, making a statistical modeling approach desirable. Here we introduce a novel method for this purpose by developing a stochastic partition model that clusters Markov chains of a given order. The model is based on a Dirichlet process prior, and we use conjugate priors for the Markov chain parameters, which enables an analytical expression for comparing the marginal likelihoods of any two partitions. To find a good candidate for the posterior mode in the partition space, we use a hybrid computational approach that combines the EM algorithm with a greedy search. This is demonstrated to be faster and to yield highly accurate results compared with earlier clustering methods suggested for the metagenomics application. Our model is fairly generic and could also be used to cluster other types of sequence data for which Markov chains provide a reasonable way to compress information, as illustrated by experiments on shotgun sequence type data from an Escherichia coli strain.
Bayesian experimental design for models with intractable likelihoods.
Drovandi, Christopher C; Pettitt, Anthony N
2013-12-01
In this paper we present a methodology for designing experiments for efficiently estimating the parameters of models with computationally intractable likelihoods. The approach combines a commonly used methodology for robust experimental design, based on Markov chain Monte Carlo sampling, with approximate Bayesian computation (ABC) to ensure that no likelihood evaluations are required. The utility function considered for precise parameter estimation is based upon the precision of the ABC posterior distribution, which we form efficiently via the ABC rejection algorithm based on pre-computed model simulations. Our focus is on stochastic models and, in particular, we investigate the methodology for Markov process models of epidemics and macroparasite population evolution. The macroparasite example involves a multivariate process and we assess the loss of information from not observing all variables.
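The ABC rejection step on which the design criterion is built can be illustrated with a toy epidemic simulator. The Reed-Frost model, prior, observed summary, and tolerance below are all hypothetical stand-ins for the paper's Markov process models:

```python
import numpy as np

rng = np.random.default_rng(3)

def final_size(p, n0=99, i0=1):
    # Reed-Frost chain-binomial epidemic: returns total ever infected
    s, i, total = n0, i0, i0
    while i > 0:
        new = rng.binomial(s, 1 - (1 - p)**i)
        s, i, total = s - new, new, total + new
    return total

obs = 30          # hypothetical observed final outbreak size
accepted = []
while len(accepted) < 2000:
    p = rng.uniform(0, 0.05)            # prior on per-contact infection prob.
    if abs(final_size(p) - obs) <= 2:   # ABC tolerance on the summary
        accepted.append(p)

print("ABC posterior mean of p:", np.mean(accepted).round(4))
```

Because acceptance requires only simulation and a distance on summaries, no likelihood evaluation is ever needed, which is what makes the design methodology feasible for intractable models.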
Golightly, Andrew; Wilkinson, Darren J.
2011-01-01
Computational systems biology is concerned with the development of detailed mechanistic models of biological processes. Such models are often stochastic and analytically intractable, containing uncertain parameters that must be estimated from time course data. In this article, we consider the task of inferring the parameters of a stochastic kinetic model defined as a Markov (jump) process. Inference for the parameters of complex nonlinear multivariate stochastic process models is a challenging problem, but we find here that algorithms based on particle Markov chain Monte Carlo turn out to be a very effective computationally intensive approach to the problem. Approximations to the inferential model based on stochastic differential equations (SDEs) are considered, as well as improvements to the inference scheme that exploit the SDE structure. We apply the methodology to a Lotka–Volterra system and a prokaryotic auto-regulatory network. PMID:23226583
Bayesian Inference on Malignant Breast Cancer in Nigeria: A Diagnosis of MCMC Convergence
Ogunsakin, Ropo Ebenezer; Siaka, Lougue
2017-01-01
Background: There has been no previous study classifying malignant breast tumors in detail based on Markov chain Monte Carlo (MCMC) convergence in Western Nigeria. This study therefore aims to profile patients living with benign and malignant breast tumors in two hospitals among women of Western Nigeria, with a focus on prognostic factors and MCMC convergence. Materials and Methods: Hospital-based records were used to identify prognostic factors for malignant breast cancer among women of Western Nigeria. This paper describes Bayesian inference and demonstrates its use in estimating the parameters of a logistic regression via an MCMC algorithm; the results of the Bayesian approach are compared with those of classical statistics. Results: The mean age of the respondents was 42.2 ± 16.6 years, with 52% of the women aged between 35 and 49 years. Both techniques suggest that age and at least a high-school education are associated with a significantly higher risk of being diagnosed with a malignant rather than a benign breast tumor. The results also indicate that the coefficients obtained from the Bayesian approach have smaller standard errors. In addition, the simulation results reveal that women with at least a high-school education are 1.3 times more at risk of a malignant than a benign breast lesion in Western Nigeria. Conclusion: More effort is required in awareness and advocacy campaigns on how the prevalence of malignant breast lesions can be reduced, especially among women. The Bayesian approach produces precise estimates for modeling malignant breast cancer. PMID:29072396
Item Response Theory Equating Using Bayesian Informative Priors.
ERIC Educational Resources Information Center
de la Torre, Jimmy; Patz, Richard J.
This paper seeks to extend the application of Markov chain Monte Carlo (MCMC) methods in item response theory (IRT) to include the estimation of equating relationships along with the estimation of test item parameters. A method is proposed that incorporates estimation of the equating relationship in the item calibration phase. Item parameters from…
Model-Averaged ℓ1 Regularization using Markov Chain Monte Carlo Model Composition
Fraley, Chris; Percival, Daniel
2014-01-01
Bayesian Model Averaging (BMA) is an effective technique for addressing model uncertainty in variable selection problems. However, current BMA approaches have computational difficulty dealing with data in which there are many more measurements (variables) than samples. This paper presents a method for combining ℓ1 regularization and Markov chain Monte Carlo model composition techniques for BMA. By treating the ℓ1 regularization path as a model space, we propose a method to resolve the model uncertainty issues arising in model averaging from solution path point selection. We show that this method is computationally and empirically effective for regression and classification in high-dimensional datasets. We apply our technique in simulations, as well as to some applications that arise in genomics. PMID:25642001
Lu, Dan; Ricciuto, Daniel; Walker, Anthony; ...
2017-02-22
Calibration of terrestrial ecosystem models is important but challenging. Bayesian inference implemented by Markov chain Monte Carlo (MCMC) sampling provides a comprehensive framework to estimate model parameters and associated uncertainties using their posterior distributions. The effectiveness and efficiency of the method strongly depend on the MCMC algorithm used. In this study, a Differential Evolution Adaptive Metropolis (DREAM) algorithm was used to estimate posterior distributions of 21 parameters for the data assimilation linked ecosystem carbon (DALEC) model using 14 years of daily net ecosystem exchange data collected at the Harvard Forest Environmental Measurement Site eddy-flux tower. DREAM is a multi-chain method that uses a differential evolution technique for chain movement, allowing it to be applied efficiently to high-dimensional problems; it can reliably estimate heavy-tailed and multimodal distributions that are difficult for single-chain schemes using a Gaussian proposal distribution. The results were evaluated against the popular Adaptive Metropolis (AM) scheme. DREAM indicated that two parameters controlling autumn phenology have multiple modes in their posterior distributions, while AM identified only one mode. Calibration with DREAM resulted in a better model fit and predictive performance than with AM. DREAM provides the means for a thorough exploration of the posterior distributions of model parameters, reduces the risk of false convergence to a local optimum, and potentially improves the predictive performance of the calibrated model.
Schmidt, Paul; Schmid, Volker J; Gaser, Christian; Buck, Dorothea; Bührlen, Susanne; Förschler, Annette; Mühlau, Mark
2013-01-01
Aiming at iron-related T2-hypointensity, which is related to normal aging and neurodegenerative processes, we here present two practicable approaches, based on Bayesian inference, for preprocessing and statistical analysis of a complex set of structural MRI data. In particular, Markov chain Monte Carlo methods were used to simulate posterior distributions. First, we rendered a segmentation algorithm that uses outlier detection based on model-checking techniques within a Bayesian mixture model. Second, we rendered an analytical tool comprising a Bayesian regression model with smoothness priors (in the form of Gaussian Markov random fields), mitigating the necessity to smooth data prior to statistical analysis. For validation, we used simulated data and MRI data of 27 healthy controls. We first observed robust segmentation of both simulated T2-hypointensities and gray-matter regions known to be T2-hypointense. Second, simulated data and images of segmented T2-hypointensity were analyzed. We found not only robust identification of simulated effects but also a biologically plausible age-related increase of T2-hypointensity, primarily within the dentate nucleus but also within the globus pallidus, substantia nigra, and red nucleus. Our results indicate that fully Bayesian inference can successfully be applied for preprocessing and statistical analysis of structural MRI data.
A Bayesian method for inferring transmission chains in a partially observed epidemic.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Marzouk, Youssef M.; Ray, Jaideep
2008-10-01
We present a Bayesian approach for estimating transmission chains and rates in the Abakaliki smallpox epidemic of 1967. The epidemic affected 30 individuals in a community of 74; only the dates of appearance of symptoms were recorded. Our model assumes stochastic transmission of the infections over a social network. Distinct binomial random graphs model intra- and inter-compound social connections, while disease transmission over each link is treated as a Poisson process. Link probabilities and rate parameters are objects of inference. Dates of infection and recovery comprise the remaining unknowns. Distributions for smallpox incubation and recovery periods are obtained from historical data. Using Markov chain Monte Carlo, we explore the joint posterior distribution of the scalar parameters and provide an expected connectivity pattern for the social graph and infection pathway.
ACHCAR, J. A.; MARTINEZ, E. Z.; RUFFINO-NETTO, A.; PAULINO, C. D.; SOARES, P.
2008-01-01
Summary: We considered a Bayesian analysis of the prevalence of tuberculosis cases in New York City from 1970 to 2000. This counting dataset presented two change-points during the period. We modelled the counts using non-homogeneous Poisson processes in the presence of the two change-points, and carried out a Bayesian analysis of the data using Markov chain Monte Carlo methods. Simulated Gibbs samples for the parameters of interest were obtained using the WinBUGS software. PMID:18346287
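For a flavor of such change-point computations, here is a hand-coded Gibbs sampler for a single-change-point Poisson model with conjugate gamma priors; the paper fits two change-points in WinBUGS, and the counts and priors below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(11)

# Synthetic annual counts with one change-point (the paper fits two)
y = np.concatenate([rng.poisson(10, 15), rng.poisson(4, 16)])
n = len(y)
a, b = 2.0, 0.5                      # Gamma(a, b) priors on both rates
tau, keep = n // 2, []
for _ in range(5000):
    # Conjugate updates for the two Poisson rates given the change-point
    lam1 = rng.gamma(a + y[:tau].sum(), 1/(b + tau))
    lam2 = rng.gamma(a + y[tau:].sum(), 1/(b + n - tau))
    # Full conditional of the change-point (discrete, over 1..n-1)
    taus = np.arange(1, n)
    logp = np.array([y[:t].sum()*np.log(lam1) - t*lam1
                   + y[t:].sum()*np.log(lam2) - (n - t)*lam2 for t in taus])
    p = np.exp(logp - logp.max()); p /= p.sum()
    tau = rng.choice(taus, p=p)
    keep.append((lam1, lam2, tau))

lam1, lam2, tau = map(np.array, zip(*keep))
print("posterior means:", lam1.mean(), lam2.mean(), tau.mean())
```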
A Bayesian nonparametric approach to dynamical noise reduction
NASA Astrophysics Data System (ADS)
Kaloudis, Konstantinos; Hatjispyros, Spyridon J.
2018-06-01
We propose a Bayesian nonparametric approach for the noise reduction of a given chaotic time series contaminated by dynamical noise, based on Markov Chain Monte Carlo methods. The underlying unknown noise process (possibly) exhibits heavy tailed behavior. We introduce the Dynamic Noise Reduction Replicator model with which we reconstruct the unknown dynamic equations and in parallel we replicate the dynamics under reduced noise level dynamical perturbations. The dynamic noise reduction procedure is demonstrated specifically in the case of polynomial maps. Simulations based on synthetic time series are presented.
A Bayesian approach to modeling diffraction profiles and application to ferroelectric materials
Iamsasri, Thanakorn; Guerrier, Jonathon; Esteves, Giovanni; ...
2017-02-01
A new statistical approach for modeling diffraction profiles is introduced, using Bayesian inference and a Markov chain Monte Carlo (MCMC) algorithm. This method is demonstrated by modeling the degenerate reflections during application of an electric field to two different ferroelectric materials: thin-film lead zirconate titanate (PZT) of composition PbZr0.3Ti0.7O3 and a bulk commercial PZT polycrystalline ferroelectric. Here, the new method offers a unique uncertainty quantification of the model parameters that can be readily propagated into new calculated parameters.
Markov switching multinomial logit model: An application to accident-injury severities.
Malyshkina, Nataliya V; Mannering, Fred L
2009-07-01
In this study, two-state Markov switching multinomial logit models are proposed for statistical modeling of accident-injury severities. These models assume Markov switching over time between two unobserved states of roadway safety as a means of accounting for potential unobserved heterogeneity. The states are distinct in the sense that in different states accident-severity outcomes are generated by separate multinomial logit processes. To demonstrate the applicability of the approach, two-state Markov switching multinomial logit models are estimated for severity outcomes of accidents occurring on Indiana roads over a four-year time period. Bayesian inference methods and Markov Chain Monte Carlo (MCMC) simulations are used for model estimation. The estimated Markov switching models result in a superior statistical fit relative to the standard (single-state) multinomial logit models for a number of roadway classes and accident types. It is found that the more frequent state of roadway safety is correlated with better weather conditions and that the less frequent state is correlated with adverse weather conditions.
ERIC Educational Resources Information Center
Dai, Yunyun
2013-01-01
Mixtures of item response theory (IRT) models have been proposed as a technique to explore response patterns in test data related to cognitive strategies, instructional sensitivity, and differential item functioning (DIF). Estimation proves challenging due to difficulties in identification and questions of effect size needed to recover underlying…
ERIC Educational Resources Information Center
Martin-Fernandez, Manuel; Revuelta, Javier
2017-01-01
This study compares the performance of two recently introduced estimation algorithms, the Metropolis-Hastings Robbins-Monro (MHRM) and the Hamiltonian MCMC (HMC), with two well-established algorithms in the psychometric literature, the marginal likelihood via EM algorithm (MML-EM) and the Markov chain Monte Carlo (MCMC), in the estimation of multidimensional…
ERIC Educational Resources Information Center
Lee, Soo; Suh, Youngsuk
2018-01-01
Lord's Wald test for differential item functioning (DIF) has not been studied extensively in the context of the multidimensional item response theory (MIRT) framework. In this article, Lord's Wald test was implemented using two estimation approaches, marginal maximum likelihood estimation and Bayesian Markov chain Monte Carlo estimation, to detect…
A default Bayesian hypothesis test for mediation.
Nuijten, Michèle B; Wetzels, Ruud; Matzke, Dora; Dolan, Conor V; Wagenmakers, Eric-Jan
2015-03-01
In order to quantify the relationship between multiple variables, researchers often carry out a mediation analysis. In such an analysis, a mediator (e.g., knowledge of a healthy diet) transmits the effect from an independent variable (e.g., classroom instruction on a healthy diet) to a dependent variable (e.g., consumption of fruits and vegetables). Almost all mediation analyses in psychology use frequentist estimation and hypothesis-testing techniques. A recent exception is Yuan and MacKinnon (Psychological Methods, 14, 301-322, 2009), who outlined a Bayesian parameter estimation procedure for mediation analysis. Here we complete the Bayesian alternative to frequentist mediation analysis by specifying a default Bayesian hypothesis test based on the Jeffreys-Zellner-Siow approach. We further extend this default Bayesian test by allowing a comparison to directional or one-sided alternatives, using Markov chain Monte Carlo techniques implemented in JAGS. All Bayesian tests are implemented in the R package BayesMed (Nuijten, Wetzels, Matzke, Dolan, & Wagenmakers, 2014).
NASA Astrophysics Data System (ADS)
Lu, Dan; Ricciuto, Daniel; Walker, Anthony; Safta, Cosmin; Munger, William
2017-09-01
Calibration of terrestrial ecosystem models is important but challenging. Bayesian inference implemented by Markov chain Monte Carlo (MCMC) sampling provides a comprehensive framework to estimate model parameters and associated uncertainties using their posterior distributions. The effectiveness and efficiency of the method strongly depend on the MCMC algorithm used. In this work, a differential evolution adaptive Metropolis (DREAM) algorithm is used to estimate posterior distributions of 21 parameters for the data assimilation linked ecosystem carbon (DALEC) model using 14 years of daily net ecosystem exchange data collected at the Harvard Forest Environmental Measurement Site eddy-flux tower. The calibration with DREAM results in a better model fit and predictive performance than the popular adaptive Metropolis (AM) scheme. Moreover, DREAM indicates that two parameters controlling autumn phenology have multiple modes in their posterior distributions, while AM identifies only one mode. The application suggests that DREAM is well suited to calibrating complex terrestrial ecosystem models, where the number of uncertain parameters is usually large and the existence of local optima is always a concern. In addition, residual analysis justifies the assumptions of the error model used in the Bayesian calibration: a heteroscedastic, correlated, Gaussian error model is appropriate for the problem, and the likelihood function constructed from it can alleviate the underestimation of parameter uncertainty that is usually caused by using uncorrelated error models.
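The differential-evolution proposal at the core of DREAM-type samplers is easy to sketch. The following is plain DE-MC on a toy two-parameter Gaussian posterior; DREAM proper adds subspace (crossover) sampling and outlier-chain handling, which are omitted here:

```python
import numpy as np

rng = np.random.default_rng(5)

def logpost(x):
    # Stand-in correlated Gaussian posterior (DALEC itself is far larger)
    return -0.5 * (x[0]**2 + (x[1] - x[0])**2 / 0.25)

d, nchains = 2, 8
X = rng.normal(0, 2, size=(nchains, d))
gamma = 2.38 / np.sqrt(2 * d)          # standard DE-MC jump rate
samples = []
for it in range(20_000):
    for i in range(nchains):
        # Propose along the difference of two other randomly chosen chains
        a, b = rng.choice([j for j in range(nchains) if j != i], 2,
                          replace=False)
        prop = X[i] + gamma * (X[a] - X[b]) + rng.normal(0, 1e-6, d)
        if np.log(rng.uniform()) < logpost(prop) - logpost(X[i]):
            X[i] = prop
    samples.append(X.copy())

S = np.array(samples)[5000:].reshape(-1, d)   # pool chains after burn-in
print("posterior mean:", S.mean(axis=0))
print("posterior cov:", np.cov(S.T))
```

Because the jump direction and scale are learned from the current spread of the chain population, the sampler adapts automatically to correlated, poorly scaled posteriors.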
Fancher, Chris M.; Han, Zhen; Levin, Igor; Page, Katharine; Reich, Brian J.; Smith, Ralph C.; Wilson, Alyson G.; Jones, Jacob L.
2016-01-01
A Bayesian inference method for refining crystallographic structures is presented. The distribution of model parameters is stochastically sampled using Markov chain Monte Carlo. Posterior probability distributions are constructed for all model parameters to properly quantify uncertainty by appropriately modeling the heteroskedasticity and correlation of the error structure. The proposed method is demonstrated by analyzing a National Institute of Standards and Technology silicon standard reference material. The results obtained by Bayesian inference are compared with those determined by Rietveld refinement. Posterior probability distributions of model parameters provide both estimates and uncertainties. The new method better estimates the true uncertainties in the model as compared to the Rietveld method. PMID:27550221
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bertholon, François; Harant, Olivier; Bourlon, Bertrand
This article introduces a joint Bayesian estimation of gas samples issued from a gas chromatography (GC) column coupled with a NEMS sensor, based on the Giddings-Eyring microscopic molecular stochastic model. The posterior distribution is sampled using Markov chain Monte Carlo with Gibbs sampling. Parameters are estimated using the posterior mean. This estimation scheme is finally applied to simulated and real datasets using this molecular stochastic forward model.
NASA Astrophysics Data System (ADS)
Berradja, Khadidja; Boughanmi, Nabil
2016-09-01
In dynamic cardiac PET FDG studies, the assessment of the myocardial metabolic rate of glucose (MMRG) requires knowledge of the blood input function (IF). The IF can be obtained by manual or automatic blood sampling cross-calibrated with PET, but these procedures are cumbersome, invasive and generate uncertainties. The IF is also contaminated by spillover of radioactivity from the adjacent myocardium, which can cause substantial error in the estimated MMRG. In this study, we show that the IF can be extracted from the images in a rat heart study with 18F-fluorodeoxyglucose (18F-FDG) by means of Independent Component Analysis (ICA) based on Bayesian theory and a Markov chain Monte Carlo (MCMC) sampling method (BICA). Images of rat hearts were acquired with the Sherbrooke small-animal PET scanner. A region of interest (ROI) was drawn around the rat image and decomposed into blood and tissue components using BICA. The statistical analysis showed a significant difference (p < 0.05) between the MMRG obtained with the IF extracted by BICA and that obtained with the IF extracted from measured images corrupted by spillover.
Li, Michael; Dushoff, Jonathan; Bolker, Benjamin M
2018-07-01
Simple mechanistic epidemic models are widely used for forecasting and parameter estimation of infectious diseases based on noisy case reporting data. Despite the widespread application of models to emerging infectious diseases, we know little about the comparative performance of standard computational-statistical frameworks in these contexts. Here we build a simple stochastic, discrete-time, discrete-state epidemic model with both process and observation error and use it to characterize the effectiveness of different flavours of Bayesian Markov chain Monte Carlo (MCMC) techniques. We use fits to simulated data, where parameters (and future behaviour) are known, to explore the limitations of different platforms and quantify parameter estimation accuracy, forecasting accuracy, and computational efficiency across combinations of modeling decisions (e.g. discrete vs. continuous latent states, levels of stochasticity) and computational platforms (JAGS, NIMBLE, Stan).
Refining value-at-risk estimates using a Bayesian Markov-switching GJR-GARCH copula-EVT model.
Sampid, Marius Galabe; Hasim, Haslifah M; Dai, Hongsheng
2018-01-01
In this paper, we propose a model for forecasting Value-at-Risk (VaR) using a Bayesian Markov-switching GJR-GARCH(1,1) model with skewed Student's-t innovations, copula functions and extreme value theory (EVT). The Bayesian Markov-switching GJR-GARCH(1,1) model, which identifies non-constant volatility and allows the GARCH parameters to vary over time following a Markov process, is combined with copula functions and EVT to formulate the Bayesian Markov-switching GJR-GARCH(1,1) copula-EVT VaR model, which is then used to forecast the level of risk on financial asset returns. We further propose a new method for threshold selection in EVT analysis, which we term the hybrid method. Empirical and back-testing results show that the proposed VaR models capture VaR reasonably well in periods of calm and in periods of crisis.
Spectral likelihood expansions for Bayesian inference
NASA Astrophysics Data System (ADS)
Nagel, Joseph B.; Sudret, Bruno
2016-03-01
A spectral approach to Bayesian inference is presented. It pursues the emulation of the posterior probability density. The starting point is a series expansion of the likelihood function in terms of orthogonal polynomials. From this spectral likelihood expansion all statistical quantities of interest can be calculated semi-analytically. The posterior is formally represented as the product of a reference density and a linear combination of polynomial basis functions. Both the model evidence and the posterior moments are related to the expansion coefficients. This formulation avoids Markov chain Monte Carlo simulation and allows one to make use of linear least squares instead. The pros and cons of spectral Bayesian inference are discussed and demonstrated on the basis of simple applications from classical statistics and inverse modeling.
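A toy one-dimensional version of the idea, assuming a standard normal prior and a Gaussian likelihood: regress the likelihood onto Hermite polynomials orthonormal under the prior, then read the evidence off the constant coefficient and the posterior mean off the ratio of the first two coefficients:

```python
import numpy as np
from numpy.polynomial.hermite_e import hermevander
from math import factorial

rng = np.random.default_rng(9)

y, sigma = 1.2, 0.8                     # one observation, known noise sd
like = lambda t: np.exp(-0.5*((y - t)/sigma)**2) / (sigma*np.sqrt(2*np.pi))

# Regress the likelihood on Hermite polynomials orthonormal w.r.t. the
# standard normal prior, using prior samples as regression points
deg = 12
t = rng.normal(0, 1, 20_000)
V = hermevander(t, deg) / np.sqrt([factorial(k) for k in range(deg + 1)])
c, *_ = np.linalg.lstsq(V, like(t), rcond=None)

Z_hat = c[0]              # model evidence = coefficient of the constant
mean_hat = c[1] / c[0]    # posterior mean from the first two coefficients
print("evidence:", Z_hat, "exact:",
      np.exp(-0.5*y**2/(1 + sigma**2))/np.sqrt(2*np.pi*(1 + sigma**2)))
print("posterior mean:", mean_hat, "exact:", y/(1 + sigma**2))
```

Here least squares replaces MCMC entirely: once the expansion coefficients are known, the evidence and posterior moments follow semi-analytically, which is the paper's central point.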
On a full Bayesian inference for force reconstruction problems
NASA Astrophysics Data System (ADS)
Aucejo, M.; De Smet, O.
2018-05-01
In a previous paper, the authors introduced a flexible methodology for reconstructing mechanical sources in the frequency domain from prior local information on both their nature and location over a linear and time-invariant structure. The proposed approach was derived from Bayesian statistics, because of its ability to account mathematically for the experimenter's prior knowledge. However, since only the maximum a posteriori estimate was computed, the posterior uncertainty about the regularized solution given the measured vibration field, the mechanical model and the regularization parameter was not assessed. To answer this legitimate question, this paper fully exploits the Bayesian framework to provide, from a Markov chain Monte Carlo algorithm, credible intervals and other statistical measures (mean, median, mode) for all the parameters of the force reconstruction problem.
Markov Chain Monte Carlo in the Analysis of Single-Molecule Experimental Data
NASA Astrophysics Data System (ADS)
Kou, S. C.; Xie, X. Sunney; Liu, Jun S.
2003-11-01
This article provides a Bayesian analysis of the single-molecule fluorescence lifetime experiment designed to probe the conformational dynamics of a single DNA hairpin molecule. The DNA hairpin's conformational change is initially modeled as a two-state Markov chain, which is not observable and has to be inferred indirectly. The Brownian diffusion of the single molecule, in addition to the hidden Markov structure, further complicates the matter. We show that the analytical form of the likelihood function can be obtained in the simplest case and that a Metropolis-Hastings algorithm can be designed to sample from the posterior distribution of the parameters of interest and to compute the desired estimates. To cope with the molecular diffusion process and the potentially oscillating energy barrier between the two states of the DNA hairpin, we introduce a data augmentation technique to handle both the Brownian diffusion and the hidden Ornstein-Uhlenbeck process associated with the fluctuating energy barrier, and design a more sophisticated Metropolis-type algorithm. Our method not only increases the estimation resolution severalfold but also proves successful for model discrimination.
Bayesian identification of acoustic impedance in treated ducts.
Buot de l'Épine, Y; Chazot, J-D; Ville, J-M
2015-07-01
The noise reduction of a liner placed in the nacelle of a turbofan engine is still difficult to predict due to the lack of knowledge of its acoustic impedance that depends on grazing flow profile, mode order, and sound pressure level. An eduction method, based on a Bayesian approach, is presented here to adjust an impedance model of the liner from sound pressures measured in a rectangular treated duct under multimodal propagation and flow. The cost function is regularized with prior information provided by Guess's [J. Sound Vib. 40, 119-137 (1975)] impedance of a perforated plate. The multi-parameter optimization is achieved with an Evolutionary-Markov-Chain-Monte-Carlo algorithm.
Bayesian approach to analyzing holograms of colloidal particles.
Dimiduk, Thomas G; Manoharan, Vinothan N
2016-10-17
We demonstrate a Bayesian approach to tracking and characterizing colloidal particles from in-line digital holograms. We model the formation of the hologram using Lorenz-Mie theory. We then use a tempered Markov-chain Monte Carlo method to sample the posterior probability distributions of the model parameters: particle position, size, and refractive index. Compared to least-squares fitting, our approach allows us to more easily incorporate prior information about the parameters and to obtain more accurate uncertainties, which are critical for both particle tracking and characterization experiments. Our approach also eliminates the need to supply accurate initial guesses for the parameters, so it requires little tuning.
Farr, W. M.; Mandel, I.; Stevens, D.
2015-01-01
Selection among alternative theoretical models given an observed dataset is an important challenge in many areas of physics and astronomy. Reversible-jump Markov chain Monte Carlo (RJMCMC) is an extremely powerful technique for performing Bayesian model selection, but it suffers from a fundamental difficulty: it requires jumps between model parameter spaces, yet cannot efficiently explore both parameter spaces at once. Thus, a naive jump between parameter spaces is unlikely to be accepted in the Markov chain Monte Carlo (MCMC) algorithm, and convergence is correspondingly slow. Here, we demonstrate an interpolation technique that uses samples from single-model MCMCs to propose intermodel jumps from an approximation to the single-model posterior of the target parameter space. The interpolation technique, based on a kD-tree data structure, is adaptive and efficient in modest dimensionality. We show that our technique leads to improved convergence over naive jumps in an RJMCMC, and compare it to other proposals in the literature to improve the convergence of RJMCMCs. We also demonstrate the use of the same interpolation technique as a way to construct efficient 'global' proposal distributions for single-model MCMCs without prior knowledge of the structure of the posterior distribution, and discuss improvements that permit the method to be used efficiently in higher dimensional spaces. PMID:26543580
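The flavor of such sample-based proposals can be sketched with a kernel mixture over stored single-model samples, using a kD-tree only to accelerate evaluation of the proposal density; this is a simplified stand-in for the paper's kD-tree construction, run on a toy target:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(2)

def logpost(x):
    # Toy 2-D target standing in for one model's posterior
    return -0.5 * np.sum(x**2)

# Pretend these came from an earlier single-model MCMC run
stored = rng.normal(0, 1, size=(5000, 2))
tree, h, k = cKDTree(stored), 0.2, 50

def logq(y):
    # Approximate the sample-based kernel density using only the k nearest
    # stored points, found via the kD-tree (far points contribute little)
    d, _ = tree.query(y, k=k)
    return (np.logaddexp.reduce(-0.5*(d/h)**2)
            - np.log(len(stored)) - np.log(2*np.pi*h*h))

x, acc, n = np.zeros(2), 0, 20_000
for _ in range(n):
    # Independence proposal: jitter a randomly chosen stored sample
    y = stored[rng.integers(len(stored))] + rng.normal(0, h, 2)
    if np.log(rng.uniform()) < (logpost(y) - logpost(x)) + (logq(x) - logq(y)):
        x, acc = y, acc + 1
print("acceptance rate:", acc / n)
```

When the stored samples resemble the target, such "global" proposals are accepted at high rates regardless of where the chain currently sits, which is what makes them attractive for intermodel jumps.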
Efficient hierarchical trans-dimensional Bayesian inversion of magnetotelluric data
NASA Astrophysics Data System (ADS)
Xiang, Enming; Guo, Rongwen; Dosso, Stan E.; Liu, Jianxin; Dong, Hao; Ren, Zhengyong
2018-06-01
This paper develops an efficient hierarchical trans-dimensional (trans-D) Bayesian algorithm to invert magnetotelluric (MT) data for subsurface geoelectrical structure, with unknown geophysical model parameterization (the number of conductivity-layer interfaces) and data-error models parameterized by an auto-regressive (AR) process to account for potential error correlations. The reversible-jump Markov-chain Monte Carlo algorithm, which adds/removes interfaces and AR parameters in birth/death steps, is applied to sample the trans-D posterior probability density for model parameterization, model parameters, error variance and AR parameters, accounting for the uncertainties of model dimension and data-error statistics in the uncertainty estimates of the conductivity profile. To provide efficient sampling over the multiple subspaces of different dimensions, advanced proposal schemes are applied. Parameter perturbations are carried out in principal-component space, defined by eigen-decomposition of the unit-lag model covariance matrix, to minimize the effect of inter-parameter correlations and provide effective perturbation directions and length scales. Parameters of new layers in birth steps are proposed from the prior, instead of focused distributions centred at existing values, to improve birth acceptance rates. Parallel tempering, based on a series of parallel interacting Markov chains with successively relaxed likelihoods, is applied to improve chain mixing over model dimensions. The trans-D inversion is applied in a simulation study to examine the resolution of model structure according to the data information content. The inversion is also applied to a measured MT data set from south-central Australia.
Iglesias, Juan Eugenio; Sabuncu, Mert Rory; Van Leemput, Koen
2013-10-01
Many segmentation algorithms in medical image analysis use Bayesian modeling to augment local image appearance with prior anatomical knowledge. Such methods often contain a large number of free parameters that are first estimated and then kept fixed during the actual segmentation process. However, a faithful Bayesian analysis would marginalize over such parameters, accounting for their uncertainty by considering all possible values they may take. Here we propose to incorporate this uncertainty into Bayesian segmentation methods in order to improve the inference process. In particular, we approximate the required marginalization over model parameters using computationally efficient Markov chain Monte Carlo techniques. We illustrate the proposed approach using a recently developed Bayesian method for the segmentation of hippocampal subfields in brain MRI scans, showing a significant improvement in an Alzheimer's disease classification task. As an additional benefit, the technique also allows one to compute informative "error bars" on the volume estimates of individual structures.
Lyons, Michael A.; Yang, Raymond S.H.; Mayeno, Arthur N.; Reisfeld, Brad
2008-01-01
Background One problem of interpreting population-based biomonitoring data is the reconstruction of corresponding external exposure in cases where no such data are available. Objectives We demonstrate the use of a computational framework that integrates physiologically based pharmacokinetic (PBPK) modeling, Bayesian inference, and Markov chain Monte Carlo simulation to obtain a population estimate of environmental chloroform source concentrations consistent with human biomonitoring data. The biomonitoring data consist of chloroform blood concentrations measured as part of the Third National Health and Nutrition Examination Survey (NHANES III), and for which no corresponding exposure data were collected. Methods We used a combined PBPK and shower exposure model to consider several routes and sources of exposure: ingestion of tap water, inhalation of ambient household air, and inhalation and dermal absorption while showering. We determined posterior distributions for chloroform concentration in tap water and ambient household air using U.S. Environmental Protection Agency Total Exposure Assessment Methodology (TEAM) data as prior distributions for the Bayesian analysis. Results Posterior distributions for exposure indicate that 95% of the population represented by the NHANES III data had likely chloroform exposures ≤ 67 μg/L in tap water and ≤ 0.02 μg/L in ambient household air. Conclusions Our results demonstrate the application of computer simulation to aid in the interpretation of human biomonitoring data in the context of the exposure–health evaluation–risk assessment continuum. These results should be considered as a demonstration of the method and can be improved with the addition of more detailed data. PMID:18709138
Bayesian comparison of protein structures using partial Procrustes distance.
Ejlali, Nasim; Faghihi, Mohammad Reza; Sadeghi, Mehdi
2017-09-26
An important topic in bioinformatics is protein structure alignment. Several statistical methods have been proposed for this problem, but most align two protein structures based on global geometric information, without considering the effect of neighbourhood in the structures. In this paper, we provide a Bayesian model to align protein structures that considers both local and global geometric information. Local geometric information is incorporated into the model through the partial Procrustes distance of small substructures, which are composed of β-carbon atoms from the side chains. Parameters are estimated using a Markov chain Monte Carlo (MCMC) approach. We evaluate the performance of our model through simulation studies, and we further apply it to a real dataset to assess accuracy and convergence rate. Results show that our model is much more efficient than previous approaches.
RadVel: General toolkit for modeling Radial Velocities
NASA Astrophysics Data System (ADS)
Fulton, Benjamin J.; Petigura, Erik A.; Blunt, Sarah; Sinukoff, Evan
2018-01-01
RadVel models Keplerian orbits in radial velocity (RV) time series. The code is written in Python with a fast Kepler's equation solver written in C. It provides a framework for fitting RVs using maximum a posteriori optimization and computing robust confidence intervals by sampling the posterior probability density via Markov Chain Monte Carlo (MCMC). RadVel can perform Bayesian model comparison and produces publication quality plots and LaTeX tables.
Multivariate longitudinal data analysis with mixed effects hidden Markov models.
Raffa, Jesse D; Dubin, Joel A
2015-09-01
Multiple longitudinal responses are often collected as a means to capture relevant features of the true outcome of interest, which is often hidden and not directly measurable. We outline an approach which models these multivariate longitudinal responses as generated from a hidden disease process. We propose a class of models which uses a hidden Markov model with separate but correlated random effects between multiple longitudinal responses. This approach was motivated by a smoking cessation clinical trial, where a bivariate longitudinal response involving both a continuous and a binomial response was collected for each participant to monitor smoking behavior. A Bayesian method using Markov chain Monte Carlo is used. Comparison of separate univariate response models to the bivariate response models was undertaken. Our methods are demonstrated on the smoking cessation clinical trial dataset, and properties of our approach are examined through extensive simulation studies.
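The hidden Markov building block of such models can be made concrete with a forward-algorithm likelihood for a two-state HMM with Gaussian emissions; this sketch uses a single response series and fixed parameters, whereas the paper's model adds correlated random effects and a second, binomial response:

```python
import numpy as np
from scipy.stats import norm

def hmm_loglik(y, P, mu, sd, pi0):
    # Forward algorithm: log-likelihood of observations under a 2-state
    # hidden Markov model with Gaussian emissions, with rescaling
    alpha = pi0 * norm.pdf(y[0], mu, sd)
    ll = np.log(alpha.sum()); alpha /= alpha.sum()
    for t in range(1, len(y)):
        alpha = (alpha @ P) * norm.pdf(y[t], mu, sd)
        ll += np.log(alpha.sum()); alpha /= alpha.sum()
    return ll

rng = np.random.default_rng(4)
P = np.array([[0.9, 0.1], [0.2, 0.8]])   # hidden-state transition matrix
mu, sd = np.array([0.0, 3.0]), np.array([1.0, 1.0])
# Simulate one subject's series, then evaluate its likelihood
z, ys = 0, []
for _ in range(200):
    z = rng.choice(2, p=P[z]); ys.append(rng.normal(mu[z], sd[z]))
print("log-likelihood:",
      hmm_loglik(np.array(ys), P, mu, sd, np.array([0.5, 0.5])))
```

In an MCMC fit, this marginal likelihood (or an equivalent data-augmented version with sampled state paths) is evaluated at each proposed parameter value.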
Nichols, J.M.; Link, W.A.; Murphy, K.D.; Olson, C.C.
2010-01-01
This work discusses a Bayesian approach to approximating the distribution of parameters governing nonlinear structural systems. Specifically, we use a Markov Chain Monte Carlo method for sampling the posterior parameter distributions thus producing both point and interval estimates for parameters. The method is first used to identify both linear and nonlinear parameters in a multiple degree-of-freedom structural systems using free-decay vibrations. The approach is then applied to the problem of identifying the location, size, and depth of delamination in a model composite beam. The influence of additive Gaussian noise on the response data is explored with respect to the quality of the resulting parameter estimates.
Bayesian Peptide Peak Detection for High Resolution TOF Mass Spectrometry.
Zhang, Jianqiu; Zhou, Xiaobo; Wang, Honghui; Suffredini, Anthony; Zhang, Lin; Huang, Yufei; Wong, Stephen
2010-11-01
In this paper, we address the issue of peptide ion peak detection for high resolution time-of-flight (TOF) mass spectrometry (MS) data. A novel Bayesian peptide ion peak detection method is proposed for TOF data with resolution of 10 000-15 000 full width at half-maximum (FWHM). MS spectra exhibit distinct characteristics at this resolution, which are captured in a novel parametric model. Based on the proposed parametric model, a Bayesian peak detection algorithm based on Markov chain Monte Carlo (MCMC) sampling is developed. The proposed algorithm is tested on both simulated and real datasets. The results show a significant improvement in detection performance over a commonly employed method. The results also agree with experts' visual inspection. Moreover, better detection consistency is achieved across MS datasets from patients with identical pathological conditions.
A Bayesian Multinomial Probit Model for the Analysis of Panel Choice Data.
Fong, Duncan K H; Kim, Sunghoon; Chen, Zhe; DeSarbo, Wayne S
2016-03-01
A new Bayesian multinomial probit model is proposed for the analysis of panel choice data. Using a parameter expansion technique, we are able to devise a Markov Chain Monte Carlo algorithm to compute our Bayesian estimates efficiently. We also show that the proposed procedure enables the estimation of individual level coefficients for the single-period multinomial probit model even when the available prior information is vague. We apply our new procedure to consumer purchase data and reanalyze a well-known scanner panel dataset that reveals new substantive insights. In addition, we delineate a number of advantageous features of our proposed procedure over several benchmark models. Finally, through a simulation analysis employing a fractional factorial design, we demonstrate that the results from our proposed model are quite robust with respect to differing factors across various conditions.
HIV Migration Between Blood and Cerebrospinal Fluid or Semen Over Time
Chaillon, Antoine; Gianella, Sara; Wertheim, Joel O.; Richman, Douglas D.; Mehta, Sanjay R.; Smith, David M.
2014-01-01
Previous studies reported associations between neuropathogenesis and human immunodeficiency virus (HIV) compartmentalization in cerebrospinal fluid (CSF) and between sexual transmission and HIV type 1 compartmentalization in semen. It remains unclear, however, how compartmentalization dynamics change over time. To address this, we used statistical methods and Bayesian phylogenetic approaches to reconstruct temporal dynamics of HIV migration between blood and CSF and between blood and the male genital tract. We investigated 11 HIV-infected individuals with paired semen and blood samples and 4 individuals with paired CSF and blood samples. Aligned partial HIV env sequences were analyzed by (1) phylogenetic reconstruction, using a Bayesian Markov-chain Monte Carlo approach; (2) evaluation of viral compartmentalization, using tree-based and distance-based methods; and (3) analysis of migration events, using a discrete Bayesian asymmetric phylogeographic approach of diffusion with Markov jump counts estimation. Finally, we evaluated potential correlates of viral gene flow across anatomical compartments. We observed bidirectional replenishment of viral compartments and asynchronous peaks of viral migration from and to blood over time, suggesting that disruption of viral compartmentalization is transient and directionally selected. These findings imply that viral subpopulations in anatomical sites are an active part of the whole viral population and that compartmental reservoirs could have implications in future eradication studies. PMID:24302756
Reed, H; Leckey, Cara A C; Dick, A; Harvey, G; Dobson, J
2018-01-01
Ultrasonic damage detection and characterization is commonly used in nondestructive evaluation (NDE) of aerospace composite components. In recent years there has been an increased development of guided wave based methods. In real materials and structures, these dispersive waves result in complicated behavior in the presence of complex damage scenarios. Model-based characterization methods utilize accurate three-dimensional finite element models (FEMs) of guided wave interaction with realistic damage scenarios to aid in defect identification and classification. This work describes an inverse solution for realistic composite damage characterization by comparing the wavenumber-frequency spectra of experimental and simulated ultrasonic inspections. The composite laminate material properties are first verified through a Bayesian (Markov chain Monte Carlo) solution, enabling uncertainty quantification surrounding the characterization. The FEM is then parameterized with a damage model capable of describing the typical complex damage created by impact events in composites, and a study is undertaken to assess the efficacy of the proposed damage model and of comparative metrics between the experimental and simulated output. The damage is characterized through a transdimensional Markov chain Monte Carlo solution, enabling a flexible damage model capable of adapting to the complex damage geometry investigated here. The posterior probability distributions of the individual delamination petals as well as the overall envelope of the damage site are determined. Copyright © 2017 Elsevier B.V. All rights reserved.
Bulashevska, Alla; Stein, Martin; Jackson, David; Eils, Roland
2009-12-01
Accurate computational methods that can help to predict the biological function of a protein from its sequence are of great interest to research biologists and pharmaceutical companies. One approach to inferring the function of a protein is to predict its interactions with other molecules. In this work, we propose a machine learning method that uses the primary sequence of a domain to predict its propensity for interaction with small molecules. By curating the Pfam database with respect to the small-molecule binding ability of its component domains, we have constructed a dataset of small-molecule binding and non-binding domains. This dataset was then used as a training set to learn a Bayesian classifier that distinguishes members of each class. The domain sequences of both classes are modelled with Markov chains. In a jack-knife test, our classification procedure achieved predictive accuracies of 77.2% and 66.7% for the binding and non-binding classes, respectively. We demonstrate the applicability of our classifier by using it to identify previously unknown small-molecule binding domains. Our predictions are available as supplementary material and can provide very useful information to drug discovery specialists. Given the ubiquitous and essential role small molecules play in biological processes, our method is important for identifying pharmaceutically relevant components of complete proteomes. The software is available from the author upon request.
Weijs, Liesbeth; Yang, Raymond S H; Das, Krishna; Covaci, Adrian; Blust, Ronny
2013-05-07
Physiologically based pharmacokinetic (PBPK) modeling in marine mammals is a challenge because of the lack of parameter information and the ban on exposure experiments. To minimize uncertainty and variability, parameter estimation methods are required for the development of reliable PBPK models. The present study is the first to develop PBPK models for the lifetime bioaccumulation of p,p'-DDT, p,p'-DDE, and p,p'-DDD in harbor porpoises. In addition, this study is also the first to apply the Bayesian approach executed with Markov chain Monte Carlo simulations using two data sets of harbor porpoises from the Black and North Seas. Parameters from the literature were used as priors for the first "model update" using the Black Sea data set, the resulting posterior parameters were then used as priors for the second "model update" using the North Sea data set. As such, PBPK models with parameters specific for harbor porpoises could be strengthened with more robust probability distributions. As the science and biomonitoring effort progress in this area, more data sets will become available to further strengthen and update the parameters in the PBPK models for harbor porpoises as a species anywhere in the world. Further, such an approach could very well be extended to other protected marine mammals.
Doss, Hani; Tan, Aixin
2017-01-01
In the classical biased sampling problem, we have k densities π1(·), …, πk(·), each known up to a normalizing constant, i.e. for l = 1, …, k, πl(·) = νl(·)/ml, where νl(·) is a known function and ml is an unknown constant. For each l, we have an iid sample from πl, and the problem is to estimate the ratios ml/ms for all l and all s. This problem arises frequently in several situations in both frequentist and Bayesian inference. An estimate of the ratios was developed and studied by Vardi and his co-workers over two decades ago, and there has been much subsequent work on this problem from many different perspectives. In spite of this, there are no rigorous results in the literature on how to estimate the standard error of the estimate. We present a class of estimates of the ratios of normalizing constants that are appropriate for the case where the samples from the πl's are not necessarily iid sequences, but are Markov chains. We also develop an approach based on regenerative simulation for obtaining standard errors for the estimates of ratios of normalizing constants. These standard error estimates are valid for both the iid case and the Markov chain case. PMID:28706463
Hierarchical Bayesian modeling of ionospheric TEC disturbances as non-stationary processes
NASA Astrophysics Data System (ADS)
Seid, Abdu Mohammed; Berhane, Tesfahun; Roininen, Lassi; Nigussie, Melessew
2018-03-01
We model regular and irregular variation of ionospheric total electron content as stationary and non-stationary processes, respectively. We apply the method developed to a SCINDA GPS data set observed at Bahir Dar, Ethiopia (11.6°N, 37.4°E). We use hierarchical Bayesian inversion with Gaussian Markov random process priors, and we model the prior parameters in the hyperprior. We use Matérn priors via stochastic partial differential equations, and scaled Inv-χ2 hyperpriors for the hyperparameters. For drawing posterior estimates, we use Markov Chain Monte Carlo methods: Gibbs sampling and Metropolis-within-Gibbs for parameter and hyperparameter estimation, respectively. This allows us to quantify model parameter estimation uncertainties as well. We demonstrate the applicability of the proposed method using a synthetic test case. Finally, we apply the method to the real GPS data set, which we decompose into regular and irregular variation components. The result shows that the approach can be used as an accurate ionospheric disturbance characterization technique that quantifies the total electron content variability with corresponding error uncertainties.
NASA Astrophysics Data System (ADS)
Cubillos, Patricio; Harrington, Joseph; Blecic, Jasmina; Stemm, Madison M.; Lust, Nate B.; Foster, Andrew S.; Rojo, Patricio M.; Loredo, Thomas J.
2014-11-01
Multi-wavelength secondary-eclipse and transit depths probe the thermo-chemical properties of exoplanets. In recent years, several research groups have developed retrieval codes to analyze the existing data and study the prospects of future facilities. However, the scientific community has limited access to these packages. Here we premiere the open-source Bayesian Atmospheric Radiative Transfer (BART) code. We discuss the key aspects of the radiative-transfer algorithm and the statistical package. The radiation code includes line databases for all HITRAN molecules, high-temperature H2O, TiO, and VO, and includes a preprocessor for adding additional line databases without recompiling the radiation code. Collision-induced absorption lines are available for H2-H2 and H2-He. The parameterized thermal and molecular abundance profiles can be modified arbitrarily without recompilation. The generated spectra are integrated over arbitrary bandpasses for comparison to data. BART's statistical package, Multi-core Markov-chain Monte Carlo (MC3), is a general-purpose MCMC module. MC3 implements the Differential-evolution Markov-chain Monte Carlo algorithm (ter Braak 2006, 2009). MC3 converges 20-400 times faster than the usual Metropolis-Hastings MCMC algorithm, and in addition uses the Message Passing Interface (MPI) to parallelize the MCMC chains. We apply the BART retrieval code to the HD 209458b data set to estimate the planet's temperature profile and molecular abundances. This work was supported by NASA Planetary Atmospheres grant NNX12AI69G and NASA Astrophysics Data Analysis Program grant NNX13AF38G. JB holds a NASA Earth and Space Science Fellowship.
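For reference, the core of the differential-evolution MCMC proposal that MC3 builds on (ter Braak 2006) is compact: each chain proposes a jump along the difference of two other randomly chosen chains, so the proposal automatically adapts to the target's shape. The sketch below is an illustrative re-implementation on a toy target, not MC3's own code.

```python
# Illustrative DE-MC sampler (ter Braak 2006) on a correlated 2-D Gaussian.
import numpy as np

def log_target(x):
    cov_inv = np.linalg.inv(np.array([[1.0, 0.9], [0.9, 1.0]]))
    return -0.5 * x @ cov_inv @ x

rng = np.random.default_rng(1)
nchains, ndim, nsteps = 8, 2, 5000
gamma = 2.38 / np.sqrt(2 * ndim)      # near-optimal jump scale
x = rng.normal(size=(nchains, ndim))  # population of chains
logp = np.array([log_target(xi) for xi in x])
samples = []

for step in range(nsteps):
    for i in range(nchains):
        # The difference of two other chains drives the proposal, so the
        # jump distribution matches the target's orientation and scale.
        r1, r2 = rng.choice([j for j in range(nchains) if j != i],
                            size=2, replace=False)
        prop = x[i] + gamma * (x[r1] - x[r2]) + 1e-4 * rng.normal(size=ndim)
        logp_prop = log_target(prop)
        if np.log(rng.uniform()) < logp_prop - logp[i]:   # Metropolis rule
            x[i], logp[i] = prop, logp_prop
    samples.append(x.copy())

samples = np.array(samples)[1000:]    # drop burn-in
print(samples.reshape(-1, ndim).std(axis=0))
```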
NASA Astrophysics Data System (ADS)
Davis, A. D.; Heimbach, P.; Marzouk, Y.
2017-12-01
We develop a Bayesian inverse modeling framework for predicting future ice sheet volume with associated formal uncertainty estimates. Marine ice sheets are drained by fast-flowing ice streams, which we simulate using a flowline model. Flowline models depend on geometric parameters (e.g., basal topography), parameterized physical processes (e.g., calving laws and basal sliding), and climate parameters (e.g., surface mass balance), most of which are unknown or uncertain. Given observations of ice surface velocity and thickness, we define a Bayesian posterior distribution over static parameters, such as basal topography. We also define a parameterized distribution over variable parameters, such as future surface mass balance, which we assume are not informed by the data. Hyperparameters are used to represent climate change scenarios, and sampling their distributions mimics internal variation. For example, a warming climate corresponds to increasing mean surface mass balance but an individual sample may have periods of increasing or decreasing surface mass balance. We characterize the predictive distribution of ice volume by evaluating the flowline model given samples from the posterior distribution and the distribution over variable parameters. Finally, we determine the effect of climate change on future ice sheet volume by investigating how changing the hyperparameters affects the predictive distribution. We use state-of-the-art Bayesian computation to address computational feasibility. Characterizing the posterior distribution (using Markov chain Monte Carlo), sampling the full range of variable parameters and evaluating the predictive model is prohibitively expensive. Furthermore, the required resolution of the inferred basal topography may be very high, which is often challenging for sampling methods. Instead, we leverage regularity in the predictive distribution to build a computationally cheaper surrogate over the low dimensional quantity of interest (future ice sheet volume). Continual surrogate refinement guarantees asymptotic sampling from the predictive distribution. Directly characterizing the predictive distribution in this way allows us to assess the ice sheet's sensitivity to climate variability and change.
Markov Chain Monte Carlo: an introduction for epidemiologists
Hamra, Ghassan; MacLehose, Richard; Richardson, David
2013-01-01
Markov Chain Monte Carlo (MCMC) methods are increasingly popular among epidemiologists. The reason for this may in part be that MCMC offers an appealing approach to handling some difficult types of analyses. Additionally, MCMC methods are those most commonly used for Bayesian analysis. However, epidemiologists are still largely unfamiliar with MCMC. They may lack familiarity either with the implementation of MCMC or with interpretation of the resultant output. As with tutorials outlining the calculus behind maximum likelihood in previous decades, a simple description of the machinery of MCMC is needed. We provide an introduction to conducting analyses with MCMC, and show that, given the same data and under certain model specifications, the results of an MCMC simulation match those of methods based on standard maximum-likelihood estimation (MLE). In addition, we highlight examples of instances in which MCMC approaches to data analysis provide a clear advantage over MLE. We hope that this brief tutorial will encourage epidemiologists to consider MCMC approaches as part of their analytic tool-kit. PMID:23569196
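The point that MCMC reproduces ML results under flat priors is easy to demonstrate on a toy epidemiologic example. The following sketch (invented case counts, flat prior on the log-odds) runs a bare random-walk Metropolis sampler and compares the posterior mean of a prevalence parameter with the ML estimate x/n.

```python
# Toy illustration: a simple Metropolis sampler with a flat prior recovers
# essentially the ML estimate. Case counts below are invented.
import numpy as np

cases, n = 35, 200                       # 35 cases among 200 subjects

def log_post(logit_p):                   # binomial likelihood, flat prior
    p = 1.0 / (1.0 + np.exp(-logit_p))
    return cases * np.log(p) + (n - cases) * np.log(1.0 - p)

rng = np.random.default_rng(42)
chain, cur = [], 0.0
for _ in range(20000):
    prop = cur + rng.normal(scale=0.3)   # random-walk proposal
    if np.log(rng.uniform()) < log_post(prop) - log_post(cur):
        cur = prop
    chain.append(cur)

post_p = 1.0 / (1.0 + np.exp(-np.array(chain[2000:])))
print("posterior mean:", post_p.mean(), " MLE:", cases / n)
```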
Hydrologic Model Selection using Markov chain Monte Carlo methods
NASA Astrophysics Data System (ADS)
Marshall, L.; Sharma, A.; Nott, D.
2002-12-01
Estimation of parameter uncertainty (and in turn model uncertainty) allows assessment of the risk in likely applications of hydrological models. Bayesian statistical inference provides an ideal means of assessing parameter uncertainty, whereby prior knowledge about the parameter is combined with information from the available data to produce a probability distribution (the posterior distribution) that describes uncertainty about the parameter and serves as a basis for selecting appropriate values for use in modelling applications. Widespread use of Bayesian techniques in hydrology has been hindered by difficulties in summarizing and exploring the posterior distribution. These difficulties have been largely overcome by recent advances in Markov chain Monte Carlo (MCMC) methods that involve random sampling of the posterior distribution. This study presents an adaptive MCMC sampling algorithm with characteristics that are well suited to model parameters with a high degree of correlation and interdependence, as is often evident in hydrological models. The MCMC sampling technique is used to compare six alternative configurations of a commonly used conceptual rainfall-runoff model, the Australian Water Balance Model (AWBM), using 11 years of daily rainfall-runoff data from the Bass River catchment in Australia. The alternative configurations considered fall into two classes - those that consider model errors to be independent of prior values, and those that model the errors as an autoregressive process. Each class consists of three formulations that represent increasing levels of complexity (and parameterisation) of the original model structure. The results from this study point both to the importance of using Bayesian approaches in evaluating model performance and to the simplicity of the MCMC sampling framework, which has the ability to bring such approaches within the reach of the applied hydrological community.
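An adaptive sampler of the general kind described, one that tunes its proposal to correlated parameters, can be sketched as follows. This is an illustrative adaptive Metropolis scheme in the spirit of Haario et al. (2001) on a toy correlated posterior, not the study's actual algorithm or the AWBM.

```python
# Adaptive Metropolis sketch: the proposal covariance is periodically
# re-estimated from the chain history to handle correlated parameters.
import numpy as np

def log_post(theta):                     # toy stand-in for a rainfall-runoff
    cov_inv = np.linalg.inv(np.array([[1.0, 0.95], [0.95, 1.0]]))
    return -0.5 * theta @ cov_inv @ theta   # model's posterior surface

rng = np.random.default_rng(7)
d, nsteps = 2, 20000
cov = 0.1 * np.eye(d)                    # initial proposal covariance
cur, logp_cur = np.zeros(d), log_post(np.zeros(d))
history = [cur.copy()]

for step in range(nsteps):
    prop = rng.multivariate_normal(cur, cov)
    logp_prop = log_post(prop)
    if np.log(rng.uniform()) < logp_prop - logp_cur:
        cur, logp_cur = prop, logp_prop
    history.append(cur.copy())
    if step > 500 and step % 100 == 0:   # adapt from the chain history
        cov = (2.38**2 / d) * np.cov(np.array(history).T) + 1e-6 * np.eye(d)

print(np.cov(np.array(history[5000:]).T))  # approximates target covariance
```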
Atmospheric Tracer Inverse Modeling Using Markov Chain Monte Carlo (MCMC)
NASA Astrophysics Data System (ADS)
Kasibhatla, P.
2004-12-01
In recent years, there has been an increasing emphasis on the use of Bayesian statistical estimation techniques to characterize the temporal and spatial variability of atmospheric trace gas sources and sinks. The applications have been varied in terms of the particular species of interest, as well as in terms of the spatial and temporal resolution of the estimated fluxes. However, one common characteristic has been the use of relatively simple statistical models for describing the measurement and chemical transport model error statistics and prior source statistics. For example, multivariate normal probability distribution functions (pdfs) are commonly used to model these quantities and inverse source estimates are derived for fixed values of pdf parameters. While the advantage of this approach is that closed form analytical solutions for the a posteriori pdfs of interest are available, it is worth exploring Bayesian analysis approaches which allow for a more general treatment of error and prior source statistics. Here, we present an application of the Markov Chain Monte Carlo (MCMC) methodology to an atmospheric tracer inversion problem to demonstrate how more general statistical models for errors can be incorporated into the analysis in a relatively straightforward manner. The MCMC approach to Bayesian analysis, which has found wide application in a variety of fields, is a statistical simulation approach that involves computing moments of interest of the a posteriori pdf by efficiently sampling this pdf. The specific inverse problem that we focus on is the annual mean CO2 source/sink estimation problem considered by the TransCom3 project. TransCom3 was a collaborative effort involving various modeling groups and followed a common modeling and analysis protocol. As such, this problem provides a convenient case study to demonstrate the applicability of the MCMC methodology to atmospheric tracer source/sink estimation problems.
Abanto-Valle, C. A.; Bandyopadhyay, D.; Lachos, V. H.; Enriquez, I.
2009-01-01
A Bayesian analysis of stochastic volatility (SV) models using the class of symmetric scale mixtures of normal (SMN) distributions is considered. In the face of non-normality, this provides an appealing robust alternative to the routine use of the normal distribution. Specific distributions examined include the normal, Student-t, slash and the variance gamma distributions. Using a Bayesian paradigm, an efficient Markov chain Monte Carlo (MCMC) algorithm is introduced for parameter estimation. Moreover, the mixing parameters obtained as a by-product of the scale mixture representation can be used to identify outliers. The methods developed are applied to analyze daily stock-return data on the S&P500 index. Bayesian model selection criteria as well as out-of-sample forecasting results reveal that the SV models based on heavy-tailed SMN distributions provide significant improvement in model fit as well as prediction to the S&P500 index data over the usual normal model. PMID:20730043
Time-varying nonstationary multivariate risk analysis using a dynamic Bayesian copula
NASA Astrophysics Data System (ADS)
Sarhadi, Ali; Burn, Donald H.; Concepción Ausín, María.; Wiper, Michael P.
2016-03-01
A time-varying risk analysis is proposed for an adaptive design framework in nonstationary conditions arising from climate change. A Bayesian, dynamic conditional copula is developed for modeling the time-varying dependence structure between mixed continuous and discrete multiattributes of multidimensional hydrometeorological phenomena. Joint Bayesian inference is carried out to fit the marginals and copula in an illustrative example using an adaptive Gibbs Markov Chain Monte Carlo (MCMC) sampler. Posterior mean estimates and credible intervals are provided for the model parameters, and the Deviance Information Criterion (DIC) is used to select the model that best captures different forms of nonstationarity over time. This study also introduces a fully Bayesian, time-varying joint return period for multivariate time-dependent risk analysis in nonstationary environments. The results demonstrate that the nature and the risk of extreme-climate multidimensional processes change over time under the impact of climate change; accordingly, long-term decision-making strategies should be updated based on the anomalies of the nonstationary environment.
Bayesian analysis of caustic-crossing microlensing events
NASA Astrophysics Data System (ADS)
Cassan, A.; Horne, K.; Kains, N.; Tsapras, Y.; Browne, P.
2010-06-01
Aims: Caustic-crossing binary-lens microlensing events are important anomalous events because they are capable of detecting an extrasolar planet companion orbiting the lens star. Fast and robust modelling methods are thus of prime interest in helping to decide whether a planet is detected by an event. Cassan introduced a new set of parameters to model binary-lens events, which are closely related to properties of the light curve. In this work, we explain how Bayesian priors can be added to this framework, and investigate several interesting options. Methods: We develop a mathematical formulation that allows us to compute analytically the priors on the new parameters, given some previous knowledge about other physical quantities. We explicitly compute the priors for a number of interesting cases, and show how this can be implemented in a fully Bayesian, Markov chain Monte Carlo algorithm. Results: Using Bayesian priors can accelerate microlens fitting codes by reducing the time spent considering physically implausible models, and helps us to discriminate between alternative models based on the physical plausibility of their parameters.
Estimating the hatchery fraction of a natural population: a Bayesian approach
Barber, Jarrett J.; Gerow, Kenneth G.; Connolly, Patrick J.; Singh, Sarabdeep
2011-01-01
There is strong and growing interest in estimating the proportion of hatchery fish in a natural population (the hatchery fraction). In a sample of fish from the relevant population, some are observed to be marked, indicating their origin as hatchery fish. The observed proportion of marked fish is usually less than the actual hatchery fraction, since the observed proportion is determined by the proportion originally marked, differential survival (usually lower) of marked fish relative to unmarked hatchery fish, and rates of mark retention and detection. Bayesian methods can work well in a setting such as this, in which empirical data are limited but for which there may be considerable expert judgment regarding these values. We explored Bayesian estimation of the hatchery fraction using Markov chain Monte Carlo methods. Based on our findings, we created an interactive Excel tool to implement the algorithm, which we have made available for free.
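One minimal way such a model could be set up (a hypothetical sketch, not the authors' Excel tool): treat the observed marked count as binomial with success probability λφ, where λ is the hatchery fraction and φ lumps together the marking rate, relative survival, mark retention, and detection, and let expert judgment enter as an informative prior on φ. All counts and prior parameters below are invented.

```python
# Hypothetical sketch: the product lambda * phi is what the data identify,
# so an informative expert prior on phi is what makes lambda estimable.
import numpy as np
from scipy import stats

marked, n = 60, 500                          # 60 marked fish in a sample of 500

def log_post(lam, phi):
    if not (0 < lam < 1 and 0 < phi < 1):
        return -np.inf
    lp = stats.beta.logpdf(lam, 2, 2)        # vague prior on hatchery fraction
    lp += stats.beta.logpdf(phi, 20, 5)      # informative expert prior on phi
    return lp + stats.binom.logpmf(marked, n, lam * phi)

rng = np.random.default_rng(3)
lam, phi, draws = 0.2, 0.8, []
for _ in range(30000):
    lam_p, phi_p = lam + rng.normal(0, 0.05), phi + rng.normal(0, 0.05)
    if np.log(rng.uniform()) < log_post(lam_p, phi_p) - log_post(lam, phi):
        lam, phi = lam_p, phi_p
    draws.append(lam)

print("hatchery fraction (2.5/50/97.5%):",
      np.percentile(draws[5000:], [2.5, 50, 97.5]))
```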
Nonparametric Bayesian Segmentation of a Multivariate Inhomogeneous Space-Time Poisson Process.
Ding, Mingtao; He, Lihan; Dunson, David; Carin, Lawrence
2012-12-01
A nonparametric Bayesian model is proposed for segmenting time-evolving multivariate spatial point process data. An inhomogeneous Poisson process is assumed, with a logistic stick-breaking process (LSBP) used to encourage piecewise-constant spatial Poisson intensities. The LSBP explicitly favors spatially contiguous segments, and infers the number of segments based on the observed data. The temporal dynamics of the segmentation and of the Poisson intensities are modeled with exponential correlation in time, implemented in the form of a first-order autoregressive model for uniformly sampled discrete data, and via a Gaussian process with an exponential kernel for general temporal sampling. We consider and compare two different inference techniques: a Markov chain Monte Carlo sampler, which has relatively high computational complexity; and an approximate and efficient variational Bayesian analysis. The model is demonstrated with a simulated example and a real example of space-time crime events in Cincinnati, Ohio, USA.
Bayesian spatio-temporal modeling of particulate matter concentrations in Peninsular Malaysia
NASA Astrophysics Data System (ADS)
Manga, Edna; Awang, Norhashidah
2016-06-01
This article presents an application of a Bayesian spatio-temporal Gaussian process (GP) model on particulate matter concentrations from Peninsular Malaysia. We analyze daily PM10 concentration levels from 35 monitoring sites in June and July 2011. The spatiotemporal model set in a Bayesian hierarchical framework allows for inclusion of informative covariates, meteorological variables and spatiotemporal interactions. Posterior density estimates of the model parameters are obtained by Markov chain Monte Carlo methods. Preliminary data analysis indicates that information on PM10 levels at sites classified as industrial locations could explain part of the space-time variations. We include the site-type indicator in our modeling efforts. Results of the parameter estimates for the fitted GP model show significant spatio-temporal structure and a positive effect of the location-type explanatory variable. We also compute some validation criteria for the out-of-sample sites that show the adequacy of the model for predicting PM10 at unmonitored sites.
Approximate Bayesian computation for spatial SEIR(S) epidemic models.
Brown, Grant D; Porter, Aaron T; Oleson, Jacob J; Hinman, Jessica A
2018-02-01
Approximate Bayesian Computation (ABC) provides an attractive approach to estimation in complex Bayesian inferential problems for which evaluation of the kernel of the posterior distribution is impossible or computationally expensive. These highly parallelizable techniques have been successfully applied to many fields, particularly in cases where more traditional approaches such as Markov chain Monte Carlo (MCMC) are impractical. In this work, we demonstrate the application of approximate Bayesian inference to spatially heterogeneous Susceptible-Exposed-Infectious-Removed (SEIR) stochastic epidemic models. These models have a tractable posterior distribution; however, MCMC techniques become computationally infeasible for moderately sized problems. We discuss the practical implementation of these techniques via the open source ABSEIR package for R. The performance of ABC relative to traditional MCMC methods in a small problem is explored under simulation, as well as in the spatially heterogeneous context of the 2014 epidemic of Chikungunya in the Americas. Copyright © 2017 Elsevier Ltd. All rights reserved.
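The core ABC rejection idea is short enough to sketch. The code below is not the ABSEIR package; it uses a simpler chain-binomial SIR model as a stand-in for the SEIR(S) models discussed, draws parameters from the prior, simulates, and keeps draws whose final epidemic size falls within a tolerance of the (invented) observed value.

```python
# Bare-bones ABC rejection sketch with a chain-binomial SIR simulator.
# Population size, observed final size, and tolerance are illustrative.
import numpy as np

rng = np.random.default_rng(11)
N, I0, obs_final_size, tol = 500, 5, 180, 10

def final_size(beta, gamma):
    S, I, R = N - I0, I0, 0
    while I > 0:
        new_inf = rng.binomial(S, 1.0 - np.exp(-beta * I / N))
        new_rec = rng.binomial(I, 1.0 - np.exp(-gamma))
        S, I, R = S - new_inf, I + new_inf - new_rec, R + new_rec
    return R

accepted = []
for _ in range(20000):
    beta, gamma = rng.uniform(0.0, 1.0), rng.uniform(0.05, 0.5)  # prior draws
    if abs(final_size(beta, gamma) - obs_final_size) <= tol:
        accepted.append((beta, gamma))   # within ABC tolerance

print(len(accepted), "accepted; posterior mean:", np.mean(accepted, axis=0))
```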
NASA Astrophysics Data System (ADS)
Birkel, C.; Paroli, R.; Spezia, L.; Tetzlaff, D.; Soulsby, C.
2012-12-01
In this paper we present a novel model framework using the class of Markov Switching Autoregressive Models (MSARMs) to examine catchments as complex stochastic systems that exhibit non-stationary, non-linear and non-Normal rainfall-runoff and solute dynamics. MSARMs are pairs of stochastic processes, one observed and one unobserved, or hidden. We model the unobserved process as a finite-state Markov chain and assume that the observed process, given the hidden Markov chain, is conditionally autoregressive, which means that the current observation depends on its recent past (system memory). The model is fully embedded in a Bayesian analysis based on Markov Chain Monte Carlo (MCMC) algorithms for model selection and uncertainty assessment, whereby the autoregressive order and the dimension of the hidden Markov chain state-space are essentially self-selected. The hidden states of the Markov chain represent unobserved levels of variability in the observed process that may result from complex interactions of hydroclimatic variability on the one hand and catchment characteristics affecting water and solute storage on the other. To deal with non-stationarity, additional meteorological and hydrological time series along with a periodic component can be included in the MSARMs as covariates. This extension allows identification of potential underlying drivers of temporal rainfall-runoff and solute dynamics. We applied the MSAR model framework to streamflow and conservative tracer (deuterium and oxygen-18) time series from an intensively monitored 2.3 km2 experimental catchment in eastern Scotland. Statistical time series analysis, in the form of MSARMs, suggested that the streamflow and isotope tracer time series are not controlled by simple linear rules. MSARMs showed that the dependence of current observations on past inputs, observed by transport models often in the form of long-tailed travel time and residence time distributions, can be efficiently explained by non-stationarity of the system input (climatic variability) and/or the complexity of catchment storage characteristics. The statistical model is also capable of reproducing short-term (event) and longer-term (inter-event), wet and dry dynamical "hydrological states". These reflect the non-linear transport mechanisms of flow pathways induced by transient climatic and hydrological variables and modified by catchment characteristics. We conclude that MSARMs are a powerful tool for analyzing the temporal dynamics of hydrological data, allowing for explicit integration of non-stationary, non-linear and non-Normal characteristics.
NASA Technical Reports Server (NTRS)
Yim, John T.
2017-01-01
A survey of low energy xenon ion impact sputter yields was conducted to provide a more coherent baseline set of sputter yield data and accompanying fits for electric propulsion integration. Data uncertainties are discussed and different available curve fit formulas are assessed for their general suitability. A Bayesian parameter fitting approach is used with a Markov chain Monte Carlo method to provide estimates for the fitting parameters while characterizing the uncertainties for the resulting yield curves.
pytc: Open-Source Python Software for Global Analyses of Isothermal Titration Calorimetry Data.
Duvvuri, Hiranmayi; Wheeler, Lucas C; Harms, Michael J
2018-05-08
Here we describe pytc, an open-source Python package for global fits of thermodynamic models to multiple isothermal titration calorimetry experiments. Key features include simplicity, the ability to implement new thermodynamic models, a robust maximum likelihood fitter, a fast Bayesian Markov-Chain Monte Carlo sampler, rigorous implementation, extensive documentation, and full cross-platform compatibility. pytc fitting can be done using an application programming interface or via a graphical user interface. It is available for download at https://github.com/harmslab/pytc.
Modeling and Bayesian parameter estimation for shape memory alloy bending actuators
NASA Astrophysics Data System (ADS)
Crews, John H.; Smith, Ralph C.
2012-04-01
In this paper, we employ a homogenized energy model (HEM) for shape memory alloy (SMA) bending actuators. Additionally, we utilize a Bayesian method for quantifying parameter uncertainty. The system consists of an SMA wire attached to a flexible beam. As the actuator is heated, the beam bends, providing endoscopic motion. The model parameters are fit to experimental data using an ordinary least-squares approach. The uncertainty in the fit model parameters is then quantified using Markov Chain Monte Carlo (MCMC) methods. The MCMC algorithm provides bounds on the parameters, which will ultimately be used in robust control algorithms. One purpose of the paper is to test the feasibility of the Random Walk Metropolis algorithm, the MCMC method used here.
Incorporating approximation error in surrogate based Bayesian inversion
NASA Astrophysics Data System (ADS)
Zhang, J.; Zeng, L.; Li, W.; Wu, L.
2015-12-01
There is increasing interest in applying surrogate models in inverse Bayesian modeling to reduce repeated evaluations of the original model and thereby save computational cost. However, the approximation error of the surrogate model is usually overlooked, partly because it is difficult to evaluate for many surrogates. Previous studies have shown that the direct combination of surrogates and Bayesian methods (e.g., Markov Chain Monte Carlo, MCMC) may lead to biased estimates when the surrogate cannot emulate the highly nonlinear original system. This problem can be alleviated by implementing MCMC in a two-stage manner, but the computational cost remains high since a relatively large number of original model simulations is still required. In this study, we illustrate the importance of incorporating approximation error in inverse Bayesian modeling. A Gaussian process (GP) is chosen to construct the surrogate for its convenience in approximation-error evaluation. Numerical cases of Bayesian experimental design and parameter estimation for contaminant source identification are used to illustrate this idea. It is shown that, once the surrogate approximation error is well incorporated into the Bayesian framework, promising results can be obtained even when the surrogate is used directly, and no further original model simulations are required.
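The mechanics of folding the GP's own predictive variance into the likelihood can be sketched in a few lines. Below is a hypothetical one-dimensional example (toy forward model, invented observation and noise level) using scikit-learn's GP regressor; the resulting log-likelihood can be plugged into any MCMC sampler.

```python
# Where the surrogate is uncertain, the total variance grows and the
# surrogate's prediction is automatically down-weighted.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def forward(theta):                          # stand-in for the expensive model
    return np.sin(3.0 * theta) + 0.5 * theta

# Train the surrogate on a handful of original-model runs.
X_train = np.linspace(-2.0, 2.0, 12).reshape(-1, 1)
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0))
gp.fit(X_train, forward(X_train.ravel()))

y_obs, sigma_obs = forward(0.7) + 0.05, 0.1  # synthetic observation and noise

def log_like_surrogate(theta):
    mu, std = gp.predict(np.array([[theta]]), return_std=True)
    total_var = sigma_obs**2 + std[0]**2     # data noise + approximation error
    return -0.5 * ((y_obs - mu[0]) ** 2 / total_var + np.log(total_var))

# Far outside the training range the inflated variance flattens the
# likelihood, reflecting the surrogate's own error.
print(log_like_surrogate(0.7), log_like_surrogate(5.0))
```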
Bayesian spatial transformation models with applications in neuroimaging data
Miranda, Michelle F.; Zhu, Hongtu; Ibrahim, Joseph G.
2013-01-01
The aim of this paper is to develop a class of spatial transformation models (STM) to spatially model the varying association between imaging measures in a three-dimensional (3D) volume (or 2D surface) and a set of covariates. Our STMs include a varying Box-Cox transformation model for dealing with the issue of non-Gaussian distributed imaging data and a Gaussian Markov Random Field model for incorporating spatial smoothness of the imaging data. Posterior computation proceeds via an efficient Markov chain Monte Carlo algorithm. Simulations and real data analysis demonstrate that the STM significantly outperforms the voxel-wise linear model with Gaussian noise in recovering meaningful geometric patterns. Our STM is able to reveal important brain regions with morphological changes in children with attention deficit hyperactivity disorder. PMID:24128143
AN OPTIMAL MAINTENANCE MANAGEMENT MODEL FOR AIRPORT CONCRETE PAVEMENT
NASA Astrophysics Data System (ADS)
Shimomura, Taizo; Fujimori, Yuji; Kaito, Kiyoyuki; Obama, Kengo; Kobayashi, Kiyoshi
In this paper, an optimal management model is formulated for the performance-based rehabilitation/maintenance contract for airport concrete pavement, whereby two types of life cycle cost risk, i.e., ground consolidation risk and concrete depreciation risk, are explicitly considered. A non-homogeneous Markov chain model is formulated to represent the deterioration processes of concrete pavement, which are conditional upon the ground consolidation processes. An optimal non-homogeneous Markov decision model with multiple types of risk is presented to design the optimal rehabilitation/maintenance plans, together with a methodology to revise those plans based upon monitoring data using Bayesian updating rules. The validity of the methodology presented in this paper is examined through case studies carried out for the H airport.
Bayesian analysis of biogeography when the number of areas is large.
Landis, Michael J; Matzke, Nicholas J; Moore, Brian R; Huelsenbeck, John P
2013-11-01
Historical biogeography is increasingly studied from an explicitly statistical perspective, using stochastic models to describe the evolution of species range as a continuous-time Markov process of dispersal between and extinction within a set of discrete geographic areas. The main constraint of these methods is the computational limit on the number of areas that can be specified. We propose a Bayesian approach for inferring biogeographic history that extends the application of biogeographic models to the analysis of more realistic problems that involve a large number of areas. Our solution is based on a "data-augmentation" approach, in which we first populate the tree with a history of biogeographic events that is consistent with the observed species ranges at the tips of the tree. We then calculate the likelihood of a given history by adopting a mechanistic interpretation of the instantaneous-rate matrix, which specifies both the exponential waiting times between biogeographic events and the relative probabilities of each biogeographic change. We develop this approach in a Bayesian framework, marginalizing over all possible biogeographic histories using Markov chain Monte Carlo (MCMC). Besides dramatically increasing the number of areas that can be accommodated in a biogeographic analysis, our method allows the parameters of a given biogeographic model to be estimated and different biogeographic models to be objectively compared. Our approach is implemented in the program BayArea.
Bettenbühl, Mario; Rusconi, Marco; Engbert, Ralf; Holschneider, Matthias
2012-01-01
Complex biological dynamics often generate sequences of discrete events which can be described as a Markov process. The order of the underlying Markovian stochastic process is fundamental for characterizing statistical dependencies within sequences. As an example for this class of biological systems, we investigate the Markov order of sequences of microsaccadic eye movements from human observers. We calculate the integrated likelihood of a given sequence for various orders of the Markov process and use this in a Bayesian framework for statistical inference on the Markov order. Our analysis shows that data from most participants are best explained by a first-order Markov process. This is compatible with recent findings of a statistical coupling of subsequent microsaccade orientations. Our method might prove to be useful for a broad class of biological systems.
Minsley, Burke J.
2011-01-01
A meaningful interpretation of geophysical measurements requires an assessment of the space of models that are consistent with the data, rather than just a single, ‘best’ model which does not convey information about parameter uncertainty. For this purpose, a trans-dimensional Bayesian Markov chain Monte Carlo (MCMC) algorithm is developed for assessing frequency-domain electromagnetic (FDEM) data acquired from airborne or ground-based systems. By sampling the distribution of models that are consistent with measured data and any prior knowledge, valuable inferences can be made about parameter values such as the likely depth to an interface, the distribution of possible resistivity values as a function of depth and non-unique relationships between parameters. The trans-dimensional aspect of the algorithm allows the number of layers to be a free parameter that is controlled by the data, where models with fewer layers are inherently favoured, which provides a natural measure of parsimony and a significant degree of flexibility in parametrization. The MCMC algorithm is used with synthetic examples to illustrate how the distribution of acceptable models is affected by the choice of prior information, the system geometry and configuration and the uncertainty in the measured system elevation. An airborne FDEM data set that was acquired for the purpose of hydrogeological characterization is also studied. The results compare favorably with traditional least-squares analysis, borehole resistivity and lithology logs from the site, and also provide new information about parameter uncertainty necessary for model assessment.
Bulashevska, Alla; Eils, Roland
2006-06-14
The subcellular location of a protein is closely related to its function. It would be worthwhile to develop a method to predict the subcellular location of a given protein when only its amino acid sequence is known. Although many efforts have been made to predict subcellular location from sequence information only, there is a need for further research to improve the accuracy of prediction. A novel method called HensBC is introduced to predict protein subcellular location. HensBC is a recursive algorithm which constructs a hierarchical ensemble of classifiers. The classifiers used are Bayesian classifiers based on Markov chain models. We tested our method on six different datasets, among them a Gram-negative bacteria dataset, a dataset for discriminating outer membrane proteins and an apoptosis proteins dataset. We observed that our method can predict the subcellular location with high accuracy. Another advantage of the proposed method is that it can improve the prediction accuracy for classes with few sequences in training and is therefore useful for datasets with an imbalanced distribution of classes. This study introduces an algorithm which uses only the primary sequence of a protein to predict its subcellular location. The proposed recursive scheme represents an interesting methodology for learning and combining classifiers. The method is computationally efficient and competitive with previously reported approaches in terms of prediction accuracy, as empirical results indicate. The code for the software is available upon request.
Estimating the Earthquake Source Time Function by Markov Chain Monte Carlo Sampling
NASA Astrophysics Data System (ADS)
Dȩbski, Wojciech
2008-07-01
Many aspects of earthquake source dynamics, such as dynamic stress drop, rupture velocity and directivity, are currently inferred from the source time functions obtained by a deconvolution of the propagation and recording effects from seismograms. The question of the accuracy of the obtained results remains open. In this paper we address this issue by considering two aspects of the source time function deconvolution. First, we propose a new pseudo-spectral parameterization of the sought function which explicitly takes into account the physical constraints imposed on the sought functions. Such parameterization automatically excludes non-physical solutions and so improves the stability and uniqueness of the deconvolution. Secondly, we demonstrate that the Bayesian approach to the inverse problem at hand, combined with an efficient Markov Chain Monte Carlo sampling technique, is a method which allows efficient estimation of the source time function uncertainties. The key point of the approach is the description of the solution of the inverse problem by the a posteriori probability density function constructed according to the Bayesian (probabilistic) theory. Next, the Markov Chain Monte Carlo sampling technique is used to sample this function, so the statistical estimator of a posteriori errors can be easily obtained with minimal additional computational effort with respect to modern inversion (optimization) algorithms. The methodological considerations are illustrated by a case study of the mining-induced seismic event of magnitude ML ≈ 3.1 that occurred at the Rudna (Poland) copper mine. The seismic P-wave records were inverted for the source time functions, using the proposed algorithm and the empirical Green function technique to approximate Green functions. The obtained solutions seem to suggest some complexity of the rupture process, with double pulses of energy release. However, the error analysis shows that the hypothesis of source complexity is not justified at the 95% confidence level. On the basis of the analyzed event we also show that the separation of the source inversion into two steps introduces limitations on the completeness of the a posteriori error analysis.
Robust Spectral Unmixing of Sparse Multispectral Lidar Waveforms using Gamma Markov Random Fields
Altmann, Yoann; Maccarone, Aurora; McCarthy, Aongus; ...
2017-05-10
This paper presents a new Bayesian spectral unmixing algorithm to analyse remote scenes sensed via sparse multispectral Lidar measurements. To a first approximation, in the presence of a target, each Lidar waveform consists of a main peak, whose position depends on the target distance and whose amplitude depends on the wavelength of the laser source considered (i.e., on the target reflectivity). Besides, these temporal responses are usually assumed to be corrupted by Poisson noise in the low photon count regime. When considering multiple wavelengths, it becomes possible to use spectral information in order to identify and quantify the main materials in the scene, in addition to estimation of the Lidar-based range profiles. Due to its anomaly detection capability, the proposed hierarchical Bayesian model, coupled with an efficient Markov chain Monte Carlo algorithm, allows robust estimation of depth images together with abundance and outlier maps associated with the observed 3D scene. The proposed methodology is illustrated via experiments conducted with real multispectral Lidar data acquired in a controlled environment. The results demonstrate the possibility of unmixing spectral responses constructed from extremely sparse photon counts (fewer than 10 photons per pixel and band).
NASA Astrophysics Data System (ADS)
Hu, Zixi; Yao, Zhewei; Li, Jinglai
2017-03-01
Many scientific and engineering problems require performing Bayesian inference for unknowns of infinite dimension. In such problems, many standard Markov Chain Monte Carlo (MCMC) algorithms become arbitrarily slow under mesh refinement, which is referred to as being dimension dependent. To this end, a family of dimension-independent MCMC algorithms, known as the preconditioned Crank-Nicolson (pCN) methods, was proposed to sample the infinite-dimensional parameters. In this work we develop an adaptive version of the pCN algorithm, in which the covariance operator of the proposal distribution is adjusted based on the sampling history to improve simulation efficiency. We show that the proposed algorithm satisfies an important ergodicity condition under some mild assumptions. Finally, we provide numerical examples to demonstrate the performance of the proposed method.
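For context, the (non-adaptive) pCN proposal at the heart of this family is x' = sqrt(1 - beta^2) x + beta xi, with xi drawn from the Gaussian prior; because the proposal preserves the prior, the acceptance ratio involves only the likelihood and does not degenerate as the discretization dimension grows. A minimal sketch, with a toy likelihood and identity prior covariance:

```python
# Minimal pCN sketch: the prior terms cancel in the acceptance ratio, so
# only the likelihood appears, whatever the dimension. The likelihood and
# prior covariance below are toy assumptions.
import numpy as np

rng = np.random.default_rng(5)
d, beta = 100, 0.2                       # dimension of the discretized unknown
C_chol = np.eye(d)                       # Cholesky factor of prior covariance

def log_like(x):                         # toy likelihood: noisy mean observation
    return -0.5 * (x.mean() - 1.0) ** 2 / 0.01

x = C_chol @ rng.normal(size=d)          # start from a prior draw
accepted = 0
for _ in range(5000):
    xi = C_chol @ rng.normal(size=d)     # fresh prior sample
    prop = np.sqrt(1.0 - beta**2) * x + beta * xi
    if np.log(rng.uniform()) < log_like(prop) - log_like(x):
        x, accepted = prop, accepted + 1

print("acceptance rate:", accepted / 5000, " mean of x:", x.mean())
```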
NASA Astrophysics Data System (ADS)
Vrugt, J. A.
2012-12-01
In the past decade much progress has been made in the treatment of uncertainty in earth systems modeling. Whereas initial approaches focused mostly on quantification of parameter and predictive uncertainty, recent methods attempt to disentangle the effects of parameter, forcing (input) data, model structural and calibration data errors. In this talk I will highlight some of our recent work involving theory, concepts and applications of Bayesian parameter and/or state estimation. In particular, new methods for sequential Monte Carlo (SMC) and Markov Chain Monte Carlo (MCMC) simulation will be presented with emphasis on massively parallel distributed computing and quantification of model structural errors. The theoretical and numerical developments will be illustrated using model-data synthesis problems in hydrology, hydrogeology and geophysics.
Bayesian reconstruction of projection reconstruction NMR (PR-NMR).
Yoon, Ji Won
2014-11-01
Projection reconstruction nuclear magnetic resonance (PR-NMR) is a technique for generating multidimensional NMR spectra. A small number of projections from lower-dimensional NMR spectra are used to reconstruct the multidimensional NMR spectra. In our previous work, it was shown that multidimensional NMR spectra are efficiently reconstructed using peak-by-peak based reversible jump Markov chain Monte Carlo (RJMCMC) algorithm. We propose an extended and generalized RJMCMC algorithm replacing a simple linear model with a linear mixed model to reconstruct close NMR spectra into true spectra. This statistical method generates samples in a Bayesian scheme. Our proposed algorithm is tested on a set of six projections derived from the three-dimensional 700 MHz HNCO spectrum of a protein HasA. Copyright © 2014 Elsevier Ltd. All rights reserved.
Inference on cancer screening exam accuracy using population-level administrative data.
Jiang, H; Brown, P E; Walter, S D
2016-01-15
This paper develops a model for cancer screening and cancer incidence data, accommodating the partially unobserved disease status, clustered data structures, general covariate effects, and dependence between exams. The true unobserved cancer and detection status of screening participants are treated as latent variables, and a Markov Chain Monte Carlo algorithm is used to estimate the Bayesian posterior distributions of the diagnostic error rates and disease prevalence. We show how the Bayesian approach can be used to draw inferences about screening exam properties and disease prevalence while allowing for the possibility of conditional dependence between two exams. The techniques are applied to the estimation of the diagnostic accuracy of mammography and clinical breast examination using data from the Ontario Breast Screening Program in Canada. Copyright © 2015 John Wiley & Sons, Ltd.
Irreversible Local Markov Chains with Rapid Convergence towards Equilibrium.
Kapfer, Sebastian C; Krauth, Werner
2017-12-15
We study the continuous one-dimensional hard-sphere model and present irreversible local Markov chains that mix on faster time scales than the reversible heat bath or Metropolis algorithms. The mixing time scales appear to fall into two distinct universality classes, both faster than for reversible local Markov chains. The event-chain algorithm, the infinitesimal limit of one of these Markov chains, belongs to the class presenting the fastest decay. For the lattice-gas limit of the hard-sphere model, reversible local Markov chains correspond to the symmetric simple exclusion process (SEP) with periodic boundary conditions. The two universality classes for irreversible Markov chains are realized by the totally asymmetric SEP (TASEP), and by a faster variant (lifted TASEP) that we propose here. We discuss how our irreversible hard-sphere Markov chains generalize to arbitrary repulsive pair interactions and carry over to higher dimensions through the concept of lifted Markov chains and the recently introduced factorized Metropolis acceptance rule.
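The event-chain idea for one-dimensional hard spheres is simple enough to sketch: one sphere moves ballistically until it contacts its right neighbor, at which point the motion (the "lifting" variable) is handed to the blocking sphere, until a total displacement budget is spent. The following is an illustrative implementation with toy parameters, not the authors' code.

```python
# Illustrative event-chain move for 1-D hard spheres on a ring.
import numpy as np

rng = np.random.default_rng(9)
N, L, sigma = 10, 20.0, 0.5                 # spheres, ring length, diameter
x = np.arange(N) * (L / N)                  # evenly spaced valid start

def event_chain_move(x, ell):
    i = rng.integers(len(x))                # active sphere
    while ell > 0.0:
        nxt = (i + 1) % len(x)
        gap = (x[nxt] - x[i] - sigma) % L   # free distance to collision
        step = min(ell, gap)
        x[i] = (x[i] + step) % L
        ell -= step
        i = nxt                             # lifting: pass motion on contact
    return x

for _ in range(1000):
    x = event_chain_move(x, ell=2.0)
print(np.sort(x))
```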
Bayesian analysis of the flutter margin method in aeroelasticity
Khalil, Mohammad; Poirel, Dominique; Sarkar, Abhijit
2016-08-27
A Bayesian statistical framework is presented for the Zimmerman and Weissenburger flutter margin method, which considers the uncertainties in aeroelastic modal parameters. The proposed methodology overcomes the limitations of the previously developed least-squares based estimation technique, which relies on the Gaussian approximation of the flutter margin probability density function (pdf). Using the measured free-decay responses at subcritical (preflutter) airspeeds, the joint non-Gaussian posterior pdf of the modal parameters is sampled using the Metropolis–Hastings (MH) Markov chain Monte Carlo (MCMC) algorithm. The posterior MCMC samples of the modal parameters are then used to obtain the flutter margin pdfs and finally the flutter speed pdf. The usefulness of the Bayesian flutter margin method is demonstrated using synthetic data generated from a two-degree-of-freedom pitch-plunge aeroelastic model. The robustness of the statistical framework is demonstrated using different sets of measurement data. In conclusion, it is shown that the probabilistic (Bayesian) approach reduces the number of test points required to provide a flutter speed estimate for a given accuracy and precision.
A New Monte Carlo Method for Estimating Marginal Likelihoods.
Wang, Yu-Bo; Chen, Ming-Hui; Kuo, Lynn; Lewis, Paul O
2018-06-01
Evaluating the marginal likelihood in Bayesian analysis is essential for model selection. Estimators based on a single Markov chain Monte Carlo sample from the posterior distribution include the harmonic mean estimator and the inflated density ratio estimator. We propose a new class of Monte Carlo estimators based on this single Markov chain Monte Carlo sample. This class can be thought of as a generalization of the harmonic mean and inflated density ratio estimators using a partition weighted kernel (likelihood times prior). We show that our estimator is consistent and has better theoretical properties than the harmonic mean and inflated density ratio estimators. In addition, we provide guidelines on choosing optimal weights. Simulation studies were conducted to examine the empirical performance of the proposed estimator. We further demonstrate the desirable features of the proposed estimator with two real data sets: one is from a prostate cancer study using an ordinal probit regression model with latent variables; the other is for the power prior construction from two Eastern Cooperative Oncology Group phase III clinical trials using the cure rate survival model with similar objectives.
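As a point of reference for the estimators being generalized, the classic harmonic mean estimator can be written in a few lines. The toy example below uses a conjugate normal model so the exact marginal likelihood is available for comparison; it is illustrative only and does not implement the partition-weighted kernel estimator proposed in the paper.

```python
# Harmonic mean estimator vs. the exact marginal likelihood in a conjugate
# normal model (unit-variance data, N(0, 1) prior on the mean).
import numpy as np
from scipy import stats
from scipy.special import logsumexp

rng = np.random.default_rng(13)
n = 50
y = rng.normal(0.5, 1.0, size=n)

# Exact posterior for the mean: precision n + 1, mean sum(y) / (n + 1).
post_var = 1.0 / (n + 1.0)
draws = rng.normal(post_var * y.sum(), np.sqrt(post_var), size=20000)

# Log-likelihood of the full data set at each posterior draw.
log_lik = (-0.5 * n * np.log(2.0 * np.pi)
           - 0.5 * ((y[None, :] - draws[:, None]) ** 2).sum(axis=1))

# Harmonic mean estimate: m_hat = M / sum_i exp(-log_lik_i).
log_hm = np.log(len(draws)) - logsumexp(-log_lik)

# Exact log marginal likelihood: y ~ N(0, I + 11^T).
exact = stats.multivariate_normal.logpdf(y, np.zeros(n),
                                         np.eye(n) + np.ones((n, n)))
print("harmonic mean:", log_hm, " exact:", exact)
```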
Bayesian spatial transformation models with applications in neuroimaging data.
Miranda, Michelle F; Zhu, Hongtu; Ibrahim, Joseph G
2013-12-01
The aim of this article is to develop a class of spatial transformation models (STM) to spatially model the varying association between imaging measures in a three-dimensional (3D) volume (or 2D surface) and a set of covariates. The proposed STM include a varying Box-Cox transformation model for dealing with the issue of non-Gaussian distributed imaging data and a Gaussian Markov random field model for incorporating spatial smoothness of the imaging data. Posterior computation proceeds via an efficient Markov chain Monte Carlo algorithm. Simulations and real data analysis demonstrate that the STM significantly outperforms the voxel-wise linear model with Gaussian noise in recovering meaningful geometric patterns. Our STM is able to reveal important brain regions with morphological changes in children with attention deficit hyperactivity disorder. © 2013, The International Biometric Society.
Development of uncertainty-based work injury model using Bayesian structural equation modelling.
Chatterjee, Snehamoy
2014-01-01
This paper proposes a Bayesian structural equation model (SEM) of miners' work injury for an underground coal mine in India. The environmental and behavioural variables for work injury were identified and causal relationships were developed. For Bayesian modelling, prior distributions of the SEM parameters are necessary to develop the model. In this paper, two approaches were adopted to obtain prior distributions for the factor loading parameters and structural parameters of the SEM. In the first approach, the prior distributions were taken as fixed distribution functions with specific parameter values, whereas in the second approach, prior distributions of the parameters were generated from experts' opinions. The posterior distributions of these parameters were obtained by applying Bayes' rule. Markov chain Monte Carlo sampling, in the form of Gibbs sampling, was applied to sample from the posterior distribution. The results revealed that all coefficients of the structural and measurement model parameters are statistically significant under the experts' opinion-based priors, whereas two coefficients are not statistically significant when the fixed priors are applied. The error statistics reveal that the Bayesian structural model provides a reasonably good fit to the work injury data, with a high coefficient of determination (0.91) and a lower mean squared error than the traditional SEM.
Open-source Software for Exoplanet Atmospheric Modeling
NASA Astrophysics Data System (ADS)
Cubillos, Patricio; Blecic, Jasmina; Harrington, Joseph
2018-01-01
I will present a suite of self-standing open-source tools to model and retrieve exoplanet spectra implemented for Python. These include: (1) a Bayesian-statistical package to run Levenberg-Marquardt optimization and Markov-chain Monte Carlo posterior sampling, (2) a package to compress line-transition data from HITRAN or Exomol without loss of information, (3) a package to compute partition functions for HITRAN molecules, (4) a package to compute collision-induced absorption, and (5) a package to produce radiative-transfer spectra of transit and eclipse exoplanet observations and atmospheric retrievals.
Bayesian Inference for Time Trends in Parameter Values using Weighted Evidence Sets
DOE Office of Scientific and Technical Information (OSTI.GOV)
D. L. Kelly; A. Malkhasyan
2010-09-01
There is a nearly ubiquitous assumption in PSA that parameter values are at least piecewise-constant in time. As a result, Bayesian inference tends to incorporate many years of plant operation, over which there have been significant changes in plant operational and maintenance practices, plant management, etc. These changes can cause significant changes in parameter values over time; however, failure to perform Bayesian inference in the proper time-dependent framework can mask these changes. Failure to question the assumption of constant parameter values, and failure to perform Bayesian inference in the proper time-dependent framework, were noted as important issues in NUREG/CR-6813, performed for the U.S. Nuclear Regulatory Commission’s Advisory Committee on Reactor Safeguards in 2003. That report noted that “industry lacks tools to perform time-trend analysis with Bayesian updating.” This paper describes an application of time-dependent Bayesian inference methods developed for the European Commission Ageing PSA Network. These methods utilize open-source software, implementing Markov chain Monte Carlo sampling. The paper also illustrates an approach to incorporating multiple sources of data via applicability weighting factors that address differences in key influences, such as vendor, component boundaries, conditions of the operating environment, etc.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dana L. Kelly; Albert Malkhasyan
2010-06-01
There is a nearly ubiquitous assumption in PSA that parameter values are at least piecewise-constant in time. As a result, Bayesian inference tends to incorporate many years of plant operation, over which there have been significant changes in plant operational and maintenance practices, plant management, etc. These changes can cause significant changes in parameter values over time; however, failure to perform Bayesian inference in the proper time-dependent framework can mask these changes. Failure to question the assumption of constant parameter values, and failure to perform Bayesian inference in the proper time-dependent framework, were noted as important issues in NUREG/CR-6813, performed for the U.S. Nuclear Regulatory Commission’s Advisory Committee on Reactor Safeguards in 2003. That report noted that “industry lacks tools to perform time-trend analysis with Bayesian updating.” This paper describes an application of time-dependent Bayesian inference methods developed for the European Commission Ageing PSA Network. These methods utilize open-source software, implementing Markov chain Monte Carlo sampling. The paper also illustrates the development of a generic prior distribution, which incorporates multiple sources of generic data via weighting factors that address differences in key influences, such as vendor, component boundaries, conditions of the operating environment, etc.
Liang, Li-Jung; Huang, David; Brecht, Mary-Lynn; Hser, Yih-ing
2010-01-01
Studies examining differences in mortality among long-term drug users have been limited. In this paper, we introduce a Bayesian framework that jointly models survival data using a Weibull proportional hazard model with frailty, and substance and alcohol data using mixed-effects models, to examine differences in mortality among heroin, cocaine, and methamphetamine users from five long-term follow-up studies. The traditional approach to analyzing combined survival data from numerous studies assumes that the studies are homogeneous, thus the estimates may be biased due to unobserved heterogeneity among studies. Our approach allows us to structurally combine the data from different studies while accounting for correlation among subjects within each study. Markov chain Monte Carlo facilitates the implementation of Bayesian analyses. Despite the complexity of the model, our approach is relatively straightforward to implement using WinBUGS. We demonstrate our joint modeling approach to the combined data and discuss the results from both approaches. PMID:21052518
NASA Astrophysics Data System (ADS)
Sadegh, Mojtaba; Ragno, Elisa; AghaKouchak, Amir
2017-06-01
We present a newly developed Multivariate Copula Analysis Toolbox (MvCAT) which includes a wide range of copula families with different levels of complexity. MvCAT employs a Bayesian framework with a residual-based Gaussian likelihood function for inferring copula parameters and estimating the underlying uncertainties. The contribution of this paper is threefold: (a) providing a Bayesian framework to approximate the predictive uncertainties of fitted copulas, (b) introducing a hybrid-evolution Markov Chain Monte Carlo (MCMC) approach designed for numerical estimation of the posterior distribution of copula parameters, and (c) enabling the community to explore a wide range of copulas and evaluate them relative to the fitting uncertainties. We show that the commonly used local optimization methods for copula parameter estimation often get trapped in local minima. The proposed method, however, addresses this limitation and improves describing the dependence structure. MvCAT also enables evaluation of uncertainties relative to the length of record, which is fundamental to a wide range of applications such as multivariate frequency analysis.
Validation of the thermal challenge problem using Bayesian Belief Networks.
DOE Office of Scientific and Technical Information (OSTI.GOV)
McFarland, John; Swiler, Laura Painton
The thermal challenge problem has been developed at Sandia National Laboratories as a testbed for demonstrating various types of validation approaches and prediction methods. This report discusses one particular methodology to assess the validity of a computational model given experimental data. This methodology is based on Bayesian Belief Networks (BBNs) and can incorporate uncertainty in experimental measurements, in physical quantities, and model uncertainties. The approach uses the prior and posterior distributions of model output to compute a validation metric based on Bayesian hypothesis testing (a Bayes' factor). This report discusses various aspects of the BBN, specifically in the context of the thermal challenge problem. A BBN is developed for a given set of experimental data in a particular experimental configuration. The development of the BBN and the method for “solving” the BBN to develop the posterior distribution of model output through Monte Carlo Markov Chain sampling is discussed in detail. The use of the BBN to compute a Bayes' factor is demonstrated.
NASA Astrophysics Data System (ADS)
Hadjidoukas, P. E.; Angelikopoulos, P.; Papadimitriou, C.; Koumoutsakos, P.
2015-03-01
We present Π4U, an extensible framework, for non-intrusive Bayesian Uncertainty Quantification and Propagation (UQ+P) of complex and computationally demanding physical models, that can exploit massively parallel computer architectures. The framework incorporates Laplace asymptotic approximations as well as stochastic algorithms, along with distributed numerical differentiation and task-based parallelism for heterogeneous clusters. Sampling is based on the Transitional Markov Chain Monte Carlo (TMCMC) algorithm and its variants. The optimization tasks associated with the asymptotic approximations are treated via the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). A modified subset simulation method is used for posterior reliability measurements of rare events. The framework accommodates scheduling of multiple physical model evaluations based on an adaptive load balancing library and shows excellent scalability. In addition to the software framework, we also provide guidelines as to the applicability and efficiency of Bayesian tools when applied to computationally demanding physical models. Theoretical and computational developments are demonstrated with applications drawn from molecular dynamics, structural dynamics and granular flow.
NASA Astrophysics Data System (ADS)
Frost, Andrew J.; Thyer, Mark A.; Srikanthan, R.; Kuczera, George
2007-07-01
Multi-site simulation of hydrological data is required for drought risk assessment of large multi-reservoir water supply systems. In this paper, a general Bayesian framework is presented for the calibration and evaluation of multi-site hydrological data at annual timescales. Models included within this framework are the hidden Markov model (HMM) and the widely used lag-1 autoregressive (AR(1)) model. These models are extended by the inclusion of a Box-Cox transformation and a spatial correlation function in a multi-site setting. Parameter uncertainty is evaluated using Markov chain Monte Carlo techniques. Models are evaluated by their ability to reproduce a range of important extreme statistics and compared using Bayesian model selection techniques which evaluate model probabilities. The case study, using multi-site annual rainfall data situated within catchments which contribute to Sydney's main water supply, provided the following results. Firstly, in terms of model probabilities and diagnostics, the inclusion of the Box-Cox transformation was preferred. Secondly, the AR(1) and HMM performed similarly, while some other proposed AR(1)/HMM models with regionally pooled parameters had greater posterior probability than these two models. The practical significance of parameter and model uncertainty was illustrated using a case study involving drought security analysis for urban water supply. It was shown that ignoring parameter uncertainty resulted in a significant overestimate of reservoir yield and an underestimation of system vulnerability to severe drought.
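As an illustration of the simplest model in this framework, the following sketch simulates an AR(1) process on a Box-Cox-transformed scale for a single site with hypothetical parameter values; the paper's multi-site spatial correlation and HMM components are omitted.

```python
import numpy as np

def boxcox(y, lam):
    """Box-Cox transform; lam = 0 reduces to the log transform."""
    y = np.asarray(y, dtype=float)
    return np.log(y) if lam == 0 else (y**lam - 1.0) / lam

def inv_boxcox(z, lam):
    return np.exp(z) if lam == 0 else (lam * z + 1.0) ** (1.0 / lam)

def simulate_ar1_boxcox(n, phi, mu, sigma, lam, seed=0):
    """Simulate annual totals whose Box-Cox transform follows an AR(1):
    z_t = mu + phi * (z_{t-1} - mu) + sigma * e_t,  e_t ~ N(0, 1).
    """
    rng = np.random.default_rng(seed)
    z = np.empty(n)
    # start from the stationary distribution of the AR(1)
    z[0] = mu + sigma / np.sqrt(1 - phi**2) * rng.standard_normal()
    for t in range(1, n):
        z[t] = mu + phi * (z[t - 1] - mu) + sigma * rng.standard_normal()
    return inv_boxcox(z, lam)

# hypothetical parameters, loosely in the spirit of annual rainfall data
rain = simulate_ar1_boxcox(100, phi=0.3, mu=2.0, sigma=0.2, lam=0.2)
print(rain.min(), rain.mean())
```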
Minsley, B.J.
2011-01-01
A meaningful interpretation of geophysical measurements requires an assessment of the space of models that are consistent with the data, rather than just a single, 'best' model which does not convey information about parameter uncertainty. For this purpose, a trans-dimensional Bayesian Markov chain Monte Carlo (MCMC) algorithm is developed for assessing frequency-domain electromagnetic (FDEM) data acquired from airborne or ground-based systems. By sampling the distribution of models that are consistent with measured data and any prior knowledge, valuable inferences can be made about parameter values such as the likely depth to an interface, the distribution of possible resistivity values as a function of depth and non-unique relationships between parameters. The trans-dimensional aspect of the algorithm allows the number of layers to be a free parameter that is controlled by the data, where models with fewer layers are inherently favoured, which provides a natural measure of parsimony and a significant degree of flexibility in parametrization. The MCMC algorithm is used with synthetic examples to illustrate how the distribution of acceptable models is affected by the choice of prior information, the system geometry and configuration and the uncertainty in the measured system elevation. An airborne FDEM data set that was acquired for the purpose of hydrogeological characterization is also studied. The results compare favourably with traditional least-squares analysis, borehole resistivity and lithology logs from the site, and also provide new information about parameter uncertainty necessary for model assessment. © 2011 Geophysical Journal International © 2011 RAS.
Ma, Jianzhong; Amos, Christopher I; Warwick Daw, E
2007-09-01
Although extended pedigrees are often sampled through probands with extreme levels of a quantitative trait, Markov chain Monte Carlo (MCMC) methods for segregation and linkage analysis have not been able to perform ascertainment corrections. Further, the extent to which ascertainment of pedigrees leads to biases in the estimation of segregation and linkage parameters has not been previously studied for MCMC procedures. In this paper, we studied these issues with a Bayesian MCMC approach for joint segregation and linkage analysis, as implemented in the package Loki. We first simulated pedigrees ascertained through individuals with extreme values of a quantitative trait in the spirit of the sequential sampling theory of Cannings and Thompson [Cannings and Thompson [1977] Clin. Genet. 12:208-212]. Using our simulated data, we detected no bias in estimates of the trait locus location. However, when the ascertainment threshold was higher than or close to the true value of the highest genotypic mean, bias was found in the estimation of this parameter, in addition to the allele frequencies. When there were multiple trait loci, this bias destroyed the additivity of the effects of the trait loci, and caused biases in the estimation of all genotypic means when a purely additive model was used for analyzing the data. To account for pedigree ascertainment with sequential sampling, we developed a Bayesian ascertainment approach and implemented Metropolis-Hastings updates in the MCMC samplers used in Loki. Ascertainment correction greatly reduced biases in parameter estimates. Our method is designed for multiple, but a fixed number of, trait loci. Copyright (c) 2007 Wiley-Liss, Inc.
NASA Astrophysics Data System (ADS)
Chen, Huaizhen; Pan, Xinpeng; Ji, Yuxin; Zhang, Guangzhi
2017-08-01
A system of aligned vertical fractures and fine horizontal shale layers combines to form equivalent orthorhombic media. Weak anisotropy parameters and fracture weaknesses play an important role in the description of orthorhombic anisotropy (OA). We propose a novel approach that utilizes seismic reflection amplitudes to estimate weak anisotropy parameters and fracture weaknesses from observed seismic data, based on azimuthal elastic impedance (EI). We first propose a perturbation of the stiffness matrix in terms of weak anisotropy parameters and fracture weaknesses and, using the perturbation and the scattering function, derive the PP-wave reflection coefficient and azimuthal EI for an interface separating two OA media. We then demonstrate an approach that first uses a model-constrained damped least-squares algorithm to estimate azimuthal EI from partial incidence-phase-angle-stack seismic reflection data at different azimuths, and then extracts weak anisotropy parameters and fracture weaknesses from the estimated azimuthal EI using a Bayesian Markov chain Monte Carlo inversion method. In addition, a new procedure to construct a rock-physics effective model is presented to estimate weak anisotropy parameters and fracture weaknesses from well-log interpretation results (minerals and their volumes, porosity, saturation, fracture density, etc.). Tests on synthetic and real data indicate that the unknown parameters, including elastic properties (P- and S-wave impedances and density), weak anisotropy parameters and fracture weaknesses, can be estimated stably for seismic data containing moderate noise, and that our approach yields a reasonable estimate of anisotropy in a fractured shale reservoir.
On a Result for Finite Markov Chains
ERIC Educational Resources Information Center
Kulathinal, Sangita; Ghosh, Lagnojita
2006-01-01
In an undergraduate course on stochastic processes, Markov chains are discussed in great detail. Textbooks on stochastic processes provide interesting properties of finite Markov chains. This note discusses one such property regarding the number of steps in which a state is reachable or accessible from another state in a finite Markov chain with M…
Comparing nonparametric Bayesian tree priors for clonal reconstruction of tumors.
Deshwar, Amit G; Vembu, Shankar; Morris, Quaid
2015-01-01
Statistical machine learning methods, especially nonparametric Bayesian methods, have become increasingly popular to infer clonal population structure of tumors. Here we describe the treeCRP, an extension of the Chinese restaurant process (CRP), a popular construction used in nonparametric mixture models, to infer the phylogeny and genotype of major subclonal lineages represented in the population of cancer cells. We also propose new split-merge updates tailored to the subclonal reconstruction problem that improve the mixing time of Markov chains. In comparisons with the tree-structured stick breaking prior used in PhyloSub, we demonstrate superior mixing and running time using the treeCRP with our new split-merge procedures. We also show that given the same number of samples, TSSB and treeCRP have similar ability to recover the subclonal structure of a tumor…
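For readers unfamiliar with the CRP construction that treeCRP extends, a minimal sketch of drawing a partition from a plain Chinese restaurant process follows (illustrative only; treeCRP adds tree structure and the split-merge moves described above):

```python
import random

def crp_partition(n, alpha, seed=0):
    """Draw a partition of n items from the Chinese restaurant process.

    Item i joins an existing cluster with probability proportional to the
    cluster size, or opens a new cluster with probability proportional to
    the concentration parameter alpha.
    """
    rng = random.Random(seed)
    assignments, sizes = [], []
    for _ in range(n):
        # weights: existing cluster sizes, plus alpha for a new cluster
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)        # open a new cluster ("table")
        else:
            sizes[k] += 1
        assignments.append(k)
    return assignments

print(crp_partition(20, alpha=1.0))
```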
Traffic Video Image Segmentation Model Based on Bayesian and Spatio-Temporal Markov Random Field
NASA Astrophysics Data System (ADS)
Zhou, Jun; Bao, Xu; Li, Dawei; Yin, Yongwen
2017-10-01
Traffic video is a dynamic image sequence whose background and foreground change continuously, which results in occlusion. In this situation, general methods struggle to produce accurate image segmentation. A segmentation algorithm based on Bayesian inference and a spatio-temporal Markov random field (ST-MRF) is put forward. It builds energy-function models of the observation field and the label field for motion sequence images with the Markov property; then, following Bayes' rule, it exploits the interaction between the label field and the observation field, that is, the relationship between the label field's prior probability and the observation field's likelihood, to obtain the maximum a posteriori estimate of the label field. The ICM algorithm is then used to extract the moving objects, completing the segmentation. Finally, segmentation with ST-MRF alone and with the Bayesian method combined with ST-MRF was analyzed. Experimental results show that the combined Bayesian/ST-MRF algorithm segments faster than ST-MRF alone with a smaller computational workload, and achieves a better segmentation result, especially in heavy-traffic dynamic scenes.
Zhang, Hua; Huo, Mingdong; Chao, Jianqian; Liu, Pei
2016-01-01
Hepatitis B virus (HBV) infection is a major public health problem; timely antiviral treatment can significantly prevent the progression of liver damage from HBV by slowing down or stopping the virus from reproducing. In this study we applied a Bayesian approach to cost-effectiveness analysis, using Markov chain Monte Carlo (MCMC) simulation for the relevant evidence input into the model, to evaluate the cost-effectiveness of entecavir (ETV) and lamivudine (LVD) therapy for chronic hepatitis B (CHB) in Jiangsu, China, thus providing information to the public health system on CHB therapy. An eight-stage Markov model was developed, and a hypothetical cohort of 35-year-old HBeAg-positive patients with CHB was entered into the model. Treatment regimens were LVD 100 mg daily and ETV 0.5 mg daily. The transition parameters were derived either from systematic reviews of the literature or from previous economic studies. The outcome measures were life-years, quality-adjusted life-years (QALYs), and expected costs associated with the treatments and disease progression. All Bayesian analyses were implemented using WinBUGS version 1.4. Expected cost, life expectancy and QALYs decreased with age, while cost-effectiveness increased with age. The expected cost of ETV was less than that of LVD, while its life expectancy and QALYs were higher, so the ETV strategy was more cost-effective. The costs and benefits from the Monte Carlo simulation were very close to the exact results at the group level, but the standard deviation of each group indicated large differences between individual patients. Compared with lamivudine, entecavir is the more cost-effective option. CHB patients should accept antiviral treatment as soon as possible, since the lower the age, the more cost-effective the treatment. The Monte Carlo simulation yielded cost and effectiveness distributions, indicating that our Markov model is robust.
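To make the cohort mechanics concrete, here is a minimal Markov cohort model sketch. The states, transition probabilities, costs and utilities below are all hypothetical placeholders (the study itself uses eight stages and fits the inputs with WinBUGS):

```python
import numpy as np

# Minimal three-state Markov cohort sketch; the paper uses eight states.
# All numbers below are hypothetical, for illustration only.
states = ["CHB", "cirrhosis", "death"]
P = np.array([[0.90, 0.08, 0.02],      # annual transition probabilities
              [0.00, 0.90, 0.10],      # (each row sums to 1)
              [0.00, 0.00, 1.00]])
cost = np.array([1000.0, 5000.0, 0.0])     # annual cost per state
utility = np.array([0.85, 0.60, 0.0])      # QALY weight per state

cohort = np.array([1.0, 0.0, 0.0])         # everyone starts in CHB
total_cost = total_qaly = 0.0
for year in range(40):                     # 40-year horizon, undiscounted
    total_cost += cohort @ cost
    total_qaly += cohort @ utility
    cohort = cohort @ P                    # advance the cohort one cycle
print(f"cost={total_cost:.0f}, QALYs={total_qaly:.2f}")
```

Running this for each treatment arm, with arm-specific transition probabilities drawn from MCMC posterior samples, yields the cost and effectiveness distributions the abstract describes.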
Understanding Past Population Dynamics: Bayesian Coalescent-Based Modeling with Covariates
Gill, Mandev S.; Lemey, Philippe; Bennett, Shannon N.; Biek, Roman; Suchard, Marc A.
2016-01-01
Effective population size characterizes the genetic variability in a population and is a parameter of paramount importance in population genetics and evolutionary biology. Kingman’s coalescent process enables inference of past population dynamics directly from molecular sequence data, and researchers have developed a number of flexible coalescent-based models for Bayesian nonparametric estimation of the effective population size as a function of time. Major goals of demographic reconstruction include identifying driving factors of effective population size, and understanding the association between the effective population size and such factors. Building upon Bayesian nonparametric coalescent-based approaches, we introduce a flexible framework that incorporates time-varying covariates that exploit Gaussian Markov random fields to achieve temporal smoothing of effective population size trajectories. To approximate the posterior distribution, we adapt efficient Markov chain Monte Carlo algorithms designed for highly structured Gaussian models. Incorporating covariates into the demographic inference framework enables the modeling of associations between the effective population size and covariates while accounting for uncertainty in population histories. Furthermore, it can lead to more precise estimates of population dynamics. We apply our model to four examples. We reconstruct the demographic history of raccoon rabies in North America and find a significant association with the spatiotemporal spread of the outbreak. Next, we examine the effective population size trajectory of the DENV-4 virus in Puerto Rico along with viral isolate count data and find similar cyclic patterns. We compare the population history of the HIV-1 CRF02_AG clade in Cameroon with HIV incidence and prevalence data and find that the effective population size is more reflective of incidence rate. Finally, we explore the hypothesis that the population dynamics of musk ox during the Late Quaternary period were related to climate change. [Coalescent; effective population size; Gaussian Markov random fields; phylodynamics; phylogenetics; population genetics.] PMID:27368344
Bayesian Statistical Inference in Ion-Channel Models with Exact Missed Event Correction.
Epstein, Michael; Calderhead, Ben; Girolami, Mark A; Sivilotti, Lucia G
2016-07-26
The stochastic behavior of single ion channels is most often described as an aggregated continuous-time Markov process with discrete states. For ligand-gated channels each state can represent a different conformation of the channel protein or a different number of bound ligands. Single-channel recordings show only whether the channel is open or shut: states of equal conductance are aggregated, so transitions between them have to be inferred indirectly. The requirement to filter noise from the raw signal further complicates the modeling process, as it limits the time resolution of the data. The consequence of the reduced bandwidth is that openings or shuttings that are shorter than the resolution cannot be observed; these are known as missed events. Postulated models fitted using filtered data must therefore explicitly account for missed events to avoid bias in the estimation of rate parameters and therefore assess parameter identifiability accurately. In this article, we present the first, to our knowledge, Bayesian modeling of ion-channels with exact missed events correction. Bayesian analysis represents uncertain knowledge of the true value of model parameters by considering these parameters as random variables. This allows us to gain a full appreciation of parameter identifiability and uncertainty when estimating values for model parameters. However, Bayesian inference is particularly challenging in this context as the correction for missed events increases the computational complexity of the model likelihood. Nonetheless, we successfully implemented a two-step Markov chain Monte Carlo method that we called "BICME", which performs Bayesian inference in models of realistic complexity. The method is demonstrated on synthetic and real single-channel data from muscle nicotinic acetylcholine channels. We show that parameter uncertainty can be characterized more accurately than with maximum-likelihood methods. Our code for performing inference in these ion channel models is publicly available. Copyright © 2016 Biophysical Society. Published by Elsevier Inc. All rights reserved.
Approximate Bayesian evaluations of measurement uncertainty
NASA Astrophysics Data System (ADS)
Possolo, Antonio; Bodnar, Olha
2018-04-01
The Guide to the Expression of Uncertainty in Measurement (GUM) includes formulas that produce an estimate of a scalar output quantity that is a function of several input quantities, and an approximate evaluation of the associated standard uncertainty. This contribution presents approximate Bayesian counterparts of those formulas for the case where the output quantity is a parameter of the joint probability distribution of the input quantities, also taking into account any information about the value of the output quantity available prior to measurement, expressed in the form of a probability distribution on the set of possible values for the measurand. The approximate Bayesian estimates and uncertainty evaluations that we present have a long history and illustrious pedigree, and provide sufficiently accurate approximations in many applications, yet are very easy to implement in practice. Differently from exact Bayesian estimates, which involve either (analytical or numerical) integrations, or Markov Chain Monte Carlo sampling, the approximations that we describe involve only numerical optimization and simple algebra. Therefore, they make Bayesian methods widely accessible to metrologists. We illustrate the application of the proposed techniques in several instances of measurement: isotopic ratio of silver in a commercial silver nitrate; odds of cryptosporidiosis in AIDS patients; height of a manometer column; mass fraction of chromium in a reference material; and potential difference in a Zener voltage standard.
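A minimal sketch of an optimization-plus-algebra approximation in this spirit is the Laplace approximation, shown below for a toy measurand with hypothetical data. This illustrates the general idea, not the GUM-specific formulas of the paper:

```python
import numpy as np
from scipy.optimize import minimize

def laplace_approx(neg_log_post, x0):
    """Gaussian (Laplace) approximation to a posterior.

    Maximizes the log posterior numerically and uses the inverse Hessian
    at the mode as the approximate posterior covariance -- optimization
    and simple algebra only, no integration or MCMC.
    """
    res = minimize(neg_log_post, x0, method="BFGS")
    return res.x, res.hess_inv    # mode and BFGS inverse-Hessian estimate

# toy measurand: the mean mu of n observations y_i ~ N(mu, 1),
# with a N(0, 10^2) prior on mu (all numbers hypothetical)
y = np.array([1.2, 0.8, 1.1, 0.9, 1.0])

def nlp(mu):
    mu = mu[0]
    return 0.5 * np.sum((y - mu) ** 2) + 0.5 * (mu / 10.0) ** 2

mode, cov = laplace_approx(nlp, np.array([0.0]))
print(mode, np.sqrt(np.diag(cov)))  # estimate and standard uncertainty
```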
Probabilistic Damage Characterization Using the Computationally-Efficient Bayesian Approach
NASA Technical Reports Server (NTRS)
Warner, James E.; Hochhalter, Jacob D.
2016-01-01
This work presents a computationally-efficient approach for damage determination that quantifies uncertainty in the provided diagnosis. Given strain sensor data that are polluted with measurement errors, Bayesian inference is used to estimate the location, size, and orientation of damage. This approach uses Bayes' Theorem to combine any prior knowledge an analyst may have about the nature of the damage with information provided implicitly by the strain sensor data to form a posterior probability distribution over possible damage states. The unknown damage parameters are then estimated based on samples drawn numerically from this distribution using a Markov Chain Monte Carlo (MCMC) sampling algorithm. Several modifications are made to the traditional Bayesian inference approach to provide significant computational speedup. First, an efficient surrogate model is constructed using sparse grid interpolation to replace a costly finite element model that must otherwise be evaluated for each sample drawn with MCMC. Next, the standard Bayesian posterior distribution is modified using a weighted likelihood formulation, which is shown to improve the convergence of the sampling process. Finally, a robust MCMC algorithm, Delayed Rejection Adaptive Metropolis (DRAM), is adopted to sample the probability distribution more efficiently. Numerical examples demonstrate that the proposed framework effectively provides damage estimates with uncertainty quantification and can yield orders of magnitude speedup over standard Bayesian approaches.
NASA Astrophysics Data System (ADS)
Saputro, D. R. S.; Amalia, F.; Widyaningsih, P.; Affan, R. C.
2018-05-01
The Bayesian method can be used to estimate the parameters of a multivariate multiple regression model. It involves two distributions, the prior and the posterior, with the posterior influenced by the choice of prior. Jeffreys' prior is a non-informative prior distribution, used when information about the parameters is not available. The non-informative Jeffreys' prior is combined with the sample information to obtain the posterior distribution, which is then used to estimate the parameters. The purpose of this research is to estimate the parameters of a multivariate regression model using the Bayesian method with the non-informative Jeffreys' prior. Based on the results and discussion, the estimates of β and Σ are obtained as expected values under the marginal posterior distribution functions, which are multivariate normal and inverse Wishart, respectively. However, calculating these expected values involves integrals of functions whose values are difficult to determine analytically. Therefore, an approach based on generating random samples according to the posterior distribution of each parameter is needed, using a Markov chain Monte Carlo (MCMC) Gibbs sampling algorithm.
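A minimal sketch of such a Gibbs sampler, shown for the single-response regression case under the Jeffreys prior p(β, σ²) ∝ 1/σ² (the multivariate-response case replaces the inverse-gamma conditional with an inverse-Wishart), might look like:

```python
import numpy as np

def gibbs_regression(X, y, n_iter=2000, seed=0):
    """Gibbs sampler for regression under the non-informative Jeffreys
    prior p(beta, sigma^2) ~ 1/sigma^2, alternating between the two
    full conditional distributions.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    sigma2 = 1.0
    betas, sigma2s = [], []
    for _ in range(n_iter):
        # beta | sigma^2, y  ~  N(beta_hat, sigma^2 (X'X)^{-1})
        beta = rng.multivariate_normal(beta_hat, sigma2 * XtX_inv)
        # sigma^2 | beta, y  ~  Inv-Gamma(n/2, SSR/2)
        ssr = np.sum((y - X @ beta) ** 2)
        sigma2 = 1.0 / rng.gamma(n / 2.0, 2.0 / ssr)
        betas.append(beta)
        sigma2s.append(sigma2)
    return np.array(betas), np.array(sigma2s)

# toy data: y = 1 + 2 x + noise
rng = np.random.default_rng(1)
x = rng.standard_normal(100)
X = np.column_stack([np.ones(100), x])
y = 1.0 + 2.0 * x + 0.5 * rng.standard_normal(100)
betas, _ = gibbs_regression(X, y)
print(betas[500:].mean(axis=0))   # posterior means near [1, 2]
```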
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vrugt, Jasper A; Robinson, Bruce A; Ter Braak, Cajo J F
In recent years, a strong debate has emerged in the hydrologic literature regarding what constitutes an appropriate framework for uncertainty estimation. Particularly, there is strong disagreement whether an uncertainty framework should have its roots within a proper statistical (Bayesian) context, or whether such a framework should be based on a different philosophy and implement informal measures and weaker inference to summarize parameter and predictive distributions. In this paper, we compare a formal Bayesian approach using Markov Chain Monte Carlo (MCMC) with generalized likelihood uncertainty estimation (GLUE) for assessing uncertainty in conceptual watershed modeling. Our formal Bayesian approach is implemented using the recently developed differential evolution adaptive metropolis (DREAM) MCMC scheme with a likelihood function that explicitly considers model structural, input and parameter uncertainty. Our results demonstrate that DREAM and GLUE can generate very similar estimates of total streamflow uncertainty. This suggests that formal and informal Bayesian approaches have more common ground than the hydrologic literature and ongoing debate might suggest. The main advantage of formal approaches is, however, that they attempt to disentangle the effect of forcing, parameter and model structural error on total predictive uncertainty. This is key to improving hydrologic theory and to better understand and predict the flow of water through catchments.
Bayesian analysis of experimental epidemics of foot-and-mouth disease.
Streftaris, George; Gibson, Gavin J.
2004-01-01
We investigate the transmission dynamics of a certain type of foot-and-mouth disease (FMD) virus under experimental conditions. Previous analyses of experimental data from FMD outbreaks in non-homogeneously mixing populations of sheep have suggested a decline in viraemic level through serial passage of the virus, but these do not take into account possible variation in the length of the chain of viral transmission for each animal, which is implicit in the non-observed transmission process. We consider a susceptible-exposed-infectious-removed non-Markovian compartmental model for partially observed epidemic processes, and we employ powerful methodology (Markov chain Monte Carlo) for statistical inference, to address epidemiological issues under a Bayesian framework that accounts for all available information and associated uncertainty in a coherent approach. The analysis allows us to investigate the posterior distribution of the hidden transmission history of the epidemic, and thus to determine the effect of the length of the infection chain on the recorded viraemic levels, based on the posterior distribution of a p-value. Parameter estimates of the epidemiological characteristics of the disease are also obtained. The results reveal a possible decline in viraemia in one of the two experimental outbreaks. Our model also suggests that individual infectivity is related to the level of viraemia. PMID:15306359
Representing Lumped Markov Chains by Minimal Polynomials over Field GF(q)
NASA Astrophysics Data System (ADS)
Zakharov, V. M.; Shalagin, S. V.; Eminov, B. F.
2018-05-01
A method has been proposed to represent lumped Markov chains by minimal polynomials over a finite field. The accuracy of representing lumped stochastic matrices, i.e., the law of lumped Markov chains, depends linearly on the minimum degree of the polynomials over the field GF(q). The method allows constructing realizations of lumped Markov chains on linear shift registers with a pre-defined “linear complexity”.
2D Bayesian automated tilted-ring fitting of disc galaxies in large H I galaxy surveys: 2DBAT
NASA Astrophysics Data System (ADS)
Oh, Se-Heon; Staveley-Smith, Lister; Spekkens, Kristine; Kamphuis, Peter; Koribalski, Bärbel S.
2018-01-01
We present a novel algorithm based on a Bayesian method for 2D tilted-ring analysis of disc galaxy velocity fields. Compared to the conventional algorithms based on a chi-squared minimization procedure, this new Bayesian-based algorithm suffers less from local minima of the model parameters even with highly multimodal posterior distributions. Moreover, the Bayesian analysis, implemented via Markov Chain Monte Carlo sampling, only requires broad ranges of posterior distributions of the parameters, which makes the fitting procedure fully automated. This feature will be essential when performing kinematic analysis on the large number of resolved galaxies expected to be detected in neutral hydrogen (H I) surveys with the Square Kilometre Array and its pathfinders. The so-called 2D Bayesian Automated Tilted-ring fitter (2DBAT) implements Bayesian fits of 2D tilted-ring models in order to derive rotation curves of galaxies. We explore 2DBAT performance on (a) artificial H I data cubes built based on representative rotation curves of intermediate-mass and massive spiral galaxies, and (b) Australia Telescope Compact Array H I data from the Local Volume H I Survey. We find that 2DBAT works best for well-resolved galaxies with intermediate inclinations (20° < i < 70°), complementing 3D techniques better suited to modelling inclined galaxies.
Appraisal of jump distributions in ensemble-based sampling algorithms
NASA Astrophysics Data System (ADS)
Dejanic, Sanda; Scheidegger, Andreas; Rieckermann, Jörg; Albert, Carlo
2017-04-01
Sampling Bayesian posteriors of model parameters is often required for making model-based probabilistic predictions. For complex environmental models, standard Monte Carlo Markov Chain (MCMC) methods are often infeasible because they require too many sequential model runs. Therefore, we focused on ensemble methods that use many Markov chains in parallel, since they can be run on modern cluster architectures. Little is known about how to choose the best performing sampler, for a given application. A poor choice can lead to an inappropriate representation of posterior knowledge. We assessed two different jump moves, the stretch and the differential evolution move, underlying, respectively, the software packages EMCEE and DREAM, which are popular in different scientific communities. For the assessment, we used analytical posteriors with features as they often occur in real posteriors, namely high dimensionality, strong non-linear correlations or multimodality. For posteriors with non-linear features, standard convergence diagnostics based on sample means can be insufficient. Therefore, we resorted to an entropy-based convergence measure. We assessed the samplers by means of their convergence speed, robustness and effective sample sizes. For posteriors with strongly non-linear features, we found that the stretch move outperforms the differential evolution move, w.r.t. all three aspects.
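For concreteness, the stretch move underlying EMCEE can be sketched as follows. This is a minimal serial implementation for illustration only; production samplers update the ensemble in parallel subsets to preserve detailed balance under parallelization:

```python
import numpy as np

def stretch_move(walkers, log_post, a=2.0, rng=None):
    """One update of all walkers with the affine-invariant stretch move
    of Goodman and Weare (the proposal behind the EMCEE package).
    """
    rng = rng or np.random.default_rng()
    n, d = walkers.shape
    lp = np.array([log_post(w) for w in walkers])
    for k in range(n):
        j = rng.integers(n - 1)
        j = j if j < k else j + 1           # complementary walker, j != k
        z = (1 + (a - 1) * rng.random()) ** 2 / a   # z ~ g(z) on [1/a, a]
        prop = walkers[j] + z * (walkers[k] - walkers[j])
        lp_prop = log_post(prop)
        # the acceptance ratio includes the z^(d-1) volume factor
        if np.log(rng.random()) < (d - 1) * np.log(z) + lp_prop - lp[k]:
            walkers[k], lp[k] = prop, lp_prop
    return walkers

# toy run: 20 walkers sampling a 2-d standard normal
rng = np.random.default_rng(0)
w = rng.standard_normal((20, 2))
for _ in range(500):
    w = stretch_move(w, lambda x: -0.5 * np.sum(x**2), rng=rng)
print(w.mean(axis=0))
```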
NASA Astrophysics Data System (ADS)
Zaib Jadoon, Khan; Umer Altaf, Muhammad; McCabe, Matthew Francis; Hoteit, Ibrahim; Muhammad, Nisar; Moghadas, Davood; Weihermüller, Lutz
2017-10-01
A sound interpretation of electromagnetic induction (EMI) measurements requires quantifying the optimal model parameters and the uncertainty of a nonlinear inverse problem. For this purpose, an adaptive Bayesian Markov chain Monte Carlo (MCMC) algorithm is used to assess multi-orientation and multi-offset EMI measurements in an agricultural field with non-saline and saline soil. In MCMC, the posterior distribution is computed using Bayes' rule. The electromagnetic forward model, based on the full solution of Maxwell's equations, was used to simulate the apparent electrical conductivity measured with the configurations of the EMI instrument, the CMD Mini-Explorer. Uncertainty in the parameters of the three-layered earth model is investigated using synthetic data. Our results show that, in the non-saline scenario, the layer-thickness parameters are much less well constrained by the data than the layer electrical conductivities and are therefore difficult to resolve. Application of the proposed MCMC-based inversion to field measurements in a drip irrigation system demonstrates that the model parameters can be estimated better for the saline soil than for the non-saline soil, and provides useful insight about parameter uncertainty for the assessment of the model outputs.
NASA Astrophysics Data System (ADS)
Abhinav, S.; Manohar, C. S.
2018-03-01
The problem of combined state and parameter estimation in nonlinear state space models, based on Bayesian filtering methods, is considered. A novel approach, which combines Rao-Blackwellized particle filters for state estimation with Markov chain Monte Carlo (MCMC) simulations for parameter identification, is proposed. In order to ensure successful performance of the MCMC samplers, in situations involving large amount of dynamic measurement data and (or) low measurement noise, the study employs a modified measurement model combined with an importance sampling based correction. The parameters of the process noise covariance matrix are also included as quantities to be identified. The study employs the Rao-Blackwellization step at two stages: one, associated with the state estimation problem in the particle filtering step, and, secondly, in the evaluation of the ratio of likelihoods in the MCMC run. The satisfactory performance of the proposed method is illustrated on three dynamical systems: (a) a computational model of a nonlinear beam-moving oscillator system, (b) a laboratory scale beam traversed by a loaded trolley, and (c) an earthquake shake table study on a bending-torsion coupled nonlinear frame subjected to uniaxial support motion.
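As background for the state-estimation step, a minimal bootstrap particle filter is sketched below for a generic scalar state-space model. This is illustrative only; the paper's Rao-Blackwellized filter additionally solves part of the state analytically, and the MCMC parameter-identification stage is omitted:

```python
import numpy as np

def bootstrap_pf(y, n_part, f, h, q_std, r_std, x0_std, seed=0):
    """Minimal bootstrap particle filter for the model
    x_t = f(x_{t-1}) + q,  y_t = h(x_t) + r, with Gaussian noises.
    Returns the filtered state means.
    """
    rng = np.random.default_rng(seed)
    x = x0_std * rng.standard_normal(n_part)    # initial particle cloud
    means = []
    for yt in y:
        x = f(x) + q_std * rng.standard_normal(n_part)   # propagate
        logw = -0.5 * ((yt - h(x)) / r_std) ** 2         # likelihood weights
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means.append(np.sum(w * x))                      # filtered mean
        x = rng.choice(x, size=n_part, p=w)              # resample
    return np.array(means)

# toy nonlinear system
rng = np.random.default_rng(1)
xs, x = [], 0.0
for t in range(50):
    x = 0.9 * x + 0.5 * np.sin(x) + 0.3 * rng.standard_normal()
    xs.append(x)
y = np.array(xs) + 0.5 * rng.standard_normal(50)
est = bootstrap_pf(y, 500, lambda x: 0.9 * x + 0.5 * np.sin(x),
                   lambda x: x, 0.3, 0.5, 1.0)
print(np.mean((est - np.array(xs)) ** 2))   # filtering mean-squared error
```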
Annealed Importance Sampling Reversible Jump MCMC algorithms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Karagiannis, Georgios; Andrieu, Christophe
2013-03-20
It will soon be 20 years since reversible jump Markov chain Monte Carlo (RJ-MCMC) algorithms have been proposed. They have significantly extended the scope of Markov chain Monte Carlo simulation methods, offering the promise to be able to routinely tackle transdimensional sampling problems, as encountered in Bayesian model selection problems for example, in a principled and flexible fashion. Their practical efficient implementation, however, still remains a challenge. A particular difficulty encountered in practice is in the choice of the dimension matching variables (both their nature and their distribution) and the reversible transformations which allow one to define the one-to-one mappings underpinning the design of these algorithms. Indeed, even seemingly sensible choices can lead to algorithms with very poor performance. The focus of this paper is the development and performance evaluation of a method, annealed importance sampling RJ-MCMC (aisRJ), which addresses this problem by mitigating the sensitivity of RJ-MCMC algorithms to the aforementioned poor design. As we shall see the algorithm can be understood as being an “exact approximation” of an idealized MCMC algorithm that would sample from the model probabilities directly in a model selection set-up. Such an idealized algorithm may have good theoretical convergence properties, but typically cannot be implemented, and our algorithms can approximate the performance of such idealized algorithms to an arbitrary degree while not introducing any bias for any degree of approximation. Our approach combines the dimension matching ideas of RJ-MCMC with annealed importance sampling and its Markov chain Monte Carlo implementation. We illustrate the performance of the algorithm with numerical simulations which indicate that, although the approach may at first appear computationally involved, it is in fact competitive.
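The annealed importance sampling ingredient can be sketched on its own for a fixed-dimension toy problem. The sketch below illustrates the weight accumulation across tempered targets, not the full transdimensional aisRJ algorithm:

```python
import numpy as np

def ais(log_prior_sample, log_prior, log_lik, n_temps=50, n_runs=100, seed=0):
    """Annealed importance sampling estimate of the log marginal likelihood.

    Moves samples from the prior to the posterior through tempered targets
    pi_t ~ prior * likelihood^beta_t, accumulating importance weights; a
    simple random-walk Metropolis kernel is used at each temperature.
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(0.0, 1.0, n_temps)
    logw = np.zeros(n_runs)
    for r in range(n_runs):
        x = log_prior_sample(rng)
        for t in range(1, n_temps):
            logw[r] += (betas[t] - betas[t - 1]) * log_lik(x)
            # one Metropolis step targeting pi_t
            prop = x + 0.5 * rng.standard_normal()
            lr = (log_prior(prop) + betas[t] * log_lik(prop)
                  - log_prior(x) - betas[t] * log_lik(x))
            if np.log(rng.random()) < lr:
                x = prop
    m = logw.max()
    return m + np.log(np.mean(np.exp(logw - m)))

# toy conjugate check: prior N(0,1), likelihood N(y=1 | x, 1);
# the exact log marginal is log N(1 | 0, 2), roughly -1.516
log_ml = ais(lambda rng: rng.standard_normal(),
             lambda x: -0.5 * x**2,
             lambda x: -0.5 * (1.0 - x)**2 - 0.5 * np.log(2 * np.pi))
print(log_ml)
```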
Leão, William L.; Abanto-Valle, Carlos A.; Chen, Ming-Hui
2017-01-01
A stochastic volatility-in-mean model with correlated errors using the generalized hyperbolic skew Student-t (GHST) distribution provides a robust alternative to the parameter estimation for daily stock returns in the absence of normality. An efficient Markov chain Monte Carlo (MCMC) sampling algorithm is developed for parameter estimation. The deviance information, the Bayesian predictive information and the log-predictive score criterion are used to assess the fit of the proposed model. The proposed method is applied to an analysis of the daily stock return data from the Standard & Poor’s 500 index (S&P 500). The empirical results reveal that the stochastic volatility-in-mean model with correlated errors and GHST distribution leads to a significant improvement in the goodness-of-fit for the S&P 500 index returns dataset over the usual normal model. PMID:29333210
Sensitivity analyses for sparse-data problems - using weakly informative Bayesian priors.
Hamra, Ghassan B; MacLehose, Richard F; Cole, Stephen R
2013-03-01
Sparse-data problems are common, and approaches are needed to evaluate the sensitivity of parameter estimates based on sparse data. We propose a Bayesian approach that uses weakly informative priors to quantify sensitivity of parameters to sparse data. The weakly informative prior is based on accumulated evidence regarding the expected magnitude of relationships using relative measures of disease association. We illustrate the use of weakly informative priors with an example of the association of lifetime alcohol consumption and head and neck cancer. When data are sparse and the observed information is weak, a weakly informative prior will shrink parameter estimates toward the prior mean. Additionally, the example shows that when data are not sparse and the observed information is not weak, a weakly informative prior is not influential. Advancements in implementation of Markov Chain Monte Carlo simulation make this sensitivity analysis easily accessible to the practicing epidemiologist.
Sensitivity Analyses for Sparse-Data Problems—Using Weakly Informative Bayesian Priors
Hamra, Ghassan B.; MacLehose, Richard F.; Cole, Stephen R.
2013-01-01
Sparse-data problems are common, and approaches are needed to evaluate the sensitivity of parameter estimates based on sparse data. We propose a Bayesian approach that uses weakly informative priors to quantify sensitivity of parameters to sparse data. The weakly informative prior is based on accumulated evidence regarding the expected magnitude of relationships using relative measures of disease association. We illustrate the use of weakly informative priors with an example of the association of lifetime alcohol consumption and head and neck cancer. When data are sparse and the observed information is weak, a weakly informative prior will shrink parameter estimates toward the prior mean. Additionally, the example shows that when data are not sparse and the observed information is not weak, a weakly informative prior is not influential. Advancements in implementation of Markov Chain Monte Carlo simulation make this sensitivity analysis easily accessible to the practicing epidemiologist. PMID:23337241
Lee, Eunjee; Zhu, Hongtu; Kong, Dehan; Wang, Yalin; Giovanello, Kelly Sullivan; Ibrahim, Joseph G
2015-01-01
The aim of this paper is to develop a Bayesian functional linear Cox regression model (BFLCRM) with both functional and scalar covariates. This new development is motivated by establishing the likelihood of conversion to Alzheimer’s disease (AD) in 346 patients with mild cognitive impairment (MCI) enrolled in the Alzheimer’s Disease Neuroimaging Initiative 1 (ADNI-1) and the early markers of conversion. These 346 MCI patients were followed over 48 months, with 161 MCI participants progressing to AD at 48 months. The functional linear Cox regression model was used to establish that functional covariates including hippocampus surface morphology and scalar covariates including brain MRI volumes, cognitive performance (ADAS-Cog), and APOE status can accurately predict time to onset of AD. Posterior computation proceeds via an efficient Markov chain Monte Carlo algorithm. A simulation study is performed to evaluate the finite sample performance of BFLCRM. PMID:26900412
Upadhyay, S K; Mukherjee, Bhaswati; Gupta, Ashutosh
2009-09-01
Several models for studies related to tensile strength of materials are proposed in the literature, where the size or length component has been taken to be an important factor in studying the specimens' failure behaviour. An important model, developed on the basis of the cumulative damage approach, is the three-parameter extension of the Birnbaum-Saunders fatigue model that incorporates the size of the specimen as an additional variable. This model is a strong competitor of the commonly used Weibull model and stands better than the traditional models, which do not incorporate the size effect. The paper considers two such cumulative damage models, checks their compatibility with a real dataset, compares them with some of the recent toolkits, and finally recommends the model that appears most appropriate. Throughout, the study is Bayesian, based on Markov chain Monte Carlo simulation.
Scheel, Ida; Ferkingstad, Egil; Frigessi, Arnoldo; Haug, Ola; Hinnerichsen, Mikkel; Meze-Hausken, Elisabeth
2013-01-01
Climate change will affect the insurance industry. We develop a Bayesian hierarchical statistical approach to explain and predict insurance losses due to weather events at a local geographic scale. The number of weather-related insurance claims is modelled by combining generalized linear models with spatially smoothed variable selection. Using Gibbs sampling and reversible jump Markov chain Monte Carlo methods, this model is fitted on daily weather and insurance data from each of the 319 municipalities which constitute southern and central Norway for the period 1997–2006. Precise out-of-sample predictions validate the model. Our results show interesting regional patterns in the effect of different weather covariates. In addition to being useful for insurance pricing, our model can be used for short-term predictions based on weather forecasts and for long-term predictions based on downscaled climate models. PMID:23396890
Furtado-Junior, I; Abrunhosa, F A; Holanda, F C A F; Tavares, M C S
2016-06-01
Fishing selectivity of the mangrove crab Ucides cordatus on the north coast of Brazil can be defined as the fisherman's ability to capture and select individuals above a certain size, of a certain sex, or a combination of these factors, which suggests an empirical selectivity. Considering this hypothesis, we calculated the selectivity curves for male and female crabs using the logit function of the logistic model in the formulation. The Bayesian inference consisted of obtaining the posterior distribution by applying the Markov chain Monte Carlo (MCMC) method in the R software using the OpenBUGS, BRugs, and R2WinBUGS libraries. The estimated mean carapace widths at selection for males and females, compared with previous studies reporting the mean carapace width at sexual maturity, allow us to confirm the hypothesis that most mature individuals do not suffer fishing pressure, thus ensuring their sustainability.
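The logit selectivity function at the core of this analysis can be sketched directly. The parameter values below are illustrative placeholders, not the paper's posterior estimates:

```python
import numpy as np

def logistic_selectivity(width, w50, slope):
    """Logistic (logit-link) selectivity: the probability that a crab of
    a given carapace width is retained by the fishery. w50 is the width
    at 50% selection; slope controls how sharp the selection is.
    """
    return 1.0 / (1.0 + np.exp(-slope * (width - w50)))

widths = np.linspace(40.0, 90.0, 6)           # carapace width, mm
print(logistic_selectivity(widths, w50=65.0, slope=0.25).round(3))
```

In the Bayesian analysis, w50 and slope are given priors and their joint posterior is sampled with MCMC, yielding a distribution of selectivity curves rather than a single fitted curve.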
Leão, William L; Abanto-Valle, Carlos A; Chen, Ming-Hui
2017-01-01
A stochastic volatility-in-mean model with correlated errors using the generalized hyperbolic skew Student-t (GHST) distribution provides a robust alternative to the parameter estimation for daily stock returns in the absence of normality. An efficient Markov chain Monte Carlo (MCMC) sampling algorithm is developed for parameter estimation. The deviance information, the Bayesian predictive information and the log-predictive score criterion are used to assess the fit of the proposed model. The proposed method is applied to an analysis of the daily stock return data from the Standard & Poor's 500 index (S&P 500). The empirical results reveal that the stochastic volatility-in-mean model with correlated errors and GHST distribution leads to a significant improvement in the goodness-of-fit for the S&P 500 index returns dataset over the usual normal model.
Multiscale hidden Markov models for photon-limited imaging
NASA Astrophysics Data System (ADS)
Nowak, Robert D.
1999-06-01
Photon-limited image analysis is often hindered by low signal-to-noise ratios. A novel Bayesian multiscale modeling and analysis method is developed in this paper to assist in these challenging situations. In addition to providing a very natural and useful framework for modeling and processing images, Bayesian multiscale analysis is often much less computationally demanding compared to classical Markov random field models. This paper focuses on a probabilistic graph model called the multiscale hidden Markov model (MHMM), which captures the key inter-scale dependencies present in natural image intensities. The MHMM framework presented here is specifically designed for photon-limited imaging applications involving Poisson statistics, and applications to image intensity analysis are examined.
Probability distributions for Markov chain based quantum walks
NASA Astrophysics Data System (ADS)
Balu, Radhakrishnan; Liu, Chaobin; Venegas-Andraca, Salvador E.
2018-01-01
We analyze the probability distributions of the quantum walks induced from Markov chains by Szegedy (2004). The first part of this paper is devoted to the quantum walks induced from finite state Markov chains. It is shown that the probability distribution on the states of the underlying Markov chain is always convergent in the Cesàro sense. In particular, we deduce that the limiting distribution is uniform if the transition matrix is symmetric. In the case of a non-symmetric Markov chain, we exemplify that the limiting distribution of the quantum walk is not necessarily identical with the stationary distribution of the underlying irreducible Markov chain. The Szegedy scheme can be extended to infinite state Markov chains (random walks). In the second part, we formulate the quantum walk induced from a lazy random walk on the line. We then obtain the weak limit of the quantum walk. It is noted that the current quantum walk appears to spread faster than its counterpart, the quantum walk on the line driven by the Grover coin discussed in the literature. The paper closes with an outlook on possible future directions.
Estimating the Effective Sample Size of Tree Topologies from Bayesian Phylogenetic Analyses
Lanfear, Robert; Hua, Xia; Warren, Dan L.
2016-01-01
Bayesian phylogenetic analyses estimate posterior distributions of phylogenetic tree topologies and other parameters using Markov chain Monte Carlo (MCMC) methods. Before making inferences from these distributions, it is important to assess their adequacy. To this end, the effective sample size (ESS) estimates how many truly independent samples of a given parameter the output of the MCMC represents. The ESS of a parameter is frequently much lower than the number of samples taken from the MCMC because sequential samples from the chain can be non-independent due to autocorrelation. Typically, phylogeneticists use a rule of thumb that the ESS of all parameters should be greater than 200. However, we have no method to calculate an ESS of tree topology samples, despite the fact that the tree topology is often the parameter of primary interest and is almost always central to the estimation of other parameters. That is, we lack a method to determine whether we have adequately sampled one of the most important parameters in our analyses. In this study, we address this problem by developing methods to estimate the ESS for tree topologies. We combine these methods with two new diagnostic plots for assessing posterior samples of tree topologies, and compare their performance on simulated and empirical data sets. Combined, the methods we present provide new ways to assess the mixing and convergence of phylogenetic tree topologies in Bayesian MCMC analyses. PMID:27435794
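For a scalar parameter, the standard autocorrelation-based ESS that the authors generalize to tree topologies can be sketched as follows (a minimal version with a simple truncation rule; the paper's topological ESS estimators additionally require distances between trees):

```python
import numpy as np

def effective_sample_size(x, max_lag=None):
    """ESS of a scalar MCMC trace from its autocorrelation function:
    ESS = n / (1 + 2 * sum of positive-lag autocorrelations), truncating
    the sum at the first non-positive autocorrelation.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    x = x - x.mean()
    acf = np.correlate(x, x, mode="full")[n - 1:] / (x @ x)
    max_lag = max_lag or n // 2
    s = 0.0
    for k in range(1, max_lag):
        if acf[k] <= 0:           # stop at the first non-positive lag
            break
        s += acf[k]
    return n / (1.0 + 2.0 * s)

# an AR(1) chain with phi = 0.9 has theoretical ESS factor (1-phi)/(1+phi)
rng = np.random.default_rng(0)
x = np.empty(10000); x[0] = 0.0
for t in range(1, x.size):
    x[t] = 0.9 * x[t - 1] + rng.standard_normal()
print(effective_sample_size(x))   # roughly 10000 * 0.1 / 1.9, about 530
```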
Yang, Jingjing; Cox, Dennis D; Lee, Jong Soo; Ren, Peng; Choi, Taeryon
2017-12-01
Functional data are defined as realizations of random functions (mostly smooth functions) varying over a continuum, which are usually collected on discretized grids with measurement errors. In order to accurately smooth noisy functional observations and deal with the issue of high-dimensional observation grids, we propose a novel Bayesian method based on the Bayesian hierarchical model with a Gaussian-Wishart process prior and basis function representations. We first derive an induced model for the basis-function coefficients of the functional data, and then use this model to conduct posterior inference through Markov chain Monte Carlo methods. Compared to the standard Bayesian inference, which suffers from a serious computational burden and instability in analyzing high-dimensional functional data, our method greatly improves the computational scalability and stability, while inheriting the advantage of simultaneously smoothing raw observations and estimating the mean-covariance functions in a nonparametric way. In addition, our method can naturally handle functional data observed on random or uncommon grids. Simulation studies and real-data analyses demonstrate that our method produces similar results to those obtainable by the standard Bayesian inference with low-dimensional common grids, while efficiently smoothing and estimating functional data with random and high-dimensional observation grids when the standard Bayesian inference fails. In conclusion, our method can efficiently smooth and estimate high-dimensional functional data, providing one way to resolve the curse of dimensionality for Bayesian functional data analysis with Gaussian-Wishart processes. © 2017, The International Biometric Society.
Regenerative Simulation of Harris Recurrent Markov Chains.
1982-07-01
Technical Report TR-62, "Regenerative Simulation of Harris Recurrent Markov Chains," by Peter W. Glynn, Stanford University, Department of Operations Research, July 1982.
Efficient Bayesian experimental design for contaminant source identification
NASA Astrophysics Data System (ADS)
Zhang, Jiangjiang; Zeng, Lingzao; Chen, Cheng; Chen, Dingjiang; Wu, Laosheng
2015-01-01
In this study, an efficient full Bayesian approach is developed for the optimal sampling well location design and source parameters identification of groundwater contaminants. An information measure, i.e., the relative entropy, is employed to quantify the information gain from concentration measurements in identifying unknown parameters. In this approach, the sampling locations that give the maximum expected relative entropy are selected as the optimal design. After the sampling locations are determined, a Bayesian approach based on Markov Chain Monte Carlo (MCMC) is used to estimate unknown parameters. In both the design and estimation, the contaminant transport equation is required to be solved many times to evaluate the likelihood. To reduce the computational burden, an interpolation method based on the adaptive sparse grid is utilized to construct a surrogate for the contaminant transport equation. The approximated likelihood can be evaluated directly from the surrogate, which greatly accelerates the design and estimation process. The accuracy and efficiency of our approach are demonstrated through numerical case studies. It is shown that the methods can be used to assist in both single sampling location and monitoring network design for contaminant source identifications in groundwater.
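A minimal sketch of the estimation stage described above: random-walk Metropolis sampling of one hypothetical source parameter, with the expensive transport solver replaced by a cheap interpolated surrogate built from a handful of model runs. The forward model, grid, noise level, and prior here are all stand-ins, not the study's; the paper uses an adaptive sparse-grid interpolant rather than this 1D example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for the "expensive" transport model: maps a source
# strength s to a predicted concentration.
def forward_model(s):
    return 5.0 * np.exp(-0.5 * s) + 0.3 * s**2

# Build a cheap surrogate once, from a sparse set of training runs.
s_grid = np.linspace(0.0, 4.0, 17)            # 17 "expensive" model calls
y_grid = forward_model(s_grid)
surrogate = lambda s: np.interp(s, s_grid, y_grid)

# Synthetic observation of the true source strength s = 1.5.
obs = forward_model(1.5) + rng.normal(0.0, 0.1)
sigma = 0.1

def log_post(s):
    if not 0.0 <= s <= 4.0:                   # uniform prior on [0, 4]
        return -np.inf
    return -0.5 * ((obs - surrogate(s)) / sigma) ** 2

# Random-walk Metropolis using only surrogate evaluations.
samples, s_cur, lp_cur = [], 2.0, log_post(2.0)
for _ in range(20_000):
    s_prop = s_cur + rng.normal(0.0, 0.2)
    lp_prop = log_post(s_prop)
    if np.log(rng.uniform()) < lp_prop - lp_cur:
        s_cur, lp_cur = s_prop, lp_prop
    samples.append(s_cur)
print("posterior mean source strength:", np.mean(samples[5000:]))
```

Because every likelihood evaluation hits the surrogate instead of the solver, the MCMC cost is decoupled from the cost of the transport simulation, which is the point the abstract makes.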
NASA Astrophysics Data System (ADS)
Silva, F. E. O. E.; Naghettini, M. D. C.; Fernandes, W.
2014-12-01
This paper evaluated the uncertainties associated with the estimation of the parameters of a conceptual rainfall-runoff model, through the use of Bayesian inference techniques by Monte Carlo simulation. The Pará River sub-basin, located in the upper São Francisco river basin, in southeastern Brazil, was selected for developing the studies. In this paper, we used the Rio Grande conceptual hydrologic model (EHR/UFMG, 2001) and the Markov Chain Monte Carlo simulation method named DREAM (VRUGT, 2008a). Two probabilistic models for the residuals were analyzed: (i) the classic [Normal likelihood, r ~ N(0, σ²)]; and (ii) a generalized likelihood (SCHOUPS & VRUGT, 2010), in which it is assumed that the differences between observed and simulated flows are correlated, non-stationary, and distributed as a Skew Exponential Power density. The assumptions made for both models were checked to ensure that the estimation of uncertainties in the parameters was not biased. The results showed that the Bayesian approach proved adequate for the proposed objectives, reinforcing the importance of assessing the uncertainties associated with hydrological modeling.
High-throughput Bayesian Network Learning using Heterogeneous Multicore Computers
Linderman, Michael D.; Athalye, Vivek; Meng, Teresa H.; Asadi, Narges Bani; Bruggner, Robert; Nolan, Garry P.
2017-01-01
Aberrant intracellular signaling plays an important role in many diseases. The causal structure of signal transduction networks can be modeled as Bayesian Networks (BNs), and computationally learned from experimental data. However, learning the structure of Bayesian Networks (BNs) is an NP-hard problem that, even with fast heuristics, is too time consuming for large, clinically important networks (20–50 nodes). In this paper, we present a novel graphics processing unit (GPU)-accelerated implementation of a Monte Carlo Markov Chain-based algorithm for learning BNs that is up to 7.5-fold faster than current general-purpose processor (GPP)-based implementations. The GPU-based implementation is just one of several implementations within the larger application, each optimized for a different input or machine configuration. We describe the methodology we use to build an extensible application, assembled from these variants, that can target a broad range of heterogeneous systems, e.g., GPUs, multicore GPPs. Specifically we show how we use the Merge programming model to efficiently integrate, test and intelligently select among the different potential implementations. PMID:28819655
Bayesian Inference for Source Reconstruction: A Real-World Application
Yee, Eugene; Hoffman, Ian; Ungar, Kurt
2014-01-01
This paper applies a Bayesian probabilistic inferential methodology for the reconstruction of the location and emission rate from an actual contaminant source (emission from the Chalk River Laboratories medical isotope production facility) using a small number of activity concentration measurements of a noble gas (Xenon-133) obtained from three stations that form part of the International Monitoring System radionuclide network. The sampling of the resulting posterior distribution of the source parameters is undertaken using a very efficient Markov chain Monte Carlo technique that utilizes a multiple-try differential evolution adaptive Metropolis algorithm with an archive of past states. It is shown that the principal difficulty in the reconstruction lay in the correct specification of the model errors (both scale and structure) for use in the Bayesian inferential methodology. In this context, two different measurement models for incorporation of the model error of the predicted concentrations are considered. The performance of both of these measurement models with respect to their accuracy and precision in the recovery of the source parameters is compared and contrasted. PMID:27379292
Bayesian nonparametric dictionary learning for compressed sensing MRI.
Huang, Yue; Paisley, John; Lin, Qin; Ding, Xinghao; Fu, Xueyang; Zhang, Xiao-Ping
2014-12-01
We develop a Bayesian nonparametric model for reconstructing magnetic resonance images (MRIs) from highly undersampled k-space data. We perform dictionary learning as part of the image reconstruction process. To this end, we use the beta process as a nonparametric dictionary learning prior for representing an image patch as a sparse combination of dictionary elements. The size of the dictionary and patch-specific sparsity pattern are inferred from the data, in addition to other dictionary learning variables. Dictionary learning is performed directly on the compressed image, and so is tailored to the MRI being considered. In addition, we investigate a total variation penalty term in combination with the dictionary learning model, and show how the denoising property of dictionary learning removes dependence on regularization parameters in the noisy setting. We derive a stochastic optimization algorithm based on Markov chain Monte Carlo for the Bayesian model, and use the alternating direction method of multipliers for efficiently performing total variation minimization. We present empirical results on several MRIs, which show that the proposed regularization framework can improve reconstruction accuracy over other methods.
A Variational Bayes Genomic-Enabled Prediction Model with Genotype × Environment Interaction
Montesinos-López, Osval A.; Montesinos-López, Abelardo; Crossa, José; Montesinos-López, José Cricelio; Luna-Vázquez, Francisco Javier; Salinas-Ruiz, Josafhat; Herrera-Morales, José R.; Buenrostro-Mariscal, Raymundo
2017-01-01
There are Bayesian and non-Bayesian genomic models that take into account G×E interactions. However, the computational cost of implementing Bayesian models is high, and implementation becomes almost impossible when the number of genotypes, environments, and traits is very large, while, in non-Bayesian models, there are often important and unsolved convergence problems. The variational Bayes method is popular in machine learning, and, by approximating the probability distributions through optimization, it tends to be faster than Markov Chain Monte Carlo methods. For this reason, in this paper, we propose a new genomic variational Bayes version of the Bayesian genomic model with G×E using half-t priors on each standard deviation (SD) term to guarantee highly noninformative priors and posterior inferences that are not sensitive to the choice of hyper-parameters. We show the complete theoretical derivation of the full conditional and the variational posterior distributions, and their implementations. We used eight experimental genomic maize and wheat data sets to illustrate the new proposed variational Bayes approximation, and compared its predictions and implementation time with a standard Bayesian genomic model with G×E. Results indicated that prediction accuracies are slightly higher in the standard Bayesian model with G×E than in its variational counterpart, but, in terms of computation time, the variational Bayes genomic model with G×E is, in general, 10 times faster than the conventional Bayesian genomic model with G×E. For this reason, the proposed model may be a useful tool for researchers who need to predict and select genotypes in several environments. PMID:28391241
A Variational Bayes Genomic-Enabled Prediction Model with Genotype × Environment Interaction.
Montesinos-López, Osval A; Montesinos-López, Abelardo; Crossa, José; Montesinos-López, José Cricelio; Luna-Vázquez, Francisco Javier; Salinas-Ruiz, Josafhat; Herrera-Morales, José R; Buenrostro-Mariscal, Raymundo
2017-06-07
There are Bayesian and non-Bayesian genomic models that take into account G×E interactions. However, the computational cost of implementing Bayesian models is high, and implementation becomes almost impossible when the number of genotypes, environments, and traits is very large, while, in non-Bayesian models, there are often important and unsolved convergence problems. The variational Bayes method is popular in machine learning, and, by approximating the probability distributions through optimization, it tends to be faster than Markov Chain Monte Carlo methods. For this reason, in this paper, we propose a new genomic variational Bayes version of the Bayesian genomic model with G×E using half-t priors on each standard deviation (SD) term to guarantee highly noninformative priors and posterior inferences that are not sensitive to the choice of hyper-parameters. We show the complete theoretical derivation of the full conditional and the variational posterior distributions, and their implementations. We used eight experimental genomic maize and wheat data sets to illustrate the new proposed variational Bayes approximation, and compared its predictions and implementation time with a standard Bayesian genomic model with G×E. Results indicated that prediction accuracies are slightly higher in the standard Bayesian model with G×E than in its variational counterpart, but, in terms of computation time, the variational Bayes genomic model with G×E is, in general, 10 times faster than the conventional Bayesian genomic model with G×E. For this reason, the proposed model may be a useful tool for researchers who need to predict and select genotypes in several environments. Copyright © 2017 Montesinos-López et al.
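To illustrate how variational Bayes trades sampling for optimization, here is a minimal coordinate-ascent (CAVI) sketch for a far simpler model than the genomic G×E one: a Gaussian with unknown mean and precision under standard conjugate priors. All values are illustrative assumptions; the genomic model's updates are analogous in spirit but much more involved.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=3.0, scale=1.5, size=200)   # synthetic data
N, xbar = len(x), x.mean()

# Priors: mu ~ N(mu0, (lam0*tau)^-1), tau ~ Gamma(a0, b0).
mu0, lam0, a0, b0 = 0.0, 1e-3, 1e-3, 1e-3

# Mean-field factors q(mu) = N(mu_N, 1/lam_N), q(tau) = Gamma(a_N, b_N).
E_tau = 1.0                                    # initial guess for E[tau]
a_N = a0 + (N + 1) / 2.0                       # fixed across iterations
for _ in range(50):                            # coordinate ascent (CAVI)
    mu_N = (lam0 * mu0 + N * xbar) / (lam0 + N)
    lam_N = (lam0 + N) * E_tau
    # E_q[ sum_i (x_i - mu)^2 + lam0 * (mu - mu0)^2 ]
    e_sq = np.sum((x - mu_N) ** 2) + N / lam_N \
         + lam0 * ((mu_N - mu0) ** 2 + 1.0 / lam_N)
    b_N = b0 + 0.5 * e_sq
    E_tau = a_N / b_N
print(f"q(mu): mean {mu_N:.3f}, sd {lam_N**-0.5:.3f}")
print(f"q(tau): mean {a_N / b_N:.4f}  (true precision {1 / 1.5**2:.4f})")
```

Each sweep is a deterministic fixed-point update rather than a random draw, which is where the roughly tenfold speed advantage reported above comes from.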
LECTURES ON GAME THEORY, MARKOV CHAINS, AND RELATED TOPICS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Thompson, G L
1958-03-01
Notes on nine lectures delivered at Sandia Corporation in August 1957 are given. Part one contains the manuscript of a paper concerning a judging problem. Part two is concerned with finite Markov-chain theory and discusses regular Markov chains, absorbing Markov chains, the classification of states, application to the Leontief input-output model, and semimartingales. Part three contains notes on game theory and covers matrix games, the effect of psychological attitudes on the outcomes of games, extensive games, and matrix theory applied to mathematical economics. (auth)
Markov chains: computing limit existence and approximations with DNA.
Cardona, M; Colomer, M A; Conde, J; Miret, J M; Miró, J; Zaragoza, A
2005-09-01
We present two algorithms to perform computations over Markov chains. The first one determines whether the sequence of powers of the transition matrix of a Markov chain converges or not to a limit matrix. If it does converge, the second algorithm enables us to estimate this limit. The combination of these algorithms allows the computation of a limit using DNA computing. In this sense, we have encoded the states and the transition probabilities using strands of DNA for generating paths of the Markov chain.
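In ordinary floating-point arithmetic (rather than DNA computing), the goal of the first algorithm can be sketched by comparing successive powers of the transition matrix; the hypothetical helper below returns the limit when the sequence converges and flags chains, such as periodic ones, where it does not.

```python
import numpy as np

def limit_of_powers(P, tol=1e-12, max_iter=10_000):
    """Check whether P^n converges as n grows; return (converges, last power)."""
    P = np.asarray(P, dtype=float)
    Q = P.copy()
    for _ in range(max_iter):
        Q_next = Q @ P
        if np.max(np.abs(Q_next - Q)) < tol:   # successive powers agree
            return True, Q_next
        Q = Q_next
    return False, Q

# A regular chain converges: every row approaches the stationary distribution.
P_regular = np.array([[0.9, 0.1],
                      [0.4, 0.6]])
ok, L = limit_of_powers(P_regular)
print(ok, L[0])

# A periodic chain alternates forever, so the powers never converge.
P_periodic = np.array([[0.0, 1.0],
                       [1.0, 0.0]])
print(limit_of_powers(P_periodic)[0])          # False
```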
Zhu, Qi; Burzykowski, Tomasz
2011-03-01
To reduce the influence of the between-spectra variability on the results of peptide quantification, one can consider the (18)O-labeling approach. Ideally, with such labeling technique, a mass shift of 4 Da of the isotopic distributions of peptides from the labeled sample is induced, which allows one to distinguish the two samples and to quantify the relative abundance of the peptides. It is worth noting, however, that the presence of small quantities of (16)O and (17)O atoms during the labeling step can cause incomplete labeling. In practice, ignoring incomplete labeling may result in the biased estimation of the relative abundance of the peptide in the compared samples. A Markov model was developed to address this issue (Zhu, Valkenborg, Burzykowski. J. Proteome Res. 9, 2669-2677, 2010). The model assumed that the peak intensities were normally distributed with heteroscedasticity using a power-of-the-mean variance funtion. Such a dependence has been observed in practice. Alternatively, we formulate the model within the Bayesian framework. This opens the possibility to further extend the model by the inclusion of random effects that can be used to capture the biological/technical variability of the peptide abundance. The operational characteristics of the model were investigated by applications to real-life mass-spectrometry data sets and a simulation study. © American Society for Mass Spectrometry, 2011
Guédon, Yann; d'Aubenton-Carafa, Yves; Thermes, Claude
2006-03-01
The most commonly used models for analysing local dependencies in DNA sequences are (high-order) Markov chains. Incorporating knowledge relative to the possible grouping of the nucleotides makes it possible to define dedicated sub-classes of Markov chains. The problem of formulating lumpability hypotheses for a Markov chain is therefore addressed. In the classical approach to lumpability, this problem can be formulated as the determination of an appropriate state space (smaller than the original state space) such that the lumped chain defined on this state space retains the Markov property. We propose a different perspective on lumpability where the state space is fixed and the partitioning of this state space is represented by a one-to-many probabilistic function within a two-level stochastic process. Three nested classes of lumped processes can be defined in this way as sub-classes of first-order Markov chains. These lumped processes enable parsimonious reparameterizations of Markov chains that help to reveal relevant partitions of the state space. Characterizations of the lumped processes on the original transition probability matrix are derived. Different model selection methods relying either on hypothesis testing or on penalized log-likelihood criteria are presented as well as extensions to lumped processes constructed from high-order Markov chains. The relevance of the proposed approach to lumpability is illustrated by the analysis of DNA sequences. In particular, the use of lumped processes makes it possible to highlight differences between intronic sequences and gene untranslated region sequences.
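For reference, the classical (strong) lumpability condition that this work generalizes can be checked directly: all states in the same block must have identical total transition probability into every block. A minimal sketch with an illustrative 3-state chain (the helper and partition are hypothetical, not from the paper):

```python
import numpy as np

def lump(P, partition, tol=1e-12):
    """Check strong lumpability of transition matrix P w.r.t. a partition
    (a list of lists of state indices); return the lumped matrix if lumpable."""
    P = np.asarray(P, dtype=float)
    # block_sums[i, b] = probability of moving from state i into block b.
    block_sums = np.column_stack([P[:, blk].sum(axis=1) for blk in partition])
    lumped = np.empty((len(partition), len(partition)))
    for a, blk in enumerate(partition):
        rows = block_sums[blk]                 # rows for states inside block a
        if np.max(np.abs(rows - rows[0])) > tol:
            return None                        # condition violated: not lumpable
        lumped[a] = rows[0]
    return lumped

# Example: a 3-state chain that is lumpable by grouping states {1, 2}.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.4, 0.4],
              [0.2, 0.6, 0.2]])
print(lump(P, [[0], [1, 2]]))
```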
NASA Astrophysics Data System (ADS)
Arnst, M.; Abello Álvarez, B.; Ponthot, J.-P.; Boman, R.
2017-11-01
This paper is concerned with the characterization and the propagation of errors associated with data limitations in polynomial-chaos-based stochastic methods for uncertainty quantification. Such an issue can arise in uncertainty quantification when only a limited amount of data is available. When the available information does not suffice to accurately determine the probability distributions that must be assigned to the uncertain variables, the Bayesian method for assigning these probability distributions becomes attractive because it allows the stochastic model to account explicitly for insufficiency of the available information. In previous work, such applications of the Bayesian method had already been implemented by using the Metropolis-Hastings and Gibbs Markov Chain Monte Carlo (MCMC) methods. In this paper, we present an alternative implementation, which uses an alternative MCMC method built around an Itô stochastic differential equation (SDE) that is ergodic for the Bayesian posterior. We draw together from the mathematics literature a number of formal properties of this Itô SDE that lend support to its use in the implementation of the Bayesian method, and we describe its discretization, including the choice of the free parameters, by using the implicit Euler method. We demonstrate the proposed methodology on a problem of uncertainty quantification in a complex nonlinear engineering application relevant to metal forming.
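The sampler's core can be sketched on a toy target. The fragment below uses an explicit Euler-Maruyama discretization of the overdamped Langevin SDE dX = (1/2)∇log π(X) dt + dW, which is ergodic for π; note this is a simplification, since the paper advocates an implicit Euler scheme, and the target and step size here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy Bayesian posterior: N(2, 0.5^2), known in closed form so the
# sampler's output can be checked against the truth.
mu, sd = 2.0, 0.5
grad_log_post = lambda x: -(x - mu) / sd**2

# Explicit Euler-Maruyama steps of dX = (1/2) grad log pi(X) dt + dW.
h, x = 1e-3, 0.0
samples = []
for step in range(200_000):
    x += 0.5 * h * grad_log_post(x) + np.sqrt(h) * rng.normal()
    if step > 50_000:                          # discard burn-in
        samples.append(x)
print(np.mean(samples), np.std(samples))       # ~2.0, ~0.5
```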
Using Bayesian neural networks to classify forest scenes
NASA Astrophysics Data System (ADS)
Vehtari, Aki; Heikkonen, Jukka; Lampinen, Jouko; Juujarvi, Jouni
1998-10-01
We present results that compare the performance of Bayesian learning methods for neural networks on the task of classifying forest scenes into trees and background. The classification task is demanding due to the texture richness of the trees, occlusions of the forest scene objects and diverse lighting conditions under operation. This makes it difficult to determine which image features are optimal for the classification. A natural way to proceed is to extract many different types of potentially suitable features, and to evaluate their usefulness in later processing stages. One approach to coping with a large number of features is to use Bayesian methods to control the model complexity. Bayesian learning uses a prior on model parameters, combines this with evidence from the training data, and then integrates over the resulting posterior to make predictions. With this method, we can use large networks and many features without fear of overfitting. For this classification task we compare two Bayesian learning methods for multi-layer perceptron (MLP) neural networks: (1) The evidence framework of MacKay uses a Gaussian approximation to the posterior weight distribution and maximizes the evidence with respect to hyperparameters. (2) In a Markov Chain Monte Carlo (MCMC) method due to Neal, the posterior distribution of the network parameters is numerically integrated using the MCMC method. As baseline classifiers for comparison we use (3) MLP early stop committee, (4) K-nearest-neighbor and (5) Classification And Regression Tree.
Bayesian-MCMC-based parameter estimation of stealth aircraft RCS models
NASA Astrophysics Data System (ADS)
Xia, Wei; Dai, Xiao-Xia; Feng, Yuan
2015-12-01
When modeling a stealth aircraft with low RCS (Radar Cross Section), conventional parameter estimation methods may cause a deviation from the actual distribution, owing to the fact that the characteristic parameters are estimated via directly calculating the statistics of RCS. The Bayesian-Markov Chain Monte Carlo (Bayesian-MCMC) method is introduced herein to estimate the parameters so as to improve the fitting accuracies of fluctuation models. The parameter estimations of the lognormal and the Legendre polynomial models are reformulated in the Bayesian framework. The MCMC algorithm is then adopted to calculate the parameter estimates. Numerical results show that the distribution curves obtained by the proposed method exhibit improved consistency with the actual ones, compared with those fitted by the conventional method. The fitting accuracy could be improved by no less than 25% for both fluctuation models, which implies that the Bayesian-MCMC method might be a good candidate among the optimal parameter estimation methods for stealth aircraft RCS models. Project supported by the National Natural Science Foundation of China (Grant No. 61101173), the National Basic Research Program of China (Grant No. 613206), the National High Technology Research and Development Program of China (Grant No. 2012AA01A308), the State Scholarship Fund by the China Scholarship Council (CSC), and the Oversea Academic Training Funds, and University of Electronic Science and Technology of China (UESTC).
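A minimal sketch of this idea for the lognormal fluctuation model (synthetic data and a flat prior are assumed here; the paper's models and priors differ in detail): random-walk Metropolis on (μ, log σ) rather than plugging sample statistics into the distribution.

```python
import numpy as np

rng = np.random.default_rng(4)
data = rng.lognormal(mean=-1.0, sigma=0.8, size=300)   # synthetic RCS-like samples
logd = np.log(data)

def log_post(theta):
    m, log_s = theta
    s = np.exp(log_s)                          # sample sigma on log scale (> 0)
    # Lognormal log-likelihood (up to a constant) plus a flat prior on (m, log s).
    return -len(logd) * log_s - 0.5 * np.sum((logd - m) ** 2) / s**2

theta, lp = np.array([0.0, 0.0]), -np.inf
chain = []
for _ in range(30_000):
    prop = theta + rng.normal(0.0, 0.05, size=2)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    chain.append(theta.copy())
chain = np.array(chain[10_000:])
print("mu estimate:   ", chain[:, 0].mean())            # ~ -1.0
print("sigma estimate:", np.exp(chain[:, 1]).mean())    # ~ 0.8
```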
NASA Astrophysics Data System (ADS)
Chen, Mingjie; Izady, Azizallah; Abdalla, Osman A.; Amerjeed, Mansoor
2018-02-01
Bayesian inference using Markov Chain Monte Carlo (MCMC) provides an explicit framework for stochastic calibration of hydrogeologic models accounting for uncertainties; however, the MCMC sampling entails a large number of model calls, and could easily become computationally unwieldy if the high-fidelity hydrogeologic model simulation is time consuming. This study proposes a surrogate-based Bayesian framework to address this notorious issue, and illustrates the methodology by inverse modeling a regional MODFLOW model. The high-fidelity groundwater model is approximated by a fast statistical model using Bagging Multivariate Adaptive Regression Spline (BMARS) algorithm, and hence the MCMC sampling can be efficiently performed. In this study, the MODFLOW model is developed to simulate the groundwater flow in an arid region of Oman consisting of mountain-coast aquifers, and used to run representative simulations to generate training dataset for BMARS model construction. A BMARS-based Sobol' method is also employed to efficiently calculate input parameter sensitivities, which are used to evaluate and rank their importance for the groundwater flow model system. According to sensitivity analysis, insensitive parameters are screened out of Bayesian inversion of the MODFLOW model, further saving computing efforts. The posterior probability distribution of input parameters is efficiently inferred from the prescribed prior distribution using observed head data, demonstrating that the presented BMARS-based Bayesian framework is an efficient tool to reduce parameter uncertainties of a groundwater system.
An Overview of Markov Chain Methods for the Study of Stage-Sequential Developmental Processes
ERIC Educational Resources Information Center
Kaplan, David
2008-01-01
This article presents an overview of quantitative methodologies for the study of stage-sequential development based on extensions of Markov chain modeling. Four methods are presented that exemplify the flexibility of this approach: the manifest Markov model, the latent Markov model, latent transition analysis, and the mixture latent Markov model.…
Properties of the Bayesian Knowledge Tracing Model
ERIC Educational Resources Information Center
van de Sande, Brett
2013-01-01
Bayesian Knowledge Tracing is used very widely to model student learning. It comes in two different forms: The first form is the Bayesian Knowledge Tracing "hidden Markov model" which predicts the probability of correct application of a skill as a function of the number of previous opportunities to apply that skill and the model…
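The underlying hidden Markov updates are compact enough to sketch. Below is the standard BKT recursion with illustrative parameter values for the prior-mastery, learning, slip, and guess probabilities; the paper analyzes this model's properties rather than prescribing these particular numbers.

```python
# Minimal sketch of the standard Bayesian Knowledge Tracing update
# (a two-state hidden Markov model); parameter values are illustrative.
p_init, p_learn = 0.2, 0.15      # P(L0), P(T): prior mastery, learning rate
p_slip, p_guess = 0.1, 0.25      # P(S), P(G): slip and guess probabilities

def bkt_update(p_mastery, correct):
    """Posterior P(mastered | response), then apply the learning transition."""
    if correct:
        num = p_mastery * (1 - p_slip)
        den = num + (1 - p_mastery) * p_guess
    else:
        num = p_mastery * p_slip
        den = num + (1 - p_mastery) * (1 - p_guess)
    posterior = num / den
    return posterior + (1 - posterior) * p_learn

p = p_init
for obs in [True, True, False, True, True]:    # a student's response sequence
    p = bkt_update(p, obs)
    print(f"response correct={obs}:  P(mastered) = {p:.3f}")
```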
Computational statistics using the Bayesian Inference Engine
NASA Astrophysics Data System (ADS)
Weinberg, Martin D.
2013-09-01
This paper introduces the Bayesian Inference Engine (BIE), a general parallel, optimized software package for parameter inference and model selection. This package is motivated by the analysis needs of modern astronomical surveys and the need to organize and reuse expensive derived data. The BIE is the first platform for computational statistics designed explicitly to enable Bayesian update and model comparison for astronomical problems. Bayesian update is based on the representation of high-dimensional posterior distributions using metric-ball-tree based kernel density estimation. Among its algorithmic offerings, the BIE emphasizes hybrid tempered Markov chain Monte Carlo schemes that robustly sample multimodal posterior distributions in high-dimensional parameter spaces. Moreover, the BIE implements a full persistence or serialization system that stores the full byte-level image of the running inference and previously characterized posterior distributions for later use. Two new algorithms to compute the marginal likelihood from the posterior distribution, developed for and implemented in the BIE, enable model comparison for complex models and data sets. Finally, the BIE was designed to be a collaborative platform for applying Bayesian methodology to astronomy. It includes an extensible object-oriented and easily extended framework that implements every aspect of the Bayesian inference. By providing a variety of statistical algorithms for all phases of the inference problem, a scientist may explore a variety of approaches with a single model and data implementation. Additional technical details and download details are available from http://www.astro.umass.edu/bie. The BIE is distributed under the GNU General Public License.
Generalized species sampling priors with latent Beta reinforcements
Airoldi, Edoardo M.; Costa, Thiago; Bassetti, Federico; Leisen, Fabrizio; Guindani, Michele
2014-01-01
Many popular Bayesian nonparametric priors can be characterized in terms of exchangeable species sampling sequences. However, in some applications, exchangeability may not be appropriate. We introduce a novel and probabilistically coherent family of non-exchangeable species sampling sequences characterized by a tractable predictive probability function with weights driven by a sequence of independent Beta random variables. We compare their theoretical clustering properties with those of the Dirichlet Process and the two-parameter Poisson-Dirichlet process. The proposed construction provides a complete characterization of the joint process, differently from existing work. We then propose the use of such a process as a prior distribution in a hierarchical Bayes modeling framework, and we describe a Markov Chain Monte Carlo sampler for posterior inference. We evaluate the performance of the prior and the robustness of the resulting inference in a simulation study, providing a comparison with popular Dirichlet Process mixtures and Hidden Markov Models. Finally, we develop an application to the detection of chromosomal aberrations in breast cancer by leveraging array CGH data. PMID:25870462
A Bayesian model for visual space perception
NASA Technical Reports Server (NTRS)
Curry, R. E.
1972-01-01
A model for visual space perception is proposed that contains desirable features in the theories of Gibson and Brunswik. This model is a Bayesian processor of proximal stimuli which contains three important elements: an internal model of the Markov process describing the knowledge of the distal world, the a priori distribution of the state of the Markov process, and an internal model relating state to proximal stimuli. The universality of the model is discussed and it is compared with signal detection theory models. Experimental results of Kinchla are used as a special case.
A Bayesian Approach for Population Pharmacokinetic Modeling of Alcohol in Japanese Individuals.
Nemoto, Asuka; Matsuura, Masaaki; Yamaoka, Kazue
2017-01-01
Blood alcohol concentration data that were previously obtained from 34 healthy Japanese subjects with limited sampling times were reanalyzed. Characteristics of the data were that the concentrations were obtained from only the early part of the time-concentration curve. The objectives were to explore significant covariates for the population pharmacokinetic analysis of alcohol by incorporating external data using a Bayesian method, and to estimate the effects of those covariates. The data were analyzed using a Markov chain Monte Carlo Bayesian estimation with NONMEM 7.3 (ICON Clinical Research LLC, North Wales, Pennsylvania). Informative priors were obtained from the external study. A 1-compartment model with Michaelis-Menten elimination was used. The typical value for the apparent volume of distribution was 49.3 L at the age of 29.4 years. Volume of distribution was estimated to be 20.4 L smaller in subjects with the ALDH2*1/*2 genotype than in subjects with the ALDH2*1/*1 genotype. A population pharmacokinetic model for alcohol was updated. A Bayesian approach allowed interpretation of significant covariate relationships, even if the current dataset is not informative about all parameters. This is the first study reporting an estimate of the effect of the ALDH2 genotype in a PPK model.
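For orientation, a one-compartment model with first-order absorption and Michaelis-Menten elimination, as named above, can be simulated directly. The parameter values below are assumptions chosen for illustration (only the distribution volume echoes the reported 49.3 L) and are not the study's posterior estimates.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative one-compartment model: first-order absorption from the gut,
# Michaelis-Menten elimination from the central compartment.
dose_g = 20.0          # ingested alcohol (g) -- assumed
V = 49.3               # volume of distribution (L), echoing the abstract
ka = 6.0               # absorption rate constant (1/h) -- assumed
vmax = 8.0             # maximum elimination rate (g/h) -- assumed
km = 0.1               # Michaelis constant (g/L) -- assumed

def rhs(t, y):
    a_gut, c = y                               # gut amount (g), concentration (g/L)
    da = -ka * a_gut
    dc = ka * a_gut / V - vmax * c / (V * (km + c))
    return [da, dc]

sol = solve_ivp(rhs, (0.0, 8.0), [dose_g, 0.0], dense_output=True)
for t in (0.5, 1.0, 2.0, 4.0):
    print(f"t = {t:.1f} h : C = {sol.sol(t)[1]:.3f} g/L")
```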
NASA Astrophysics Data System (ADS)
Kiyan, Duygu; Rath, Volker; Delhaye, Robert
2017-04-01
The frequency- and time-domain airborne electromagnetic (AEM) data collected under the Tellus projects of the Geological Survey of Ireland (GSI) represent a wealth of information on the multi-dimensional electrical structure of Ireland's near-surface. Our project, which was funded by GSI under the framework of their Short Call Research Programme, aims to develop and implement inverse techniques based on various Bayesian methods for these densely sampled data. We have developed a highly flexible toolbox using the Python language for the one-dimensional inversion of AEM data along the flight lines. The computational core is based on an adapted frequency- and time-domain forward modelling core derived from the well-tested open-source code AirBeo, which was developed by the CSIRO (Australia) and the AMIRA consortium. Three different inversion methods have been implemented: (i) Tikhonov-type inversion including optimal regularisation methods (Aster et al., 2012; Zhdanov, 2015), (ii) Bayesian MAP inversion in parameter and data space (e.g. Tarantola, 2005), and (iii) Full Bayesian inversion with Markov Chain Monte Carlo (Sambridge and Mosegaard, 2002; Mosegaard and Sambridge, 2002), all including different forms of spatial constraints. The methods have been tested on synthetic and field data. This contribution will introduce the toolbox and present case studies on the AEM data from the Tellus projects.
NASA Astrophysics Data System (ADS)
Wang, Q. J.; Robertson, D. E.; Chiew, F. H. S.
2009-05-01
Seasonal forecasting of streamflows can be highly valuable for water resources management. In this paper, a Bayesian joint probability (BJP) modeling approach for seasonal forecasting of streamflows at multiple sites is presented. A Box-Cox transformed multivariate normal distribution is proposed to model the joint distribution of future streamflows and their predictors such as antecedent streamflows and El Niño-Southern Oscillation indices and other climate indicators. Bayesian inference of model parameters and uncertainties is implemented using Markov chain Monte Carlo sampling, leading to joint probabilistic forecasts of streamflows at multiple sites. The model provides a parametric structure for quantifying relationships between variables, including intersite correlations. The Box-Cox transformed multivariate normal distribution has considerable flexibility for modeling a wide range of predictors and predictands. The Bayesian inference formulated allows the use of data that contain nonconcurrent and missing records. The model flexibility and data-handling ability means that the BJP modeling approach is potentially of wide practical application. The paper also presents a number of statistical measures and graphical methods for verification of probabilistic forecasts of continuous variables. Results for streamflows at three river gauges in the Murrumbidgee River catchment in southeast Australia show that the BJP modeling approach has good forecast quality and that the fitted model is consistent with observed data.
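The core of the BJP idea, transform to near-normality, fit a joint normal, and condition on the predictors, can be sketched for one site and one predictor. The fixed transform parameter, synthetic data, and single-predictor simplification below are illustrative assumptions; the paper infers all parameters jointly by MCMC across multiple sites.

```python
import numpy as np

rng = np.random.default_rng(5)

def boxcox(y, lam):
    return (y**lam - 1.0) / lam if lam != 0 else np.log(y)

# Synthetic skewed data: a predictor (antecedent flow) and a predictand
# (seasonal flow) with a lognormal-like dependence.
x = rng.lognormal(3.0, 0.4, 500)
flow = np.exp(0.8 * np.log(x) + rng.normal(0.0, 0.3, 500))

lam = 0.2                                      # fixed transform parameter (assumed)
z = np.column_stack([boxcox(x, lam), boxcox(flow, lam)])
mu = z.mean(axis=0)
S = np.cov(z.T)

# Condition the bivariate normal on a new predictor value to forecast flow.
x_new = 25.0
z1 = boxcox(x_new, lam)
m_cond = mu[1] + S[1, 0] / S[0, 0] * (z1 - mu[0])
v_cond = S[1, 1] - S[1, 0] ** 2 / S[0, 0]

# Probabilistic forecast: sample in transformed space, then back-transform.
draws = rng.normal(m_cond, np.sqrt(v_cond), 2000)
fc = (lam * draws + 1.0) ** (1.0 / lam)        # inverse Box-Cox
print("forecast median and 90% interval:",
      np.percentile(fc, [50, 5, 95]).round(1))
```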
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Jiangjiang; Li, Weixuan; Zeng, Lingzao
Surrogate models are commonly used in Bayesian approaches such as Markov Chain Monte Carlo (MCMC) to avoid repetitive CPU-demanding model evaluations. However, the approximation error of a surrogate may lead to biased estimations of the posterior distribution. This bias can be corrected by constructing a very accurate surrogate or implementing MCMC in a two-stage manner. Since the two-stage MCMC requires extra original model evaluations, the computational cost is still high. If the measurement information is incorporated, a locally accurate approximation of the original model can be adaptively constructed with low computational cost. Based on this idea, we propose a Gaussian process (GP) surrogate-based Bayesian experimental design and parameter estimation approach for groundwater contaminant source identification problems. A major advantage of the GP surrogate is that it provides a convenient estimation of the approximation error, which can be incorporated in the Bayesian formula to avoid over-confident estimation of the posterior distribution. The proposed approach is tested with a numerical case study. Without sacrificing the estimation accuracy, the new approach achieves about a 200-fold speed-up compared to our previous work using two-stage MCMC.
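The key device, feeding the GP surrogate's own predictive variance into the likelihood so that regions where the surrogate is uncertain are not over-trusted, can be sketched as follows. The stand-in forward model, scikit-learn GP, and one-parameter Metropolis sampler are all illustrative; the actual study couples this with experimental design and adaptive refinement.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(6)

# Stand-in "expensive" forward model and a synthetic observation.
model = lambda s: np.sin(3.0 * s) + 0.5 * s
obs, sigma_obs = model(1.2) + 0.05 * rng.normal(), 0.05

# Fit a GP surrogate to a handful of model runs.
S_train = np.linspace(0.0, 2.0, 8).reshape(-1, 1)
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5))
gp.fit(S_train, model(S_train).ravel())

def log_like(s):
    mean, std = gp.predict(np.array([[s]]), return_std=True)
    # Inflate the error variance by the surrogate's predictive variance,
    # so the sampler discounts regions where the surrogate is unsure.
    var = sigma_obs**2 + std[0] ** 2
    return -0.5 * ((obs - mean[0]) ** 2 / var + np.log(2 * np.pi * var))

# Metropolis sampling with a uniform prior on [0, 2].
s, lp, samples = 1.0, -np.inf, []
for _ in range(5000):
    sp = s + 0.1 * rng.normal()
    lpp = log_like(sp) if 0.0 <= sp <= 2.0 else -np.inf
    if np.log(rng.uniform()) < lpp - lp:
        s, lp = sp, lpp
    samples.append(s)
print("posterior mean of s:", np.mean(samples[1000:]))
```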
Receptive Field Inference with Localized Priors
Park, Mijung; Pillow, Jonathan W.
2011-01-01
The linear receptive field describes a mapping from sensory stimuli to a one-dimensional variable governing a neuron's spike response. However, traditional receptive field estimators such as the spike-triggered average converge slowly and often require large amounts of data. Bayesian methods seek to overcome this problem by biasing estimates towards solutions that are more likely a priori, typically those with small, smooth, or sparse coefficients. Here we introduce a novel Bayesian receptive field estimator designed to incorporate locality, a powerful form of prior information about receptive field structure. The key to our approach is a hierarchical receptive field model that flexibly adapts to localized structure in both spacetime and spatiotemporal frequency, using an inference method known as empirical Bayes. We refer to our method as automatic locality determination (ALD), and show that it can accurately recover various types of smooth, sparse, and localized receptive fields. We apply ALD to neural data from retinal ganglion cells and V1 simple cells, and find it achieves error rates several times lower than standard estimators. Thus, estimates of comparable accuracy can be achieved with substantially less data. Finally, we introduce a computationally efficient Markov Chain Monte Carlo (MCMC) algorithm for fully Bayesian inference under the ALD prior, yielding accurate Bayesian confidence intervals for small or noisy datasets. PMID:22046110
Using Games to Teach Markov Chains
ERIC Educational Resources Information Center
Johnson, Roger W.
2003-01-01
Games are promoted as examples for classroom discussion of stationary Markov chains. In a game context Markov chain terminology and results are made concrete, interesting, and entertaining. Game length for several-player games such as "Hi Ho! Cherry-O" and "Chutes and Ladders" is investigated and new, simple formulas are given. Slight…
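The expected game-length calculations mentioned above follow from the fundamental matrix of an absorbing chain, N = (I - Q)^{-1}, whose row sums give the expected number of moves until absorption. A toy 4-square board (not an actual game board) keeps the sketch short:

```python
import numpy as np

# Toy absorbing chain: squares 0..3, a fair coin moves you 1 or 2 squares,
# square 3 absorbs (a stand-in for a full Chutes-and-Ladders board).
# States 0, 1, 2 are transient; state 3 is absorbing.
P = np.array([
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0, 1.0],   # from square 2 any move lands on square 3
    [0.0, 0.0, 0.0, 1.0],
])
Q = P[:3, :3]                       # transient-to-transient block
N = np.linalg.inv(np.eye(3) - Q)    # fundamental matrix
t = N @ np.ones(3)                  # expected moves to absorption per start
print("expected game length from the start:", t[0])   # 2.25 moves
```

For a real board, the same two lines of linear algebra apply; only the transition matrix grows.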
Sampling rare fluctuations of discrete-time Markov chains
NASA Astrophysics Data System (ADS)
Whitelam, Stephen
2018-03-01
We describe a simple method that can be used to sample the rare fluctuations of discrete-time Markov chains. We focus on the case of Markov chains with well-defined steady-state measures, and derive expressions for the large-deviation rate functions (and upper bounds on such functions) for dynamical quantities extensive in the length of the Markov chain. We illustrate the method using a series of simple examples, and use it to study the fluctuations of a lattice-based model of active matter that can undergo motility-induced phase separation.
Sampling rare fluctuations of discrete-time Markov chains.
Whitelam, Stephen
2018-03-01
We describe a simple method that can be used to sample the rare fluctuations of discrete-time Markov chains. We focus on the case of Markov chains with well-defined steady-state measures, and derive expressions for the large-deviation rate functions (and upper bounds on such functions) for dynamical quantities extensive in the length of the Markov chain. We illustrate the method using a series of simple examples, and use it to study the fluctuations of a lattice-based model of active matter that can undergo motility-induced phase separation.

NASA Astrophysics Data System (ADS)
Jamaluddin, Fadhilah; Rahim, Rahela Abdul
2015-12-01
Markov chains have been used since 1913 to study the flow of data over consecutive years and to produce forecasts. The key ingredient of a Markov chain model is an accurate Transition Probability Matrix (TPM). However, obtaining a suitable TPM is hard, especially in long-term modeling, owing to the unavailability of data. This paper aims to enhance the classical Markov Chain by introducing the Exponential Smoothing technique for developing an appropriate TPM.
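One plausible reading of this enhancement, smoothing successive period-by-period TPM estimates, can be sketched as below. The counts, smoothing constant, and update form P_t = alpha * P_hat_t + (1 - alpha) * P_{t-1} are assumptions for illustration, not necessarily the authors' exact formulation.

```python
import numpy as np

def count_tpm(states, n):
    """Maximum-likelihood TPM from one period's observed state sequence."""
    C = np.zeros((n, n))
    for a, b in zip(states[:-1], states[1:]):
        C[a, b] += 1
    C += 1e-9                                  # guard against empty rows
    return C / C.sum(axis=1, keepdims=True)

# Two periods of observed states (e.g., yearly rating categories 0..2).
period1 = [0, 0, 1, 2, 1, 1, 0, 2, 2, 1]
period2 = [1, 2, 2, 2, 1, 0, 1, 1, 2, 2]

# Exponential smoothing across periods: newer data weighted by alpha.
alpha = 0.4
P = count_tpm(period1, 3)
P = alpha * count_tpm(period2, 3) + (1 - alpha) * P
print(P.round(3))                              # rows still sum to one
```

A convex combination of stochastic matrices is itself stochastic, so the smoothed TPM remains a valid transition matrix.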
A reversible-jump Markov chain Monte Carlo algorithm for 1D inversion of magnetotelluric data
NASA Astrophysics Data System (ADS)
Mandolesi, Eric; Ogaya, Xenia; Campanyà, Joan; Piana Agostinetti, Nicola
2018-04-01
This paper presents a new computer code developed to solve the 1D magnetotelluric (MT) inverse problem using a Bayesian trans-dimensional Markov chain Monte Carlo algorithm. MT data are sensitive to the depth-distribution of rock electric conductivity (or its reciprocal, resistivity). The solution provided is a probability distribution - the so-called posterior probability distribution (PPD) for the conductivity at depth, together with the PPD of the interface depths. The PPD is sampled via a reversible-jump Markov Chain Monte Carlo (rjMcMC) algorithm, using a modified Metropolis-Hastings (MH) rule to accept or discard candidate models along the chains. As the optimal parameterization for the inversion process is generally unknown a trans-dimensional approach is used to allow the dataset itself to indicate the most probable number of parameters needed to sample the PPD. The algorithm is tested against two simulated datasets and a set of MT data acquired in the Clare Basin (County Clare, Ireland). For the simulated datasets the correct number of conductive layers at depth and the associated electrical conductivity values is retrieved, together with reasonable estimates of the uncertainties on the investigated parameters. Results from the inversion of field measurements are compared with results obtained using a deterministic method and with well-log data from a nearby borehole. The PPD is in good agreement with the well-log data, showing as a main structure a high conductive layer associated with the Clare Shale formation. In this study, we demonstrate that our new code go beyond algorithms developend using a linear inversion scheme, as it can be used: (1) to by-pass the subjective choices in the 1D parameterizations, i.e. the number of horizontal layers in the 1D parameterization, and (2) to estimate realistic uncertainties on the retrieved parameters. The algorithm is implemented using a simple MPI approach, where independent chains run on isolated CPU, to take full advantage of parallel computer architectures. In case of a large number of data, a master/slave appoach can be used, where the master CPU samples the parameter space and the slave CPUs compute forward solutions.
Statistical Analysis of Notational AFL Data Using Continuous Time Markov Chains
Meyer, Denny; Forbes, Don; Clarke, Stephen R.
2006-01-01
Animal biologists commonly use continuous time Markov chain models to describe patterns of animal behaviour. In this paper we consider the use of these models for describing AFL football. In particular we test the assumptions for continuous time Markov chain models (CTMCs), with time, distance and speed values associated with each transition. Using a simple event categorisation it is found that a semi-Markov chain model is appropriate for this data. This validates the use of Markov Chains for future studies in which the outcomes of AFL matches are simulated. Key Points: A comparison of four AFL matches suggests similarity in terms of transition probabilities for events and the mean times, distances and speeds associated with each transition. The Markov assumption appears to be valid. However, the speed, time and distance distributions associated with each transition are not exponential, suggesting that a semi-Markov model can be used to model and simulate play. Team identified events and directions associated with transitions are required to develop the model into a tool for the prediction of match outcomes. PMID:24357946
Statistical Analysis of Notational AFL Data Using Continuous Time Markov Chains.
Meyer, Denny; Forbes, Don; Clarke, Stephen R
2006-01-01
Animal biologists commonly use continuous time Markov chain models to describe patterns of animal behaviour. In this paper we consider the use of these models for describing AFL football. In particular we test the assumptions for continuous time Markov chain models (CTMCs), with time, distance and speed values associated with each transition. Using a simple event categorisation it is found that a semi-Markov chain model is appropriate for this data. This validates the use of Markov Chains for future studies in which the outcomes of AFL matches are simulated. Key Points: A comparison of four AFL matches suggests similarity in terms of transition probabilities for events and the mean times, distances and speeds associated with each transition. The Markov assumption appears to be valid. However, the speed, time and distance distributions associated with each transition are not exponential, suggesting that a semi-Markov model can be used to model and simulate play. Team identified events and directions associated with transitions are required to develop the model into a tool for the prediction of match outcomes.
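For a continuous-time Markov chain of this kind, the transition rates are estimated from time-stamped event logs as (number of i-to-j transitions) / (total time spent in state i). A minimal sketch with hypothetical AFL-style events:

```python
import numpy as np

# Event log: (state, time entered). The final entry closes the window.
events = [("kick", 0.0), ("mark", 3.2), ("kick", 5.0),
          ("handball", 6.1), ("kick", 8.4), ("mark", 11.0)]

states = sorted({s for s, _ in events})
idx = {s: i for i, s in enumerate(states)}
n = len(states)

counts = np.zeros((n, n))
holding = np.zeros(n)
for (s1, t1), (s2, t2) in zip(events[:-1], events[1:]):
    counts[idx[s1], idx[s2]] += 1
    holding[idx[s1]] += t2 - t1

# Generator matrix: off-diagonal rates, diagonal = -(row sum).
Q = counts / np.maximum(holding, 1e-12)[:, None]
np.fill_diagonal(Q, 0.0)
np.fill_diagonal(Q, -Q.sum(axis=1))
print(states)
print(Q.round(3))
```

Testing the CTMC assumption then amounts to checking whether the observed holding times in each state are exponential, which is exactly the check that failed here and motivated the semi-Markov model.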
Sweeney, Lisa M.; Parker, Ann; Haber, Lynne T.; Tran, C. Lang; Kuempel, Eileen D.
2015-01-01
A biomathematical model was previously developed to describe the long-term clearance and retention of particles in the lungs of coal miners. The model structure was evaluated and parameters were estimated in two data sets, one from the United States and one from the United Kingdom. The three-compartment model structure consists of deposition of inhaled particles in the alveolar region, competing processes of either clearance from the alveolar region or translocation to the lung interstitial region, and very slow, irreversible sequestration of interstitialized material in the lung-associated lymph nodes. Point estimates of model parameter values were estimated separately for the two data sets. In the current effort, Bayesian population analysis using Markov chain Monte Carlo simulation was used to recalibrate the model while improving assessments of parameter variability and uncertainty. When model parameters were calibrated simultaneously to the two data sets, agreement between the derived parameters for the two groups was very good, and the central tendency values were similar to those derived from the deterministic approach. These findings are relevant to the proposed update of the ICRP human respiratory tract model with revisions to the alveolar-interstitial region based on this long-term particle clearance and retention model. PMID:23454101
Updating categorical soil maps using limited survey data by Bayesian Markov chain cosimulation.
Li, Weidong; Zhang, Chuanrong; Dey, Dipak K; Willig, Michael R
2013-01-01
Updating categorical soil maps is necessary for providing current, higher-quality soil data to agricultural and environmental management but may not require a costly thorough field survey because latest legacy maps may only need limited corrections. This study suggests a Markov chain random field (MCRF) sequential cosimulation (Co-MCSS) method for updating categorical soil maps using limited survey data provided that qualified legacy maps are available. A case study using synthetic data demonstrates that Co-MCSS can appreciably improve simulation accuracy of soil types with both contributions from a legacy map and limited sample data. The method indicates the following characteristics: (1) if a soil type indicates no change in an update survey or it has been reclassified into another type that similarly evinces no change, it will be simply reproduced in the updated map; (2) if a soil type has changes in some places, it will be simulated with uncertainty quantified by occurrence probability maps; (3) if a soil type has no change in an area but evinces changes in other distant areas, it still can be captured in the area with unobvious uncertainty. We concluded that Co-MCSS might be a practical method for updating categorical soil maps with limited survey data.
Updating Categorical Soil Maps Using Limited Survey Data by Bayesian Markov Chain Cosimulation
Dey, Dipak K.; Willig, Michael R.
2013-01-01
Updating categorical soil maps is necessary for providing current, higher-quality soil data to agricultural and environmental management but may not require a costly thorough field survey because latest legacy maps may only need limited corrections. This study suggests a Markov chain random field (MCRF) sequential cosimulation (Co-MCSS) method for updating categorical soil maps using limited survey data provided that qualified legacy maps are available. A case study using synthetic data demonstrates that Co-MCSS can appreciably improve simulation accuracy of soil types with both contributions from a legacy map and limited sample data. The method indicates the following characteristics: (1) if a soil type indicates no change in an update survey or it has been reclassified into another type that similarly evinces no change, it will be simply reproduced in the updated map; (2) if a soil type has changes in some places, it will be simulated with uncertainty quantified by occurrence probability maps; (3) if a soil type has no change in an area but evinces changes in other distant areas, it still can be captured in the area with unobvious uncertainty. We concluded that Co-MCSS might be a practical method for updating categorical soil maps with limited survey data. PMID:24027447
SAChES: Scalable Adaptive Chain-Ensemble Sampling.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Swiler, Laura Painton; Ray, Jaideep; Ebeida, Mohamed Salah
We present the development of a parallel Markov Chain Monte Carlo (MCMC) method called SAChES, Scalable Adaptive Chain-Ensemble Sampling. This capability is targeted at Bayesian calibration of computationally expensive simulation models. SAChES involves a hybrid of two methods: Differential Evolution Monte Carlo followed by Adaptive Metropolis. Both methods involve parallel chains. Differential evolution allows one to explore high-dimensional parameter spaces using loosely coupled (i.e., largely asynchronous) chains. Loose coupling allows the use of large chain ensembles, with far more chains than the number of parameters to explore. This reduces per-chain sampling burden, enables high-dimensional inversions and the use of computationally expensive forward models. The large number of chains can also ameliorate the impact of silent errors, which may affect only a few chains. The chain ensemble can also be sampled to provide an initial condition when an aberrant chain is re-spawned. Adaptive Metropolis takes the best points from the differential evolution and efficiently hones in on the posterior density. The multitude of chains in SAChES is leveraged to (1) enable efficient exploration of the parameter space; and (2) ensure robustness to silent errors which may be unavoidable in extreme-scale computational platforms of the future. This report outlines SAChES, describes four papers that are the result of the project, and discusses some additional results.
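The Differential Evolution Monte Carlo ingredient can be sketched on a toy target: each chain proposes along the difference of two other randomly chosen chains, scaled by the standard factor 2.38/sqrt(2d). The ensemble size, target, and jitter below are illustrative; SAChES adds the adaptivity, fault tolerance, and scale described above.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy target: a correlated 2-D Gaussian log-density.
Sinv = np.linalg.inv(np.array([[1.0, 0.8], [0.8, 1.0]]))
log_pi = lambda x: -0.5 * x @ Sinv @ x

d, n_chains, n_steps = 2, 10, 3000
gamma = 2.38 / np.sqrt(2 * d)                  # standard DE-MC scale factor
X = rng.normal(size=(n_chains, d))             # ensemble of chain states
logp = np.array([log_pi(x) for x in X])
history = []

for _ in range(n_steps):
    for i in range(n_chains):
        # Propose along the difference of two other chains, plus small jitter.
        a, b = rng.choice([j for j in range(n_chains) if j != i], 2, replace=False)
        prop = X[i] + gamma * (X[a] - X[b]) + 1e-4 * rng.normal(size=d)
        lp = log_pi(prop)
        if np.log(rng.uniform()) < lp - logp[i]:
            X[i], logp[i] = prop, lp
    history.append(X.copy())

samples = np.concatenate(history[1000:])
print("sample covariance:\n", np.cov(samples.T).round(2))   # ~ [[1, .8], [.8, 1]]
```

Because each chain's proposal uses only the current states of two peers, the chains need only loose synchronization, which is what makes the scheme attractive on parallel hardware.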
Fast-slow asymptotics for a Markov chain model of fast sodium current
NASA Astrophysics Data System (ADS)
Starý, Tomáš; Biktashev, Vadim N.
2017-09-01
We explore the feasibility of using fast-slow asymptotics to eliminate the computational stiffness of discrete-state, continuous-time deterministic Markov chain models of ionic channels underlying cardiac excitability. We focus on a Markov chain model of fast sodium current, and investigate its asymptotic behaviour with respect to small parameters identified in different ways.
Zeng, Jianyang; Roberts, Kyle E.; Zhou, Pei
2011-01-01
A major bottleneck in protein structure determination via nuclear magnetic resonance (NMR) is the lengthy and laborious process of assigning resonances and nuclear Overhauser effect (NOE) cross peaks. Recent studies have shown that accurate backbone folds can be determined using sparse NMR data, such as residual dipolar couplings (RDCs) or backbone chemical shifts. This opens a question of whether we can also determine the accurate protein side-chain conformations using sparse or unassigned NMR data. We attack this question by using unassigned nuclear Overhauser effect spectroscopy (NOESY) data, which records the through-space dipolar interactions between protons nearby in three-dimensional (3D) space. We propose a Bayesian approach with a Markov random field (MRF) model to integrate the likelihood function derived from observed experimental data, with prior information (i.e., empirical molecular mechanics energies) about the protein structures. We unify the side-chain structure prediction problem with the side-chain structure determination problem using unassigned NMR data, and apply the deterministic dead-end elimination (DEE) and A* search algorithms to provably find the global optimum solution that maximizes the posterior probability. We employ a Hausdorff-based measure to derive the likelihood of a rotamer or a pairwise rotamer interaction from unassigned NOESY data. In addition, we apply a systematic and rigorous approach to estimate the experimental noise in NMR data, which also determines the weighting factor of the data term in the scoring function derived from the Bayesian framework. We tested our approach on real NMR data of three proteins: the FF Domain 2 of human transcription elongation factor CA150 (FF2), the B1 domain of Protein G (GB1), and human ubiquitin. The promising results indicate that our algorithm can be applied in high-resolution protein structure determination. Since our approach does not require any NOE assignment, it can accelerate the NMR structure determination process. PMID:21970619
Hao, Jie; Astle, William; De Iorio, Maria; Ebbels, Timothy M D
2012-08-01
Nuclear Magnetic Resonance (NMR) spectra are widely used in metabolomics to obtain metabolite profiles in complex biological mixtures. Common methods used to assign and estimate concentrations of metabolites involve either expert manual peak fitting or extra pre-processing steps, such as peak alignment and binning. Peak fitting is very time consuming and is subject to human error. Conversely, alignment and binning can introduce artefacts and limit immediate biological interpretation of models. We present the Bayesian automated metabolite analyser for NMR spectra (BATMAN), an R package that deconvolutes peaks from one-dimensional NMR spectra, automatically assigns them to specific metabolites from a target list and obtains concentration estimates. The Bayesian model incorporates information on characteristic peak patterns of metabolites and is able to account for shifts in the position of peaks commonly seen in NMR spectra of biological samples. It applies a Markov chain Monte Carlo algorithm to sample from a joint posterior distribution of the model parameters and obtains concentration estimates with reduced error compared with conventional numerical integration and comparable to manual deconvolution by experienced spectroscopists. http://www1.imperial.ac.uk/medicine/people/t.ebbels/ t.ebbels@imperial.ac.uk.
Sandoval-Castellanos, Edson; Palkopoulou, Eleftheria; Dalén, Love
2014-01-01
Inference of population demographic history has vastly improved in recent years due to a number of technological and theoretical advances including the use of ancient DNA. Approximate Bayesian computation (ABC) stands among the most promising methods due to its simple theoretical fundament and exceptional flexibility. However, limited availability of user-friendly programs that perform ABC analysis renders it difficult to implement, and hence programming skills are frequently required. In addition, there is limited availability of programs able to deal with heterochronous data. Here we present the software BaySICS: Bayesian Statistical Inference of Coalescent Simulations. BaySICS provides an integrated and user-friendly platform that performs ABC analyses by means of coalescent simulations from DNA sequence data. It estimates historical demographic population parameters and performs hypothesis testing by means of Bayes factors obtained from model comparisons. Although providing specific features that improve inference from datasets with heterochronous data, BaySICS also has several capabilities making it a suitable tool for analysing contemporary genetic datasets. Those capabilities include joint analysis of independent tables, a graphical interface and the implementation of Markov-chain Monte Carlo without likelihoods.
A Bayesian framework to estimate diversification rates and their variation through time and space
2011-01-01
Background Patterns of species diversity are the result of speciation and extinction processes, and molecular phylogenetic data can provide valuable information to derive their variability through time and across clades. Bayesian Markov chain Monte Carlo methods offer a promising framework to incorporate phylogenetic uncertainty when estimating rates of diversification. Results We introduce a new approach to estimate diversification rates in a Bayesian framework over a distribution of trees under various constant and variable rate birth-death and pure-birth models, and test it on simulated phylogenies. Furthermore, speciation and extinction rates and their posterior credibility intervals can be estimated while accounting for non-random taxon sampling. The framework is particularly suitable for hypothesis testing using Bayes factors, as we demonstrate analyzing dated phylogenies of Chondrostoma (Cyprinidae) and Lupinus (Fabaceae). In addition, we develop a model that extends the rate estimation to a meta-analysis framework in which different data sets are combined in a single analysis to detect general temporal and spatial trends in diversification. Conclusions Our approach provides a flexible framework for the estimation of diversification parameters and hypothesis testing while simultaneously accounting for uncertainties in the divergence times and incomplete taxon sampling. PMID:22013891
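A stripped-down version of rate estimation under the simplest model in this family, a constant-rate pure-birth (Yule) process, can be written in a few lines: while k lineages exist, the wait to the next speciation is exponential with rate k*lambda. The simulated waiting times, flat prior, and random-walk Metropolis below are illustrative; the paper's framework additionally handles extinction, distributions of trees, and incomplete taxon sampling.

```python
import numpy as np

rng = np.random.default_rng(8)

# Simulated Yule waiting times: with k lineages, the wait to the next
# speciation event is Exponential(k * lam_true).
lam_true, n_tips = 0.5, 40
ks = np.arange(2, n_tips)                      # lineage count during each wait
waits = np.array([rng.exponential(1.0 / (k * lam_true)) for k in ks])

def log_lik(lam):
    if lam <= 0:
        return -np.inf
    # Each interval contributes log(k*lam) - k*lam*wait.
    return np.sum(np.log(ks * lam) - ks * lam * waits)

# Random-walk Metropolis on the speciation rate with a flat prior.
lam, lp, chain = 1.0, -np.inf, []
for _ in range(20_000):
    prop = lam + 0.05 * rng.normal()
    lpp = log_lik(prop)
    if np.log(rng.uniform()) < lpp - lp:
        lam, lp = prop, lpp
    chain.append(lam)
print("posterior mean speciation rate:", np.mean(chain[5000:]))  # ~ lam_true
```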
Bayesian energy landscape tilting: towards concordant models of molecular ensembles.
Beauchamp, Kyle A; Pande, Vijay S; Das, Rhiju
2014-03-18
Predicting biological structure has remained challenging for systems such as disordered proteins that take on myriad conformations. Hybrid simulation/experiment strategies have been undermined by difficulties in evaluating errors from computational model inaccuracies and data uncertainties. Building on recent proposals from maximum entropy theory and nonequilibrium thermodynamics, we address these issues through a Bayesian energy landscape tilting (BELT) scheme for computing Bayesian hyperensembles over conformational ensembles. BELT uses Markov chain Monte Carlo to directly sample maximum-entropy conformational ensembles consistent with a set of input experimental observables. To test this framework, we apply BELT to model trialanine, starting from disagreeing simulations with the force fields ff96, ff99, ff99sbnmr-ildn, CHARMM27, and OPLS-AA. BELT incorporation of limited chemical shift and ³J measurements gives convergent values of the peptide's α, β, and PPII conformational populations in all cases. As a test of predictive power, all five BELT hyperensembles recover set-aside measurements not used in the fitting and report accurate errors, even when starting from highly inaccurate simulations. BELT's principled framework thus enables practical predictions for complex biomolecular systems from discordant simulations and sparse data.
Bayesian Atmospheric Radiative Transfer (BART) Code and Application to WASP-43b
NASA Astrophysics Data System (ADS)
Blecic, Jasmina; Harrington, Joseph; Cubillos, Patricio; Bowman, Oliver; Rojo, Patricio; Stemm, Madison; Lust, Nathaniel B.; Challener, Ryan; Foster, Austin James; Foster, Andrew S.; Blumenthal, Sarah D.; Bruce, Dylan
2016-01-01
We present a new open-source Bayesian radiative-transfer framework, Bayesian Atmospheric Radiative Transfer (BART, https://github.com/exosports/BART), and its application to WASP-43b. BART initializes a model for the atmospheric retrieval calculation, generates thousands of theoretical model spectra using parametrized pressure and temperature profiles and line-by-line radiative-transfer calculations, and employs a statistical package to compare the models with the observations. It consists of three self-sufficient modules available to the community under the reproducible-research license: the Thermochemical Equilibrium Abundances module (TEA, https://github.com/dzesmin/TEA, Blecic et al. 2015), the radiative-transfer module (Transit, https://github.com/exosports/transit), and the Multi-core Markov-chain Monte Carlo statistical module (MCcubed, https://github.com/pcubillos/MCcubed, Cubillos et al. 2015). We applied BART to all available WASP-43b secondary-eclipse data from space- and ground-based observations, constraining the temperature-pressure profile and molecular abundances of the dayside atmosphere of WASP-43b. This work was supported by NASA Planetary Atmospheres grant NNX12AI69G and NASA Astrophysics Data Analysis Program grant NNX13AF38G. JB holds a NASA Earth and Space Science Fellowship.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Konomi, Bledar A.; Karagiannis, Georgios; Sarkar, Avik
2014-05-16
Computer experiments (numerical simulations) are widely used in scientific research to study and predict the behavior of complex systems, which usually have responses consisting of a set of distinct outputs. The computational cost of high-resolution simulations is often expensive and becomes impractical for parametric studies at different input values. To overcome these difficulties we develop a Bayesian treed multivariate Gaussian process (BTMGP) as an extension of the Bayesian treed Gaussian process (BTGP) in order to model and evaluate a multivariate process. A suitable choice of covariance function and prior distributions facilitates the different Markov chain Monte Carlo (MCMC) moves. We utilize this model to sequentially sample the input space for the most informative values, taking into account model uncertainty and expertise gained. A simulation study demonstrates the use of the proposed method and compares it with alternative approaches. We apply the sequential sampling technique and the BTMGP to model the multiphase flow in a full-scale regenerator of a carbon capture unit. The application presented in this paper is an important tool for research into carbon dioxide emissions from thermal power plants.
Modelling the operating history of turbine-generator units
NASA Astrophysics Data System (ADS)
Szczota, Mickael
Because of their ageing fleet, utility managers increasingly need tools that can help them plan maintenance operations efficiently. Hydro-Quebec started a project that aims to forecast the degradation of its hydroelectric runners and to use that information to classify the generating units. This classification will help identify which generating units are most at risk of a major failure. Cracking linked to fatigue is a predominant degradation mode, and the loading sequence applied to the runner is a parameter that affects crack growth. The aim of this thesis is therefore to create a generator of synthetic loading sequences that are statistically equivalent to the observed history; the simulated sequences will be used as input to a life-assessment model. We first describe how the generating units are operated by Hydro-Quebec and analyse the available data; the analysis shows that the data are non-stationary. We then review modelling and validation methods. In the following chapter, particular attention is given to a precise description of the validation and comparison procedure. We then compare three kinds of models: Discrete Time Markov Chains, Discrete Time Semi-Markov Chains and the Moving Block Bootstrap. For the first two models, we describe how to account for the non-stationarity. Finally, we show that the Markov chain is not suited to our case and that Semi-Markov chains perform better when they incorporate the non-stationarity. The final choice between Semi-Markov Chains and the Moving Block Bootstrap rests with the user; with a long-term view, however, we recommend Semi-Markov chains for their flexibility. Keywords: Stochastic models, Model validation, Reliability, Semi-Markov Chains, Markov Chains, Bootstrap
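As a rough illustration of the simplest model compared in the thesis, the sketch below fits a discrete-time Markov chain to an observed operating-state sequence and generates a statistically similar synthetic sequence; the state labels and data are invented, and the thesis ultimately favours semi-Markov variants that also account for non-stationarity.

```python
# Fit a discrete-time Markov chain to an operating-state history and
# simulate a synthetic loading sequence from it (illustrative data only).
import numpy as np

rng = np.random.default_rng(2)
states = ["stop", "part_load", "full_load"]   # assumed state labels
observed = rng.choice(3, size=1000)           # stand-in for a real history

# Estimate transition probabilities from observed transition counts
counts = np.zeros((3, 3))
for a, b in zip(observed[:-1], observed[1:]):
    counts[a, b] += 1
P = counts / counts.sum(axis=1, keepdims=True)

# Generate a synthetic sequence from the fitted chain
seq = [int(observed[0])]
for _ in range(999):
    seq.append(int(rng.choice(3, p=P[seq[-1]])))
print("fitted transition matrix:\n", P.round(3))
```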
Saccade selection when reward probability is dynamically manipulated using Markov chains
Lovejoy, Lee P.; Krauzlis, Richard J.
2012-01-01
Markov chains (stochastic processes where probabilities are assigned based on the previous outcome) are commonly used to examine the transitions between behavioral states, such as those that occur during foraging or social interactions. However, relatively little is known about how well primates can incorporate knowledge about Markov chains into their behavior. Saccadic eye movements are an example of a simple behavior influenced by information about probability, and thus are good candidates for testing whether subjects can learn Markov chains. In addition, when investigating the influence of probability on saccade target selection, the use of Markov chains could provide an alternative method that avoids confounds present in other task designs. To investigate these possibilities, we evaluated human behavior on a task in which stimulus reward probabilities were assigned using a Markov chain. On each trial, the subject selected one of four identical stimuli by saccade; after selection, feedback indicated the rewarded stimulus. Each session consisted of 200–600 trials, and on some sessions, the reward magnitude varied. On sessions with a uniform reward, subjects (n = 6) learned to select stimuli at a frequency close to reward probability, which is similar to human behavior on matching or probability classification tasks. When informed that a Markov chain assigned reward probabilities, subjects (n = 3) learned to select the greatest reward probability more often, bringing them close to behavior that maximizes reward. On sessions where reward magnitude varied across stimuli, subjects (n = 6) demonstrated preferences for both greater reward probability and greater reward magnitude, resulting in a preference for greater expected value (the product of reward probability and magnitude). These results demonstrate that Markov chains can be used to dynamically assign probabilities that are rapidly exploited by human subjects during saccade target selection. PMID:18330552
Saccade selection when reward probability is dynamically manipulated using Markov chains.
Nummela, Samuel U; Lovejoy, Lee P; Krauzlis, Richard J
2008-05-01
Markov chains (stochastic processes where probabilities are assigned based on the previous outcome) are commonly used to examine the transitions between behavioral states, such as those that occur during foraging or social interactions. However, relatively little is known about how well primates can incorporate knowledge about Markov chains into their behavior. Saccadic eye movements are an example of a simple behavior influenced by information about probability, and thus are good candidates for testing whether subjects can learn Markov chains. In addition, when investigating the influence of probability on saccade target selection, the use of Markov chains could provide an alternative method that avoids confounds present in other task designs. To investigate these possibilities, we evaluated human behavior on a task in which stimulus reward probabilities were assigned using a Markov chain. On each trial, the subject selected one of four identical stimuli by saccade; after selection, feedback indicated the rewarded stimulus. Each session consisted of 200-600 trials, and on some sessions, the reward magnitude varied. On sessions with a uniform reward, subjects (n = 6) learned to select stimuli at a frequency close to reward probability, which is similar to human behavior on matching or probability classification tasks. When informed that a Markov chain assigned reward probabilities, subjects (n = 3) learned to select the greatest reward probability more often, bringing them close to behavior that maximizes reward. On sessions where reward magnitude varied across stimuli, subjects (n = 6) demonstrated preferences for both greater reward probability and greater reward magnitude, resulting in a preference for greater expected value (the product of reward probability and magnitude). These results demonstrate that Markov chains can be used to dynamically assign probabilities that are rapidly exploited by human subjects during saccade target selection.
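The task's reward assignment is easy to state in code. The sketch below is an illustration only, not the study's actual protocol: the rewarded stimulus on each trial is drawn from a Markov chain conditioned on the previously rewarded stimulus, and the 4x4 transition matrix is an assumed example.

```python
# Illustrative reward assignment via a Markov chain over four stimuli.
import numpy as np

rng = np.random.default_rng(3)
# Rows: previously rewarded stimulus; columns: next rewarded stimulus.
P = np.array([[0.7, 0.1, 0.1, 0.1],
              [0.1, 0.7, 0.1, 0.1],
              [0.1, 0.1, 0.7, 0.1],
              [0.1, 0.1, 0.1, 0.7]])

rewarded, hits = 0, 0
for trial in range(200):
    choice = rng.integers(4)                  # stand-in for the saccade
    hits += (choice == rewarded)              # feedback reveals the target
    rewarded = rng.choice(4, p=P[rewarded])   # Markov update for next trial
print(f"hit rate of a random chooser: {hits / 200:.2f}")
```

A subject who exploits the chain would saccade to the most probable next rewarded stimulus given the last feedback, rather than choosing at random.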
Bayesian estimation of optical properties of the human head via 3D structural MRI
NASA Astrophysics Data System (ADS)
Barnett, Alexander H.; Culver, Joseph P.; Sorensen, A. Gregory; Dale, Anders M.; Boas, David A.
2003-10-01
Knowledge of the baseline optical properties of the tissues of the human head is essential for absolute cerebral oximetry, and for quantitative studies of brain activation. In this work we numerically model the utility of signals from a small 6-optode time-resolved diffuse optical tomographic apparatus for inferring baseline scattering and absorption coefficients of the scalp, skull and brain, when complete geometric information is available from magnetic resonance imaging (MRI). We use an optical model where MRI-segmented tissues are assumed homogeneous. We introduce a noise model capturing both photon shot noise and forward-model numerical accuracy, and use Bayesian inference to predict error bars and correlations on the measurements. We also sample from the full posterior distribution using Markov chain Monte Carlo. We conclude that ~10⁶ detected photons are sufficient to measure the brain's scattering and absorption to a few percent. We present preliminary results using a fast multi-layer slab model, comparing the case when layer thicknesses are known versus unknown.
Montesinos-López, Osval A.; Montesinos-López, Abelardo; Crossa, José; Toledo, Fernando H.; Montesinos-López, José C.; Singh, Pawan; Juliana, Philomin; Salinas-Ruiz, Josafhat
2017-01-01
When a plant scientist wishes to make genomic-enabled predictions of multiple traits measured in multiple individuals in multiple environments, the most common strategy is to analyse a single trait at a time, taking into account genotype × environment interaction (G × E), because there is a lack of comprehensive models that simultaneously account for correlated count traits and G × E. For this reason, in this study we propose a multiple-trait and multiple-environment model for count data. The proposed model was developed under the Bayesian paradigm, for which we developed a Markov Chain Monte Carlo (MCMC) scheme with noninformative priors. These priors make all required full conditional distributions of the parameters available in closed form, leading to an exact Gibbs sampler for the posterior distribution. Our model was tested with simulated data and a real data set. Results show that the proposed multi-trait, multi-environment model is an attractive alternative for modeling multiple count traits measured in multiple environments. PMID:28364037
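The key computational point above is that the chosen priors leave every full conditional available in closed form, so an exact Gibbs sampler applies. As a minimal illustration of that idea, the sketch below runs an exact Gibbs sampler on a much simpler conjugate model (a normal mean and precision), not on the multi-trait count model itself; data and priors are assumptions.

```python
# Exact Gibbs sampler for a toy normal model: flat prior on the mean mu,
# Jeffreys prior on the precision tau, so both full conditionals are known.
import numpy as np

rng = np.random.default_rng(4)
y = rng.normal(5.0, 2.0, size=100)            # toy data
n, ybar = y.size, y.mean()

mu, tau = 0.0, 1.0                            # tau = 1/variance
draws = []
for _ in range(2000):
    # full conditional of mu | tau, y is Normal(ybar, 1/(n*tau))
    mu = rng.normal(ybar, 1.0 / np.sqrt(n * tau))
    # full conditional of tau | mu, y is Gamma(n/2, rate = sum(resid^2)/2)
    tau = rng.gamma(n / 2.0, 2.0 / np.sum((y - mu) ** 2))
    draws.append((mu, 1.0 / np.sqrt(tau)))

post = np.array(draws[500:])                  # drop burn-in
print("posterior mean of mu:   ", post[:, 0].mean())
print("posterior mean of sigma:", post[:, 1].mean())
```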
Spectral decompositions of multiple time series: a Bayesian non-parametric approach.
Macaro, Christian; Prado, Raquel
2014-01-01
We consider spectral decompositions of multiple time series that arise in studies where the interest lies in assessing the influence of two or more factors. We write the spectral density of each time series as a sum of the spectral densities associated to the different levels of the factors. We then use Whittle's approximation to the likelihood function and follow a Bayesian non-parametric approach to obtain posterior inference on the spectral densities based on Bernstein-Dirichlet prior distributions. The prior is strategically important as it carries identifiability conditions for the models and allows us to quantify our degree of confidence in such conditions. A Markov chain Monte Carlo (MCMC) algorithm for posterior inference within this class of frequency-domain models is presented. We illustrate the approach by analyzing simulated and real data via spectral one-way and two-way models. In particular, we present an analysis of functional magnetic resonance imaging (fMRI) brain responses measured in individuals who participated in a designed experiment to study pain perception in humans.
Mahardika, G N K; Dibia, N; Budayanti, N S; Susilawathi, N M; Subrata, K; Darwinata, A E; Wignall, F S; Richt, J A; Valdivia-Granda, W A; Sudewi, A A R
2014-06-01
The emergence of human and animal rabies in Bali since November 2008 has attracted local, national and international interest. The potential origin and time of introduction of rabies virus to Bali are described. The nucleoprotein (N) gene of rabies virus from dog brain and human clinical specimens was sequenced using an automated DNA sequencer. Phylogenetic inference with Bayesian Markov Chain Monte Carlo (MCMC) analysis using the Bayesian Evolutionary Analysis by Sampling Trees (BEAST) v. 1.7.5 software confirmed that the outbreak of rabies in Bali was caused by an Indonesian lineage virus following a single introduction. The ancestor of the Bali viruses was the descendant of a virus from Kalimantan. Contact tracing showed that the event most likely occurred in early 2008. The introduction of rabies into a large unvaccinated dog population in Bali clearly demonstrates the risk of disease transmission for government agencies and should lead to increased preparedness and efforts for sustained risk reduction to prevent such events from occurring in future.
Campbell, Kieran R; Yau, Christopher
2017-03-15
Modeling bifurcations in single-cell transcriptomics data has become an increasingly popular field of research. Several methods have been proposed to infer bifurcation structure from such data, but all rely on heuristic non-probabilistic inference. Here we propose the first generative, fully probabilistic model for such inference, based on a Bayesian hierarchical mixture of factor analyzers. Our model exhibits competitive performance on large datasets despite implementing full Markov chain Monte Carlo sampling, and its unique hierarchical prior structure enables automatic determination of genes driving the bifurcation process. We additionally propose an Empirical Bayes-like extension that deals with the high levels of zero-inflation in single-cell RNA-seq data and quantify when such models are useful. We apply our model to both real and simulated single-cell gene expression data and compare the results to existing pseudotime methods. Finally, we discuss both the merits and weaknesses of such a unified, probabilistic approach in the context of practical bioinformatics analyses.
Streck, André Felipe; Homeier, Timo; Foerster, Tessa; Truyen, Uwe
2013-09-01
To estimate the impact of porcine parvovirus (PPV) vaccines on the emergence of new phenotypes, the population dynamic history of the virus was calculated using the Bayesian Markov chain Monte Carlo method with a Bayesian skyline coalescent model. Additionally, an in vitro model was performed with consecutive passages of the 'Challenge' strain (a virulent field strain) and the NADL2 strain (a vaccine strain) in a PK-15 cell line supplemented with polyclonal antibodies raised against the vaccine strain. A decrease in genetic diversity was observed in the presence of antibodies in vitro or after vaccination (as estimated by the in silico model). We hypothesized that the antibodies induced a selective pressure that may reduce the incidence of neutral selection, which should play a major role in the emergence of new mutations. In this scenario, vaccine failures and non-vaccinated populations (e.g. wild boars) may have an important impact on the emergence of new phenotypes.
Bayesian inference of radiation belt loss timescales.
NASA Astrophysics Data System (ADS)
Camporeale, E.; Chandorkar, M.
2017-12-01
Electron fluxes in the Earth's radiation belts are routinely studied using the classical quasi-linear radial diffusion model. Although this simplified linear equation has proven to be an indispensable tool in understanding the dynamics of the radiation belt, it requires specification of quantities such as the diffusion coefficient and electron loss timescales that are never directly measured. Researchers have so far assumed a-priori parameterisations for radiation belt quantities and derived the best fit using satellite data. The state of the art in this domain lacks a coherent formulation of this problem in a probabilistic framework. We present some recent progress that we have made in performing Bayesian inference of radial diffusion parameters. We achieve this by making extensive use of the theory connecting Gaussian Processes and linear partial differential equations, and performing Markov Chain Monte Carlo sampling of radial diffusion parameters. These results are important for understanding the role and the propagation of uncertainties in radiation belt simulations and, eventually, for providing a probabilistic forecast of energetic electron fluxes in a Space Weather context.
Modelling past land use using archaeological and pollen data
NASA Astrophysics Data System (ADS)
Pirzamanbein, Behnaz; Lindström, Johan; Poska, Anneli; Gaillard-Lemdahl, Marie-José
2016-04-01
Accurate maps of past land use are necessary for studying the impact of anthropogenic land-cover changes on climate and biodiversity. We develop a Bayesian hierarchical model to reconstruct past land use using Gaussian Markov random fields. The model uses two observation sets: 1) archaeological data, representing human settlements, urbanization and agricultural findings; and 2) pollen-based estimates of the three land-cover types Coniferous forest, Broadleaved forest and Unforested/Open land. The pollen-based estimates are obtained from the REVEALS model, based on pollen counts from lakes and bogs. Our model uses the sparse pollen-based estimates to reconstruct spatially continuous cover of the three land-cover types. Using the open-land component and the archaeological data, the extent of land use is reconstructed. The model is applied to three time periods (centred around 1900 CE, 1000 BCE and 4000 BCE) over Sweden, for which both pollen-based estimates and archaeological data are available. To estimate the model parameters and land use, a block-updated Markov chain Monte Carlo (MCMC) algorithm is applied. Using the MCMC posterior samples, uncertainties in the land-use predictions are computed. Due to the lack of good historical land-use data, model results are evaluated by cross-validation. Keywords: Spatial reconstruction, Gaussian Markov random field, Fossil pollen records, Archaeological data, Human land-use, Prediction uncertainty
NASA Astrophysics Data System (ADS)
Nelson, Benjamin Earl; Wright, Jason Thomas; Wang, Sharon
2015-08-01
For this hack session, we will present three tools used in analyses of radial velocity exoplanet systems. RVLIN is a set of IDL routines used to quickly fit an arbitrary number of Keplerian curves to radial velocity data to find adequate parameter point estimates. BOOTTRAN is an IDL-based extension of RVLIN to provide orbital parameter uncertainties using bootstrap based on a Keplerian model. RUN DMC is a highly parallelized Markov chain Monte Carlo algorithm that employs an n-body model, primarily used for dynamically complex or poorly constrained exoplanet systems. We will compare the performance of these tools and their applications to various exoplanet systems.
Estimation of under-reporting in epidemics using approximations.
Gamado, Kokouvi; Streftaris, George; Zachary, Stan
2017-06-01
Under-reporting in epidemics, when it is ignored, leads to under-estimation of the infection rate and therefore of the reproduction number. In the case of stochastic models with temporal data, a usual approach for dealing with such issues is to apply data augmentation techniques through Bayesian methodology. Departing from earlier literature approaches implemented using reversible jump Markov chain Monte Carlo (RJMCMC) techniques, we make use of approximations to obtain faster estimation with simple MCMC. Comparisons among the methods developed here, and with the RJMCMC approach, are carried out and highlight that approximation-based methodology offers useful alternative inference tools for large epidemics, with a good trade-off between time cost and accuracy.
Stormwater quality modelling in combined sewers: calibration and uncertainty analysis.
Kanso, A; Chebbo, G; Tassin, B
2005-01-01
Estimating the level of uncertainty in urban stormwater quality models is vital for their utilization. This paper presents the results of the application of a Markov chain Monte Carlo method based on Bayesian theory for the calibration and uncertainty analysis of a stormwater quality model commonly used in available software. The tested model uses a hydrologic/hydrodynamic scheme to estimate the accumulation, erosion and transport of pollutants on surfaces and in sewers. It was calibrated for four different initial conditions of in-sewer deposits. Calibration results showed large variability in the model's responses as a function of the initial conditions. They demonstrated that the model's predictive capacity is very low.
The spectral method and the central limit theorem for general Markov chains
NASA Astrophysics Data System (ADS)
Nagaev, S. V.
2017-12-01
We consider Markov chains with an arbitrary phase space and develop a modification of the spectral method that enables us to prove the central limit theorem (CLT) for non-uniformly ergodic Markov chains. The conditions imposed on the transition function are more general than those by Athreya-Ney and Nummelin. Our proof of the CLT is purely analytical.
Zipf exponent of trajectory distribution in the hidden Markov model
NASA Astrophysics Data System (ADS)
Bochkarev, V. V.; Lerner, E. Yu
2014-03-01
This paper is a first step in generalizing the previously obtained full classification of the asymptotic behavior of the probability of Markov chain trajectories to the case of hidden Markov models. The main goal is to study the power (Zipf) and non-power asymptotics of the frequency list of trajectories of hidden Markov models and to obtain explicit formulae for the exponent of the power asymptotics. We consider several simple classes of hidden Markov models. We prove that the asymptotics for a hidden Markov model and for the corresponding Markov chain can be essentially different.
Detection of multiple damages employing best achievable eigenvectors under Bayesian inference
NASA Astrophysics Data System (ADS)
Prajapat, Kanta; Ray-Chaudhuri, Samit
2018-05-01
A novel approach is presented in this work to simultaneously localize multiple damaged elements in a structure and estimate the damage severity of each. For detection of damaged elements, a formulation based on best achievable eigenvectors is derived. To deal with noisy data, Bayesian inference is employed, wherein the likelihood is formed from the errors between the best achievable eigenvectors and the measured modes. In this approach, the most probable damage locations are evaluated under Bayesian inference by generating combinations of the possible damaged elements. Once damage locations are identified, damage severities are estimated using Bayesian inference with Markov chain Monte Carlo simulation. The efficiency of the proposed approach is demonstrated through a numerical study involving a 12-story shear building. This study shows that damage scenarios involving as little as 10% loss of stiffness in multiple elements are accurately determined (localized, with severities quantified) even when modal data contaminated with 2% noise are utilized. Further, this study introduces the term parameter impact, evaluated from the sensitivity of modal parameters to structural parameters, to decide the suitability of selecting a particular mode if some idea about the damaged elements is available. It is demonstrated that the accuracy and efficiency of the Bayesian quantification algorithm increase if damage localization is carried out a priori. An experimental study involving a laboratory-scale shear building and different stiffness-modification scenarios shows that the proposed approach is efficient enough to localize the stories with stiffness modification.
Model-based Bayesian inference for ROC data analysis
NASA Astrophysics Data System (ADS)
Lei, Tianhu; Bae, K. Ty
2013-03-01
This paper presents a study of model-based Bayesian inference applied to Receiver Operating Characteristic (ROC) data. The model is a simple version of a general non-linear regression model. Unlike the Dorfman model, it uses a probit link function with a zero-one covariate to express binormal distributions in a single formula. The model also includes a scale parameter. Bayesian inference is implemented by a Markov chain Monte Carlo (MCMC) method carried out with Bayesian inference Using Gibbs Sampling (BUGS). In contrast to classical statistical theory, the Bayesian approach treats model parameters as random variables characterized by prior distributions. With a substantial number of simulated samples generated by the sampling algorithm, posterior distributions of the parameters, as well as the parameters themselves, can be accurately estimated. MCMC-based BUGS adopts the Adaptive Rejection Sampling (ARS) protocol, which requires that the probability density function (pdf) from which samples are drawn be log-concave with respect to the targeted parameters. Our study corrects a common misconception and proves that the pdf of this regression model is log-concave with respect to its scale parameter. Therefore, ARS's requirement is satisfied and a Gaussian prior, which is conjugate and possesses many analytic and computational advantages, is assigned to the scale parameter. A cohort of 20 simulated data sets and 20 simulations from each data set are used in our study. Output analysis and convergence diagnostics for the MCMC method are assessed with the CODA package. Models and methods using a continuous Gaussian prior and a discrete categorical prior are compared. Intensive simulations and performance measures are given to illustrate our practice in the framework of model-based Bayesian inference using the MCMC method.
Assessing significance in a Markov chain without mixing.
Chikina, Maria; Frieze, Alan; Pegden, Wesley
2017-03-14
We present a statistical test to detect that a presented state of a reversible Markov chain was not chosen from a stationary distribution. In particular, given a value function for the states of the Markov chain, we would like to show rigorously that the presented state is an outlier with respect to the values, by establishing a p value under the null hypothesis that it was chosen from a stationary distribution of the chain. A simple heuristic used in practice is to sample ranks of states from long random trajectories on the Markov chain and compare these with the rank of the presented state; if the presented state is a 0.1% outlier compared with the sampled ranks (its rank is in the bottom 0.1% of sampled ranks), then this observation should correspond to a p value of 0.001. This significance is not rigorous, however, without good bounds on the mixing time of the Markov chain. Our test is the following: Given the presented state in the Markov chain, take a random walk from the presented state for any number of steps. We prove that observing that the presented state is an ε-outlier on the walk is significant at p = 2ε under the null hypothesis that the state was chosen from a stationary distribution. We assume nothing about the Markov chain beyond reversibility and show that significance at p ≈ ε is best possible in general. We illustrate the use of our test with a potential application to the rigorous detection of gerrymandering in Congressional districting.
Assessing significance in a Markov chain without mixing
Chikina, Maria; Frieze, Alan; Pegden, Wesley
2017-01-01
We present a statistical test to detect that a presented state of a reversible Markov chain was not chosen from a stationary distribution. In particular, given a value function for the states of the Markov chain, we would like to show rigorously that the presented state is an outlier with respect to the values, by establishing a p value under the null hypothesis that it was chosen from a stationary distribution of the chain. A simple heuristic used in practice is to sample ranks of states from long random trajectories on the Markov chain and compare these with the rank of the presented state; if the presented state is a 0.1% outlier compared with the sampled ranks (its rank is in the bottom 0.1% of sampled ranks), then this observation should correspond to a p value of 0.001. This significance is not rigorous, however, without good bounds on the mixing time of the Markov chain. Our test is the following: Given the presented state in the Markov chain, take a random walk from the presented state for any number of steps. We prove that observing that the presented state is an ε-outlier on the walk is significant at p=2ε under the null hypothesis that the state was chosen from a stationary distribution. We assume nothing about the Markov chain beyond reversibility and show that significance at p≈ε is best possible in general. We illustrate the use of our test with a potential application to the rigorous detection of gerrymandering in Congressional districting. PMID:28246331
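The test is simple to implement. The sketch below applies it to a toy reversible chain (a random walk on a cycle, with an arbitrary value function on the states); the chain, value function and walk length are illustrative assumptions, and the exact bookkeeping of the ε-outlier definition follows the heuristic description above rather than the paper's formal statement.

```python
# Sketch of the epsilon-outlier walk test on a toy reversible Markov chain.
import numpy as np

rng = np.random.default_rng(5)
n = 100
values = rng.normal(size=n)              # arbitrary value function on states

def step(s):                             # reversible chain: walk on a cycle
    return (s + rng.choice([-1, 1])) % n

presented = int(np.argsort(values)[-5])  # a suspiciously high-value state
walk_vals = []
s = presented
for _ in range(10_000):
    s = step(s)
    walk_vals.append(values[s])

# fraction of walk states at least as extreme as the presented state;
# by the theorem, observing this is significant at p = 2 * eps
eps = np.mean(np.array(walk_vals) >= values[presented])
print(f"presented state is an eps = {eps:.4f} outlier on the walk; "
      f"significant at p = {2 * eps:.4f}")
```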
NASA Astrophysics Data System (ADS)
Zhang, Junlong; Li, Yongping; Huang, Guohe; Chen, Xi; Bao, Anming
2016-07-01
Without a realistic assessment of parameter uncertainty, decision makers may encounter difficulties in accurately describing hydrologic processes and assessing relationships between model parameters and watershed characteristics. In this study, a Markov-Chain-Monte-Carlo-based multilevel-factorial-analysis (MCMC-MFA) method is developed, which can not only generate samples of parameters from a well-constructed Markov chain and assess parameter uncertainties with straightforward Bayesian inference, but also investigate the individual and interactive effects of multiple parameters on model output through measuring the specific variations of hydrological responses. A case study is conducted for addressing parameter uncertainties in the Kaidu watershed of northwest China. Effects of multiple parameters and their interactions are quantitatively investigated using the MCMC-MFA with a three-level factorial experiment (81 runs in total). A variance-based sensitivity analysis method is used to validate the results of the parameters' effects. Results disclose that (i) the soil conservation service runoff curve number for moisture condition II (CN2) and the fraction of snow volume corresponding to 50% snow cover (SNO50COV) are the most significant factors for hydrological responses, implying that infiltration-excess overland flow and snow water equivalent represent important water inputs to the hydrological system of the Kaidu watershed; (ii) saturated hydraulic conductivity (SOL_K) and the soil evaporation compensation factor (ESCO) have obvious effects on hydrological responses, implying that the processes of percolation and evaporation impact hydrological processes in this watershed; (iii) the interactions of ESCO and SNO50COV as well as CN2 and SNO50COV have an obvious effect, implying that snow cover can impact the generation of runoff on the land surface and the extraction of soil evaporative demand in lower soil layers. These findings can help enhance the hydrological model's capability for simulating and predicting water resources.
Rottman, Benjamin M; Hastie, Reid
2016-06-01
Making judgments by relying on beliefs about the causal relationships between events is a fundamental capacity of everyday cognition. In the last decade, Causal Bayesian Networks have been proposed as a framework for modeling causal reasoning. Two experiments were conducted to provide comprehensive data sets with which to evaluate a variety of different types of judgments in comparison to the standard Bayesian networks calculations. Participants were introduced to a fictional system of three events and observed a set of learning trials that instantiated the multivariate distribution relating the three variables. We tested inferences on chains X1→Y→X2, common cause structures X1←Y→X2, and common effect structures X1→Y←X2, on binary and numerical variables, and with high and intermediate causal strengths. We tested transitive inferences, inferences when one variable is irrelevant because it is blocked by an intervening variable (Markov Assumption), inferences from two variables to a middle variable, and inferences about the presence of one cause when the alternative cause was known to have occurred (the normative "explaining away" pattern). Compared to the normative account, in general, when the judgments should change, they change in the normative direction. However, we also discuss a few persistent violations of the standard normative model. In addition, we evaluate the relative success of 12 theoretical explanations for these deviations. Copyright © 2016 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Musenge, Eustasius; Chirwa, Tobias Freeman; Kahn, Kathleen; Vounatsou, Penelope
2013-06-01
Longitudinal mortality data with few deaths usually have problems of zero-inflation. This paper presents and applies two Bayesian models which cater for zero-inflation, spatial and temporal random effects. To reduce the computational burden experienced when a large number of geo-locations are treated as a Gaussian field (GF) we transformed the field to a Gaussian Markov Random Fields (GMRF) by triangulation. We then modelled the spatial random effects using the Stochastic Partial Differential Equations (SPDEs). Inference was done using a computationally efficient alternative to Markov chain Monte Carlo (MCMC) called Integrated Nested Laplace Approximation (INLA) suited for GMRF. The models were applied to data from 71,057 children aged 0 to under 10 years from rural north-east South Africa living in 15,703 households over the years 1992-2010. We found protective effects on HIV/TB mortality due to greater birth weight, older age and more antenatal clinic visits during pregnancy (adjusted RR (95% CI)): 0.73(0.53;0.99), 0.18(0.14;0.22) and 0.96(0.94;0.97) respectively. Therefore childhood HIV/TB mortality could be reduced if mothers are better catered for during pregnancy as this can reduce mother-to-child transmissions and contribute to improved birth weights. The INLA and SPDE approaches are computationally good alternatives in modelling large multilevel spatiotemporal GMRF data structures.
A Bayesian alternative for multi-objective ecohydrological model specification
NASA Astrophysics Data System (ADS)
Tang, Yating; Marshall, Lucy; Sharma, Ashish; Ajami, Hoori
2018-01-01
Recent studies have identified the importance of vegetation processes in terrestrial hydrologic systems. Process-based ecohydrological models combine hydrological, physical, biochemical and ecological processes of the catchments, and as such are generally more complex and parametric than conceptual hydrological models. Thus, appropriate calibration objectives and model uncertainty analysis are essential for ecohydrological modeling. In recent years, Bayesian inference has become one of the most popular tools for quantifying the uncertainties in hydrological modeling with the development of Markov chain Monte Carlo (MCMC) techniques. The Bayesian approach offers an appealing alternative to traditional multi-objective hydrologic model calibrations by defining proper prior distributions that can be considered analogous to the ad-hoc weighting often prescribed in multi-objective calibration. Our study aims to develop appropriate prior distributions and likelihood functions that minimize the model uncertainties and bias within a Bayesian ecohydrological modeling framework based on a traditional Pareto-based model calibration technique. In our study, a Pareto-based multi-objective optimization and a formal Bayesian framework are implemented in a conceptual ecohydrological model that combines a hydrological model (HYMOD) and a modified Bucket Grassland Model (BGM). Simulations focused on one objective (streamflow/LAI) and multiple objectives (streamflow and LAI) with different emphasis defined via the prior distribution of the model error parameters. Results show more reliable outputs for both predicted streamflow and LAI using Bayesian multi-objective calibration with specified prior distributions for error parameters based on results from the Pareto front in the ecohydrological modeling. The methodology implemented here provides insight into the usefulness of multiobjective Bayesian calibration for ecohydrologic systems and the importance of appropriate prior distributions in such approaches.
NASA Technical Reports Server (NTRS)
Leutenegger, Scott T.; Horton, Graham
1994-01-01
Recently the Multi-Level algorithm was introduced as a general-purpose solver for the solution of steady-state Markov chains. In this paper, we consider the performance of the Multi-Level algorithm for solving Nearly Completely Decomposable (NCD) Markov chains, for which special-purpose iterative aggregation/disaggregation algorithms such as the Koury-McAllister-Stewart (KMS) method have been developed that can exploit the decomposability of the Markov chain. We present experimental results indicating that the general-purpose Multi-Level algorithm is competitive, and can be significantly faster than the special-purpose KMS algorithm when Gauss-Seidel and Gaussian elimination are used for solving the individual blocks.
Berlow, Noah; Pal, Ranadip
2011-01-01
Genetic Regulatory Networks (GRNs) are frequently modeled as Markov chains providing the transition probabilities of moving from one state of the network to another. The inverse problem of inferring the Markov chain from noisy and limited experimental data is ill-posed and often generates multiple model possibilities instead of a unique one. In this article, we address the issue of intervention in a genetic regulatory network represented by a family of Markov chains. The purpose of intervention is to alter the steady-state probability distribution of the GRN, as the steady states are considered to be representative of the phenotypes. We consider robust stationary control policies with best expected behavior. The extreme computational complexity involved in the search for robust stationary control policies is mitigated by using a sequential approach to control policy generation and by utilizing computationally efficient techniques for updating the stationary probability distribution of a Markov chain following a rank-one perturbation.
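To illustrate the kind of stationary-distribution update discussed above: a rank-one perturbation here corresponds to changing one row of the transition matrix (e.g., the effect of an intervention on one network state). The paper uses efficient update formulas; for clarity this sketch simply recomputes the stationary distribution with a plain eigenvector solve, and the matrices are invented.

```python
# Effect of a single-row (rank-one) perturbation on the stationary
# distribution of a small Markov chain (illustrative numbers only).
import numpy as np

def stationary(P):
    # left eigenvector of P for eigenvalue 1, normalised to sum to 1
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return pi / pi.sum()

P = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
print("before intervention:", stationary(P).round(3))

P_new = P.copy()
P_new[2] = [0.6, 0.2, 0.2]        # intervention rewires state 2's dynamics
print("after intervention: ", stationary(P_new).round(3))
```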
NASA Astrophysics Data System (ADS)
Sadegh, M.; Vrugt, J. A.
2013-12-01
The ever-increasing pace of computational power, along with continued advances in measurement technologies and improvements in process understanding, has stimulated the development of increasingly complex hydrologic models that simulate soil moisture flow, groundwater recharge, surface runoff, root water uptake, and river discharge at increasingly finer spatial and temporal scales. Reconciling these system models with field and remote sensing data is a difficult task, particularly because average measures of model/data similarity inherently lack the power to provide a meaningful comparative evaluation of the consistency in model form and function. The very construction of the likelihood function, as a summary variable of the (usually averaged) properties of the error residuals, dilutes and mixes the available information into an index having little remaining correspondence to specific behaviors of the system (Gupta et al., 2008). The quest for a more powerful method for model evaluation has inspired Vrugt and Sadegh [2013] to introduce "likelihood-free" inference as a vehicle for diagnostic model evaluation. This class of methods is also referred to as Approximate Bayesian Computation (ABC) and relaxes the need for an explicit likelihood function in favor of one or multiple summary statistics rooted in hydrologic theory that together have a much stronger and more compelling diagnostic power than some aggregated measure of the size of the error residuals. Here, we will introduce an efficient ABC sampling method that is orders of magnitude faster in exploring the posterior parameter distribution than commonly used rejection and Population Monte Carlo (PMC) samplers. Our methodology uses Markov Chain Monte Carlo simulation with DREAM, and takes advantage of a simple computational trick to resolve discontinuity problems with the application of set-theoretic summary statistics. We will also demonstrate a set of summary statistics that are rather insensitive to errors in the forcing data. This enhances prospects of detecting model structural deficiencies.
Reliability modelling and analysis of a multi-state element based on a dynamic Bayesian network
NASA Astrophysics Data System (ADS)
Li, Zhiqiang; Xu, Tingxue; Gu, Junyuan; Dong, Qi; Fu, Linyu
2018-04-01
This paper presents a quantitative reliability modelling and analysis method for multi-state elements based on a combination of the Markov process and a dynamic Bayesian network (DBN), taking perfect repair, imperfect repair and condition-based maintenance (CBM) into consideration. The Markov models of elements without repair and under CBM are established, and an absorbing set is introduced to determine the reliability of the repairable element. According to the state-transition relations between the states determined by the Markov process, a DBN model is built. In addition, its parameters for series and parallel systems, namely conditional probability tables, can be calculated by referring to the conditional degradation probabilities. Finally, the power of a control unit in a failure model is used as an example. A dynamic fault tree (DFT) is translated into a Bayesian network model and subsequently extended to a DBN. The results show the state probabilities of an element and of the system without repair, with perfect and imperfect repair, and under CBM; the results with an absorbing set are obtained from differential equations and verified. By forward inference, the reliability of the control unit is determined under different kinds of modes. Finally, weak nodes in the control unit are identified.
Markov Random Fields, Stochastic Quantization and Image Analysis
1990-01-01
Markov random fields based on the lattice Z² have been extensively used in image analysis in a Bayesian framework as a-priori models for the … If the Bayesian approach to image analysis can be given some fundamental justification, then there is a remarkable connection between Probabilistic Image Analysis, Statistical Mechanics and Lattice-based Euclidean Quantum Field Theory.
Handling target obscuration through Markov chain observations
NASA Astrophysics Data System (ADS)
Kouritzin, Michael A.; Wu, Biao
2008-04-01
Target Obscuration, including foliage or building obscuration of ground targets and landscape or horizon obscuration of airborne targets, plagues many real world filtering problems. In particular, ground moving target identification Doppler radar, mounted on a surveillance aircraft or unattended airborne vehicle, is used to detect motion consistent with targets of interest. However, these targets try to obscure themselves (at least partially) by, for example, traveling along the edge of a forest or around buildings. This has the effect of creating random blockages in the Doppler radar image that move dynamically and somewhat randomly through this image. Herein, we address tracking problems with target obscuration by building memory into the observations, eschewing the usual corrupted, distorted partial measurement assumptions of filtering in favor of dynamic Markov chain assumptions. In particular, we assume the observations are a Markov chain whose transition probabilities depend upon the signal. The state of the observation Markov chain attempts to depict the current obscuration and the Markov chain dynamics are used to handle the evolution of the partially obscured radar image. Modifications of the classical filtering equations that allow observation memory (in the form of a Markov chain) are given. We use particle filters to estimate the position of the moving targets. Moreover, positive proof-of-concept simulations are included.
A computer program for uncertainty analysis integrating regression and Bayesian methods
Lu, Dan; Ye, Ming; Hill, Mary C.; Poeter, Eileen P.; Curtis, Gary
2014-01-01
This work develops a new functionality in UCODE_2014 to evaluate Bayesian credible intervals using the Markov Chain Monte Carlo (MCMC) method. The MCMC capability in UCODE_2014 is based on the FORTRAN version of the differential evolution adaptive Metropolis (DREAM) algorithm of Vrugt et al. (2009), which estimates the posterior probability density function of model parameters in high-dimensional and multimodal sampling problems. The UCODE MCMC capability provides eleven prior probability distributions and three ways to initialize the sampling process. It evaluates parametric and predictive uncertainties and it has parallel computing capability based on multiple chains to accelerate the sampling process. This paper tests and demonstrates the MCMC capability using a 10-dimensional multimodal mathematical function, a 100-dimensional Gaussian function, and a groundwater reactive transport model. The use of the MCMC capability is made straightforward and flexible by adopting the JUPITER API protocol. With the new MCMC capability, UCODE_2014 can be used to calculate three types of uncertainty intervals, which all can account for prior information: (1) linear confidence intervals which require linearity and Gaussian error assumptions and typically 10s–100s of highly parallelizable model runs after optimization, (2) nonlinear confidence intervals which require a smooth objective function surface and Gaussian observation error assumptions and typically 100s–1,000s of partially parallelizable model runs after optimization, and (3) MCMC Bayesian credible intervals which require few assumptions and commonly 10,000s–100,000s or more partially parallelizable model runs. Ready access allows users to select methods best suited to their work, and to compare methods in many circumstances.
Metrics for Labeled Markov Systems
NASA Technical Reports Server (NTRS)
Desharnais, Josee; Jagadeesan, Radha; Gupta, Vineet; Panangaden, Prakash
1999-01-01
Partial Labeled Markov Chains are simultaneously generalizations of process algebra and of traditional Markov chains. They provide a foundation for interacting discrete probabilistic systems, the interaction being synchronization on labels as in process algebra. Existing notions of process equivalence are too sensitive to the exact probabilities of various transitions. This paper addresses contextual reasoning principles for reasoning about more robust notions of "approximate" equivalence between concurrent interacting probabilistic systems. The present results are as follows. We develop a family of metrics between partial labeled Markov chains to formalize the notion of distance between processes. We show that processes at distance zero are bisimilar. We describe a decision procedure to compute the distance between two processes. We show that reasoning about approximate equivalence can be done compositionally by showing that process combinators do not increase distance. We introduce an asymptotic metric to capture asymptotic properties of Markov chains, and show that parallel composition does not increase asymptotic distance.
Nonstationary Extreme Value Analysis in a Changing Climate: A Software Package
NASA Astrophysics Data System (ADS)
Cheng, L.; AghaKouchak, A.; Gilleland, E.
2013-12-01
Numerous studies show that climatic extremes increased substantially in the second half of the 20th century. For this reason, analysis of extremes under a nonstationary assumption has received a great deal of attention. This paper presents a software package developed for estimation of return levels, return periods, and risks of climatic extremes in a changing climate. This MATLAB software package offers tools for analysis of climate extremes under both stationary and non-stationary assumptions. The Nonstationary Extreme Value Analysis (hereafter, NEVA) provides an efficient and generalized framework for analyzing extremes using Bayesian inference. NEVA estimates the extreme value parameters using a Differential Evolution Markov Chain (DE-MC), which combines the genetic algorithm Differential Evolution (DE) for global optimization over the real parameter space with the Markov Chain Monte Carlo (MCMC) approach, and has the advantages of simplicity, speed of calculation and convergence over conventional MCMC. NEVA also provides confidence intervals and uncertainty bounds for the estimated return levels based on the sampled parameters. NEVA integrates extreme value design concepts, data analysis tools, optimization and visualization, explicitly designed to facilitate the analysis of extremes in the geosciences. The generalized input and output files of this software package make it attractive for users across different fields. Both stationary and nonstationary components of the package are validated for a number of case studies using empirical return levels. The results show that NEVA reliably describes extremes and their return levels.
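To give a flavour of the return-level calculation NEVA performs for each posterior sample of the GEV parameters, here is a hedged Python sketch (NEVA itself is MATLAB). The posterior samples below are made up, and note that scipy parameterizes the GEV shape as c = -ξ relative to the common convention.

```python
# Return-level band from (made-up) posterior samples of GEV parameters.
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(6)
T = 100                                   # return period in years
# stand-in for DE-MC posterior samples of (location, scale, shape xi)
mu = rng.normal(30.0, 0.5, 1000)
sigma = rng.normal(5.0, 0.2, 1000)
xi = rng.normal(0.1, 0.02, 1000)

# T-year return level = level exceeded with probability 1/T in a year
levels = genextreme.isf(1.0 / T, c=-xi, loc=mu, scale=sigma)
print(f"{T}-year return level: median {np.median(levels):.1f}, "
      f"95% band ({np.quantile(levels, 0.025):.1f}, "
      f"{np.quantile(levels, 0.975):.1f})")
```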
Filtering Using Nonlinear Expectations
2016-04-16
gives a solution to estimating a Markov chain observed in Gaussian noise when the variance of the noise is unknown. This paper is accepted for the IEEE...Optimization, an A* journal. A short third paper discusses how to estimate a change in the transition dynamics of a noisily observed Markov chain...The change point time is hidden in a hidden Markov chain, so a second level of discovery is involved. This paper is accepted for Communications in
Markov chains for testing redundant software
NASA Technical Reports Server (NTRS)
White, Allan L.; Sjogren, Jon A.
1988-01-01
A preliminary design for a validation experiment has been developed that addresses several problems unique to assuring the extremely high quality of multiple-version programs in process-control software. The procedure uses Markov chains to model the error states of the multiple version programs. The programs are observed during simulated process-control testing, and estimates are obtained for the transition probabilities between the states of the Markov chain. The experimental Markov chain model is then expanded into a reliability model that takes into account the inertia of the system being controlled. The reliability of the multiple version software is computed from this reliability model at a given confidence level using confidence intervals obtained for the transition probabilities during the experiment. An example demonstrating the method is provided.
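The estimation step described above can be sketched directly: observe a state sequence from simulated process-control testing, count transitions between error states, and attach confidence intervals to the estimated transition probabilities. The states and data below are illustrative assumptions, and the simple normal-approximation intervals stand in for whatever interval construction the experiment actually uses.

```python
# Estimate Markov-chain transition probabilities, with rough confidence
# intervals, from an observed error-state sequence (illustrative data).
import numpy as np

rng = np.random.default_rng(7)
# 0 = all versions agree, 1 = one version in error, 2 = system-level error
observed = rng.choice(3, size=5000, p=[0.97, 0.02, 0.01])

counts = np.zeros((3, 3))
for a, b in zip(observed[:-1], observed[1:]):
    counts[a, b] += 1

row_totals = counts.sum(axis=1, keepdims=True)
P_hat = counts / row_totals
# 95% normal-approximation half-widths for each estimated probability
half = 1.96 * np.sqrt(P_hat * (1 - P_hat) / row_totals)
print("estimated transition matrix:\n", P_hat.round(4))
print("95% CI half-widths:\n", half.round(4))
```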
Counting of oligomers in sequences generated by Markov chains for DNA motif discovery.
Shan, Gao; Zheng, Wei-Mou
2009-02-01
By means of the technique of the imbedded Markov chain, an efficient algorithm is proposed to exactly calculate the first and second moments of word counts and the probability for a word to occur at least once in random texts generated by a Markov chain. A generating function is introduced directly from the imbedded Markov chain to derive asymptotic approximations for the problem. Two Z-scores, one based on the number of sequences with hits and the other on the total number of word hits in a set of sequences, are examined for discovery of motifs in a set of promoter sequences extracted from the A. thaliana genome. Source code is available at http://www.itp.ac.cn/zheng/oligo.c.
Bayesian Inference and Online Learning in Poisson Neuronal Networks.
Huang, Yanping; Rao, Rajesh P N
2016-08-01
Motivated by the growing evidence for Bayesian computation in the brain, we show how a two-layer recurrent network of Poisson neurons can perform both approximate Bayesian inference and learning for any hidden Markov model. The lower-layer sensory neurons receive noisy measurements of hidden world states. The higher-layer neurons infer a posterior distribution over world states via Bayesian inference from inputs generated by sensory neurons. We demonstrate how such a neuronal network with synaptic plasticity can implement a form of Bayesian inference similar to Monte Carlo methods such as particle filtering. Each spike in a higher-layer neuron represents a sample of a particular hidden world state. The spiking activity across the neural population approximates the posterior distribution over hidden states. In this model, variability in spiking is regarded not as a nuisance but as an integral feature that provides the variability necessary for sampling during inference. We demonstrate how the network can learn the likelihood model, as well as the transition probabilities underlying the dynamics, using a Hebbian learning rule. We present results illustrating the ability of the network to perform inference and learning for arbitrary hidden Markov models.
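The particle-filtering analogy can be made concrete with a standard bootstrap particle filter for a two-state hidden Markov model with Poisson observations; this is an illustration of the style of inference the network is argued to approximate, not the paper's network model, and the rates and transition matrix are made up.

```python
# Bootstrap particle filter for a 2-state HMM with Poisson observations.
import numpy as np

rng = np.random.default_rng(8)
A = np.array([[0.95, 0.05],      # hidden-state transition matrix
              [0.10, 0.90]])
rates = np.array([2.0, 8.0])     # Poisson rate for each hidden state

# Simulate a hidden state path and spike-count observations
T, z, obs = 200, 0, []
for _ in range(T):
    z = rng.choice(2, p=A[z])
    obs.append(rng.poisson(rates[z]))

# Each particle is a sampled hidden state, like a spike "sampling" a state
N = 500
particles = rng.choice(2, size=N)
for y in obs:
    # propagate each particle through the transition dynamics
    particles = np.array([rng.choice(2, p=A[p]) for p in particles])
    # weight by the Poisson likelihood of the current observation
    lam = rates[particles]
    w = np.exp(-lam) * lam**y
    w /= w.sum()
    # resample: the particle population approximates the filtering posterior
    particles = rng.choice(particles, size=N, p=w)

print("P(state = 1 | observations) at the final step:", particles.mean())
```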
Learning Instance-Specific Predictive Models
Visweswaran, Shyam; Cooper, Gregory F.
2013-01-01
This paper introduces a Bayesian algorithm for constructing predictive models from data that are optimized to predict a target variable well for a particular instance. This algorithm learns Markov blanket models, carries out Bayesian model averaging over a set of models to predict a target variable of the instance at hand, and employs an instance-specific heuristic to locate a set of suitable models to average over. We call this method the instance-specific Markov blanket (ISMB) algorithm. The ISMB algorithm was evaluated on 21 UCI data sets using five different performance measures and its performance was compared to that of several commonly used predictive algorithms, including naïve Bayes, C4.5 decision tree, logistic regression, neural networks, k-Nearest Neighbor, Lazy Bayesian Rules, and AdaBoost. Over all the data sets, the ISMB algorithm performed better on average on all performance measures against all the comparison algorithms. PMID:25045325
Optimal choice of word length when comparing two Markov sequences using a χ2-statistic.
Bai, Xin; Tang, Kujin; Ren, Jie; Waterman, Michael; Sun, Fengzhu
2017-10-03
Alignment-free sequence comparison using counts of word patterns (grams, k-tuples) has become an active research topic due to the large amount of sequence data from the new sequencing technologies. Genome sequences are frequently modelled by Markov chains and the likelihood ratio test or the corresponding approximate χ2-statistic has been suggested to compare two sequences. However, it is not known how to best choose the word length k in such studies. We develop an optimal strategy to choose k by maximizing the statistical power of detecting differences between two sequences. Let the orders of the Markov chains for the two sequences be r1 and r2, respectively. We show through both simulations and theoretical studies that the optimal k = max(r1, r2) + 1 for both long sequences and next generation sequencing (NGS) read data. The orders of the Markov chains may be unknown and several methods have been developed to estimate the orders of Markov chains based on both long sequences and NGS reads. We study the power loss of the statistics when the estimated orders are used. It is shown that the power loss is minimal for some of the estimators of the orders of Markov chains. Our studies provide guidelines on choosing the optimal word length for the comparison of Markov sequences.
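To illustrate the recommendation k = max(r1, r2) + 1 concretely, the sketch below counts k-mers in two sequences and forms a plain two-sample Pearson χ2 statistic against pooled expected frequencies. This is a simplified stand-in for the likelihood-ratio-based statistic analyzed in the paper, and the sequences and assumed Markov orders are hypothetical.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Counts of all overlapping length-k words in a string."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def chi2_word_stat(seq1, seq2, k):
    """Pearson chi-square statistic comparing length-k word frequencies
    of two sequences against pooled expected counts."""
    c1, c2 = kmer_counts(seq1, k), kmer_counts(seq2, k)
    n1, n2 = sum(c1.values()), sum(c2.values())
    stat = 0.0
    for w in set(c1) | set(c2):
        pooled = (c1[w] + c2[w]) / (n1 + n2)
        for count, n in ((c1[w], n1), (c2[w], n2)):
            expected = n * pooled
            stat += (count - expected) ** 2 / expected
    return stat

s1 = "ACGTACGTTGCAACGT" * 20
s2 = "ACGGACGTTGCAACGG" * 20
r1, r2 = 1, 1            # assumed (or separately estimated) Markov orders
k = max(r1, r2) + 1      # the paper's recommended word length
print(chi2_word_stat(s1, s2, k))
```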
ERIC Educational Resources Information Center
Wang, Shiyu; Yang, Yan; Culpepper, Steven Andrew; Douglas, Jeffrey A.
2018-01-01
A family of learning models that integrates a cognitive diagnostic model and a higher-order, hidden Markov model in one framework is proposed. This new framework includes covariates to model skill transition in the learning environment. A Bayesian formulation is adopted to estimate parameters from a learning model. The developed methods are…
What are hierarchical models and how do we analyze them?
Royle, Andy
2016-01-01
In this chapter we provide a basic definition of hierarchical models and introduce the two canonical hierarchical models in this book: site occupancy and N-mixture models. The former is a hierarchical extension of logistic regression and the latter is a hierarchical extension of Poisson regression. We introduce basic concepts of probability modeling and statistical inference including likelihood and Bayesian perspectives. We go through the mechanics of maximizing the likelihood and characterizing the posterior distribution by Markov chain Monte Carlo (MCMC) methods. We give a general perspective on topics such as model selection and assessment of model fit, although we demonstrate these topics in practice in later chapters (especially Chapters 5, 6, 7, and 10).
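To make the site-occupancy model concrete: each site is occupied with probability psi, and an occupied site yields a detection on each of J independent visits with probability p, so sites with no detections may be either empty or occupied-but-missed. A minimal likelihood sketch with simulated data and a grid-search MLE standing in for the chapter's optimization and MCMC machinery (all parameter values hypothetical):

```python
import numpy as np

def occupancy_loglik(psi, p, y, J):
    """Log-likelihood of a basic site-occupancy model (binomial constants
    dropped): each site is occupied with probability psi, and an occupied
    site is detected on each of J independent visits with probability p."""
    occ = psi * p**y * (1 - p)**(J - y)            # occupied-site term
    term = np.where(y == 0, occ + (1 - psi), occ)  # all-zero sites may be empty
    return np.log(term).sum()

rng = np.random.default_rng(2)
S, J, psi_true, p_true = 200, 4, 0.6, 0.3
z = rng.random(S) < psi_true                       # latent occupancy states
y = np.where(z, rng.binomial(J, p_true, size=S), 0)

grid = np.linspace(0.01, 0.99, 99)
ll = np.array([[occupancy_loglik(a, b, y, J) for b in grid] for a in grid])
i, j = np.unravel_index(ll.argmax(), ll.shape)
print("psi_hat:", grid[i].round(2), "p_hat:", grid[j].round(2))
```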
INFERENCE FOR INDIVIDUAL-LEVEL MODELS OF INFECTIOUS DISEASES IN LARGE POPULATIONS.
Deardon, Rob; Brooks, Stephen P; Grenfell, Bryan T; Keeling, Matthew J; Tildesley, Michael J; Savill, Nicholas J; Shaw, Darren J; Woolhouse, Mark E J
2010-01-01
Individual Level Models (ILMs), a new class of models, are being applied to infectious epidemic data to aid in the understanding of the spatio-temporal dynamics of infectious diseases. These models are highly flexible and intuitive, and can be parameterised under a Bayesian framework via Markov chain Monte Carlo (MCMC) methods. Unfortunately, this parameterisation can be difficult to implement due to intense computational requirements when calculating the full posterior for large, or even moderately large, susceptible populations, or when missing data are present. Here we detail a methodology that can be used to estimate parameters for such large, and/or incomplete, data sets. This is done in the context of a study of the UK 2001 foot-and-mouth disease (FMD) epidemic.
Assessment of parametric uncertainty for groundwater reactive transport modeling.
Shi, Xiaoqing; Ye, Ming; Curtis, Gary P.; Miller, Geoffery L.; Meyer, Philip D.; Kohler, Matthias; Yabusaki, Steve; Wu, Jichun
2014-01-01
The validity of using Gaussian assumptions for model residuals in uncertainty quantification of a groundwater reactive transport model was evaluated in this study. Least squares regression methods explicitly assume Gaussian residuals, and the assumption leads to Gaussian likelihood functions, model parameters, and model predictions. While the Bayesian methods do not explicitly require the Gaussian assumption, Gaussian residuals are widely used. This paper shows that the residuals of the reactive transport model are non-Gaussian, heteroscedastic, and correlated in time; characterizing them requires using a generalized likelihood function such as the formal generalized likelihood function developed by Schoups and Vrugt (2010). For the surface complexation model considered in this study for simulating uranium reactive transport in groundwater, parametric uncertainty is quantified using the least squares regression methods and Bayesian methods with both Gaussian and formal generalized likelihood functions. While the least squares methods and Bayesian methods with Gaussian likelihood function produce similar Gaussian parameter distributions, the parameter distributions of Bayesian uncertainty quantification using the formal generalized likelihood function are non-Gaussian. In addition, the predictive performance of the formal generalized likelihood function is superior to that of least squares regression and Bayesian methods with Gaussian likelihood function. The Bayesian uncertainty quantification is conducted using the differential evolution adaptive Metropolis (DREAM(ZS)) algorithm; as a Markov chain Monte Carlo (MCMC) method, it is a robust tool for quantifying uncertainty in groundwater reactive transport models. For the surface complexation model, the regression-based local sensitivity analysis and Morris- and DREAM(ZS)-based global sensitivity analysis yield almost identical ranking of parameter importance. The uncertainty analysis may help select appropriate likelihood functions, improve model calibration, and reduce predictive uncertainty in other groundwater reactive transport and environmental modeling.
Rhodes, Kirsty M; Turner, Rebecca M; White, Ian R; Jackson, Dan; Spiegelhalter, David J; Higgins, Julian P T
2016-12-20
Many meta-analyses combine results from only a small number of studies, a situation in which the between-study variance is imprecisely estimated when standard methods are applied. Bayesian meta-analysis allows incorporation of external evidence on heterogeneity, providing the potential for more robust inference on the effect size of interest. We present a method for performing Bayesian meta-analysis using data augmentation, in which we represent an informative conjugate prior for between-study variance by pseudo data and use meta-regression for estimation. To assist in this, we derive predictive inverse-gamma distributions for the between-study variance expected in future meta-analyses. These may serve as priors for heterogeneity in new meta-analyses. In a simulation study, we compare approximate Bayesian methods using meta-regression and pseudo data against fully Bayesian approaches based on importance sampling techniques and Markov chain Monte Carlo (MCMC). We compare the frequentist properties of these Bayesian methods with those of the commonly used frequentist DerSimonian and Laird procedure. The method is implemented in standard statistical software and provides a less complex alternative to standard MCMC approaches. An importance sampling approach produces almost identical results to standard MCMC approaches, and results obtained through meta-regression and pseudo data are very similar. On average, data augmentation provides closer results to MCMC, if implemented using restricted maximum likelihood estimation rather than DerSimonian and Laird or maximum likelihood estimation. The methods are applied to real datasets, and an extension to network meta-analysis is described. The proposed method facilitates Bayesian meta-analysis in a way that is accessible to applied researchers. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
Bayesian state space models for dynamic genetic network construction across multiple tissues.
Liang, Yulan; Kelemen, Arpad
2016-08-01
Construction of gene-gene interaction networks and potential pathways is a challenging and important problem in genomic research for complex diseases, while estimating the dynamic changes of the temporal correlations and non-stationarity are the keys in this process. In this paper, we develop dynamic state space models with hierarchical Bayesian settings to tackle this challenge for inferring the dynamic profiles and genetic networks associated with disease treatments. We treat both the stochastic transition matrix and the observation matrix as time-variant and include temporal correlation structures in the covariance matrix estimations in the multivariate Bayesian state space models. The unevenly spaced short time courses with unseen time points are treated as hidden state variables. Hierarchical Bayesian approaches with various prior and hyper-prior models with Monte Carlo Markov Chain and Gibbs sampling algorithms are used to estimate the model parameters and the hidden state variables. We apply the proposed hierarchical Bayesian state space models to multiple tissue (liver, skeletal muscle, and kidney) Affymetrix time course data sets following corticosteroid (CS) drug administration. Both simulation and real data analysis results show that the genomic changes over time and the gene-gene interactions in response to CS treatment can be well captured by the proposed models. The proposed dynamic hierarchical Bayesian state space modeling approach could be expanded and applied to other large-scale genomic data, such as next-generation sequencing (NGS) data combined with real-time and time-varying electronic health record (EHR) data, for more comprehensive and robust systematic and network-based analyses, in order to transform big biomedical data into predictions and diagnostics for precision medicine and personalized healthcare with better decision making and patient outcomes.
Markov chains and semi-Markov models in time-to-event analysis.
Abner, Erin L; Charnigo, Richard J; Kryscio, Richard J
2013-10-25
A variety of statistical methods are available to investigators for analysis of time-to-event data, often referred to as survival analysis. Kaplan-Meier estimation and Cox proportional hazards regression are commonly employed tools but are not appropriate for all studies, particularly in the presence of competing risks and when multiple or recurrent outcomes are of interest. Markov chain models can accommodate censored data, competing risks (informative censoring), multiple outcomes, recurrent outcomes, frailty, and non-constant survival probabilities. Markov chain models, though often overlooked by investigators in time-to-event analysis, have long been used in clinical studies and have widespread application in other fields. PMID:24818062
NASA Astrophysics Data System (ADS)
Rajaona, Harizo; Septier, François; Armand, Patrick; Delignon, Yves; Olry, Christophe; Albergel, Armand; Moussafir, Jacques
2015-12-01
In the event of an accidental or intentional atmospheric release, the reconstruction of the source term using measurements from a set of sensors is an important and challenging inverse problem. A rapid and accurate estimation of the source allows faster and more efficient action for first-response teams, in addition to providing better damage assessment. This paper presents a Bayesian probabilistic approach to estimate the location and the temporal emission profile of a pointwise source. The release rate is evaluated analytically by using a Gaussian assumption on its prior distribution, and is enhanced with a positivity constraint to improve the estimation. The source location is obtained by means of an advanced iterative Monte-Carlo technique called Adaptive Multiple Importance Sampling (AMIS), which uses a recycling process at each iteration to accelerate its convergence. The proposed methodology is tested using synthetic and real concentration data in the framework of the Fusion Field Trials 2007 (FFT-07) experiment. The quality of the obtained results is comparable to that of results from the Markov Chain Monte Carlo (MCMC) algorithm, a popular Bayesian method used for source estimation. Moreover, the adaptive processing of AMIS provides better sampling efficiency by reusing all the generated samples.
Conditional adaptive Bayesian spectral analysis of nonstationary biomedical time series.
Bruce, Scott A; Hall, Martica H; Buysse, Daniel J; Krafty, Robert T
2018-03-01
Many studies of biomedical time series signals aim to measure the association between frequency-domain properties of time series and clinical and behavioral covariates. However, the time-varying dynamics of these associations are largely ignored due to a lack of methods that can assess the changing nature of the relationship through time. This article introduces a method for the simultaneous and automatic analysis of the association between the time-varying power spectrum and covariates, which we refer to as conditional adaptive Bayesian spectrum analysis (CABS). The procedure adaptively partitions the grid of time and covariate values into an unknown number of approximately stationary blocks and nonparametrically estimates local spectra within blocks through penalized splines. CABS is formulated in a fully Bayesian framework, in which the number and locations of partition points are random, and fit using reversible jump Markov chain Monte Carlo techniques. Estimation and inference averaged over the distribution of partitions allows for the accurate analysis of spectra with both smooth and abrupt changes. The proposed methodology is used to analyze the association between the time-varying spectrum of heart rate variability and self-reported sleep quality in a study of older adults serving as the primary caregiver for their ill spouse. © 2017, The International Biometric Society.
Comparing hierarchical models via the marginalized deviance information criterion.
Quintero, Adrian; Lesaffre, Emmanuel
2018-07-20
Hierarchical models are extensively used in pharmacokinetics and longitudinal studies. When the estimation is performed from a Bayesian approach, model comparison is often based on the deviance information criterion (DIC). In hierarchical models with latent variables, there are several versions of this statistic: the conditional DIC (cDIC) that incorporates the latent variables in the focus of the analysis and the marginalized DIC (mDIC) that integrates them out. Despite the asymptotic and coherency difficulties of cDIC, this alternative is usually used in Markov chain Monte Carlo (MCMC) methods for hierarchical models because of practical convenience. The mDIC criterion is more appropriate in most cases but requires integration of the likelihood, which is computationally demanding and not implemented in Bayesian software. Therefore, we consider a method to compute mDIC by generating replicate samples of the latent variables that need to be integrated out. This alternative can be easily conducted from the MCMC output of Bayesian packages and is widely applicable to hierarchical models in general. Additionally, we propose some approximations in order to reduce the computational complexity for large-sample situations. The method is illustrated with simulated data sets and 2 medical studies, evidencing that cDIC may be misleading whilst mDIC appears pertinent. Copyright © 2018 John Wiley & Sons, Ltd.
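Given appropriate posterior samples, either DIC variant reduces to the familiar recipe DIC = Dbar + pD with pD = Dbar - D(theta_bar), where D = -2 log L; the variants differ in whether the conditional or the marginalized likelihood is plugged in. A minimal sketch for a normal-mean model, with exact conjugate posterior draws standing in for MCMC output (model and data hypothetical):

```python
import numpy as np

def dic(log_lik_fn, posterior_samples, data):
    """DIC = Dbar + pD from posterior samples, with pD = Dbar - D(theta_bar)
    and deviance D = -2 log L."""
    devs = np.array([-2 * log_lik_fn(th, data) for th in posterior_samples])
    d_bar = devs.mean()
    d_at_mean = -2 * log_lik_fn(posterior_samples.mean(axis=0), data)
    return d_bar + (d_bar - d_at_mean)

def normal_loglik(theta, data):
    """Log-likelihood of N(mu, 1) data; theta = [mu]."""
    return (-0.5 * np.sum((data - theta[0]) ** 2)
            - 0.5 * len(data) * np.log(2 * np.pi))

rng = np.random.default_rng(3)
data = rng.normal(1.0, 1.0, size=50)
# Stand-in for MCMC output: exact conjugate posterior draws of mu (flat prior)
post = rng.normal(data.mean(), 1 / np.sqrt(len(data)), size=(4000, 1))
print("DIC:", dic(normal_loglik, post, data))
```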
Bayesian inference of nonlinear unsteady aerodynamics from aeroelastic limit cycle oscillations
NASA Astrophysics Data System (ADS)
Sandhu, Rimple; Poirel, Dominique; Pettit, Chris; Khalil, Mohammad; Sarkar, Abhijit
2016-07-01
A Bayesian model selection and parameter estimation algorithm is applied to investigate the influence of nonlinear and unsteady aerodynamic loads on the limit cycle oscillation (LCO) of a pitching airfoil in the transitional Reynolds number regime. At small angles of attack, laminar boundary layer trailing edge separation causes negative aerodynamic damping leading to the LCO. The fluid-structure interaction of the rigid, but elastically mounted, airfoil and nonlinear unsteady aerodynamics is represented by two coupled nonlinear stochastic ordinary differential equations containing uncertain parameters and model approximation errors. Several plausible aerodynamic models with increasing complexity are proposed to describe the aeroelastic system leading to LCO. The likelihood in the posterior parameter probability density function (pdf) is available semi-analytically using the extended Kalman filter for the state estimation of the coupled nonlinear structural and unsteady aerodynamic model. The posterior parameter pdf is sampled using a parallel and adaptive Markov Chain Monte Carlo (MCMC) algorithm. The posterior probability of each model is estimated using the Chib-Jeliazkov method that directly uses the posterior MCMC samples for evidence (marginal likelihood) computation. The Bayesian algorithm is validated through a numerical study and then applied to model the nonlinear unsteady aerodynamic loads using wind-tunnel test data at various Reynolds numbers.
Bayesian Monte Carlo and Maximum Likelihood Approach for ...
Model uncertainty estimation and risk assessment is essential to environmental management and informed decision making on pollution mitigation strategies. In this study, we apply a probabilistic methodology, which combines Bayesian Monte Carlo simulation and Maximum Likelihood estimation (BMCML), to calibrate a lake oxygen recovery model. We first derive an analytical solution of the differential equation governing lake-averaged oxygen dynamics as a function of time-variable wind speed. Statistical inferences on model parameters and predictive uncertainty are then drawn by Bayesian conditioning of the analytical solution on observed daily wind speed and oxygen concentration data obtained from an earlier study during two recovery periods on a eutrophic lake in upstate New York. The model is calibrated using oxygen recovery data for one year and statistical inferences were validated using recovery data for another year. Compared with an essentially two-step regression and optimization approach, the BMCML results are more comprehensive and performed relatively better in predicting the observed temporal dissolved oxygen levels (DO) in the lake. BMCML also produced comparable calibration and validation results with those obtained using the popular Markov Chain Monte Carlo technique (MCMC) and is computationally simpler and easier to implement than the MCMC. Next, using the calibrated model, we derive an optimal relationship between liquid film-transfer coefficien
Profile-Based LC-MS Data Alignment—A Bayesian Approach
Tsai, Tsung-Heng; Tadesse, Mahlet G.; Wang, Yue; Ressom, Habtom W.
2014-01-01
A Bayesian alignment model (BAM) is proposed for alignment of liquid chromatography-mass spectrometry (LC-MS) data. BAM belongs to the category of profile-based approaches, which are composed of two major components: a prototype function and a set of mapping functions. Appropriate estimation of these functions is crucial for good alignment results. BAM uses Markov chain Monte Carlo (MCMC) methods to draw inference on the model parameters and improves on existing MCMC-based alignment methods through 1) the implementation of an efficient MCMC sampler and 2) an adaptive selection of knots. A block Metropolis-Hastings algorithm that mitigates the problem of the MCMC sampler getting stuck at local modes of the posterior distribution is used for the update of the mapping function coefficients. In addition, a stochastic search variable selection (SSVS) methodology is used to determine the number and positions of knots. We applied BAM to a simulated data set, an LC-MS proteomic data set, and two LC-MS metabolomic data sets, and compared its performance with the Bayesian hierarchical curve registration (BHCR) model, the dynamic time-warping (DTW) model, and the continuous profile model (CPM). The advantage of applying appropriate profile-based retention time correction prior to performing a feature-based approach is also demonstrated through the metabolomic data sets. PMID:23929872
A Bayesian Framework for Reliability Analysis of Spacecraft Deployments
NASA Technical Reports Server (NTRS)
Evans, John W.; Gallo, Luis; Kaminsky, Mark
2012-01-01
Deployable subsystems are essential to mission success of most spacecraft. These subsystems enable critical functions including power, communications and thermal control. The loss of any of these functions will generally result in loss of the mission. These subsystems and their components often consist of unique designs and applications for which various standardized data sources are not applicable for estimating reliability and for assessing risks. In this study, a two stage sequential Bayesian framework for reliability estimation of spacecraft deployment was developed for this purpose. This process was then applied to the James Webb Space Telescope (JWST) Sunshield subsystem, a unique design intended for thermal control of the Optical Telescope Element. Initially, detailed studies of NASA deployment history, "heritage information", were conducted, extending over 45 years of spacecraft launches. This information was then coupled to a non-informative prior and a binomial likelihood function to create a posterior distribution for deployments of various subsystems using Monte Carlo Markov Chain sampling. Select distributions were then coupled to a subsequent analysis, using test data and anomaly occurrences on successive ground test deployments of scale model test articles of JWST hardware, to update the NASA heritage data. This allowed for a realistic prediction for the reliability of the complex Sunshield deployment, with credibility limits, within this two stage Bayesian framework.
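Because the likelihood here is binomial, the two-stage sequential update can be sketched with a conjugate Beta family, which mirrors what the MCMC sampling produces; all counts below are hypothetical stand-ins for the heritage and ground-test data, and a Jeffreys Beta(0.5, 0.5) plays the role of the non-informative prior.

```python
import numpy as np

# Stage 1: heritage deployment record -> posterior for success probability.
# Hypothetical counts standing in for the 45-year heritage data.
heritage_successes, heritage_failures = 180, 4
a1, b1 = 0.5 + heritage_successes, 0.5 + heritage_failures

# Stage 2: ground-test deployments of scale-model hardware update stage 1,
# with the stage-1 posterior acting as the new prior.
test_successes, test_failures = 23, 1
a2, b2 = a1 + test_successes, b1 + test_failures

rng = np.random.default_rng(4)
draws = rng.beta(a2, b2, size=100_000)
print("posterior mean reliability:", round(float(draws.mean()), 4))
print("90% credible interval:", np.quantile(draws, [0.05, 0.95]).round(4))
```

The stage-1 posterior simply becomes the stage-2 prior, which is the essence of the sequential framework.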
NASA Technical Reports Server (NTRS)
Warner, James E.; Zubair, Mohammad; Ranjan, Desh
2017-01-01
This work investigates novel approaches to probabilistic damage diagnosis that utilize surrogate modeling and high performance computing (HPC) to achieve substantial computational speedup. Motivated by Digital Twin, a structural health management (SHM) paradigm that integrates vehicle-specific characteristics with continual in-situ damage diagnosis and prognosis, the methods studied herein yield near real-time damage assessments that could enable monitoring of a vehicle's health while it is operating (i.e. online SHM). High-fidelity modeling and uncertainty quantification (UQ), both critical to Digital Twin, are incorporated using finite element method simulations and Bayesian inference, respectively. The crux of the proposed Bayesian diagnosis methods, however, is the reformulation of the numerical sampling algorithms (e.g. Markov chain Monte Carlo) used to generate the resulting probabilistic damage estimates. To this end, three distinct methods are demonstrated for rapid sampling that utilize surrogate modeling and exploit various degrees of parallelism for leveraging HPC. The accuracy and computational efficiency of the methods are compared on the problem of strain-based crack identification in thin plates. While each approach has inherent problem-specific strengths and weaknesses, all approaches are shown to provide accurate probabilistic damage diagnoses and several orders of magnitude computational speedup relative to a baseline Bayesian diagnosis implementation.
Decomposition of conditional probability for high-order symbolic Markov chains.
Melnik, S S; Usatenko, O V
2017-07-01
The main goal of this paper is to develop an estimate for the conditional probability function of random stationary ergodic symbolic sequences with elements belonging to a finite alphabet. We elaborate on a decomposition procedure for the conditional probability function of sequences considered to be high-order Markov chains. We represent the conditional probability function as the sum of multilinear memory function monomials of different orders (from zero up to the chain order). This allows us to introduce a family of Markov chain models and to construct artificial sequences via a method of successive iterations, taking into account at each step increasingly high correlations among random elements. At weak correlations, the memory functions are uniquely expressed in terms of the high-order symbolic correlation functions. The proposed method fills the gap between two approaches, namely the likelihood estimation and the additive Markov chains. The obtained results may have applications for sequential approximation of artificial neural network training.
Herbei, Radu; Kubatko, Laura
2013-03-26
Markov chains are widely used for modeling in many areas of molecular biology and genetics. As the complexity of such models advances, it becomes increasingly important to assess the rate at which a Markov chain converges to its stationary distribution in order to carry out accurate inference. A common measure of convergence to the stationary distribution is the total variation distance, but this measure can be difficult to compute when the state space of the chain is large. We propose a Monte Carlo method to estimate the total variation distance that can be applied in this situation, and we demonstrate how the method can be efficiently implemented by taking advantage of GPU computing techniques. We apply the method to two Markov chains on the space of phylogenetic trees, and discuss the implications of our findings for the development of algorithms for phylogenetic inference.
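For a chain whose state space is small enough to enumerate, the quantity estimated by the authors' Monte Carlo method can be computed exactly, which is useful both for intuition and for validating an estimator. A minimal sketch with a hypothetical three-state chain:

```python
import numpy as np

def stationary_distribution(P):
    """Left eigenvector of P for eigenvalue 1, normalized to sum to 1."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    return v / v.sum()

def tv_to_stationarity(P, start, t):
    """Total variation distance between the t-step distribution from
    `start` and the stationary distribution (exact; feasible only for
    small state spaces, which is what motivates the Monte Carlo method)."""
    dist = np.linalg.matrix_power(P, t)[start]
    pi = stationary_distribution(P)
    return 0.5 * np.abs(dist - pi).sum()

P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
for t in (1, 5, 20):
    print(t, tv_to_stationarity(P, start=0, t=t))
```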
Turner, Rebecca M; Jackson, Dan; Wei, Yinghui; Thompson, Simon G; Higgins, Julian P T
2015-01-01
Numerous meta-analyses in healthcare research combine results from only a small number of studies, for which the variance representing between-study heterogeneity is estimated imprecisely. A Bayesian approach to estimation allows external evidence on the expected magnitude of heterogeneity to be incorporated. The aim of this paper is to provide tools that improve the accessibility of Bayesian meta-analysis. We present two methods for implementing Bayesian meta-analysis, using numerical integration and importance sampling techniques. Based on 14 886 binary outcome meta-analyses in the Cochrane Database of Systematic Reviews, we derive a novel set of predictive distributions for the degree of heterogeneity expected in 80 settings depending on the outcomes assessed and comparisons made. These can be used as prior distributions for heterogeneity in future meta-analyses. The two methods are implemented in R, for which code is provided. Both methods produce equivalent results to standard but more complex Markov chain Monte Carlo approaches. The priors are derived as log-normal distributions for the between-study variance, applicable to meta-analyses of binary outcomes on the log odds-ratio scale. The methods are applied to two example meta-analyses, incorporating the relevant predictive distributions as prior distributions for between-study heterogeneity. We have provided resources to facilitate Bayesian meta-analysis, in a form accessible to applied researchers, which allow relevant prior information on the degree of heterogeneity to be incorporated. © 2014 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. PMID:25475839
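The importance sampling route can be sketched compactly: draw the between-study variance tau2 from its prior, weight each draw by the marginal likelihood of the study estimates given tau2 (with the overall effect mu integrated out analytically under a flat prior), and form weighted posterior summaries. The data, within-study variances, and log-normal hyper-parameters below are hypothetical, not the paper's derived predictive distributions.

```python
import numpy as np

def log_marglik(tau2, y, s2):
    """log p(y | tau2) for the random-effects model y_i ~ N(mu, s_i^2 + tau2),
    with a flat prior on mu integrated out analytically."""
    v = s2 + tau2
    w = 1.0 / v
    mu_hat = np.sum(w * y) / np.sum(w)
    return (-0.5 * np.sum(np.log(2 * np.pi * v))
            - 0.5 * np.sum(w * (y - mu_hat) ** 2)
            + 0.5 * np.log(2 * np.pi / np.sum(w)))

# Hypothetical log odds-ratio estimates and within-study variances
y = np.array([-0.35, -0.10, -0.52, 0.08, -0.30])
s2 = np.array([0.04, 0.09, 0.12, 0.07, 0.05])

rng = np.random.default_rng(5)
# Prior draws for the between-study variance; the log-normal hyper-parameters
# are hypothetical stand-ins for an evidence-based predictive prior
tau2 = rng.lognormal(mean=-2.0, sigma=1.5, size=20_000)
logw = np.array([log_marglik(t, y, s2) for t in tau2])
w = np.exp(logw - logw.max())
w /= w.sum()

mu_hats = np.array([np.sum(y / (s2 + t)) / np.sum(1.0 / (s2 + t)) for t in tau2])
print("posterior mean of mu:  ", round(float(np.sum(w * mu_hats)), 3))
print("posterior mean of tau2:", round(float(np.sum(w * tau2)), 3))
```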
Quantifying Uncertainty in Near Surface Electromagnetic Imaging Using Bayesian Methods
NASA Astrophysics Data System (ADS)
Blatter, D. B.; Ray, A.; Key, K.
2017-12-01
Geoscientists commonly use electromagnetic methods to image the Earth's near surface. Field measurements of EM fields are made (often with the aid of an artificial EM source) and then used to infer near surface electrical conductivity via a process known as inversion. In geophysics, the standard inversion tool kit is robust and can provide an estimate of the Earth's near surface conductivity that is both geologically reasonable and compatible with the measured field data. However, standard inverse methods struggle to provide a sense of the uncertainty in the estimate they provide. This is because the task of finding an Earth model that explains the data to within measurement error is non-unique - that is, there are many, many such models; but the standard methods provide only one "answer." An alternative method, known as Bayesian inversion, seeks to explore the full range of Earth model parameters that can adequately explain the measured data, rather than attempting to find a single, "ideal" model. Bayesian inverse methods can therefore provide a quantitative assessment of the uncertainty inherent in trying to infer near surface conductivity from noisy, measured field data. This study applies a Bayesian inverse method (called trans-dimensional Markov chain Monte Carlo) to transient airborne EM data previously collected over Taylor Valley - one of the McMurdo Dry Valleys in Antarctica. Our results confirm the reasonableness of previous estimates (made using standard methods) of near surface conductivity beneath Taylor Valley. In addition, we demonstrate quantitatively the uncertainty associated with those estimates. We demonstrate that Bayesian inverse methods can provide quantitative uncertainty to estimates of near surface conductivity.
Evaluating impacts using a BACI design, ratios, and a Bayesian approach with a focus on restoration.
Conner, Mary M; Saunders, W Carl; Bouwes, Nicolaas; Jordan, Chris
2015-10-01
Before-after-control-impact (BACI) designs are an effective method to evaluate natural and human-induced perturbations on ecological variables when treatment sites cannot be randomly chosen. While effect sizes of interest can be tested with frequentist methods, using Bayesian Markov chain Monte Carlo (MCMC) sampling methods, probabilities of effect sizes, such as a ≥20 % increase in density after restoration, can be directly estimated. Although BACI and Bayesian methods are used widely for assessing natural and human-induced impacts for field experiments, the application of hierarchical Bayesian modeling with MCMC sampling to BACI designs is less common. Here, we combine these approaches and extend the typical presentation of results with an easy-to-interpret ratio, which provides an answer to the main study question: "How much impact did a management action or natural perturbation have?" As an example of this approach, we evaluate the impact of a restoration project, which implemented beaver dam analogs, on survival and density of juvenile steelhead. Results indicated the probabilities of a ≥30 % increase were high for survival and density after the dams were installed, 0.88 and 0.99, respectively, while probabilities for a higher increase of ≥50 % were variable, 0.17 and 0.82, respectively. This approach demonstrates a useful extension of Bayesian methods that can easily be generalized to other study designs from simple (e.g., single factor ANOVA, paired t test) to more complicated block designs (e.g., crossover, split-plot). This approach is valuable for estimating the probabilities of restoration impacts or other management actions.
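Given posterior draws from the MCMC fit, the ratio and its exceedance probabilities are one-liners. In the sketch below, Gaussian draws stand in for the real posterior samples of mean density before and after treatment; all numbers are hypothetical.

```python
import numpy as np

# Gaussian draws stand in for the MCMC posterior samples of mean density
# before and after restoration; all numbers are hypothetical.
rng = np.random.default_rng(6)
before = rng.normal(10.0, 1.2, size=20_000)
after = rng.normal(13.5, 1.5, size=20_000)

ratio = after / before   # the easy-to-interpret impact ratio
for thresh in (1.2, 1.3, 1.5):
    pct = round((thresh - 1) * 100)
    print(f"P(>= {pct}% increase) = {np.mean(ratio >= thresh):.2f}")
```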
Bayesian Estimation of Small Effects in Exercise and Sports Science.
Mengersen, Kerrie L; Drovandi, Christopher C; Robert, Christian P; Pyne, David B; Gore, Christopher J
2016-01-01
The aim of this paper is to provide a Bayesian formulation of the so-called magnitude-based inference approach to quantifying and interpreting effects, and in a case study example provide accurate probabilistic statements that correspond to the intended magnitude-based inferences. The model is described in the context of a published small-scale athlete study which employed a magnitude-based inference approach to compare the effect of two altitude training regimens (live high-train low (LHTL), and intermittent hypoxic exposure (IHE)) on running performance and blood measurements of elite triathletes. The posterior distributions, and corresponding point and interval estimates, for the parameters and associated effects and comparisons of interest, were estimated using Markov chain Monte Carlo simulations. The Bayesian analysis was shown to provide more direct probabilistic comparisons of treatments and was able to identify small effects of interest. The approach avoided asymptotic assumptions and overcame issues such as multiple testing. Bayesian analysis of unscaled effects showed a probability of 0.96 that LHTL yields a substantially greater increase in hemoglobin mass than IHE, a 0.93 probability of a substantially greater improvement in running economy and a greater than 0.96 probability that both IHE and LHTL yield a substantially greater improvement in maximum blood lactate concentration compared to a Placebo. The conclusions are consistent with those obtained using a 'magnitude-based inference' approach that has been promoted in the field. The paper demonstrates that a fully Bayesian analysis is a simple and effective way of analysing small effects, providing a rich set of results that are straightforward to interpret in terms of probabilistic statements.
NASA Astrophysics Data System (ADS)
Jing, R.; Lin, N.; Emanuel, K.; Vecchi, G. A.; Knutson, T. R.
2017-12-01
A Markov environment-dependent hurricane intensity model (MeHiM) is developed to simulate the climatology of hurricane intensity given the surrounding large-scale environment. The model considers three unobserved discrete states representing a storm's slow, moderate, and rapid intensification (and deintensification), respectively. Each state is associated with a probability distribution of intensity change. The storm's movement from one state to another, regarded as a Markov chain, is described by a transition probability matrix. The initial state is estimated with a Bayesian approach. All three model components (initial intensity, state transition, and intensity change) are dependent on environmental variables including potential intensity, vertical wind shear, midlevel relative humidity, and ocean mixing characteristics. This dependent Markov model of hurricane intensity shows a significant improvement over previous statistical models (e.g., linear, nonlinear, and finite mixture models) in estimating the distributions of 6-h and 24-h intensity change, lifetime maximum intensity, and landfall intensity, etc. Here we compare MeHiM with various dynamical models, including a global climate model [the High-Resolution Forecast-Oriented Low Ocean Resolution model (HiFLOR)], a regional hurricane model [the Geophysical Fluid Dynamics Laboratory (GFDL) hurricane model], and a simplified hurricane dynamic model [the Coupled Hurricane Intensity Prediction System (CHIPS)] and its newly developed fast simulator. The MeHiM, developed from reanalysis data, is applied to estimate the intensity of simulated storms for comparison with the dynamical-model predictions under the current climate. The dependences of hurricanes on the environment under current and projected future climates in the various models will also be compared statistically.
Transition records of stationary Markov chains.
Naudts, Jan; Van der Straeten, Erik
2006-10-01
In any Markov chain with finite state space the distribution of transition records always belongs to the exponential family. This observation is used to prove a fluctuation theorem, and to show that the dynamical entropy of a stationary Markov chain is linear in the number of steps. Three applications are discussed. A known result about entropy production is reproduced. A thermodynamic relation is derived for equilibrium systems with Metropolis dynamics. Finally, a link is made with recent results concerning a one-dimensional polymer model.
Maximum Kolmogorov-Sinai Entropy Versus Minimum Mixing Time in Markov Chains
NASA Astrophysics Data System (ADS)
Mihelich, M.; Dubrulle, B.; Paillard, D.; Kral, Q.; Faranda, D.
2018-01-01
We establish a link between the maximization of Kolmogorov-Sinai entropy (KSE) and the minimization of the mixing time for general Markov chains. Since the maximization of KSE is analytical and in general easier to compute than the mixing time, this link provides a new, faster method to approximate the minimum-mixing-time dynamics. It could be of interest in computer science and statistical physics, for computations that use random walks on graphs that can be represented as Markov chains.
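Both sides of this link are easy to compute for an explicit chain: the KSE of a stationary chain is -sum_i pi_i sum_j P_ij log P_ij, and mixing speed can be proxied by the spectral gap 1 - |lambda_2|. A minimal sketch with a hypothetical two-state chain:

```python
import numpy as np

def stationary(P):
    """Stationary distribution: left eigenvector of P for eigenvalue 1."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    return v / v.sum()

def kse(P):
    """Kolmogorov-Sinai entropy: -sum_i pi_i sum_j P_ij log P_ij."""
    pi = stationary(P)
    with np.errstate(divide="ignore"):
        logP = np.where(P > 0, np.log(P), 0.0)  # 0 * log 0 treated as 0
    return -np.sum(pi[:, None] * P * logP)

def spectral_gap(P):
    """1 - |lambda_2|; a larger gap is a proxy for faster mixing."""
    mags = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
    return 1 - mags[1]

P = np.array([[0.8, 0.2],
              [0.3, 0.7]])
print("KSE:", kse(P), " spectral gap:", spectral_gap(P))
```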
Simplification of irreversible Markov chains by removal of states with fast leaving rates.
Jia, Chen
2016-07-07
In the recent work of Ullah et al. (2012a), the authors developed an effective method to simplify reversible Markov chains by removal of states with low equilibrium occupancies. In this paper, we extend this result to irreversible Markov chains. We show that an irreversible chain can be simplified by removal of states with fast leaving rates. Moreover, we reveal that the irreversibility of the chain will always decrease after model simplification. This suggests that although model simplification can retain almost all the dynamic information of the chain, it will lose some thermodynamic information as a trade-off. Examples from biology are also given to illustrate the main results of this paper. Copyright © 2016 Elsevier Ltd. All rights reserved.
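The flavor of such a reduction can be seen in the standard state-elimination step for a continuous-time chain: when state k is left quickly, its probability flux is redirected so that the effective rate from i to j becomes q_ij + q_ik q_kj / (-q_kk). The sketch below is this generic elimination step, not the paper's specific construction for irreversible chains, and the generator is hypothetical.

```python
import numpy as np

def eliminate_state(Q, k):
    """Remove state k from a CTMC generator Q by redirecting its flux:
    q'_ij = q_ij + q_ik * q_kj / (-q_kk). This is the usual elimination
    step, accurate when state k has a fast leaving rate."""
    keep = [i for i in range(Q.shape[0]) if i != k]
    Qr = Q[np.ix_(keep, keep)] + np.outer(Q[keep, k], Q[k, keep]) / (-Q[k, k])
    np.fill_diagonal(Qr, 0.0)
    np.fill_diagonal(Qr, -Qr.sum(axis=1))   # restore zero row sums
    return Qr

# Three-state chain in which state 1 is left quickly
Q = np.array([[-1.0,    1.0,   0.0],
              [50.0, -100.0,  50.0],
              [0.0,     2.0,  -2.0]])
print(eliminate_state(Q, 1))
```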
Jia, Chen
2017-09-01
Here we develop an effective approach to simplify two-time-scale Markov chains with infinite state spaces by removal of states with fast leaving rates, which improves the simplification method of finite Markov chains. We introduce the concept of fast transition paths and show that the effective transitions of the reduced chain can be represented as the superposition of the direct transitions and the indirect transitions via all the fast transition paths. Furthermore, we apply our simplification approach to the standard Markov model of single-cell stochastic gene expression and provide a mathematical theory of random gene expression bursts. We give the precise mathematical conditions for the bursting kinetics of both mRNAs and proteins. It turns out that random bursts exactly correspond to the fast transition paths of the Markov model. This helps us gain a better understanding of the physics behind the bursting kinetics as an emergent behavior from the fundamental multiscale biochemical reaction kinetics of stochastic gene expression.
Marginally specified priors for non-parametric Bayesian estimation
Kessler, David C.; Hoff, Peter D.; Dunson, David B.
2014-01-01
Prior specification for non-parametric Bayesian inference involves the difficult task of quantifying prior knowledge about a parameter of high, often infinite, dimension. A statistician is unlikely to have informed opinions about all aspects of such a parameter but will have real information about functionals of the parameter, such as the population mean or variance. The paper proposes a new framework for non-parametric Bayes inference in which the prior distribution for a possibly infinite dimensional parameter is decomposed into two parts: an informative prior on a finite set of functionals, and a non-parametric conditional prior for the parameter given the functionals. Such priors can be easily constructed from standard non-parametric prior distributions in common use and inherit the large support of the standard priors on which they are based. Additionally, posterior approximations under these informative priors can generally be made via minor adjustments to existing Markov chain approximation algorithms for standard non-parametric prior distributions. We illustrate the use of such priors in the context of multivariate density estimation using Dirichlet process mixture models, and in the modelling of high dimensional sparse contingency tables. PMID:25663813
NASA Astrophysics Data System (ADS)
Xu, Yingru; Bernhard, Jonah E.; Bass, Steffen A.; Nahrgang, Marlene; Cao, Shanshan
2018-01-01
By applying a Bayesian model-to-data analysis, we estimate the temperature and momentum dependence of the heavy quark diffusion coefficient in an improved Langevin framework. The posterior range of the diffusion coefficient is obtained by performing a Markov chain Monte Carlo random walk and calibrating on the experimental data of D-meson RAA and v2 in three different collision systems at the Relativistic Heavy-Ion Collider (RHIC) and the Large Hadron Collider (LHC): Au-Au collisions at 200 GeV and Pb-Pb collisions at 2.76 and 5.02 TeV. The spatial diffusion coefficient is found to be consistent with lattice QCD calculations and comparable with other models' estimates. We demonstrate the capability of our improved Langevin model to simultaneously describe the RAA and v2 at both RHIC and LHC energies, as well as higher-order flow coefficients such as the D-meson v3. We show that by applying a Bayesian analysis, we are able to quantitatively and systematically study the heavy flavor dynamics in heavy-ion collisions.
Comparison of sampling techniques for Bayesian parameter estimation
NASA Astrophysics Data System (ADS)
Allison, Rupert; Dunkley, Joanna
2014-02-01
The posterior probability distribution for a set of model parameters encodes all that the data have to tell us in the context of a given model; it is the fundamental quantity for Bayesian parameter estimation. In order to infer the posterior probability distribution we have to decide how to explore parameter space. Here we compare three prescriptions for how parameter space is navigated, discussing their relative merits. We consider Metropolis-Hastings sampling, nested sampling and affine-invariant ensemble Markov chain Monte Carlo (MCMC) sampling. We focus on their performance on toy-model Gaussian likelihoods and on a real-world cosmological data set. We outline the sampling algorithms themselves and elaborate on performance diagnostics such as convergence time, scope for parallelization, dimensional scaling, requisite tunings and suitability for non-Gaussian distributions. We find that nested sampling delivers high-fidelity estimates for posterior statistics at low computational cost, and should be adopted in favour of Metropolis-Hastings in many cases. Affine-invariant MCMC is competitive when computing clusters can be utilized for massive parallelization. Affine-invariant MCMC and existing extensions to nested sampling naturally probe multimodal and curving distributions.
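Of the three samplers compared, Metropolis-Hastings is the simplest to write down. Below is a minimal random-walk version on a toy 2D Gaussian likelihood; the step size and starting point are arbitrary choices that would need exactly the tuning the paper discusses.

```python
import numpy as np

def metropolis_hastings(log_post, x0, n_steps, step_size, seed=7):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    lp = log_post(x)
    chain = np.empty((n_steps, x.size))
    accepted = 0
    for i in range(n_steps):
        prop = x + step_size * rng.standard_normal(x.size)
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:   # accept/reject step
            x, lp = prop, lp_prop
            accepted += 1
        chain[i] = x
    return chain, accepted / n_steps

# Toy-model 2D Gaussian likelihood (flat prior)
log_post = lambda x: -0.5 * (x[0] ** 2 + (x[1] / 2.0) ** 2)
chain, acc = metropolis_hastings(log_post, [3.0, 3.0], 20_000, 0.8)
print("acceptance rate:", acc, " posterior mean:", chain[5000:].mean(axis=0))
```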
NASA Astrophysics Data System (ADS)
Sheng, Zheng
2013-02-01
The estimation of lower atmospheric refractivity from radar sea clutter (RFC) is a complicated nonlinear optimization problem. This paper deals with the RFC problem in a Bayesian framework. It uses the unbiased Markov Chain Monte Carlo (MCMC) sampling technique, which can provide accurate posterior probability distributions of the estimated refractivity parameters by using an electromagnetic split-step fast Fourier transform terrain parabolic equation propagation model within a Bayesian inversion framework. In contrast to a global optimization algorithm, the Bayesian-MCMC approach can obtain not only approximate solutions but also the probability distributions of the solutions, that is, uncertainty analyses of the solutions. The Bayesian-MCMC algorithm is implemented on both simulated and real radar sea-clutter data. Reference data are assumed to be the simulation data, and refractivity profiles are obtained using a helicopter. The inversion algorithm is assessed (i) by comparing the estimated refractivity profiles from the assumed simulation and the helicopter sounding data, and (ii) by examining the one-dimensional (1D) and two-dimensional (2D) posterior probability distributions of the solutions.
NASA Astrophysics Data System (ADS)
Miftahurrohmah, Brina; Iriawan, Nur; Fithriasari, Kartika
2017-06-01
Stocks are financial instruments traded in the capital market that carry a high level of risk, reflected in the uncertainty of the returns that investors must accept in the future. The higher the risk to be faced, the higher the return that may be gained. Therefore, these risks need to be measured. Value at Risk (VaR), the most popular risk measure, can be misleading when the pattern of returns is not unimodal Normal. The calculation of risk using the VaR method with a Mixture Normal Autoregressive (MNAR) approach has been considered previously. This paper proposes a VaR method coupled with a Mixture Laplace Autoregressive (MLAR) model, implemented to analyse the returns of the three largest-capitalization Islamic stocks in the JII, namely PT. Astra International Tbk (ASII), PT. Telekomunikasi Indonesia Tbk (TLMK), and PT. Unilever Indonesia Tbk (UNVR). Parameter estimation is performed by employing a Bayesian Markov Chain Monte Carlo (MCMC) approach.
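Once the MCMC fit yields posterior predictive return draws, VaR is simply a lower quantile of those draws. In the sketch below, a two-component Laplace mixture stands in for an MLAR-style posterior predictive distribution; all parameters are hypothetical.

```python
import numpy as np

def value_at_risk(predictive_returns, alpha=0.05):
    """VaR at level alpha: the loss exceeded with probability alpha under
    the posterior predictive return distribution."""
    return -np.quantile(predictive_returns, alpha)

# Stand-in for MCMC output: posterior predictive draws from a two-component
# Laplace mixture (hypothetical parameters, mimicking an MLAR-style fit)
rng = np.random.default_rng(8)
comp = rng.random(50_000) < 0.7
draws = np.where(comp,
                 rng.laplace(0.0005, 0.005, 50_000),
                 rng.laplace(-0.002, 0.015, 50_000))
print("95% one-day VaR:", round(float(value_at_risk(draws, 0.05)), 4))
```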
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Xuesong; Liang, Faming; Yu, Beibei
2011-11-09
Estimating uncertainty of hydrologic forecasting is valuable to water resources and other relevant decision making processes. Recently, Bayesian Neural Networks (BNNs) have been proved powerful tools for quantifying uncertainty of streamflow forecasting. In this study, we propose a Markov Chain Monte Carlo (MCMC) framework to incorporate the uncertainties associated with input, model structure, and parameter into BNNs. This framework allows the structure of the neural networks to change by removing or adding connections between neurons and enables scaling of input data by using rainfall multipliers. The results show that the new BNNs outperform the BNNs that only consider uncertainties associated with parameter and model structure. Critical evaluation of posterior distribution of neural network weights, number of effective connections, rainfall multipliers, and hyper-parameters show that the assumptions held in our BNNs are not well supported. Further understanding of characteristics of different uncertainty sources and including output error into the MCMC framework are expected to enhance the application of neural networks for uncertainty analysis of hydrologic forecasting.
Bayesian calibration of the Community Land Model using surrogates
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ray, Jaideep; Hou, Zhangshuan; Huang, Maoyi
2014-02-01
We present results from the Bayesian calibration of hydrological parameters of the Community Land Model (CLM), which is often used in climate simulations and Earth system models. A statistical inverse problem is formulated for three hydrological parameters, conditional on observations of latent heat surface fluxes over 48 months. Our calibration method uses polynomial and Gaussian process surrogates of the CLM, and solves the parameter estimation problem using a Markov chain Monte Carlo sampler. Posterior probability densities for the parameters are developed for two sites with different soil and vegetation covers. Our method also allows us to examine the structural error in CLM under two error models. We find that surrogate models can be created for CLM in most cases. The posterior distributions are more predictive than the default parameter values in CLM. Climatologically averaging the observations does not modify the parameters' distributions significantly. The structural error model reveals a correlation time-scale which can be used to identify the physical process that could be contributing to it. While the calibrated CLM has a higher predictive skill, the calibration is under-dispersive.
Hamiltonian Monte Carlo acceleration using surrogate functions with random bases.
Zhang, Cheng; Shahbaba, Babak; Zhao, Hongkai
2017-11-01
For big data analysis, high computational cost for Bayesian methods often limits their applications in practice. In recent years, there have been many attempts to improve computational efficiency of Bayesian inference. Here we propose an efficient and scalable computational technique for a state-of-the-art Markov chain Monte Carlo method, namely, Hamiltonian Monte Carlo. The key idea is to explore and exploit the structure and regularity in parameter space for the underlying probabilistic model to construct an effective approximation of its geometric properties. To this end, we build a surrogate function to approximate the target distribution using properly chosen random bases and an efficient optimization process. The resulting method provides a flexible, scalable, and efficient sampling algorithm, which converges to the correct target distribution. We show that by choosing the basis functions and optimization process differently, our method can be related to other approaches for the construction of surrogate functions such as generalized additive models or Gaussian process models. Experiments based on simulated and real data show that our approach leads to substantially more efficient sampling algorithms compared to existing state-of-the-art methods.
Montesinos-López, Osval A; Montesinos-López, Abelardo; Crossa, José; Toledo, Fernando H; Montesinos-López, José C; Singh, Pawan; Juliana, Philomin; Salinas-Ruiz, Josafhat
2017-05-05
When a plant scientist wishes to make genomic-enabled predictions of multiple traits measured in multiple individuals in multiple environments, the most common strategy is to analyze a single trait at a time while taking into account genotype × environment interaction (G × E), because there is a lack of comprehensive models that simultaneously account for correlated count traits and G × E. For this reason, in this study we propose a multiple-trait and multiple-environment model for count data. The proposed model was developed under the Bayesian paradigm, for which we developed a Markov Chain Monte Carlo (MCMC) algorithm with noninformative priors. This allows all required full conditional distributions of the parameters to be obtained, leading to an exact Gibbs sampler for the posterior distribution. Our model was tested with simulated data and a real data set. Results show that the proposed multi-trait, multi-environment model is an attractive alternative for modeling multiple count traits measured in multiple environments. Copyright © 2017 Montesinos-López et al.
Belland, Brian R; Walker, Andrew E; Kim, Nam Ju
2017-12-01
Computer-based scaffolding provides temporary support that enables students to participate in and become more proficient at complex skills like problem solving, argumentation, and evaluation. While meta-analyses have addressed between-subject differences on cognitive outcomes resulting from scaffolding, none has addressed within-subject gains. This leaves much quantitative scaffolding literature not covered by existing meta-analyses. To address this gap, this study used Bayesian network meta-analysis to synthesize within-subjects (pre-post) differences resulting from scaffolding in 56 studies. We generated the posterior distribution using 20,000 Markov Chain Monte Carlo samples. Scaffolding has a consistently strong effect across student populations, STEM (science, technology, engineering, and mathematics) disciplines, and assessment levels, and a strong effect when used with most problem-centered instructional models (exception: inquiry-based learning and modeling visualization) and educational levels (exception: secondary education). Results also indicate some promising areas for future scaffolding research, including scaffolding among students with learning disabilities, for whom the effect size was particularly large (ḡ = 3.13).
A Computationally-Efficient Inverse Approach to Probabilistic Strain-Based Damage Diagnosis
NASA Technical Reports Server (NTRS)
Warner, James E.; Hochhalter, Jacob D.; Leser, William P.; Leser, Patrick E.; Newman, John A
2016-01-01
This work presents a computationally-efficient inverse approach to probabilistic damage diagnosis. Given strain data at a limited number of measurement locations, Bayesian inference and Markov Chain Monte Carlo (MCMC) sampling are used to estimate probability distributions of the unknown location, size, and orientation of damage. Substantial computational speedup is obtained by replacing a three-dimensional finite element (FE) model with an efficient surrogate model. The approach is experimentally validated on cracked test specimens where full field strains are determined using digital image correlation (DIC). Access to full field DIC data allows for testing of different hypothetical sensor arrangements, facilitating the study of strain-based diagnosis effectiveness as the distance between damage and measurement locations increases. The ability of the framework to effectively perform both probabilistic damage localization and characterization in cracked plates is demonstrated and the impact of measurement location on uncertainty in the predictions is shown. Furthermore, the analysis time to produce these predictions is orders of magnitude less than a baseline Bayesian approach with the FE method by utilizing surrogate modeling and effective numerical sampling approaches.
A Bayesian inversion for slip distribution of 1 Apr 2007 Mw8.1 Solomon Islands Earthquake
NASA Astrophysics Data System (ADS)
Chen, T.; Luo, H.
2013-12-01
On 1 Apr 2007 the megathrust Mw8.1 Solomon Islands earthquake occurred in the southwest Pacific along the New Britain subduction zone. 102 vertical displacement measurements over the southeastern end of the rupture zone, from two field surveys after this event, provide a unique constraint for slip distribution inversion. In conventional inversion methods (such as bounded-variable least squares), the smoothing parameter that determines the relative weight placed on fitting the data versus smoothing the slip distribution is often subjectively selected at the bend of the trade-off curve. Here a fully probabilistic inversion method [Fukuda, 2008] is applied to estimate the distributed slip and the smoothing parameter objectively. The joint posterior probability density function of the distributed slip and the smoothing parameter is formulated under a Bayesian framework and sampled with a Markov chain Monte Carlo method. We estimate the spatial distribution of dip slip associated with the 1 Apr 2007 Solomon Islands earthquake with this method. Early results show a shallower dip angle than previous studies and highly variable dip slip both along-strike and down-dip.
Hamilton, B H
1999-08-01
The fraction of US Medicare recipients enrolled in health maintenance organizations (HMOs) has increased substantially over the past 10 years. However, the impact of HMOs on health care costs is still hotly debated. In particular, it is argued that HMOs achieve cost reduction through 'cream-skimming' and enrolling relatively healthy patients. This paper develops a Bayesian panel data tobit model of HMO selection and Medicare expenditures for recent US retirees that accounts for mortality over the course of the panel. The model is estimated using Markov Chain Monte Carlo (MCMC) simulation methods, and is novel in that a multivariate t-link is used in place of normality to allow for the heavy-tailed distributions often found in health care expenditure data. The findings indicate that HMOs select individuals who are less likely to have positive health care expenditures prior to enrollment. However, there is no evidence that HMOs disenrol high cost patients. The results also indicate the importance of accounting for survival over the panel, since high mortality probabilities are associated with higher health care expenditures in the last year of life.
Belland, Brian R.; Walker, Andrew E.; Kim, Nam Ju
2017-01-01
Computer-based scaffolding provides temporary support that enables students to participate in and become more proficient at complex skills like problem solving, argumentation, and evaluation. While meta-analyses have addressed between-subject differences on cognitive outcomes resulting from scaffolding, none has addressed within-subject gains. This leaves much quantitative scaffolding literature not covered by existing meta-analyses. To address this gap, this study used Bayesian network meta-analysis to synthesize within-subjects (pre–post) differences resulting from scaffolding in 56 studies. We generated the posterior distribution using 20,000 Markov Chain Monte Carlo samples. Scaffolding has a consistently strong effect across student populations, STEM (science, technology, engineering, and mathematics) disciplines, and assessment levels, and a strong effect when used with most problem-centered instructional models (exception: inquiry-based learning and modeling visualization) and educational levels (exception: secondary education). Results also indicate some promising areas for future scaffolding research, including scaffolding among students with learning disabilities, for whom the effect size was particularly large (ḡ = 3.13). PMID:29200508
A Bayesian network for modelling blood glucose concentration and exercise in type 1 diabetes.
Ewings, Sean M; Sahu, Sujit K; Valletta, John J; Byrne, Christopher D; Chipperfield, Andrew J
2015-06-01
This article presents a new statistical approach to analysing the effects of everyday physical activity on blood glucose concentration in people with type 1 diabetes. A physiologically based model of blood glucose dynamics is developed to cope with frequently sampled data on food, insulin and habitual physical activity; the model is then converted to a Bayesian network to account for measurement error and variability in the physiological processes. A simulation study is conducted to determine the feasibility of using Markov chain Monte Carlo methods for simultaneous estimation of all model parameters and prediction of blood glucose concentration. Although there are problems with parameter identification in a minority of cases, most parameters can be estimated without bias. Predictive performance is unaffected by parameter misspecification and is insensitive to misleading prior distributions. This article highlights important practical and theoretical issues not previously addressed in the quest for an artificial pancreas as treatment for type 1 diabetes. The proposed methods represent a new paradigm for analysis of deterministic mathematical models of blood glucose concentration. © The Author(s) 2014 Reprints and permissions: sagepub.co.uk/journalsPermissions.nav.
A Bayesian Approach to Model Selection in Hierarchical Mixtures-of-Experts Architectures.
Tanner, Martin A.; Peng, Fengchun; Jacobs, Robert A.
1997-03-01
There does not exist a statistical model that shows good performance on all tasks. Consequently, the model selection problem is unavoidable; investigators must decide which model is best at summarizing the data for each task of interest. This article presents an approach to the model selection problem in hierarchical mixtures-of-experts architectures. These architectures combine aspects of generalized linear models with those of finite mixture models in order to perform tasks via a recursive "divide-and-conquer" strategy. Markov chain Monte Carlo methodology is used to estimate the distribution of the architectures' parameters. One part of our approach to model selection attempts to estimate the worth of each component of an architecture so that relatively unused components can be pruned from the architecture's structure. A second part of this approach uses a Bayesian hypothesis testing procedure in order to differentiate inputs that carry useful information from nuisance inputs. Simulation results suggest that the approach presented here adheres to the dictum of Occam's razor; simple architectures that are adequate for summarizing the data are favored over more complex structures. Copyright 1997 Elsevier Science Ltd. All Rights Reserved.
Reliability modelling and analysis of a multi-state element based on a dynamic Bayesian network
Xu, Tingxue; Gu, Junyuan; Dong, Qi; Fu, Linyu
2018-01-01
This paper presents a quantitative reliability modelling and analysis method for multi-state elements based on a combination of the Markov process and a dynamic Bayesian network (DBN), taking perfect repair, imperfect repair and condition-based maintenance (CBM) into consideration. The Markov models of elements without repair and under CBM are established, and an absorbing set is introduced to determine the reliability of the repairable element. According to the state-transition relations determined by the Markov process, a DBN model is built. In addition, its parameters for series and parallel systems, namely conditional probability tables, can be calculated by referring to the conditional degradation probabilities. Finally, the power of a control unit in a failure model is used as an example. A dynamic fault tree (DFT) is translated into a Bayesian network model and subsequently extended to a DBN. The results show the state probabilities of an element and of the system without repair, with perfect and imperfect repair, and under CBM; the results obtained with the absorbing set are plotted against those of the differential equations and thereby verified. Through forward inference, the reliability of the control unit is determined under the different maintenance modes. Finally, weak nodes in the control unit are identified. PMID:29765629
A simplified parsimonious higher order multivariate Markov chain model
NASA Astrophysics Data System (ADS)
Wang, Chao; Yang, Chuan-sheng
2017-09-01
In this paper, a simplified parsimonious higher-order multivariate Markov chain model (SPHOMMCM) is presented. Moreover, a parameter estimation method for SPHOMMCM is given. Numerical experiments show the effectiveness of SPHOMMCM.
On the uncertain nature of the core of α Cen A
NASA Astrophysics Data System (ADS)
Bazot, M.; Christensen-Dalsgaard, J.; Gizon, L.; Benomar, O.
2016-08-01
High-quality astrometric, spectroscopic, interferometric and, importantly, asteroseismic observations are available for α Cen A, a member of the closest binary star system to Earth. Taking all these constraints into account, we study the internal structure of the star by means of theoretical modelling. Using the Aarhus STellar Evolution Code (ASTEC) and the tools of computational Bayesian statistics, in particular a Markov chain Monte Carlo algorithm, we perform statistical inferences on the physical characteristics of the star. We find that α Cen A has a probability of approximately 40 per cent of having a convective core. This probability drops to a few per cent if one considers reduced rates for the 14N(p,γ)15O reaction. These convective cores have fractional radii of less than 8 per cent when overshoot is neglected. Including overshooting also leads to the possibility of a convective core mostly sustained by the energy output of the ppII chain. We finally show that roughly 30 per cent of the stellar models describing α Cen A are in the subgiant regime.
A Bayesian approach for parameter estimation and prediction using a computationally intensive model
Higdon, Dave; McDonnell, Jordan D.; Schunck, Nicolas; ...
2015-02-05
Bayesian methods have been successful in quantifying uncertainty in physics-based problems in parameter estimation and prediction. In these cases, physical measurements y are modeled as the best fit of a physics-based model η(θ), where θ denotes the uncertain, best input setting. Hence the statistical model is of the form y = η(θ) + ε, where ε accounts for measurement, and possibly other, error sources. When nonlinearity is present in η(·), the resulting posterior distribution for the unknown parameters in the Bayesian formulation is typically complex and nonstandard, requiring computationally demanding approaches such as Markov chain Monte Carlo (MCMC) to produce multivariate draws from the posterior. Although generally applicable, MCMC requires thousands (or even millions) of evaluations of the physics model η(·). This requirement is problematic if the model takes hours or days to evaluate. To overcome this computational bottleneck, we present an approach adapted from Bayesian model calibration. This approach combines output from an ensemble of computational model runs with physical measurements, within a statistical formulation, to carry out inference. A key component of this approach is a statistical response surface, or emulator, estimated from the ensemble of model runs. We demonstrate this approach with a case study in estimating parameters for a density functional theory model, using experimental mass/binding energy measurements from a collection of atomic nuclei. Lastly, we also demonstrate how this approach produces uncertainties in predictions for recent mass measurements obtained at Argonne National Laboratory.
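The core loop is easy to sketch. Below, a Gaussian process emulator (via scikit-learn) is trained offline on a small ensemble of runs of a stand-in one-dimensional model, then queried inside a random-walk Metropolis sampler so that no further "expensive" model evaluations are needed online. The toy model, prior range, and noise level are invented for illustration; the full framework also handles multivariate output and model discrepancy.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(1)

# Stand-in for an expensive physics model eta(theta); here a cheap function.
def eta(theta):
    return np.sin(3 * theta) + theta ** 2

# Offline phase: an ensemble of model runs at design points, then an emulator.
design = np.linspace(-2, 2, 15)[:, None]
runs = eta(design[:, 0])
emulator = GaussianProcessRegressor(kernel=RBF(length_scale=0.5)).fit(design, runs)

# Observed data: y = eta(theta_true) + measurement error.
theta_true, sigma = 0.7, 0.1
y_obs = eta(theta_true) + rng.normal(0, sigma)

def log_post(theta):
    if abs(theta) > 2:                 # flat prior on [-2, 2]
        return -np.inf
    mu = emulator.predict(np.array([[theta]]))[0]
    return -0.5 * ((y_obs - mu) / sigma) ** 2

# Online phase: random-walk Metropolis on the emulator, no new model runs.
theta, chain = 0.0, []
for _ in range(5000):
    prop = theta + 0.3 * rng.normal()
    if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
        theta = prop
    chain.append(theta)
```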
Time-varying Concurrent Risk of Extreme Droughts and Heatwaves in California
NASA Astrophysics Data System (ADS)
Sarhadi, A.; Diffenbaugh, N. S.; Ausin, M. C.
2016-12-01
Anthropogenic global warming has changed the nature and the risk of extreme climate phenomena such as droughts and heatwaves. The concurrence of these climatic extremes may intensify undesirable consequences for human health and destructive effects on water resources. The present study assesses the risk of concurrent extreme droughts and heatwaves under the dynamic nonstationary conditions arising from climate change in California. To do so, a generalized, fully Bayesian, time-varying multivariate risk framework is proposed that evolves through time under human-induced environmental change. In this methodology, an extreme-value Bayesian dynamic copula (Gumbel) is developed to model the time-varying dependence structure between the two climate extremes. The time-varying extreme marginals are first modeled using a Generalized Extreme Value (GEV) distribution. Bayesian Markov Chain Monte Carlo (MCMC) inference is integrated to estimate the parameters of the nonstationary marginals and the copula using a Gibbs sampling method. The modelled marginals and copula are then used to develop a fully Bayesian, time-varying joint return period concept for the estimation of concurrent risk. We argue that climate change has increased the chance of concurrent droughts and heatwaves over recent decades in California, and demonstrate that a time-varying multivariate perspective should be incorporated to assess the realistic concurrent risk of the extremes for water resources planning and management in this area under a changing climate. The proposed generalized methodology can be applied to other stochastic compound climate extremes that are under the influence of climate change.
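To make the joint return period concept concrete, the sketch below evaluates the probability of concurrent exceedance through a (stationary) Gumbel copula with GEV marginals; the marginal and copula parameters are invented placeholders, whereas in the paper they are time-varying and estimated by MCMC.

```python
import numpy as np
from scipy.stats import genextreme

# Gumbel (extreme-value) copula: C(u, v) = exp(-[(-ln u)^t + (-ln v)^t]^(1/t))
def gumbel_copula(u, v, t):
    return np.exp(-((-np.log(u)) ** t + (-np.log(v)) ** t) ** (1.0 / t))

# Hypothetical GEV marginals for drought severity (X) and heatwave index (Y).
Fx = genextreme(c=-0.1, loc=1.0, scale=0.5)    # scipy's c is minus the GEV shape
Fy = genextreme(c=-0.2, loc=30.0, scale=2.0)

def joint_exceedance(x, y, t=2.0):
    u, v = Fx.cdf(x), Fy.cdf(y)
    # P(X > x, Y > y) by inclusion-exclusion through the copula
    return 1.0 - u - v + gumbel_copula(u, v, t)

# Concurrent return period (for annual maxima): T = 1 / P(X > x, Y > y)
p = joint_exceedance(2.0, 34.0)
print("joint exceedance:", p, " return period:", 1.0 / p, "years")
```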
A Bayesian Alternative for Multi-objective Ecohydrological Model Specification
NASA Astrophysics Data System (ADS)
Tang, Y.; Marshall, L. A.; Sharma, A.; Ajami, H.
2015-12-01
Process-based ecohydrological models combine the study of hydrological, physical, biogeochemical and ecological processes of catchments and are usually more complex and more heavily parameterized than conceptual hydrological models. Thus, appropriate calibration objectives and model uncertainty analysis are essential for ecohydrological modeling. In recent years, Bayesian inference has become one of the most popular tools for quantifying uncertainty in hydrological modeling, with the development of Markov Chain Monte Carlo (MCMC) techniques. Our study aims to develop appropriate prior distributions and likelihood functions that minimize model uncertainty and bias within a Bayesian ecohydrological framework. A formal Bayesian approach is implemented in an ecohydrological model which combines a hydrological model (HyMOD) and a dynamic vegetation model (DVM). Simulations based on a single-objective likelihood (streamflow or LAI) and on multi-objective likelihoods (streamflow and LAI) with different weights are compared. Uniform, weakly informative and strongly informative prior distributions are used in different simulations. The Kullback-Leibler divergence (KLD) is used to measure the (dis)similarity between priors and the corresponding posterior distributions, to examine parameter sensitivity. Results show that different prior distributions can strongly influence the posterior distributions of parameters, especially when the available data are limited or the parameters are insensitive to the available data. We demonstrate differences in optimized parameters and uncertainty limits between cases based on multi-objective versus single-objective likelihoods, and the importance of appropriately defining the weights of objectives in multi-objective calibration according to the different data types.
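The KLD-based sensitivity check can be illustrated in a few lines. The sketch below uses a moment-matched Gaussian approximation of prior and posterior samples, a simplification of the general divergence estimate: a large prior-to-posterior divergence flags a parameter the data inform, while a near-zero value flags an insensitive one.

```python
import numpy as np

def kld_gauss(mu0, s0, mu1, s1):
    """KL( N(mu0, s0^2) || N(mu1, s1^2) ), closed-form Gaussian special case."""
    return np.log(s1 / s0) + (s0 ** 2 + (mu0 - mu1) ** 2) / (2 * s1 ** 2) - 0.5

# Moment-matched Gaussian approximation of prior and posterior draws.
def prior_posterior_kld(prior_draws, posterior_draws):
    return kld_gauss(np.mean(posterior_draws), np.std(posterior_draws),
                     np.mean(prior_draws), np.std(prior_draws))

rng = np.random.default_rng(2)
prior = rng.normal(0.0, 1.0, 10000)            # weakly informative prior
post_sensitive = rng.normal(0.8, 0.1, 10000)   # data-dominated posterior
post_insensitive = rng.normal(0.0, 0.97, 10000)
print(prior_posterior_kld(prior, post_sensitive))    # large: informed parameter
print(prior_posterior_kld(prior, post_insensitive))  # near zero: insensitive
```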
Bayesian structured additive regression modeling of epidemic data: application to cholera
2012-01-01
Background A significant interest in spatial epidemiology lies in identifying associated risk factors which enhance the risk of infection. Most studies, however, make no or only limited use of the spatial structure of the data, or of possible nonlinear effects of the risk factors. Methods We develop a Bayesian structured additive regression model for cholera epidemic data. Model estimation and inference are based on a fully Bayesian approach via Markov Chain Monte Carlo (MCMC) simulations. The model is applied to cholera epidemic data from the Kumasi Metropolis, Ghana. Proximity to refuse dumps, density of refuse dumps, and proximity to potential cholera reservoirs were modeled as continuous functions; presence of slum settlers and population density were modeled as fixed effects; and spatial references to the communities were modeled as structured and unstructured spatial effects. Results We observe that the risk of cholera is associated with slum settlements and high population density. The risk of cholera is uniformly lower for communities with fewer refuse dumps, but variable and higher for communities with more refuse dumps. The risk is also lower for communities distant from refuse dumps and potential cholera reservoirs. The results also indicate distinct spatial variation in the risk of cholera infection. Conclusion The study highlights the usefulness of the Bayesian semi-parametric regression model in analyzing public health data. These findings could serve as novel information to help health planners and policy makers make effective decisions to control or prevent cholera epidemics. PMID:22866662
An efficient Bayesian meta-analysis approach for studying cross-phenotype genetic associations
Majumdar, Arunabha; Haldar, Tanushree; Bhattacharya, Sourabh; Witte, John S.
2018-01-01
Simultaneous analysis of genetic associations with multiple phenotypes may reveal shared genetic susceptibility across traits (pleiotropy). For a locus exhibiting overall pleiotropy, it is important to identify which specific traits underlie this association. We propose a Bayesian meta-analysis approach (termed CPBayes) that uses summary-level data across multiple phenotypes to simultaneously measure the evidence of aggregate-level pleiotropic association and estimate an optimal subset of traits associated with the risk locus. This method uses a unified Bayesian statistical framework based on a spike and slab prior. CPBayes performs a fully Bayesian analysis by employing the Markov Chain Monte Carlo (MCMC) technique Gibbs sampling. It takes into account heterogeneity in the size and direction of the genetic effects across traits. It can be applied to both cohort data and separate studies of multiple traits having overlapping or non-overlapping subjects. Simulations show that CPBayes can produce higher accuracy in the selection of associated traits underlying a pleiotropic signal than the subset-based meta-analysis ASSET. We used CPBayes to undertake a genome-wide pleiotropic association study of 22 traits in the large Kaiser GERA cohort and detected six independent pleiotropic loci associated with at least two phenotypes. This includes a locus at chromosomal region 1q24.2 which exhibits an association simultaneously with the risk of five different diseases: Dermatophytosis, Hemorrhoids, Iron Deficiency, Osteoporosis and Peripheral Vascular Disease. We provide an R-package ‘CPBayes’ implementing the proposed method. PMID:29432419
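A stripped-down version of the spike-and-slab Gibbs step conveys the mechanics. In the sketch below, each trait's summary effect estimate is marginally normal under either mixture component, so the inclusion indicators and the mixing weight have closed-form conditionals; the effect sizes, standard errors, and spike/slab scales are invented, and CPBayes itself additionally handles heterogeneous effects and overlapping subjects.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)

# Summary-level data for K traits at one locus (numbers are illustrative).
beta_hat = np.array([0.02, 0.25, -0.01, 0.30, 0.03])
se = np.full(beta_hat.shape, 0.05)
K = len(beta_hat)
tau_spike, tau_slab = 1e-4, 0.2          # illustrative spike/slab scales

# Marginally beta_hat_k ~ N(0, se_k^2 + tau^2) under either component, so the
# inclusion indicators z and the mixing weight pi have closed-form conditionals.
def gibbs(n_iter=5000):
    pi, z_draws = 0.5, np.zeros((n_iter, K))
    sd_slab = np.sqrt(se ** 2 + tau_slab ** 2)
    sd_spike = np.sqrt(se ** 2 + tau_spike ** 2)
    for i in range(n_iter):
        w_slab = pi * norm.pdf(beta_hat, 0.0, sd_slab)
        w_spike = (1 - pi) * norm.pdf(beta_hat, 0.0, sd_spike)
        z = rng.uniform(size=K) < w_slab / (w_slab + w_spike)
        pi = rng.beta(1 + z.sum(), 1 + K - z.sum())   # conjugate Beta update
        z_draws[i] = z
    return z_draws

# Posterior inclusion probabilities: the subset of traits driving the signal.
print(gibbs().mean(axis=0))
```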
NASA Astrophysics Data System (ADS)
Kim, Jin-Young; Kwon, Hyun-Han; Kim, Hung-Soo
2015-04-01
The existing regional frequency analysis has the disadvantage that it is difficult to consider geographical characteristics when estimating areal rainfall. This study therefore aims to develop a hierarchical Bayesian model based nonstationary regional frequency analysis in which spatial patterns of the design rainfall and geographical information (e.g. latitude, longitude and altitude) are explicitly incorporated. The study assumes that the parameters of the Gumbel (or GEV) distribution are a function of geographical characteristics within a general linear regression framework. Posterior distributions of the regression parameters are estimated by the Bayesian Markov Chain Monte Carlo (MCMC) method, and the identified functional relationship is used to spatially interpolate the parameters of the distributions using digital elevation models (DEM) as inputs. The proposed model is applied to derive design rainfalls over the entire Han-river watershed. The proposed Bayesian regional frequency analysis model showed results similar to those of L-moment based regional frequency analysis. In addition, the model showed an advantage in quantifying the uncertainty of the design rainfall and in estimating areal rainfall with geographical information taken into account. Finally, a comprehensive discussion of design rainfall in the nonstationary context is presented. KEYWORDS: Regional frequency analysis, Nonstationary, Spatial information, Bayesian Acknowledgement This research was supported by a grant (14AWMP-B082564-01) from the Advanced Water Management Research Program funded by the Ministry of Land, Infrastructure and Transport of the Korean government.
NASA Astrophysics Data System (ADS)
Ojha, Maheswar; Maiti, Saumen
2016-03-01
A novel approach based on the concept of a Bayesian neural network (BNN) has been implemented for classifying sediment boundaries using downhole log data obtained during Integrated Ocean Drilling Program (IODP) Expedition 323 in the Bering Sea slope region. The Bayesian framework, in conjunction with a Markov Chain Monte Carlo (MCMC)/hybrid Monte Carlo (HMC) learning paradigm, has been applied to constrain the lithology boundaries using density, density porosity, gamma ray, sonic P-wave velocity and electrical resistivity at Hole U1344A. We demonstrate the effectiveness of our supervised classification methodology by comparing our findings with those of a conventional neural network and of a Bayesian neural network optimized by the scaled conjugate gradient method (SCG), and test the robustness of the algorithm in the presence of red noise in the data. The Bayesian results based on the HMC algorithm (BNN.HMC) resolve detailed finer structures at certain depths in addition to the main lithology, such as silty clay, diatom clayey silt and sandy silt. Our method also recovers lithology information over the interval between 615 and 655 m Wireline log Matched depth below Sea Floor, a zone of no core recovery. Our analyses demonstrate that the BNN based approach renders a robust means for the classification of complex lithology successions at Hole U1344A, which could be very useful for other studies and for understanding oceanic crustal inhomogeneity and structural discontinuities.
Convergence analysis of surrogate-based methods for Bayesian inverse problems
NASA Astrophysics Data System (ADS)
Yan, Liang; Zhang, Yuan-Xiang
2017-12-01
The major challenges in Bayesian inverse problems arise from the need for repeated evaluations of the forward model, as required by Markov chain Monte Carlo (MCMC) methods for posterior sampling. Many attempts at accelerating Bayesian inference have relied on surrogates for the forward model, typically constructed through repeated forward simulations performed in an offline phase. Although such approaches can be quite effective at reducing computation cost, there has been little analysis of the effect of the approximation on posterior inference. In this work, we prove error bounds on the Kullback-Leibler (KL) distance between the true posterior distribution and the approximation based on surrogate models. Our rigorous error analysis shows that if the forward model approximation converges at a certain rate in the prior-weighted L2 norm, then the posterior distribution generated by the approximation converges to the true posterior at least twice as fast in the KL sense. An error bound on the Hellinger distance is also provided. To give concrete examples of surrogate-based methods, we present an efficient technique for constructing stochastic surrogate models to accelerate Bayesian inference. The Christoffel least squares algorithms, based on generalized polynomial chaos, are used to construct a polynomial approximation of the forward solution over the support of the prior distribution. The numerical strategy and the predicted convergence rates are then demonstrated on nonlinear inverse problems involving the inference of parameters appearing in partial differential equations.
A tridiagonal parsimonious higher order multivariate Markov chain model
NASA Astrophysics Data System (ADS)
Wang, Chao; Yang, Chuan-sheng
2017-09-01
In this paper, we present a tridiagonal parsimonious higher-order multivariate Markov chain model (TPHOMMCM). Moreover, an estimation method for the parameters in TPHOMMCM is given. Numerical experiments illustrate the effectiveness of TPHOMMCM.
ERIC Educational Resources Information Center
Kayser, Brian D.
The fit of educational aspirations of Illinois rural high school youths to 3 related one-parameter mathematical models was investigated. The models used were the continuous-time Markov chain model, the discrete-time Markov chain, and the Poisson distribution. The sample of 635 students responded to questionnaires from 1966 to 1969 as part of an…
Markov chain model for demersal fish catch analysis in Indonesia
NASA Astrophysics Data System (ADS)
Firdaniza; Gusriani, N.
2018-03-01
As an archipelagic country, Indonesia has considerable potential fishery resources. One of the fish resources with high economic value is demersal fish, which live on or near the muddy seabed and are scattered throughout the Indonesian seas. Demersal fish production in each of Indonesia's Fisheries Management Areas (FMAs) varies each year. In this paper we discuss a Markov chain model for demersal fish yield analysis across all of Indonesia's Fisheries Management Areas. Data on demersal fish catch in every FMA in 2005-2014 were obtained from the Directorate of Capture Fisheries. From these data a transition probability matrix is determined by counting transitions between catches below and above the median. The resulting Markov chain is ergodic, so its limiting probabilities can be determined. The predicted demersal fishing yield is obtained by combining the limiting probabilities with the average catches below and above the median. The results show that for 2018, and in the long term, demersal fishing yields in most FMAs were below the median value.
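The below-median/above-median construction reduces to a two-state chain whose transition matrix is estimated by counting, as sketched below with an invented ten-year catch series standing in for the Directorate of Capture Fisheries data.

```python
import numpy as np

# Hypothetical yearly demersal catch (tonnes) for one FMA, 2005-2014.
catch = np.array([120, 95, 130, 140, 90, 85, 150, 160, 100, 170])
states = (catch > np.median(catch)).astype(int)   # 0: below, 1: above median

# Count transitions to estimate the 2x2 transition probability matrix.
P = np.zeros((2, 2))
for a, b in zip(states[:-1], states[1:]):
    P[a, b] += 1
P /= P.sum(axis=1, keepdims=True)

# Limiting (stationary) distribution of the ergodic chain: the left
# eigenvector of P for eigenvalue 1, normalised to sum to one.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi /= pi.sum()

# Long-run forecast: limiting probabilities weighted by conditional means.
means = [catch[states == s].mean() for s in (0, 1)]
print("P =", P, " pi =", pi, " forecast =", pi @ means)
```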
Analysis and design of a second-order digital phase-locked loop
NASA Technical Reports Server (NTRS)
Blasche, P. R.
1979-01-01
A specific second-order digital phase-locked loop (DPLL) was modeled as a first-order Markov chain with alternatives. From the matrix of transition probabilities of the Markov chain, the steady-state phase error of the DPLL was determined. In a similar manner the loop's response was calculated for a fading input. Additionally, a hardware DPLL was constructed and tested to provide a comparison to the results obtained from the Markov chain model. In all cases tested, good agreement was found between the theoretical predictions and the experimental data.
Information Entropy Production of Maximum Entropy Markov Chains from Spike Trains
NASA Astrophysics Data System (ADS)
Cofré, Rodrigo; Maldonado, Cesar
2018-01-01
We consider the maximum entropy Markov chain inference approach to characterize the collective statistics of neuronal spike trains, focusing on the statistical properties of the inferred model. We review large deviations techniques useful in this context to describe properties of accuracy and convergence in terms of sampling size. We use these results to study the statistical fluctuation of correlations, distinguishability and irreversibility of maximum entropy Markov chains. We illustrate these applications using simple examples where the large deviation rate function is explicitly obtained for maximum entropy models of relevance in this field.
Chauvin, C; Clement, C; Bruneau, M; Pommeret, D
2007-07-16
This article describes the use of Markov chains to explore the time-patterns of antimicrobial exposure in broiler poultry. Transitions in antimicrobial exposure status (exposed/not exposed to an antimicrobial, with a distinction between exposures to the different antimicrobial classes) were investigated in extensive data collected in broiler chicken flocks from November 2003 onwards. All Markov chains were first-order chains. Mortality rate, geographical location and slaughter semester were sources of heterogeneity between transition matrices. Transitions towards a 'no antimicrobial' exposure state were highly predominant, whatever the initial state. From a 'no antimicrobial' exposure state, the transition to beta-lactams was predominant among transitions to an antimicrobial exposure state. Transitions between antimicrobial classes were rare and variable. Both switches between antimicrobial classes and repeats of a particular class were observed. Application of Markov chain analysis to the database of the nation-wide antimicrobial resistance monitoring programme showed that transition probabilities between antimicrobial exposure states increased with the number of resistances in Escherichia coli strains.
Indexed semi-Markov process for wind speed modeling.
NASA Astrophysics Data System (ADS)
Petroni, F.; D'Amico, G.; Prattico, F.
2012-04-01
The increasing interest in renewable energy leads scientific research to seek better ways to recover as much of the available energy as possible. In particular, the maximum energy recoverable from wind is equal to 59.3% of that available (Betz's law), attained at a specific pitch angle and when the ratio between the output and input wind speeds equals 1/3. The pitch angle is the angle formed between the airfoil of the wind turbine blade and the wind direction. Old turbines, and many of those currently marketed, have a fixed, invariant airfoil geometry; as a consequence they work at an efficiency lower than 59.3%. New-generation wind turbines instead have a system to vary the pitch angle by rotating the blades, which enables them to recover the maximum energy at different wind speeds, working at the Betz limit over a range of speed ratios. A powerful pitch-angle control system allows the turbine to recover energy more effectively in the transient regime. A good stochastic model for wind speed is therefore needed both to help optimize turbine design and to assist the control system in predicting the wind speed so that the blades can be positioned quickly and correctly. The availability of synthetic wind speed data is also a powerful instrument for verifying wind turbine structures or estimating the energy recoverable from a specific site. To generate synthetic data, Markov chains of first or higher order are often used [1,2,3]. In particular, [1] presents a comparison between a first-order and a second-order Markov chain. A similar study, restricted to the first-order Markov chain, is conducted in [2], which presents the transition probability matrix and compares the energy spectral density and autocorrelation of real and synthetic wind speed data. An attempt to jointly model wind speed and direction is presented in [3], using two models: a first-order Markov chain with different numbers of states, and the Weibull distribution. All these models use Markov chains to generate synthetic wind speed time series, but the search for a better model is still open. Approaching this issue, we apply new models that generalize Markov models; more precisely, we apply semi-Markov models to generate synthetic wind speed time series. In a previous work we proposed several semi-Markov models and showed their ability to reproduce the autocorrelation structure of wind speed data. In that paper we also showed that the autocorrelation is higher than under the Markov model, though still too small compared with the empirical one. To overcome this problem of low autocorrelation, in this paper we propose an indexed semi-Markov model. More precisely, we assume that wind speed is described by a discrete-time homogeneous semi-Markov process, and we introduce a memory index which takes into account the periods of different wind activity. With this model the statistical characteristics of wind speed are faithfully reproduced. The wind is a very unstable phenomenon characterized by sequences of lulls and sustained speeds, and a good wind generator must be able to reproduce such sequences. To check the validity of the model, the persistence of the synthetic winds was calculated and averaged.
The model is used to generate synthetic time series for wind speed by means of Monte Carlo simulations, and the time-lagged autocorrelation is used to compare statistical properties of the proposed models with those of real data and also with a time series generated through a simple Markov chain. [1] A. Shamshad, M.A. Bawadi, W.M.W. Wan Hussin, T.A. Majid, S.A.M. Sanusi, First and second order Markov chain models for synthetic generation of wind speed time series, Energy 30 (2005) 693-708. [2] H. Nfaoui, H. Essiarab, A.A.M. Sayigh, A stochastic Markov chain model for simulating wind speed time series at Tangiers, Morocco, Renewable Energy 29 (2004) 1407-1418. [3] F. Youcef Ettoumi, H. Sauvageot, A.-E.-H. Adane, Statistical bivariate modeling of wind using first-order Markov chain and Weibull distribution, Renewable Energy 28 (2003) 1787-1802.
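For reference, the first-order Markov chain baseline the authors compare against (as in [1] and [2]) can be sketched in a few lines: discretize the speeds, count transitions, simulate, and compare time-lagged autocorrelations. The AR(1) stand-in record and the 1 m/s bin width below are invented for illustration; the indexed semi-Markov model itself additionally tracks sojourn times and a memory index.

```python
import numpy as np

rng = np.random.default_rng(4)

# Stand-in hourly wind-speed record (m/s): an AR(1) series, folded positive.
e = rng.normal(size=5000)
ar = np.zeros(5000)
for t in range(1, 5000):
    ar[t] = 0.9 * ar[t - 1] + e[t]
speed = np.abs(6 + 2 * ar / ar.std())

# Discretise into 1 m/s states and estimate the transition matrix by counting.
n = 15
states = np.minimum(speed.astype(int), n - 1)
P = np.zeros((n, n))
for a, b in zip(states[:-1], states[1:]):
    P[a, b] += 1
row = P.sum(axis=1, keepdims=True)
P = np.divide(P, row, out=np.zeros_like(P), where=row > 0)

# Simulate the chain, mapping each state back to a speed inside its bin.
s, synthetic = states[0], []
for _ in range(5000):
    if P[s].sum() == 0:                    # unvisited state: restart at mode
        s = np.bincount(states).argmax()
    s = rng.choice(n, p=P[s])
    synthetic.append(s + rng.uniform())

# Compare time-lagged autocorrelations of real and synthetic series.
def acf(x, k):
    x = np.asarray(x) - np.mean(x)
    return float(x[:-k] @ x[k:] / (x @ x))

print([round(acf(speed, k), 3) for k in (1, 6, 24)])
print([round(acf(synthetic, k), 3) for k in (1, 6, 24)])
```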
Markov chain Monte Carlo techniques and spatial-temporal modelling for medical EIT.
West, Robert M; Aykroyd, Robert G; Meng, Sha; Williams, Richard A
2004-02-01
Many imaging problems such as imaging with electrical impedance tomography (EIT) can be shown to be inverse problems: that is either there is no unique solution or the solution does not depend continuously on the data. As a consequence solution of inverse problems based on measured data alone is unstable, particularly if the mapping between the solution distribution and the measurements is also nonlinear as in EIT. To deliver a practical stable solution, it is necessary to make considerable use of prior information or regularization techniques. The role of a Bayesian approach is therefore of fundamental importance, especially when coupled with Markov chain Monte Carlo (MCMC) sampling to provide information about solution behaviour. Spatial smoothing is a commonly used approach to regularization. In the human thorax EIT example considered here nonlinearity increases the difficulty of imaging, using only boundary data, leading to reconstructions which are often rather too smooth. In particular, in medical imaging the resistivity distribution usually contains substantial jumps at the boundaries of different anatomical regions. With spatial smoothing these boundaries can be masked by blurring. This paper focuses on the medical application of EIT to monitor lung and cardiac function and uses explicit geometric information regarding anatomical structure and incorporates temporal correlation. Some simple properties are assumed known, or at least reliably estimated from separate studies, whereas others are estimated from the voltage measurements. This structural formulation will also allow direct estimation of clinically important quantities, such as ejection fraction and residual capacity, along with assessment of precision.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Jinsong; Kemna, Andreas; Hubbard, Susan S.
2008-05-15
We develop a Bayesian model to invert spectral induced polarization (SIP) data for Cole-Cole parameters using Markov chain Monte Carlo (MCMC) sampling methods. We compare the performance of the MCMC based stochastic method with an iterative Gauss-Newton based deterministic method for Cole-Cole parameter estimation through inversion of synthetic and laboratory SIP data. The Gauss-Newton based method can provide an optimal solution for given objective functions under constraints, but the obtained optimal solution generally depends on the choice of initial values and the estimated uncertainty information is often inaccurate or insufficient. In contrast, the MCMC based inversion method provides extensive global information on unknown parameters, such as the marginal probability distribution functions, from which we can obtain better estimates and tighter uncertainty bounds of the parameters than with the deterministic method. Additionally, the results obtained with the MCMC method are independent of the choice of initial values. Because the MCMC based method does not explicitly offer a single optimal solution for given objective functions, the deterministic and stochastic methods can complement each other. For example, the stochastic method can first be used to obtain the means of the unknown parameters by starting from an arbitrary set of initial values, and the deterministic method can then be initiated using the means as starting values to obtain the optimal estimates of the Cole-Cole parameters.
The behavior of Metropolis-coupled Markov chains when sampling rugged phylogenetic distributions.
Brown, Jeremy M; Thomson, Robert C
2018-02-15
Bayesian phylogenetic inference involves sampling from posterior distributions of trees, which sometimes exhibit local optima, or peaks, separated by regions of low posterior density. Markov chain Monte Carlo (MCMC) algorithms are the most widely used numerical method for generating samples from these posterior distributions, but they are susceptible to entrapment on individual optima in rugged distributions when they are unable to easily cross through or jump across regions of low posterior density. Ruggedness of posterior distributions can result from a variety of factors, including unmodeled variation in evolutionary processes and unrecognized variation in the true topology across sites or genes. Ruggedness can also become exaggerated when constraints are placed on topologies that require the presence or absence of particular bipartitions (often referred to as positive or negative constraints, respectively). These types of constraints are frequently employed when conducting tests of topological hypotheses (Bergsten et al. 2013; Brown and Thomson 2017). Negative constraints can lead to particularly rugged distributions when the data strongly support a forbidden clade, because monophyly of the clade can be disrupted by inserting outgroup taxa in many different ways. However, topological moves between the alternative disruptions are very difficult, because they require swaps between the inserted outgroup taxa while the data constrain taxa from the forbidden clade to remain close together on the tree. While this precise form of ruggedness is particular to negative constraints, trees with high posterior density can be separated by similarly complicated topological rearrangements, even in the absence of constraints.
Influence of credit scoring on the dynamics of Markov chain
NASA Astrophysics Data System (ADS)
Galina, Timofeeva
2015-11-01
Markov processes are widely used to model the dynamics of a credit portfolio and to forecast portfolio risk and profitability. In the Markov chain model the loan portfolio is divided into several groups of different quality, determined by the presence of indebtedness and its terms. It is proposed that the dynamics of portfolio shares be described by a multistage controlled system. The article outlines a mathematical formalization of controls that reflect the actions of the bank's management intended to improve loan portfolio quality. The most important control is the organization of the approval procedure for loan applications. Credit scoring is studied as a control acting on the dynamic system. Different formalizations of "good" and "bad" consumers are proposed in connection with the Markov chain model.
Marathon: An Open Source Software Library for the Analysis of Markov-Chain Monte Carlo Algorithms
Rechner, Steffen; Berger, Annabell
2016-01-01
We present the software library marathon, which is designed to support the analysis of sampling algorithms that are based on the Markov-Chain Monte Carlo principle. The main application of this library is the computation of properties of so-called state graphs, which represent the structure of Markov chains. We demonstrate applications and the usefulness of marathon by investigating the quality of several bounding methods on four well-known Markov chains for sampling perfect matchings and bipartite graphs. In a set of experiments, we compute the total mixing time and several of its bounds for a large number of input instances. We find that the upper bound gained by the famous canonical path method is often several orders of magnitude larger than the total mixing time and deteriorates with growing input size. In contrast, the spectral bound is found to be a precise approximation of the total mixing time. PMID:26824442
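The two quantities marathon compares can be reproduced for a small chain directly; the sketch below (an independent illustration, not marathon's code) computes the total mixing time by brute-force matrix powers and a standard spectral upper bound from the second-largest eigenvalue modulus, for a lazy random walk on a 6-cycle.

```python
import numpy as np

# A small reversible chain: lazy random walk on a 6-cycle.
n = 6
P = np.zeros((n, n))
for i in range(n):
    P[i, i] = 0.5
    P[i, (i - 1) % n] += 0.25
    P[i, (i + 1) % n] += 0.25

pi = np.full(n, 1.0 / n)        # uniform stationary distribution
eps = 0.25

# Total mixing time: smallest t with max_x ||P^t(x,.) - pi||_TV <= eps,
# found by brute-force matrix powers (feasible only for tiny state spaces).
Pt, t = np.eye(n), 0
while 0.5 * np.abs(Pt - pi).sum(axis=1).max() > eps:
    Pt, t = Pt @ P, t + 1
print("total mixing time:", t)

# Spectral bound via the second-largest eigenvalue modulus (SLEM):
# t_mix(eps) <= log(1 / (eps * pi_min)) / (1 - SLEM) for reversible chains.
lam = np.sort(np.abs(np.linalg.eigvals(P)))[-2]
print("SLEM:", lam, " spectral upper bound:",
      np.log(1.0 / (eps * pi.min())) / (1.0 - lam))
```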
Peng, Zhihang; Bao, Changjun; Zhao, Yang; Yi, Honggang; Xia, Letian; Yu, Hao; Shen, Hongbing; Chen, Feng
2010-01-01
This paper first applies the sequential cluster method to set up a classification standard for infectious disease incidence states, based on the fact that the incidence course has many uncertain characteristics. The paper then presents a weighted Markov chain, a method used to predict the future incidence state. This method takes the standardized self-correlation coefficients as weights, based on the special characteristic of infectious disease incidence being a dependent stochastic variable. It also analyzes the characteristics of infectious disease incidence via the Markov chain Monte Carlo method to optimize the long-term benefit of decisions. Our method is successfully validated using existing incidence data for infectious diseases in Jiangsu Province. In summary, this paper proposes ways to improve the accuracy of the weighted Markov chain, specifically in the field of infection epidemiology. PMID:23554632
Peng, Zhihang; Bao, Changjun; Zhao, Yang; Yi, Honggang; Xia, Letian; Yu, Hao; Shen, Hongbing; Chen, Feng
2010-05-01
This paper first applies the sequential cluster method to set up a classification standard for infectious disease incidence states, based on the fact that the incidence course has many uncertain characteristics. The paper then presents a weighted Markov chain, a method used to predict the future incidence state. This method takes the standardized self-correlation coefficients as weights, based on the special characteristic of infectious disease incidence being a dependent stochastic variable. It also analyzes the characteristics of infectious disease incidence via the Markov chain Monte Carlo method to optimize the long-term benefit of decisions. Our method is successfully validated using existing incidence data for infectious diseases in Jiangsu Province. In summary, this paper proposes ways to improve the accuracy of the weighted Markov chain, specifically in the field of infection epidemiology.
Canonical Structure and Orthogonality of Forces and Currents in Irreversible Markov Chains
NASA Astrophysics Data System (ADS)
Kaiser, Marcus; Jack, Robert L.; Zimmer, Johannes
2018-03-01
We discuss a canonical structure that provides a unifying description of dynamical large deviations for irreversible finite state Markov chains (continuous time), Onsager theory, and Macroscopic Fluctuation Theory (MFT). For Markov chains, this theory involves a non-linear relation between probability currents and their conjugate forces. Within this framework, we show how the forces can be split into two components, which are orthogonal to each other, in a generalised sense. This splitting allows a decomposition of the pathwise rate function into three terms, which have physical interpretations in terms of dissipation and convergence to equilibrium. Similar decompositions hold for rate functions at level 2 and level 2.5. These results clarify how bounds on entropy production and fluctuation theorems emerge from the underlying dynamical rules. We discuss how these results for Markov chains are related to similar structures within MFT, which describes hydrodynamic limits of such microscopic models.
An example of complex modelling in dentistry using Markov chain Monte Carlo (MCMC) simulation.
Helfenstein, Ulrich; Menghini, Giorgio; Steiner, Marcel; Murati, Francesca
2002-09-01
In the usual regression setting one regression line is computed for a whole data set. In more complex situations, each person may be observed, for example, at several points in time, so that a regression line might be calculated for each person. Additional complexities, such as various forms of errors in covariables, may make a straightforward statistical evaluation difficult or even impossible. During recent years methods have been developed that allow convenient analysis of problems where the data and the corresponding models show these and many other forms of complexity. The methodology makes use of a Bayesian approach and Markov chain Monte Carlo (MCMC) simulations. The methods allow the construction of increasingly elaborate models built up from local sub-models. The essential structure of the models can be represented visually by directed acyclic graphs (DAGs). This attractive property allows the essential structure and substantive meaning of a complex model to be communicated and discussed without algebra. After presentation of the statistical methods, an example from dentistry is given to demonstrate their application and use. The data set of the example had a complex structure: each child in a cohort was followed up over several years, and the number of new fillings in permanent teeth was recorded at several ages. The dependent variables were markedly different from the normal distribution and could not be transformed to normality. In addition, explanatory variables were assumed to be measured with different forms of error. It is illustrated how the corresponding models can be estimated conveniently via MCMC simulation, in particular Gibbs sampling, using the freely available software BUGS, and how measurement error may influence the estimates of the corresponding coefficients. It is demonstrated that the effect of the independent variable on the dependent variable may be markedly underestimated if the measurement error is not taken into account ('regression dilution bias'). Markov chain Monte Carlo methods may be of great value to dentists in allowing analysis of data sets which exhibit a wide range of different forms of complexity.
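The regression dilution effect mentioned above is easy to demonstrate by simulation: with classical measurement error of variance σu² added to a covariate of variance σx², the OLS slope is attenuated by the reliability ratio σx²/(σx² + σu²). The sketch below uses invented values.

```python
import numpy as np

rng = np.random.default_rng(5)
n, true_slope, sd_err = 10_000, 2.0, 0.8

x = rng.normal(0.0, 1.0, n)                  # true covariate
y = 1.0 + true_slope * x + rng.normal(0.0, 0.5, n)
x_obs = x + rng.normal(0.0, sd_err, n)       # covariate measured with error

def ols_slope(u, v):
    c = np.cov(u, v)
    return c[0, 1] / c[0, 0]

lam = 1.0 / (1.0 + sd_err ** 2)              # reliability ratio var(x)/var(x_obs)
print("slope, true covariate: ", ols_slope(x, y))       # ~ 2.0
print("slope, noisy covariate:", ols_slope(x_obs, y))   # ~ 2.0 * lam
print("attenuation predicted: ", true_slope * lam)
```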
Wan, Wai-Yin; Chan, Jennifer S K
2009-08-01
For time series of count data, correlated measurements, clustering and excessive zeros occur simultaneously in biomedical applications. Ignoring such effects might lead to misleading conclusions about treatment outcomes. A generalized mixture Poisson geometric process (GMPGP) model and a zero-altered mixture Poisson geometric process (ZMPGP) model are developed from the geometric process model, which was originally proposed for modelling positive continuous data and was later extended to handle count data. These models are motivated by the evaluation of the trend development of new tumour counts for bladder cancer patients, as well as by the identification of useful covariates which affect the count level. The models are implemented using Bayesian methods with Markov chain Monte Carlo (MCMC) algorithms and are assessed using the deviance information criterion (DIC).
RadVel: The Radial Velocity Modeling Toolkit
NASA Astrophysics Data System (ADS)
Fulton, Benjamin J.; Petigura, Erik A.; Blunt, Sarah; Sinukoff, Evan
2018-04-01
RadVel is an open-source Python package for modeling Keplerian orbits in radial velocity (RV) timeseries. RadVel provides a convenient framework to fit RVs using maximum a posteriori optimization and to compute robust confidence intervals by sampling the posterior probability density via Markov Chain Monte Carlo (MCMC). RadVel allows users to float or fix parameters, impose priors, and perform Bayesian model comparison. We have implemented real-time MCMC convergence tests to ensure adequate sampling of the posterior. RadVel can output a number of publication-quality plots and tables. Users may interface with RadVel through a convenient command-line interface or directly from Python. The code is object-oriented and thus naturally extensible. We encourage contributions from the community. Documentation is available at http://radvel.readthedocs.io.
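For orientation, the model RadVel fits can be sketched in its simplest, circular-orbit form (the general Keplerian case adds eccentricity via Kepler's equation); the code below is an independent illustration with invented data and does not use RadVel's API.

```python
import numpy as np

# Circular-orbit special case of the Keplerian RV signal.
def rv_model(t, per, t0, k, gamma):
    """RV (m/s) at times t: period per (days), reference epoch t0 (days),
    semi-amplitude k (m/s), systemic velocity gamma (m/s)."""
    return k * np.sin(2.0 * np.pi * (t - t0) / per) + gamma

# Gaussian log-likelihood of an RV time series given orbital parameters;
# this is the quantity an MCMC sampler would explore over the posterior.
def log_likelihood(params, t, rv, rv_err):
    resid = rv - rv_model(t, *params)
    return -0.5 * np.sum((resid / rv_err) ** 2 + np.log(2 * np.pi * rv_err ** 2))

# Synthetic data set and a likelihood evaluation at the true parameters.
rng = np.random.default_rng(7)
t = np.sort(rng.uniform(0, 60, 40))
truth = (10.0, 2.0, 5.0, -3.0)     # per, t0, k, gamma (invented)
rv = rv_model(t, *truth) + rng.normal(0, 1.5, t.size)
print(log_likelihood(truth, t, rv, np.full(t.size, 1.5)))
```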
Continuous-time discrete-space models for animal movement
Hanks, Ephraim M.; Hooten, Mevin B.; Alldredge, Mat W.
2015-01-01
The processes influencing animal movement and resource selection are complex and varied. Past efforts to model behavioral changes over time used Bayesian statistical models with variable parameter space, such as reversible-jump Markov chain Monte Carlo approaches, which are computationally demanding and inaccessible to many practitioners. We present a continuous-time discrete-space (CTDS) model of animal movement that can be fit using standard generalized linear modeling (GLM) methods. This CTDS approach allows for the joint modeling of location-based as well as directional drivers of movement. Changing behavior over time is modeled using a varying-coefficient framework which maintains the computational simplicity of a GLM approach, and variable selection is accomplished using a group lasso penalty. We apply our approach to a study of two mountain lions (Puma concolor) in Colorado, USA.
Building an ACT-R Reader for Eye-Tracking Corpus Data.
Dotlačil, Jakub
2018-01-01
Cognitive architectures have often been applied to data from individual experiments. In this paper, I develop an ACT-R reader that can model a much larger set of data, eye-tracking corpus data. It is shown that the resulting model has a good fit to the data for the considered low-level processes. Unlike previous related works (most prominently, Engelmann, Vasishth, Engbert & Kliegl), the model achieves the fit by estimating free parameters of ACT-R using Bayesian estimation and Markov-Chain Monte Carlo (MCMC) techniques, rather than by relying on the mix of manual selection + default values. The method used in the paper is generalizable beyond this particular model and data set and could be used on other ACT-R models. Copyright © 2017 Cognitive Science Society, Inc.
Analysis of Spin Financial Market by GARCH Model
NASA Astrophysics Data System (ADS)
Takaishi, Tetsuya
2013-08-01
A spin model is used for simulations of financial markets. To determine return volatility in the spin financial market we use the GARCH model often used for volatility estimation in empirical finance. We apply the Bayesian inference performed by the Markov Chain Monte Carlo method to the parameter estimation of the GARCH model. It is found that volatility determined by the GARCH model exhibits "volatility clustering" also observed in the real financial markets. Using volatility determined by the GARCH model we examine the mixture-of-distribution hypothesis (MDH) suggested for the asset return dynamics. We find that the returns standardized by volatility are approximately standard normal random variables. Moreover we find that the absolute standardized returns show no significant autocorrelation. These findings are consistent with the view of the MDH for the return dynamics.
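The GARCH(1,1) recursion underlying the analysis, and the MDH check on standardized returns, can be sketched directly; the parameter values below are invented for illustration, whereas the paper estimates them by Bayesian MCMC from the spin-market returns.

```python
import numpy as np

rng = np.random.default_rng(6)

# GARCH(1,1): r_t = sigma_t * z_t,  sigma_t^2 = w + a*r_{t-1}^2 + b*sigma_{t-1}^2
w, a, b = 0.05, 0.1, 0.85          # illustrative parameters (a + b < 1)
n = 2000
r, sig2 = np.zeros(n), np.zeros(n)
sig2[0] = w / (1 - a - b)          # start at the unconditional variance
for t in range(1, n):
    sig2[t] = w + a * r[t - 1] ** 2 + b * sig2[t - 1]
    r[t] = np.sqrt(sig2[t]) * rng.normal()

# MDH check: returns standardized by volatility should be ~ N(0,1), with
# the volatility clustering (autocorrelated |r|) removed.
z = r[1:] / np.sqrt(sig2[1:])
def acf1(x):
    x = x - x.mean()
    return x[:-1] @ x[1:] / (x @ x)
print("std of standardized returns:", z.std())
print("lag-1 acf |r|:", acf1(np.abs(r)), " lag-1 acf |z|:", acf1(np.abs(z)))
```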
Sebastian, Tunny; Jeyaseelan, Visalakshi; Jeyaseelan, Lakshmanan; Anandan, Shalini; George, Sebastian; Bangdiwala, Shrikant I
2018-01-01
Hidden Markov models are stochastic models in which the observations are assumed to follow a mixture distribution, but the parameters of the components are governed by a Markov chain which is unobservable. The issues related to the estimation of Poisson-hidden Markov models, in which the observations come from a mixture of Poisson distributions and the parameters of the component Poisson distributions are governed by an m-state Markov chain with an unknown transition probability matrix, are explained here. These methods were applied to data on Vibrio cholerae counts reported every month over an 11-year span at Christian Medical College, Vellore, India. Using the Viterbi algorithm, the best estimate of the state sequence was obtained, and hence the transition probability matrix. The mean passage times between the states were estimated, and their 95% confidence intervals were obtained via Monte Carlo simulation. The three hidden states of the estimated Markov chain are labelled 'Low', 'Moderate' and 'High', with mean counts of 1.4, 6.6 and 20.2 and estimated average durations of stay of 3, 3 and 4 months, respectively. Environmental risk factors were studied using Markov ordinal logistic regression analysis. No significant association was found between disease severity levels and climate components.
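The Viterbi decoding step translates into a short dynamic program over log-probabilities. In the sketch below, the three Poisson means echo the paper's Low/Moderate/High labels (1.4, 6.6 and 20.2), but the transition matrix, initial distribution, and count series are invented for illustration.

```python
import numpy as np
from scipy.stats import poisson

# Viterbi decoding for an m-state HMM with Poisson emissions (log domain).
def viterbi_poisson(counts, log_A, log_init, lams):
    m, T = len(lams), len(counts)
    log_e = poisson.logpmf(np.asarray(counts)[None, :],
                           np.asarray(lams)[:, None])       # (m, T)
    delta = log_init + log_e[:, 0]
    back = np.zeros((T, m), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A      # scores[i, j]: from state i to j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_e[:, t]
    path = np.zeros(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 1, 0, -1):            # backtrack the best path
        path[t - 1] = back[t, path[t]]
    return path

A = np.array([[0.7, 0.25, 0.05],             # invented transition matrix
              [0.2, 0.6, 0.2],
              [0.05, 0.25, 0.7]])
lams = [1.4, 6.6, 20.2]                      # Low / Moderate / High means
counts = [0, 2, 1, 5, 8, 7, 22, 18, 25, 6, 3, 1]
print(viterbi_poisson(counts, np.log(A), np.log([1 / 3] * 3), lams))
```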
A New Approach to Predict user Mobility Using Semantic Analysis and Machine Learning.
Fernandes, Roshan; D'Souza G L, Rio
2017-10-19
Mobility prediction is a technique in which the future location of a user is identified in a given network. Mobility prediction provides solutions to many day-to-day problems: it helps in seamless handovers in wireless networks, provides better location-based services, and supports path recalculation in Mobile Ad hoc Networks (MANET). In the present study, a framework is presented which predicts user mobility both in the presence and in the absence of mobility history. A naïve Bayesian classification algorithm and a Markov model are used to predict the user's future location when mobility history is available. An attempt is made to predict the user's future location using Short Message Service (SMS) content and instantaneous geographical coordinates when mobility patterns are absent. The proposed technique is compared on performance metrics with the commonly used Markov chain model. From the experimental results it is evident that the techniques used in this work give better results when both spatial and temporal information are considered. The proposed method also predicts the user's future location fairly well in the absence of mobility history. The proposed work is applied to predict the mobility of medical rescue vehicles and in social security systems.
Vehtari, Aki; Mäkinen, Ville-Petteri; Soininen, Pasi; Ingman, Petri; Mäkelä, Sanna M; Savolainen, Markku J; Hannuksela, Minna L; Kaski, Kimmo; Ala-Korpela, Mika
2007-01-01
Background A key challenge in metabonomics is to uncover quantitative associations between multidimensional spectroscopic data and biochemical measures used for disease risk assessment and diagnostics. Here we focus on clinically relevant estimation of lipoprotein lipids by 1H NMR spectroscopy of serum. Results A Bayesian methodology, with a biochemical motivation, is presented for a real 1H NMR metabonomics data set of 75 serum samples. Lipoprotein lipid concentrations were independently obtained for these samples via ultracentrifugation and specific biochemical assays. The Bayesian models were constructed by Markov chain Monte Carlo (MCMC) and they showed remarkably good quantitative performance, the predictive R-values being 0.985 for the very low density lipoprotein triglycerides (VLDL-TG), 0.787 for the intermediate, 0.943 for the low, and 0.933 for the high density lipoprotein cholesterol (IDL-C, LDL-C and HDL-C, respectively). The modelling produced a kernel-based reformulation of the data, the parameters of which coincided with the well-known biochemical characteristics of the 1H NMR spectra; particularly for VLDL-TG and HDL-C the Bayesian methodology was able to clearly identify the most characteristic resonances within the heavily overlapping information in the spectra. For IDL-C and LDL-C the resulting model kernels were more complex than those for VLDL-TG and HDL-C, probably reflecting the severe overlap of the IDL and LDL resonances in the 1H NMR spectra. Conclusion The systematic use of Bayesian MCMC analysis is computationally demanding. Nevertheless, the combination of high-quality quantification and the biochemical rationale of the resulting models is expected to be useful in the field of metabonomics. PMID:17493257
Hierarchical Bayesian spatial models for multispecies conservation planning and monitoring.
Carroll, Carlos; Johnson, Devin S; Dunk, Jeffrey R; Zielinski, William J
2010-12-01
Biologists who develop and apply habitat models are often familiar with the statistical challenges posed by their data's spatial structure but are unsure of whether the use of complex spatial models will increase the utility of model results in planning. We compared the relative performance of nonspatial and hierarchical Bayesian spatial models for three vertebrate and invertebrate taxa of conservation concern (Church's sideband snails [Monadenia churchi], red tree voles [Arborimus longicaudus], and Pacific fishers [Martes pennanti pacifica]) that provide examples of a range of distributional extents and dispersal abilities. We used presence-absence data derived from regional monitoring programs to develop models with both landscape and site-level environmental covariates. We used Markov chain Monte Carlo algorithms and a conditional autoregressive or intrinsic conditional autoregressive model framework to fit spatial models. The fit of Bayesian spatial models was between 35 and 55% better than the fit of nonspatial analogue models. Bayesian spatial models outperformed analogous models developed with maximum entropy (Maxent) methods. Although the best spatial and nonspatial models included similar environmental variables, spatial models provided estimates of residual spatial effects that suggested how ecological processes might structure distribution patterns. Spatial models built from presence-absence data improved fit most for localized endemic species with ranges constrained by poorly known biogeographic factors and for widely distributed species suspected to be strongly affected by unmeasured environmental variables or population processes. By treating spatial effects as a variable of interest rather than a nuisance, hierarchical Bayesian spatial models, especially when they are based on a common broad-scale spatial lattice (here the national Forest Inventory and Analysis grid of 24-km² hexagons), can increase the relevance of habitat models to multispecies conservation planning. Journal compilation © 2010 Society for Conservation Biology. No claim to original US government works.
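For readers unfamiliar with the conditional autoregressive (CAR) framework mentioned above, the following sketch (illustrative only, with a hypothetical four-site adjacency; not the authors' model) builds the standard proper-CAR precision matrix Q = tau*(D - rho*W) used to model residual spatial effects, and draws one realization of the spatial random effect:

```python
import numpy as np

# Hypothetical adjacency for 4 sites on a line: 1-2-3-4
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(W.sum(axis=1))          # number of neighbours of each site
tau, rho = 1.0, 0.9                 # precision and spatial-dependence parameters
Q = tau * (D - rho * W)             # proper CAR precision matrix

# One realization of the zero-mean Gaussian Markov random field
cov = np.linalg.inv(Q)
rng = np.random.default_rng(0)
effects = rng.multivariate_normal(np.zeros(4), cov)
print(effects)
```

In a full hierarchical model these effects enter the linear predictor alongside the environmental covariates, and rho and tau are sampled by MCMC rather than fixed.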
Borchani, Hanen; Bielza, Concha; Martínez-Martín, Pablo; Larrañaga, Pedro
2012-12-01
Multi-dimensional Bayesian network classifiers (MBCs) are probabilistic graphical models recently proposed to deal with multi-dimensional classification problems, where each instance in the data set has to be assigned to more than one class variable. In this paper, we propose a Markov blanket-based approach for learning MBCs from data. Basically, it consists of determining the Markov blanket around each class variable using the HITON algorithm, then specifying the directionality over the MBC subgraphs. Our approach is applied to the prediction problem of the European Quality of Life-5 Dimensions (EQ-5D) from the 39-item Parkinson's Disease Questionnaire (PDQ-39) in order to estimate the health-related quality of life of Parkinson's patients. Fivefold cross-validation experiments were carried out on randomly generated synthetic data sets, on the Yeast data set, and on a real-world Parkinson's disease data set containing 488 patients. The experimental study, including comparison with additional Bayesian network-based approaches, back propagation for multi-label learning, multi-label k-nearest neighbor, multinomial logistic regression, ordinary least squares, and censored least absolute deviations, shows encouraging results in terms of predictive accuracy as well as the identification of dependence relationships among class and feature variables. Copyright © 2012 Elsevier Inc. All rights reserved.
Markov Chain Estimation of Avian Seasonal Fecundity
To explore the consequences of modeling decisions on inference about avian seasonal fecundity we generalize previous Markov chain (MC) models of avian nest success to formulate two different MC models of avian seasonal fecundity that represent two different ways to model renestin...
Markov Chain Model with Catastrophe to Determine Mean Time to Default of Credit Risky Assets
NASA Astrophysics Data System (ADS)
Dharmaraja, Selvamuthu; Pasricha, Puneet; Tardelli, Paola
2017-11-01
This article deals with the problem of probabilistic prediction of the time to default for a firm. To model the credit risk, the dynamics of an asset are described as a function of a homogeneous discrete-time Markov chain subject to a catastrophe, the default. The behaviour of the Markov chain is investigated, and the mean time to default is expressed in closed form. A methodology to estimate the parameters is given. Numerical results illustrate the applicability of the proposed model to real data, and their analysis is discussed.
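As a sketch of the underlying computation (not the authors' closed-form result), the mean time to default in a discrete-time chain with an absorbing default state follows from the fundamental matrix; the three-state rating example below is hypothetical:

```python
import numpy as np

# Hypothetical one-step transition matrix over ratings A, B and Default (absorbing)
P = np.array([[0.90, 0.08, 0.02],
              [0.10, 0.80, 0.10],
              [0.00, 0.00, 1.00]])

Q = P[:2, :2]                        # transitions among transient (non-default) states
N = np.linalg.inv(np.eye(2) - Q)     # fundamental matrix
mean_time_to_default = N @ np.ones(2)
print(mean_time_to_default)          # expected number of steps to default from A and B
```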
NASA Astrophysics Data System (ADS)
Li, Xuesong; Northrop, William F.
2016-04-01
This paper describes a quantitative approach to approximating multiple scattering through an isotropic turbid slab based on Markov chain theory. There is an increasing need to utilize multiple scattering for optical diagnostic purposes; however, existing methods are either inaccurate or computationally expensive. Here, we develop a novel Markov chain approximation approach to solve for the multiple-scattering angular distribution (AD) that can accurately calculate the AD while significantly reducing computational cost compared to Monte Carlo simulation. We expect this work to stimulate ongoing multiple-scattering research and the development of deterministic reconstruction algorithms based on AD measurements.
Exact goodness-of-fit tests for Markov chains.
Besag, J; Mondal, D
2013-06-01
Goodness-of-fit tests are useful in assessing whether a statistical model is consistent with available data. However, the usual χ² asymptotics often fail, either because of the paucity of the data or because a nonstandard test statistic is of interest. In this article, we describe exact goodness-of-fit tests for first- and higher order Markov chains, with particular attention given to time-reversible ones. The tests are obtained by conditioning on the sufficient statistics for the transition probabilities and are implemented by simple Monte Carlo sampling or by Markov chain Monte Carlo. They apply both to single and to multiple sequences and allow a free choice of test statistic. Three examples are given. The first concerns multiple sequences of dry and wet January days for the years 1948-1983 at Snoqualmie Falls, Washington State, and suggests that standard analysis may be misleading. The second one is for a four-state DNA sequence and lends support to the original conclusion that a second-order Markov chain provides an adequate fit to the data. The last one is six-state atomistic data arising in molecular conformational dynamics simulation of solvated alanine dipeptide and points to strong evidence against a first-order reversible Markov chain at 6 picosecond time steps. © 2013, The International Biometric Society.
First and second order semi-Markov chains for wind speed modeling
NASA Astrophysics Data System (ADS)
Prattico, F.; Petroni, F.; D'Amico, G.
2012-04-01
The increasing interest in renewable energy leads scientific research to seek better ways to recover as much of the available energy as possible. In particular, the maximum energy recoverable from wind is 59.3% of that available (the Betz limit), attained at a specific pitch angle and when the ratio between output and input wind speed equals 1/3. The pitch angle is the angle between the airfoil of the turbine blade and the wind direction. Older turbines, and many still on the market, have a fixed airfoil geometry, so they operate at an efficiency below 59.3%. New-generation wind turbines, instead, can vary the pitch angle by rotating the blades, which lets them recover the maximum energy at different wind speeds by working at the Betz limit over a range of speed ratios. A powerful pitch-angle control system allows the turbine to recover energy more effectively in transient regimes. A good stochastic model for wind speed is therefore needed, both to help optimize turbine design and to assist the control system in predicting the wind speed so that the blades can be positioned quickly and correctly. The possibility of generating synthetic wind speed data is also a powerful instrument for verifying turbine structures and for estimating the energy recoverable from a specific site. To generate synthetic data, Markov chains of first or higher order are often used [1,2,3]. In particular, [3] presents a comparison between a first-order and a second-order Markov chain. A similar study, restricted to the first-order case, is conducted in [2], which presents the transition probability matrix and compares the energy spectral density and autocorrelation of real and synthetic wind speed data. An attempt to jointly model wind speed and direction is presented in [1], using a first-order Markov chain with different numbers of states together with a Weibull distribution. All these models use Markov chains to generate synthetic wind speed time series, but the search for a better model is still open. Approaching this issue, we applied new models that generalize Markov models; more precisely, we applied semi-Markov models to generate synthetic wind speed time series. Semi-Markov processes (SMPs) are a wide class of stochastic processes that generalize both Markov chains and renewal processes. Their main advantage is that any type of waiting-time distribution can be used to model the time until a transition from one state to another. This greater flexibility has a price: more data are needed to estimate the larger number of parameters. Data availability is not an issue in wind speed studies, so semi-Markov models can be used in a statistically efficient way.
In this work we present three different semi-Markov chain models. The first is a first-order SMP in which the transition probability between two speed states (at times Tn-1 and Tn) depends on the initial state (at Tn-1), the final state (at Tn), and the waiting time t = Tn - Tn-1. The second is a second-order SMP in which the transition probabilities also depend on the state the wind speed occupied before the initial state (at Tn-2). The last is again a second-order SMP in which the transition probabilities depend on the three states at Tn-2, Tn-1 and Tn and on the waiting times t1 = Tn-1 - Tn-2 and t2 = Tn - Tn-1. The three models are used to generate synthetic wind speed time series by means of Monte Carlo simulations, and the time-lagged autocorrelation is used to compare the statistical properties of the proposed models with those of real data and with a time series generated through a simple Markov chain. [1] F. Youcef Ettoumi, H. Sauvageot, A.-E.-H. Adane, Statistical bivariate modeling of wind using first-order Markov chain and Weibull distribution, Renewable Energy 28 (2003) 1787-1802. [2] A. Shamshad, M.A. Bawadi, W.M.W. Wan Hussin, T.A. Majid, S.A.M. Sanusi, First and second order Markov chain models for synthetic generation of wind speed time series, Energy 30 (2005) 693-708. [3] H. Nfaoui, H. Essiarab, A.A.M. Sayigh, A stochastic Markov chain model for simulating wind speed time series at Tangiers, Morocco, Renewable Energy 29 (2004) 1407-1418.
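To illustrate what a semi-Markov chain adds over an ordinary Markov chain, here is a minimal sketch (illustrative assumptions throughout: three hypothetical speed states, a hypothetical embedded transition matrix, and Weibull waiting times) of generating a synthetic state series from a first-order semi-Markov chain:

```python
import numpy as np

rng = np.random.default_rng(42)

states = ["low", "medium", "high"]
# Hypothetical embedded-chain transition probabilities (no self-transitions)
P = np.array([[0.0, 0.8, 0.2],
              [0.5, 0.0, 0.5],
              [0.3, 0.7, 0.0]])
# Hypothetical Weibull waiting-time scales (hours) for each origin state
scales = np.array([6.0, 4.0, 3.0])

def simulate_semi_markov(n_hours, start=0, shape=1.5):
    series, t, s = [], 0, start
    while t < n_hours:
        # Waiting time in the current state: any distribution is allowed,
        # which is exactly the extra freedom semi-Markov models provide.
        w = max(1, int(round(scales[s] * rng.weibull(shape))))
        series.extend([s] * min(w, n_hours - t))
        t += w
        s = rng.choice(3, p=P[s])      # embedded Markov chain jump
    return np.array(series)

sim = simulate_semi_markov(48)
print([states[s] for s in sim])
```

An ordinary Markov chain forces geometric holding times; replacing the Weibull draw above with a geometric one recovers that special case.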
CTPPL: A Continuous Time Probabilistic Programming Language
2009-07-01
In recent years there has been a flurry of interest in continuous time models, mostly focused on continuous time Bayesian networks (CTBNs) [Nodelman, 2007] ... CTBNs are built on homogeneous Markov processes. A homogeneous Markov process is a finite state, continuous time process, consisting of an initial ... Some state transitions can produce emissions. In a CTBN, each variable has a conditional intensity matrix Qu for every combination of ...
Bayesian probabilistic population projections for all countries.
Raftery, Adrian E; Li, Nan; Ševčíková, Hana; Gerland, Patrick; Heilig, Gerhard K
2012-08-28
Projections of countries' future populations, broken down by age and sex, are widely used for planning and research. They are mostly done deterministically, but there is a widespread need for probabilistic projections. We propose a Bayesian method for probabilistic population projections for all countries. The total fertility rate and female and male life expectancies at birth are projected probabilistically using Bayesian hierarchical models estimated via Markov chain Monte Carlo using United Nations population data for all countries. These are then converted to age-specific rates and combined with a cohort component projection model. This yields probabilistic projections of any population quantity of interest. The method is illustrated for five countries at different demographic stages and of different continents and sizes. The method is validated by an out-of-sample experiment in which data from 1950-1990 are used for estimation and the method is applied to predict 1990-2010. The method appears reasonably accurate and well calibrated for this period. The results suggest that the current United Nations high and low variants greatly underestimate uncertainty about the number of oldest old from about 2050 and that they underestimate uncertainty for high-fertility countries and overstate uncertainty for countries that have completed the demographic transition and whose fertility has started to recover towards replacement level, mostly in Europe. The results also indicate that the potential support ratio (persons aged 20-64 per person aged 65+) will almost certainly decline dramatically in most countries over the coming decades.
Arcuti, Simona; Pollice, Alessio; Ribecco, Nunziata; D'Onghia, Gianfranco
2016-03-01
We evaluate the spatiotemporal changes in the density of a particular species of crustacean known as the deep-water rose shrimp, Parapenaeus longirostris, based on biological sample data collected during trawl surveys carried out from 1995 to 2006 as part of the international project MEDITS (MEDiterranean International Trawl Surveys). As is the case for many biological variables, density data are continuous and characterized by unusually large amounts of zeros, accompanied by a skewed distribution of the remaining values. Here we analyze the normalized density data with a Bayesian delta-normal semiparametric additive model including the effects of covariates, using penalized regression with low-rank thin-plate splines for nonlinear spatial and temporal effects. Modeling the zero and nonzero values by two joint processes, as we propose in this work, provides great flexibility and easy handling of complex likelihood functions, avoiding the inaccurate statistical inferences that result from misclassifying the high proportion of exact zeros in the model. Bayesian model estimation is obtained by Markov chain Monte Carlo simulations, suitably specifying the complex likelihood function of the zero-inflated density data. The study highlights relevant nonlinear spatial and temporal effects and the influence of the annual Mediterranean oscillations index and of the sea surface temperature on the distribution of the deep-water rose shrimp density. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
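As a hedged sketch of the delta-model idea described above (a simplified two-part likelihood with hypothetical haul data; the record's model is delta-normal with semiparametric additive terms), the density is modeled as a point mass at zero mixed with a lognormal for the positive values:

```python
import numpy as np
from scipy.stats import norm

def delta_lognormal_loglik(y, p_zero, mu, sigma):
    """Two-part ('delta') log-likelihood: P(y=0)=p_zero, y>0 ~ lognormal(mu, sigma)."""
    y = np.asarray(y, dtype=float)
    ll = np.sum(y == 0) * np.log(p_zero)
    pos = y[y > 0]
    # lognormal log-density = normal log-density of log(y) minus log(y)
    ll += np.sum(np.log1p(-p_zero) + norm.logpdf(np.log(pos), mu, sigma)
                 - np.log(pos))
    return ll

# Hypothetical haul densities with many exact zeros
y = np.array([0, 0, 0, 1.2, 0, 3.4, 0, 0.7, 5.1, 0])
print(delta_lognormal_loglik(y, p_zero=0.6, mu=0.3, sigma=0.8))
```

In the full model, p_zero and the location parameter would each be regressed on spatial, temporal and environmental covariates and sampled jointly by MCMC.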
Bayesian inference for the spatio-temporal invasion of alien species.
Cook, Alex; Marion, Glenn; Butler, Adam; Gibson, Gavin
2007-08-01
In this paper we develop a Bayesian approach to parameter estimation in a stochastic spatio-temporal model of the spread of invasive species across a landscape. To date, statistical techniques, such as logistic and autologistic regression, have outstripped stochastic spatio-temporal models in their ability to handle large numbers of covariates. Here we seek to address this problem by making use of a range of covariates describing the bio-geographical features of the landscape. Relative to regression techniques, stochastic spatio-temporal models are more transparent in their representation of biological processes. They also explicitly model temporal change, and therefore do not require the assumption that the species' distribution (or other spatial pattern) has already reached equilibrium as is often the case with standard statistical approaches. In order to illustrate the use of such techniques we apply them to the analysis of data detailing the spread of an invasive plant, Heracleum mantegazzianum, across Britain in the 20th Century using geo-referenced covariate information describing local temperature, elevation and habitat type. The use of Markov chain Monte Carlo sampling within a Bayesian framework facilitates statistical assessments of differences in the suitability of different habitat classes for H. mantegazzianum, and enables predictions of future spread to account for parametric uncertainty and system variability. Our results show that ignoring such covariate information may lead to biased estimates of key processes and implausible predictions of future distributions.
Allen, Bruce C; Hack, C Eric; Clewell, Harvey J
2007-08-01
A Bayesian approach, implemented using Markov Chain Monte Carlo (MCMC) analysis, was applied with a physiologically-based pharmacokinetic (PBPK) model of methylmercury (MeHg) to evaluate the variability of MeHg exposure in women of childbearing age in the U.S. population. The analysis made use of the newly available National Health and Nutrition Survey (NHANES) blood and hair mercury concentration data for women of age 16-49 years (sample size, 1,582). Bayesian analysis was performed to estimate the population variability in MeHg exposure (daily ingestion rate) implied by the variation in blood and hair concentrations of mercury in the NHANES database. The measured variability in the NHANES blood and hair data represents the result of a process that includes interindividual variation in exposure to MeHg and interindividual variation in the pharmacokinetics (distribution, clearance) of MeHg. The PBPK model includes a number of pharmacokinetic parameters (e.g., tissue volumes, partition coefficients, rate constants for metabolism and elimination) that can vary from individual to individual within the subpopulation of interest. Using MCMC analysis, it was possible to combine prior distributions of the PBPK model parameters with the NHANES blood and hair data, as well as with kinetic data from controlled human exposures to MeHg, to derive posterior distributions that refine the estimates of both the population exposure distribution and the pharmacokinetic parameters. In general, based on the populations surveyed by NHANES, the results of the MCMC analysis indicate that a small fraction, less than 1%, of the U.S. population of women of childbearing age may have mercury exposures greater than the EPA RfD for MeHg of 0.1 microg/kg/day, and that there are few, if any, exposures greater than the ATSDR MRL of 0.3 microg/kg/day. The analysis also indicates that typical exposures may be greater than previously estimated from food consumption surveys, but that the variability in exposure within the population of U.S. women of childbearing age may be less than previously assumed.
NASA Technical Reports Server (NTRS)
Mengshoel, Ole J.; Roth, Dan; Wilkins, David C.
2001-01-01
Portfolio methods support the combination of different algorithms and heuristics, including stochastic local search (SLS) heuristics, and have been identified as a promising approach to solve computationally hard problems. While successful in experiments, theoretical foundations and analytical results for portfolio-based SLS heuristics are less developed. This article aims to improve the understanding of the role of portfolios of heuristics in SLS. We emphasize the problem of computing most probable explanations (MPEs) in Bayesian networks (BNs). Algorithmically, we discuss a portfolio-based SLS algorithm for MPE computation, Stochastic Greedy Search (SGS). SGS supports the integration of different initialization operators (or initialization heuristics) and different search operators (greedy and noisy heuristics), thereby enabling new analytical and experimental results. Analytically, we introduce a novel Markov chain model tailored to portfolio-based SLS algorithms, including SGS, which enables us to derive expected hitting time results that explain empirical run time results. For a specific BN, we show the benefit of using a homogeneous initialization portfolio. To further illustrate the portfolio approach, we consider novel additive search heuristics for handling determinism in the form of zero entries in conditional probability tables in BNs. Our additive approach adds rather than multiplies probabilities when computing the utility of an explanation. We motivate the additive measure by studying the dramatic impact of zero entries in conditional probability tables on the number of zero-probability explanations, which again complicates the search process. We consider the relationship between MAXSAT and MPE, and show that additive utility (or gain) is a generalization, to the probabilistic setting, of the MAXSAT utility (or gain) used in the celebrated GSAT and WalkSAT algorithms and their descendants. Utilizing our Markov chain framework, we show that expected hitting time is a rational function - i.e., a ratio of two polynomials - of the probability of applying an additive search operator. Experimentally, we report on synthetically generated BNs as well as BNs from applications, and compare SGS's performance to that of Hugin, which performs BN inference by compilation to and propagation in clique trees. On synthetic networks, SGS speeds up computation by approximately two orders of magnitude compared to Hugin. In application networks, our approach is highly competitive in Bayesian networks with a high degree of determinism. In addition to showing that stochastic local search can be competitive with clique tree clustering, our empirical results provide an improved understanding of the circumstances under which portfolio-based SLS outperforms clique tree clustering and vice versa.
Astrometry and exoplanets in the Gaia era: a Bayesian approach to detection and parameter recovery
NASA Astrophysics Data System (ADS)
Ranalli, P.; Hobbs, D.; Lindegren, L.
2018-06-01
The Gaia mission is expected to make a significant contribution to the knowledge of exoplanet systems, both in terms of their number and of their physical properties. We develop Bayesian methods and detection criteria for orbital fitting, and revise the detectability of exoplanets in light of the in-flight properties of Gaia. Limiting ourselves to one-planet systems as a first step of the development, we simulate Gaia data for exoplanet systems over a grid of S/N, orbital period, and eccentricity. The simulations are then fit using Markov chain Monte Carlo methods. We investigate the detection rate according to three information criteria and the Δχ². For the Δχ², the effective number of degrees of freedom depends on the mission length. We find that the choice of the Markov chain starting point can affect the quality of the results; we therefore consider two limiting possibilities: an ideal case, and a very simple method that finds the starting point assuming circular orbits. We use 6644 and 4402 simulations to assess the fraction of false positive detections in a 5 yr and in a 10 yr mission, respectively; and 4968 and 4706 simulations to assess the detection rate and how the parameters are recovered. Using Jeffreys' scale of evidence, the fraction of false positives passing a strong evidence criterion is ≲0.2% (0.6%) when considering a 5 yr (10 yr) mission and using the Akaike information criterion or the Watanabe-Akaike information criterion, and <0.02% (<0.06%) when using the Bayesian information criterion. We find that there is a 50% chance of detecting a planet with a minimum S/N = 2.3 (1.7). This sets the maximum distance to which a planet is detectable to 70 pc and 3.5 pc for Jupiter-mass and Neptune-mass planets, respectively, assuming a 10 yr mission, a 4 au semi-major axis, and a 1 M⊙ star. We show the distribution of the accuracy and precision with which orbital parameters are recovered. The period is the orbital parameter that can be determined with the best accuracy, with a median relative difference between input and output periods of 4.2% (2.9%) assuming a 5 yr (10 yr) mission. The semi-major axis of the orbit can be recovered with a median relative error of 7% (6%). The eccentricity can also be recovered with a median absolute accuracy of 0.07 (0.06).
NASA Astrophysics Data System (ADS)
Smith, T.; Marshall, L.
2007-12-01
In many mountainous regions, the single most important parameter in forecasting the controls on regional water resources is snowpack (Williams et al., 1999). In an effort to bridge the gap between theoretical understanding and functional modeling of snow-driven watersheds, a flexible hydrologic modeling framework is being developed. The aim is to create a suite of models that move from parsimonious structures, concentrated on aggregated watershed response, to those focused on representing finer scale processes and distributed response. This framework will operate as a tool to investigate the link between hydrologic model predictive performance, uncertainty, model complexity, and observable hydrologic processes. Bayesian methods, and particularly Markov chain Monte Carlo (MCMC) techniques, are extremely useful in uncertainty assessment and parameter estimation of hydrologic models. However, these methods have some difficulties in implementation. In a traditional Bayesian setting, it can be difficult to reconcile multiple data types, particularly those offering different spatial and temporal coverage, depending on the model type. These difficulties are also exacerbated by the sensitivity of MCMC algorithms to model initialization and by complex parameter interdependencies. As a way of circumventing some of the computational complications, adaptive MCMC algorithms have been developed to take advantage of the information gained from each successive iteration. Two adaptive algorithms are compared in this study: the Adaptive Metropolis (AM) algorithm, developed by Haario et al. (2001), and the Delayed Rejection Adaptive Metropolis (DRAM) algorithm, developed by Haario et al. (2006). While neither algorithm is truly Markovian, each has been proven to satisfy the desired ergodicity and stationarity properties of Markov chains. Both algorithms were implemented as the uncertainty and parameter estimation framework for a conceptual rainfall-runoff model based on the Probability Distributed Model (PDM), developed by Moore (1985). We implement the modeling framework in the Stringer Creek watershed in the Tenderfoot Creek Experimental Forest (TCEF), Montana. The snowmelt-driven watershed offers the additional challenge of modeling snow accumulation and melt, and current efforts are aimed at developing a temperature- and radiation-index snowmelt model. Auxiliary data available from within TCEF's watersheds are used to support understanding of information value as it relates to predictive performance. Because the model is based on lumped parameters, auxiliary data are hard to incorporate directly; however, these additional data offer benefits through their ability to inform prior distributions of the lumped model parameters. By incorporating data offering different information into the uncertainty assessment process, a cross-validation technique is engaged to better ensure that modeled results reflect real process complexity.
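A minimal sketch of the Adaptive Metropolis idea referenced above (illustrative only: a toy correlated 2-D Gaussian target stands in for the hydrologic posterior, and the tuning constants t0, eps and the 2.4²/d scaling follow Haario et al., 2001) adapts the proposal covariance from the chain's own history:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_post(theta):
    """Toy target: correlated 2-D Gaussian standing in for a real posterior."""
    icov = np.linalg.inv(np.array([[1.0, 0.8], [0.8, 1.0]]))
    return -0.5 * theta @ icov @ theta

def adaptive_metropolis(n_iter=5000, t0=500, eps=1e-6, scale=2.4**2 / 2):
    d = 2
    chain = np.zeros((n_iter, d))
    theta, lp = np.zeros(d), log_post(np.zeros(d))
    cov = np.eye(d) * 0.1                      # initial proposal covariance
    for t in range(n_iter):
        if t > t0:                             # adapt from the chain's history
            cov = scale * (np.cov(chain[:t].T) + eps * np.eye(d))
        prop = rng.multivariate_normal(theta, cov)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis accept/reject
            theta, lp = prop, lp_prop
        chain[t] = theta
    return chain

samples = adaptive_metropolis()
print(samples[1000:].mean(axis=0), np.cov(samples[1000:].T))
```

DRAM augments this scheme with a delayed-rejection stage that retries a rejected proposal from a narrower distribution before moving on.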
A Hierarchical Bayesian Model for Calibrating Estimates of Species Divergence Times
Heath, Tracy A.
2012-01-01
In Bayesian divergence time estimation methods, incorporating calibrating information from the fossil record is commonly done by assigning prior densities to ancestral nodes in the tree. Calibration prior densities are typically parametric distributions offset by minimum age estimates provided by the fossil record. Specification of the parameters of calibration densities requires the user to quantify his or her prior knowledge of the age of the ancestral node relative to the age of its calibrating fossil. The values of these parameters can, potentially, result in biased estimates of node ages if they lead to overly informative prior distributions. Accordingly, determining parameter values that lead to adequate prior densities is not straightforward. In this study, I present a hierarchical Bayesian model for calibrating divergence time analyses with multiple fossil age constraints. This approach applies a Dirichlet process prior as a hyperprior on the parameters of calibration prior densities. Specifically, this model assumes that the rate parameters of exponential prior distributions on calibrated nodes are distributed according to a Dirichlet process, whereby the rate parameters are clustered into distinct parameter categories. Both simulated and biological data are analyzed to evaluate the performance of the Dirichlet process hyperprior. Compared with fixed exponential prior densities, the hierarchical Bayesian approach results in more accurate and precise estimates of internal node ages. When this hyperprior is applied using Markov chain Monte Carlo methods, the ages of calibrated nodes are sampled from mixtures of exponential distributions and uncertainty in the values of calibration density parameters is taken into account. PMID:22334343
Approximate likelihood calculation on a phylogeny for Bayesian estimation of divergence times.
dos Reis, Mario; Yang, Ziheng
2011-07-01
The molecular clock provides a powerful way to estimate species divergence times. If information on some species divergence times is available from the fossil or geological record, it can be used to calibrate a phylogeny and estimate divergence times for all nodes in the tree. The Bayesian method provides a natural framework to incorporate different sources of information concerning divergence times, such as information in the fossil and molecular data. Current models of sequence evolution are intractable in a Bayesian setting, and Markov chain Monte Carlo (MCMC) is used to generate the posterior distribution of divergence times and evolutionary rates. This method is computationally expensive, as it involves the repeated calculation of the likelihood function. Here, we explore the use of Taylor expansion to approximate the likelihood during MCMC iteration. The approximation is much faster than conventional likelihood calculation. However, the approximation is expected to be poor when the proposed parameters are far from the likelihood peak. We explore the use of parameter transforms (square root, logarithm, and arcsine) to improve the approximation to the likelihood curve. We found that the new methods, particularly the arcsine-based transform, provided very good approximations under relaxed clock models and also under the global clock model when the global clock is not seriously violated. The approximation is poorer for analysis under the global clock when the global clock is seriously wrong and should thus not be used. The results suggest that the approximate method may be useful for Bayesian dating analysis using large data sets.
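The core trick is replacing the expensive likelihood with a quadratic Taylor expansion around its peak, optionally after transforming the parameter; the following is a hedged toy sketch (a one-parameter Poisson likelihood with a square-root transform and hypothetical counts, not the phylogenetic likelihood itself):

```python
import numpy as np

counts = np.array([3, 5, 4, 6, 2])             # hypothetical data

def loglik(lam):
    """Poisson log-likelihood (up to an additive constant)."""
    return np.sum(counts * np.log(lam) - lam)

# Expand in s = sqrt(lam) around the MLE: quadratic approximation of l(s)
lam_hat = counts.mean()
s_hat = np.sqrt(lam_hat)
h = 1e-4
# numerical second derivative of l(s) at the peak (first derivative is ~0 there)
d2 = (loglik((s_hat + h)**2) - 2 * loglik(s_hat**2)
      + loglik((s_hat - h)**2)) / h**2

def loglik_approx(lam):
    """Cheap surrogate evaluated during MCMC instead of the full likelihood."""
    s = np.sqrt(lam)
    return loglik(lam_hat) + 0.5 * d2 * (s - s_hat)**2

for lam in [2.0, 4.0, 6.0]:
    print(lam, loglik(lam), loglik_approx(lam))
```

As the record notes, such surrogates are accurate near the peak and degrade far from it, which is why the choice of transform (square root, logarithm, arcsine) matters.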
NASA Astrophysics Data System (ADS)
Han, Feng; Zheng, Yi
2018-06-01
Significant input uncertainty is a major source of error in watershed water quality (WWQ) modeling. It remains challenging to address the input uncertainty in a rigorous Bayesian framework. This study develops the Bayesian Analysis of Input and Parametric Uncertainties (BAIPU), an approach for the joint analysis of input and parametric uncertainties through a tight coupling of Markov chain Monte Carlo (MCMC) analysis and Bayesian Model Averaging (BMA). The formal likelihood function for this approach is derived considering a lag-1 autocorrelated, heteroscedastic, and Skew Exponential Power (SEP) distributed error model. A series of numerical experiments was performed based on a synthetic nitrate pollution case and on a real study case in the Newport Bay Watershed, California. The Soil and Water Assessment Tool (SWAT) and Differential Evolution Adaptive Metropolis (DREAM(ZS)) were used as the representative WWQ model and MCMC algorithm, respectively. The major findings include the following: (1) the BAIPU can be implemented and used to appropriately identify the uncertain parameters and characterize the predictive uncertainty; (2) the compensation effect between the input and parametric uncertainties can seriously mislead modeling-based management decisions if the input uncertainty is not explicitly accounted for; (3) the BAIPU accounts for the interaction between the input and parametric uncertainties and therefore provides more accurate calibration and uncertainty results than a sequential analysis of the uncertainties; and (4) the BAIPU quantifies the credibility of different input assumptions on a statistical basis and can be implemented as an effective inverse modeling approach for the joint inference of parameters and inputs.
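As a hedged sketch of the formal-likelihood ingredient (a Gaussian special case of the lag-1 autocorrelated, heteroscedastic error model; the full SEP version adds shape and skew parameters, and the series below are hypothetical):

```python
import numpy as np

def log_likelihood(obs, sim, phi, sigma0, sigma1):
    """Lag-1 autocorrelated, heteroscedastic Gaussian error model.

    residual_t = phi * residual_{t-1} + eta_t,
    eta_t ~ N(0, sigma_t^2),  sigma_t = sigma0 + sigma1 * sim_t
    """
    res = np.asarray(obs) - np.asarray(sim)
    eta = res[1:] - phi * res[:-1]            # decorrelated innovations
    sigma = sigma0 + sigma1 * np.asarray(sim)[1:]
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * (eta / sigma)**2)

# Hypothetical observed and simulated nitrate series
obs = np.array([1.0, 1.4, 1.1, 2.0, 1.7, 1.5])
sim = np.array([0.9, 1.3, 1.2, 1.8, 1.8, 1.4])
print(log_likelihood(obs, sim, phi=0.5, sigma0=0.05, sigma1=0.1))
```

In the BAIPU setting, an MCMC sampler such as DREAM(ZS) would explore phi, sigma0, sigma1 and the watershed model parameters jointly, with BMA weighing alternative input assumptions.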
DOE Office of Scientific and Technical Information (OSTI.GOV)
La Russa, D
Purpose: The purpose of this project is to develop a robust method of parameter estimation for a Poisson-based TCP model using Bayesian inference. Methods: Bayesian inference was performed using the PyMC3 probabilistic programming framework written in Python. A Poisson-based TCP regression model that accounts for clonogen proliferation was fit to observed rates of local relapse as a function of equivalent dose in 2 Gy fractions for a population of 623 stage-I non-small-cell lung cancer patients. The Slice Markov chain Monte Carlo sampling algorithm was used to sample the posterior distributions, and was initiated using the maximum of the posterior distributions found by optimization. The calculation of TCP at each sampling step required integration over the free parameter α, which was performed using an adaptive 24-point Gauss-Legendre quadrature. Convergence was verified via inspection of the trace plot and posterior distribution for each of the fit parameters, as well as with comparisons of the most probable parameter values with their respective maximum likelihood estimates. Results: Posterior distributions for α, the standard deviation of α (σ), the average tumour cell-doubling time (Td), and the repopulation delay time (Tk) were generated assuming α/β = 10 Gy and a fixed clonogen density of 10^7 cm^-3. Posterior predictive plots generated from samples from these posterior distributions are in excellent agreement with the observed rates of local relapse used in the Bayesian inference. The most probable values of the model parameters also agree well with maximum likelihood estimates. Conclusion: A robust method of performing Bayesian inference of TCP data using a complex TCP model has been established.
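The record's model uses PyMC3's Slice sampler and integrates over a distributed α; as a heavily simplified hedged sketch, the following fits a single-α Poisson TCP dose-response (no proliferation term) with plain random-walk Metropolis, using hypothetical dose-response counts:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical dose-response data: EQD2 (Gy), patients, tumours controlled
doses  = np.array([40.0, 50.0, 60.0, 70.0, 80.0])
n_pat  = np.array([100, 120, 150, 130, 123])
n_ctrl = np.array([55, 80, 120, 115, 115])

N_CLONOGENS = 1e7  # fixed, echoing the fixed clonogen density in the record

def log_post(log_alpha):
    alpha = np.exp(log_alpha)
    tcp = np.exp(-N_CLONOGENS * np.exp(-alpha * doses))   # Poisson TCP
    tcp = np.clip(tcp, 1e-12, 1 - 1e-12)
    loglik = np.sum(n_ctrl * np.log(tcp) + (n_pat - n_ctrl) * np.log(1 - tcp))
    logprior = -0.5 * ((log_alpha - np.log(0.3)) / 1.0) ** 2  # Gaussian prior on log(alpha)
    return loglik + logprior

# Random-walk Metropolis on log(alpha)
x, lp = np.log(0.3), log_post(np.log(0.3))
chain = []
for _ in range(20000):
    prop = x + 0.05 * rng.standard_normal()
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        x, lp = prop, lp_prop
    chain.append(x)
alpha_samples = np.exp(np.array(chain[5000:]))
print(alpha_samples.mean(), np.percentile(alpha_samples, [2.5, 97.5]))
```

The full model additionally places a population distribution on α (hence the quadrature over α at each step) and includes Td and Tk repopulation parameters.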
Yen, A M-F; Liou, H-H; Lin, H-L; Chen, T H-H
2006-01-01
The study aimed to develop a predictive model to deal with data fraught with heterogeneity that cannot be explained by sampling variation or measured covariates. The random-effect Poisson regression model was first proposed to deal with over-dispersion in data fraught with heterogeneity after making allowance for measured covariates. A Bayesian acyclic graphical model in conjunction with the Markov chain Monte Carlo (MCMC) technique was then applied to estimate the parameters of both the relevant covariates and the random effect. A predictive distribution was then generated to compare the predicted with the observed for the Bayesian model with and without the random effect. Data from repeated measurements of episodes among 44 patients with intractable epilepsy were used as an illustration. Applying Poisson regression to the epilepsy data without taking heterogeneity into account yielded a large heterogeneity value (heterogeneity factor = 17.90, deviance = 1485, degrees of freedom (df) = 83). After taking the random effect into account, the heterogeneity factor was greatly reduced (heterogeneity factor = 0.52, deviance = 42.5, df = 81). The Pearson χ² values for the comparison between the expected and observed seizure frequencies at two and three months for the model with and without the random effect were 34.27 (p = 1.00) and 1799.90 (p < 0.0001), respectively. The Bayesian acyclic graphical model using the MCMC method was demonstrated to have great potential for disease prediction when data show over-dispersion attributable either to correlated structure or to subject-to-subject variability.
How Much Can We Learn from a Single Chromatographic Experiment? A Bayesian Perspective.
Wiczling, Paweł; Kaliszan, Roman
2016-01-05
In this work, we proposed and investigated a Bayesian inference procedure to find the desired chromatographic conditions based on known analyte properties (lipophilicity, pKa, and polar surface area) using one preliminary experiment. A previously developed nonlinear mixed-effect model was used to specify the prior information about a new analyte with known physicochemical properties. Further, the prior (no preliminary data) and posterior predictive distributions (prior + one experiment) were determined sequentially to search towards the desired separation. The following isocratic high-performance reversed-phase liquid chromatographic conditions were sought: (1) retention time of a single analyte within the range of 4-6 min and (2) baseline separation of two analytes with retention times within the range of 4-10 min. The empirical posterior Bayesian distribution of parameters was estimated using the "slice sampling" Markov chain Monte Carlo (MCMC) algorithm implemented in Matlab. Simulations with artificial analytes and experimental data for ketoprofen and papaverine were used to test the proposed methodology. The simulation experiment showed that for a single analyte and for two randomly selected analytes, there is a 97% and a 74% probability, respectively, of obtaining a successful chromatogram using no or one preliminary experiment. The desired separation for ketoprofen and papaverine was established based on a single experiment. It was confirmed that the search for a desired separation rarely requires a large number of chromatographic analyses, at least for a simple optimization problem. The proposed Bayesian-based optimization scheme is a powerful method for finding a desired chromatographic separation based on a small number of preliminary experiments.
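A hedged toy version of the prior-to-posterior step (a normal-normal conjugate update on a log retention time with hypothetical prior and noise values; the record's actual model is a nonlinear mixed-effects model sampled by MCMC):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical prior for the analyte's log retention time from a population model
mu0, tau0 = np.log(5.0), 0.5      # prior mean and sd of log t_R
sigma = 0.1                       # assumed measurement sd of log t_R

# One preliminary experiment: observed retention time of 7.5 min
y = np.log(7.5)

# Normal-normal conjugate update
tau_post = 1 / np.sqrt(1 / tau0**2 + 1 / sigma**2)
mu_post = tau_post**2 * (mu0 / tau0**2 + y / sigma**2)

# Posterior predictive probability that t_R lies in the desired 4-6 min window
sd_pred = np.sqrt(tau_post**2 + sigma**2)
p = norm.cdf(np.log(6), mu_post, sd_pred) - norm.cdf(np.log(4), mu_post, sd_pred)
print(f"posterior mean t_R: {np.exp(mu_post):.2f} min, P(4-6 min) = {p:.2f}")
```

The same logic, iterated over candidate mobile-phase conditions, is what lets a single preliminary run sharply narrow the search for a successful chromatogram.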
NASA Astrophysics Data System (ADS)
Figueiredo, Danilo Zucolli; Costa, Oswaldo Luiz do Valle
2017-10-01
This paper deals with the H2 optimal control problem of discrete-time Markov jump linear systems (MJLS) for the case in which the Markov chain takes values in a general Borel space. It is assumed that the controller has access only to an output variable and to the jump parameter. The goal, in this case, is to design a dynamic Markov jump controller such that the H2-norm of the closed-loop system is minimised. It is shown that the H2-norm can be written as the sum of two H2-norms, such that one of them does not depend on the control and the other is obtained from the optimal filter for an infinite-horizon filtering problem. This result can be seen as a separation principle for MJLS with the Markov chain in a general Borel space in the infinite time horizon case.
Quantum Graphical Models and Belief Propagation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Leifer, M.S.; Perimeter Institute for Theoretical Physics, 31 Caroline Street North, Waterloo Ont., N2L 2Y5; Poulin, D.
Belief Propagation algorithms acting on Graphical Models of classical probability distributions, such as Markov Networks, Factor Graphs and Bayesian Networks, are amongst the most powerful known methods for deriving probabilistic inferences amongst large numbers of random variables. This paper presents a generalization of these concepts and methods to the quantum case, based on the idea that quantum theory can be thought of as a noncommutative, operator-valued, generalization of classical probability theory. Some novel characterizations of quantum conditional independence are derived, and definitions of Quantum n-Bifactor Networks, Markov Networks, Factor Graphs and Bayesian Networks are proposed. The structure of Quantum Markov Networks is investigated and some partial characterization results are obtained, along the lines of the Hammersley-Clifford theorem. A Quantum Belief Propagation algorithm is presented and is shown to converge on 1-Bifactor Networks and Markov Networks when the underlying graph is a tree. The use of Quantum Belief Propagation as a heuristic algorithm in cases where it is not known to converge is discussed. Applications to decoding quantum error correcting codes and to the simulation of many-body quantum systems are described.
Risk assessment by dynamic representation of vulnerability, exploitation, and impact
NASA Astrophysics Data System (ADS)
Cam, Hasan
2015-05-01
Assessing and quantifying cyber risk accurately in real-time is essential to providing security and mission assurance in any system and network. This paper presents a modeling and dynamic analysis approach to assessing cyber risk of a network in real-time by representing dynamically its vulnerabilities, exploitations, and impact using integrated Bayesian network and Markov models. Given the set of vulnerabilities detected by a vulnerability scanner in a network, this paper addresses how its risk can be assessed by estimating in real-time the exploit likelihood and impact of vulnerability exploitation on the network, based on real-time observations and measurements over the network. The dynamic representation of the network in terms of its vulnerabilities, sensor measurements, and observations is constructed dynamically using the integrated Bayesian network and Markov models. The transition rates of outgoing and incoming links of states in hidden Markov models are used in determining exploit likelihood and impact of attacks, whereas emission rates help quantify the attack states of vulnerabilities. Simulation results show the quantification and evolving risk scores over time for individual and aggregated vulnerabilities of a network.
Machine learning in sentiment reconstruction of the simulated stock market
NASA Astrophysics Data System (ADS)
Goykhman, Mikhail; Teimouri, Ali
2018-02-01
In this paper we continue the study of the simulated stock market framework defined by the driving sentiment processes. We focus on the market environment driven by the buy/sell trading sentiment process of the Markov chain type. We apply the methodology of the Hidden Markov Models and the Recurrent Neural Networks to reconstruct the transition probabilities matrix of the Markov sentiment process and recover the underlying sentiment states from the observed stock price behavior. We demonstrate that the Hidden Markov Model can successfully recover the transition probabilities matrix for the hidden sentiment process of the Markov Chain type. We also demonstrate that the Recurrent Neural Network can successfully recover the hidden sentiment states from the observed simulated stock price time series.
Greenhouse Gas Source Attribution: Measurements Modeling and Uncertainty Quantification
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, Zhen; Safta, Cosmin; Sargsyan, Khachik
2014-09-01
In this project we have developed atmospheric measurement capabilities and a suite of atmospheric modeling and analysis tools that are well suited for verifying emissions of greenhouse gases (GHGs) on an urban-through-regional scale. We have for the first time applied the Community Multiscale Air Quality (CMAQ) model to simulate atmospheric CO2. This will allow for the examination of regional-scale transport and distribution of CO2 along with air pollutants traditionally studied using CMAQ at relatively high spatial and temporal resolution, with the goal of leveraging emissions verification efforts for both air quality and climate. We have developed a bias-enhanced Bayesian inference approach that can remedy the well-known problem of transport model errors in atmospheric CO2 inversions. We have tested the approach using data and model outputs from the TransCom3 global CO2 inversion comparison project. We have also performed two prototyping studies on inversion approaches in the generalized convection-diffusion context. One of these studies employed Polynomial Chaos Expansion to accelerate the evaluation of a regional transport model and enable efficient Markov Chain Monte Carlo sampling of the posterior for Bayesian inference. The other approach uses deterministic inversion of a convection-diffusion-reaction system in the presence of uncertainty. These approaches should, in principle, be applicable to realistic atmospheric problems with moderate adaptation. We outline a regional greenhouse gas source inference system that integrates (1) two approaches of atmospheric dispersion simulation and (2) a class of Bayesian inference and uncertainty quantification algorithms. We use two different and complementary approaches to simulate atmospheric dispersion. Specifically, we use a Eulerian chemical transport model, CMAQ, and a Lagrangian Particle Dispersion Model, FLEXPART-WRF. These two models share the same WRF assimilated meteorology fields, making it possible to perform a hybrid simulation, in which the Eulerian model (CMAQ) can be used to compute the initial condition needed by the Lagrangian model, while the source-receptor relationships for a large state vector can be efficiently computed using the Lagrangian model in its backward mode. In addition, CMAQ has a complete treatment of atmospheric chemistry for a suite of traditional air pollutants, many of which could help attribute GHGs to different sources. The inference of emissions sources using atmospheric observations is cast as a Bayesian model calibration problem, which is solved using a variety of Bayesian techniques, such as the bias-enhanced Bayesian inference algorithm, which accounts for the intrinsic model deficiency; Polynomial Chaos Expansion, to accelerate model evaluation and Markov Chain Monte Carlo sampling; and Karhunen-Loève (KL) expansion, to reduce the dimensionality of the state space. We have established an atmospheric measurement site in Livermore, CA and are collecting continuous measurements of CO2, CH4 and other species that are typically co-emitted with these GHGs. Measurements of co-emitted species can assist in attributing the GHGs to different emissions sectors. Automatic calibrations using traceable standards are performed routinely for the gas-phase measurements. We are also collecting standard meteorological data at the Livermore site, as well as planetary boundary height measurements using a ceilometer.
The location of the measurement site is well suited to sample air transported between the San Francisco Bay area and the California Central Valley.
Metastates in Mean-Field Models with Random External Fields Generated by Markov Chains
NASA Astrophysics Data System (ADS)
Formentin, M.; Külske, C.; Reichenbachs, A.
2012-01-01
We extend the construction by Külske and Iacobelli of metastates in finite-state mean-field models in independent disorder to situations where the local disorder terms are a sample of an external ergodic Markov chain in equilibrium. We show that for non-degenerate Markov chains, the structure of the theorems is analogous to the case of i.i.d. variables when the limiting weights in the metastate are expressed with the aid of a CLT for the occupation time measure of the chain. As a new phenomenon we also show in a Potts example that for a degenerate non-reversible chain this CLT approximation is not enough, and that the metastate can have less symmetry than the symmetry of the interaction and a Gaussian approximation of disorder fluctuations would suggest.
Technical manual for basic version of the Markov chain nest productivity model (MCnest)
The Markov Chain Nest Productivity Model (or MCnest) integrates existing toxicity information from three standardized avian toxicity tests with information on species life history and the timing of pesticide applications relative to the timing of avian breeding seasons to quantit...
User’s manual for basic version of MCnest Markov chain nest productivity model
The Markov Chain Nest Productivity Model (or MCnest) integrates existing toxicity information from three standardized avian toxicity tests with information on species life history and the timing of pesticide applications relative to the timing of avian breeding seasons to quantit...
NASA Astrophysics Data System (ADS)
Wang, Chao; Yang, Chuan-sheng
2017-09-01
In this paper, we present a simplified parsimonious higher-order multivariate Markov chain model with a new convergence condition (TPHOMMCM-NCC). Moreover, an estimation method for the parameters in TPHOMMCM-NCC is given. Numerical experiments illustrate the effectiveness of TPHOMMCM-NCC.
NASA Astrophysics Data System (ADS)
Gudder, Stanley
2008-07-01
A new approach to quantum Markov chains is presented. We first define a transition operation matrix (TOM) as a matrix whose entries are completely positive maps whose column sums form a quantum operation. A quantum Markov chain is defined to be a pair (G, E) where G is a directed graph and E = [Eij] is a TOM whose entry Eij labels the edge from vertex j to vertex i. We think of the vertices of G as sites that a quantum system can occupy and Eij is the transition operation from site j to site i in one time step. The discrete dynamics of the system is obtained by iterating the TOM E. We next consider a special type of TOM called a transition effect matrix. In this case, there are two types of dynamics, a state dynamics and an operator dynamics. Although these two types are not identical, they are statistically equivalent. We next give examples that illustrate various properties of quantum Markov chains. We conclude by showing that our formalism generalizes the usual framework for quantum random walks.
Open Markov Processes and Reaction Networks
ERIC Educational Resources Information Center
Swistock Pollard, Blake Stephen
2017-01-01
We begin by defining the concept of "open" Markov processes, which are continuous-time Markov chains where probability can flow in and out through certain "boundary" states. We study open Markov processes which in the absence of such boundary flows admit equilibrium states satisfying detailed balance, meaning that the net flow…
Markovian prediction of future values for food grains in the economic survey
NASA Astrophysics Data System (ADS)
Sathish, S.; Khadar Babu, S. K.
2017-11-01
Nowadays, prediction and forecasting play a vital role in research. For prediction, regression is useful for predicting future and current values of a production process. In this paper, we assume that food grain production exhibits Markov chain dependency and time homogeneity. The economic generative performance evaluation of the balance time of artificial fertilization at different levels in estrus detection uses a daily Markov chain model. Finally, Markov process prediction gives better performance compared with the regression model.
Schmandt, Nicolaus T; Galán, Roberto F
2012-09-14
Markov chains provide realistic models of numerous stochastic processes in nature. We demonstrate that in any Markov chain, the change in occupation number in state A is correlated to the change in occupation number in state B if and only if A and B are directly connected. This implies that if we are only interested in state A, fluctuations in B may be replaced with their mean if state B is not directly connected to A, which shortens computing time considerably. We show the accuracy and efficacy of our approximation theoretically and in simulations of stochastic ion-channel gating in neurons.
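A quick numerical check of this connectivity criterion (an illustration with a hypothetical three-state chain A-B-C in which A and C are not directly connected; the record's result concerns the chain's stochastic dynamics, and this discrete-time toy only shows the qualitative contrast) correlates the per-step changes in occupation numbers across many independent walkers:

```python
import numpy as np

rng = np.random.default_rng(7)

# States 0 (A), 1 (B), 2 (C); A and C are NOT directly connected.
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9]])

n_walkers, n_steps = 20000, 400
state = rng.integers(0, 3, size=n_walkers)
counts = np.zeros((n_steps, 3))
for t in range(n_steps):
    # advance every walker one step via inverse-CDF sampling
    u = rng.random(n_walkers)
    cum = P[state].cumsum(axis=1)
    state = (u[:, None] < cum).argmax(axis=1)
    counts[t] = np.bincount(state, minlength=3)

d = np.diff(counts, axis=0)           # per-step occupation-number changes
corr = np.corrcoef(d.T)
print("corr(dA,dB):", corr[0, 1])     # strong: A and B directly connected
print("corr(dA,dC):", corr[0, 2])     # much weaker: A and C not directly connected
```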
Calibrated tree priors for relaxed phylogenetics and divergence time estimation.
Heled, Joseph; Drummond, Alexei J
2012-01-01
The use of fossil evidence to calibrate divergence time estimation has a long history. More recently, Bayesian Markov chain Monte Carlo has become the dominant method of divergence time estimation, and fossil evidence has been reinterpreted as the specification of prior distributions on the divergence times of calibration nodes. These so-called "soft calibrations" have become widely used, but the statistical properties of calibrated tree priors in a Bayesian setting have not been carefully investigated. Here, we clarify that calibration densities, such as those defined in BEAST 1.5, do not represent the marginal prior distribution of the calibration node. We illustrate this with a number of analytical results on small trees. We also describe an alternative construction for a calibrated Yule prior on trees that allows direct specification of the marginal prior distribution of the calibrated divergence time, with or without the restriction of monophyly. This method requires the computation of the Yule prior conditional on the height of the divergence being calibrated. Unfortunately, a practical solution for multiple calibrations remains elusive. Our results suggest that direct estimation of the prior induced by specifying multiple calibration densities should be a prerequisite of any divergence time dating analysis.
NASA Astrophysics Data System (ADS)
Perreault Levasseur, Laurence; Hezaveh, Yashar D.; Wechsler, Risa H.
2017-11-01
In Hezaveh et al. we showed that deep learning can be used for model parameter estimation and trained convolutional neural networks to determine the parameters of strong gravitational-lensing systems. Here we demonstrate a method for obtaining the uncertainties of these parameters. We review the framework of variational inference to obtain approximate posteriors of Bayesian neural networks and apply it to a network trained to estimate the parameters of the Singular Isothermal Ellipsoid plus external shear and total flux magnification. We show that the method can capture the uncertainties due to different levels of noise in the input data, as well as training and architecture-related errors made by the network. To evaluate the accuracy of the resulting uncertainties, we calculate the coverage probabilities of marginalized distributions for each lensing parameter. By tuning a single variational parameter, the dropout rate, we obtain coverage probabilities approximately equal to the confidence levels for which they were calculated, resulting in accurate and precise uncertainty estimates. Our results suggest that the application of approximate Bayesian neural networks to astrophysical modeling problems can be a fast alternative to Monte Carlo Markov Chains, allowing orders of magnitude improvement in speed.
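A hedged sketch of the dropout-based uncertainty estimation described above (a tiny fully-connected regressor in PyTorch stands in for the authors' convolutional network, and the input features, layer sizes and dropout rate are hypothetical; dropout stays active at test time so repeated forward passes sample the approximate posterior):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny stand-in network; the real model is a CNN over lens images
model = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(), nn.Dropout(p=0.1),
    nn.Linear(64, 64), nn.ReLU(), nn.Dropout(p=0.1),
    nn.Linear(64, 1),          # one lensing parameter, e.g. a lens-model scale
)

x = torch.randn(1, 10)         # hypothetical feature vector for one system

model.train()                  # keep dropout active ("MC dropout")
with torch.no_grad():
    draws = torch.stack([model(x) for _ in range(200)])

mean, std = draws.mean().item(), draws.std().item()
print(f"parameter estimate: {mean:.3f} +/- {std:.3f}")
```

Tuning the dropout rate so that the resulting coverage probabilities match their nominal confidence levels is the calibration step the record describes.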
Bayesian Multiscale Modeling of Closed Curves in Point Clouds
Gu, Kelvin; Pati, Debdeep; Dunson, David B.
2014-01-01
Modeling object boundaries based on image or point cloud data is frequently necessary in medical and scientific applications ranging from detecting tumor contours for targeted radiation therapy, to the classification of organisms based on their structural information. In low-contrast images or sparse and noisy point clouds, there is often insufficient data to recover local segments of the boundary in isolation. Thus, it becomes critical to model the entire boundary in the form of a closed curve. To achieve this, we develop a Bayesian hierarchical model that expresses highly diverse 2D objects in the form of closed curves. The model is based on a novel multiscale deformation process. By relating multiple objects through a hierarchical formulation, we can successfully recover missing boundaries by borrowing structural information from similar objects at the appropriate scale. Furthermore, the model’s latent parameters help interpret the population, indicating dimensions of significant structural variability and also specifying a ‘central curve’ that summarizes the collection. Theoretical properties of our prior are studied in specific cases and efficient Markov chain Monte Carlo methods are developed, evaluated through simulation examples and applied to panorex teeth images for modeling teeth contours and also to a brain tumor contour detection problem. PMID:25544786
Liang, Li-Jung; Weiss, Robert E; Redelings, Benjamin; Suchard, Marc A
2009-10-01
Statistical analyses of phylogenetic data culminate in uncertain estimates of underlying model parameters. Lack of additional data hinders the ability to reduce this uncertainty, as the original phylogenetic dataset is often complete, containing the entire gene or genome information available for the given set of taxa. Informative priors in a Bayesian analysis can reduce posterior uncertainty; however, publicly available phylogenetic software specifies vague priors for model parameters by default. We build objective and informative priors using hierarchical random effect models that combine additional datasets whose parameters are not of direct interest but are similar to the analysis of interest. We propose principled statistical methods that permit more precise parameter estimates in phylogenetic analyses by creating informative priors for parameters of interest. Using additional sequence datasets from our lab or public databases, we construct a fully Bayesian semiparametric hierarchical model to combine datasets. A dynamic iteratively reweighted Markov chain Monte Carlo algorithm conveniently recycles posterior samples from the individual analyses. We demonstrate the value of our approach by examining the insertion-deletion (indel) process in the enolase gene across the Tree of Life using the phylogenetic software BALI-PHY; we incorporate prior information about indels from 82 curated alignments downloaded from the BAliBASE database.
Constraining the symmetry energy with heavy-ion collisions and Bayesian analysis
NASA Astrophysics Data System (ADS)
Tsang, C. Y.; Jhang, G.; Morfouace, P.; Lynch, W. G.; Tsang, M. B.; HiRA Collaboration
2017-09-01
To extract constraints on symmetry energy terms in the nuclear Equation of State (EoS), data from heavy-ion reactions are often compared to calculations from transport models. Because multiple input parameters are needed in a transport model, multi-parameter analysis is necessary to understand the relationships, especially if strong correlations exist among the parameters. In this talk, I will discuss how four symmetry energy parameters, S0 (symmetry energy) and L (slope) at saturation density, as well as the nucleon scalar effective mass (ms*) and the nucleon effective mass splitting (FI), are obtained by comparing transport model results with experimental data such as isospin diffusion and n/p spectral ratios using the MADAI Bayesian analysis software. The probability of each parameter having a certain value given the experimental data can be calculated with Bayes' theorem by Markov chain Monte Carlo integration. Results using single and double ratios of neutron and proton spectra from 124Sn+124Sn and 112Sn+112Sn collisions at 120 MeV/u, as well as isospin diffusion from Sn+Sn isotopes at 50 and 35 MeV/u, will be presented. This research is supported by the National Science Foundation under Grant No. PHY-1565546.
Structural and parametric uncertainty quantification in cloud microphysics parameterization schemes
NASA Astrophysics Data System (ADS)
van Lier-Walqui, M.; Morrison, H.; Kumjian, M. R.; Prat, O. P.; Martinkus, C.
2017-12-01
Atmospheric model parameterization schemes employ approximations to represent the effects of unresolved processes. These approximations are a source of error in forecasts, caused in part by considerable uncertainty about the optimal value of parameters within each scheme -- parametric uncertainty. Furthermore, there is uncertainty regarding the best choice of the overarching structure of the parameterization scheme -- structural uncertainty. Parameter estimation can constrain the first, but may struggle with the second because structural choices are typically discrete. We address this problem in the context of cloud microphysics parameterization schemes by creating a flexible framework wherein structural and parametric uncertainties can be simultaneously constrained. Our scheme makes no assumptions about drop size distribution shape or the functional form of parametrized process rate terms. Instead, these uncertainties are constrained by observations using a Markov Chain Monte Carlo sampler within a Bayesian inference framework. Our scheme, the Bayesian Observationally-constrained Statistical-physical Scheme (BOSS), has flexibility to predict various sets of prognostic drop size distribution moments as well as varying complexity of process rate formulations. We compare idealized probabilistic forecasts from versions of BOSS with varying levels of structural complexity. This work has applications in ensemble forecasts with model physics uncertainty, data assimilation, and cloud microphysics process studies.
Cruz-Marcelo, Alejandro; Ensor, Katherine B; Rosner, Gary L
2011-06-01
The term structure of interest rates is used to price defaultable bonds and credit derivatives, as well as to infer the quality of bonds for risk management purposes. We introduce a model that jointly estimates term structures by means of a Bayesian hierarchical model with a prior probability model based on Dirichlet process mixtures. The modeling methodology borrows strength across term structures for purposes of estimation. The main advantage of our framework is its ability to produce reliable estimators at the company level even when there are only a few bonds per company. After describing the proposed model, we discuss an empirical application in which the term structure of 197 individual companies is estimated. The sample of 197 consists of 143 companies with only one or two bonds. In-sample and out-of-sample tests are used to quantify the improvement in accuracy that results from approximating the term structure of corporate bonds with estimators by company rather than by credit rating, the latter being a popular choice in the financial literature. A complete description of a Markov chain Monte Carlo (MCMC) scheme for the proposed model is available as Supplementary Material.
Recursive algorithms for phylogenetic tree counting.
Gavryushkina, Alexandra; Welch, David; Drummond, Alexei J
2013-10-28
In Bayesian phylogenetic inference we are interested in distributions over a space of trees. The number of trees in a tree space is an important characteristic of the space and is useful for specifying prior distributions. When all samples come from the same time point and no prior information is available on divergence times, the tree counting problem is easy. However, when fossil evidence is used in the inference to constrain the tree or data are sampled serially, new tree spaces arise and counting the number of trees is more difficult. We describe an algorithm that is polynomial in the number of sampled individuals for counting resolutions of a constraint tree, assuming that the number of constraints is fixed. We generalise this algorithm to counting resolutions of a fully ranked constraint tree. We describe a quadratic algorithm for counting the number of possible fully ranked trees on n sampled individuals. We introduce a new type of tree, called a fully ranked tree with sampled ancestors, and describe a cubic time algorithm for counting the number of such trees on n sampled individuals. These algorithms should be employed for Bayesian Markov chain Monte Carlo inference when fossil data are included or data are serially sampled.
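For the easy case mentioned above -- all samples from the same time point and no constraints -- the count of fully resolved ranked trees has a standard closed form, n!(n-1)!/2^(n-1). A short sketch computing it (the paper's polynomial algorithms are needed for the harder constrained and serially sampled cases):

```python
from math import factorial

def n_ranked_trees(n):
    """Number of labeled, fully ranked binary trees on n contemporaneous tips:
    n! * (n-1)! / 2**(n-1)."""
    return factorial(n) * factorial(n - 1) // 2 ** (n - 1)

for n in range(2, 8):
    print(n, n_ranked_trees(n))   # 1, 3, 18, 180, 2700, 56700
```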
MULTINEST: an efficient and robust Bayesian inference tool for cosmology and particle physics
NASA Astrophysics Data System (ADS)
Feroz, F.; Hobson, M. P.; Bridges, M.
2009-10-01
We present further development and the first public release of our multimodal nested sampling algorithm, called MULTINEST. This Bayesian inference tool calculates the evidence, with an associated error estimate, and produces posterior samples from distributions that may contain multiple modes and pronounced (curving) degeneracies in high dimensions. The developments presented here lead to further substantial improvements in sampling efficiency and robustness, as compared to the original algorithm presented in Feroz & Hobson, which itself significantly outperformed existing Markov chain Monte Carlo techniques in a wide range of astrophysical inference problems. The accuracy and economy of the MULTINEST algorithm are demonstrated by application to two toy problems and to a cosmological inference problem focusing on the extension of the vanilla Λ cold dark matter model to include spatial curvature and a varying equation of state for dark energy. The MULTINEST software, which is fully parallelized using MPI and includes an interface to COSMOMC, is available at http://www.mrao.cam.ac.uk/software/multinest/. It will also be released as part of the SUPERBAYES package, for the analysis of supersymmetric theories of particle physics, at http://www.superbayes.org.
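To make the nested sampling idea concrete, here is a bare-bones, single-mode sketch that estimates the evidence of a toy Gaussian likelihood under a uniform prior. It replaces the worst live point by naive rejection sampling from the prior, which is exactly the inefficiency MULTINEST's ellipsoidal decomposition is designed to avoid; all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
SIGMA = 0.1

def loglike(theta):
    # Toy likelihood: normalized 2D Gaussian of width SIGMA in a [-1,1]^2 prior.
    return -0.5 * np.sum((theta / SIGMA) ** 2) - np.log(2 * np.pi * SIGMA ** 2)

ndim, nlive, niter = 2, 100, 800
live = rng.uniform(-1.0, 1.0, size=(nlive, ndim))
live_logl = np.array([loglike(t) for t in live])

logz = -np.inf
for i in range(1, niter + 1):
    worst = np.argmin(live_logl)                 # lowest-likelihood live point
    logl_star = live_logl[worst]
    # Prior volume shrinks geometrically: X_i ~ exp(-i/nlive).
    logw = -(i - 1) / nlive + np.log1p(-np.exp(-1.0 / nlive))
    logz = np.logaddexp(logz, logw + logl_star)  # accumulate the evidence sum
    while True:                                  # rejection-sample a replacement
        cand = rng.uniform(-1.0, 1.0, size=ndim)
        if loglike(cand) > logl_star:
            live[worst], live_logl[worst] = cand, loglike(cand)
            break

# Add the remaining live-point contribution, then compare with the truth:
# Z = (prior density 1/4) * (Gaussian integral ~ 1), so log Z ~ log(0.25).
logz = np.logaddexp(logz, -niter / nlive - np.log(nlive)
                    + np.logaddexp.reduce(live_logl))
print(f"log-evidence estimate {logz:.3f} vs analytic {np.log(0.25):.3f}")
```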
DOE Office of Scientific and Technical Information (OSTI.GOV)
Altmann, Yoann; Maccarone, Aurora; McCarthy, Aongus
This paper presents a new Bayesian spectral un-mixing algorithm to analyse remote scenes sensed via sparse multispectral Lidar measurements. To a first approximation, in the presence of a target, each Lidar waveform consists of a main peak, whose position depends on the target distance and whose amplitude depends on the wavelength of the laser source considered (i.e., on the target reflectivity). Besides, these temporal responses are usually assumed to be corrupted by Poisson noise in the low photon count regime. When considering multiple wavelengths, it becomes possible to use spectral information in order to identify and quantify the main materials in the scene, in addition to estimation of the Lidar-based range profiles. Due to its anomaly detection capability, the proposed hierarchical Bayesian model, coupled with an efficient Markov chain Monte Carlo algorithm, allows robust estimation of depth images together with abundance and outlier maps associated with the observed 3D scene. The proposed methodology is illustrated via experiments conducted with real multispectral Lidar data acquired in a controlled environment. The results demonstrate the possibility to unmix spectral responses constructed from extremely sparse photon counts (less than 10 photons per pixel and band).
Bayesian inference on EMRI signals using low frequency approximations
NASA Astrophysics Data System (ADS)
Ali, Asad; Christensen, Nelson; Meyer, Renate; Röver, Christian
2012-07-01
Extreme mass ratio inspirals (EMRIs) are thought to be one of the most exciting gravitational wave sources to be detected with LISA. Due to their complicated nature and weak amplitudes the detection and parameter estimation of such sources is a challenging task. In this paper we present a statistical methodology based on Bayesian inference in which the estimation of parameters is carried out by advanced Markov chain Monte Carlo (MCMC) algorithms such as parallel tempering MCMC. We analysed high and medium mass EMRI systems that fall well inside the low frequency range of LISA. In the context of the Mock LISA Data Challenges, our investigation and results are also the first instance in which a fully Markovian algorithm is applied for EMRI searches. Results show that our algorithm worked well in recovering EMRI signals from different (simulated) LISA data sets having single and multiple EMRI sources and holds great promise for posterior computation under more realistic conditions. The search and estimation methods presented in this paper are general in their nature, and can be applied in any other scenario such as AdLIGO, AdVIRGO and Einstein Telescope with their respective response functions.
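A minimal sketch of the parallel-tempering idea on a toy bimodal posterior: hot chains (small beta) move freely between modes, and adjacent-temperature swaps let the beta = 1 chain inherit that mobility. The target, temperature ladder, and proposal scale below are illustrative choices, not the paper's EMRI setup.

```python
import numpy as np

rng = np.random.default_rng(2)

def logpost(x):
    # Toy bimodal target standing in for a multimodal EMRI posterior.
    return np.logaddexp(-0.5 * ((x - 3) / 0.5) ** 2,
                        -0.5 * ((x + 3) / 0.5) ** 2)

betas = np.array([1.0, 0.5, 0.25, 0.1])   # temperature ladder (beta = 1/T)
chains = rng.normal(size=len(betas))
samples = []

for step in range(20000):
    # Within-temperature Metropolis updates.
    for k, beta in enumerate(betas):
        prop = chains[k] + rng.normal(scale=1.0)
        if np.log(rng.random()) < beta * (logpost(prop) - logpost(chains[k])):
            chains[k] = prop
    # Propose swapping a random adjacent pair of temperatures.
    k = rng.integers(len(betas) - 1)
    dlog = (betas[k] - betas[k + 1]) * (logpost(chains[k + 1]) - logpost(chains[k]))
    if np.log(rng.random()) < dlog:
        chains[k], chains[k + 1] = chains[k + 1], chains[k]
    samples.append(chains[0])              # keep only the beta = 1 chain

samples = np.array(samples[2000:])
print("fraction in right-hand mode:", np.mean(samples > 0))  # expect ~0.5
```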
The Bayesian group lasso for confounded spatial data
Hefley, Trevor J.; Hooten, Mevin B.; Hanks, Ephraim M.; Russell, Robin E.; Walsh, Daniel P.
2017-01-01
Generalized linear mixed models for spatial processes are widely used in applied statistics. In many applications of the spatial generalized linear mixed model (SGLMM), the goal is to obtain inference about regression coefficients while achieving optimal predictive ability. When implementing the SGLMM, multicollinearity among covariates and the spatial random effects can make computation challenging and influence inference. We present a Bayesian group lasso prior with a single tuning parameter that can be chosen to optimize predictive ability of the SGLMM and jointly regularize the regression coefficients and spatial random effect. We implement the group lasso SGLMM using efficient Markov chain Monte Carlo (MCMC) algorithms and demonstrate how multicollinearity among covariates and the spatial random effect can be monitored as a derived quantity. To test our method, we compared several parameterizations of the SGLMM using simulated data and two examples from plant ecology and disease ecology. In all examples, problematic levels of multicollinearity occurred and influenced sampling efficiency and inference. We found that the group lasso prior resulted in roughly twice the effective sample size for MCMC samples of regression coefficients, and can have higher and less variable predictive accuracy based on out-of-sample data when compared to the standard SGLMM.
NASA Astrophysics Data System (ADS)
van Rossum, Anne C.; Lin, Hai Xiang; Dubbeldam, Johan; van der Herik, H. Jaap
2018-04-01
In machine vision, typical heuristic methods to extract parameterized objects out of raw data points are the Hough transform and RANSAC. Bayesian models carry the promise to optimally extract such parameterized objects given a correct definition of the model and the type of noise at hand. A category of solvers for Bayesian models are Markov chain Monte Carlo methods. Naive implementations of MCMC methods suffer from slow convergence in machine vision due to the complexity of the parameter space. Towards this end, blocked Gibbs and split-merge samplers have been developed that assign multiple data points to clusters at once. In this paper we introduce a new split-merge sampler, the triadic split-merge sampler, that performs steps between two and three randomly chosen clusters. This has two advantages. First, it reduces the asymmetry between the split and merge steps. Second, it is able to propose a new cluster that is composed of data points from two different clusters. Both advantages speed up convergence, which we demonstrate on a line extraction problem. We show that the triadic split-merge sampler outperforms the conventional split-merge sampler. Although this new MCMC sampler is demonstrated in this machine vision context, its applications extend to the very general domain of statistical inference.
Uncertainty Estimates of Psychoacoustic Thresholds Obtained from Group Tests
NASA Technical Reports Server (NTRS)
Rathsam, Jonathan; Christian, Andrew
2016-01-01
Adaptive psychoacoustic test methods, in which the next signal level depends on the response to the previous signal, are the most efficient for determining psychoacoustic thresholds of individual subjects. In many tests conducted in the NASA psychoacoustic labs, the goal is to determine thresholds representative of the general population. To do this economically, non-adaptive testing methods are used in which three or four subjects are tested at the same time with predetermined signal levels. This approach requires us to identify techniques for assessing the uncertainty in resulting group-average psychoacoustic thresholds. In this presentation we examine four techniques: the Delta Method and the Generalized Linear Model (GLM), both frequentist approaches; the Nonparametric Bootstrap, another frequentist method; and Markov Chain Monte Carlo Posterior Estimation, a Bayesian approach. Each technique is exercised on a manufactured, theoretical dataset and then on datasets from two psychoacoustics facilities at NASA. The Delta Method is the simplest to implement and accurate for the cases studied. The GLM is found to be the least robust, and the Bootstrap takes the longest to calculate. The Bayesian Posterior Estimate is the most versatile technique examined because it allows the inclusion of prior information.
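As a small illustration of the Bootstrap option in this comparison, here is a nonparametric bootstrap confidence interval for a group-average threshold; the per-subject threshold values are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical per-subject detection thresholds (dB) from a group test.
thresholds = np.array([62.1, 58.4, 65.0, 60.2, 59.7, 63.3, 61.8, 57.9])

def bootstrap_ci(data, stat=np.mean, n_boot=10000, alpha=0.05):
    # Resample subjects with replacement and recompute the statistic.
    idx = rng.integers(0, len(data), size=(n_boot, len(data)))
    stats = stat(data[idx], axis=1)
    return np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])

lo, hi = bootstrap_ci(thresholds)
print(f"group-average threshold 95% CI: [{lo:.1f}, {hi:.1f}] dB")
```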
User's Manual MCnest - Markov Chain Nest Productivity Model Version 2.0
The Markov chain nest productivity model, or MCnest, is a set of algorithms for integrating the results of avian toxicity tests with reproductive life-history data to project the relative magnitude of chemical effects on avian reproduction. The mathematical foundation of MCnest i...
Document Ranking Based upon Markov Chains.
ERIC Educational Resources Information Center
Danilowicz, Czeslaw; Balinski, Jaroslaw
2001-01-01
Considers how the order of documents in information retrieval responses are determined and introduces a method that uses a probabilistic model of a document set where documents are regarded as states of a Markov chain and where transition probabilities are directly proportional to similarities between documents. (Author/LRW)
Using Markov Chain Analyses in Counselor Education Research
ERIC Educational Resources Information Center
Duys, David K.; Headrick, Todd C.
2004-01-01
This study examined the efficacy of an infrequently used statistical analysis in counselor education research. A Markov chain analysis was used to examine hypothesized differences between students' use of counseling skills in an introductory course. Thirty graduate students participated in the study. Independent raters identified the microskills…
Karabatsos, George
2017-02-01
Most of applied statistics involves regression analysis of data. In practice, it is important to specify a regression model that has minimal assumptions which are not violated by data, to ensure that statistical inferences from the model are informative and not misleading. This paper presents a stand-alone and menu-driven software package, Bayesian Regression: Nonparametric and Parametric Models, constructed from MATLAB Compiler. Currently, this package gives the user a choice from 83 Bayesian models for data analysis. They include 47 Bayesian nonparametric (BNP) infinite-mixture regression models; 5 BNP infinite-mixture models for density estimation; and 31 normal random effects models (HLMs), including normal linear models. Each of the 78 regression models handles either a continuous, binary, or ordinal dependent variable, and can handle multi-level (grouped) data. All 83 Bayesian models can handle the analysis of weighted observations (e.g., for meta-analysis), and the analysis of left-censored, right-censored, and/or interval-censored data. Each BNP infinite-mixture model has a mixture distribution assigned one of various BNP prior distributions, including priors defined by either the Dirichlet process, Pitman-Yor process (including the normalized stable process), beta (two-parameter) process, normalized inverse-Gaussian process, geometric weights prior, dependent Dirichlet process, or the dependent infinite-probits prior. The software user can mouse-click to select a Bayesian model and perform data analysis via Markov chain Monte Carlo (MCMC) sampling. After the sampling completes, the software automatically opens text output that reports MCMC-based estimates of the model's posterior distribution and model predictive fit to the data. Additional text and/or graphical output can be generated by mouse-clicking other menu options. This includes output of MCMC convergence analyses, and estimates of the model's posterior predictive distribution, for selected functionals and values of covariates. The software is illustrated through the BNP regression analysis of real data.
Yasaitis, Laura C; Arcaya, Mariana C; Subramanian, S V
2015-09-01
Creating local population health measures from administrative data would be useful for health policy and public health monitoring purposes. While a wide range of options--from simple spatial smoothers to model-based methods--for estimating such rates exists, there are relatively few side-by-side comparisons, especially not with real-world data. In this paper, we compare methods for creating local estimates of acute myocardial infarction rates from Medicare claims data. A Bayesian Markov chain Monte Carlo estimator that incorporated spatial and local random effects performed best, followed by a method-of-moments spatial Empirical Bayes estimator. As the former is more complicated and time-consuming, spatial linear Empirical Bayes methods may represent a good alternative for non-specialist investigators.
Stochastic Inversion of 2D Magnetotelluric Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Jinsong
2010-07-01
The algorithm is developed to invert 2D magnetotelluric (MT) data based on sharp boundary parametrization using a Bayesian framework. Within the algorithm, we consider the locations of interfaces and the resistivity of the regions they form as unknowns. We use a parallel, adaptive finite-element algorithm to forward simulate frequency-domain MT responses of 2D conductivity structure. The unknown parameters are spatially correlated and are described by a geostatistical model. The joint posterior probability distribution function is explored by Markov Chain Monte Carlo (MCMC) sampling methods. The developed stochastic model is effective for estimating the interface locations and resistivity. Most importantly, it provides detailed uncertainty information on each unknown parameter. Hardware requirements: PC, supercomputer, multi-platform, workstation. Software requirements: C and Fortran. Operating system/version: Linux/Unix or Windows.
Analysis of capture-recapture models with individual covariates using data augmentation
Royle, J. Andrew
2009-01-01
I consider the analysis of capture-recapture models with individual covariates that influence detection probability. Bayesian analysis of the joint likelihood is carried out using a flexible data augmentation scheme that facilitates analysis by Markov chain Monte Carlo methods, and a simple and straightforward implementation in freely available software. This approach is applied to a study of meadow voles (Microtus pennsylvanicus) in which auxiliary data on a continuous covariate (body mass) are recorded, and it is thought that detection probability is related to body mass. In a second example, the model is applied to an aerial waterfowl survey in which a double-observer protocol is used. The fundamental unit of observation is the cluster of individual birds, and the size of the cluster (a discrete covariate) is used as a covariate on detection probability.
Martinez, Edson Zangiacomi; Roza, Daiane Leite da; Caccia-Bava, Maria do Carmo Gullaci Guimarães; Achcar, Jorge Alberto; Dal-Fabbro, Amaury Lelis
2011-05-01
Teenage pregnancy is a common public health problem worldwide. The objective of this ecological study was to investigate the spatial association between teenage pregnancy rates and socioeconomic characteristics of municipalities in São Paulo State, Southeast Brazil. We used a Bayesian model with a spatial distribution following a conditional autoregressive (CAR) form, based on a Markov Chain Monte Carlo algorithm. We used data from the Live Birth Information System (SINASC) and the Brazilian Institute of Geography and Statistics (IBGE). Early pregnancy was more frequent in municipalities with lower per capita gross domestic product (GDP), higher poverty rate, smaller population, lower human development index (HDI), and a higher percentage of individuals with a State social vulnerability index of 5 or 6 (more vulnerable). The study demonstrates a significant association between teenage pregnancy and socioeconomic indicators.
Severe Neutropenia in Dengue Patients: Prevalence and Significance
Thein, Tun-Linn; Lye, David C.; Leo, Yee-Sin; Wong, Joshua G. X.; Hao, Ying; Wilder-Smith, Annelies
2014-01-01
Studies on severe neutropenia in dengue are scarce, and its clinical significance is uncertain. We analyzed a cohort of 1,921 reverse transcription polymerase chain reaction-confirmed adult dengue patients admitted to the Communicable Disease Center in Singapore between 2005 and 2008. Time trend analyses for daily absolute neutrophil counts (ANCs) were done using Bayesian hierarchical and Markov models. Severe neutropenia, defined as ANC ≤ 0.5 × 10^9/L, was found in 11.8% of patients, with a median duration of 1 day. ANC nadir occurred on illness day 5. Severe neutropenia was not predictive of more severe disease and not associated with secondary bacterial infections, prolonged hospital stay, prolonged fever, or fatal outcome. We concluded that prophylactic antibiotics are not indicated in patients with severe neutropenia without indication for bacterial infection. PMID:24732460
Methods used to calculate doses resulting from inhalation of Capstone depleted uranium aerosols.
Miller, Guthrie; Cheng, Yung Sung; Traub, Richard J; Little, Tom T; Guilmette, Raymond A
2009-03-01
The methods used to calculate radiological and toxicological doses to hypothetical persons inside either a U.S. Army Abrams tank or Bradley Fighting Vehicle that has been perforated by depleted uranium munitions are described. Data from time- and particle-size-resolved measurements of depleted uranium aerosol as well as particle-size-resolved measurements of aerosol solubility in lung fluids for aerosol produced in the breathing zones of the hypothetical occupants were used. The aerosol was approximated as a mixture of nine monodisperse (single particle size) components corresponding to particle size increments measured by the eight stages plus the backup filter of the cascade impactors used. A Markov Chain Monte Carlo Bayesian analysis technique was employed, which straightforwardly calculates the uncertainties in doses. Extensive quality control checking of the various computer codes used is described.
Diagonal couplings of quantum Markov chains
NASA Astrophysics Data System (ADS)
Kümmerer, Burkhard; Schwieger, Kay
2016-05-01
In this paper we extend the coupling method from classical probability theory to quantum Markov chains on atomic von Neumann algebras. In particular, we establish a coupling inequality, which allows us to estimate convergence rates by analyzing couplings. For a given tensor dilation we construct a self-coupling of a Markov operator. It turns out that the coupling is a dual version of the extended dual transition operator studied by Gohm et al. We deduce that this coupling is successful if and only if the dilation is asymptotically complete.
Avian life history profiles for use in the Markov chain nest productivity model (MCnest)
The Markov Chain nest productivity model, or MCnest, quantitatively estimates the effects of pesticides or other toxic chemicals on annual reproductive success of avian species (Bennett and Etterson 2013, Etterson and Bennett 2013). The Basic Version of MCnest was developed as a...
Exploring Mass Perception with Markov Chain Monte Carlo
ERIC Educational Resources Information Center
Cohen, Andrew L.; Ross, Michael G.
2009-01-01
Several previous studies have examined the ability to judge the relative mass of objects in idealized collisions. With a newly developed technique of psychological Markov chain Monte Carlo sampling (A. N. Sanborn & T. L. Griffiths, 2008), this work explores participants' perceptions of different collision mass ratios. The results reveal…
NASA Astrophysics Data System (ADS)
Yoon, Ilsang; Weinberg, Martin D.; Katz, Neal
2011-06-01
We introduce a new galaxy image decomposition tool, GALPHAT (GALaxy PHotometric ATtributes), which is a front-end application of the Bayesian Inference Engine (BIE), a parallel Markov chain Monte Carlo package, to provide full posterior probability distributions and reliable confidence intervals for all model parameters. The BIE relies on GALPHAT to compute the likelihood function. GALPHAT generates scale-free cumulative image tables for the desired model family with precise error control. Interpolation of this table yields accurate pixellated images with any centre, scale and inclination angle. GALPHAT then rotates the image by position angle using a Fourier shift theorem, yielding high-speed, accurate likelihood computation. We benchmark this approach using an ensemble of simulated Sérsic model galaxies over a wide range of observational conditions: the signal-to-noise ratio S/N, the ratio of galaxy size to the point spread function (PSF) and the image size, and errors in the assumed PSF; and a range of structural parameters: the half-light radius re and the Sérsic index n. We characterize the strength of parameter covariance in the Sérsic model, which increases with S/N and n, and the results strongly motivate the need for the full posterior probability distribution in galaxy morphology analyses and later inferences. The test results for simulated galaxies successfully demonstrate that, with a careful choice of Markov chain Monte Carlo algorithms and fast model image generation, GALPHAT is a powerful analysis tool for reliably inferring morphological parameters from a large ensemble of galaxies over a wide range of different observational conditions.
NASA Astrophysics Data System (ADS)
Gaál, Ladislav; Szolgay, Ján; Bacigál, Tomáš; Kohnová, Silvia
2010-05-01
Copula-based estimation methods for hydro-climatological extremes have been gaining increasing attention from researchers and practitioners in the last couple of years. Unlike traditional estimation methods based on bivariate cumulative distribution functions (CDFs), copulas are a relatively flexible statistical tool that allows for modelling dependencies between two or more variables, such as flood peaks and flood volumes, without making strict assumptions on the marginal distributions. The dependence structure and the reliability of the joint estimates of hydro-climatological extremes, mainly in the right tail of the joint CDF, depend not only on the particular copula adopted but also on the data available for the estimation of the marginal distributions of the individual variables. Generally, data samples for frequency modelling have limited temporal extent, which is a considerable drawback of frequency analyses in practice. Therefore, it is advisable to use statistical methods that improve any part of the copula construction process and result in more reliable design values of hydrological variables. The scarcity of the data sample, mostly in the extreme tail of the joint CDF, can be bypassed, e.g., by using a considerably larger amount of data simulated by rainfall-runoff analysis or by including historical information on the variables under study. The latter approach of data extension is used here to make the quantile estimates of the individual marginals of the copula more reliable. In the presented paper it is proposed to use historical information in the frequency analysis of the marginal distributions in the framework of Bayesian Markov chain Monte Carlo (MCMC) simulations. Generally, a Bayesian approach allows for a straightforward combination of different sources of information on floods (e.g. flood data from systematic measurements and historical flood records) in terms of a product of the corresponding likelihood functions. On the other hand, the MCMC algorithm is a numerical approach for sampling from the likelihood distributions. Bayesian MCMC methods therefore provide an attractive way to estimate the uncertainty in parameters and quantile metrics of frequency distributions. The applicability of the method is demonstrated in a case study of the hydroelectric power station Orlík on the Vltava River. This site has a key role in the flood prevention of Prague, the capital city of the Czech Republic. The record length of the available flood data is 126 years from the period 1877-2002, while the flood event observed in 2002, which caused extensive damage and numerous casualties, is treated as a historic one. To estimate the joint probabilities of flood peaks and volumes, different copulas are fitted and their goodness-of-fit is evaluated by bootstrap simulations. Finally, selected quantiles of flood volumes conditioned on given flood peaks are derived and compared with those obtained by the traditional method used in the practice of water management specialists of the Vltava River.
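As a hedged sketch of the core ingredient here -- a Bayesian MCMC marginal fit that multiplies a systematic-record likelihood by a historical threshold-exceedance likelihood -- the following is a random-walk Metropolis sampler for a Gumbel marginal. All data, the perception threshold, the historical-period length, and the proposal scales are simulated placeholders, not the Orlík/Vltava records.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical data: 126 years of systematic annual flood peaks plus a
# 200-year historical period with k known exceedances of a threshold x0.
sys_peaks = rng.gumbel(loc=1000.0, scale=300.0, size=126)   # m^3/s, simulated
hist_years, hist_exceed, x0 = 200, 2, 2500.0

def gumbel_logpdf(x, mu, beta):
    z = (x - mu) / beta
    return -np.log(beta) - z - np.exp(-z)

def gumbel_logcdf(x, mu, beta):
    return -np.exp(-(x - mu) / beta)

def loglike(mu, beta):
    if beta <= 0:
        return -np.inf
    ll = gumbel_logpdf(sys_peaks, mu, beta).sum()
    # Historical information: binomial likelihood of k exceedances of x0.
    logF = gumbel_logcdf(x0, mu, beta)
    log1mF = np.log1p(-np.exp(logF))
    return ll + (hist_years - hist_exceed) * logF + hist_exceed * log1mF

# Random-walk Metropolis over (mu, beta) with flat priors.
theta = np.array([np.median(sys_peaks), sys_peaks.std()])
chain = []
for _ in range(20000):
    prop = theta + rng.normal(scale=[20.0, 10.0])
    if np.log(rng.random()) < loglike(*prop) - loglike(*theta):
        theta = prop
    chain.append(theta.copy())
chain = np.array(chain[5000:])
print("posterior mean (mu, beta):", chain.mean(axis=0))
```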
Nested Fork-Join Queuing Networks and Their Application to Mobility Airfield Operations Analysis.
1997-03-01
shortest queue length. Setia, Squillante, and Tripathi [109] extend Makowski and Nelson's work by performing a quantitative assessment of a range of...Markov chains." Numerical Solution of Markov Chains, edited by W. J. Stewart, 63-88. Basel: Marcel Dekker, 1991. [109] Setia, S. K., and others
Modelling Faculty Replacement Strategies Using a Time-Dependent Finite Markov-Chain Process.
ERIC Educational Resources Information Center
Hackett, E. Raymond; Magg, Alexander A.; Carrigan, Sarah D.
1999-01-01
Describes the use of a time-dependent Markov-chain model to develop faculty-replacement strategies within a college at a research university. The study suggests that a stochastic modelling approach can provide valuable insight when planning for personnel needs in the immediate (five-to-ten year) future. (MSE)
Asteroid mass estimation with Markov-chain Monte Carlo
NASA Astrophysics Data System (ADS)
Siltala, L.; Granvik, M.
2017-09-01
We have developed a new Markov-chain Monte Carlo-based algorithm for asteroid mass estimation based on mutual encounters and tested it for several different asteroids. Our results are in line with previous literature values but suggest that uncertainties of prior estimates may be misleading as a consequence of using linearized methods.
Building Higher-Order Markov Chain Models with EXCEL
ERIC Educational Resources Information Center
Ching, Wai-Ki; Fung, Eric S.; Ng, Michael K.
2004-01-01
Categorical data sequences occur in many applications such as forecasting, data mining and bioinformatics. In this note, we present higher-order Markov chain models for modelling categorical data sequences with an efficient algorithm for solving the model parameters. The algorithm can be implemented easily in a Microsoft EXCEL worksheet. We give a…
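The same higher-order model is easy to fit outside a spreadsheet. As a minimal sketch (plain transition counting, not the note's linear-programming estimation method), here is a second-order chain in Python, with a uniform fallback for unseen state pairs:

```python
import numpy as np

def fit_second_order(seq, n_states):
    # Count transitions (s[t-2], s[t-1]) -> s[t], then normalize each row;
    # unseen (s[t-2], s[t-1]) pairs fall back to a uniform distribution.
    counts = np.zeros((n_states, n_states, n_states))
    for a, b, c in zip(seq, seq[1:], seq[2:]):
        counts[a, b, c] += 1
    totals = counts.sum(axis=2, keepdims=True)
    return np.where(totals > 0, counts / np.maximum(totals, 1), 1.0 / n_states)

seq = [0, 1, 2, 1, 0, 1, 2, 2, 1, 0, 1, 2, 1, 1, 0]
P = fit_second_order(seq, 3)
print("P(next = 0 | previous two = 2, 1):", P[2, 1, 0])   # 2/3 for this toy sequence
```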
UMAP Modules-Units 105, 107-109, 111-112, 158-162.
ERIC Educational Resources Information Center
Keller, Mary K.; And Others
This collection of materials includes six units dealing with applications of matrix methods. These are: 105-Food Service Management; 107-Markov Chains; 108-Electrical Circuits; 109-Food Service and Dietary Requirements; 111-Fixed Point and Absorbing Markov Chains; and 112-Analysis of Linear Circuits. The units contain exercises and model exams,…
Predicting hepatitis B monthly incidence rates using weighted Markov chains and time series methods.
Shahdoust, Maryam; Sadeghifar, Majid; Poorolajal, Jalal; Javanrooh, Niloofar; Amini, Payam
2015-01-01
Hepatitis B (HB) is a major cause of mortality worldwide. Accurately predicting the trend of the disease can provide an appropriate basis for health policy on disease prevention. This paper aimed to apply three different methods to predict monthly incidence rates of HB. This historical cohort study was conducted on the HB incidence data of Hamadan Province, in the west of Iran, from 2004 to 2012. The Weighted Markov Chain (WMC) method, based on Markov chain theory, and two time series models, Holt Exponential Smoothing (HES) and SARIMA, were applied to the data. The results of the applied methods were compared in terms of the percentage of correctly predicted incidence rates. The monthly incidence rates were clustered into two clusters serving as the states of the Markov chain. The percentages of correct predictions for the first and second clusters were (100, 0) for WMC, (84, 67) for HES, and (79, 47) for SARIMA. The overall incidence rate of HBV is estimated to decrease over time. The comparison of the three models indicated that, with respect to the existing seasonality and non-stationarity, HES gave the most accurate prediction of the incidence rates.
Operations and support cost modeling using Markov chains
NASA Technical Reports Server (NTRS)
Unal, Resit
1989-01-01
Systems for future missions will be selected with life cycle costs (LCC) as a primary evaluation criterion. This reflects the current realization that only systems which are considered affordable will be built in the future due to national budget constraints. Such an environment calls for innovative cost modeling techniques which address all of the phases a space system goes through during its life cycle, namely: design and development, fabrication, operations and support, and retirement. A significant portion of the LCC for reusable systems is generated during the operations and support (OS) phase. Typically, OS costs can account for 60 to 80 percent of the total LCC. Clearly, OS costs are wholly determined, or at least strongly influenced, by decisions made during the design and development phases of the project. As a result, OS costs need to be considered and estimated early in the conceptual phase. To be effective, an OS cost estimating model needs to account for actual instead of ideal processes by associating cost elements with probabilities. One approach that may be suitable for OS cost modeling is the use of the Markov chain process. Markov chains are an important method of probabilistic analysis for operations research analysts, but they are rarely used for life cycle cost analysis. This research effort evaluates the use of Markov chains in LCC analysis by developing an OS cost model for a hypothetical reusable space transportation vehicle (HSTV) and suggests further uses of the Markov chain process as a design-aid tool.
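One concrete way to turn such a chain into a cost estimate: treat "ready for launch" as an absorbing state and use the fundamental matrix N = (I - Q)^(-1) of the transient states to get expected visit counts, hence expected cost. The states, probabilities, and costs below are invented for illustration, not figures from the study.

```python
import numpy as np

# Hypothetical turnaround process for a reusable vehicle: transient states
# are inspection, repair and refurbishment; "ready for launch" absorbs.
states = ["inspect", "repair", "refurbish"]
Q = np.array([[0.0, 0.3, 0.1],     # transition probs among transient states
              [0.2, 0.0, 0.3],
              [0.1, 0.1, 0.0]])    # remaining mass goes to the absorbing state
cost = np.array([50.0, 400.0, 900.0])   # cost ($K) per visit to each state

N = np.linalg.inv(np.eye(3) - Q)   # fundamental matrix: expected visit counts
expected_cost = N @ cost           # expected O&S cost per flight, by start state
print(dict(zip(states, np.round(expected_cost, 1))))
```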
A reward semi-Markov process with memory for wind speed modeling
NASA Astrophysics Data System (ADS)
Petroni, F.; D'Amico, G.; Prattico, F.
2012-04-01
The increasing interest in renewable energy leads scientific research to find better ways to recover most of the available energy. In particular, the maximum energy recoverable from wind is equal to 59.3% of that available (the Betz law), at a specific pitch angle and when the ratio between the output and input wind speed is equal to 1/3. The pitch angle is the angle formed between the airfoil of the wind turbine blade and the wind direction. Older turbines, and many still on the market, have a fixed, invariant airfoil geometry, so they operate at an efficiency lower than 59.3%. New-generation wind turbines instead vary the pitch angle by rotating the blades, which enables them to recover the maximum energy at different wind speeds, working at the Betz limit over a range of speed ratios. A powerful control system for the pitch angle allows the wind turbine to better recover energy in the transient regime. A good stochastic model for wind speed is therefore needed, both to help optimize turbine design and to assist the control system in predicting the wind speed so as to position the blades quickly and correctly. The availability of synthetic wind speed data is a powerful instrument for verifying wind turbine structures or estimating the energy recoverable from a specific site. To generate synthetic data, Markov chains of first or higher order are often used [1,2,3]. In particular, [1] presents a comparison between a first-order and a second-order Markov chain. A similar work, for the first-order Markov chain only, is conducted by [2], presenting the probability transition matrix and comparing the energy spectral density and autocorrelation of real and synthetic wind speed data. An attempt to jointly model wind speed and direction is presented in [3], using two models: a first-order Markov chain with different numbers of states, and the Weibull distribution. All these models use Markov chains to generate synthetic wind speed time series, but the search for a better model is still open. Approaching this issue, we applied new models which are generalizations of Markov models. More precisely, we applied semi-Markov models to generate synthetic wind speed time series. The primary goal of this analysis is the study of the time history of the wind in order to assess its reliability as a source of power and to determine the associated storage levels required. To assess this issue we use a probabilistic model based on an indexed semi-Markov process [4] to which a reward structure is attached. Our model is used to calculate the expected energy produced by a given turbine and its variability, expressed by the variance of the process. Our results can be used to compare different wind farms based on their reward and also on the risk of missed production due to the intrinsic variability of the wind speed process. The model is used to generate synthetic time series for wind speed by means of Monte Carlo simulations, and a backtesting procedure is used to compare the first- and second-order moments of rewards between real and synthetic data. [1] A. Shamshad, M.A. Bawadi, W.M.W. Wan Hussin, T.A. Majid, S.A.M. Sanusi, First and second order Markov chain models for synthetic generation of wind speed time series, Energy 30 (2005) 693-708. [2] H. Nfaoui, H. Essiarab, A.A.M. Sayigh, A stochastic Markov chain model for simulating wind speed time series at Tangiers, Morocco, Renewable Energy 29 (2004) 1407-1418. [3] F. Youcef Ettoumi, H. Sauvageot, A.-E.-H. Adane, Statistical bivariate modeling of wind using first-order Markov chain and Weibull distribution, Renewable Energy 28 (2003) 1787-1802. [4] F. Petroni, G. D'Amico, F. Prattico, Indexed semi-Markov process for wind speed modeling. To be submitted.
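For reference, the baseline first-order Markov chain approach of [1,2] is only a few lines; a sketch with synthetic Weibull "observations" standing in for real anemometer data (the bin width and all numbers are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)

def fit_transition_matrix(states, n_states):
    # Count observed state-to-state transitions and normalize rows;
    # states never visited get a uniform row as a fallback.
    counts = np.zeros((n_states, n_states))
    for a, b in zip(states, states[1:]):
        counts[a, b] += 1
    totals = counts.sum(axis=1, keepdims=True)
    return np.where(totals > 0, counts / np.maximum(totals, 1), 1.0 / n_states)

def synthesize(P, start, length, rng):
    out = [start]
    for _ in range(length - 1):
        out.append(rng.choice(len(P), p=P[out[-1]]))
    return np.array(out)

# Hourly wind speeds (m/s); a Weibull draw stands in for measured data.
speeds = rng.weibull(2.0, size=5000) * 7.0
bins = np.arange(0.0, speeds.max() + 1.0, 1.0)     # 1 m/s speed classes
states = np.digitize(speeds, bins) - 1

P = fit_transition_matrix(states, len(bins))
synth = synthesize(P, states[0], 5000, rng)
synth_speeds = bins[synth] + 0.5                   # class centres
print("real mean %.2f m/s, synthetic mean %.2f m/s"
      % (speeds.mean(), synth_speeds.mean()))
```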
Tracking the visual focus of attention for a varying number of wandering people.
Smith, Kevin; Ba, Sileye O; Odobez, Jean-Marc; Gatica-Perez, Daniel
2008-07-01
We define and address the problem of finding the visual focus of attention for a varying number of wandering people (VFOA-W), in settings where people's movement is unconstrained. VFOA-W estimation is a new and important problem with implications for behavior understanding and cognitive science, as well as real-world applications. One such application, which we present in this article, monitors the attention passers-by pay to an outdoor advertisement. Our approach to the VFOA-W problem proposes a multi-person tracking solution based on a dynamic Bayesian network that simultaneously infers the (variable) number of people in a scene, their body locations, their head locations, and their head pose. For efficient inference in the resulting large variable-dimensional state-space we propose a Reversible Jump Markov Chain Monte Carlo (RJMCMC) sampling scheme, as well as a novel global observation model which determines the number of people in the scene and localizes them. We propose a Gaussian Mixture Model (GMM) and Hidden Markov Model (HMM)-based VFOA-W model which uses head pose and location information to determine people's focus state. Our models are evaluated for tracking performance and ability to recognize people looking at an outdoor advertisement, with results indicating good performance on sequences where a moderate number of people pass in front of an advertisement.
Tutorial: Asteroseismic Data Analysis with DIAMONDS
NASA Astrophysics Data System (ADS)
Corsaro, Enrico
Since the advent of space-based photometric missions such as CoRoT and NASA's Kepler, asteroseismology has acquired a central role in our understanding of stellar physics. The Kepler spacecraft, especially, is still releasing excellent photometric observations that contain a large amount of information not yet investigated. For exploiting the full potential of these data, sophisticated and robust analysis tools are now essential, so that further constraints on stellar structure and evolutionary models can be obtained. In addition, extracting detailed asteroseismic properties for many stars can yield new insights on their correlations to fundamental stellar properties and dynamics. After a brief introduction to the Bayesian notion of probability, I describe the code Diamonds for Bayesian parameter estimation and model comparison by means of the nested sampling Monte Carlo (NSMC) algorithm. NSMC constitutes an efficient and powerful method, as a replacement for standard Markov chain Monte Carlo, very suitable for the high-dimensional and multimodal problems that are typical of detailed asteroseismic analyses, such as the fitting and mode identification of individual oscillation modes in stars (known as peak-bagging). Diamonds is able to provide robust results for statistical inferences involving tens of individual oscillation modes, while at the same time preserving a considerable computational efficiency for identifying the solution. In the tutorial, I will present the fitting of the stellar background signal and the peak-bagging analysis of the oscillation modes in a red-giant star, providing an example of using Bayesian evidence to assess the significance of the fitted oscillation peaks.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Jinsong
2013-05-01
We develop a hierarchical Bayesian model to estimate the spatiotemporal distribution of aqueous geochemical parameters associated with in-situ bioremediation, using surface spectral induced polarization (SIP) data and borehole geochemical measurements collected during a bioremediation experiment at a uranium-contaminated site near Rifle, Colorado. The SIP data are first inverted for Cole-Cole parameters, including chargeability, time constant, resistivity at the DC frequency and dependence factor, at each pixel of two-dimensional grids using a previously developed stochastic method. Correlations between the inverted Cole-Cole parameters and the wellbore-based groundwater chemistry measurements indicative of key metabolic processes within the aquifer (e.g. ferrous iron, sulfate, uranium) were established and used as a basis for petrophysical model development. The developed Bayesian model consists of three levels of statistical sub-models: 1) a data model, providing links between geochemical and geophysical attributes, 2) a process model, describing the spatial and temporal variability of geochemical properties in the subsurface system, and 3) a parameter model, describing prior distributions of various parameters and initial conditions. The unknown parameters are estimated using Markov chain Monte Carlo methods. By combining the temporally distributed geochemical data with the spatially distributed geophysical data, we obtain the spatio-temporal distribution of ferrous iron, sulfate and sulfide, together with the associated uncertainty information. The obtained results can be used to assess the efficacy of the bioremediation treatment over space and time and to constrain reactive transport models.
Parameter Estimation of Partial Differential Equation Models.
Xun, Xiaolei; Cao, Jiguo; Mallick, Bani; Carroll, Raymond J; Maity, Arnab
2013-01-01
Partial differential equation (PDE) models are commonly used to model complex dynamic systems in applied sciences such as biology and finance. The forms of these PDE models are usually proposed by experts based on their prior knowledge and understanding of the dynamic system. Parameters in PDE models often have interesting scientific interpretations, but their values are often unknown and need to be estimated from measurements of the dynamic system in the presence of measurement errors. Most PDEs used in practice have no analytic solutions, and can only be solved with numerical methods. Currently, methods for estimating PDE parameters require repeatedly solving PDEs numerically under thousands of candidate parameter values, and thus the computational load is high. In this article, we propose two methods to estimate parameters in PDE models: a parameter cascading method and a Bayesian approach. In both methods, the underlying dynamic process modeled with the PDE model is represented via basis function expansion. For the parameter cascading method, we develop two nested levels of optimization to estimate the PDE parameters. For the Bayesian method, we develop a joint model for data and the PDE, and develop a novel hierarchical model allowing us to employ Markov chain Monte Carlo (MCMC) techniques to make posterior inference. Simulation studies show that the Bayesian method and parameter cascading method are comparable, and both outperform other available methods in terms of estimation accuracy. The two methods are demonstrated by estimating parameters in a PDE model from LIDAR data.
Bayesian methods for estimating GEBVs of threshold traits
Wang, C-L; Ding, X-D; Wang, J-Y; Liu, J-F; Fu, W-X; Zhang, Z; Yin, Z-J; Zhang, Q
2013-01-01
Estimation of genomic breeding values is the key step in genomic selection (GS). Many methods have been proposed for continuous traits, but methods for threshold traits are still scarce. Here we introduced the threshold model to the framework of GS; specifically, we extended the three Bayesian methods BayesA, BayesB and BayesCπ on the basis of the threshold model for estimating genomic breeding values of threshold traits, and the extended methods are correspondingly termed BayesTA, BayesTB and BayesTCπ. Computing procedures of the three BayesT methods using a Markov Chain Monte Carlo algorithm were derived. A simulation study was performed to investigate the benefit of the presented methods in the accuracy of genomic estimated breeding values (GEBVs) for threshold traits. Factors affecting the performance of the three BayesT methods were addressed. As expected, the three BayesT methods generally performed better than the corresponding normal Bayesian methods, in particular when the number of phenotypic categories was small. In the standard scenario (number of categories=2, incidence=30%, number of quantitative trait loci=50, h2=0.3), the accuracies were improved by 30.4, 2.4, and 5.7 percentage points, respectively. In most scenarios, BayesTB and BayesTCπ generated similar accuracies and both performed better than BayesTA. In conclusion, our work showed that the threshold model fits well for predicting GEBVs of threshold traits, and BayesTCπ is supposed to be the method of choice for GS of threshold traits. PMID:23149458
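The BayesT samplers themselves follow the paper; as a minimal, hedged illustration of the key threshold-model ingredient -- augmenting the binary phenotype with a latent liability and Gibbs sampling, in the style of Albert and Chib -- here is a toy probit sampler with a ridge-type prior (not the BayesA/B/Cπ marker priors):

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(6)

# Simulated binary (threshold) trait: y = 1 if latent liability z > 0.
n, p = 500, 4
X = rng.normal(size=(n, p))
beta_true = np.array([0.8, -0.5, 0.0, 0.3])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(int)

tau2 = 10.0                                # prior: beta ~ N(0, tau2 * I)
V = np.linalg.inv(X.T @ X + np.eye(p) / tau2)
beta = np.zeros(p)
draws = []

for it in range(3000):
    # 1) Sample latent liabilities from truncated normals given beta.
    m = X @ beta
    lo = np.where(y == 1, -m, -np.inf)     # z > 0 when y = 1
    hi = np.where(y == 1, np.inf, -m)      # z <= 0 when y = 0
    z = m + truncnorm.rvs(lo, hi, size=n, random_state=rng)
    # 2) Sample beta from its Gaussian full conditional given z.
    beta = rng.multivariate_normal(V @ (X.T @ z), V)
    draws.append(beta)

print("posterior means:", np.array(draws[1000:]).mean(axis=0).round(2))
```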
Bayesian probabilistic population projections for all countries
Raftery, Adrian E.; Li, Nan; Ševčíková, Hana; Gerland, Patrick; Heilig, Gerhard K.
2012-01-01
Projections of countries' future populations, broken down by age and sex, are widely used for planning and research. They are mostly done deterministically, but there is a widespread need for probabilistic projections. We propose a Bayesian method for probabilistic population projections for all countries. The total fertility rate and female and male life expectancies at birth are projected probabilistically using Bayesian hierarchical models estimated via Markov chain Monte Carlo using United Nations population data for all countries. These are then converted to age-specific rates and combined with a cohort component projection model. This yields probabilistic projections of any population quantity of interest. The method is illustrated for five countries of different demographic stages, continents and sizes. The method is validated by an out-of-sample experiment in which data from 1950–1990 are used for estimation and then applied to predict 1990–2010. The method appears reasonably accurate and well calibrated for this period. The results suggest that the current United Nations high and low variants greatly underestimate uncertainty about the number of oldest old from about 2050 and that they underestimate uncertainty for high fertility countries and overstate uncertainty for countries that have completed the demographic transition and whose fertility has started to recover towards replacement level, mostly in Europe. The results also indicate that the potential support ratio (persons aged 20–64 per person aged 65+) will almost certainly decline dramatically in most countries over the coming decades. PMID:22908249
Zhu, Xiang; Stephens, Matthew
2017-01-01
Bayesian methods for large-scale multiple regression provide attractive approaches to the analysis of genome-wide association studies (GWAS). For example, they can estimate heritability of complex traits, allowing for both polygenic and sparse models; and by incorporating external genomic data into the priors, they can increase power and yield new biological insights. However, these methods require access to individual genotypes and phenotypes, which are often not easily available. Here we provide a framework for performing these analyses without individual-level data. Specifically, we introduce a “Regression with Summary Statistics” (RSS) likelihood, which relates the multiple regression coefficients to univariate regression results that are often easily available. The RSS likelihood requires estimates of correlations among covariates (SNPs), which also can be obtained from public databases. We perform Bayesian multiple regression analysis by combining the RSS likelihood with previously proposed prior distributions, sampling posteriors by Markov chain Monte Carlo. In a wide range of simulations RSS performs similarly to analyses using the individual data, both for estimating heritability and detecting associations. We apply RSS to a GWAS of human height that contains 253,288 individuals typed at 1.06 million SNPs, for which analyses of individual-level data are practically impossible. Estimates of heritability (52%) are consistent with, but more precise, than previous results using subsets of these data. We also identify many previously unreported loci that show evidence for association with height in our analyses. Software is available at https://github.com/stephenslab/rss. PMID:29399241
A Bayesian method for assessing multiscale species-habitat relationships
Stuber, Erica F.; Gruber, Lutz F.; Fontaine, Joseph J.
2017-01-01
Context Scientists face several theoretical and methodological challenges in appropriately describing fundamental wildlife-habitat relationships in models. The spatial scales of habitat relationships are often unknown, and are expected to follow a multi-scale hierarchy. Typical frequentist or information theoretic approaches often suffer under collinearity in multi-scale studies, fail to converge when models are complex or represent an intractable computational burden when candidate model sets are large. Objectives Our objective was to implement an automated, Bayesian method for inference on the spatial scales of habitat variables that best predict animal abundance. Methods We introduce Bayesian latent indicator scale selection (BLISS), a Bayesian method to select spatial scales of predictors using latent scale indicator variables that are estimated with reversible-jump Markov chain Monte Carlo sampling. BLISS does not suffer from collinearity, and substantially reduces the computation time of studies. We present a simulation study to validate our method and apply our method to a case study of land cover predictors for ring-necked pheasant (Phasianus colchicus) abundance in Nebraska, USA. Results Our method returns accurate descriptions of the explanatory power of multiple spatial scales, and unbiased and precise parameter estimates under commonly encountered data limitations including spatial scale autocorrelation, effect size, and sample size. BLISS outperforms commonly used model selection methods including stepwise and AIC, and reduces runtime by 90%. Conclusions Given the pervasiveness of scale-dependency in ecology, and the implications of mismatches between the scales of analyses and ecological processes, identifying the spatial scales over which species are integrating habitat information is an important step in understanding species-habitat relationships. BLISS is a widely applicable method for identifying important spatial scales, propagating scale uncertainty, and testing hypotheses of scaling relationships.
NASA Astrophysics Data System (ADS)
Nomura, Shunichi; Ogata, Yosihiko
2016-04-01
We propose a Bayesian method of probability forecasting for recurrent earthquakes of inland active faults in Japan. Renewal processes with the Brownian Passage Time (BPT) distribution are applied to over half of the active faults in Japan by the Headquarters for Earthquake Research Promotion (HERP) of Japan. Long-term forecasting with the BPT distribution needs two parameters: the mean and the coefficient of variation (COV) of the recurrence intervals. HERP applies a common COV parameter to all of these faults because most of them have very few documented paleoseismic events, too few to estimate reliable COV values for individual faults. However, related studies have proposed different COV estimates from the same paleoseismic catalog. Applying different COV estimates can make a critical difference in the forecasts, so the COV should be selected carefully for each fault. Recurrence intervals on a fault are, on average, determined by the long-term slip rate driven by tectonic motion, but fluctuate with nearby seismicity that perturbs the surrounding stress field. The COVs of recurrence intervals depend on such stress perturbations and so exhibit spatial trends due to the heterogeneity of tectonic motion and seismicity. We therefore introduce a spatial structure on the COV parameter through Bayesian modeling with a Gaussian process prior, under which the COVs on active faults are correlated and take similar values for closely located faults. We find that the spatial trends in the estimated COV values coincide with the density of active faults in Japan. We also present Bayesian forecasts from the proposed model computed by a Markov chain Monte Carlo method. Our forecasts differ from HERP's, especially on active faults where HERP's forecasts are very high or very low.
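To make the forecasting step concrete, the sketch below computes a BPT conditional rupture probability with scipy's inverse Gaussian; the parameter values are illustrative, and the code assumes the standard equivalence BPT(mu, alpha) = inverse Gaussian with mean mu and shape mu/alpha^2.

```python
from scipy.stats import invgauss

def bpt_conditional_prob(mu, alpha, t_elapsed, dt):
    """P(event in (t, t+dt] | no event by t) under a BPT renewal model.
    BPT(mu, alpha) equals an inverse Gaussian with mean mu and shape
    lam = mu / alpha**2; scipy parameterization: invgauss(mu/lam, scale=lam)."""
    lam = mu / alpha ** 2
    F = invgauss(mu / lam, scale=lam).cdf
    return (F(t_elapsed + dt) - F(t_elapsed)) / (1.0 - F(t_elapsed))

# illustrative numbers: mean recurrence 1000 yr, COV 0.24,
# 800 yr since the last event, 30-yr forecast window
print(bpt_conditional_prob(1000.0, 0.24, 800.0, 30.0))
```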
Nicoulaud-Gouin, V; Garcia-Sanchez, L; Giacalone, M; Attard, J C; Martin-Garin, A; Bois, F Y
2016-10-01
This paper addresses the methodological conditions, particularly experimental design and statistical inference, that ensure the identifiability of sorption parameters from breakthrough curves measured during stirred flow-through reactor experiments, also known as continuous flow stirred-tank reactor (CSTR) experiments. The equilibrium-kinetic (EK) sorption model was selected as a nonequilibrium parameterization embedding the Kd approach. Parameter identifiability was studied formally on the equations governing outlet concentrations. It was also studied numerically on 6 simulated CSTR experiments on a soil with known equilibrium-kinetic sorption parameters. EK sorption parameters cannot be identified from a single breakthrough curve of a CSTR experiment, because Kd,1 and k- were diagnosed as collinear. For pairs of CSTR experiments, Bayesian inference allowed selection of the correct models of sorption and error among the alternatives. Bayesian inference was conducted with the SAMCAT software (Sensitivity Analysis and Markov Chain simulations Applied to Transfer models), which launched the simulations through the embedded simulation engine GNU-MCSim and automated their configuration and post-processing. Experimental designs consisting of varying flow rates between experiments that reach equilibrium during the contamination stage were found optimal, because they simultaneously gave accurate sorption parameters and predictions. Bayesian results were comparable to those of the maximum likelihood method, but they avoided convergence problems, the marginal likelihood allowed comparison of all models, and credible intervals directly gave the uncertainty of the sorption parameters θ. Although these findings are limited to the specific conditions studied here, in particular the considered sorption model, the chosen parameter values and the error structure, they help in the conception and analysis of future CSTR experiments with radionuclides whose kinetic behaviour is suspected. Copyright © 2016 Elsevier Ltd. All rights reserved.
A Bayesian connectivity-based approach to constructing probabilistic gene regulatory networks.
Zhou, Xiaobo; Wang, Xiaodong; Pal, Ranadip; Ivanov, Ivan; Bittner, Michael; Dougherty, Edward R
2004-11-22
We have hypothesized that the construction of transcriptional regulatory networks using a method that optimizes connectivity would lead to regulation consistent with biological expectations. A key expectation is that the hypothetical networks should produce a few very strong attractors, highly similar to the original observations, mimicking biological state stability and determinism. Another central expectation is that, since biological control is expected to be distributed and mutually reinforcing, interpretation of the observations should lead to a very small number of connection schemes. We propose a fully Bayesian approach to constructing probabilistic gene regulatory networks (PGRNs) that emphasizes network topology. The method computes the possible parent sets of each gene, the corresponding predictors and the associated probabilities based on a nonlinear perceptron model, using a reversible-jump Markov chain Monte Carlo (MCMC) technique; an MCMC method is then employed to search the network configurations and find those with the highest Bayesian scores, from which the PGRN is constructed. The Bayesian method has been used to construct a PGRN based on the observed behavior of a set of genes whose expression patterns vary across a set of melanoma samples exhibiting two very different phenotypes with respect to cell motility and invasiveness. Key biological features have been faithfully reflected in the model. Its steady-state distribution contains attractors that are either identical or very similar to the states observed in the data, and many of the attractors are singletons, which mimics the biological propensity to stably occupy a given state. Most interestingly, the connectivity rules for the most optimal generated networks constituting the PGRN are remarkably similar, as would be expected for a network operating on a distributed basis, with strong interactions between the components.
Xu, Wei-Wei; Hu, Shen-Jiang; Wu, Tao
2017-07-01
Antithrombotic therapy using new oral anticoagulants (NOACs) in patients with atrial fibrillation (AF) has been generally shown to have a favorable risk-benefit profile. Since there has been dispute about the risks of gastrointestinal bleeding (GIB) and intracranial hemorrhage (ICH), we sought to conduct a systematic review and network meta-analysis using Bayesian inference to analyze the risks of GIB and ICH in AF patients taking NOACs. We analyzed data from 20 randomized controlled trials of 91 671 AF patients receiving anticoagulants, antiplatelet drugs, or placebo. Bayesian network meta-analysis of two different evidence networks was performed using a binomial likelihood model, based on a network in which different agents (and doses) were treated as separate nodes. Odds ratios (ORs) and 95% confidence intervals (CIs) were modeled using Markov chain Monte Carlo methods. Indirect comparisons with the Bayesian model confirmed that aspirin+clopidogrel significantly increased the risk of GIB in AF patients compared to the placebo (OR 0.33, 95% CI 0.01-0.92). Warfarin was identified as greatly increasing the risk of ICH compared to edoxaban 30 mg (OR 3.42, 95% CI 1.22-7.24) and dabigatran 110 mg (OR 3.56, 95% CI 1.10-8.45). We further ranked the NOACs for the lowest risk of GIB (apixaban 5 mg) and ICH (apixaban 5 mg, dabigatran 110 mg, and edoxaban 30 mg). Bayesian network meta-analysis of treatment of non-valvular AF patients with anticoagulants suggested that NOACs do not increase risks of GIB and/or ICH, compared to each other.
Inference of reaction rate parameters based on summary statistics from experiments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Khalil, Mohammad; Chowdhary, Kamaljit Singh; Safta, Cosmin
Here, we present the results of an application of Bayesian inference and maximum entropy methods for the estimation of the joint probability density for the Arrhenius rate parameters of the rate coefficient of the H2/O2-mechanism chain branching reaction H + O2 → OH + O. Available published data is in the form of summary statistics in terms of nominal values and error bars of the rate coefficient of this reaction at a number of temperature values obtained from shock-tube experiments. Our approach relies on generating data, in this case OH concentration profiles, consistent with the given summary statistics, using Approximate Bayesian Computation methods and a Markov Chain Monte Carlo procedure. The approach permits the forward propagation of parametric uncertainty through the computational model in a manner that is consistent with the published statistics. A consensus joint posterior on the parameters is obtained by pooling the posterior parameter densities given each consistent data set. To expedite this process, we construct efficient surrogates for the OH concentration using a combination of Padé and polynomial approximants. These surrogate models adequately represent forward model observables and their dependence on input parameters and are computationally efficient to allow their use in the Bayesian inference procedure. We also utilize Gauss-Hermite quadrature with Gaussian proposal probability density functions for moment computation resulting in orders of magnitude speedup in data likelihood evaluation. Despite the strong non-linearity in the model, the consistent data sets all result in nearly Gaussian conditional parameter probability density functions. The technique also accounts for nuisance parameters in the form of Arrhenius parameters of other rate coefficients with prescribed uncertainty. The resulting pooled parameter probability density function is propagated through stoichiometric hydrogen-air auto-ignition computations to illustrate the need to account for correlation among the Arrhenius rate parameters of one reaction and across rate parameters of different reactions.
Inference of reaction rate parameters based on summary statistics from experiments
Khalil, Mohammad; Chowdhary, Kamaljit Singh; Safta, Cosmin; ...
2016-10-15
Here, we present the results of an application of Bayesian inference and maximum entropy methods for the estimation of the joint probability density for the Arrhenius rate parameters of the rate coefficient of the H2/O2-mechanism chain branching reaction H + O2 → OH + O. Available published data is in the form of summary statistics in terms of nominal values and error bars of the rate coefficient of this reaction at a number of temperature values obtained from shock-tube experiments. Our approach relies on generating data, in this case OH concentration profiles, consistent with the given summary statistics, using Approximate Bayesian Computation methods and a Markov Chain Monte Carlo procedure. The approach permits the forward propagation of parametric uncertainty through the computational model in a manner that is consistent with the published statistics. A consensus joint posterior on the parameters is obtained by pooling the posterior parameter densities given each consistent data set. To expedite this process, we construct efficient surrogates for the OH concentration using a combination of Padé and polynomial approximants. These surrogate models adequately represent forward model observables and their dependence on input parameters and are computationally efficient to allow their use in the Bayesian inference procedure. We also utilize Gauss-Hermite quadrature with Gaussian proposal probability density functions for moment computation resulting in orders of magnitude speedup in data likelihood evaluation. Despite the strong non-linearity in the model, the consistent data sets all result in nearly Gaussian conditional parameter probability density functions. The technique also accounts for nuisance parameters in the form of Arrhenius parameters of other rate coefficients with prescribed uncertainty. The resulting pooled parameter probability density function is propagated through stoichiometric hydrogen-air auto-ignition computations to illustrate the need to account for correlation among the Arrhenius rate parameters of one reaction and across rate parameters of different reactions.
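A stripped-down illustration of the data-consistency idea in this abstract: plain ABC rejection against hypothetical published means and error bars for ln k(T). The numbers, prior bounds, and tolerance are invented for illustration; the paper's actual pipeline uses ABC with MCMC, surrogates, and pooling of consistent data sets.

```python
import numpy as np

rng = np.random.default_rng(0)
R_GAS = 8.314  # J/(mol K)

# hypothetical published summary statistics for ln k(T) (illustrative only)
T_obs = np.array([1000.0, 1500.0, 2000.0])
lnk_mean = np.array([8.24, 10.85, 12.19])
lnk_err = np.array([0.30, 0.20, 0.25])

def lnk_model(theta, T):
    """Modified Arrhenius form: ln k = ln A + b ln T - Ea / (R T)."""
    lnA, b, Ea = theta
    return lnA + b * np.log(T) - Ea / (R_GAS * T)

def abc_rejection(n_draws=100_000, tol=1.0):
    """Keep prior draws whose predicted ln k lies within `tol` pooled
    standard errors (RMS of standardized residuals) of the statistics."""
    kept = []
    for _ in range(n_draws):
        theta = (rng.uniform(5, 15), rng.uniform(-1, 1), rng.uniform(4e4, 9e4))
        z = (lnk_model(theta, T_obs) - lnk_mean) / lnk_err
        if np.sqrt(np.mean(z ** 2)) < tol:
            kept.append(theta)
    return np.array(kept)

post = abc_rejection()
print(len(post), post.mean(axis=0) if len(post) else "no accepted draws")
```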
Markov-modulated Markov chains and the covarion process of molecular evolution.
Galtier, N; Jean-Marie, A
2004-01-01
The covarion (or site specific rate variation, SSRV) process of biological sequence evolution is a process by which the evolutionary rate of a nucleotide/amino acid/codon position can change in time. In this paper, we introduce time-continuous, space-discrete, Markov-modulated Markov chains as a model for representing SSRV processes, generalizing existing theory to any model of rate change. We propose a fast algorithm for diagonalizing the generator matrix of relevant Markov-modulated Markov processes. This algorithm makes phylogeny likelihood calculation tractable even for a large number of rate classes and a large number of states, so that SSRV models become applicable to amino acid or codon sequence datasets. Using this algorithm, we investigate the accuracy of the discrete approximation to the Gamma distribution of evolutionary rates, widely used in molecular phylogeny. We show that a relatively large number of classes is required to achieve accurate approximation of the exact likelihood when the number of analyzed sequences exceeds 20, both under the SSRV and among site rate variation (ASRV) models.
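The Markov-modulated generator the abstract refers to can be assembled as a Kronecker sum; the sketch below builds it for a toy Jukes-Cantor model with two rate classes. This is the standard covarion-style construction under those assumptions, not the authors' diagonalization algorithm.

```python
import numpy as np

def mmmc_generator(Q_sub, rates, Q_switch):
    """Generator of a Markov-modulated Markov chain (covarion/SSRV model).
    States are pairs (rate class i, residue a): substitutions run at speed
    rates[i] within class i, and classes switch according to Q_switch."""
    s = Q_sub.shape[0]
    # substitution part: block-diagonal, class i scaled by rates[i]
    G = np.kron(np.diag(rates), Q_sub)
    # rate-class switching part: acts on the class index only
    G += np.kron(Q_switch, np.eye(s))
    return G

# toy example: Jukes-Cantor substitution, two rate classes (slow/fast)
Q_jc = np.full((4, 4), 1 / 3)
np.fill_diagonal(Q_jc, -1.0)
Q_sw = np.array([[-0.1, 0.1], [0.1, -0.1]])
G = mmmc_generator(Q_jc, [0.2, 1.8], Q_sw)
print(G.shape, G.sum(axis=1).round(12))  # rows of a generator sum to 0
```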
Quantum Mechanics, Pattern Recognition, and the Mammalian Brain
NASA Astrophysics Data System (ADS)
Chapline, George
2008-10-01
Although the usual way of representing Markov processes is time asymmetric, there is a way of describing Markov processes, due to Schrödinger, which is time symmetric. This observation provides a link between quantum mechanics and the layered Bayesian networks that are often used in automated pattern recognition systems. In particular, there is a striking formal similarity between quantum mechanics and a particular type of Bayesian network, the Helmholtz machine, which provides a plausible model for how the mammalian brain recognizes important environmental situations. One interesting aspect of this relationship is that the "wake-sleep" algorithm for training a Helmholtz machine is very similar to the problem of finding the potential for the multi-channel Schrödinger equation. As a practical application of this insight it may be possible to use inverse scattering techniques to study the relationship between human brain wave patterns, pattern recognition, and learning. We also comment on whether there is a relationship between quantum measurements and consciousness.
Fast Bayesian Inference of Copy Number Variants using Hidden Markov Models with Wavelet Compression
Wiedenhoeft, John; Brugel, Eric; Schliep, Alexander
2016-01-01
By integrating Haar wavelets with Hidden Markov Models, we achieve drastically reduced running times for Bayesian inference using Forward-Backward Gibbs sampling. We show that this improves detection of genomic copy number variants (CNV) in array CGH experiments compared to the state-of-the-art, including standard Gibbs sampling. The method concentrates computational effort on chromosomal segments which are difficult to call, by dynamically and adaptively recomputing consecutive blocks of observations likely to share a copy number. This makes routine diagnostic use and re-analysis of legacy data collections feasible; to this end, we also propose an effective automatic prior. An open source software implementation of our method is available at http://schlieplab.org/Software/HaMMLET/ (DOI: 10.5281/zenodo.46262). This paper was selected for oral presentation at RECOMB 2016, and an abstract is published in the conference proceedings. PMID:27177143
The cutoff phenomenon in finite Markov chains.
Diaconis, P
1996-01-01
Natural mixing processes modeled by Markov chains often show a sharp cutoff in their convergence to long-time behavior. This paper presents problems where the cutoff can be proved (card shuffling, the Ehrenfests' urn). It shows that chains with polynomial growth (drunkard's walk) do not show cutoffs. The best general understanding of such cutoffs (high multiplicity of second eigenvalues due to symmetry) is explored. Examples are given where the symmetry is broken but the cutoff phenomenon persists. PMID:11607633
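A numerical illustration of the cutoff discussed above, for the Ehrenfest urn: the total-variation distance to the stationary binomial distribution stays near 1 for a long time and then drops abruptly. The lazy walk and the printed checkpoints are illustrative choices.

```python
import numpy as np
from scipy.stats import binom

def ehrenfest_tv_curve(n=100, steps=400):
    """Total-variation distance to stationarity for the lazy Ehrenfest urn,
    one of the chains for which the cutoff can be proved."""
    P = np.zeros((n + 1, n + 1))
    for i in range(n + 1):
        P[i, i] = 0.5                         # laziness removes periodicity
        if i > 0:
            P[i, i - 1] = 0.5 * i / n
        if i < n:
            P[i, i + 1] = 0.5 * (n - i) / n
    pi = binom.pmf(np.arange(n + 1), n, 0.5)  # stationary distribution
    mu = np.zeros(n + 1)
    mu[0] = 1.0                               # start: all balls in one urn
    tv = []
    for _ in range(steps):
        mu = mu @ P
        tv.append(0.5 * np.abs(mu - pi).sum())
    return np.array(tv)

tv = ehrenfest_tv_curve()
print(tv[100].round(3), tv[230].round(3), tv[390].round(3))  # watch the abrupt drop
```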
Experiences with Markov Chain Monte Carlo Convergence Assessment in Two Psychometric Examples
ERIC Educational Resources Information Center
Sinharay, Sandip
2004-01-01
There is an increasing use of Markov chain Monte Carlo (MCMC) algorithms for fitting statistical models in psychometrics, especially in situations where the traditional estimation techniques are very difficult to apply. One of the disadvantages of using an MCMC algorithm is that it is not straightforward to determine the convergence of the…
A Markov Chain Monte Carlo Approach to Confirmatory Item Factor Analysis
ERIC Educational Resources Information Center
Edwards, Michael C.
2010-01-01
Item factor analysis has a rich tradition in both the structural equation modeling and item response theory frameworks. The goal of this paper is to demonstrate a novel combination of various Markov chain Monte Carlo (MCMC) estimation routines to estimate parameters of a wide variety of confirmatory item factor analysis models. Further, I show…
Markov Chain Monte Carlo Estimation of Item Parameters for the Generalized Graded Unfolding Model
ERIC Educational Resources Information Center
de la Torre, Jimmy; Stark, Stephen; Chernyshenko, Oleksandr S.
2006-01-01
The authors present a Markov Chain Monte Carlo (MCMC) parameter estimation procedure for the generalized graded unfolding model (GGUM) and compare it to the marginal maximum likelihood (MML) approach implemented in the GGUM2000 computer program, using simulated and real personality data. In the simulation study, test length, number of response…
Chutes and Ladders for the Impatient
ERIC Educational Resources Information Center
Cheteyan, Leslie A.; Hengeveld, Stewart; Jones, Michael A.
2011-01-01
In this paper, we review the rules and game board for "Chutes and Ladders", define a Markov chain to model the game regardless of the spinner range, and describe how properties of Markov chains are used to determine that an optimal spinner range of 15 minimizes the expected number of turns for a player to complete the game. Because the Markov…
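The Markov-chain machinery the article uses can be seen in miniature below: an absorbing chain for a toy 20-square board, with the expected number of turns read off the fundamental matrix N = (I - Q)^{-1}. Board size, chute, and ladder here are invented for illustration, not the real 100-square game analyzed in the paper.

```python
import numpy as np

def expected_turns(spinner=15, jumps=None, goal=20):
    """Expected turns to finish a toy Chutes-and-Ladders-style board,
    via the fundamental matrix of the absorbing Markov chain."""
    jumps = jumps or {3: 11, 15: 4}           # ladder at 3, chute at 15
    P = np.zeros((goal + 1, goal + 1))
    for sq in range(goal):
        for spin in range(1, spinner + 1):
            nxt = sq + spin
            if nxt > goal:
                nxt = sq                      # overshoot: stay put
            nxt = jumps.get(nxt, nxt)
            P[sq, nxt] += 1.0 / spinner
    P[goal, goal] = 1.0                       # the goal square absorbs
    Q = P[:goal, :goal]                       # transient-to-transient block
    N = np.linalg.inv(np.eye(goal) - Q)       # fundamental matrix
    return N[0].sum()                         # expected turns from the start

print(expected_turns())
```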
Students' Progress throughout Examination Process as a Markov Chain
ERIC Educational Resources Information Center
Hlavatý, Robert; Dömeová, Ludmila
2014-01-01
The paper is focused on students of Mathematical methods in economics at the Czech university of life sciences (CULS) in Prague. The idea is to create a model of students' progress throughout the whole course using the Markov chain approach. Each student has to go through various stages of the course requirements where his success depends on the…
NASA Astrophysics Data System (ADS)
Julie, Hongki; Pasaribu, Udjianna S.; Pancoro, Adi
2015-12-01
This paper applies Markov chains to genome sharing identical by descent (IBD) between two individuals in a full-sibs model. The full-sibs model is a continuous-time Markov chain with three states. In this model, we seek the cumulative distribution function of the number of subsegments carrying 2 IBD haplotypes within a chromosome segment of length t Morgan, and the cumulative distribution function of the number of subsegments carrying at least 1 IBD haplotype within a segment of the same length. These cumulative distribution functions are developed via the moment generating function.
Network Security Risk Assessment System Based on Attack Graph and Markov Chain
NASA Astrophysics Data System (ADS)
Sun, Fuxiong; Pi, Juntao; Lv, Jin; Cao, Tian
2017-10-01
Network security risk assessment technology can identify network problems and related vulnerabilities in advance, and it has become an important means of addressing network security. Based on attack graphs and Markov chains, this paper provides a Network Security Risk Assessment Model (NSRAM). Based on network infiltration tests, NSRAM generates the attack graph by a breadth-first traversal algorithm. Combined with the international CVSS standard, the attack probabilities of atomic nodes are computed, and the attack transition probabilities are then calculated by a Markov chain. NSRAM selects the optimal attack path after comprehensive measurement to assess network security risk. The simulation results show that NSRAM can reflect the actual situation of network security objectively.
An 'adding' algorithm for the Markov chain formalism for radiation transfer
NASA Technical Reports Server (NTRS)
Esposito, L. W.
1979-01-01
An adding algorithm is presented that extends the Markov chain method by treating a preceding calculation as a single state of a new Markov chain. This method takes advantage of the description of radiation transport as a stochastic process. Successive application of this procedure makes calculation possible for any optical depth without increasing the size of the linear system used. It is determined that the time required for the algorithm is comparable to that for a doubling calculation for homogeneous atmospheres. For an inhomogeneous atmosphere the new method is considerably faster than the standard adding routine. It is concluded that the algorithm is efficient, accurate, and suitable for smaller computers in calculating the diffuse intensity scattered by an inhomogeneous planetary atmosphere.
Metastable Distributions of Markov Chains with Rare Transitions
NASA Astrophysics Data System (ADS)
Freidlin, M.; Koralov, L.
2017-06-01
In this paper we consider Markov chains $X^\varepsilon_t$ with transition rates that depend on a small parameter $\varepsilon$. We are interested in the long-time behavior of $X^\varepsilon_t$ at various $\varepsilon$-dependent time scales $t = t(\varepsilon)$. The asymptotic behavior depends on how the point $(1/\varepsilon, t(\varepsilon))$ approaches infinity. We introduce a general notion of complete asymptotic regularity (a certain asymptotic relation between the ratios of transition rates), which ensures the existence of the metastable distribution for each initial point and a given time scale $t(\varepsilon)$. The technique of i-graphs allows one to describe the metastable distribution explicitly. The result may be viewed as a generalization of the ergodic theorem to the case of parameter-dependent Markov chains.
NASA Astrophysics Data System (ADS)
Faizrahnemoon, Mahsa; Schlote, Arieh; Maggi, Lorenzo; Crisostomi, Emanuele; Shorten, Robert
2015-11-01
This paper describes a Markov-chain-based approach to modelling multi-modal transportation networks. An advantage of the model is the ability to accommodate complex dynamics and handle huge amounts of data. The transition matrix of the Markov chain is built and the model is validated using the data extracted from a traffic simulator. A realistic test-case using multi-modal data from the city of London is given to further support the ability of the proposed methodology to handle big quantities of data. Then, we use the Markov chain as a control tool to improve the overall efficiency of a transportation network, and some practical examples are described to illustrate the potentials of the approach.
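A sketch of the model-building step described in the preceding abstract: row-normalizing observed origin-destination trip counts into a transition matrix and reading off its stationary distribution. The counts are invented, and the paper's matrix additionally encodes multiple transport modes.

```python
import numpy as np

def transition_matrix(trip_counts):
    """Row-normalize observed origin-destination trip counts into the
    transition matrix of a transport Markov chain (simplified sketch)."""
    counts = np.asarray(trip_counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)

def stationary(P):
    """Stationary distribution: left eigenvector of P for eigenvalue 1."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmax(np.real(vals))])
    return v / v.sum()

P = transition_matrix([[10, 30, 5], [20, 5, 40], [5, 25, 10]])
print(stationary(P))   # long-run share of travellers at each node
```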
Cognitive diagnosis modelling incorporating item response times.
Zhan, Peida; Jiao, Hong; Liao, Dandan
2018-05-01
To provide more refined diagnostic feedback with collateral information in item response times (RTs), this study proposed joint modelling of attributes and response speed using item responses and RTs simultaneously for cognitive diagnosis. For illustration, an extended deterministic input, noisy 'and' gate (DINA) model was proposed for joint modelling of responses and RTs. Model parameter estimation was explored using the Bayesian Markov chain Monte Carlo (MCMC) method. The PISA 2012 computer-based mathematics data were analysed first. These real data estimates were treated as true values in a subsequent simulation study. A follow-up simulation study with ideal testing conditions was conducted as well to further evaluate model parameter recovery. The results indicated that model parameters could be well recovered using the MCMC approach. Further, incorporating RTs into the DINA model would improve attribute and profile correct classification rates and result in more accurate and precise estimation of the model parameters. © 2017 The British Psychological Society.
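For reference, the deterministic input, noisy 'and' gate kernel at the core of the extended model looks as follows; this sketches only the standard DINA response function, while the paper's joint model adds a response-time component and MCMC estimation.

```python
import numpy as np

def dina_prob(alpha, q, guess, slip):
    """DINA item response function: eta = 1 iff the examinee masters all
    attributes the Q-matrix row requires; P(correct) = g^(1-eta) (1-s)^eta."""
    eta = np.all(alpha >= q, axis=-1).astype(float)
    return guess ** (1 - eta) * (1 - slip) ** eta

# examinee mastering attributes {0, 2}, item requiring attributes {0, 1}
print(dina_prob(np.array([1, 0, 1]), np.array([1, 1, 0]), guess=0.2, slip=0.1))
```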
Molecular phylogeny of the Drusinae (Trichoptera: Limnephilidae): preliminary results
NASA Astrophysics Data System (ADS)
Pauls, S.; Lumbsch, T.; Haase, P.
2005-05-01
We examine the phylogenetic relationships within the subfamily of the Drusinae using molecular markers. Sequence data from two mitochondrial loci (mitochondrial cytochrome oxidase I, mitochondrial ribosomal large subunit) are used to infer the relationships within and among the genera of the Drusinae. Sequence data were generated for 21 taxa from five genera from the subfamily. The molecular data were analyzed using a Bayesian Markov Chain Monte Carlo and a Maximum Parsimony approach for both single gene and combined data sets. Several hypotheses of relationships previously inferred based on morphological characters were tested. The study revealed a very close relationship between Drusus discolor and D. romanicus suggesting that divergence between these two species occurred recently. The relationships inferred by molecular data suggest that larval morphology may be an important taxonomic character, which has often been neglected. The data also indicate that the genera Ecclisopteryx and Drusus are polyphyletic with respect to one another.
Shehla, Romana; Khan, Athar Ali
2016-01-01
Models with a bathtub-shaped hazard function have been widely accepted in the fields of reliability and medicine and are particularly useful in reliability-related decision making and cost analysis. In this paper, the exponential power model, whose hazard can be increasing as well as bathtub-shaped, is studied. This article makes a Bayesian study of the model and simultaneously shows how posterior simulations based on Markov chain Monte Carlo algorithms can be straightforward and routine in R. The study is carried out for complete as well as censored data, under the assumption of weakly-informative priors for the parameters. In addition, inference interest focuses on the posterior distribution of non-linear functions of the parameters. The model has also been extended to include continuous explanatory variables, and the R code is well illustrated. Two real data sets are considered for illustrative purposes.
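The article works in R; as a language-neutral illustration, here is a Python sketch of the same kind of workflow: an exponential power log-density in the Smith-Bain parameterization F(t) = 1 - exp(1 - exp((lam t)^alpha)), weakly-informative normal priors on the log-parameters, and a random-walk Metropolis sampler. Parameterization, priors, step size, and data are all illustrative assumptions, not the authors' code.

```python
import numpy as np

def ep_logpdf(t, alpha, lam):
    """Log-density of the exponential power lifetime model,
    Smith-Bain parameterization: F(t) = 1 - exp(1 - exp((lam*t)**alpha))."""
    z = (lam * t) ** alpha
    return (np.log(alpha) + alpha * np.log(lam) + (alpha - 1) * np.log(t)
            + z + 1 - np.exp(z))

def log_post(theta, t):
    """Weakly-informative N(0, 10^2) priors on log(alpha), log(lam)."""
    la, ll = theta
    lp = -0.5 * (la ** 2 + ll ** 2) / 100.0
    return lp + ep_logpdf(t, np.exp(la), np.exp(ll)).sum()

def metropolis(t, n_iter=20_000, step=0.05, seed=1):
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)
    cur = log_post(theta, t)
    draws = []
    for _ in range(n_iter):
        prop = theta + step * rng.standard_normal(2)
        new = log_post(prop, t)
        if np.log(rng.uniform()) < new - cur:
            theta, cur = prop, new
        draws.append(np.exp(theta))           # back to (alpha, lam)
    return np.array(draws)

t = np.array([0.1, 0.5, 0.9, 1.4, 2.2, 3.1, 4.0, 5.5])  # toy lifetimes
print(metropolis(t)[10_000:].mean(axis=0))               # posterior means
```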
The natural mathematics of behavior analysis.
Li, Don; Hautus, Michael J; Elliffe, Douglas
2018-04-19
Models that generate event records have very general scope regarding the dimensions of the target behavior that we measure. From a set of predicted event records, we can generate predictions for any dependent variable that we could compute from the event records of our subjects. In this sense, models that generate event records permit us a freely multivariate analysis. To explore this proposition, we conducted a multivariate examination of Catania's Operant Reserve on single VI schedules in transition using a Markov Chain Monte Carlo scheme for Approximate Bayesian Computation. Although we found systematic deviations between our implementation of Catania's Operant Reserve and our observed data (e.g., mismatches in the shape of the interresponse time distributions), the general approach that we have demonstrated represents an avenue for modelling behavior that transcends the typical constraints of algebraic models. © 2018 Society for the Experimental Analysis of Behavior.
Cosmological parameter estimation using Particle Swarm Optimization
NASA Astrophysics Data System (ADS)
Prasad, J.; Souradeep, T.
2014-03-01
Constraining the parameters of a theoretical model from observational data is an important exercise in cosmology. There are many theoretically motivated models that demand a greater number of cosmological parameters than the standard model of cosmology uses, and they make the problem of parameter estimation challenging. It is common practice to employ the Bayesian formalism for parameter estimation, for which, in general, the likelihood surface is probed. For the standard cosmological model with six parameters, the likelihood surface is quite smooth and does not have local maxima, and sampling-based methods like the Markov chain Monte Carlo (MCMC) method are quite successful. However, when there are a large number of parameters or the likelihood surface is not smooth, other methods may be more effective. In this paper, we demonstrate the application of another method, inspired by artificial intelligence, called Particle Swarm Optimization (PSO) for estimating cosmological parameters from Cosmic Microwave Background (CMB) data taken from the WMAP satellite.
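A minimal particle swarm optimizer of the standard inertia-cognitive-social form the paper applies to CMB likelihoods; the coefficients and the smooth toy objective are illustrative stand-ins for a real likelihood surface.

```python
import numpy as np

def pso(neg_log_like, bounds, n_particles=40, n_iters=200, seed=0):
    """Minimal PSO: each particle is pulled toward its personal best and
    the swarm's global best, with inertia damping the velocity."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds).T
    d = len(lo)
    x = rng.uniform(lo, hi, (n_particles, d))
    v = np.zeros_like(x)
    pbest, pbest_f = x.copy(), np.array([neg_log_like(p) for p in x])
    g = pbest[pbest_f.argmin()].copy()
    w, c1, c2 = 0.72, 1.49, 1.49              # common PSO coefficients
    for _ in range(n_iters):
        r1, r2 = rng.uniform(size=(2, n_particles, d))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        f = np.array([neg_log_like(p) for p in x])
        better = f < pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
        g = pbest[pbest_f.argmin()].copy()
    return g, pbest_f.min()

# toy 'likelihood surface': a quadratic with a single smooth peak
best, val = pso(lambda p: ((p - 0.3) ** 2).sum(), [(-1, 1)] * 6)
print(best.round(3), val)
```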
Analysing child mortality in Nigeria with geoadditive discrete-time survival models.
Adebayo, Samson B; Fahrmeir, Ludwig
2005-03-15
Child mortality reflects a country's level of socio-economic development and quality of life. In developing countries, mortality rates are not only influenced by socio-economic, demographic and health variables but they also vary considerably across regions and districts. In this paper, we analysed child mortality in Nigeria with flexible geoadditive discrete-time survival models. This class of models allows us to measure small-area district-specific spatial effects simultaneously with possibly non-linear or time-varying effects of other factors. Inference is fully Bayesian and uses computationally efficient Markov chain Monte Carlo (MCMC) simulation techniques. The application is based on the 1999 Nigeria Demographic and Health Survey. Our method assesses effects at a high level of temporal and spatial resolution not available with traditional parametric models, and the results provide some evidence on how to reduce child mortality by improving socio-economic and public health conditions. Copyright (c) 2004 John Wiley & Sons, Ltd.
Learning Weight Uncertainty with Stochastic Gradient MCMC for Shape Classification
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Chunyuan; Stevens, Andrew J.; Chen, Changyou
2016-08-10
Learning the representation of shape cues in 2D & 3D objects for recognition is a fundamental task in computer vision. Deep neural networks (DNNs) have shown promising performance on this task. Due to the large variability of shapes, accurate recognition relies on good estimates of model uncertainty, ignored in traditional training of DNNs, typically learned via stochastic optimization. This paper leverages recent advances in stochastic gradient Markov Chain Monte Carlo (SG-MCMC) to learn weight uncertainty in DNNs. It yields principled Bayesian interpretations for the commonly used Dropout/DropConnect techniques and incorporates them into the SG-MCMC framework. Extensive experiments on 2D & 3D shape datasets and various DNN models demonstrate the superiority of the proposed approach over stochastic optimization. Our approach yields higher recognition accuracy when used in conjunction with Dropout and Batch-Normalization.
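A small member of the SG-MCMC family is stochastic gradient Langevin dynamics; the sketch below applies it to Bayesian logistic regression rather than the paper's DNNs, with the step size, prior, and toy data chosen for illustration.

```python
import numpy as np

def sgld_logistic(X, y, n_iter=5000, batch=32, eps=1e-3, seed=0):
    """Stochastic Gradient Langevin Dynamics: gradient ascent on the
    log-posterior from minibatches, plus injected Gaussian noise, so the
    iterates form (approximate) posterior samples rather than a point fit."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    samples = []
    for _ in range(n_iter):
        idx = rng.integers(0, n, batch)
        p = 1.0 / (1.0 + np.exp(-X[idx] @ w))
        grad_loglik = (n / batch) * X[idx].T @ (y[idx] - p)  # rescaled minibatch grad
        grad_logprior = -w                                   # N(0, I) prior
        w = w + 0.5 * eps * (grad_loglik + grad_logprior) \
              + np.sqrt(eps) * rng.standard_normal(d)        # Langevin noise
        samples.append(w.copy())
    return np.array(samples)

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 3))
w_true = np.array([1.5, -2.0, 0.5])
y = (rng.uniform(size=500) < 1 / (1 + np.exp(-X @ w_true))).astype(float)
print(sgld_logistic(X, y)[2500:].mean(axis=0))  # posterior mean estimate
```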
Volcanic Eruption Forecasts From Accelerating Rates of Drumbeat Long-Period Earthquakes
NASA Astrophysics Data System (ADS)
Bell, Andrew F.; Naylor, Mark; Hernandez, Stephen; Main, Ian G.; Gaunt, H. Elizabeth; Mothes, Patricia; Ruiz, Mario
2018-02-01
Accelerating rates of quasiperiodic "drumbeat" long-period earthquakes (LPs) are commonly reported before eruptions at andesite and dacite volcanoes, and promise insights into the nature of fundamental preeruptive processes and improved eruption forecasts. Here we apply a new Bayesian Markov chain Monte Carlo gamma point process methodology to investigate an exceptionally well-developed sequence of drumbeat LPs preceding a recent large vulcanian explosion at Tungurahua volcano, Ecuador. For more than 24 hr, LP rates increased according to the inverse power law trend predicted by material failure theory, and with a retrospectively forecast failure time that agrees with the eruption onset within error. LPs resulted from repeated activation of a single characteristic source driven by accelerating loading, rather than a distributed failure process, showing that similar precursory trends can emerge from quite different underlying physics. Nevertheless, such sequences have clear potential for improving forecasts of eruptions at Tungurahua and analogous volcanoes.
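The inverse power-law trend can be illustrated with a least-squares fit of rate(t) = k (t_f - t)^(-p) to synthetic hourly LP counts; the paper instead fits a gamma point process by Bayesian MCMC, which also propagates uncertainty in the failure time. All numbers below are synthetic.

```python
import numpy as np
from scipy.optimize import curve_fit

def rate_model(t, k, p, tf):
    """Inverse power-law acceleration of event rates predicted by
    material failure theory: rate(t) = k * (tf - t)**(-p)."""
    return k * (tf - t) ** (-p)

# synthetic binned LP counts per hour over the 24 h before an eruption
t = np.arange(24, dtype=float)
true_rate = rate_model(t, 30.0, 1.0, 25.0)
counts = np.random.default_rng(2).poisson(true_rate).astype(float)

# least-squares point fit; bounds keep tf beyond the observed window
(k, p, tf), _ = curve_fit(rate_model, t, counts, p0=(10.0, 1.0, 30.0),
                          bounds=([0, 0.1, 24.001], [1e3, 5, 100]))
print(f"retrospective failure-time forecast: t_f = {tf:.1f} h")
```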
Sampling schemes and parameter estimation for nonlinear Bernoulli-Gaussian sparse models
NASA Astrophysics Data System (ADS)
Boudineau, Mégane; Carfantan, Hervé; Bourguignon, Sébastien; Bazot, Michael
2016-06-01
We address the sparse approximation problem in the case where the data are approximated by the linear combination of a small number of elementary signals, each of these signals depending non-linearly on additional parameters. Sparsity is explicitly expressed through a Bernoulli-Gaussian hierarchical model in a Bayesian framework. Posterior mean estimates are computed using Markov Chain Monte-Carlo algorithms. We generalize the partially marginalized Gibbs sampler proposed in the linear case in [1], and build a hybrid Hastings-within-Gibbs algorithm in order to account for the nonlinear parameters. All model parameters are then estimated in an unsupervised procedure. The resulting method is evaluated on a sparse spectral analysis problem. It is shown to converge more efficiently than the classical joint estimation procedure, with only a slight increase of the computational cost per iteration, consequently reducing the global cost of the estimation procedure.
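For orientation, here is a single-site Gibbs sampler for the plain linear Bernoulli-Gaussian model, the baseline that the paper's partially marginalized and Hastings-within-Gibbs schemes extend to nonlinear parameters; the hyperparameters and toy problem are illustrative assumptions.

```python
import numpy as np

def bg_gibbs(y, H, p1=0.1, s2x=1.0, s2n=0.01, n_iter=500, seed=0):
    """Single-site Gibbs for the linear Bernoulli-Gaussian model y = H x + n:
    each x_j is 0 with prob. 1 - p1, else N(0, s2x); n ~ N(0, s2n I)."""
    rng = np.random.default_rng(seed)
    k = H.shape[1]
    x = np.zeros(k)
    keep = []
    for _ in range(n_iter):
        for j in range(k):
            h = H[:, j]
            r = y - H @ x + h * x[j]             # residual with column j removed
            v = 1.0 / (h @ h / s2n + 1.0 / s2x)  # posterior variance of x_j
            m = v * (h @ r) / s2n                # posterior mean of x_j
            # log posterior odds that x_j is "on", with x_j marginalized out
            log_odds = (np.log(p1 / (1 - p1)) + 0.5 * np.log(v / s2x)
                        + 0.5 * m * m / v)
            p_on = 1.0 / (1.0 + np.exp(-np.clip(log_odds, -700, 700)))
            x[j] = m + np.sqrt(v) * rng.standard_normal() if rng.uniform() < p_on else 0.0
        keep.append(x.copy())
    return np.array(keep)

# toy sparse problem: 3 nonzero coefficients out of 40
rng = np.random.default_rng(1)
H = rng.standard_normal((100, 40))
x_true = np.zeros(40)
x_true[[5, 17, 33]] = [1.2, -0.8, 2.0]
y = H @ x_true + 0.1 * rng.standard_normal(100)
print(bg_gibbs(y, H)[250:].mean(axis=0).round(2))  # posterior mean of x
```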
Trans-dimensional MCMC methods for fully automatic motion analysis in tagged MRI.
Smal, Ihor; Carranza-Herrezuelo, Noemí; Klein, Stefan; Niessen, Wiro; Meijering, Erik
2011-01-01
Tagged magnetic resonance imaging (tMRI) is a well-known noninvasive method allowing quantitative analysis of regional heart dynamics. Its clinical use has so far been limited, in part due to the lack of robustness and accuracy of existing tag tracking algorithms in dealing with low (and intrinsically time-varying) image quality. In this paper, we propose a novel probabilistic method for tag tracking, implemented by means of Bayesian particle filtering and a trans-dimensional Markov chain Monte Carlo (MCMC) approach, which efficiently combines information about the imaging process and tag appearance with prior knowledge about the heart dynamics obtained by means of non-rigid image registration. Experiments using synthetic image data (with ground truth) and real data (with expert manual annotation) from preclinical (small animal) and clinical (human) studies confirm that the proposed method yields higher consistency, accuracy, and intrinsic tag reliability assessment in comparison with other frequently used tag tracking methods.
Multinomial mixture model with heterogeneous classification probabilities
Holland, M.D.; Gray, B.R.
2011-01-01
Royle and Link (Ecology 86(9):2505-2512, 2005) proposed an analytical method that allowed estimation of multinomial distribution parameters and classification probabilities from categorical data measured with error. Although the method is useful, we demonstrate algebraically and by simulation that it yields biased multinomial parameter estimates when the probabilities of correct category classifications vary among sampling units. We address this shortcoming by treating these probabilities as logit-normal random variables within a Bayesian framework. We use Markov chain Monte Carlo to compute Bayes estimates from a simulated sample from the posterior distribution. Based on simulations, this elaborated Royle-Link model yields nearly unbiased estimates of multinomial and correct classification probability estimates when classification probabilities are allowed to vary according to the normal distribution on the logit scale or according to the Beta distribution. The method is illustrated using categorical submersed aquatic vegetation data. © 2010 Springer Science+Business Media, LLC.
Modeling haplotype block variation using Markov chains.
Greenspan, G; Geiger, D
2006-04-01
Models of background variation in genomic regions form the basis of linkage disequilibrium mapping methods. In this work we analyze a background model that groups SNPs into haplotype blocks and represents the dependencies between blocks by a Markov chain. We develop an error measure to compare the performance of this model against the common model that assumes that blocks are independent. By examining data from the International Haplotype Mapping project, we show how the Markov model over haplotype blocks is most accurate when representing blocks in strong linkage disequilibrium. This contrasts with the independent model, which is rendered less accurate by linkage disequilibrium. We provide a theoretical explanation for this surprising property of the Markov model and relate its behavior to allele diversity.
Modeling Haplotype Block Variation Using Markov Chains
Greenspan, G.; Geiger, D.
2006-01-01
Models of background variation in genomic regions form the basis of linkage disequilibrium mapping methods. In this work we analyze a background model that groups SNPs into haplotype blocks and represents the dependencies between blocks by a Markov chain. We develop an error measure to compare the performance of this model against the common model that assumes that blocks are independent. By examining data from the International Haplotype Mapping project, we show how the Markov model over haplotype blocks is most accurate when representing blocks in strong linkage disequilibrium. This contrasts with the independent model, which is rendered less accurate by linkage disequilibrium. We provide a theoretical explanation for this surprising property of the Markov model and relate its behavior to allele diversity. PMID:16361244
Efficient Algorithms for Bayesian Network Parameter Learning from Incomplete Data
2015-07-01
Guy Van den Broeck; Karthika Mohan; Arthur Choi; Adnan …
Variable context Markov chains for HIV protease cleavage site prediction.
Oğul, Hasan
2009-06-01
Deciphering the specificity of HIV protease and developing computational tools for detecting its cleavage sites in a protein polypeptide chain are highly desirable for designing efficient and specific chemical inhibitors to prevent acquired immunodeficiency syndrome. In this study, we developed a generative model based on a generalization of variable order Markov chains (VOMC) for peptide sequences and adapted the model for predicting their cleavability by certain proteases. The new method, called variable context Markov chains (VCMC), attempts to identify context equivalence based on the evolutionary similarities between individual amino acids. It was applied to the HIV-1 protease cleavage site prediction problem and shown to outperform existing methods in terms of prediction accuracy on a common dataset. In general, the method is a promising tool for predicting the cleavage sites of all proteases, and its use is encouraged for any kind of peptide classification problem as well.
Classification of customer lifetime value models using Markov chain
NASA Astrophysics Data System (ADS)
Permana, Dony; Pasaribu, Udjianna S.; Indratno, Sapto W.; Suprayogi
2017-10-01
A firm's potential future reward from a customer can be determined by customer lifetime value (CLV). There are several mathematical methods to calculate it; one uses a Markov chain stochastic model. Here, a customer is assumed to move through a set of states, with transitions between the states following the Markov property. Given the states of a customer and the relationships between the states, we can build Markov models that describe the properties of the customer. In these Markov models, CLV is defined as a vector that contains the CLV for a customer in the first state. In this paper we present a classification of Markov models for calculating CLV. Starting from a two-state customer model, we develop models with more states, where each development is based on weaknesses in the previous model. The final models can be expected to describe the real characteristics of customers in a firm.
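In the simplest such model, CLV has a closed form as the discounted sum of expected margins along the chain, V = sum_t d^t P^t m = (I - d P)^{-1} m; the two-state active/churned example below is illustrative.

```python
import numpy as np

def clv(P, margins, discount):
    """Customer lifetime value for a Markov-chain customer model:
    V = sum_t d^t P^t m = (I - d P)^{-1} m (discounted expected margins)."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - discount * P, margins)

# two-state model: state 0 = active customer, state 1 = churned
P = np.array([[0.8, 0.2],
              [0.0, 1.0]])
m = np.array([100.0, 0.0])       # expected margin per period in each state
print(clv(P, m, discount=0.9))   # entry 0: CLV of an active customer
```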
NASA Astrophysics Data System (ADS)
Zhang, D.; Liao, Q.
2016-12-01
The Bayesian inference provides a convenient framework to solve statistical inverse problems. In this method, the parameters to be identified are treated as random variables. The prior knowledge, the system nonlinearity, and the measurement errors can be directly incorporated in the posterior probability density function (PDF) of the parameters. The Markov chain Monte Carlo (MCMC) method is a powerful tool to generate samples from the posterior PDF. However, since MCMC usually requires thousands or even millions of forward simulations, it can be a computationally intensive endeavor, particularly when faced with large-scale flow and transport models. To address this issue, we construct a surrogate system for the model responses in the form of polynomials using the stochastic collocation method. In addition, we employ interpolation based on nested sparse grids that takes into account the differing importance of the parameters, to cope with high random dimensions in the stochastic space. Furthermore, in cases of low regularity, such as a discontinuous or unsmooth relation between the input parameters and the output responses, we introduce an additional transform process to improve the accuracy of the surrogate model. Once we build the surrogate system, we may evaluate the likelihood with very little computational cost. We analyzed the convergence rate of the forward solution and the surrogate posterior by the Kullback-Leibler divergence, which quantifies the difference between probability distributions. The fast convergence of the forward solution implies fast convergence of the surrogate posterior to the true posterior. We also tested the proposed algorithm on water-flooding two-phase flow reservoir examples. The posterior PDF calculated from a very long chain with direct forward simulation is assumed to be accurate. The posterior PDF calculated using the surrogate model is in reasonable agreement with the reference, revealing a great improvement in terms of computational efficiency.
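The surrogate-then-sample idea in miniature: fit a cheap polynomial surrogate to a handful of forward-model runs, then run Metropolis on the surrogate likelihood. The least-squares polynomial here stands in for the abstract's sparse-grid stochastic collocation, and the forward model, observation, and noise level are invented.

```python
import numpy as np

def fit_surrogate(forward, train_pts, degree=3):
    """Cheap least-squares polynomial surrogate of a scalar forward model."""
    V = np.vander(train_pts, degree + 1)
    coef, *_ = np.linalg.lstsq(V, np.array([forward(p) for p in train_pts]),
                               rcond=None)
    return lambda m: (np.vander(np.atleast_1d(m), degree + 1) @ coef).item()

def metropolis(log_post, x0=0.0, n=20_000, step=0.3, seed=0):
    rng = np.random.default_rng(seed)
    x, lp, out = x0, log_post(x0), []
    for _ in range(n):
        xp = x + step * rng.standard_normal()
        lpp = log_post(xp)
        if np.log(rng.uniform()) < lpp - lp:
            x, lp = xp, lpp
        out.append(x)
    return np.array(out)

forward = lambda m: np.sin(m) + 0.1 * m ** 2        # stand-in "expensive" model
y_obs, sigma = forward(1.0) + 0.05, 0.1             # one noisy observation
surr = fit_surrogate(forward, np.random.default_rng(1).uniform(-3, 3, 50))
log_post = lambda m: (-0.5 * ((y_obs - surr(m)) / sigma) ** 2
                      - 0.5 * (m / 3.0) ** 2)       # N(0, 3^2) prior
chain = metropolis(log_post)                        # sampling hits only the surrogate
print(chain[5000:].mean())
```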
Bayesian structural inference for hidden processes.
Strelioff, Christopher C; Crutchfield, James P
2014-04-01
We introduce a Bayesian approach to discovering patterns in structurally complex processes. The proposed method of Bayesian structural inference (BSI) relies on a set of candidate unifilar hidden Markov model (uHMM) topologies for inference of process structure from a data series. We employ a recently developed exact enumeration of topological ε-machines. (A sequel then removes the topological restriction.) This subset of the uHMM topologies has the added benefit that inferred models are guaranteed to be ε-machines, irrespective of estimated transition probabilities. Properties of ε-machines and uHMMs allow for the derivation of analytic expressions for estimating transition probabilities, inferring start states, and comparing the posterior probability of candidate model topologies, despite process internal structure being only indirectly present in data. We demonstrate BSI's effectiveness in estimating a process's randomness, as reflected by the Shannon entropy rate, and its structure, as quantified by the statistical complexity. We also compare using the posterior distribution over candidate models and the single, maximum a posteriori model for point estimation and show that the former more accurately reflects uncertainty in estimated values. We apply BSI to in-class examples of finite- and infinite-order Markov processes, as well to an out-of-class, infinite-state hidden process.
Bayesian structural inference for hidden processes
NASA Astrophysics Data System (ADS)
Strelioff, Christopher C.; Crutchfield, James P.
2014-04-01
We introduce a Bayesian approach to discovering patterns in structurally complex processes. The proposed method of Bayesian structural inference (BSI) relies on a set of candidate unifilar hidden Markov model (uHMM) topologies for inference of process structure from a data series. We employ a recently developed exact enumeration of topological ɛ-machines. (A sequel then removes the topological restriction.) This subset of the uHMM topologies has the added benefit that inferred models are guaranteed to be ɛ-machines, irrespective of estimated transition probabilities. Properties of ɛ-machines and uHMMs allow for the derivation of analytic expressions for estimating transition probabilities, inferring start states, and comparing the posterior probability of candidate model topologies, despite process internal structure being only indirectly present in data. We demonstrate BSI's effectiveness in estimating a process's randomness, as reflected by the Shannon entropy rate, and its structure, as quantified by the statistical complexity. We also compare using the posterior distribution over candidate models and the single, maximum a posteriori model for point estimation and show that the former more accurately reflects uncertainty in estimated values. We apply BSI to in-class examples of finite- and infinite-order Markov processes, as well to an out-of-class, infinite-state hidden process.
An Application of Bayesian Approach in Modeling Risk of Death in an Intensive Care Unit
Wong, Rowena Syn Yin; Ismail, Noor Azina
2016-01-01
Background and Objectives There are not many studies that attempt to model intensive care unit (ICU) risk of death in developing countries, especially in South East Asia. The aim of this study was to propose and describe the application of a Bayesian approach in modeling in-ICU deaths in a Malaysian ICU. Methods This was a prospective study in a mixed medical-surgery ICU in a multidisciplinary tertiary referral hospital in Malaysia. Data collection included variables that were defined in the Acute Physiology and Chronic Health Evaluation IV (APACHE IV) model. A Bayesian Markov Chain Monte Carlo (MCMC) simulation approach was applied in the development of four multivariate logistic regression predictive models for the ICU, where the main outcome measure was in-ICU mortality risk. The performance of the models was assessed through overall model fit, discrimination and calibration measures. Results from the Bayesian models were also compared against results obtained using the frequentist maximum likelihood method. Results The study involved 1,286 consecutive ICU admissions between January 1, 2009 and June 30, 2010, of which 1,111 met the inclusion criteria. Patients who were admitted to the ICU were generally younger, predominantly male, with low co-morbidity load and mostly under mechanical ventilation. The overall in-ICU mortality rate was 18.5% and the overall mean Acute Physiology Score (APS) was 68.5. All four models exhibited good discrimination, with area under receiver operating characteristic curve (AUC) values of approximately 0.8. Calibration was acceptable (Hosmer-Lemeshow p-values > 0.05) for all models, except for model M3. Model M1 was identified as the model with the best overall performance in this study. Conclusion Four prediction models were proposed, where the best model was chosen based on its overall performance in this study. This study has also demonstrated the promising potential of the Bayesian MCMC approach as an alternative in the analysis and modeling of in-ICU mortality outcomes. PMID:27007413
An Application of Bayesian Approach in Modeling Risk of Death in an Intensive Care Unit.
Wong, Rowena Syn Yin; Ismail, Noor Azina
2016-01-01
There are not many studies that attempt to model intensive care unit (ICU) risk of death in developing countries, especially in South East Asia. The aim of this study was to propose and describe the application of a Bayesian approach in modeling in-ICU deaths in a Malaysian ICU. This was a prospective study in a mixed medical-surgery ICU in a multidisciplinary tertiary referral hospital in Malaysia. Data collection included variables that were defined in the Acute Physiology and Chronic Health Evaluation IV (APACHE IV) model. A Bayesian Markov Chain Monte Carlo (MCMC) simulation approach was applied in the development of four multivariate logistic regression predictive models for the ICU, where the main outcome measure was in-ICU mortality risk. The performance of the models was assessed through overall model fit, discrimination and calibration measures. Results from the Bayesian models were also compared against results obtained using the frequentist maximum likelihood method. The study involved 1,286 consecutive ICU admissions between January 1, 2009 and June 30, 2010, of which 1,111 met the inclusion criteria. Patients who were admitted to the ICU were generally younger, predominantly male, with low co-morbidity load and mostly under mechanical ventilation. The overall in-ICU mortality rate was 18.5% and the overall mean Acute Physiology Score (APS) was 68.5. All four models exhibited good discrimination, with area under receiver operating characteristic curve (AUC) values of approximately 0.8. Calibration was acceptable (Hosmer-Lemeshow p-values > 0.05) for all models, except for model M3. Model M1 was identified as the model with the best overall performance in this study. Four prediction models were proposed, where the best model was chosen based on its overall performance in this study. This study has also demonstrated the promising potential of the Bayesian MCMC approach as an alternative in the analysis and modeling of in-ICU mortality outcomes.
Bayesian models for comparative analysis integrating phylogenetic uncertainty.
de Villemereuil, Pierre; Wells, Jessie A; Edwards, Robert D; Blomberg, Simon P
2012-06-28
Uncertainty in comparative analyses can come from at least two sources: a) phylogenetic uncertainty in the tree topology or branch lengths, and b) uncertainty due to intraspecific variation in trait values, either due to measurement error or natural individual variation. Most phylogenetic comparative methods do not account for such uncertainties. Not accounting for these sources of uncertainty leads to false perceptions of precision (confidence intervals will be too narrow) and inflated significance in hypothesis testing (e.g. p-values will be too small). Although there is some application-specific software for fitting Bayesian models accounting for phylogenetic error, more general and flexible software is desirable. We developed models to directly incorporate phylogenetic uncertainty into a range of analyses that biologists commonly perform, using a Bayesian framework and Markov Chain Monte Carlo analyses. We demonstrate applications in linear regression, quantification of phylogenetic signal, and measurement error models. Phylogenetic uncertainty was incorporated by applying a prior distribution for the phylogeny, where this distribution consisted of the posterior tree sets from Bayesian phylogenetic tree estimation programs. The models were analysed using simulated data sets, and applied to a real data set on plant traits, from rainforest plant species in Northern Australia. Analyses were performed using the free and open source software OpenBUGS and JAGS. Incorporating phylogenetic uncertainty through an empirical prior distribution of trees leads to more precise estimation of regression model parameters than using a single consensus tree and enables a more realistic estimation of confidence intervals. In addition, models incorporating measurement errors and/or individual variation, in one or both variables, are easily formulated in the Bayesian framework. We show that BUGS is a useful, flexible general purpose tool for phylogenetic comparative analyses, particularly for modelling in the face of phylogenetic uncertainty and accounting for measurement error or individual variation in explanatory variables. Code for all models is provided in the BUGS model description language.
Bayesian models for comparative analysis integrating phylogenetic uncertainty
2012-01-01
Background Uncertainty in comparative analyses can come from at least two sources: a) phylogenetic uncertainty in the tree topology or branch lengths, and b) uncertainty due to intraspecific variation in trait values, either due to measurement error or natural individual variation. Most phylogenetic comparative methods do not account for such uncertainties. Not accounting for these sources of uncertainty leads to false perceptions of precision (confidence intervals will be too narrow) and inflated significance in hypothesis testing (e.g. p-values will be too small). Although there is some application-specific software for fitting Bayesian models accounting for phylogenetic error, more general and flexible software is desirable. Methods We developed models to directly incorporate phylogenetic uncertainty into a range of analyses that biologists commonly perform, using a Bayesian framework and Markov Chain Monte Carlo analyses. Results We demonstrate applications in linear regression, quantification of phylogenetic signal, and measurement error models. Phylogenetic uncertainty was incorporated by applying a prior distribution for the phylogeny, where this distribution consisted of the posterior tree sets from Bayesian phylogenetic tree estimation programs. The models were analysed using simulated data sets, and applied to a real data set on plant traits, from rainforest plant species in Northern Australia. Analyses were performed using the free and open source software OpenBUGS and JAGS. Conclusions Incorporating phylogenetic uncertainty through an empirical prior distribution of trees leads to more precise estimation of regression model parameters than using a single consensus tree and enables a more realistic estimation of confidence intervals. In addition, models incorporating measurement errors and/or individual variation, in one or both variables, are easily formulated in the Bayesian framework. We show that BUGS is a useful, flexible general purpose tool for phylogenetic comparative analyses, particularly for modelling in the face of phylogenetic uncertainty and accounting for measurement error or individual variation in explanatory variables. Code for all models is provided in the BUGS model description language. PMID:22741602
A fast Bayesian approach to discrete object detection in astronomical data sets - PowellSnakes I
NASA Astrophysics Data System (ADS)
Carvalho, Pedro; Rocha, Graça; Hobson, M. P.
2009-03-01
A new fast Bayesian approach is introduced for the detection of discrete objects immersed in a diffuse background. This new method, called PowellSnakes, speeds up traditional Bayesian techniques by (i) replacing the standard form of the likelihood for the parameters characterizing the discrete objects by an alternative exact form that is much quicker to evaluate; (ii) using a simultaneous multiple minimization code based on Powell's direction set algorithm to locate rapidly the local maxima in the posterior and (iii) deciding whether each located posterior peak corresponds to a real object by performing a Bayesian model selection using an approximate evidence value based on a local Gaussian approximation to the peak. The construction of this Gaussian approximation also provides the covariance matrix of the uncertainties in the derived parameter values for the object in question. This new approach provides a speed-up in performance by a factor of 100 as compared to existing Bayesian source extraction methods that use Markov chain Monte Carlo to explore the parameter space, such as that presented by Hobson & McLachlan. The method can be implemented in either real or Fourier space. In the case of objects embedded in a homogeneous random field, working in Fourier space provides a further speed-up that takes advantage of the fact that the correlation matrix of the background is circulant. We illustrate the capabilities of the method by applying it to some simplified toy models. Furthermore, PowellSnakes has the advantage of consistently defining the threshold for acceptance/rejection based on priors, which cannot be said of the frequentist methods. We present here the first implementation of this technique (version I). Further improvements to this implementation are currently under investigation and will be published shortly. The application of the method to realistic simulated Planck observations will be presented in a forthcoming publication.
Dong, Linsong; Wang, Zhiyong
2018-06-11
Genomic prediction is feasible for estimating genomic breeding values because of dense genome-wide markers and credible statistical methods, such as Genomic Best Linear Unbiased Prediction (GBLUP) and various Bayesian methods. Compared with GBLUP, Bayesian methods offer more flexible assumptions for the distributions of SNP effects. However, most Bayesian methods rely on Markov chain Monte Carlo (MCMC) algorithms, which poses computational efficiency challenges. Hence, some fast Bayesian approaches, such as fast BayesB (fBayesB), were proposed to speed up the calculation. This study proposes another fast Bayesian method, termed fast BayesC (fBayesC). The prior of fBayesC assumes that each SNP has, with probability γ, a non-zero effect drawn from a normal density with a common variance. Simulated data from the QTLMAS XII workshop and actual data on large yellow croaker were used to compare the predictive results of fBayesB, fBayesC and (MCMC-based) BayesC. The results showed that when γ was set to a small value, such as 0.01 in the simulated data or 0.001 in the actual data, fBayesB and fBayesC yielded lower prediction accuracies (abilities) than BayesC. In the actual data, fBayesC yielded very similar predictive abilities to BayesC when γ ≥ 0.01. When γ = 0.01, fBayesB also yielded results similar to fBayesC and BayesC. However, fBayesB could not yield an explicit result when γ ≥ 0.1, whereas no such problem was observed for fBayesC. Moreover, fBayesC was significantly faster than BayesC, making it a promising method for genomic prediction.
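For orientation, a minimal sketch of the MCMC-based BayesC baseline that fBayesC approximates, under the stated prior (each SNP has a non-zero effect with probability γ, drawn from a common-variance normal). Variance components are fixed here for brevity, whereas a full BayesC samples them; data and names are invented.

```python
import numpy as np

rng = np.random.default_rng(7)
n, p, gamma = 200, 500, 0.01
X = rng.normal(size=(n, p))
b_true = np.zeros(p)
idx = rng.choice(p, size=5, replace=False)     # a handful of causal SNPs
b_true[idx] = rng.normal(scale=0.5, size=5)
y = X @ b_true + rng.normal(size=n)

sigma_e2, sigma_b2 = 1.0, 0.25   # fixed here; full BayesC samples these too
b = np.zeros(p)
r = y - X @ b                    # running residual
xx = (X ** 2).sum(axis=0)
post_b = np.zeros(p)
n_iter, burn = 200, 50
for it in range(n_iter):
    for j in range(p):
        r += X[:, j] * b[j]                          # exclude SNP j
        rhs = X[:, j] @ r
        var = 1.0 / (xx[j] / sigma_e2 + 1.0 / sigma_b2)
        mean = var * rhs / sigma_e2
        # log Bayes factor for a non-zero effect under the common-variance slab
        log_bf = 0.5 * (np.log(var / sigma_b2) + mean ** 2 / var)
        p_in = gamma / (gamma + (1 - gamma) * np.exp(-log_bf))
        b[j] = rng.normal(mean, np.sqrt(var)) if rng.uniform() < p_in else 0.0
        r -= X[:, j] * b[j]                          # restore residual
    if it >= burn:
        post_b += b / (n_iter - burn)
print("corr(true, posterior-mean effects):", np.corrcoef(b_true, post_b)[0, 1])
```

The per-SNP Gibbs sweep is exactly the expensive loop that fast, non-MCMC variants such as fBayesC replace with deterministic updates.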
Density Control of Multi-Agent Systems with Safety Constraints: A Markov Chain Approach
NASA Astrophysics Data System (ADS)
Demirer, Nazli
The control of systems with autonomous mobile agents has been a point of interest recently, with many applications such as surveillance, coverage, and searching or exploring an area with probabilistic target locations. In all of these applications, the main goal of the swarm is to distribute itself over an operational space to achieve mission objectives specified by the desired density of the swarm. This research focuses on the problem of controlling the distribution of multi-agent systems within a hierarchical control structure, where coordination of the whole swarm is achieved at the high level and individual vehicle/agent control is managed at the low level. High-level coordination algorithms use macroscopic models that describe the collective behavior of the whole swarm and specify the agent motion commands whose execution will lead to the desired swarm behavior. The low-level control laws execute the motion to follow these commands at the agent level. The main objective of this research is to develop high-level decision control policies and algorithms that achieve physically realizable commanding of the agents by imposing mission constraints on the distribution. We also make some connections with decentralized low-level motion control. This dissertation proposes a Markov chain based method to control the density distribution of the whole system, which can be implemented in a decentralized manner with no communication between agents, since establishing communication among a large number of agents is highly challenging. The ultimate goal is to guide the overall density distribution of the system to a prescribed steady-state desired distribution while satisfying desired transition and safety constraints. Here, the desired distribution is determined by the mission requirements; for example, in an area-search application, the desired distribution should match closely with the probabilistic target locations. The proposed method is applicable both to systems with a single agent and to systems with a large number of agents, owing to its probabilistic nature: the probability distribution of each agent's state evolves according to a finite-state, discrete-time Markov chain (MC). Hence, designing proper decision control policies requires numerically tractable solution methods for the synthesis of Markov chains. The synthesis problem takes the form of a linear matrix inequality (LMI) problem, with the constraints expressed as LMIs. To this end, we propose convex necessary and sufficient conditions for safety constraints in Markov chains, which is a novel result in the Markov chain literature. In addition to the LMI-based, offline Markov matrix synthesis method, we also propose a QP-based, online method that computes a time-varying Markov matrix from real-time density feedback. Both problems are convex optimization problems that can be solved in a reliable and tractable way using existing tools in the literature. Low Earth Orbit (LEO) swarm simulations are presented to validate the effectiveness of the proposed algorithms. Another problem tackled as part of this research is the generalization of the density control problem to autonomous mobile agents with two control modes, ON and OFF. Here, each mode consists of a (possibly overlapping) finite set of actions: one set of actions for the ON mode and another for the OFF mode.
We formulate a new Markov chain synthesis problem, with additional measurements of the state transitions, in which a policy is designed to ensure desired safety and convergence properties for the underlying Markov chain.
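A much-simplified sketch of the synthesis idea (assuming the cvxpy library; not the dissertation's LMI formulation, whose safety conditions hold for all times and all initial densities, while this version only caps the one-step density from a given start): find a column-stochastic transition matrix that respects the motion graph, leaves the desired density invariant, and keeps the next-step bin occupancy below a cap.

```python
import cvxpy as cp
import numpy as np

n = 5
# Allowed transitions: a line graph where agents may also stay in place.
A = (np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)) > 0
v = np.array([0.1, 0.2, 0.4, 0.2, 0.1])   # desired stationary density
x0 = np.array([1.0, 0.0, 0.0, 0.0, 0.0])  # initial swarm density
cap = 0.5                                  # per-bin safety cap (simplified)

M = cp.Variable((n, n), nonneg=True)       # column-stochastic Markov matrix
cons = [cp.sum(M, axis=0) == 1,            # each column sums to one
        M @ v == v,                        # v is invariant under M
        M @ x0 <= cap]                     # one-step density cap from x0
cons += [M[i, j] == 0 for i in range(n) for j in range(n) if not A[i, j]]

# Push M toward the rank-one matrix that maps any density straight to v.
prob = cp.Problem(cp.Minimize(cp.sum_squares(M - np.outer(v, np.ones(n)))), cons)
prob.solve()
print(np.round(M.value, 3))
```

Each agent can then draw its next bin independently from the column of M for its current bin, which is what makes the scheme decentralized.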
Building Simple Hidden Markov Models. Classroom Notes
ERIC Educational Resources Information Center
Ching, Wai-Ki; Ng, Michael K.
2004-01-01
Hidden Markov models (HMMs) are widely used in bioinformatics, speech recognition and many other areas. This note presents HMMs via the framework of classical Markov chain models. A simple example is given to illustrate the model. An estimation method for the transition probabilities of the hidden states is also discussed.
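In the spirit of the note's classroom examples (with invented numbers, not the authors' own), the forward algorithm computes the likelihood of an observation sequence under a two-state HMM built on a classical Markov chain:

```python
import numpy as np

A = np.array([[0.7, 0.3],      # hidden-state transition matrix
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],      # emission probabilities per hidden state
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])      # initial state distribution
obs = [0, 1, 1, 0]             # observed symbol sequence

# Forward recursion: alpha[i] = P(observations so far, current state = i)
alpha = pi * B[:, obs[0]]
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]
print("sequence likelihood:", alpha.sum())
```

Transition probabilities of the hidden states can then be estimated by maximizing this likelihood over A, which is the estimation problem the note discusses.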
An Evaluation of a Markov Chain Monte Carlo Method for the Two-Parameter Logistic Model.
ERIC Educational Resources Information Center
Kim, Seock-Ho; Cohen, Allan S.
The accuracy of the Markov Chain Monte Carlo (MCMC) procedure Gibbs sampling was considered for estimation of item parameters of the two-parameter logistic model. Data for the Law School Admission Test (LSAT) Section 6 were analyzed to illustrate the MCMC procedure. In addition, simulated data sets were analyzed using the MCMC, marginal Bayesian…
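A minimal Metropolis-within-Gibbs sketch for the two-parameter logistic model on simulated data, hedged: this is not the study's Gibbs implementation, and priors and tuning constants here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n_person, n_item = 500, 5
theta_true = rng.normal(size=n_person)
a_true = rng.uniform(0.8, 2.0, n_item)          # discriminations
b_true = rng.normal(size=n_item)                # difficulties
eta = a_true * (theta_true[:, None] - b_true)
Y = (rng.uniform(size=eta.shape) < 1 / (1 + np.exp(-eta))).astype(float)

def ll(th, a, b, axis):                          # Bernoulli log-likelihood
    eta = a * (th[:, None] - b)
    return (Y * eta - np.logaddexp(0, eta)).sum(axis=axis)

th = np.zeros(n_person)
a, b = np.ones(n_item), np.zeros(n_item)
for it in range(3000):
    # Abilities: vectorized per-person Metropolis, N(0,1) prior.
    th_p = th + 0.5 * rng.normal(size=n_person)
    logr = ll(th_p, a, b, 1) - ll(th, a, b, 1) - 0.5 * (th_p**2 - th**2)
    acc = np.log(rng.uniform(size=n_person)) < logr
    th = np.where(acc, th_p, th)
    # Items: joint (a, b) update per item; N(1,1) prior on a > 0, N(0,1) on b.
    a_p = a + 0.1 * rng.normal(size=n_item)
    b_p = b + 0.1 * rng.normal(size=n_item)
    logr = (ll(th, a_p, b_p, 0) - ll(th, a, b, 0)
            - 0.5 * ((a_p - 1)**2 - (a - 1)**2) - 0.5 * (b_p**2 - b**2))
    acc = (a_p > 0) & (np.log(rng.uniform(size=n_item)) < logr)
    a, b = np.where(acc, a_p, a), np.where(acc, b_p, b)
print("slopes:", np.round(a, 2), "vs true:", np.round(a_true, 2))
```

Averaging the post-burn-in draws of a and b gives the item-parameter estimates whose recovery studies like this one evaluate.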
ERIC Educational Resources Information Center
Helbock, Richard W.; Marker, Gordon
This study concerns the feasibility of a Markov chain model for projecting housing values and racial mixes. Such projections could be used in planning the layout of school districts to achieve desired levels of socioeconomic heterogeneity. Based upon the concepts and assumptions underlying a Markov chain model, it is concluded that such a model is…
ERIC Educational Resources Information Center
Wollack, James A.; Bolt, Daniel M.; Cohen, Allan S.; Lee, Young-Sun
2002-01-01
Compared the quality of item parameter estimates for marginal maximum likelihood (MML) and Markov Chain Monte Carlo (MCMC) with the nominal response model using simulation. The quality of item parameter recovery was nearly identical for MML and MCMC, and both methods tended to produce good estimates. (SLD)
On Correlated-noise Analyses Applied to Exoplanet Light Curves
NASA Astrophysics Data System (ADS)
Cubillos, Patricio; Harrington, Joseph; Loredo, Thomas J.; Lust, Nate B.; Blecic, Jasmina; Stemm, Madison
2017-01-01
Time-correlated noise is a significant source of uncertainty when modeling exoplanet light-curve data. A correct assessment of correlated noise is fundamental to determining the true statistical significance of our findings. Here, we review three of the most widely used correlated-noise estimators in the exoplanet field: the time-averaging, residual-permutation, and wavelet-likelihood methods. We argue that the residual-permutation method is unsound for estimating the uncertainty of parameter estimates and thus recommend refraining from this method altogether. We characterize the behavior of the time-averaging method's rms-versus-bin-size curves at bin sizes similar to the total observation duration, where they may lead to underestimated uncertainties. For the wavelet-likelihood method, we note errors in the published equations and provide a list of corrections. We further assess the performance of these techniques by injecting and retrieving eclipse signals into synthetic and real Spitzer light curves, analyzing the results in terms of the relative-accuracy and coverage-fraction statistics. Both the time-averaging and wavelet-likelihood methods significantly improve the estimate of the eclipse depth over a white-noise analysis (a Markov-chain Monte Carlo exploration assuming uncorrelated noise). However, the corrections are not perfect: when retrieving the eclipse depth from Spitzer data sets, these methods covered the true (injected) depth within the 68% credible region in only ~45%-65% of the trials. Lastly, we present our open-source model-fitting tool, Multi-Core Markov-Chain Monte Carlo (MC3). This package uses Bayesian statistics to estimate the best-fitting values and the credible regions of the parameters for a (user-provided) model. MC3 is a Python/C code, available at https://github.com/pcubillos/MCcubed.
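A hedged sketch of the time-averaging diagnostic (not MC3 itself): bin the fit residuals at increasing bin sizes and compare the rms of the binned series with the white-noise expectation; excess rms signals time-correlated noise. The sqrt(N/(N-1)) factor follows the standard formulation of the method.

```python
import numpy as np

rng = np.random.default_rng(5)
resid = rng.normal(size=4096)          # white residuals; real data may be red

sigma1 = resid.std(ddof=1)
for m in [1, 2, 4, 8, 16, 32, 64]:     # bin sizes
    nb = resid.size // m
    binned = resid[:nb * m].reshape(nb, m).mean(axis=1)
    rms = binned.std(ddof=1)
    white = sigma1 / np.sqrt(m) * np.sqrt(nb / (nb - 1.0))
    print(f"bin={m:3d}  rms={rms:.4f}  white-noise expectation={white:.4f}")
```

The pathology discussed above appears at the large-m end of this loop, where the number of bins nb becomes too small for the rms to be well estimated.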
Williams, Michael S; Cao, Yong; Ebel, Eric D
2013-07-15
Levels of pathogenic organisms in food and water have steadily declined in many parts of the world. A consequence of this reduction is that the proportion of samples that test positive for the most contaminated product-pathogen pairings has fallen to less than 0.1. While this is unequivocally beneficial to public health, datasets with very few enumerated samples present an analytical challenge because a large proportion of the observations are censored values. One application of particular interest to risk assessors is the fitting of a statistical distribution function to datasets collected at some point in the farm-to-table continuum. The fitted distribution forms an important component of an exposure assessment. A number of studies have compared different fitting methods and proposed lower limits on the proportion of samples where the organisms of interest are identified and enumerated, with the recommended lower limit of enumerated samples being 0.2. This recommendation may not be applicable to food safety risk assessments for a number of reasons, which include the development of new Bayesian fitting methods, the use of highly sensitive screening tests, and the generally larger sample sizes found in surveys of food commodities. This study evaluates the performance of a Markov chain Monte Carlo fitting method when used in conjunction with a screening test and enumeration of positive samples by the Most Probable Number technique. The results suggest that levels of contamination for common product-pathogen pairs, such as Salmonella on poultry carcasses, can be reliably estimated with the proposed fitting method and sample sizes in excess of 500 observations. The results do, however, demonstrate that simple guidelines for this application, such as a minimum proportion of positive samples, cannot be provided.
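A simplified sketch of this flavor of fit (my assumptions, not the study's exact model): log10 concentrations are normal, screen-negative samples contribute their censored probability mass below the detection limit, and a random-walk Metropolis sampler recovers the distribution parameters. Enumeration error in the MPN values is ignored here.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(11)
mu_true, sd_true, lod = -1.0, 1.2, 0.0    # log10 CFU/g and detection limit
z = rng.normal(mu_true, sd_true, size=600)
obs = z[z > lod]                          # enumerated (e.g., MPN) positives
n_cens = (z <= lod).sum()                 # screen-negative / censored samples

def logpost(mu, sd):                      # flat priors for brevity
    if sd <= 0:
        return -np.inf
    return (norm.logpdf(obs, mu, sd).sum()
            + n_cens * norm.logcdf(lod, mu, sd))   # mass below the limit

mu, sd = 0.0, 1.0
lp = logpost(mu, sd)
draws = []
for it in range(20000):
    mu_p, sd_p = mu + 0.05 * rng.normal(), sd + 0.05 * rng.normal()
    lp_p = logpost(mu_p, sd_p)
    if np.log(rng.uniform()) < lp_p - lp:
        mu, sd, lp = mu_p, sd_p, lp_p
    draws.append((mu, sd))
print("posterior mean (mu, sd):", np.mean(draws[5000:], axis=0))
```

With 600 samples and most observations censored, the censored-likelihood term carries much of the information, which is why large sample sizes matter for this application.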
Uncertainty Quantification of Hypothesis Testing for the Integrated Knowledge Engine
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cuellar, Leticia
2012-05-31
The Integrated Knowledge Engine (IKE) is a tool for Bayesian analysis, based on Bayesian belief networks, or Bayesian networks for short. A Bayesian network is a graphical model (a directed acyclic graph) that represents the probabilistic structure of many variables assuming a localized type of dependency called the Markov property. The Markov property in this setting makes any node (random variable) independent of its non-descendants given information about its parents. A direct consequence of this property is that it is relatively easy to incorporate new evidence and derive the appropriate consequences, which in general is not an easy or feasible task. Typically we use Bayesian networks as predictive models for a small subset of the variables, either the leaf nodes or the root nodes. In IKE, since most applications deal with diagnostics, we are interested in predicting the likelihood of the root nodes given new observations on any of the children nodes. The root nodes represent the various possible outcomes of the analysis, and an important problem is to determine when we have gathered enough evidence to lean toward one of these particular outcomes. This document presents criteria to decide when the evidence gathered is sufficient to draw a particular conclusion or decide in favor of a particular outcome by quantifying the uncertainty in the conclusions drawn from the data. The material in this document is organized as follows: Section 2 briefly presents a forensics Bayesian network, and we explore evaluating the information provided by new evidence by looking first at the posterior distribution of the nodes of interest, and then at the corresponding posterior odds ratios. Section 3 presents a third alternative: Bayes factors. In section 4 we show the relation between posterior odds ratios and Bayes factors with examples, and in section 5 we conclude by providing clear guidelines on how to use these criteria for the type of Bayesian networks used in IKE.
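A toy numeric illustration of the relation discussed (invented likelihoods, and it assumes the evidence is conditionally independent given the root outcome, which a general network does not require): posterior odds equal prior odds times the Bayes factor.

```python
import numpy as np

prior = np.array([0.5, 0.5])           # prior over two root outcomes
lik_e1 = np.array([0.8, 0.3])          # P(evidence 1 | outcome)
lik_e2 = np.array([0.6, 0.2])          # P(evidence 2 | outcome)

post = prior * lik_e1 * lik_e2         # unnormalized posterior
post /= post.sum()
prior_odds = prior[0] / prior[1]
bayes_factor = (lik_e1[0] * lik_e2[0]) / (lik_e1[1] * lik_e2[1])
print("posterior:", post)
print("posterior odds:", post[0] / post[1], "=", prior_odds * bayes_factor)
```

A decision rule of the kind described above would then declare an outcome once the posterior odds (equivalently, the Bayes factor when priors are flat) exceed a chosen threshold.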
A Bayesian model averaging method for the derivation of reservoir operating rules
NASA Astrophysics Data System (ADS)
Zhang, Jingwen; Liu, Pan; Wang, Hao; Lei, Xiaohui; Zhou, Yanlai
2015-09-01
Because the intrinsic dynamics among optimal decision making, inflow processes and reservoir characteristics are complex, the functional forms of reservoir operating rules are usually determined subjectively. As a result, the uncertainty involved in selecting the form and/or model of reservoir operating rules must be analyzed and evaluated. In this study, we analyze the uncertainty of reservoir operating rules using the Bayesian model averaging (BMA) model. Three popular operating rules, namely piecewise linear regression, surface fitting and a least-squares support vector machine, are established based on the optimal deterministic reservoir operation. These individual models provide three-member decisions for the BMA combination, enabling the 90% release interval to be estimated by Markov chain Monte Carlo simulation. A case study of China's Baise reservoir shows that: (1) the optimal deterministic reservoir operation, which is superior to any operating rule, provides the samples from which the rules are derived; (2) the least-squares support vector machine model is more effective than both piecewise linear regression and surface fitting; and (3) BMA outperforms any individual operating-rule model based on the optimal trajectories. The results show that the proposed model can reduce the uncertainty of operating rules, which is of great potential benefit in evaluating the confidence intervals of decisions.
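A minimal sketch of how a BMA combination turns three member decisions into a 90% release interval, with invented numbers; in the study the weights and member variances are estimated (e.g., via MCMC), not fixed as here.

```python
import numpy as np

rng = np.random.default_rng(2)
# Invented member forecasts of one period's release decision (m^3/s):
preds = np.array([520.0, 548.0, 533.0])  # piecewise linear, surface fit, LS-SVM
w = np.array([0.2, 0.3, 0.5])            # BMA weights (estimated in the study)
s = np.array([25.0, 20.0, 15.0])         # member predictive standard deviations

k = rng.choice(3, size=100_000, p=w)     # Monte Carlo draws from the mixture
draws = rng.normal(preds[k], s[k])
lo, hi = np.percentile(draws, [5, 95])
print(f"BMA mean = {draws.mean():.1f}, 90% interval = ({lo:.1f}, {hi:.1f})")
```

The mixture's interval is wider than any single member's, which is exactly how BMA exposes the form/model uncertainty that a single operating rule hides.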
Accurate age estimation in small-scale societies
Smith, Daniel; Gerbault, Pascale; Dyble, Mark; Migliano, Andrea Bamberg; Thomas, Mark G.
2017-01-01
Precise estimation of age is essential in evolutionary anthropology, especially to infer population age structures and understand the evolution of human life history diversity. However, in small-scale societies, such as hunter-gatherer populations, time is often not referred to in calendar years, and accurate age estimation remains a challenge. We address this issue by proposing a Bayesian approach that accounts for age uncertainty inherent to fieldwork data. We developed a Gibbs sampling Markov chain Monte Carlo algorithm that produces posterior distributions of ages for each individual, based on a ranking order of individuals from youngest to oldest and age ranges for each individual. We first validate our method on 65 Agta foragers from the Philippines with known ages, and show that our method generates age estimations that are superior to previously published regression-based approaches. We then use data on 587 Agta collected during recent fieldwork to demonstrate how multiple partial age ranks coming from multiple camps of hunter-gatherers can be integrated. Finally, we exemplify how the distributions generated by our method can be used to estimate important demographic parameters in small-scale societies: here, age-specific fertility patterns. Our flexible Bayesian approach will be especially useful to improve cross-cultural life history datasets for small-scale societies for which reliable age records are difficult to acquire. PMID:28696282
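A small sketch of the core idea under simplifying assumptions (uniform priors over each individual's age range and one complete ranking, whereas the paper integrates multiple partial rankings across camps): Gibbs updates draw each age uniformly between its neighbours in the ranking, intersected with the individual's own range.

```python
import numpy as np

rng = np.random.default_rng(4)
# Invented data: five individuals ranked youngest to oldest, with age ranges.
lo = np.array([10.0, 12.0, 15.0, 20.0, 25.0])
hi = np.array([18.0, 25.0, 30.0, 40.0, 45.0])
n = lo.size
age = (lo + hi) / 2          # midpoints happen to respect the ranking here

n_iter, burn = 20000, 5000
out = np.zeros((n_iter, n))
for it in range(n_iter):
    for i in range(n):
        a = lo[i] if i == 0 else max(lo[i], age[i - 1])   # lower truncation
        b = hi[i] if i == n - 1 else min(hi[i], age[i + 1])  # upper truncation
        age[i] = rng.uniform(a, b)
    out[it] = age
print("posterior mean ages:", out[burn:].mean(axis=0).round(1))
```

The resulting per-individual posterior distributions, rather than point ages, are what feed the downstream demographic estimates such as age-specific fertility.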
An accessible method for implementing hierarchical models with spatio-temporal abundance data
Ross, Beth E.; Hooten, Melvin B.; Koons, David N.
2012-01-01
A common goal in ecology and wildlife management is to determine the causes of variation in population dynamics over long periods of time and across large spatial scales. Many obstacles must nevertheless be overcome to make appropriate inference about spatio-temporal variation in population dynamics, such as autocorrelation among data points, excess zeros, and observation error in count data. To address these issues, many scientists and statisticians have recommended the use of Bayesian hierarchical models. Unfortunately, hierarchical statistical models remain somewhat difficult to use because of the quantitative background needed to implement them, or because of the computational demands of using Markov chain Monte Carlo algorithms to estimate parameters. Fortunately, new tools have recently been developed that make it more feasible for wildlife biologists to fit sophisticated hierarchical Bayesian models (i.e., Integrated Nested Laplace Approximation, 'INLA'). We present a case study using two important game species in North America, the lesser and greater scaup, to demonstrate how INLA can be used to estimate the parameters in a hierarchical model that decouples observation error from process variation, and accounts for unknown sources of excess zeros as well as spatial and temporal dependence in the data. Ultimately, our goal was to make unbiased inference about spatial variation in population trends over time.
Norris, Peter M; da Silva, Arlindo M
2016-07-01
A method is presented to constrain a statistical model of sub-gridcolumn moisture variability using high-resolution satellite cloud data. The method can be used for large-scale model parameter estimation or cloud data assimilation. The gridcolumn model includes assumed probability density function (PDF) intra-layer horizontal variability and a copula-based inter-layer correlation model. The observables used in the current study are Moderate Resolution Imaging Spectroradiometer (MODIS) cloud-top pressure, brightness temperature and cloud optical thickness, but the method should be extensible to direct cloudy radiance assimilation for a small number of channels. The algorithm is a form of Bayesian inference with a Markov chain Monte Carlo (MCMC) approach to characterizing the posterior distribution. This approach is especially useful in cases where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach is not gradient-based and allows jumps into regions of non-zero cloud probability. The current study uses a skewed-triangle distribution for layer moisture. The article also includes a discussion of the Metropolis and multiple-try Metropolis versions of MCMC.
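For readers unfamiliar with the multiple-try Metropolis (MTM) variant mentioned above, here is a generic hedged sketch on a toy bimodal target, not the authors' moisture model: several trial proposals are drawn at once, which helps the chain jump between separated high-probability regions (by analogy, clear versus cloudy states).

```python
import numpy as np

rng = np.random.default_rng(8)

def logpi(x):            # toy bimodal target with well-separated modes
    return np.logaddexp(-0.5 * (x - 3.0) ** 2, -0.5 * (x + 3.0) ** 2)

def mtm_step(x, k=5, scale=2.0):
    ys = x + scale * rng.normal(size=k)        # k symmetric trial proposals
    wy = np.exp(logpi(ys))
    y = rng.choice(ys, p=wy / wy.sum())        # select one trial by weight
    xs = y + scale * rng.normal(size=k - 1)    # reference points around y
    wx = np.exp(logpi(xs)).sum() + np.exp(logpi(x))
    return y if rng.uniform() < wy.sum() / wx else x

x, chain = 0.0, []
for it in range(20000):
    x = mtm_step(x)
    chain.append(x)
chain = np.array(chain)
print("mass near +3 / -3 modes:", (chain > 0).mean(), (chain < 0).mean())
```

The acceptance ratio of summed weights is the standard MTM construction (Liu, Liang and Wong, 2000) and preserves the target distribution while allowing larger, non-gradient-based jumps of the kind the abstract highlights.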
Efficient inference for genetic association studies with multiple outcomes.
Ruffieux, Helene; Davison, Anthony C; Hager, Jorg; Irincheeva, Irina
2017-10-01
Combined inference for heterogeneous high-dimensional data is critical in modern biology, where clinical and various kinds of molecular data may be available from a single study. Classical genetic association studies regress a single clinical outcome on many genetic variants one by one, but there is an increasing demand for joint analysis of many molecular outcomes and genetic variants in order to unravel functional interactions. Unfortunately, most existing approaches to joint modeling are either too simplistic to be powerful or are impracticable for computational reasons. Inspired by Richardson and others (2010, Bayesian Statistics 9), we consider a sparse multivariate regression model that allows simultaneous selection of predictors and associated responses. As Markov chain Monte Carlo (MCMC) inference on such models can be prohibitively slow when the number of genetic variants exceeds a few thousand, we propose a variational inference approach which produces posterior information very close to that of MCMC inference, at a much reduced computational cost. Extensive numerical experiments show that our approach outperforms popular variable selection methods and tailored Bayesian procedures, dealing within hours with problems involving hundreds of thousands of genetic variants and tens to hundreds of clinical or molecular outcomes.
Modeling two strains of disease via aggregate-level infectivity curves.
Romanescu, Razvan; Deardon, Rob
2016-04-01
Well-formulated models of disease spread, and efficient methods to fit them to observed data, are powerful tools for aiding the surveillance and control of infectious diseases. Our project considers the problem of the simultaneous spread of two related strains of disease in a context where spatial location is the key driver of disease spread. We start our modeling work with the individual-level models (ILMs) of disease transmission, and extend these models to accommodate the competing spread of the pathogens in a two-tier hierarchical population (whose levels we refer to as 'farm' and 'animal'). The postulated interference mechanism between the two strains is a period of cross-immunity following infection. We also present a framework for speeding up the computationally intensive process of fitting the ILM to data, typically done using Markov chain Monte Carlo (MCMC) in a Bayesian framework, by turning the inference into a two-stage process. First, we approximate the number of animals infected on a farm over time by infectivity curves. These curves are fit to data sampled from farms using maximum likelihood estimation; then, conditional on the fitted curves, Bayesian MCMC inference proceeds for the remaining parameters. Finally, we use posterior predictive distributions of salient epidemic summary statistics to assess the fitted model.
Low frequency full waveform seismic inversion within a tree based Bayesian framework
NASA Astrophysics Data System (ADS)
Ray, Anandaroop; Kaplan, Sam; Washbourne, John; Albertin, Uwe
2018-01-01
Limited illumination, insufficient offset, noisy data and poor starting models can pose challenges for seismic full waveform inversion. We present an application of a tree based Bayesian inversion scheme which attempts to mitigate these problems by accounting for data uncertainty while using a mildly informative prior about subsurface structure. We sample the resulting posterior model distribution of compressional velocity using a trans-dimensional (trans-D) or Reversible Jump Markov chain Monte Carlo method in the wavelet transform domain of velocity. This allows us to attain rapid convergence to a stationary distribution of posterior models while requiring a limited number of wavelet coefficients to define a sampled model. Two synthetic, low frequency, noisy data examples are provided. The first example is a simple reflection + transmission inverse problem, and the second uses a scaled version of the Marmousi velocity model, dominated by reflections. Both examples are initially started from a semi-infinite half-space with incorrect background velocity. We find that the trans-D tree based approach together with parallel tempering for navigating rugged likelihood (i.e. misfit) topography provides a promising, easily generalized method for solving large-scale geophysical inverse problems which are difficult to optimize, but where the true model contains a hierarchy of features at multiple scales.
NASA Astrophysics Data System (ADS)
Validi, AbdoulAhad
2014-03-01
This study introduces a non-intrusive approach in the context of low-rank separated representation to construct a surrogate of high-dimensional stochastic functions, e.g., PDEs/ODEs, in order to decrease the computational cost of Markov chain Monte Carlo simulations in Bayesian inference. The surrogate model is constructed via a regularized alternating least-squares regression with Tikhonov regularization, using a roughening matrix that computes the gradient of the solution, in conjunction with a perturbation-based error indicator to detect optimal model complexities. The model approximates a vector of a continuous solution at discrete values of a physical variable. The required number of random realizations to achieve a successful approximation linearly depends on the function dimensionality. The computational cost of the model construction is quadratic in the number of random inputs, which potentially tackles the curse of dimensionality in high-dimensional stochastic functions. Furthermore, this vector-valued separated representation-based model, in comparison to the available scalar-valued case, leads to a significant reduction in the cost of approximation by an order of magnitude equal to the vector size. The performance of the method is studied through its application to three numerical examples including a 41-dimensional elliptic PDE and a 21-dimensional cavity flow.
NASA Technical Reports Server (NTRS)
Norris, Peter M.; Da Silva, Arlindo M.
2016-01-01
A method is presented to constrain a statistical model of sub-gridcolumn moisture variability using high-resolution satellite cloud data. The method can be used for large-scale model parameter estimation or cloud data assimilation. The gridcolumn model includes assumed probability density function (PDF) intra-layer horizontal variability and a copula-based inter-layer correlation model. The observables used in the current study are Moderate Resolution Imaging Spectroradiometer (MODIS) cloud-top pressure, brightness temperature and cloud optical thickness, but the method should be extensible to direct cloudy radiance assimilation for a small number of channels. The algorithm is a form of Bayesian inference with a Markov chain Monte Carlo (MCMC) approach to characterizing the posterior distribution. This approach is especially useful in cases where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach is not gradient-based and allows jumps into regions of non-zero cloud probability. The current study uses a skewed-triangle distribution for layer moisture. The article also includes a discussion of the Metropolis and multiple-try Metropolis versions of MCMC.
Norris, Peter M.; da Silva, Arlindo M.
2018-01-01
A method is presented to constrain a statistical model of sub-gridcolumn moisture variability using high-resolution satellite cloud data. The method can be used for large-scale model parameter estimation or cloud data assimilation. The gridcolumn model includes assumed probability density function (PDF) intra-layer horizontal variability and a copula-based inter-layer correlation model. The observables used in the current study are Moderate Resolution Imaging Spectroradiometer (MODIS) cloud-top pressure, brightness temperature and cloud optical thickness, but the method should be extensible to direct cloudy radiance assimilation for a small number of channels. The algorithm is a form of Bayesian inference with a Markov chain Monte Carlo (MCMC) approach to characterizing the posterior distribution. This approach is especially useful in cases where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach is not gradient-based and allows jumps into regions of non-zero cloud probability. The current study uses a skewed-triangle distribution for layer moisture. The article also includes a discussion of the Metropolis and multiple-try Metropolis versions of MCMC. PMID:29618847
Bayesian multiple-source localization in an uncertain ocean environment.
Dosso, Stan E; Wilmut, Michael J
2011-06-01
This paper considers simultaneous localization of multiple acoustic sources when properties of the ocean environment (water column and seabed) are poorly known. A Bayesian formulation is developed in which the environmental parameters, noise statistics, and locations and complex strengths (amplitudes and phases) of multiple sources are considered to be unknown random variables constrained by acoustic data and prior information. Two approaches are considered for estimating source parameters. Focalization maximizes the posterior probability density (PPD) over all parameters using adaptive hybrid optimization. Marginalization integrates the PPD using efficient Markov-chain Monte Carlo methods to produce joint marginal probability distributions for source ranges and depths, from which source locations are obtained. This approach also provides quantitative uncertainty analysis for all parameters, which can aid in understanding of the inverse problem and may be of practical interest (e.g., source-strength probability distributions). In both approaches, closed-form maximum-likelihood expressions for source strengths and noise variance at each frequency allow these parameters to be sampled implicitly, substantially reducing the dimensionality and difficulty of the inversion. Examples are presented of both approaches applied to single- and multi-frequency localization of multiple sources in an uncertain shallow-water environment, and a Monte Carlo performance evaluation study is carried out.
Bayesian Treed Calibration: An Application to Carbon Capture With AX Sorbent
DOE Office of Scientific and Technical Information (OSTI.GOV)
Konomi, Bledar A.; Karagiannis, Georgios; Lai, Kevin
2017-01-02
In cases where field or experimental measurements are not available, computer models can simulate real physical or engineering systems to reproduce their outcomes. They are usually calibrated in light of experimental data to create a better representation of the real system. Statistical methods based on Gaussian processes for calibration and prediction have been especially important when the computer models are expensive and experimental data are limited. In this paper, we develop Bayesian treed calibration (BTC) as an extension of standard Gaussian process calibration methods to deal with non-stationary computer models and/or their discrepancy from the field (or experimental) data. Our proposed method partitions both the calibration and observable input space, based on a binary tree partitioning, into sub-regions where existing model calibration methods can be applied to connect a computer model with the real system. The estimation of the parameters in the proposed model is carried out using Markov chain Monte Carlo (MCMC) computational techniques. Different strategies have been applied to improve mixing. We illustrate our method in two artificial examples and a real application that concerns the capture of carbon dioxide with AX amine-based sorbents. The source code and the examples analyzed in this paper are available as part of the supplementary materials.
Bayesian inference for OPC modeling
NASA Astrophysics Data System (ADS)
Burbine, Andrew; Sturtevant, John; Fryer, David; Smith, Bruce W.
2016-03-01
The use of optical proximity correction (OPC) demands increasingly accurate models of the photolithographic process. Model building and inference techniques in the data science community have seen great strides in the past two decades which make better use of available information. This paper aims to demonstrate the predictive power of Bayesian inference as a method for parameter selection in lithographic models by quantifying the uncertainty associated with model inputs and wafer data. Specifically, the method combines the model builder's prior information about each modelling assumption with the maximization of each observation's likelihood as a Student's t-distributed random variable. Through the use of a Markov chain Monte Carlo (MCMC) algorithm, a model's parameter space is explored to find the most credible parameter values. During parameter exploration, the parameters' posterior distributions are generated by applying Bayes' rule, using a likelihood function and the a priori knowledge supplied. The MCMC algorithm used, an affine invariant ensemble sampler (AIES), is implemented by initializing many walkers which semiindependently explore the space. The convergence of these walkers to global maxima of the likelihood volume determine the parameter values' highest density intervals (HDI) to reveal champion models. We show that this method of parameter selection provides insights into the data that traditional methods do not and outline continued experiments to vet the method.