mcmc simulation methods: Topics by Science.gov

Sample records for mcmc simulation methods

Markov Chain Monte Carlo: an introduction for epidemiologists

PubMed Central

Hamra, Ghassan; MacLehose, Richard; Richardson, David

2013-01-01

Markov Chain Monte Carlo (MCMC) methods are increasingly popular among epidemiologists. The reason for this may in part be that MCMC offers an appealing approach to handling some difficult types of analyses. Additionally, MCMC methods are those most commonly used for Bayesian analysis. However, epidemiologists are still largely unfamiliar with MCMC. They may lack familiarity either with he implementation of MCMC or with interpretation of the resultant output. As with tutorials outlining the calculus behind maximum likelihood in previous decades, a simple description of the machinery of MCMC is needed. We provide an introduction to conducting analyses with MCMC, and show that, given the same data and under certain model specifications, the results of an MCMC simulation match those of methods based on standard maximum-likelihood estimation (MLE). In addition, we highlight examples of instances in which MCMC approaches to data analysis provide a clear advantage over MLE. We hope that this brief tutorial will encourage epidemiologists to consider MCMC approaches as part of their analytic tool-kit. PMID:23569196
An introduction of Markov chain Monte Carlo method to geochemical inverse problems: Reading melting parameters from REE abundances in abyssal peridotites

NASA Astrophysics Data System (ADS)

Liu, Boda; Liang, Yan

2017-04-01

Markov chain Monte Carlo (MCMC) simulation is a powerful statistical method in solving inverse problems that arise from a wide range of applications. In Earth sciences applications of MCMC simulations are primarily in the field of geophysics. The purpose of this study is to introduce MCMC methods to geochemical inverse problems related to trace element fractionation during mantle melting. MCMC methods have several advantages over least squares methods in deciphering melting processes from trace element abundances in basalts and mantle rocks. Here we use an MCMC method to invert for extent of melting, fraction of melt present during melting, and extent of chemical disequilibrium between the melt and residual solid from REE abundances in clinopyroxene in abyssal peridotites from Mid-Atlantic Ridge, Central Indian Ridge, Southwest Indian Ridge, Lena Trough, and American-Antarctic Ridge. We consider two melting models: one with exact analytical solution and the other without. We solve the latter numerically in a chain of melting models according to the Metropolis-Hastings algorithm. The probability distribution of inverted melting parameters depends on assumptions of the physical model, knowledge of mantle source composition, and constraints from the REE data. Results from MCMC inversion are consistent with and provide more reliable uncertainty estimates than results based on nonlinear least squares inversion. We show that chemical disequilibrium is likely to play an important role in fractionating LREE in residual peridotites during partial melting beneath mid-ocean ridge spreading centers. MCMC simulation is well suited for more complicated but physically more realistic melting problems that do not have analytical solutions.
An Evaluation of a Markov Chain Monte Carlo Method for the Two-Parameter Logistic Model.

ERIC Educational Resources Information Center

Kim, Seock-Ho; Cohen, Allan S.

The accuracy of the Markov Chain Monte Carlo (MCMC) procedure Gibbs sampling was considered for estimation of item parameters of the two-parameter logistic model. Data for the Law School Admission Test (LSAT) Section 6 were analyzed to illustrate the MCMC procedure. In addition, simulated data sets were analyzed using the MCMC, marginal Bayesian…
Standard Error Estimation of 3PL IRT True Score Equating with an MCMC Method

ERIC Educational Resources Information Center

Liu, Yuming; Schulz, E. Matthew; Yu, Lei

2008-01-01

A Markov chain Monte Carlo (MCMC) method and a bootstrap method were compared in the estimation of standard errors of item response theory (IRT) true score equating. Three test form relationships were examined: parallel, tau-equivalent, and congeneric. Data were simulated based on Reading Comprehension and Vocabulary tests of the Iowa Tests of…
Estimating Model Probabilities using Thermodynamic Markov Chain Monte Carlo Methods

NASA Astrophysics Data System (ADS)

Ye, M.; Liu, P.; Beerli, P.; Lu, D.; Hill, M. C.

2014-12-01

Markov chain Monte Carlo (MCMC) methods are widely used to evaluate model probability for quantifying model uncertainty. In a general procedure, MCMC simulations are first conducted for each individual model, and MCMC parameter samples are then used to approximate marginal likelihood of the model by calculating the geometric mean of the joint likelihood of the model and its parameters. It has been found the method of evaluating geometric mean suffers from the numerical problem of low convergence rate. A simple test case shows that even millions of MCMC samples are insufficient to yield accurate estimation of the marginal likelihood. To resolve this problem, a thermodynamic method is used to have multiple MCMC runs with different values of a heating coefficient between zero and one. When the heating coefficient is zero, the MCMC run is equivalent to a random walk MC in the prior parameter space; when the heating coefficient is one, the MCMC run is the conventional one. For a simple case with analytical form of the marginal likelihood, the thermodynamic method yields more accurate estimate than the method of using geometric mean. This is also demonstrated for a case of groundwater modeling with consideration of four alternative models postulated based on different conceptualization of a confining layer. This groundwater example shows that model probabilities estimated using the thermodynamic method are more reasonable than those obtained using the geometric method. The thermodynamic method is general, and can be used for a wide range of environmental problem for model uncertainty quantification.
Auxiliary Parameter MCMC for Exponential Random Graph Models

NASA Astrophysics Data System (ADS)

Byshkin, Maksym; Stivala, Alex; Mira, Antonietta; Krause, Rolf; Robins, Garry; Lomi, Alessandro

2016-11-01

Exponential random graph models (ERGMs) are a well-established family of statistical models for analyzing social networks. Computational complexity has so far limited the appeal of ERGMs for the analysis of large social networks. Efficient computational methods are highly desirable in order to extend the empirical scope of ERGMs. In this paper we report results of a research project on the development of snowball sampling methods for ERGMs. We propose an auxiliary parameter Markov chain Monte Carlo (MCMC) algorithm for sampling from the relevant probability distributions. The method is designed to decrease the number of allowed network states without worsening the mixing of the Markov chains, and suggests a new approach for the developments of MCMC samplers for ERGMs. We demonstrate the method on both simulated and actual (empirical) network data and show that it reduces CPU time for parameter estimation by an order of magnitude compared to current MCMC methods.
Corruption of accuracy and efficiency of Markov chain Monte Carlo simulation by inaccurate numerical implementation of conceptual hydrologic models

NASA Astrophysics Data System (ADS)

Schoups, G.; Vrugt, J. A.; Fenicia, F.; van de Giesen, N. C.

2010-10-01

Conceptual rainfall-runoff models have traditionally been applied without paying much attention to numerical errors induced by temporal integration of water balance dynamics. Reliance on first-order, explicit, fixed-step integration methods leads to computationally cheap simulation models that are easy to implement. Computational speed is especially desirable for estimating parameter and predictive uncertainty using Markov chain Monte Carlo (MCMC) methods. Confirming earlier work of Kavetski et al. (2003), we show here that the computational speed of first-order, explicit, fixed-step integration methods comes at a cost: for a case study with a spatially lumped conceptual rainfall-runoff model, it introduces artificial bimodality in the marginal posterior parameter distributions, which is not present in numerically accurate implementations of the same model. The resulting effects on MCMC simulation include (1) inconsistent estimates of posterior parameter and predictive distributions, (2) poor performance and slow convergence of the MCMC algorithm, and (3) unreliable convergence diagnosis using the Gelman-Rubin statistic. We studied several alternative numerical implementations to remedy these problems, including various adaptive-step finite difference schemes and an operator splitting method. Our results show that adaptive-step, second-order methods, based on either explicit finite differencing or operator splitting with analytical integration, provide the best alternative for accurate and efficient MCMC simulation. Fixed-step or adaptive-step implicit methods may also be used for increased accuracy, but they cannot match the efficiency of adaptive-step explicit finite differencing or operator splitting. Of the latter two, explicit finite differencing is more generally applicable and is preferred if the individual hydrologic flux laws cannot be integrated analytically, as the splitting method then loses its advantage.
Model-based Bayesian inference for ROC data analysis

NASA Astrophysics Data System (ADS)

Lei, Tianhu; Bae, K. Ty

2013-03-01

This paper presents a study of model-based Bayesian inference to Receiver Operating Characteristics (ROC) data. The model is a simple version of general non-linear regression model. Different from Dorfman model, it uses a probit link function with a covariate variable having zero-one two values to express binormal distributions in a single formula. Model also includes a scale parameter. Bayesian inference is implemented by Markov Chain Monte Carlo (MCMC) method carried out by Bayesian analysis Using Gibbs Sampling (BUGS). Contrast to the classical statistical theory, Bayesian approach considers model parameters as random variables characterized by prior distributions. With substantial amount of simulated samples generated by sampling algorithm, posterior distributions of parameters as well as parameters themselves can be accurately estimated. MCMC-based BUGS adopts Adaptive Rejection Sampling (ARS) protocol which requires the probability density function (pdf) which samples are drawing from be log concave with respect to the targeted parameters. Our study corrects a common misconception and proves that pdf of this regression model is log concave with respect to its scale parameter. Therefore, ARS's requirement is satisfied and a Gaussian prior which is conjugate and possesses many analytic and computational advantages is assigned to the scale parameter. A cohort of 20 simulated data sets and 20 simulations from each data set are used in our study. Output analysis and convergence diagnostics for MCMC method are assessed by CODA package. Models and methods by using continuous Gaussian prior and discrete categorical prior are compared. Intensive simulations and performance measures are given to illustrate our practice in the framework of model-based Bayesian inference using MCMC method.
The estimation of lower refractivity uncertainty from radar sea clutter using the Bayesian—MCMC method

NASA Astrophysics Data System (ADS)

Sheng, Zheng

2013-02-01

The estimation of lower atmospheric refractivity from radar sea clutter (RFC) is a complicated nonlinear optimization problem. This paper deals with the RFC problem in a Bayesian framework. It uses the unbiased Markov Chain Monte Carlo (MCMC) sampling technique, which can provide accurate posterior probability distributions of the estimated refractivity parameters by using an electromagnetic split-step fast Fourier transform terrain parabolic equation propagation model within a Bayesian inversion framework. In contrast to the global optimization algorithm, the Bayesian—MCMC can obtain not only the approximate solutions, but also the probability distributions of the solutions, that is, uncertainty analyses of solutions. The Bayesian—MCMC algorithm is implemented on the simulation radar sea-clutter data and the real radar sea-clutter data. Reference data are assumed to be simulation data and refractivity profiles are obtained using a helicopter. The inversion algorithm is assessed (i) by comparing the estimated refractivity profiles from the assumed simulation and the helicopter sounding data; (ii) the one-dimensional (1D) and two-dimensional (2D) posterior probability distribution of solutions.
Recovery of Graded Response Model Parameters: A Comparison of Marginal Maximum Likelihood and Markov Chain Monte Carlo Estimation

ERIC Educational Resources Information Center

Kieftenbeld, Vincent; Natesan, Prathiba

2012-01-01

Markov chain Monte Carlo (MCMC) methods enable a fully Bayesian approach to parameter estimation of item response models. In this simulation study, the authors compared the recovery of graded response model parameters using marginal maximum likelihood (MML) and Gibbs sampling (MCMC) under various latent trait distributions, test lengths, and…
Recovery of Item Parameters in the Nominal Response Model: A Comparison of Marginal Maximum Likelihood Estimation and Markov Chain Monte Carlo Estimation.

ERIC Educational Resources Information Center

Wollack, James A.; Bolt, Daniel M.; Cohen, Allan S.; Lee, Young-Sun

2002-01-01

Compared the quality of item parameter estimates for marginal maximum likelihood (MML) and Markov Chain Monte Carlo (MCMC) with the nominal response model using simulation. The quality of item parameter recovery was nearly identical for MML and MCMC, and both methods tended to produce good estimates. (SLD)
Annealed Importance Sampling Reversible Jump MCMC algorithms

DOE Office of Scientific and Technical Information (OSTI.GOV)

Karagiannis, Georgios; Andrieu, Christophe

2013-03-20

It will soon be 20 years since reversible jump Markov chain Monte Carlo (RJ-MCMC) algorithms have been proposed. They have significantly extended the scope of Markov chain Monte Carlo simulation methods, offering the promise to be able to routinely tackle transdimensional sampling problems, as encountered in Bayesian model selection problems for example, in a principled and flexible fashion. Their practical efficient implementation, however, still remains a challenge. A particular difficulty encountered in practice is in the choice of the dimension matching variables (both their nature and their distribution) and the reversible transformations which allow one to define the one-to-one mappingsmore » underpinning the design of these algorithms. Indeed, even seemingly sensible choices can lead to algorithms with very poor performance. The focus of this paper is the development and performance evaluation of a method, annealed importance sampling RJ-MCMC (aisRJ), which addresses this problem by mitigating the sensitivity of RJ-MCMC algorithms to the aforementioned poor design. As we shall see the algorithm can be understood as being an “exact approximation” of an idealized MCMC algorithm that would sample from the model probabilities directly in a model selection set-up. Such an idealized algorithm may have good theoretical convergence properties, but typically cannot be implemented, and our algorithms can approximate the performance of such idealized algorithms to an arbitrary degree while not introducing any bias for any degree of approximation. Our approach combines the dimension matching ideas of RJ-MCMC with annealed importance sampling and its Markov chain Monte Carlo implementation. We illustrate the performance of the algorithm with numerical simulations which indicate that, although the approach may at first appear computationally involved, it is in fact competitive.« less
[Comparison of different methods in dealing with HIV viral load data with diversified missing value mechanism on HIV positive MSM].

PubMed

Jiang, Z; Dou, Z; Song, W L; Xu, J; Wu, Z Y

2017-11-10

Objective: To compare results of different methods: in organizing HIV viral load (VL) data with missing values mechanism. Methods We used software SPSS 17.0 to simulate complete and missing data with different missing value mechanism from HIV viral loading data collected from MSM in 16 cities in China in 2013. Maximum Likelihood Methods Using the Expectation and Maximization Algorithm (EM), regressive method, mean imputation, delete method, and Markov Chain Monte Carlo (MCMC) were used to supplement missing data respectively. The results: of different methods were compared according to distribution characteristics, accuracy and precision. Results HIV VL data could not be transferred into a normal distribution. All the methods showed good results in iterating data which is Missing Completely at Random Mechanism (MCAR). For the other types of missing data, regressive and MCMC methods were used to keep the main characteristic of the original data. The means of iterating database with different methods were all close to the original one. The EM, regressive method, mean imputation, and delete method under-estimate VL while MCMC overestimates it. Conclusion: MCMC can be used as the main imputation method for HIV virus loading missing data. The iterated data can be used as a reference for mean HIV VL estimation among the investigated population.
Bayesian posterior distributions without Markov chains.

PubMed

Cole, Stephen R; Chu, Haitao; Greenland, Sander; Hamra, Ghassan; Richardson, David B

2012-03-01

Bayesian posterior parameter distributions are often simulated using Markov chain Monte Carlo (MCMC) methods. However, MCMC methods are not always necessary and do not help the uninitiated understand Bayesian inference. As a bridge to understanding Bayesian inference, the authors illustrate a transparent rejection sampling method. In example 1, they illustrate rejection sampling using 36 cases and 198 controls from a case-control study (1976-1983) assessing the relation between residential exposure to magnetic fields and the development of childhood cancer. Results from rejection sampling (odds ratio (OR) = 1.69, 95% posterior interval (PI): 0.57, 5.00) were similar to MCMC results (OR = 1.69, 95% PI: 0.58, 4.95) and approximations from data-augmentation priors (OR = 1.74, 95% PI: 0.60, 5.06). In example 2, the authors apply rejection sampling to a cohort study of 315 human immunodeficiency virus seroconverters (1984-1998) to assess the relation between viral load after infection and 5-year incidence of acquired immunodeficiency syndrome, adjusting for (continuous) age at seroconversion and race. In this more complex example, rejection sampling required a notably longer run time than MCMC sampling but remained feasible and again yielded similar results. The transparency of the proposed approach comes at a price of being less broadly applicable than MCMC.
Bayesian Analysis of Evolutionary Divergence with Genomic Data under Diverse Demographic Models.

PubMed

Chung, Yujin; Hey, Jody

2017-06-01

We present a new Bayesian method for estimating demographic and phylogenetic history using population genomic data. Several key innovations are introduced that allow the study of diverse models within an Isolation-with-Migration framework. The new method implements a 2-step analysis, with an initial Markov chain Monte Carlo (MCMC) phase that samples simple coalescent trees, followed by the calculation of the joint posterior density for the parameters of a demographic model. In step 1, the MCMC sampling phase, the method uses a reduced state space, consisting of coalescent trees without migration paths, and a simple importance sampling distribution without the demography of interest. Once obtained, a single sample of trees can be used in step 2 to calculate the joint posterior density for model parameters under multiple diverse demographic models, without having to repeat MCMC runs. Because migration paths are not included in the state space of the MCMC phase, but rather are handled by analytic integration in step 2 of the analysis, the method is scalable to a large number of loci with excellent MCMC mixing properties. With an implementation of the new method in the computer program MIST, we demonstrate the method's accuracy, scalability, and other advantages using simulated data and DNA sequences of two common chimpanzee subspecies: Pan troglodytes (P. t.) troglodytes and P. t. verus. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
On an adaptive preconditioned Crank-Nicolson MCMC algorithm for infinite dimensional Bayesian inference

NASA Astrophysics Data System (ADS)

Hu, Zixi; Yao, Zhewei; Li, Jinglai

2017-03-01

Many scientific and engineering problems require to perform Bayesian inference for unknowns of infinite dimension. In such problems, many standard Markov Chain Monte Carlo (MCMC) algorithms become arbitrary slow under the mesh refinement, which is referred to as being dimension dependent. To this end, a family of dimensional independent MCMC algorithms, known as the preconditioned Crank-Nicolson (pCN) methods, were proposed to sample the infinite dimensional parameters. In this work we develop an adaptive version of the pCN algorithm, where the covariance operator of the proposal distribution is adjusted based on sampling history to improve the simulation efficiency. We show that the proposed algorithm satisfies an important ergodicity condition under some mild assumptions. Finally we provide numerical examples to demonstrate the performance of the proposed method.
A Bootstrap Metropolis-Hastings Algorithm for Bayesian Analysis of Big Data.

PubMed

Liang, Faming; Kim, Jinsu; Song, Qifan

2016-01-01

Markov chain Monte Carlo (MCMC) methods have proven to be a very powerful tool for analyzing data of complex structures. However, their computer-intensive nature, which typically require a large number of iterations and a complete scan of the full dataset for each iteration, precludes their use for big data analysis. In this paper, we propose the so-called bootstrap Metropolis-Hastings (BMH) algorithm, which provides a general framework for how to tame powerful MCMC methods to be used for big data analysis; that is to replace the full data log-likelihood by a Monte Carlo average of the log-likelihoods that are calculated in parallel from multiple bootstrap samples. The BMH algorithm possesses an embarrassingly parallel structure and avoids repeated scans of the full dataset in iterations, and is thus feasible for big data problems. Compared to the popular divide-and-combine method, BMH can be generally more efficient as it can asymptotically integrate the whole data information into a single simulation run. The BMH algorithm is very flexible. Like the Metropolis-Hastings algorithm, it can serve as a basic building block for developing advanced MCMC algorithms that are feasible for big data problems. This is illustrated in the paper by the tempering BMH algorithm, which can be viewed as a combination of parallel tempering and the BMH algorithm. BMH can also be used for model selection and optimization by combining with reversible jump MCMC and simulated annealing, respectively.
A Bootstrap Metropolis–Hastings Algorithm for Bayesian Analysis of Big Data

PubMed Central

Kim, Jinsu; Song, Qifan

2016-01-01

Markov chain Monte Carlo (MCMC) methods have proven to be a very powerful tool for analyzing data of complex structures. However, their computer-intensive nature, which typically require a large number of iterations and a complete scan of the full dataset for each iteration, precludes their use for big data analysis. In this paper, we propose the so-called bootstrap Metropolis-Hastings (BMH) algorithm, which provides a general framework for how to tame powerful MCMC methods to be used for big data analysis; that is to replace the full data log-likelihood by a Monte Carlo average of the log-likelihoods that are calculated in parallel from multiple bootstrap samples. The BMH algorithm possesses an embarrassingly parallel structure and avoids repeated scans of the full dataset in iterations, and is thus feasible for big data problems. Compared to the popular divide-and-combine method, BMH can be generally more efficient as it can asymptotically integrate the whole data information into a single simulation run. The BMH algorithm is very flexible. Like the Metropolis-Hastings algorithm, it can serve as a basic building block for developing advanced MCMC algorithms that are feasible for big data problems. This is illustrated in the paper by the tempering BMH algorithm, which can be viewed as a combination of parallel tempering and the BMH algorithm. BMH can also be used for model selection and optimization by combining with reversible jump MCMC and simulated annealing, respectively. PMID:29033469
SaaS enabled admission control for MCMC simulation in cloud computing infrastructures

NASA Astrophysics Data System (ADS)

Vázquez-Poletti, J. L.; Moreno-Vozmediano, R.; Han, R.; Wang, W.; Llorente, I. M.

2017-02-01

Markov Chain Monte Carlo (MCMC) methods are widely used in the field of simulation and modelling of materials, producing applications that require a great amount of computational resources. Cloud computing represents a seamless source for these resources in the form of HPC. However, resource over-consumption can be an important drawback, specially if the cloud provision process is not appropriately optimized. In the present contribution we propose a two-level solution that, on one hand, takes advantage of approximate computing for reducing the resource demand and on the other, uses admission control policies for guaranteeing an optimal provision to running applications.
Comparison of Bootstrapping and Markov Chain Monte Carlo for Copula Analysis of Hydrological Droughts

NASA Astrophysics Data System (ADS)

Yang, P.; Ng, T. L.; Yang, W.

2015-12-01

Effective water resources management depends on the reliable estimation of the uncertainty of drought events. Confidence intervals (CIs) are commonly applied to quantify this uncertainty. A CI seeks to be at the minimal length necessary to cover the true value of the estimated variable with the desired probability. In drought analysis where two or more variables (e.g., duration and severity) are often used to describe a drought, copulas have been found suitable for representing the joint probability behavior of these variables. However, the comprehensive assessment of the parameter uncertainties of copulas of droughts has been largely ignored, and the few studies that have recognized this issue have not explicitly compared the various methods to produce the best CIs. Thus, the objective of this study to compare the CIs generated using two widely applied uncertainty estimation methods, bootstrapping and Markov Chain Monte Carlo (MCMC). To achieve this objective, (1) the marginal distributions lognormal, Gamma, and Generalized Extreme Value, and the copula functions Clayton, Frank, and Plackett are selected to construct joint probability functions of two drought related variables. (2) The resulting joint functions are then fitted to 200 sets of simulated realizations of drought events with known distribution and extreme parameters and (3) from there, using bootstrapping and MCMC, CIs of the parameters are generated and compared. The effect of an informative prior on the CIs generated by MCMC is also evaluated. CIs are produced for different sample sizes (50, 100, and 200) of the simulated drought events for fitting the joint probability functions. Preliminary results assuming lognormal marginal distributions and the Clayton copula function suggest that for cases with small or medium sample sizes (~50-100), MCMC to be superior method if an informative prior exists. Where an informative prior is unavailable, for small sample sizes (~50), both bootstrapping and MCMC yield the same level of performance, and for medium sample sizes (~100), bootstrapping is better. For cases with a large sample size (~200), there is little difference between the CIs generated using bootstrapping and MCMC regardless of whether or not an informative prior exists.

Assessing an ensemble Kalman filter inference of Manning's n coefficient of an idealized tidal inlet against a polynomial chaos-based MCMC

NASA Astrophysics Data System (ADS)

Siripatana, Adil; Mayo, Talea; Sraj, Ihab; Knio, Omar; Dawson, Clint; Le Maitre, Olivier; Hoteit, Ibrahim

2017-08-01

Bayesian estimation/inversion is commonly used to quantify and reduce modeling uncertainties in coastal ocean model, especially in the framework of parameter estimation. Based on Bayes rule, the posterior probability distribution function (pdf) of the estimated quantities is obtained conditioned on available data. It can be computed either directly, using a Markov chain Monte Carlo (MCMC) approach, or by sequentially processing the data following a data assimilation approach, which is heavily exploited in large dimensional state estimation problems. The advantage of data assimilation schemes over MCMC-type methods arises from the ability to algorithmically accommodate a large number of uncertain quantities without significant increase in the computational requirements. However, only approximate estimates are generally obtained by this approach due to the restricted Gaussian prior and noise assumptions that are generally imposed in these methods. This contribution aims at evaluating the effectiveness of utilizing an ensemble Kalman-based data assimilation method for parameter estimation of a coastal ocean model against an MCMC polynomial chaos (PC)-based scheme. We focus on quantifying the uncertainties of a coastal ocean ADvanced CIRCulation (ADCIRC) model with respect to the Manning's n coefficients. Based on a realistic framework of observation system simulation experiments (OSSEs), we apply an ensemble Kalman filter and the MCMC method employing a surrogate of ADCIRC constructed by a non-intrusive PC expansion for evaluating the likelihood, and test both approaches under identical scenarios. We study the sensitivity of the estimated posteriors with respect to the parameters of the inference methods, including ensemble size, inflation factor, and PC order. A full analysis of both methods, in the context of coastal ocean model, suggests that an ensemble Kalman filter with appropriate ensemble size and well-tuned inflation provides reliable mean estimates and uncertainties of Manning's n coefficients compared to the full posterior distributions inferred by MCMC.
Estimating the Effective Sample Size of Tree Topologies from Bayesian Phylogenetic Analyses

PubMed Central

Lanfear, Robert; Hua, Xia; Warren, Dan L.

2016-01-01

Bayesian phylogenetic analyses estimate posterior distributions of phylogenetic tree topologies and other parameters using Markov chain Monte Carlo (MCMC) methods. Before making inferences from these distributions, it is important to assess their adequacy. To this end, the effective sample size (ESS) estimates how many truly independent samples of a given parameter the output of the MCMC represents. The ESS of a parameter is frequently much lower than the number of samples taken from the MCMC because sequential samples from the chain can be non-independent due to autocorrelation. Typically, phylogeneticists use a rule of thumb that the ESS of all parameters should be greater than 200. However, we have no method to calculate an ESS of tree topology samples, despite the fact that the tree topology is often the parameter of primary interest and is almost always central to the estimation of other parameters. That is, we lack a method to determine whether we have adequately sampled one of the most important parameters in our analyses. In this study, we address this problem by developing methods to estimate the ESS for tree topologies. We combine these methods with two new diagnostic plots for assessing posterior samples of tree topologies, and compare their performance on simulated and empirical data sets. Combined, the methods we present provide new ways to assess the mixing and convergence of phylogenetic tree topologies in Bayesian MCMC analyses. PMID:27435794
An MCMC method for the evaluation of the Fisher information matrix for non-linear mixed effect models.

PubMed

Riviere, Marie-Karelle; Ueckert, Sebastian; Mentré, France

2016-10-01

Non-linear mixed effect models (NLMEMs) are widely used for the analysis of longitudinal data. To design these studies, optimal design based on the expected Fisher information matrix (FIM) can be used instead of performing time-consuming clinical trial simulations. In recent years, estimation algorithms for NLMEMs have transitioned from linearization toward more exact higher-order methods. Optimal design, on the other hand, has mainly relied on first-order (FO) linearization to calculate the FIM. Although efficient in general, FO cannot be applied to complex non-linear models and with difficulty in studies with discrete data. We propose an approach to evaluate the expected FIM in NLMEMs for both discrete and continuous outcomes. We used Markov Chain Monte Carlo (MCMC) to integrate the derivatives of the log-likelihood over the random effects, and Monte Carlo to evaluate its expectation w.r.t. the observations. Our method was implemented in R using Stan, which efficiently draws MCMC samples and calculates partial derivatives of the log-likelihood. Evaluated on several examples, our approach showed good performance with relative standard errors (RSEs) close to those obtained by simulations. We studied the influence of the number of MC and MCMC samples and computed the uncertainty of the FIM evaluation. We also compared our approach to Adaptive Gaussian Quadrature, Laplace approximation, and FO. Our method is available in R-package MIXFIM and can be used to evaluate the FIM, its determinant with confidence intervals (CIs), and RSEs with CIs. © The Author 2016. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Combined state and parameter identification of nonlinear structural dynamical systems based on Rao-Blackwellization and Markov chain Monte Carlo simulations

NASA Astrophysics Data System (ADS)

Abhinav, S.; Manohar, C. S.

2018-03-01

The problem of combined state and parameter estimation in nonlinear state space models, based on Bayesian filtering methods, is considered. A novel approach, which combines Rao-Blackwellized particle filters for state estimation with Markov chain Monte Carlo (MCMC) simulations for parameter identification, is proposed. In order to ensure successful performance of the MCMC samplers, in situations involving large amount of dynamic measurement data and (or) low measurement noise, the study employs a modified measurement model combined with an importance sampling based correction. The parameters of the process noise covariance matrix are also included as quantities to be identified. The study employs the Rao-Blackwellization step at two stages: one, associated with the state estimation problem in the particle filtering step, and, secondly, in the evaluation of the ratio of likelihoods in the MCMC run. The satisfactory performance of the proposed method is illustrated on three dynamical systems: (a) a computational model of a nonlinear beam-moving oscillator system, (b) a laboratory scale beam traversed by a loaded trolley, and (c) an earthquake shake table study on a bending-torsion coupled nonlinear frame subjected to uniaxial support motion.
Ascertainment correction for Markov chain Monte Carlo segregation and linkage analysis of a quantitative trait.

PubMed

Ma, Jianzhong; Amos, Christopher I; Warwick Daw, E

2007-09-01

Although extended pedigrees are often sampled through probands with extreme levels of a quantitative trait, Markov chain Monte Carlo (MCMC) methods for segregation and linkage analysis have not been able to perform ascertainment corrections. Further, the extent to which ascertainment of pedigrees leads to biases in the estimation of segregation and linkage parameters has not been previously studied for MCMC procedures. In this paper, we studied these issues with a Bayesian MCMC approach for joint segregation and linkage analysis, as implemented in the package Loki. We first simulated pedigrees ascertained through individuals with extreme values of a quantitative trait in spirit of the sequential sampling theory of Cannings and Thompson [Cannings and Thompson [1977] Clin. Genet. 12:208-212]. Using our simulated data, we detected no bias in estimates of the trait locus location. However, in addition to allele frequencies, when the ascertainment threshold was higher than or close to the true value of the highest genotypic mean, bias was also found in the estimation of this parameter. When there were multiple trait loci, this bias destroyed the additivity of the effects of the trait loci, and caused biases in the estimation all genotypic means when a purely additive model was used for analyzing the data. To account for pedigree ascertainment with sequential sampling, we developed a Bayesian ascertainment approach and implemented Metropolis-Hastings updates in the MCMC samplers used in Loki. Ascertainment correction greatly reduced biases in parameter estimates. Our method is designed for multiple, but a fixed number of trait loci. Copyright (c) 2007 Wiley-Liss, Inc.
Uncertainty Quantification of GEOS-5 L-band Radiative Transfer Model Parameters Using Bayesian Inference and SMOS Observations

NASA Technical Reports Server (NTRS)

DeLannoy, Gabrielle J. M.; Reichle, Rolf H.; Vrugt, Jasper A.

2013-01-01

Uncertainties in L-band (1.4 GHz) radiative transfer modeling (RTM) affect the simulation of brightness temperatures (Tb) over land and the inversion of satellite-observed Tb into soil moisture retrievals. In particular, accurate estimates of the microwave soil roughness, vegetation opacity and scattering albedo for large-scale applications are difficult to obtain from field studies and often lack an uncertainty estimate. Here, a Markov Chain Monte Carlo (MCMC) simulation method is used to determine satellite-scale estimates of RTM parameters and their posterior uncertainty by minimizing the misfit between long-term averages and standard deviations of simulated and observed Tb at a range of incidence angles, at horizontal and vertical polarization, and for morning and evening overpasses. Tb simulations are generated with the Goddard Earth Observing System (GEOS-5) and confronted with Tb observations from the Soil Moisture Ocean Salinity (SMOS) mission. The MCMC algorithm suggests that the relative uncertainty of the RTM parameter estimates is typically less than 25 of the maximum a posteriori density (MAP) parameter value. Furthermore, the actual root-mean-square-differences in long-term Tb averages and standard deviations are found consistent with the respective estimated total simulation and observation error standard deviations of m3.1K and s2.4K. It is also shown that the MAP parameter values estimated through MCMC simulation are in close agreement with those obtained with Particle Swarm Optimization (PSO).
Profile-Based LC-MS Data Alignment—A Bayesian Approach

PubMed Central

Tsai, Tsung-Heng; Tadesse, Mahlet G.; Wang, Yue; Ressom, Habtom W.

2014-01-01

A Bayesian alignment model (BAM) is proposed for alignment of liquid chromatography-mass spectrometry (LC-MS) data. BAM belongs to the category of profile-based approaches, which are composed of two major components: a prototype function and a set of mapping functions. Appropriate estimation of these functions is crucial for good alignment results. BAM uses Markov chain Monte Carlo (MCMC) methods to draw inference on the model parameters and improves on existing MCMC-based alignment methods through 1) the implementation of an efficient MCMC sampler and 2) an adaptive selection of knots. A block Metropolis-Hastings algorithm that mitigates the problem of the MCMC sampler getting stuck at local modes of the posterior distribution is used for the update of the mapping function coefficients. In addition, a stochastic search variable selection (SSVS) methodology is used to determine the number and positions of knots. We applied BAM to a simulated data set, an LC-MS proteomic data set, and two LC-MS metabolomic data sets, and compared its performance with the Bayesian hierarchical curve registration (BHCR) model, the dynamic time-warping (DTW) model, and the continuous profile model (CPM). The advantage of applying appropriate profile-based retention time correction prior to performing a feature-based approach is also demonstrated through the metabolomic data sets. PMID:23929872
Asteroid orbital inversion using uniform phase-space sampling

NASA Astrophysics Data System (ADS)

Muinonen, K.; Pentikäinen, H.; Granvik, M.; Oszkiewicz, D.; Virtanen, J.

2014-07-01

We review statistical inverse methods for asteroid orbit computation from a small number of astrometric observations and short time intervals of observations. With the help of Markov-chain Monte Carlo methods (MCMC), we present a novel inverse method that utilizes uniform sampling of the phase space for the orbital elements. The statistical orbital ranging method (Virtanen et al. 2001, Muinonen et al. 2001) was set out to resolve the long-lasting challenges in the initial computation of orbits for asteroids. The ranging method starts from the selection of a pair of astrometric observations. Thereafter, the topocentric ranges and angular deviations in R.A. and Decl. are randomly sampled. The two Cartesian positions allow for the computation of orbital elements and, subsequently, the computation of ephemerides for the observation dates. Candidate orbital elements are included in the sample of accepted elements if the χ^2-value between the observed and computed observations is within a pre-defined threshold. The sample orbital elements obtain weights based on a certain debiasing procedure. When the weights are available, the full sample of orbital elements allows the probabilistic assessments for, e.g., object classification and ephemeris computation as well as the computation of collision probabilities. The MCMC ranging method (Oszkiewicz et al. 2009; see also Granvik et al. 2009) replaces the original sampling algorithm described above with a proposal probability density function (p.d.f.), and a chain of sample orbital elements results in the phase space. MCMC ranging is based on a bivariate Gaussian p.d.f. for the topocentric ranges, and allows for the sampling to focus on the phase-space domain with most of the probability mass. In the virtual-observation MCMC method (Muinonen et al. 2012), the proposal p.d.f. for the orbital elements is chosen to mimic the a posteriori p.d.f. for the elements: first, random errors are simulated for each observation, resulting in a set of virtual observations; second, corresponding virtual least-squares orbital elements are derived using the Nelder-Mead downhill simplex method; third, repeating the procedure two times allows for a computation of a difference for two sets of virtual orbital elements; and, fourth, this orbital-element difference constitutes a symmetric proposal in a random-walk Metropolis-Hastings algorithm, avoiding the explicit computation of the proposal p.d.f. In a discrete approximation, the allowed proposals coincide with the differences that are based on a large number of pre-computed sets of virtual least-squares orbital elements. The virtual-observation MCMC method is thus based on the characterization of the relevant volume in the orbital-element phase space. Here we utilize MCMC to map the phase-space domain of acceptable solutions. We can make use of the proposal p.d.f.s from the MCMC ranging and virtual-observation methods. The present phase-space mapping produces, upon convergence, a uniform sampling of the solution space within a pre-defined χ^2-value. The weights of the sampled orbital elements are then computed on the basis of the corresponding χ^2-values. The present method resembles the original ranging method. On one hand, MCMC mapping is insensitive to local extrema in the phase space and efficiently maps the solution space. This is somewhat contrary to the MCMC methods described above. On the other hand, MCMC mapping can suffer from producing a small number of sample elements with small χ^2-values, in resemblance to the original ranging method. We apply the methods to example near-Earth, main-belt, and transneptunian objects, and highlight the utilization of the methods in the data processing and analysis pipeline of the ESA Gaia space mission.
Approximate Bayesian computation for spatial SEIR(S) epidemic models.

PubMed

Brown, Grant D; Porter, Aaron T; Oleson, Jacob J; Hinman, Jessica A

2018-02-01

Approximate Bayesia n Computation (ABC) provides an attractive approach to estimation in complex Bayesian inferential problems for which evaluation of the kernel of the posterior distribution is impossible or computationally expensive. These highly parallelizable techniques have been successfully applied to many fields, particularly in cases where more traditional approaches such as Markov chain Monte Carlo (MCMC) are impractical. In this work, we demonstrate the application of approximate Bayesian inference to spatially heterogeneous Susceptible-Exposed-Infectious-Removed (SEIR) stochastic epidemic models. These models have a tractable posterior distribution, however MCMC techniques nevertheless become computationally infeasible for moderately sized problems. We discuss the practical implementation of these techniques via the open source ABSEIR package for R. The performance of ABC relative to traditional MCMC methods in a small problem is explored under simulation, as well as in the spatially heterogeneous context of the 2014 epidemic of Chikungunya in the Americas. Copyright © 2017 Elsevier Ltd. All rights reserved.
Species delimitation using Bayes factors: simulations and application to the Sceloporus scalaris species group (Squamata: Phrynosomatidae).

PubMed

Grummer, Jared A; Bryson, Robert W; Reeder, Tod W

2014-03-01

Current molecular methods of species delimitation are limited by the types of species delimitation models and scenarios that can be tested. Bayes factors allow for more flexibility in testing non-nested species delimitation models and hypotheses of individual assignment to alternative lineages. Here, we examined the efficacy of Bayes factors in delimiting species through simulations and empirical data from the Sceloporus scalaris species group. Marginal-likelihood scores of competing species delimitation models, from which Bayes factor values were compared, were estimated with four different methods: harmonic mean estimation (HME), smoothed harmonic mean estimation (sHME), path-sampling/thermodynamic integration (PS), and stepping-stone (SS) analysis. We also performed model selection using a posterior simulation-based analog of the Akaike information criterion through Markov chain Monte Carlo analysis (AICM). Bayes factor species delimitation results from the empirical data were then compared with results from the reversible-jump MCMC (rjMCMC) coalescent-based species delimitation method Bayesian Phylogenetics and Phylogeography (BP&P). Simulation results show that HME and sHME perform poorly compared with PS and SS marginal-likelihood estimators when identifying the true species delimitation model. Furthermore, Bayes factor delimitation (BFD) of species showed improved performance when species limits are tested by reassigning individuals between species, as opposed to either lumping or splitting lineages. In the empirical data, BFD through PS and SS analyses, as well as the rjMCMC method, each provide support for the recognition of all scalaris group taxa as independent evolutionary lineages. Bayes factor species delimitation and BP&P also support the recognition of three previously undescribed lineages. In both simulated and empirical data sets, harmonic and smoothed harmonic mean marginal-likelihood estimators provided much higher marginal-likelihood estimates than PS and SS estimators. The AICM displayed poor repeatability in both simulated and empirical data sets, and produced inconsistent model rankings across replicate runs with the empirical data. Our results suggest that species delimitation through the use of Bayes factors with marginal-likelihood estimates via PS or SS analyses provide a useful and complementary alternative to existing species delimitation methods.
Joint simulation of regional areas burned in Canadian forest fires: A Markov Chain Monte Carlo approach

Treesearch

Steen Magnussen

2009-01-01

Areas burned annually in 29 Canadian forest fire regions show a patchy and irregular correlation structure that significantly influences the distribution of annual totals for Canada and for groups of regions. A binary Monte Carlo Markov Chain (MCMC) is constructed for the purpose of joint simulation of regional areas burned in forest fires. For each year the MCMC...
Implementing informative priors for heterogeneity in meta-analysis using meta-regression and pseudo data.

PubMed

Rhodes, Kirsty M; Turner, Rebecca M; White, Ian R; Jackson, Dan; Spiegelhalter, David J; Higgins, Julian P T

2016-12-20

Many meta-analyses combine results from only a small number of studies, a situation in which the between-study variance is imprecisely estimated when standard methods are applied. Bayesian meta-analysis allows incorporation of external evidence on heterogeneity, providing the potential for more robust inference on the effect size of interest. We present a method for performing Bayesian meta-analysis using data augmentation, in which we represent an informative conjugate prior for between-study variance by pseudo data and use meta-regression for estimation. To assist in this, we derive predictive inverse-gamma distributions for the between-study variance expected in future meta-analyses. These may serve as priors for heterogeneity in new meta-analyses. In a simulation study, we compare approximate Bayesian methods using meta-regression and pseudo data against fully Bayesian approaches based on importance sampling techniques and Markov chain Monte Carlo (MCMC). We compare the frequentist properties of these Bayesian methods with those of the commonly used frequentist DerSimonian and Laird procedure. The method is implemented in standard statistical software and provides a less complex alternative to standard MCMC approaches. An importance sampling approach produces almost identical results to standard MCMC approaches, and results obtained through meta-regression and pseudo data are very similar. On average, data augmentation provides closer results to MCMC, if implemented using restricted maximum likelihood estimation rather than DerSimonian and Laird or maximum likelihood estimation. The methods are applied to real datasets, and an extension to network meta-analysis is described. The proposed method facilitates Bayesian meta-analysis in a way that is accessible to applied researchers. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
A surrogate-based sensitivity quantification and Bayesian inversion of a regional groundwater flow model

NASA Astrophysics Data System (ADS)

Chen, Mingjie; Izady, Azizallah; Abdalla, Osman A.; Amerjeed, Mansoor

2018-02-01

Bayesian inference using Markov Chain Monte Carlo (MCMC) provides an explicit framework for stochastic calibration of hydrogeologic models accounting for uncertainties; however, the MCMC sampling entails a large number of model calls, and could easily become computationally unwieldy if the high-fidelity hydrogeologic model simulation is time consuming. This study proposes a surrogate-based Bayesian framework to address this notorious issue, and illustrates the methodology by inverse modeling a regional MODFLOW model. The high-fidelity groundwater model is approximated by a fast statistical model using Bagging Multivariate Adaptive Regression Spline (BMARS) algorithm, and hence the MCMC sampling can be efficiently performed. In this study, the MODFLOW model is developed to simulate the groundwater flow in an arid region of Oman consisting of mountain-coast aquifers, and used to run representative simulations to generate training dataset for BMARS model construction. A BMARS-based Sobol' method is also employed to efficiently calculate input parameter sensitivities, which are used to evaluate and rank their importance for the groundwater flow model system. According to sensitivity analysis, insensitive parameters are screened out of Bayesian inversion of the MODFLOW model, further saving computing efforts. The posterior probability distribution of input parameters is efficiently inferred from the prescribed prior distribution using observed head data, demonstrating that the presented BMARS-based Bayesian framework is an efficient tool to reduce parameter uncertainties of a groundwater system.
Cognitive diagnosis modelling incorporating item response times.

PubMed

Zhan, Peida; Jiao, Hong; Liao, Dandan

2018-05-01

To provide more refined diagnostic feedback with collateral information in item response times (RTs), this study proposed joint modelling of attributes and response speed using item responses and RTs simultaneously for cognitive diagnosis. For illustration, an extended deterministic input, noisy 'and' gate (DINA) model was proposed for joint modelling of responses and RTs. Model parameter estimation was explored using the Bayesian Markov chain Monte Carlo (MCMC) method. The PISA 2012 computer-based mathematics data were analysed first. These real data estimates were treated as true values in a subsequent simulation study. A follow-up simulation study with ideal testing conditions was conducted as well to further evaluate model parameter recovery. The results indicated that model parameters could be well recovered using the MCMC approach. Further, incorporating RTs into the DINA model would improve attribute and profile correct classification rates and result in more accurate and precise estimation of the model parameters. © 2017 The British Psychological Society.
Modeling Nitrogen Dynamics in a Waste Stabilization Pond System Using Flexible Modeling Environment with MCMC.

PubMed

Mukhtar, Hussnain; Lin, Yu-Pin; Shipin, Oleg V; Petway, Joy R

2017-07-12

This study presents an approach for obtaining realization sets of parameters for nitrogen removal in a pilot-scale waste stabilization pond (WSP) system. The proposed approach was designed for optimal parameterization, local sensitivity analysis, and global uncertainty analysis of a dynamic simulation model for the WSP by using the R software package Flexible Modeling Environment (R-FME) with the Markov chain Monte Carlo (MCMC) method. Additionally, generalized likelihood uncertainty estimation (GLUE) was integrated into the FME to evaluate the major parameters that affect the simulation outputs in the study WSP. Comprehensive modeling analysis was used to simulate and assess nine parameters and concentrations of ON-N, NH₃-N and NO₃-N. Results indicate that the integrated FME-GLUE-based model, with good Nash-Sutcliffe coefficients (0.53-0.69) and correlation coefficients (0.76-0.83), successfully simulates the concentrations of ON-N, NH₃-N and NO₃-N. Moreover, the Arrhenius constant was the only parameter sensitive to model performances of ON-N and NH₃-N simulations. However, Nitrosomonas growth rate, the denitrification constant, and the maximum growth rate at 20 °C were sensitive to ON-N and NO₃-N simulation, which was measured using global sensitivity.
Data Analysis Recipes: Using Markov Chain Monte Carlo

NASA Astrophysics Data System (ADS)

Hogg, David W.; Foreman-Mackey, Daniel

2018-05-01

Markov Chain Monte Carlo (MCMC) methods for sampling probability density functions (combined with abundant computational resources) have transformed the sciences, especially in performing probabilistic inferences, or fitting models to data. In this primarily pedagogical contribution, we give a brief overview of the most basic MCMC method and some practical advice for the use of MCMC in real inference problems. We give advice on method choice, tuning for performance, methods for initialization, tests of convergence, troubleshooting, and use of the chain output to produce or report parameter estimates with associated uncertainties. We argue that autocorrelation time is the most important test for convergence, as it directly connects to the uncertainty on the sampling estimate of any quantity of interest. We emphasize that sampling is a method for doing integrals; this guides our thinking about how MCMC output is best used. .
Gradient-free MCMC methods for dynamic causal modelling

DOE PAGES

Sengupta, Biswa; Friston, Karl J.; Penny, Will D.

2015-03-14

Here, we compare the performance of four gradient-free MCMC samplers (random walk Metropolis sampling, slice-sampling, adaptive MCMC sampling and population-based MCMC sampling with tempering) in terms of the number of independent samples they can produce per unit computational time. For the Bayesian inversion of a single-node neural mass model, both adaptive and population-based samplers are more efficient compared with random walk Metropolis sampler or slice-sampling; yet adaptive MCMC sampling is more promising in terms of compute time. Slice-sampling yields the highest number of independent samples from the target density -- albeit at almost 1000% increase in computational time, in comparisonmore » to the most efficient algorithm (i.e., the adaptive MCMC sampler).« less
Bayesian Estimation of Multidimensional Item Response Models. A Comparison of Analytic and Simulation Algorithms

ERIC Educational Resources Information Center

Martin-Fernandez, Manuel; Revuelta, Javier

2017-01-01

This study compares the performance of two estimation algorithms of new usage, the Metropolis-Hastings Robins-Monro (MHRM) and the Hamiltonian MCMC (HMC), with two consolidated algorithms in the psychometric literature, the marginal likelihood via EM algorithm (MML-EM) and the Markov chain Monte Carlo (MCMC), in the estimation of multidimensional…
Time-dependent Reliability of Dynamic Systems using Subset Simulation with Splitting over a Series of Correlated Time Intervals

DTIC Science & Technology

2013-08-01

cost due to potential warranty costs, repairs and loss of market share. Reliability is the probability that the system will perform its intended...MCMC and splitting sampling schemes. Our proposed SS/ STP method is presented in Section 4, including accuracy bounds and computational effort
Markov chain Monte Carlo linkage analysis: effect of bin width on the probability of linkage.

PubMed

Slager, S L; Juo, S H; Durner, M; Hodge, S E

2001-01-01

We analyzed part of the Genetic Analysis Workshop (GAW) 12 simulated data using Monte Carlo Markov chain (MCMC) methods that are implemented in the computer program Loki. The MCMC method reports the "probability of linkage" (PL) across the chromosomal regions of interest. The point of maximum PL can then be taken as a "location estimate" for the location of the quantitative trait locus (QTL). However, Loki does not provide a formal statistical test of linkage. In this paper, we explore how the bin width used in the calculations affects the max PL and the location estimate. We analyzed age at onset (AO) and quantitative trait number 5, Q5, from 26 replicates of the general simulated data in one region where we knew a major gene, MG5, is located. For each trait, we found the max PL and the corresponding location estimate, using four different bin widths. We found that bin width, as expected, does affect the max PL and the location estimate, and we recommend that users of Loki explore how their results vary with different bin widths.

Sythesis of MCMC and Belief Propagation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ahn, Sungsoo; Chertkov, Michael; Shin, Jinwoo

Markov Chain Monte Carlo (MCMC) and Belief Propagation (BP) are the most popular algorithms for computational inference in Graphical Models (GM). In principle, MCMC is an exact probabilistic method which, however, often suffers from exponentially slow mixing. In contrast, BP is a deterministic method, which is typically fast, empirically very successful, however in general lacking control of accuracy over loopy graphs. In this paper, we introduce MCMC algorithms correcting the approximation error of BP, i.e., we provide a way to compensate for BP errors via a consecutive BP-aware MCMC. Our framework is based on the Loop Calculus (LC) approach whichmore » allows to express the BP error as a sum of weighted generalized loops. Although the full series is computationally intractable, it is known that a truncated series, summing up all 2-regular loops, is computable in polynomial-time for planar pair-wise binary GMs and it also provides a highly accurate approximation empirically. Motivated by this, we first propose a polynomial-time approximation MCMC scheme for the truncated series of general (non-planar) pair-wise binary models. Our main idea here is to use the Worm algorithm, known to provide fast mixing in other (related) problems, and then design an appropriate rejection scheme to sample 2-regular loops. Furthermore, we also design an efficient rejection-free MCMC scheme for approximating the full series. The main novelty underlying our design is in utilizing the concept of cycle basis, which provides an efficient decomposition of the generalized loops. In essence, the proposed MCMC schemes run on transformed GM built upon the non-trivial BP solution, and our experiments show that this synthesis of BP and MCMC outperforms both direct MCMC and bare BP schemes.« less
Gradient-free MCMC methods for dynamic causal modelling.

PubMed

Sengupta, Biswa; Friston, Karl J; Penny, Will D

2015-05-15

In this technical note we compare the performance of four gradient-free MCMC samplers (random walk Metropolis sampling, slice-sampling, adaptive MCMC sampling and population-based MCMC sampling with tempering) in terms of the number of independent samples they can produce per unit computational time. For the Bayesian inversion of a single-node neural mass model, both adaptive and population-based samplers are more efficient compared with random walk Metropolis sampler or slice-sampling; yet adaptive MCMC sampling is more promising in terms of compute time. Slice-sampling yields the highest number of independent samples from the target density - albeit at almost 1000% increase in computational time, in comparison to the most efficient algorithm (i.e., the adaptive MCMC sampler). Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
Modeling Nitrogen Dynamics in a Waste Stabilization Pond System Using Flexible Modeling Environment with MCMC

PubMed Central

Mukhtar, Hussnain; Lin, Yu-Pin; Shipin, Oleg V.; Petway, Joy R.

2017-01-01

This study presents an approach for obtaining realization sets of parameters for nitrogen removal in a pilot-scale waste stabilization pond (WSP) system. The proposed approach was designed for optimal parameterization, local sensitivity analysis, and global uncertainty analysis of a dynamic simulation model for the WSP by using the R software package Flexible Modeling Environment (R-FME) with the Markov chain Monte Carlo (MCMC) method. Additionally, generalized likelihood uncertainty estimation (GLUE) was integrated into the FME to evaluate the major parameters that affect the simulation outputs in the study WSP. Comprehensive modeling analysis was used to simulate and assess nine parameters and concentrations of ON-N, NH3-N and NO3-N. Results indicate that the integrated FME-GLUE-based model, with good Nash–Sutcliffe coefficients (0.53–0.69) and correlation coefficients (0.76–0.83), successfully simulates the concentrations of ON-N, NH3-N and NO3-N. Moreover, the Arrhenius constant was the only parameter sensitive to model performances of ON-N and NH3-N simulations. However, Nitrosomonas growth rate, the denitrification constant, and the maximum growth rate at 20 °C were sensitive to ON-N and NO3-N simulation, which was measured using global sensitivity. PMID:28704958
A computer program for uncertainty analysis integrating regression and Bayesian methods

USGS Publications Warehouse

Lu, Dan; Ye, Ming; Hill, Mary C.; Poeter, Eileen P.; Curtis, Gary

2014-01-01

This work develops a new functionality in UCODE_2014 to evaluate Bayesian credible intervals using the Markov Chain Monte Carlo (MCMC) method. The MCMC capability in UCODE_2014 is based on the FORTRAN version of the differential evolution adaptive Metropolis (DREAM) algorithm of Vrugt et al. (2009), which estimates the posterior probability density function of model parameters in high-dimensional and multimodal sampling problems. The UCODE MCMC capability provides eleven prior probability distributions and three ways to initialize the sampling process. It evaluates parametric and predictive uncertainties and it has parallel computing capability based on multiple chains to accelerate the sampling process. This paper tests and demonstrates the MCMC capability using a 10-dimensional multimodal mathematical function, a 100-dimensional Gaussian function, and a groundwater reactive transport model. The use of the MCMC capability is made straightforward and flexible by adopting the JUPITER API protocol. With the new MCMC capability, UCODE_2014 can be used to calculate three types of uncertainty intervals, which all can account for prior information: (1) linear confidence intervals which require linearity and Gaussian error assumptions and typically 10s–100s of highly parallelizable model runs after optimization, (2) nonlinear confidence intervals which require a smooth objective function surface and Gaussian observation error assumptions and typically 100s–1,000s of partially parallelizable model runs after optimization, and (3) MCMC Bayesian credible intervals which require few assumptions and commonly 10,000s–100,000s or more partially parallelizable model runs. Ready access allows users to select methods best suited to their work, and to compare methods in many circumstances.
Modeling the Hyperdistribution of Item Parameters To Improve the Accuracy of Recovery in Estimation Procedures.

ERIC Educational Resources Information Center

Matthews-Lopez, Joy L.; Hombo, Catherine M.

The purpose of this study was to examine the recovery of item parameters in simulated Automatic Item Generation (AIG) conditions, using Markov chain Monte Carlo (MCMC) estimation methods to attempt to recover the generating distributions. To do this, variability in item and ability parameters was manipulated. Realistic AIG conditions were…
Marginal Maximum A Posteriori Item Parameter Estimation for the Generalized Graded Unfolding Model

ERIC Educational Resources Information Center

Roberts, James S.; Thompson, Vanessa M.

2011-01-01

A marginal maximum a posteriori (MMAP) procedure was implemented to estimate item parameters in the generalized graded unfolding model (GGUM). Estimates from the MMAP method were compared with those derived from marginal maximum likelihood (MML) and Markov chain Monte Carlo (MCMC) procedures in a recovery simulation that varied sample size,…
Maximal compression of the redshift-space galaxy power spectrum and bispectrum

NASA Astrophysics Data System (ADS)

Gualdi, Davide; Manera, Marc; Joachimi, Benjamin; Lahav, Ofer

2018-05-01

We explore two methods of compressing the redshift-space galaxy power spectrum and bispectrum with respect to a chosen set of cosmological parameters. Both methods involve reducing the dimension of the original data vector (e.g. 1000 elements) to the number of cosmological parameters considered (e.g. seven ) using the Karhunen-Loève algorithm. In the first case, we run MCMC sampling on the compressed data vector in order to recover the 1D and 2D posterior distributions. The second option, approximately 2000 times faster, works by orthogonalizing the parameter space through diagonalization of the Fisher information matrix before the compression, obtaining the posterior distributions without the need of MCMC sampling. Using these methods for future spectroscopic redshift surveys like DESI, Euclid, and PFS would drastically reduce the number of simulations needed to compute accurate covariance matrices with minimal loss of constraining power. We consider a redshift bin of a DESI-like experiment. Using the power spectrum combined with the bispectrum as a data vector, both compression methods on average recover the 68 {per cent} credible regions to within 0.7 {per cent} and 2 {per cent} of those resulting from standard MCMC sampling, respectively. These confidence intervals are also smaller than the ones obtained using only the power spectrum by 81 per cent, 80 per cent, and 82 per cent respectively, for the bias parameter b1, the growth rate f, and the scalar amplitude parameter As.
Geometric MCMC for infinite-dimensional inverse problems

NASA Astrophysics Data System (ADS)

Beskos, Alexandros; Girolami, Mark; Lan, Shiwei; Farrell, Patrick E.; Stuart, Andrew M.

2017-04-01

Bayesian inverse problems often involve sampling posterior distributions on infinite-dimensional function spaces. Traditional Markov chain Monte Carlo (MCMC) algorithms are characterized by deteriorating mixing times upon mesh-refinement, when the finite-dimensional approximations become more accurate. Such methods are typically forced to reduce step-sizes as the discretization gets finer, and thus are expensive as a function of dimension. Recently, a new class of MCMC methods with mesh-independent convergence times has emerged. However, few of them take into account the geometry of the posterior informed by the data. At the same time, recently developed geometric MCMC algorithms have been found to be powerful in exploring complicated distributions that deviate significantly from elliptic Gaussian laws, but are in general computationally intractable for models defined in infinite dimensions. In this work, we combine geometric methods on a finite-dimensional subspace with mesh-independent infinite-dimensional approaches. Our objective is to speed up MCMC mixing times, without significantly increasing the computational cost per step (for instance, in comparison with the vanilla preconditioned Crank-Nicolson (pCN) method). This is achieved by using ideas from geometric MCMC to probe the complex structure of an intrinsic finite-dimensional subspace where most data information concentrates, while retaining robust mixing times as the dimension grows by using pCN-like methods in the complementary subspace. The resulting algorithms are demonstrated in the context of three challenging inverse problems arising in subsurface flow, heat conduction and incompressible flow control. The algorithms exhibit up to two orders of magnitude improvement in sampling efficiency when compared with the pCN method.
An example of complex modelling in dentistry using Markov chain Monte Carlo (MCMC) simulation.

PubMed

Helfenstein, Ulrich; Menghini, Giorgio; Steiner, Marcel; Murati, Francesca

2002-09-01

In the usual regression setting one regression line is computed for a whole data set. In a more complex situation, each person may be observed for example at several points in time and thus a regression line might be calculated for each person. Additional complexities, such as various forms of errors in covariables may make a straightforward statistical evaluation difficult or even impossible. During recent years methods have been developed allowing convenient analysis of problems where the data and the corresponding models show these and many other forms of complexity. The methodology makes use of a Bayesian approach and Markov chain Monte Carlo (MCMC) simulations. The methods allow the construction of increasingly elaborate models by building them up from local sub-models. The essential structure of the models can be represented visually by directed acyclic graphs (DAG). This attractive property allows communication and discussion of the essential structure and the substantial meaning of a complex model without needing algebra. After presentation of the statistical methods an example from dentistry is presented in order to demonstrate their application and use. The dataset of the example had a complex structure; each of a set of children was followed up over several years. The number of new fillings in permanent teeth had been recorded at several ages. The dependent variables were markedly different from the normal distribution and could not be transformed to normality. In addition, explanatory variables were assumed to be measured with different forms of error. Illustration of how the corresponding models can be estimated conveniently via MCMC simulation, in particular, 'Gibbs sampling', using the freely available software BUGS is presented. In addition, how the measurement error may influence the estimates of the corresponding coefficients is explored. It is demonstrated that the effect of the independent variable on the dependent variable may be markedly underestimated if the measurement error is not taken into account ('regression dilution bias'). Markov chain Monte Carlo methods may be of great value to dentists in allowing analysis of data sets which exhibit a wide range of different forms of complexity.
Of bugs and birds: Markov Chain Monte Carlo for hierarchical modeling in wildlife research

USGS Publications Warehouse

Link, W.A.; Cam, E.; Nichols, J.D.; Cooch, E.G.

2002-01-01

Markov chain Monte Carlo (MCMC) is a statistical innovation that allows researchers to fit far more complex models to data than is feasible using conventional methods. Despite its widespread use in a variety of scientific fields, MCMC appears to be underutilized in wildlife applications. This may be due to a misconception that MCMC requires the adoption of a subjective Bayesian analysis, or perhaps simply to its lack of familiarity among wildlife researchers. We introduce the basic ideas of MCMC and software BUGS (Bayesian inference using Gibbs sampling), stressing that a simple and satisfactory intuition for MCMC does not require extraordinary mathematical sophistication. We illustrate the use of MCMC with an analysis of the association between latent factors governing individual heterogeneity in breeding and survival rates of kittiwakes (Rissa tridactyla). We conclude with a discussion of the importance of individual heterogeneity for understanding population dynamics and designing management plans.
A comparison between Gauss-Newton and Markov chain Monte Carlo basedmethods for inverting spectral induced polarization data for Cole-Coleparameters

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Jinsong; Kemna, Andreas; Hubbard, Susan S.

2008-05-15

We develop a Bayesian model to invert spectral induced polarization (SIP) data for Cole-Cole parameters using Markov chain Monte Carlo (MCMC) sampling methods. We compare the performance of the MCMC based stochastic method with an iterative Gauss-Newton based deterministic method for Cole-Cole parameter estimation through inversion of synthetic and laboratory SIP data. The Gauss-Newton based method can provide an optimal solution for given objective functions under constraints, but the obtained optimal solution generally depends on the choice of initial values and the estimated uncertainty information is often inaccurate or insufficient. In contrast, the MCMC based inversion method provides extensive globalmore » information on unknown parameters, such as the marginal probability distribution functions, from which we can obtain better estimates and tighter uncertainty bounds of the parameters than with the deterministic method. Additionally, the results obtained with the MCMC method are independent of the choice of initial values. Because the MCMC based method does not explicitly offer single optimal solution for given objective functions, the deterministic and stochastic methods can complement each other. For example, the stochastic method can first be used to obtain the means of the unknown parameters by starting from an arbitrary set of initial values and the deterministic method can then be initiated using the means as starting values to obtain the optimal estimates of the Cole-Cole parameters.« less
Estimation Methods for One-Parameter Testlet Models

ERIC Educational Resources Information Center

Jiao, Hong; Wang, Shudong; He, Wei

2013-01-01

This study demonstrated the equivalence between the Rasch testlet model and the three-level one-parameter testlet model and explored the Markov Chain Monte Carlo (MCMC) method for model parameter estimation in WINBUGS. The estimation accuracy from the MCMC method was compared with those from the marginalized maximum likelihood estimation (MMLE)…
Searching for efficient Markov chain Monte Carlo proposal kernels

PubMed Central

Yang, Ziheng; Rodríguez, Carlos E.

2013-01-01

Markov chain Monte Carlo (MCMC) or the Metropolis–Hastings algorithm is a simulation algorithm that has made modern Bayesian statistical inference possible. Nevertheless, the efficiency of different Metropolis–Hastings proposal kernels has rarely been studied except for the Gaussian proposal. Here we propose a unique class of Bactrian kernels, which avoid proposing values that are very close to the current value, and compare their efficiency with a number of proposals for simulating different target distributions, with efficiency measured by the asymptotic variance of a parameter estimate. The uniform kernel is found to be more efficient than the Gaussian kernel, whereas the Bactrian kernel is even better. When optimal scales are used for both, the Bactrian kernel is at least 50% more efficient than the Gaussian. Implementation in a Bayesian program for molecular clock dating confirms the general applicability of our results to generic MCMC algorithms. Our results refute a previous claim that all proposals had nearly identical performance and will prompt further research into efficient MCMC proposals. PMID:24218600
Hierarchical multistage MCMC follow-up of continuous gravitational wave candidates

NASA Astrophysics Data System (ADS)

Ashton, G.; Prix, R.

2018-05-01

Leveraging Markov chain Monte Carlo optimization of the F statistic, we introduce a method for the hierarchical follow-up of continuous gravitational wave candidates identified by wide-parameter space semicoherent searches. We demonstrate parameter estimation for continuous wave sources and develop a framework and tools to understand and control the effective size of the parameter space, critical to the success of the method. Monte Carlo tests of simulated signals in noise demonstrate that this method is close to the theoretical optimal performance.
Modelling the spread of American foulbrood in honeybees

PubMed Central

Datta, Samik; Bull, James C.; Budge, Giles E.; Keeling, Matt J.

2013-01-01

We investigate the spread of American foulbrood (AFB), a disease caused by the bacterium Paenibacillus larvae, that affects bees and can be extremely damaging to beehives. Our dataset comes from an inspection period carried out during an AFB epidemic of honeybee colonies on the island of Jersey during the summer of 2010. The data include the number of hives of honeybees, location and owner of honeybee apiaries across the island. We use a spatial SIR model with an underlying owner network to simulate the epidemic and characterize the epidemic using a Markov chain Monte Carlo (MCMC) scheme to determine model parameters and infection times (including undetected ‘occult’ infections). Likely methods of infection spread can be inferred from the analysis, with both distance- and owner-based transmissions being found to contribute to the spread of AFB. The results of the MCMC are corroborated by simulating the epidemic using a stochastic SIR model, resulting in aggregate levels of infection that are comparable to the data. We use this stochastic SIR model to simulate the impact of different control strategies on controlling the epidemic. It is found that earlier inspections result in smaller epidemics and a higher likelihood of AFB extinction. PMID:24026473
Bayesian inference for dynamic transcriptional regulation; the Hes1 system as a case study.

PubMed

Heron, Elizabeth A; Finkenstädt, Bärbel; Rand, David A

2007-10-01

In this study, we address the problem of estimating the parameters of regulatory networks and provide the first application of Markov chain Monte Carlo (MCMC) methods to experimental data. As a case study, we consider a stochastic model of the Hes1 system expressed in terms of stochastic differential equations (SDEs) to which rigorous likelihood methods of inference can be applied. When fitting continuous-time stochastic models to discretely observed time series the lengths of the sampling intervals are important, and much of our study addresses the problem when the data are sparse. We estimate the parameters of an autoregulatory network providing results both for simulated and real experimental data from the Hes1 system. We develop an estimation algorithm using MCMC techniques which are flexible enough to allow for the imputation of latent data on a finer time scale and the presence of prior information about parameters which may be informed from other experiments as well as additional measurement error.
Smoothing spline ANOVA frailty model for recurrent event data.

PubMed

Du, Pang; Jiang, Yihua; Wang, Yuedong

2011-12-01

Gap time hazard estimation is of particular interest in recurrent event data. This article proposes a fully nonparametric approach for estimating the gap time hazard. Smoothing spline analysis of variance (ANOVA) decompositions are used to model the log gap time hazard as a joint function of gap time and covariates, and general frailty is introduced to account for between-subject heterogeneity and within-subject correlation. We estimate the nonparametric gap time hazard function and parameters in the frailty distribution using a combination of the Newton-Raphson procedure, the stochastic approximation algorithm (SAA), and the Markov chain Monte Carlo (MCMC) method. The convergence of the algorithm is guaranteed by decreasing the step size of parameter update and/or increasing the MCMC sample size along iterations. Model selection procedure is also developed to identify negligible components in a functional ANOVA decomposition of the log gap time hazard. We evaluate the proposed methods with simulation studies and illustrate its use through the analysis of bladder tumor data. © 2011, The International Biometric Society.
Protein structure refinement using a quantum mechanics-based chemical shielding predictor.

PubMed

Bratholm, Lars A; Jensen, Jan H

2017-03-01

The accurate prediction of protein chemical shifts using a quantum mechanics (QM)-based method has been the subject of intense research for more than 20 years but so far empirical methods for chemical shift prediction have proven more accurate. In this paper we show that a QM-based predictor of a protein backbone and CB chemical shifts (ProCS15, PeerJ , 2016, 3, e1344) is of comparable accuracy to empirical chemical shift predictors after chemical shift-based structural refinement that removes small structural errors. We present a method by which quantum chemistry based predictions of isotropic chemical shielding values (ProCS15) can be used to refine protein structures using Markov Chain Monte Carlo (MCMC) simulations, relating the chemical shielding values to the experimental chemical shifts probabilistically. Two kinds of MCMC structural refinement simulations were performed using force field geometry optimized X-ray structures as starting points: simulated annealing of the starting structure and constant temperature MCMC simulation followed by simulated annealing of a representative ensemble structure. Annealing of the CHARMM structure changes the CA-RMSD by an average of 0.4 Å but lowers the chemical shift RMSD by 1.0 and 0.7 ppm for CA and N. Conformational averaging has a relatively small effect (0.1-0.2 ppm) on the overall agreement with carbon chemical shifts but lowers the error for nitrogen chemical shifts by 0.4 ppm. If an amino acid specific offset is included the ProCS15 predicted chemical shifts have RMSD values relative to experiments that are comparable to popular empirical chemical shift predictors. The annealed representative ensemble structures differ in CA-RMSD relative to the initial structures by an average of 2.0 Å, with >2.0 Å difference for six proteins. In four of the cases, the largest structural differences arise in structurally flexible regions of the protein as determined by NMR, and in the remaining two cases, the large structural change may be due to force field deficiencies. The overall accuracy of the empirical methods are slightly improved by annealing the CHARMM structure with ProCS15, which may suggest that the minor structural changes introduced by ProCS15-based annealing improves the accuracy of the protein structures. Having established that QM-based chemical shift prediction can deliver the same accuracy as empirical shift predictors we hope this can help increase the accuracy of related approaches such as QM/MM or linear scaling approaches or interpreting protein structural dynamics from QM-derived chemical shift.
Uncertainty Analysis Based on Sparse Grid Collocation and Quasi-Monte Carlo Sampling with Application in Groundwater Modeling

NASA Astrophysics Data System (ADS)

Zhang, G.; Lu, D.; Ye, M.; Gunzburger, M.

2011-12-01

Markov Chain Monte Carlo (MCMC) methods have been widely used in many fields of uncertainty analysis to estimate the posterior distributions of parameters and credible intervals of predictions in the Bayesian framework. However, in practice, MCMC may be computationally unaffordable due to slow convergence and the excessive number of forward model executions required, especially when the forward model is expensive to compute. Both disadvantages arise from the curse of dimensionality, i.e., the posterior distribution is usually a multivariate function of parameters. Recently, sparse grid method has been demonstrated to be an effective technique for coping with high-dimensional interpolation or integration problems. Thus, in order to accelerate the forward model and avoid the slow convergence of MCMC, we propose a new method for uncertainty analysis based on sparse grid interpolation and quasi-Monte Carlo sampling. First, we construct a polynomial approximation of the forward model in the parameter space by using the sparse grid interpolation. This approximation then defines an accurate surrogate posterior distribution that can be evaluated repeatedly at minimal computational cost. Second, instead of using MCMC, a quasi-Monte Carlo method is applied to draw samples in the parameter space. Then, the desired probability density function of each prediction is approximated by accumulating the posterior density values of all the samples according to the prediction values. Our method has the following advantages: (1) the polynomial approximation of the forward model on the sparse grid provides a very efficient evaluation of the surrogate posterior distribution; (2) the quasi-Monte Carlo method retains the same accuracy in approximating the PDF of predictions but avoids all disadvantages of MCMC. The proposed method is applied to a controlled numerical experiment of groundwater flow modeling. The results show that our method attains the same accuracy much more efficiently than traditional MCMC.
Learning an Eddy Viscosity Model Using Shrinkage and Bayesian Calibration: A Jet-in-Crossflow Case Study

DOE PAGES

Ray, Jaideep; Lefantzi, Sophia; Arunajatesan, Srinivasan; ...

2017-09-07

In this paper, we demonstrate a statistical procedure for learning a high-order eddy viscosity model (EVM) from experimental data and using it to improve the predictive skill of a Reynolds-averaged Navier–Stokes (RANS) simulator. The method is tested in a three-dimensional (3D), transonic jet-in-crossflow (JIC) configuration. The process starts with a cubic eddy viscosity model (CEVM) developed for incompressible flows. It is fitted to limited experimental JIC data using shrinkage regression. The shrinkage process removes all the terms from the model, except an intercept, a linear term, and a quadratic one involving the square of the vorticity. The shrunk eddy viscositymore » model is implemented in an RANS simulator and calibrated, using vorticity measurements, to infer three parameters. The calibration is Bayesian and is solved using a Markov chain Monte Carlo (MCMC) method. A 3D probability density distribution for the inferred parameters is constructed, thus quantifying the uncertainty in the estimate. The phenomenal cost of using a 3D flow simulator inside an MCMC loop is mitigated by using surrogate models (“curve-fits”). A support vector machine classifier (SVMC) is used to impose our prior belief regarding parameter values, specifically to exclude nonphysical parameter combinations. The calibrated model is compared, in terms of its predictive skill, to simulations using uncalibrated linear and CEVMs. Finally, we find that the calibrated model, with one quadratic term, is more accurate than the uncalibrated simulator. The model is also checked at a flow condition at which the model was not calibrated.« less

Learning an Eddy Viscosity Model Using Shrinkage and Bayesian Calibration: A Jet-in-Crossflow Case Study

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ray, Jaideep; Lefantzi, Sophia; Arunajatesan, Srinivasan

In this paper, we demonstrate a statistical procedure for learning a high-order eddy viscosity model (EVM) from experimental data and using it to improve the predictive skill of a Reynolds-averaged Navier–Stokes (RANS) simulator. The method is tested in a three-dimensional (3D), transonic jet-in-crossflow (JIC) configuration. The process starts with a cubic eddy viscosity model (CEVM) developed for incompressible flows. It is fitted to limited experimental JIC data using shrinkage regression. The shrinkage process removes all the terms from the model, except an intercept, a linear term, and a quadratic one involving the square of the vorticity. The shrunk eddy viscositymore » model is implemented in an RANS simulator and calibrated, using vorticity measurements, to infer three parameters. The calibration is Bayesian and is solved using a Markov chain Monte Carlo (MCMC) method. A 3D probability density distribution for the inferred parameters is constructed, thus quantifying the uncertainty in the estimate. The phenomenal cost of using a 3D flow simulator inside an MCMC loop is mitigated by using surrogate models (“curve-fits”). A support vector machine classifier (SVMC) is used to impose our prior belief regarding parameter values, specifically to exclude nonphysical parameter combinations. The calibrated model is compared, in terms of its predictive skill, to simulations using uncalibrated linear and CEVMs. Finally, we find that the calibrated model, with one quadratic term, is more accurate than the uncalibrated simulator. The model is also checked at a flow condition at which the model was not calibrated.« less
Birth-death prior on phylogeny and speed dating

PubMed Central

2008-01-01

Background In recent years there has been a trend of leaving the strict molecular clock in order to infer dating of speciations and other evolutionary events. Explicit modeling of substitution rates and divergence times makes formulation of informative prior distributions for branch lengths possible. Models with birth-death priors on tree branching and auto-correlated or iid substitution rates among lineages have been proposed, enabling simultaneous inference of substitution rates and divergence times. This problem has, however, mainly been analysed in the Markov chain Monte Carlo (MCMC) framework, an approach requiring computation times of hours or days when applied to large phylogenies. Results We demonstrate that a hill-climbing maximum a posteriori (MAP) adaptation of the MCMC scheme results in considerable gain in computational efficiency. We demonstrate also that a novel dynamic programming (DP) algorithm for branch length factorization, useful both in the hill-climbing and in the MCMC setting, further reduces computation time. For the problem of inferring rates and times parameters on a fixed tree, we perform simulations, comparisons between hill-climbing and MCMC on a plant rbcL gene dataset, and dating analysis on an animal mtDNA dataset, showing that our methodology enables efficient, highly accurate analysis of very large trees. Datasets requiring a computation time of several days with MCMC can with our MAP algorithm be accurately analysed in less than a minute. From the results of our example analyses, we conclude that our methodology generally avoids getting trapped early in local optima. For the cases where this nevertheless can be a problem, for instance when we in addition to the parameters also infer the tree topology, we show that the problem can be evaded by using a simulated-annealing like (SAL) method in which we favour tree swaps early in the inference while biasing our focus towards rate and time parameter changes later on. Conclusion Our contribution leaves the field open for fast and accurate dating analysis of nucleotide sequence data. Modeling branch substitutions rates and divergence times separately allows us to include birth-death priors on the times without the assumption of a molecular clock. The methodology is easily adapted to take data from fossil records into account and it can be used together with a broad range of rate and substitution models. PMID:18318893
Birth-death prior on phylogeny and speed dating.

PubMed

Akerborg, Orjan; Sennblad, Bengt; Lagergren, Jens

2008-03-04

In recent years there has been a trend of leaving the strict molecular clock in order to infer dating of speciations and other evolutionary events. Explicit modeling of substitution rates and divergence times makes formulation of informative prior distributions for branch lengths possible. Models with birth-death priors on tree branching and auto-correlated or iid substitution rates among lineages have been proposed, enabling simultaneous inference of substitution rates and divergence times. This problem has, however, mainly been analysed in the Markov chain Monte Carlo (MCMC) framework, an approach requiring computation times of hours or days when applied to large phylogenies. We demonstrate that a hill-climbing maximum a posteriori (MAP) adaptation of the MCMC scheme results in considerable gain in computational efficiency. We demonstrate also that a novel dynamic programming (DP) algorithm for branch length factorization, useful both in the hill-climbing and in the MCMC setting, further reduces computation time. For the problem of inferring rates and times parameters on a fixed tree, we perform simulations, comparisons between hill-climbing and MCMC on a plant rbcL gene dataset, and dating analysis on an animal mtDNA dataset, showing that our methodology enables efficient, highly accurate analysis of very large trees. Datasets requiring a computation time of several days with MCMC can with our MAP algorithm be accurately analysed in less than a minute. From the results of our example analyses, we conclude that our methodology generally avoids getting trapped early in local optima. For the cases where this nevertheless can be a problem, for instance when we in addition to the parameters also infer the tree topology, we show that the problem can be evaded by using a simulated-annealing like (SAL) method in which we favour tree swaps early in the inference while biasing our focus towards rate and time parameter changes later on. Our contribution leaves the field open for fast and accurate dating analysis of nucleotide sequence data. Modeling branch substitutions rates and divergence times separately allows us to include birth-death priors on the times without the assumption of a molecular clock. The methodology is easily adapted to take data from fossil records into account and it can be used together with a broad range of rate and substitution models.
Reparametrization-based estimation of genetic parameters in multi-trait animal model using Integrated Nested Laplace Approximation.

PubMed

Mathew, Boby; Holand, Anna Marie; Koistinen, Petri; Léon, Jens; Sillanpää, Mikko J

2016-02-01

A novel reparametrization-based INLA approach as a fast alternative to MCMC for the Bayesian estimation of genetic parameters in multivariate animal model is presented. Multi-trait genetic parameter estimation is a relevant topic in animal and plant breeding programs because multi-trait analysis can take into account the genetic correlation between different traits and that significantly improves the accuracy of the genetic parameter estimates. Generally, multi-trait analysis is computationally demanding and requires initial estimates of genetic and residual correlations among the traits, while those are difficult to obtain. In this study, we illustrate how to reparametrize covariance matrices of a multivariate animal model/animal models using modified Cholesky decompositions. This reparametrization-based approach is used in the Integrated Nested Laplace Approximation (INLA) methodology to estimate genetic parameters of multivariate animal model. Immediate benefits are: (1) to avoid difficulties of finding good starting values for analysis which can be a problem, for example in Restricted Maximum Likelihood (REML); (2) Bayesian estimation of (co)variance components using INLA is faster to execute than using Markov Chain Monte Carlo (MCMC) especially when realized relationship matrices are dense. The slight drawback is that priors for covariance matrices are assigned for elements of the Cholesky factor but not directly to the covariance matrix elements as in MCMC. Additionally, we illustrate the concordance of the INLA results with the traditional methods like MCMC and REML approaches. We also present results obtained from simulated data sets with replicates and field data in rice.
Bayesian Inference on Malignant Breast Cancer in Nigeria: A Diagnosis of MCMC Convergence

PubMed Central

Ogunsakin, Ropo Ebenezer; Siaka, Lougue

2017-01-01

Background: There has been no previous study to classify malignant breast tumor in details based on Markov Chain Monte Carlo (MCMC) convergence in Western, Nigeria. This study therefore aims to profile patients living with benign and malignant breast tumor in two different hospitals among women of Western Nigeria, with a focus on prognostic factors and MCMC convergence. Materials and Methods: A hospital-based record was used to identify prognostic factors for malignant breast cancer among women of Western Nigeria. This paper describes Bayesian inference and demonstrates its usage to estimation of parameters of the logistic regression via Markov Chain Monte Carlo (MCMC) algorithm. The result of the Bayesian approach is compared with the classical statistics. Results: The mean age of the respondents was 42.2 ±16.6 years with 52% of the women aged between 35-49 years. The results of both techniques suggest that age and women with at least high school education have a significantly higher risk of being diagnosed with malignant breast tumors than benign breast tumors. The results also indicate a reduction of standard errors is associated with the coefficients obtained from the Bayesian approach. In addition, simulation result reveal that women with at least high school are 1.3 times more at risk of having malignant breast lesion in western Nigeria compared to benign breast lesion. Conclusion: We concluded that more efforts are required towards creating awareness and advocacy campaigns on how the prevalence of malignant breast lesions can be reduced, especially among women. The application of Bayesian produces precise estimates for modeling malignant breast cancer. PMID:29072396
A New Approach for Obtaining Cosmological Constraints from Type Ia Supernovae using Approximate Bayesian Computation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jennings, Elise; Wolf, Rachel; Sako, Masao

2016-11-09

Cosmological parameter estimation techniques that robustly account for systematic measurement uncertainties will be crucial for the next generation of cosmological surveys. We present a new analysis method, superABC, for obtaining cosmological constraints from Type Ia supernova (SN Ia) light curves using Approximate Bayesian Computation (ABC) without any likelihood assumptions. The ABC method works by using a forward model simulation of the data where systematic uncertainties can be simulated and marginalized over. A key feature of the method presented here is the use of two distinct metrics, the `Tripp' and `Light Curve' metrics, which allow us to compare the simulated data to the observed data set. The Tripp metric takes as input the parameters of models fit to each light curve with the SALT-II method, whereas the Light Curve metric uses the measured fluxes directly without model fitting. We apply the superABC sampler to a simulated data set ofmore » $$\\sim$$1000 SNe corresponding to the first season of the Dark Energy Survey Supernova Program. Varying $$\\Omega_m, w_0, \\alpha$$ and $$\\beta$$ and a magnitude offset parameter, with no systematics we obtain $$\\Delta(w_0) = w_0^{\\rm true} - w_0^{\\rm best \\, fit} = -0.036\\pm0.109$$ (a $$\\sim11$$% 1$$\\sigma$$ uncertainty) using the Tripp metric and $$\\Delta(w_0) = -0.055\\pm0.068$$ (a $$\\sim7$$% 1$$\\sigma$$ uncertainty) using the Light Curve metric. Including 1% calibration uncertainties in four passbands, adding 4 more parameters, we obtain $$\\Delta(w_0) = -0.062\\pm0.132$$ (a $$\\sim14$$% 1$$\\sigma$$ uncertainty) using the Tripp metric. Overall we find a $17$% increase in the uncertainty on $$w_0$$ with systematics compared to without. We contrast this with a MCMC approach where systematic effects are approximately included. We find that the MCMC method slightly underestimates the impact of calibration uncertainties for this simulated data set.« less
Hydrologic Process Parameterization of Electrical Resistivity Imaging of Solute Plumes Using POD McMC

NASA Astrophysics Data System (ADS)

Awatey, M. T.; Irving, J.; Oware, E. K.

2016-12-01

Markov chain Monte Carlo (McMC) inversion frameworks are becoming increasingly popular in geophysics due to their ability to recover multiple equally plausible geologic features that honor the limited noisy measurements. Standard McMC methods, however, become computationally intractable with increasing dimensionality of the problem, for example, when working with spatially distributed geophysical parameter fields. We present a McMC approach based on a sparse proper orthogonal decomposition (POD) model parameterization that implicitly incorporates the physics of the underlying process. First, we generate training images (TIs) via Monte Carlo simulations of the target process constrained to a conceptual model. We then apply POD to construct basis vectors from the TIs. A small number of basis vectors can represent most of the variability in the TIs, leading to dimensionality reduction. A projection of the starting model into the reduced basis space generates the starting POD coefficients. At each iteration, only coefficients within a specified sampling window are resimulated assuming a Gaussian prior. The sampling window grows at a specified rate as the number of iteration progresses starting from the coefficients corresponding to the highest ranked basis to those of the least informative basis. We found this gradual increment in the sampling window to be more stable compared to resampling all the coefficients right from the first iteration. We demonstrate the performance of the algorithm with both synthetic and lab-scale electrical resistivity imaging of saline tracer experiments, employing the same set of basis vectors for all inversions. We consider two scenarios of unimodal and bimodal plumes. The unimodal plume is consistent with the hypothesis underlying the generation of the TIs whereas bimodality in plume morphology was not theorized. We show that uncertainty quantification using McMC can proceed in the reduced dimensionality space while accounting for the physics of the underlying process.
Modeling and Bayesian parameter estimation for shape memory alloy bending actuators

NASA Astrophysics Data System (ADS)

Crews, John H.; Smith, Ralph C.

2012-04-01

In this paper, we employ a homogenized energy model (HEM) for shape memory alloy (SMA) bending actuators. Additionally, we utilize a Bayesian method for quantifying parameter uncertainty. The system consists of a SMA wire attached to a flexible beam. As the actuator is heated, the beam bends, providing endoscopic motion. The model parameters are fit to experimental data using an ordinary least-squares approach. The uncertainty in the fit model parameters is then quantified using Markov Chain Monte Carlo (MCMC) methods. The MCMC algorithm provides bounds on the parameters, which will ultimately be used in robust control algorithms. One purpose of the paper is to test the feasibility of the Random Walk Metropolis algorithm, the MCMC method used here.
An NCME Instructional Module on Estimating Item Response Theory Models Using Markov Chain Monte Carlo Methods

ERIC Educational Resources Information Center

Kim, Jee-Seon; Bolt, Daniel M.

2007-01-01

The purpose of this ITEMS module is to provide an introduction to Markov chain Monte Carlo (MCMC) estimation for item response models. A brief description of Bayesian inference is followed by an overview of the various facets of MCMC algorithms, including discussion of prior specification, sampling procedures, and methods for evaluating chain…
An investigation into exoplanet transits and uncertainties

NASA Astrophysics Data System (ADS)

Ji, Y.; Banks, T.; Budding, E.; Rhodes, M. D.

2017-06-01

A simple transit model is described along with tests of this model against published results for 4 exoplanet systems (Kepler-1, 2, 8, and 77). Data from the Kepler mission are used. The Markov Chain Monte Carlo (MCMC) method is applied to obtain realistic error estimates. Optimisation of limb darkening coefficients is subject to data quality. It is more likely for MCMC to derive an empirical limb darkening coefficient for light curves with S/N (signal to noise) above 15. Finally, the model is applied to Kepler data for 4 Kepler candidate systems (KOI 760.01, 767.01, 802.01, and 824.01) with previously unpublished results. Error estimates for these systems are obtained via the MCMC method.
Protein structure refinement using a quantum mechanics-based chemical shielding predictor† †Electronic supplementary information (ESI) available. See DOI: 10.1039/c6sc04344e Click here for additional data file.

PubMed Central

2017-01-01

The accurate prediction of protein chemical shifts using a quantum mechanics (QM)-based method has been the subject of intense research for more than 20 years but so far empirical methods for chemical shift prediction have proven more accurate. In this paper we show that a QM-based predictor of a protein backbone and CB chemical shifts (ProCS15, PeerJ, 2016, 3, e1344) is of comparable accuracy to empirical chemical shift predictors after chemical shift-based structural refinement that removes small structural errors. We present a method by which quantum chemistry based predictions of isotropic chemical shielding values (ProCS15) can be used to refine protein structures using Markov Chain Monte Carlo (MCMC) simulations, relating the chemical shielding values to the experimental chemical shifts probabilistically. Two kinds of MCMC structural refinement simulations were performed using force field geometry optimized X-ray structures as starting points: simulated annealing of the starting structure and constant temperature MCMC simulation followed by simulated annealing of a representative ensemble structure. Annealing of the CHARMM structure changes the CA-RMSD by an average of 0.4 Å but lowers the chemical shift RMSD by 1.0 and 0.7 ppm for CA and N. Conformational averaging has a relatively small effect (0.1–0.2 ppm) on the overall agreement with carbon chemical shifts but lowers the error for nitrogen chemical shifts by 0.4 ppm. If an amino acid specific offset is included the ProCS15 predicted chemical shifts have RMSD values relative to experiments that are comparable to popular empirical chemical shift predictors. The annealed representative ensemble structures differ in CA-RMSD relative to the initial structures by an average of 2.0 Å, with >2.0 Å difference for six proteins. In four of the cases, the largest structural differences arise in structurally flexible regions of the protein as determined by NMR, and in the remaining two cases, the large structural change may be due to force field deficiencies. The overall accuracy of the empirical methods are slightly improved by annealing the CHARMM structure with ProCS15, which may suggest that the minor structural changes introduced by ProCS15-based annealing improves the accuracy of the protein structures. Having established that QM-based chemical shift prediction can deliver the same accuracy as empirical shift predictors we hope this can help increase the accuracy of related approaches such as QM/MM or linear scaling approaches or interpreting protein structural dynamics from QM-derived chemical shift. PMID:28451325
Incorporating approximation error in surrogate based Bayesian inversion

NASA Astrophysics Data System (ADS)

Zhang, J.; Zeng, L.; Li, W.; Wu, L.

2015-12-01

There are increasing interests in applying surrogates for inverse Bayesian modeling to reduce repetitive evaluations of original model. In this way, the computational cost is expected to be saved. However, the approximation error of surrogate model is usually overlooked. This is partly because that it is difficult to evaluate the approximation error for many surrogates. Previous studies have shown that, the direct combination of surrogates and Bayesian methods (e.g., Markov Chain Monte Carlo, MCMC) may lead to biased estimations when the surrogate cannot emulate the highly nonlinear original system. This problem can be alleviated by implementing MCMC in a two-stage manner. However, the computational cost is still high since a relatively large number of original model simulations are required. In this study, we illustrate the importance of incorporating approximation error in inverse Bayesian modeling. Gaussian process (GP) is chosen to construct the surrogate for its convenience in approximation error evaluation. Numerical cases of Bayesian experimental design and parameter estimation for contaminant source identification are used to illustrate this idea. It is shown that, once the surrogate approximation error is well incorporated into Bayesian framework, promising results can be obtained even when the surrogate is directly used, and no further original model simulations are required.
Invited commentary: Lost in estimation--searching for alternatives to markov chains to fit complex Bayesian models.

PubMed

Molitor, John

2012-03-01

Bayesian methods have seen an increase in popularity in a wide variety of scientific fields, including epidemiology. One of the main reasons for their widespread application is the power of the Markov chain Monte Carlo (MCMC) techniques generally used to fit these models. As a result, researchers often implicitly associate Bayesian models with MCMC estimation procedures. However, Bayesian models do not always require Markov-chain-based methods for parameter estimation. This is important, as MCMC estimation methods, while generally quite powerful, are complex and computationally expensive and suffer from convergence problems related to the manner in which they generate correlated samples used to estimate probability distributions for parameters of interest. In this issue of the Journal, Cole et al. (Am J Epidemiol. 2012;175(5):368-375) present an interesting paper that discusses non-Markov-chain-based approaches to fitting Bayesian models. These methods, though limited, can overcome some of the problems associated with MCMC techniques and promise to provide simpler approaches to fitting Bayesian models. Applied researchers will find these estimation approaches intuitively appealing and will gain a deeper understanding of Bayesian models through their use. However, readers should be aware that other non-Markov-chain-based methods are currently in active development and have been widely published in other fields.
Exact Bayesian Inference for Phylogenetic Birth-Death Models.

PubMed

Parag, K V; Pybus, O G

2018-04-26

Inferring the rates of change of a population from a reconstructed phylogeny of genetic sequences is a central problem in macro-evolutionary biology, epidemiology, and many other disciplines. A popular solution involves estimating the parameters of a birth-death process (BDP), which links the shape of the phylogeny to its birth and death rates. Modern BDP estimators rely on random Markov chain Monte Carlo (MCMC) sampling to infer these rates. Such methods, while powerful and scalable, cannot be guaranteed to converge, leading to results that may be hard to replicate or difficult to validate. We present a conceptually and computationally different parametric BDP inference approach using flexible and easy to implement Snyder filter (SF) algorithms. This method is deterministic so its results are provable, guaranteed, and reproducible. We validate the SF on constant rate BDPs and find that it solves BDP likelihoods known to produce robust estimates. We then examine more complex BDPs with time-varying rates. Our estimates compare well with a recently developed parametric MCMC inference method. Lastly, we performmodel selection on an empirical Agamid species phylogeny, obtaining results consistent with the literature. The SF makes no approximations, beyond those required for parameter quantisation and numerical integration, and directly computes the posterior distribution of model parameters. It is a promising alternative inference algorithm that may serve either as a standalone Bayesian estimator or as a useful diagnostic reference for validating more involved MCMC strategies. The Snyder filter is implemented in Matlab and the time-varying BDP models are simulated in R. The source code and data are freely available at https://github.com/kpzoo/snyder-birth-death-code. kris.parag@zoo.ox.ac.uk. Supplementary material is available at Bioinformatics online.
Multi-objective calibration and uncertainty analysis of hydrologic models; A comparative study between formal and informal methods

NASA Astrophysics Data System (ADS)

Shafii, M.; Tolson, B.; Matott, L. S.

2012-04-01

Hydrologic modeling has benefited from significant developments over the past two decades. This has resulted in building of higher levels of complexity into hydrologic models, which eventually makes the model evaluation process (parameter estimation via calibration and uncertainty analysis) more challenging. In order to avoid unreasonable parameter estimates, many researchers have suggested implementation of multi-criteria calibration schemes. Furthermore, for predictive hydrologic models to be useful, proper consideration of uncertainty is essential. Consequently, recent research has emphasized comprehensive model assessment procedures in which multi-criteria parameter estimation is combined with statistically-based uncertainty analysis routines such as Bayesian inference using Markov Chain Monte Carlo (MCMC) sampling. Such a procedure relies on the use of formal likelihood functions based on statistical assumptions, and moreover, the Bayesian inference structured on MCMC samplers requires a considerably large number of simulations. Due to these issues, especially in complex non-linear hydrological models, a variety of alternative informal approaches have been proposed for uncertainty analysis in the multi-criteria context. This study aims at exploring a number of such informal uncertainty analysis techniques in multi-criteria calibration of hydrological models. The informal methods addressed in this study are (i) Pareto optimality which quantifies the parameter uncertainty using the Pareto solutions, (ii) DDS-AU which uses the weighted sum of objective functions to derive the prediction limits, and (iii) GLUE which describes the total uncertainty through identification of behavioral solutions. The main objective is to compare such methods with MCMC-based Bayesian inference with respect to factors such as computational burden, and predictive capacity, which are evaluated based on multiple comparative measures. The measures for comparison are calculated both for calibration and evaluation periods. The uncertainty analysis methodologies are applied to a simple 5-parameter rainfall-runoff model, called HYMOD.
Using SAS PROC MCMC for Item Response Theory Models

PubMed Central

Samonte, Kelli

2014-01-01

Interest in using Bayesian methods for estimating item response theory models has grown at a remarkable rate in recent years. This attentiveness to Bayesian estimation has also inspired a growth in available software such as WinBUGS, R packages, BMIRT, MPLUS, and SAS PROC MCMC. This article intends to provide an accessible overview of Bayesian methods in the context of item response theory to serve as a useful guide for practitioners in estimating and interpreting item response theory (IRT) models. Included is a description of the estimation procedure used by SAS PROC MCMC. Syntax is provided for estimation of both dichotomous and polytomous IRT models, as well as a discussion on how to extend the syntax to accommodate more complex IRT models. PMID:29795834
Bayesian-MCMC-based parameter estimation of stealth aircraft RCS models

NASA Astrophysics Data System (ADS)

Xia, Wei; Dai, Xiao-Xia; Feng, Yuan

2015-12-01

When modeling a stealth aircraft with low RCS (Radar Cross Section), conventional parameter estimation methods may cause a deviation from the actual distribution, owing to the fact that the characteristic parameters are estimated via directly calculating the statistics of RCS. The Bayesian-Markov Chain Monte Carlo (Bayesian-MCMC) method is introduced herein to estimate the parameters so as to improve the fitting accuracies of fluctuation models. The parameter estimations of the lognormal and the Legendre polynomial models are reformulated in the Bayesian framework. The MCMC algorithm is then adopted to calculate the parameter estimates. Numerical results show that the distribution curves obtained by the proposed method exhibit improved consistence with the actual ones, compared with those fitted by the conventional method. The fitting accuracy could be improved by no less than 25% for both fluctuation models, which implies that the Bayesian-MCMC method might be a good candidate among the optimal parameter estimation methods for stealth aircraft RCS models. Project supported by the National Natural Science Foundation of China (Grant No. 61101173), the National Basic Research Program of China (Grant No. 613206), the National High Technology Research and Development Program of China (Grant No. 2012AA01A308), the State Scholarship Fund by the China Scholarship Council (CSC), and the Oversea Academic Training Funds, and University of Electronic Science and Technology of China (UESTC).
Two-dimensional probabilistic inversion of plane-wave electromagnetic data: methodology, model constraints and joint inversion with electrical resistivity data

NASA Astrophysics Data System (ADS)

Rosas-Carbajal, Marina; Linde, Niklas; Kalscheuer, Thomas; Vrugt, Jasper A.

2014-03-01

Probabilistic inversion methods based on Markov chain Monte Carlo (MCMC) simulation are well suited to quantify parameter and model uncertainty of nonlinear inverse problems. Yet, application of such methods to CPU-intensive forward models can be a daunting task, particularly if the parameter space is high dimensional. Here, we present a 2-D pixel-based MCMC inversion of plane-wave electromagnetic (EM) data. Using synthetic data, we investigate how model parameter uncertainty depends on model structure constraints using different norms of the likelihood function and the model constraints, and study the added benefits of joint inversion of EM and electrical resistivity tomography (ERT) data. Our results demonstrate that model structure constraints are necessary to stabilize the MCMC inversion results of a highly discretized model. These constraints decrease model parameter uncertainty and facilitate model interpretation. A drawback is that these constraints may lead to posterior distributions that do not fully include the true underlying model, because some of its features exhibit a low sensitivity to the EM data, and hence are difficult to resolve. This problem can be partly mitigated if the plane-wave EM data is augmented with ERT observations. The hierarchical Bayesian inverse formulation introduced and used herein is able to successfully recover the probabilistic properties of the measurement data errors and a model regularization weight. Application of the proposed inversion methodology to field data from an aquifer demonstrates that the posterior mean model realization is very similar to that derived from a deterministic inversion with similar model constraints.
Model Reduction via Principe Component Analysis and Markov Chain Monte Carlo (MCMC) Methods

NASA Astrophysics Data System (ADS)

Gong, R.; Chen, J.; Hoversten, M. G.; Luo, J.

2011-12-01

Geophysical and hydrogeological inverse problems often include a large number of unknown parameters, ranging from hundreds to millions, depending on parameterization and problems undertaking. This makes inverse estimation and uncertainty quantification very challenging, especially for those problems in two- or three-dimensional spatial domains. Model reduction technique has the potential of mitigating the curse of dimensionality by reducing total numbers of unknowns while describing the complex subsurface systems adequately. In this study, we explore the use of principal component analysis (PCA) and Markov chain Monte Carlo (MCMC) sampling methods for model reduction through the use of synthetic datasets. We compare the performances of three different but closely related model reduction approaches: (1) PCA methods with geometric sampling (referred to as 'Method 1'), (2) PCA methods with MCMC sampling (referred to as 'Method 2'), and (3) PCA methods with MCMC sampling and inclusion of random effects (referred to as 'Method 3'). We consider a simple convolution model with five unknown parameters as our goal is to understand and visualize the advantages and disadvantages of each method by comparing their inversion results with the corresponding analytical solutions. We generated synthetic data with noise added and invert them under two different situations: (1) the noised data and the covariance matrix for PCA analysis are consistent (referred to as the unbiased case), and (2) the noise data and the covariance matrix are inconsistent (referred to as biased case). In the unbiased case, comparison between the analytical solutions and the inversion results show that all three methods provide good estimates of the true values and Method 1 is computationally more efficient. In terms of uncertainty quantification, Method 1 performs poorly because of relatively small number of samples obtained, Method 2 performs best, and Method 3 overestimates uncertainty due to inclusion of random effects. However, in the biased case, only Method 3 correctly estimates all the unknown parameters, and both Methods 1 and 2 provide wrong values for the biased parameters. The synthetic case study demonstrates that if the covariance matrix for PCA analysis is inconsistent with true models, the PCA methods with geometric or MCMC sampling will provide incorrect estimates.
Application of the Markov Chain Monte Carlo method for snow water equivalent retrieval based on passive microwave measurements

NASA Astrophysics Data System (ADS)

Pan, J.; Durand, M. T.; Vanderjagt, B. J.

2015-12-01

Markov Chain Monte Carlo (MCMC) method is a retrieval algorithm based on Bayes' rule, which starts from an initial state of snow/soil parameters, and updates it to a series of new states by comparing the posterior probability of simulated snow microwave signals before and after each time of random walk. It is a realization of the Bayes' rule, which gives an approximation to the probability of the snow/soil parameters in condition of the measured microwave TB signals at different bands. Although this method could solve all snow parameters including depth, density, snow grain size and temperature at the same time, it still needs prior information of these parameters for posterior probability calculation. How the priors will influence the SWE retrieval is a big concern. Therefore, in this paper at first, a sensitivity test will be carried out to study how accurate the snow emission models and how explicit the snow priors need to be to maintain the SWE error within certain amount. The synthetic TB simulated from the measured snow properties plus a 2-K observation error will be used for this purpose. It aims to provide a guidance on the MCMC application under different circumstances. Later, the method will be used for the snowpits at different sites, including Sodankyla, Finland, Churchill, Canada and Colorado, USA, using the measured TB from ground-based radiometers at different bands. Based on the previous work, the error in these practical cases will be studied, and the error sources will be separated and quantified.

Stochastic Simulation and Forecast of Hydrologic Time Series Based on Probabilistic Chaos Expansion

NASA Astrophysics Data System (ADS)

Li, Z.; Ghaith, M.

2017-12-01

Hydrological processes are characterized by many complex features, such as nonlinearity, dynamics and uncertainty. How to quantify and address such complexities and uncertainties has been a challenging task for water engineers and managers for decades. To support robust uncertainty analysis, an innovative approach for the stochastic simulation and forecast of hydrologic time series is developed is this study. Probabilistic Chaos Expansions (PCEs) are established through probabilistic collocation to tackle uncertainties associated with the parameters of traditional hydrological models. The uncertainties are quantified in model outputs as Hermite polynomials with regard to standard normal random variables. Sequentially, multivariate analysis techniques are used to analyze the complex nonlinear relationships between meteorological inputs (e.g., temperature, precipitation, evapotranspiration, etc.) and the coefficients of the Hermite polynomials. With the established relationships between model inputs and PCE coefficients, forecasts of hydrologic time series can be generated and the uncertainties in the future time series can be further tackled. The proposed approach is demonstrated using a case study in China and is compared to a traditional stochastic simulation technique, the Markov-Chain Monte-Carlo (MCMC) method. Results show that the proposed approach can serve as a reliable proxy to complicated hydrological models. It can provide probabilistic forecasting in a more computationally efficient manner, compared to the traditional MCMC method. This work provides technical support for addressing uncertainties associated with hydrological modeling and for enhancing the reliability of hydrological modeling results. Applications of the developed approach can be extended to many other complicated geophysical and environmental modeling systems to support the associated uncertainty quantification and risk analysis.
Using SAS PROC MCMC for Item Response Theory Models

ERIC Educational Resources Information Center

Ames, Allison J.; Samonte, Kelli

2015-01-01

Interest in using Bayesian methods for estimating item response theory models has grown at a remarkable rate in recent years. This attentiveness to Bayesian estimation has also inspired a growth in available software such as WinBUGS, R packages, BMIRT, MPLUS, and SAS PROC MCMC. This article intends to provide an accessible overview of Bayesian…
Copula-based assessment of the relationship between food peaks and flood volumes using information on historical floods by Bayesian Monte Carlo Markov Chain simulations

NASA Astrophysics Data System (ADS)

Gaál, Ladislav; Szolgay, Ján.; Bacigál, Tomáå.¡; Kohnová, Silvia

2010-05-01

Copula-based estimation methods of hydro-climatological extremes have increasingly been gaining attention of researchers and practitioners in the last couple of years. Unlike the traditional estimation methods which are based on bivariate cumulative distribution functions (CDFs), copulas are a relatively flexible tool of statistics that allow for modelling dependencies between two or more variables such as flood peaks and flood volumes without making strict assumptions on the marginal distributions. The dependence structure and the reliability of the joint estimates of hydro-climatological extremes, mainly in the right tail of the joint CDF not only depends on the particular copula adopted but also on the data available for the estimation of the marginal distributions of the individual variables. Generally, data samples for frequency modelling have limited temporal extent, which is a considerable drawback of frequency analyses in practice. Therefore, it is advised to deal with statistical methods that improve any part of the process of copula construction and result in more reliable design values of hydrological variables. The scarcity of the data sample mostly in the extreme tail of the joint CDF can be bypassed, e.g., by using a considerably larger amount of simulated data by rainfall-runoff analysis or by including historical information on the variables under study. The latter approach of data extension is used here to make the quantile estimates of the individual marginals of the copula more reliable. In the presented paper it is proposed to use historical information in the frequency analysis of the marginal distributions in the framework of Bayesian Monte Carlo Markov Chain (MCMC) simulations. Generally, a Bayesian approach allows for a straightforward combination of different sources of information on floods (e.g. flood data from systematic measurements and historical flood records, respectively) in terms of a product of the corresponding likelihood functions. On the other hand, the MCMC algorithm is a numerical approach for sampling from the likelihood distributions. The Bayesian MCMC methods therefore provide an attractive way to estimate the uncertainty in parameters and quantile metrics of frequency distributions. The applicability of the method is demonstrated in a case study of the hydroelectric power station Orlík on the Vltava River. This site has a key role in the flood prevention of Prague, the capital city of the Czech Republic. The record length of the available flood data is 126 years from the period 1877-2002, while the flood event observed in 2002 that caused extensive damages and numerous casualties is treated as a historic one. To estimate the joint probabilities of flood peaks and volumes, different copulas are fitted and their goodness-of-fit are evaluated by bootstrap simulations. Finally, selected quantiles of flood volumes conditioned on given flood peaks are derived and compared with those obtained by the traditional method used in the practice of water management specialists of the Vltava River.
Comparison of missing value imputation methods in time series: the case of Turkish meteorological data

NASA Astrophysics Data System (ADS)

Yozgatligil, Ceylan; Aslan, Sipan; Iyigun, Cem; Batmaz, Inci

2013-04-01

This study aims to compare several imputation methods to complete the missing values of spatio-temporal meteorological time series. To this end, six imputation methods are assessed with respect to various criteria including accuracy, robustness, precision, and efficiency for artificially created missing data in monthly total precipitation and mean temperature series obtained from the Turkish State Meteorological Service. Of these methods, simple arithmetic average, normal ratio (NR), and NR weighted with correlations comprise the simple ones, whereas multilayer perceptron type neural network and multiple imputation strategy adopted by Monte Carlo Markov Chain based on expectation-maximization (EM-MCMC) are computationally intensive ones. In addition, we propose a modification on the EM-MCMC method. Besides using a conventional accuracy measure based on squared errors, we also suggest the correlation dimension (CD) technique of nonlinear dynamic time series analysis which takes spatio-temporal dependencies into account for evaluating imputation performances. Depending on the detailed graphical and quantitative analysis, it can be said that although computational methods, particularly EM-MCMC method, are computationally inefficient, they seem favorable for imputation of meteorological time series with respect to different missingness periods considering both measures and both series studied. To conclude, using the EM-MCMC algorithm for imputing missing values before conducting any statistical analyses of meteorological data will definitely decrease the amount of uncertainty and give more robust results. Moreover, the CD measure can be suggested for the performance evaluation of missing data imputation particularly with computational methods since it gives more precise results in meteorological time series.
Comparing hierarchical models via the marginalized deviance information criterion.

PubMed

Quintero, Adrian; Lesaffre, Emmanuel

2018-07-20

Hierarchical models are extensively used in pharmacokinetics and longitudinal studies. When the estimation is performed from a Bayesian approach, model comparison is often based on the deviance information criterion (DIC). In hierarchical models with latent variables, there are several versions of this statistic: the conditional DIC (cDIC) that incorporates the latent variables in the focus of the analysis and the marginalized DIC (mDIC) that integrates them out. Regardless of the asymptotic and coherency difficulties of cDIC, this alternative is usually used in Markov chain Monte Carlo (MCMC) methods for hierarchical models because of practical convenience. The mDIC criterion is more appropriate in most cases but requires integration of the likelihood, which is computationally demanding and not implemented in Bayesian software. Therefore, we consider a method to compute mDIC by generating replicate samples of the latent variables that need to be integrated out. This alternative can be easily conducted from the MCMC output of Bayesian packages and is widely applicable to hierarchical models in general. Additionally, we propose some approximations in order to reduce the computational complexity for large-sample situations. The method is illustrated with simulated data sets and 2 medical studies, evidencing that cDIC may be misleading whilst mDIC appears pertinent. Copyright © 2018 John Wiley & Sons, Ltd.
Accounting for the measurement error of spectroscopically inferred soil carbon data for improved precision of spatial predictions.

PubMed

Somarathna, P D S N; Minasny, Budiman; Malone, Brendan P; Stockmann, Uta; McBratney, Alex B

2018-08-01

Spatial modelling of environmental data commonly only considers spatial variability as the single source of uncertainty. In reality however, the measurement errors should also be accounted for. In recent years, infrared spectroscopy has been shown to offer low cost, yet invaluable information needed for digital soil mapping at meaningful spatial scales for land management. However, spectrally inferred soil carbon data are known to be less accurate compared to laboratory analysed measurements. This study establishes a methodology to filter out the measurement error variability by incorporating the measurement error variance in the spatial covariance structure of the model. The study was carried out in the Lower Hunter Valley, New South Wales, Australia where a combination of laboratory measured, and vis-NIR and MIR inferred topsoil and subsoil soil carbon data are available. We investigated the applicability of residual maximum likelihood (REML) and Markov Chain Monte Carlo (MCMC) simulation methods to generate parameters of the Matérn covariance function directly from the data in the presence of measurement error. The results revealed that the measurement error can be effectively filtered-out through the proposed technique. When the measurement error was filtered from the data, the prediction variance almost halved, which ultimately yielded a greater certainty in spatial predictions of soil carbon. Further, the MCMC technique was successfully used to define the posterior distribution of measurement error. This is an important outcome, as the MCMC technique can be used to estimate the measurement error if it is not explicitly quantified. Although this study dealt with soil carbon data, this method is amenable for filtering the measurement error of any kind of continuous spatial environmental data. Copyright © 2018 Elsevier B.V. All rights reserved.
Lotic ecosystem response to chronic metal contamination assessed by the resazurin-resorufin smart tracer with data assimilation by the Markov chain Monte Carlo method

NASA Astrophysics Data System (ADS)

Stanaway, D. J.; Flores, A. N.; Haggerty, R.; Benner, S. G.; Feris, K. P.

2011-12-01

Concurrent assessment of biogeochemical and solute transport data (i.e. advection, dispersion, transient storage) within lotic systems remains a challenge in eco-hydrological research. Recently, the Resazurin-Resorufin Smart Tracer System (RRST) was proposed as a mechanism to measure microbial activity at the sediment-water interface [Haggerty et al., 2008, 2009] associating metabolic and hydrologic processes and allowing for the reach scale extrapolation of biotic function in the context of a dynamic physical environment. This study presents a Markov Chain Monte Carlo (MCMC) data assimilation technique to solve the inverse model of the Raz Rru Advection Dispersion Equation (RRADE). The RRADE is a suite of dependent 1-D reactive ADEs, associated through the microbially mediated reduction of Raz to Rru (k12). This reduction is proportional to DO consumption (R^2=0.928). MCMC is a suite of algorithms that solve Bayes theorem to condition uncertain model states and parameters on imperfect observations. Here, the RRST is employed to quantify the effect of chronic metal exposure on hyporheic microbial metabolism along a 100+ year old metal contamination gradient in the Clark Fork River (CF). We hypothesized that 1) the energetic cost of metal tolerance limits heterotrophic microbial respiration in communities evolved in chronic metal contaminated environments, with respiration inhibition directly correlated to degree of contamination (observational experiment) and 2) when experiencing acute metal stress, respiration rate inhibition of metal tolerant communities is less than that of naïve communities (manipulative experiment). To test these hypotheses, 4 replicate columns containing sediment collected from differently contaminated CF reaches and reference sites were fed a solution of RRST, NaCl, and cadmium (manipulative experiment only) within 24 hrs post collection. Column effluent was collected and measured for Raz, Rru, and EC to determine the Raz Rru breakthrough curves (BTC), subsequently modeled by the RRADE and thereby allowing derivation of in situ rates of metabolism. RRADE parameter values are estimated through Metropolis Hastings MCMC optimization. Unknown prior parameter distributions (PD) were constrained via a sensitivity analysis, except for the empirically estimated velocity. MCMC simulations were initiated at random points within the PD. Convergence of target distributions (TD) is achieved when the variance of the mode values of the six RRADE parameters in independent model replication is at least 10^{-3} less than the mode value. Convergence of k12, the parameter of interest, was more resolved, with modal variance of replicate simulations ranging from 10^{-4} less than the modal value to 0. The MCMC algorithm presented here offers a robust approach to solve the inverse RRST model and could be easily adapted to other inverse problems.
Teaching Markov Chain Monte Carlo: Revealing the Basic Ideas behind the Algorithm

ERIC Educational Resources Information Center

Stewart, Wayne; Stewart, Sepideh

2014-01-01

For many scientists, researchers and students Markov chain Monte Carlo (MCMC) simulation is an important and necessary tool to perform Bayesian analyses. The simulation is often presented as a mathematical algorithm and then translated into an appropriate computer program. However, this can result in overlooking the fundamental and deeper…
Enhancing Data Assimilation by Evolutionary Particle Filter and Markov Chain Monte Carlo

NASA Astrophysics Data System (ADS)

Moradkhani, H.; Abbaszadeh, P.; Yan, H.

2016-12-01

Particle Filters (PFs) have received increasing attention by the researchers from different disciplines in hydro-geosciences as an effective method to improve model predictions in nonlinear and non-Gaussian dynamical systems. The implication of dual state and parameter estimation by means of data assimilation in hydrology and geoscience has evolved since 2005 from SIR-PF to PF-MCMC and now to the most effective and robust framework through evolutionary PF approach based on Genetic Algorithm (GA) and Markov Chain Monte Carlo (MCMC), the so-called EPF-MCMC. In this framework, the posterior distribution undergoes an evolutionary process to update an ensemble of prior states that more closely resemble realistic posterior probability distribution. The premise of this approach is that the particles move to optimal position using the GA optimization coupled with MCMC increasing the number of effective particles, hence the particle degeneracy is avoided while the particle diversity is improved. The proposed algorithm is applied on a conceptual and highly nonlinear hydrologic model and the effectiveness, robustness and reliability of the method in jointly estimating the states and parameters and also reducing the uncertainty is demonstrated for few river basins across the United States.
Markov Chain Monte Carlo estimation of species distributions: a case study of the swift fox in western Kansas

USGS Publications Warehouse

Sargeant, Glen A.; Sovada, Marsha A.; Slivinski, Christiane C.; Johnson, Douglas H.

2005-01-01

Accurate maps of species distributions are essential tools for wildlife research and conservation. Unfortunately, biologists often are forced to rely on maps derived from observed occurrences recorded opportunistically during observation periods of variable length. Spurious inferences are likely to result because such maps are profoundly affected by the duration and intensity of observation and by methods used to delineate distributions, especially when detection is uncertain. We conducted a systematic survey of swift fox (Vulpes velox) distribution in western Kansas, USA, and used Markov chain Monte Carlo (MCMC) image restoration to rectify these problems. During 1997–1999, we searched 355 townships (ca. 93 km) 1–3 times each for an average cost of $7,315 per year and achieved a detection rate (probability of detecting swift foxes, if present, during a single search) of = 0.69 (95% Bayesian confidence interval [BCI] = [0.60, 0.77]). Our analysis produced an estimate of the underlying distribution, rather than a map of observed occurrences, that reflected the uncertainty associated with estimates of model parameters. To evaluate our results, we analyzed simulated data with similar properties. Results of our simulations suggest negligible bias and good precision when probabilities of detection on ≥1 survey occasions (cumulative probabilities of detection) exceed 0.65. Although the use of MCMC image restoration has been limited by theoretical and computational complexities, alternatives do not possess the same advantages. Image models accommodate uncertain detection, do not require spatially independent data or a census of map units, and can be used to estimate species distributions directly from observations without relying on habitat covariates or parameters that must be estimated subjectively. These features facilitate economical surveys of large regions, the detection of temporal trends in distribution, and assessments of landscape-level relations between species and habitats. Requirements for the use of MCMC image restoration include study areas that can be partitioned into regular grids of mapping units, spatially contagious species distributions, reliable methods for identifying target species, and cumulative probabilities of detection ≥0.65.
Markov chain Monte Carlo estimation of species distributions: A case study of the swift fox in western Kansas

USGS Publications Warehouse

Sargeant, G.A.; Sovada, M.A.; Slivinski, C.C.; Johnson, D.H.

2005-01-01

Accurate maps of species distributions are essential tools for wildlife research and conservation. Unfortunately, biologists often are forced to rely on maps derived from observed occurrences recorded opportunistically during observation periods of variable length. Spurious inferences are likely to result because such maps are profoundly affected by the duration and intensity of observation and by methods used to delineate distributions, especially when detection is uncertain. We conducted a systematic survey of swift fox (Vulpes velox) distribution in western Kansas, USA, and used Markov chain Monte Carlo (MCMC) image restoration to rectify these problems. During 1997-1999, we searched 355 townships (ca. 93 km2) 1-3 times each for an average cost of $7,315 per year and achieved a detection rate (probability of detecting swift foxes, if present, during a single search) of ?? = 0.69 (95% Bayesian confidence interval [BCI] = [0.60, 0.77]). Our analysis produced an estimate of the underlying distribution, rather than a map of observed occurrences, that reflected the uncertainty associated with estimates of model parameters. To evaluate our results, we analyzed simulated data with similar properties. Results of our simulations suggest negligible bias and good precision when probabilities of detection on ???1 survey occasions (cumulative probabilities of detection) exceed 0.65. Although the use of MCMC image restoration has been limited by theoretical and computational complexities, alternatives do not possess the same advantages. Image models accommodate uncertain detection, do not require spatially independent data or a census of map units, and can be used to estimate species distributions directly from observations without relying on habitat covariates or parameters that must be estimated subjectively. These features facilitate economical surveys of large regions, the detection of temporal trends in distribution, and assessments of landscape-level relations between species and habitats. Requirements for the use of MCMC image restoration include study areas that can be partitioned into regular grids of mapping units, spatially contagious species distributions, reliable methods for identifying target species, and cumulative probabilities of detection ???0.65.
Estimation of the four-wave mixing noise probability-density function by the multicanonical Monte Carlo method.

PubMed

Neokosmidis, Ioannis; Kamalakis, Thomas; Chipouras, Aristides; Sphicopoulos, Thomas

2005-01-01

The performance of high-powered wavelength-division multiplexed (WDM) optical networks can be severely degraded by four-wave-mixing- (FWM-) induced distortion. The multicanonical Monte Carlo method (MCMC) is used to calculate the probability-density function (PDF) of the decision variable of a receiver, limited by FWM noise. Compared with the conventional Monte Carlo method previously used to estimate this PDF, the MCMC method is much faster and can accurately estimate smaller error probabilities. The method takes into account the correlation between the components of the FWM noise, unlike the Gaussian model, which is shown not to provide accurate results.
Bayesian inversion using a geologically realistic and discrete model space

NASA Astrophysics Data System (ADS)

Jaeggli, C.; Julien, S.; Renard, P.

2017-12-01

Since the early days of groundwater modeling, inverse methods play a crucial role. Many research and engineering groups aim to infer extensive knowledge of aquifer parameters from a sparse set of observations. Despite decades of dedicated research on this topic, there are still several major issues to be solved. In the hydrogeological framework, one is often confronted with underground structures that present very sharp contrasts of geophysical properties. In particular, subsoil structures such as karst conduits, channels, faults, or lenses, strongly influence groundwater flow and transport behavior of the underground. For this reason it can be essential to identify their location and shape very precisely. Unfortunately, when inverse methods are specially trained to consider such complex features, their computation effort often becomes unaffordably high. The following work is an attempt to solve this dilemma. We present a new method that is, in some sense, a compromise between the ergodicity of Markov chain Monte Carlo (McMC) methods and the efficient handling of data by the ensemble based Kalmann filters. The realistic and complex random fields are generated by a Multiple-Point Statistics (MPS) tool. Nonetheless, it is applicable with any conditional geostatistical simulation tool. Furthermore, the algorithm is independent of any parametrization what becomes most important when two parametric systems are equivalent (permeability and resistivity, speed and slowness, etc.). When compared to two existing McMC schemes, the computational effort was divided by a factor of 12.
emcee: The MCMC Hammer

NASA Astrophysics Data System (ADS)

Foreman-Mackey, Daniel; Hogg, David W.; Lang, Dustin; Goodman, Jonathan

2013-03-01

We introduce a stable, well tested Python implementation of the affine-invariant ensemble sampler for Markov chain Monte Carlo (MCMC) proposed by Goodman & Weare (2010). The code is open source and has already been used in several published projects in the astrophysics literature. The algorithm behind emcee has several advantages over traditional MCMC sampling methods and it has excellent performance as measured by the autocorrelation time (or function calls per independent sample). One major advantage of the algorithm is that it requires hand-tuning of only 1 or 2 parameters compared to ˜N2 for a traditional algorithm in an N-dimensional parameter space. In this document, we describe the algorithm and the details of our implementation. Exploiting the parallelism of the ensemble method, emcee permits any user to take advantage of multiple CPU cores without extra effort. The code is available online at http://dan.iel.fm/emcee under the GNU General Public License v2.
The use of simple reparameterizations to improve the efficiency of Markov chain Monte Carlo estimation for multilevel models with applications to discrete time survival models.

PubMed

Browne, William J; Steele, Fiona; Golalizadeh, Mousa; Green, Martin J

2009-06-01

We consider the application of Markov chain Monte Carlo (MCMC) estimation methods to random-effects models and in particular the family of discrete time survival models. Survival models can be used in many situations in the medical and social sciences and we illustrate their use through two examples that differ in terms of both substantive area and data structure. A multilevel discrete time survival analysis involves expanding the data set so that the model can be cast as a standard multilevel binary response model. For such models it has been shown that MCMC methods have advantages in terms of reducing estimate bias. However, the data expansion results in very large data sets for which MCMC estimation is often slow and can produce chains that exhibit poor mixing. Any way of improving the mixing will result in both speeding up the methods and more confidence in the estimates that are produced. The MCMC methodological literature is full of alternative algorithms designed to improve mixing of chains and we describe three reparameterization techniques that are easy to implement in available software. We consider two examples of multilevel survival analysis: incidence of mastitis in dairy cattle and contraceptive use dynamics in Indonesia. For each application we show where the reparameterization techniques can be used and assess their performance.
Hierarchical models and the analysis of bird survey information

USGS Publications Warehouse

Sauer, J.R.; Link, W.A.

2003-01-01

Management of birds often requires analysis of collections of estimates. We describe a hierarchical modeling approach to the analysis of these data, in which parameters associated with the individual species estimates are treated as random variables, and probability statements are made about the species parameters conditioned on the data. A Markov-Chain Monte Carlo (MCMC) procedure is used to fit the hierarchical model. This approach is computer intensive, and is based upon simulation. MCMC allows for estimation both of parameters and of derived statistics. To illustrate the application of this method, we use the case in which we are interested in attributes of a collection of estimates of population change. Using data for 28 species of grassland-breeding birds from the North American Breeding Bird Survey, we estimate the number of species with increasing populations, provide precision-adjusted rankings of species trends, and describe a measure of population stability as the probability that the trend for a species is within a certain interval. Hierarchical models can be applied to a variety of bird survey applications, and we are investigating their use in estimation of population change from survey data.
HMO selection and Medicare costs: Bayesian MCMC estimation of a robust panel data tobit model with survival.

PubMed

Hamilton, B H

1999-08-01

The fraction of US Medicare recipients enrolled in health maintenance organizations (HMOs) has increased substantially over the past 10 years. However, the impact of HMOs on health care costs is still hotly debated. In particular, it is argued that HMOs achieve cost reduction through 'cream-skimming' and enrolling relatively healthy patients. This paper develops a Bayesian panel data tobit model of HMO selection and Medicare expenditures for recent US retirees that accounts for mortality over the course of the panel. The model is estimated using Markov Chain Monte Carlo (MCMC) simulation methods, and is novel in that a multivariate t-link is used in place of normality to allow for the heavy-tailed distributions often found in health care expenditure data. The findings indicate that HMOs select individuals who are less likely to have positive health care expenditures prior to enrollment. However, there is no evidence that HMOs disenrol high cost patients. The results also indicate the importance of accounting for survival over the panel, since high mortality probabilities are associated with higher health care expenditures in the last year of life.
Bayesian inference based on dual generalized order statistics from the exponentiated Weibull model

NASA Astrophysics Data System (ADS)

Al Sobhi, Mashail M.

2015-02-01

Bayesian estimation for the two parameters and the reliability function of the exponentiated Weibull model are obtained based on dual generalized order statistics (DGOS). Also, Bayesian prediction bounds for future DGOS from exponentiated Weibull model are obtained. The symmetric and asymmetric loss functions are considered for Bayesian computations. The Markov chain Monte Carlo (MCMC) methods are used for computing the Bayes estimates and prediction bounds. The results have been specialized to the lower record values. Comparisons are made between Bayesian and maximum likelihood estimators via Monte Carlo simulation.
Mining geographic variations of Plasmodium vivax for active surveillance: a case study in China.

PubMed

Shi, Benyun; Tan, Qi; Zhou, Xiao-Nong; Liu, Jiming

2015-05-27

Geographic variations of an infectious disease characterize the spatial differentiation of disease incidences caused by various impact factors, such as environmental, demographic, and socioeconomic factors. Some factors may directly determine the force of infection of the disease (namely, explicit factors), while many other factors may indirectly affect the number of disease incidences via certain unmeasurable processes (namely, implicit factors). In this study, the impact of heterogeneous factors on geographic variations of Plasmodium vivax incidences is systematically investigate in Tengchong, Yunnan province, China. A space-time model that resembles a P. vivax transmission model and a hidden time-dependent process, is presented by taking into consideration both explicit and implicit factors. Specifically, the transmission model is built upon relevant demographic, environmental, and biophysical factors to describe the local infections of P. vivax. While the hidden time-dependent process is assessed by several socioeconomic factors to account for the imported cases of P. vivax. To quantitatively assess the impact of heterogeneous factors on geographic variations of P. vivax infections, a Markov chain Monte Carlo (MCMC) simulation method is developed to estimate the model parameters by fitting the space-time model to the reported spatial-temporal disease incidences. Since there is no ground-truth information available, the performance of the MCMC method is first evaluated against a synthetic dataset. The results show that the model parameters can be well estimated using the proposed MCMC method. Then, the proposed model is applied to investigate the geographic variations of P. vivax incidences among all 18 towns in Tengchong, Yunnan province, China. Based on the geographic variations, the 18 towns can be further classify into five groups with similar socioeconomic causality for P. vivax incidences. Although this study focuses mainly on the transmission of P. vivax, the proposed space-time model is general and can readily be extended to investigate geographic variations of other diseases. Practically, such a computational model will offer new insights into active surveillance and strategic planning for disease surveillance and control.
Markov Chain Monte Carlo Estimation of Item Parameters for the Generalized Graded Unfolding Model

ERIC Educational Resources Information Center

de la Torre, Jimmy; Stark, Stephen; Chernyshenko, Oleksandr S.

2006-01-01

The authors present a Markov Chain Monte Carlo (MCMC) parameter estimation procedure for the generalized graded unfolding model (GGUM) and compare it to the marginal maximum likelihood (MML) approach implemented in the GGUM2000 computer program, using simulated and real personality data. In the simulation study, test length, number of response…

Efficient estimation of ideal-observer performance in classification tasks involving high-dimensional complex backgrounds

PubMed Central

Park, Subok; Clarkson, Eric

2010-01-01

The Bayesian ideal observer is optimal among all observers and sets an absolute upper bound for the performance of any observer in classification tasks [Van Trees, Detection, Estimation, and Modulation Theory, Part I (Academic, 1968).]. Therefore, the ideal observer should be used for objective image quality assessment whenever possible. However, computation of ideal-observer performance is difficult in practice because this observer requires the full description of unknown, statistical properties of high-dimensional, complex data arising in real life problems. Previously, Markov-chain Monte Carlo (MCMC) methods were developed by Kupinski et al. [J. Opt. Soc. Am. A 20, 430(2003) ] and by Park et al. [J. Opt. Soc. Am. A 24, B136 (2007) and IEEE Trans. Med. Imaging 28, 657 (2009) ] to estimate the performance of the ideal observer and the channelized ideal observer (CIO), respectively, in classification tasks involving non-Gaussian random backgrounds. However, both algorithms had the disadvantage of long computation times. We propose a fast MCMC for real-time estimation of the likelihood ratio for the CIO. Our simulation results show that our method has the potential to speed up ideal-observer performance in tasks involving complex data when efficient channels are used for the CIO. PMID:19884916
A sampling algorithm for segregation analysis

PubMed Central

Tier, Bruce; Henshall, John

2001-01-01

Methods for detecting Quantitative Trait Loci (QTL) without markers have generally used iterative peeling algorithms for determining genotype probabilities. These algorithms have considerable shortcomings in complex pedigrees. A Monte Carlo Markov chain (MCMC) method which samples the pedigree of the whole population jointly is described. Simultaneous sampling of the pedigree was achieved by sampling descent graphs using the Metropolis-Hastings algorithm. A descent graph describes the inheritance state of each allele and provides pedigrees guaranteed to be consistent with Mendelian sampling. Sampling descent graphs overcomes most, if not all, of the limitations incurred by iterative peeling algorithms. The algorithm was able to find the QTL in most of the simulated populations. However, when the QTL was not modeled or found then its effect was ascribed to the polygenic component. No QTL were detected when they were not simulated. PMID:11742631
Genomic prediction using an iterative conditional expectation algorithm for a fast BayesC-like model.

PubMed

Dong, Linsong; Wang, Zhiyong

2018-06-11

Genomic prediction is feasible for estimating genomic breeding values because of dense genome-wide markers and credible statistical methods, such as Genomic Best Linear Unbiased Prediction (GBLUP) and various Bayesian methods. Compared with GBLUP, Bayesian methods propose more flexible assumptions for the distributions of SNP effects. However, most Bayesian methods are performed based on Markov chain Monte Carlo (MCMC) algorithms, leading to computational efficiency challenges. Hence, some fast Bayesian approaches, such as fast BayesB (fBayesB), were proposed to speed up the calculation. This study proposed another fast Bayesian method termed fast BayesC (fBayesC). The prior distribution of fBayesC assumes that a SNP with probability γ has a non-zero effect which comes from a normal density with a common variance. The simulated data from QTLMAS XII workshop and actual data on large yellow croaker were used to compare the predictive results of fBayesB, fBayesC and (MCMC-based) BayesC. The results showed that when γ was set as a small value, such as 0.01 in the simulated data or 0.001 in the actual data, fBayesB and fBayesC yielded lower prediction accuracies (abilities) than BayesC. In the actual data, fBayesC could yield very similar predictive abilities as BayesC when γ ≥ 0.01. When γ = 0.01, fBayesB could also yield similar results as fBayesC and BayesC. However, fBayesB could not yield an explicit result when γ ≥ 0.1, but a similar situation was not observed for fBayesC. Moreover, the computational speed of fBayesC was significantly faster than that of BayesC, making fBayesC a promising method for genomic prediction.
Bayesian seismic tomography by parallel interacting Markov chains

NASA Astrophysics Data System (ADS)

Gesret, Alexandrine; Bottero, Alexis; Romary, Thomas; Noble, Mark; Desassis, Nicolas

2014-05-01

The velocity field estimated by first arrival traveltime tomography is commonly used as a starting point for further seismological, mineralogical, tectonic or similar analysis. In order to interpret quantitatively the results, the tomography uncertainty values as well as their spatial distribution are required. The estimated velocity model is obtained through inverse modeling by minimizing an objective function that compares observed and computed traveltimes. This step is often performed by gradient-based optimization algorithms. The major drawback of such local optimization schemes, beyond the possibility of being trapped in a local minimum, is that they do not account for the multiple possible solutions of the inverse problem. They are therefore unable to assess the uncertainties linked to the solution. Within a Bayesian (probabilistic) framework, solving the tomography inverse problem aims at estimating the posterior probability density function of velocity model using a global sampling algorithm. Markov chains Monte-Carlo (MCMC) methods are known to produce samples of virtually any distribution. In such a Bayesian inversion, the total number of simulations we can afford is highly related to the computational cost of the forward model. Although fast algorithms have been recently developed for computing first arrival traveltimes of seismic waves, the complete browsing of the posterior distribution of velocity model is hardly performed, especially when it is high dimensional and/or multimodal. In the latter case, the chain may even stay stuck in one of the modes. In order to improve the mixing properties of classical single MCMC, we propose to make interact several Markov chains at different temperatures. This method can make efficient use of large CPU clusters, without increasing the global computational cost with respect to classical MCMC and is therefore particularly suited for Bayesian inversion. The exchanges between the chains allow a precise sampling of the high probability zones of the model space while avoiding the chains to end stuck in a probability maximum. This approach supplies thus a robust way to analyze the tomography imaging uncertainties. The interacting MCMC approach is illustrated on two synthetic examples of tomography of calibration shots such as encountered in induced microseismic studies. On the second application, a wavelet based model parameterization is presented that allows to significantly reduce the dimension of the problem, making thus the algorithm efficient even for a complex velocity model.
Quantifying MCMC exploration of phylogenetic tree space.

PubMed

Whidden, Chris; Matsen, Frederick A

2015-05-01

In order to gain an understanding of the effectiveness of phylogenetic Markov chain Monte Carlo (MCMC), it is important to understand how quickly the empirical distribution of the MCMC converges to the posterior distribution. In this article, we investigate this problem on phylogenetic tree topologies with a metric that is especially well suited to the task: the subtree prune-and-regraft (SPR) metric. This metric directly corresponds to the minimum number of MCMC rearrangements required to move between trees in common phylogenetic MCMC implementations. We develop a novel graph-based approach to analyze tree posteriors and find that the SPR metric is much more informative than simpler metrics that are unrelated to MCMC moves. In doing so, we show conclusively that topological peaks do occur in Bayesian phylogenetic posteriors from real data sets as sampled with standard MCMC approaches, investigate the efficiency of Metropolis-coupled MCMC (MCMCMC) in traversing the valleys between peaks, and show that conditional clade distribution (CCD) can have systematic problems when there are multiple peaks. © The Author(s) 2015. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.
Efficient Bayesian parameter estimation with implicit sampling and surrogate modeling for a vadose zone hydrological problem

NASA Astrophysics Data System (ADS)

Liu, Y.; Pau, G. S. H.; Finsterle, S.

2015-12-01

Parameter inversion involves inferring the model parameter values based on sparse observations of some observables. To infer the posterior probability distributions of the parameters, Markov chain Monte Carlo (MCMC) methods are typically used. However, the large number of forward simulations needed and limited computational resources limit the complexity of the hydrological model we can use in these methods. In view of this, we studied the implicit sampling (IS) method, an efficient importance sampling technique that generates samples in the high-probability region of the posterior distribution and thus reduces the number of forward simulations that we need to run. For a pilot-point inversion of a heterogeneous permeability field based on a synthetic ponded infiltration experiment simulated with TOUGH2 (a subsurface modeling code), we showed that IS with linear map provides an accurate Bayesian description of the parameterized permeability field at the pilot points with just approximately 500 forward simulations. We further studied the use of surrogate models to improve the computational efficiency of parameter inversion. We implemented two reduced-order models (ROMs) for the TOUGH2 forward model. One is based on polynomial chaos expansion (PCE), of which the coefficients are obtained using the sparse Bayesian learning technique to mitigate the "curse of dimensionality" of the PCE terms. The other model is Gaussian process regression (GPR) for which different covariance, likelihood and inference models are considered. Preliminary results indicate that ROMs constructed based on the prior parameter space perform poorly. It is thus impractical to replace this hydrological model by a ROM directly in a MCMC method. However, the IS method can work with a ROM constructed for parameters in the close vicinity of the maximum a posteriori probability (MAP) estimate. We will discuss the accuracy and computational efficiency of using ROMs in the implicit sampling procedure for the hydrological problem considered. This work was supported, in part, by the U.S. Dept. of Energy under Contract No. DE-AC02-05CH11231
Variable selection models for genomic selection using whole-genome sequence data and singular value decomposition.

PubMed

Meuwissen, Theo H E; Indahl, Ulf G; Ødegård, Jørgen

2017-12-27

Non-linear Bayesian genomic prediction models such as BayesA/B/C/R involve iteration and mostly Markov chain Monte Carlo (MCMC) algorithms, which are computationally expensive, especially when whole-genome sequence (WGS) data are analyzed. Singular value decomposition (SVD) of the genotype matrix can facilitate genomic prediction in large datasets, and can be used to estimate marker effects and their prediction error variances (PEV) in a computationally efficient manner. Here, we developed, implemented, and evaluated a direct, non-iterative method for the estimation of marker effects for the BayesC genomic prediction model. The BayesC model assumes a priori that markers have normally distributed effects with probability [Formula: see text] and no effect with probability (1 - [Formula: see text]). Marker effects and their PEV are estimated by using SVD and the posterior probability of the marker having a non-zero effect is calculated. These posterior probabilities are used to obtain marker-specific effect variances, which are subsequently used to approximate BayesC estimates of marker effects in a linear model. A computer simulation study was conducted to compare alternative genomic prediction methods, where a single reference generation was used to estimate marker effects, which were subsequently used for 10 generations of forward prediction, for which accuracies were evaluated. SVD-based posterior probabilities of markers having non-zero effects were generally lower than MCMC-based posterior probabilities, but for some regions the opposite occurred, resulting in clear signals for QTL-rich regions. The accuracies of breeding values estimated using SVD- and MCMC-based BayesC analyses were similar across the 10 generations of forward prediction. For an intermediate number of generations (2 to 5) of forward prediction, accuracies obtained with the BayesC model tended to be slightly higher than accuracies obtained using the best linear unbiased prediction of SNP effects (SNP-BLUP model). When reducing marker density from WGS data to 30 K, SNP-BLUP tended to yield the highest accuracies, at least in the short term. Based on SVD of the genotype matrix, we developed a direct method for the calculation of BayesC estimates of marker effects. Although SVD- and MCMC-based marker effects differed slightly, their prediction accuracies were similar. Assuming that the SVD of the marker genotype matrix is already performed for other reasons (e.g. for SNP-BLUP), computation times for the BayesC predictions were comparable to those of SNP-BLUP.
Itô-SDE MCMC method for Bayesian characterization of errors associated with data limitations in stochastic expansion methods for uncertainty quantification

NASA Astrophysics Data System (ADS)

Arnst, M.; Abello Álvarez, B.; Ponthot, J.-P.; Boman, R.

2017-11-01

This paper is concerned with the characterization and the propagation of errors associated with data limitations in polynomial-chaos-based stochastic methods for uncertainty quantification. Such an issue can arise in uncertainty quantification when only a limited amount of data is available. When the available information does not suffice to accurately determine the probability distributions that must be assigned to the uncertain variables, the Bayesian method for assigning these probability distributions becomes attractive because it allows the stochastic model to account explicitly for insufficiency of the available information. In previous work, such applications of the Bayesian method had already been implemented by using the Metropolis-Hastings and Gibbs Markov Chain Monte Carlo (MCMC) methods. In this paper, we present an alternative implementation, which uses an alternative MCMC method built around an Itô stochastic differential equation (SDE) that is ergodic for the Bayesian posterior. We draw together from the mathematics literature a number of formal properties of this Itô SDE that lend support to its use in the implementation of the Bayesian method, and we describe its discretization, including the choice of the free parameters, by using the implicit Euler method. We demonstrate the proposed methodology on a problem of uncertainty quantification in a complex nonlinear engineering application relevant to metal forming.
Do race, ethnicity, and psychiatric diagnoses matter in the prevalence of multiple chronic medical conditions?

PubMed

Cabassa, Leopoldo J; Humensky, Jennifer; Druss, Benjamin; Lewis-Fernández, Roberto; Gomes, Arminda P; Wang, Shuai; Blanco, Carlos

2013-06-01

The proportion of people in the United States with multiple chronic medical conditions (MCMC) is increasing. Yet, little is known about the relationship that race, ethnicity, and psychiatric disorders have on the prevalence of MCMCs in the general population. This study used data from wave 2 of the National Epidemiologic Survey on Alcohol and Related Conditions (N=33,107). Multinomial logistic regression models adjusting for sociodemographic variables, body mass index, and quality of life were used to examine differences in the 12-month prevalence of MCMC by race/ethnicity, psychiatric diagnosis, and the interactions between race/ethnicity and psychiatric diagnosis. Compared to non-Hispanic Whites, Hispanics reported lower odds of MCMC and African Americans reported higher odds of MCMC after adjusting for covariates. People with psychiatric disorders reported higher odds of MCMC compared with people without psychiatric disorders. There were significant interactions between race and psychiatric diagnosis associated with rates of MCMC. In the presence of certain psychiatric disorders, the odds of MCMC were higher among African Americans with psychiatric disorders compared to non-Hispanic Whites with similar psychiatric disorders. Our study results indicate that race, ethnicity, and psychiatric disorders are associated with the prevalence of MCMC. As the rates of MCMC rise, it is critical to identify which populations are at increased risk and how to best direct services to address their health care needs.
Bayesian inference on EMRI signals using low frequency approximations

NASA Astrophysics Data System (ADS)

Ali, Asad; Christensen, Nelson; Meyer, Renate; Röver, Christian

2012-07-01

Extreme mass ratio inspirals (EMRIs) are thought to be one of the most exciting gravitational wave sources to be detected with LISA. Due to their complicated nature and weak amplitudes the detection and parameter estimation of such sources is a challenging task. In this paper we present a statistical methodology based on Bayesian inference in which the estimation of parameters is carried out by advanced Markov chain Monte Carlo (MCMC) algorithms such as parallel tempering MCMC. We analysed high and medium mass EMRI systems that fall well inside the low frequency range of LISA. In the context of the Mock LISA Data Challenges, our investigation and results are also the first instance in which a fully Markovian algorithm is applied for EMRI searches. Results show that our algorithm worked well in recovering EMRI signals from different (simulated) LISA data sets having single and multiple EMRI sources and holds great promise for posterior computation under more realistic conditions. The search and estimation methods presented in this paper are general in their nature, and can be applied in any other scenario such as AdLIGO, AdVIRGO and Einstein Telescope with their respective response functions.
The Bayesian group lasso for confounded spatial data

USGS Publications Warehouse

Hefley, Trevor J.; Hooten, Mevin B.; Hanks, Ephraim M.; Russell, Robin E.; Walsh, Daniel P.

2017-01-01

Generalized linear mixed models for spatial processes are widely used in applied statistics. In many applications of the spatial generalized linear mixed model (SGLMM), the goal is to obtain inference about regression coefficients while achieving optimal predictive ability. When implementing the SGLMM, multicollinearity among covariates and the spatial random effects can make computation challenging and influence inference. We present a Bayesian group lasso prior with a single tuning parameter that can be chosen to optimize predictive ability of the SGLMM and jointly regularize the regression coefficients and spatial random effect. We implement the group lasso SGLMM using efficient Markov chain Monte Carlo (MCMC) algorithms and demonstrate how multicollinearity among covariates and the spatial random effect can be monitored as a derived quantity. To test our method, we compared several parameterizations of the SGLMM using simulated data and two examples from plant ecology and disease ecology. In all examples, problematic levels multicollinearity occurred and influenced sampling efficiency and inference. We found that the group lasso prior resulted in roughly twice the effective sample size for MCMC samples of regression coefficients and can have higher and less variable predictive accuracy based on out-of-sample data when compared to the standard SGLMM.
Accurate Phylogenetic Tree Reconstruction from Quartets: A Heuristic Approach

PubMed Central

Reaz, Rezwana; Bayzid, Md. Shamsuzzoha; Rahman, M. Sohel

2014-01-01

Supertree methods construct trees on a set of taxa (species) combining many smaller trees on the overlapping subsets of the entire set of taxa. A ‘quartet’ is an unrooted tree over taxa, hence the quartet-based supertree methods combine many -taxon unrooted trees into a single and coherent tree over the complete set of taxa. Quartet-based phylogeny reconstruction methods have been receiving considerable attentions in the recent years. An accurate and efficient quartet-based method might be competitive with the current best phylogenetic tree reconstruction methods (such as maximum likelihood or Bayesian MCMC analyses), without being as computationally intensive. In this paper, we present a novel and highly accurate quartet-based phylogenetic tree reconstruction method. We performed an extensive experimental study to evaluate the accuracy and scalability of our approach on both simulated and biological datasets. PMID:25117474
Inferring soil salinity in a drip irrigation system from multi-configuration EMI measurements using adaptive Markov chain Monte Carlo

NASA Astrophysics Data System (ADS)

Zaib Jadoon, Khan; Umer Altaf, Muhammad; McCabe, Matthew Francis; Hoteit, Ibrahim; Muhammad, Nisar; Moghadas, Davood; Weihermüller, Lutz

2017-10-01

A substantial interpretation of electromagnetic induction (EMI) measurements requires quantifying optimal model parameters and uncertainty of a nonlinear inverse problem. For this purpose, an adaptive Bayesian Markov chain Monte Carlo (MCMC) algorithm is used to assess multi-orientation and multi-offset EMI measurements in an agriculture field with non-saline and saline soil. In MCMC the posterior distribution is computed using Bayes' rule. The electromagnetic forward model based on the full solution of Maxwell's equations was used to simulate the apparent electrical conductivity measured with the configurations of EMI instrument, the CMD Mini-Explorer. Uncertainty in the parameters for the three-layered earth model are investigated by using synthetic data. Our results show that in the scenario of non-saline soil, the parameters of layer thickness as compared to layers electrical conductivity are not very informative and are therefore difficult to resolve. Application of the proposed MCMC-based inversion to field measurements in a drip irrigation system demonstrates that the parameters of the model can be well estimated for the saline soil as compared to the non-saline soil, and provides useful insight about parameter uncertainty for the assessment of the model outputs.
Inverse Modeling Using Markov Chain Monte Carlo Aided by Adaptive Stochastic Collocation Method with Transformation

NASA Astrophysics Data System (ADS)

Zhang, D.; Liao, Q.

2016-12-01

The Bayesian inference provides a convenient framework to solve statistical inverse problems. In this method, the parameters to be identified are treated as random variables. The prior knowledge, the system nonlinearity, and the measurement errors can be directly incorporated in the posterior probability density function (PDF) of the parameters. The Markov chain Monte Carlo (MCMC) method is a powerful tool to generate samples from the posterior PDF. However, since the MCMC usually requires thousands or even millions of forward simulations, it can be a computationally intensive endeavor, particularly when faced with large-scale flow and transport models. To address this issue, we construct a surrogate system for the model responses in the form of polynomials by the stochastic collocation method. In addition, we employ interpolation based on the nested sparse grids and takes into account the different importance of the parameters, under the condition of high random dimensions in the stochastic space. Furthermore, in case of low regularity such as discontinuous or unsmooth relation between the input parameters and the output responses, we introduce an additional transform process to improve the accuracy of the surrogate model. Once we build the surrogate system, we may evaluate the likelihood with very little computational cost. We analyzed the convergence rate of the forward solution and the surrogate posterior by Kullback-Leibler divergence, which quantifies the difference between probability distributions. The fast convergence of the forward solution implies fast convergence of the surrogate posterior to the true posterior. We also tested the proposed algorithm on water-flooding two-phase flow reservoir examples. The posterior PDF calculated from a very long chain with direct forward simulation is assumed to be accurate. The posterior PDF calculated using the surrogate model is in reasonable agreement with the reference, revealing a great improvement in terms of computational efficiency.
Including non-additive genetic effects in Bayesian methods for the prediction of genetic values based on genome-wide markers

PubMed Central

2011-01-01

Background Molecular marker information is a common source to draw inferences about the relationship between genetic and phenotypic variation. Genetic effects are often modelled as additively acting marker allele effects. The true mode of biological action can, of course, be different from this plain assumption. One possibility to better understand the genetic architecture of complex traits is to include intra-locus (dominance) and inter-locus (epistasis) interaction of alleles as well as the additive genetic effects when fitting a model to a trait. Several Bayesian MCMC approaches exist for the genome-wide estimation of genetic effects with high accuracy of genetic value prediction. Including pairwise interaction for thousands of loci would probably go beyond the scope of such a sampling algorithm because then millions of effects are to be estimated simultaneously leading to months of computation time. Alternative solving strategies are required when epistasis is studied. Methods We extended a fast Bayesian method (fBayesB), which was previously proposed for a purely additive model, to include non-additive effects. The fBayesB approach was used to estimate genetic effects on the basis of simulated datasets. Different scenarios were simulated to study the loss of accuracy of prediction, if epistatic effects were not simulated but modelled and vice versa. Results If 23 QTL were simulated to cause additive and dominance effects, both fBayesB and a conventional MCMC sampler BayesB yielded similar results in terms of accuracy of genetic value prediction and bias of variance component estimation based on a model including additive and dominance effects. Applying fBayesB to data with epistasis, accuracy could be improved by 5% when all pairwise interactions were modelled as well. The accuracy decreased more than 20% if genetic variation was spread over 230 QTL. In this scenario, accuracy based on modelling only additive and dominance effects was generally superior to that of the complex model including epistatic effects. Conclusions This simulation study showed that the fBayesB approach is convenient for genetic value prediction. Jointly estimating additive and non-additive effects (especially dominance) has reasonable impact on the accuracy of prediction and the proportion of genetic variation assigned to the additive genetic source. PMID:21867519
Improving the Fitness of High-Dimensional Biomechanical Models via Data-Driven Stochastic Exploration

PubMed Central

Bustamante, Carlos D.; Valero-Cuevas, Francisco J.

2010-01-01

The field of complex biomechanical modeling has begun to rely on Monte Carlo techniques to investigate the effects of parameter variability and measurement uncertainty on model outputs, search for optimal parameter combinations, and define model limitations. However, advanced stochastic methods to perform data-driven explorations, such as Markov chain Monte Carlo (MCMC), become necessary as the number of model parameters increases. Here, we demonstrate the feasibility and, what to our knowledge is, the first use of an MCMC approach to improve the fitness of realistically large biomechanical models. We used a Metropolis–Hastings algorithm to search increasingly complex parameter landscapes (3, 8, 24, and 36 dimensions) to uncover underlying distributions of anatomical parameters of a “truth model” of the human thumb on the basis of simulated kinematic data (thumbnail location, orientation, and linear and angular velocities) polluted by zero-mean, uncorrelated multivariate Gaussian “measurement noise.” Driven by these data, ten Markov chains searched each model parameter space for the subspace that best fit the data (posterior distribution). As expected, the convergence time increased, more local minima were found, and marginal distributions broadened as the parameter space complexity increased. In the 36-D scenario, some chains found local minima but the majority of chains converged to the true posterior distribution (confirmed using a cross-validation dataset), thus demonstrating the feasibility and utility of these methods for realistically large biomechanical problems. PMID:19272906
Measurement and Structural Model Class Separation in Mixture CFA: ML/EM versus MCMC

ERIC Educational Resources Information Center

Depaoli, Sarah

2012-01-01

Parameter recovery was assessed within mixture confirmatory factor analysis across multiple estimator conditions under different simulated levels of mixture class separation. Mixture class separation was defined in the measurement model (through factor loadings) and the structural model (through factor variances). Maximum likelihood (ML) via the…
Assessing Mediational Models: Testing and Interval Estimation for Indirect Effects.

PubMed

Biesanz, Jeremy C; Falk, Carl F; Savalei, Victoria

2010-08-06

Theoretical models specifying indirect or mediated effects are common in the social sciences. An indirect effect exists when an independent variable's influence on the dependent variable is mediated through an intervening variable. Classic approaches to assessing such mediational hypotheses ( Baron & Kenny, 1986 ; Sobel, 1982 ) have in recent years been supplemented by computationally intensive methods such as bootstrapping, the distribution of the product methods, and hierarchical Bayesian Markov chain Monte Carlo (MCMC) methods. These different approaches for assessing mediation are illustrated using data from Dunn, Biesanz, Human, and Finn (2007). However, little is known about how these methods perform relative to each other, particularly in more challenging situations, such as with data that are incomplete and/or nonnormal. This article presents an extensive Monte Carlo simulation evaluating a host of approaches for assessing mediation. We examine Type I error rates, power, and coverage. We study normal and nonnormal data as well as complete and incomplete data. In addition, we adapt a method, recently proposed in statistical literature, that does not rely on confidence intervals (CIs) to test the null hypothesis of no indirect effect. The results suggest that the new inferential method-the partial posterior p value-slightly outperforms existing ones in terms of maintaining Type I error rates while maximizing power, especially with incomplete data. Among confidence interval approaches, the bias-corrected accelerated (BC a ) bootstrapping approach often has inflated Type I error rates and inconsistent coverage and is not recommended; In contrast, the bootstrapped percentile confidence interval and the hierarchical Bayesian MCMC method perform best overall, maintaining Type I error rates, exhibiting reasonable power, and producing stable and accurate coverage rates.
Posterior uncertainty of GEOS-5 L-band radiative transfer model parameters and brightness temperatures after calibration with SMOS observations

NASA Astrophysics Data System (ADS)

De Lannoy, G. J.; Reichle, R. H.; Vrugt, J. A.

2012-12-01

Simulated L-band (1.4 GHz) brightness temperatures are very sensitive to the values of the parameters in the radiative transfer model (RTM). We assess the optimum RTM parameter values and their (posterior) uncertainty in the Goddard Earth Observing System (GEOS-5) land surface model using observations of multi-angular brightness temperature over North America from the Soil Moisture Ocean Salinity (SMOS) mission. Two different parameter estimation methods are being compared: (i) a particle swarm optimization (PSO) approach, and (ii) an MCMC simulation procedure using the differential evolution adaptive Metropolis (DREAM) algorithm. Our results demonstrate that both methods provide similar "optimal" parameter values. Yet, DREAM exhibits better convergence properties, resulting in a reduced spread of the posterior ensemble. The posterior parameter distributions derived with both methods are used for predictive uncertainty estimation of brightness temperature. This presentation will highlight our model-data synthesis framework and summarize our initial findings.
Cosmology with the largest galaxy cluster surveys: going beyond Fisher matrix forecasts

DOE Office of Scientific and Technical Information (OSTI.GOV)

Khedekar, Satej; Majumdar, Subhabrata, E-mail: satej@mpa-garching.mpg.de, E-mail: subha@tifr.res.in

2013-02-01

We make the first detailed MCMC likelihood study of cosmological constraints that are expected from some of the largest, ongoing and proposed, cluster surveys in different wave-bands and compare the estimates to the prevalent Fisher matrix forecasts. Mock catalogs of cluster counts expected from the surveys — eROSITA, WFXT, RCS2, DES and Planck, along with a mock dataset of follow-up mass calibrations are analyzed for this purpose. A fair agreement between MCMC and Fisher results is found only in the case of minimal models. However, for many cases, the marginalized constraints obtained from Fisher and MCMC methods can differ bymore » factors of 30-100%. The discrepancy can be alarmingly large for a time dependent dark energy equation of state, w(a); the Fisher methods are seen to under-estimate the constraints by as much as a factor of 4-5. Typically, Fisher estimates become more and more inappropriate as we move away from ΛCDM, to a constant-w dark energy to varying-w dark energy cosmologies. Fisher analysis, also, predicts incorrect parameter degeneracies. There are noticeable offsets in the likelihood contours obtained from Fisher methods that is caused due to an asymmetry in the posterior likelihood distribution as seen through a MCMC analysis. From the point of mass-calibration uncertainties, a high value of unknown scatter about the mean mass-observable relation, and its redshift dependence, is seen to have large degeneracies with the cosmological parameters σ{sub 8} and w(a) and can degrade the cosmological constraints considerably. We find that the addition of mass-calibrated cluster datasets can improve dark energy and σ{sub 8} constraints by factors of 2-3 from what can be obtained from CMB+SNe+BAO only . Finally, we show that a joint analysis of datasets of two (or more) different cluster surveys would significantly tighten cosmological constraints from using clusters only. Since, details of future cluster surveys are still being planned, we emphasize that optimal survey design must be done using MCMC analysis rather than Fisher forecasting.« less

Prospects of detection of the first sources with SKA using matched filters

NASA Astrophysics Data System (ADS)

Ghara, Raghunath; Choudhury, T. Roy; Datta, Kanan K.; Mellema, Garrelt; Choudhuri, Samir; Majumdar, Suman; Giri, Sambit K.

2018-05-01

The matched filtering technique is an efficient method to detect H ii bubbles and absorption regions in radio interferometric observations of the redshifted 21-cm signal from the epoch of reionization and the Cosmic Dawn. Here, we present an implementation of this technique to the upcoming observations such as the SKA1-low for a blind search of absorption regions at the Cosmic Dawn. The pipeline explores four dimensional parameter space on the simulated mock visibilities using a MCMC algorithm. The framework is able to efficiently determine the positions and sizes of the absorption/H ii regions in the field of view.
An adaptive Gaussian process-based method for efficient Bayesian experimental design in groundwater contaminant source identification problems: ADAPTIVE GAUSSIAN PROCESS-BASED INVERSION

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Jiangjiang; Li, Weixuan; Zeng, Lingzao

Surrogate models are commonly used in Bayesian approaches such as Markov Chain Monte Carlo (MCMC) to avoid repetitive CPU-demanding model evaluations. However, the approximation error of a surrogate may lead to biased estimations of the posterior distribution. This bias can be corrected by constructing a very accurate surrogate or implementing MCMC in a two-stage manner. Since the two-stage MCMC requires extra original model evaluations, the computational cost is still high. If the information of measurement is incorporated, a locally accurate approximation of the original model can be adaptively constructed with low computational cost. Based on this idea, we propose amore » Gaussian process (GP) surrogate-based Bayesian experimental design and parameter estimation approach for groundwater contaminant source identification problems. A major advantage of the GP surrogate is that it provides a convenient estimation of the approximation error, which can be incorporated in the Bayesian formula to avoid over-confident estimation of the posterior distribution. The proposed approach is tested with a numerical case study. Without sacrificing the estimation accuracy, the new approach achieves about 200 times of speed-up compared to our previous work using two-stage MCMC.« less
Dimension-independent likelihood-informed MCMC

DOE PAGES

Cui, Tiangang; Law, Kody J. H.; Marzouk, Youssef M.

2015-10-08

Many Bayesian inference problems require exploring the posterior distribution of highdimensional parameters that represent the discretization of an underlying function. Our work introduces a family of Markov chain Monte Carlo (MCMC) samplers that can adapt to the particular structure of a posterior distribution over functions. There are two distinct lines of research that intersect in the methods we develop here. First, we introduce a general class of operator-weighted proposal distributions that are well defined on function space, such that the performance of the resulting MCMC samplers is independent of the discretization of the function. Second, by exploiting local Hessian informationmore » and any associated lowdimensional structure in the change from prior to posterior distributions, we develop an inhomogeneous discretization scheme for the Langevin stochastic differential equation that yields operator-weighted proposals adapted to the non-Gaussian structure of the posterior. The resulting dimension-independent and likelihood-informed (DILI) MCMC samplers may be useful for a large class of high-dimensional problems where the target probability measure has a density with respect to a Gaussian reference measure. Finally, we use two nonlinear inverse problems in order to demonstrate the efficiency of these DILI samplers: an elliptic PDE coefficient inverse problem and path reconstruction in a conditioned diffusion.« less
Atmospheric Tracer Inverse Modeling Using Markov Chain Monte Carlo (MCMC)

NASA Astrophysics Data System (ADS)

Kasibhatla, P.

2004-12-01

In recent years, there has been an increasing emphasis on the use of Bayesian statistical estimation techniques to characterize the temporal and spatial variability of atmospheric trace gas sources and sinks. The applications have been varied in terms of the particular species of interest, as well as in terms of the spatial and temporal resolution of the estimated fluxes. However, one common characteristic has been the use of relatively simple statistical models for describing the measurement and chemical transport model error statistics and prior source statistics. For example, multivariate normal probability distribution functions (pdfs) are commonly used to model these quantities and inverse source estimates are derived for fixed values of pdf paramaters. While the advantage of this approach is that closed form analytical solutions for the a posteriori pdfs of interest are available, it is worth exploring Bayesian analysis approaches which allow for a more general treatment of error and prior source statistics. Here, we present an application of the Markov Chain Monte Carlo (MCMC) methodology to an atmospheric tracer inversion problem to demonstrate how more gereral statistical models for errors can be incorporated into the analysis in a relatively straightforward manner. The MCMC approach to Bayesian analysis, which has found wide application in a variety of fields, is a statistical simulation approach that involves computing moments of interest of the a posteriori pdf by efficiently sampling this pdf. The specific inverse problem that we focus on is the annual mean CO2 source/sink estimation problem considered by the TransCom3 project. TransCom3 was a collaborative effort involving various modeling groups and followed a common modeling and analysis protocoal. As such, this problem provides a convenient case study to demonstrate the applicability of the MCMC methodology to atmospheric tracer source/sink estimation problems.
Trans-dimensional MCMC methods for fully automatic motion analysis in tagged MRI.

PubMed

Smal, Ihor; Carranza-Herrezuelo, Noemí; Klein, Stefan; Niessen, Wiro; Meijering, Erik

2011-01-01

Tagged magnetic resonance imaging (tMRI) is a well-known noninvasive method allowing quantitative analysis of regional heart dynamics. Its clinical use has so far been limited, in part due to the lack of robustness and accuracy of existing tag tracking algorithms in dealing with low (and intrinsically time-varying) image quality. In this paper, we propose a novel probabilistic method for tag tracking, implemented by means of Bayesian particle filtering and a trans-dimensional Markov chain Monte Carlo (MCMC) approach, which efficiently combines information about the imaging process and tag appearance with prior knowledge about the heart dynamics obtained by means of non-rigid image registration. Experiments using synthetic image data (with ground truth) and real data (with expert manual annotation) from preclinical (small animal) and clinical (human) studies confirm that the proposed method yields higher consistency, accuracy, and intrinsic tag reliability assessment in comparison with other frequently used tag tracking methods.
Multilocus lod scores in large pedigrees: combination of exact and approximate calculations.

PubMed

Tong, Liping; Thompson, Elizabeth

2008-01-01

To detect the positions of disease loci, lod scores are calculated at multiple chromosomal positions given trait and marker data on members of pedigrees. Exact lod score calculations are often impossible when the size of the pedigree and the number of markers are both large. In this case, a Markov Chain Monte Carlo (MCMC) approach provides an approximation. However, to provide accurate results, mixing performance is always a key issue in these MCMC methods. In this paper, we propose two methods to improve MCMC sampling and hence obtain more accurate lod score estimates in shorter computation time. The first improvement generalizes the block-Gibbs meiosis (M) sampler to multiple meiosis (MM) sampler in which multiple meioses are updated jointly, across all loci. The second one divides the computations on a large pedigree into several parts by conditioning on the haplotypes of some 'key' individuals. We perform exact calculations for the descendant parts where more data are often available, and combine this information with sampling of the hidden variables in the ancestral parts. Our approaches are expected to be most useful for data on a large pedigree with a lot of missing data. (c) 2007 S. Karger AG, Basel
Multilocus Lod Scores in Large Pedigrees: Combination of Exact and Approximate Calculations

PubMed Central

Tong, Liping; Thompson, Elizabeth

2007-01-01

To detect the positions of disease loci, lod scores are calculated at multiple chromosomal positions given trait and marker data on members of pedigrees. Exact lod score calculations are often impossible when the size of the pedigree and the number of markers are both large. In this case, a Markov Chain Monte Carlo (MCMC) approach provides an approximation. However, to provide accurate results, mixing performance is always a key issue in these MCMC methods. In this paper, we propose two methods to improve MCMC sampling and hence obtain more accurate lod score estimates in shorter computation time. The first improvement generalizes the block-Gibbs meiosis (M) sampler to multiple meiosis (MM) sampler in which multiple meioses are updated jointly, across all loci. The second one divides the computations on a large pedigree into several parts by conditioning on the haplotypes of some ‘key’ individuals. We perform exact calculations for the descendant parts where more data are often available, and combine this information with sampling of the hidden variables in the ancestral parts. Our approaches are expected to be most useful for data on a large pedigree with a lot of missing data. PMID:17934317
Bayesian tomography by interacting Markov chains

NASA Astrophysics Data System (ADS)

Romary, T.

2017-12-01

In seismic tomography, we seek to determine the velocity of the undergound from noisy first arrival travel time observations. In most situations, this is an ill posed inverse problem that admits several unperfect solutions. Given an a priori distribution over the parameters of the velocity model, the Bayesian formulation allows to state this problem as a probabilistic one, with a solution under the form of a posterior distribution. The posterior distribution is generally high dimensional and may exhibit multimodality. Moreover, as it is known only up to a constant, the only sensible way to addressthis problem is to try to generate simulations from the posterior. The natural tools to perform these simulations are Monte Carlo Markov chains (MCMC). Classical implementations of MCMC algorithms generally suffer from slow mixing: the generated states are slow to enter the stationary regime, that is to fit the observations, and when one mode of the posterior is eventually identified, it may become difficult to visit others. Using a varying temperature parameter relaxing the constraint on the data may help to enter the stationary regime. Besides, the sequential nature of MCMC makes them ill fitted toparallel implementation. Running a large number of chains in parallel may be suboptimal as the information gathered by each chain is not mutualized. Parallel tempering (PT) can be seen as a first attempt to make parallel chains at different temperatures communicate but only exchange information between current states. In this talk, I will show that PT actually belongs to a general class of interacting Markov chains algorithm. I will also show that this class enables to design interacting schemes that can take advantage of the whole history of the chain, by authorizing exchanges toward already visited states. The algorithms will be illustrated with toy examples and an application to first arrival traveltime tomography.
Bayesian Treatment of Uncertainty in Environmental Modeling: Optimization, Sampling and Data Assimilation Using the DREAM Software Package

NASA Astrophysics Data System (ADS)

Vrugt, J. A.

2012-12-01

In the past decade much progress has been made in the treatment of uncertainty in earth systems modeling. Whereas initial approaches has focused mostly on quantification of parameter and predictive uncertainty, recent methods attempt to disentangle the effects of parameter, forcing (input) data, model structural and calibration data errors. In this talk I will highlight some of our recent work involving theory, concepts and applications of Bayesian parameter and/or state estimation. In particular, new methods for sequential Monte Carlo (SMC) and Markov Chain Monte Carlo (MCMC) simulation will be presented with emphasis on massively parallel distributed computing and quantification of model structural errors. The theoretical and numerical developments will be illustrated using model-data synthesis problems in hydrology, hydrogeology and geophysics.
An Efficient MCMC Algorithm to Sample Binary Matrices with Fixed Marginals

ERIC Educational Resources Information Center

Verhelst, Norman D.

2008-01-01

Uniform sampling of binary matrices with fixed margins is known as a difficult problem. Two classes of algorithms to sample from a distribution not too different from the uniform are studied in the literature: importance sampling and Markov chain Monte Carlo (MCMC). Existing MCMC algorithms converge slowly, require a long burn-in period and yield…
Multi-chain Markov chain Monte Carlo methods for computationally expensive models

NASA Astrophysics Data System (ADS)

Huang, M.; Ray, J.; Ren, H.; Hou, Z.; Bao, J.

2017-12-01

Markov chain Monte Carlo (MCMC) methods are used to infer model parameters from observational data. The parameters are inferred as probability densities, thus capturing estimation error due to sparsity of the data, and the shortcomings of the model. Multiple communicating chains executing the MCMC method have the potential to explore the parameter space better, and conceivably accelerate the convergence to the final distribution. We present results from tests conducted with the multi-chain method to show how the acceleration occurs i.e., for loose convergence tolerances, the multiple chains do not make much of a difference. The ensemble of chains also seems to have the ability to accelerate the convergence of a few chains that might start from suboptimal starting points. Finally, we show the performance of the chains in the estimation of O(10) parameters using computationally expensive forward models such as the Community Land Model, where the sampling burden is distributed over multiple chains.
Hydrologic Model Selection using Markov chain Monte Carlo methods

NASA Astrophysics Data System (ADS)

Marshall, L.; Sharma, A.; Nott, D.

2002-12-01

Estimation of parameter uncertainty (and in turn model uncertainty) allows assessment of the risk in likely applications of hydrological models. Bayesian statistical inference provides an ideal means of assessing parameter uncertainty whereby prior knowledge about the parameter is combined with information from the available data to produce a probability distribution (the posterior distribution) that describes uncertainty about the parameter and serves as a basis for selecting appropriate values for use in modelling applications. Widespread use of Bayesian techniques in hydrology has been hindered by difficulties in summarizing and exploring the posterior distribution. These difficulties have been largely overcome by recent advances in Markov chain Monte Carlo (MCMC) methods that involve random sampling of the posterior distribution. This study presents an adaptive MCMC sampling algorithm which has characteristics that are well suited to model parameters with a high degree of correlation and interdependence, as is often evident in hydrological models. The MCMC sampling technique is used to compare six alternative configurations of a commonly used conceptual rainfall-runoff model, the Australian Water Balance Model (AWBM), using 11 years of daily rainfall runoff data from the Bass river catchment in Australia. The alternative configurations considered fall into two classes - those that consider model errors to be independent of prior values, and those that model the errors as an autoregressive process. Each such class consists of three formulations that represent increasing levels of complexity (and parameterisation) of the original model structure. The results from this study point both to the importance of using Bayesian approaches in evaluating model performance, as well as the simplicity of the MCMC sampling framework that has the ability to bring such approaches within the reach of the applied hydrological community.
Constrained proper sampling of conformations of transition state ensemble of protein folding

PubMed Central

Lin, Ming; Zhang, Jian; Lu, Hsiao-Mei; Chen, Rong; Liang, Jie

2011-01-01

Characterizing the conformations of protein in the transition state ensemble (TSE) is important for studying protein folding. A promising approach pioneered by Vendruscolo [Nature (London) 409, 641 (2001)] to study TSE is to generate conformations that satisfy all constraints imposed by the experimentally measured ϕ values that provide information about the native likeness of the transition states. Faísca [J. Chem. Phys. 129, 095108 (2008)] generated conformations of TSE based on the criterion that, starting from a TS conformation, the probabilities of folding and unfolding are about equal through Markov Chain Monte Carlo (MCMC) simulations. In this study, we use the technique of constrained sequential Monte Carlo method [Lin , J. Chem. Phys. 129, 094101 (2008); Zhang Proteins 66, 61 (2007)] to generate TSE conformations of acylphosphatase of 98 residues that satisfy the ϕ-value constraints, as well as the criterion that each conformation has a folding probability of 0.5 by Monte Carlo simulations. We adopt a two stage process and first generate 5000 contact maps satisfying the ϕ-value constraints. Each contact map is then used to generate 1000 properly weighted conformations. After clustering similar conformations, we obtain a set of properly weighted samples of 4185 candidate clusters. Representative conformation of each of these cluster is then selected and 50 runs of Markov chain Monte Carlo (MCMC) simulation are carried using a regrowth move set. We then select a subset of 1501 conformations that have equal probabilities to fold and to unfold as the set of TSE. These 1501 samples characterize well the distribution of transition state ensemble conformations of acylphosphatase. Compared with previous studies, our approach can access much wider conformational space and can objectively generate conformations that satisfy the ϕ-value constraints and the criterion of 0.5 folding probability without bias. In contrast to previous studies, our results show that transition state conformations are very diverse and are far from nativelike when measured in cartesian root-mean-square deviation (cRMSD): the average cRMSD between TSE conformations and the native structure is 9.4 Å for this short protein, instead of 6 Å reported in previous studies. In addition, we found that the average fraction of native contacts in the TSE is 0.37, with enrichment in native-like β-sheets and a shortage of long range contacts, suggesting such contacts form at a later stage of folding. We further calculate the first passage time of folding of TSE conformations through calculation of physical time associated with the regrowth moves in MCMC simulation through mapping such moves to a Markovian state model, whose transition time was obtained by Langevin dynamics simulations. Our results indicate that despite the large structural diversity of the TSE, they are characterized by similar folding time. Our approach is general and can be used to study TSE in other macromolecules. PMID:21341875
A hybrid expectation maximisation and MCMC sampling algorithm to implement Bayesian mixture model based genomic prediction and QTL mapping.

PubMed

Wang, Tingting; Chen, Yi-Ping Phoebe; Bowman, Phil J; Goddard, Michael E; Hayes, Ben J

2016-09-21

Bayesian mixture models in which the effects of SNP are assumed to come from normal distributions with different variances are attractive for simultaneous genomic prediction and QTL mapping. These models are usually implemented with Monte Carlo Markov Chain (MCMC) sampling, which requires long compute times with large genomic data sets. Here, we present an efficient approach (termed HyB_BR), which is a hybrid of an Expectation-Maximisation algorithm, followed by a limited number of MCMC without the requirement for burn-in. To test prediction accuracy from HyB_BR, dairy cattle and human disease trait data were used. In the dairy cattle data, there were four quantitative traits (milk volume, protein kg, fat% in milk and fertility) measured in 16,214 cattle from two breeds genotyped for 632,002 SNPs. Validation of genomic predictions was in a subset of cattle either from the reference set or in animals from a third breeds that were not in the reference set. In all cases, HyB_BR gave almost identical accuracies to Bayesian mixture models implemented with full MCMC, however computational time was reduced by up to 1/17 of that required by full MCMC. The SNPs with high posterior probability of a non-zero effect were also very similar between full MCMC and HyB_BR, with several known genes affecting milk production in this category, as well as some novel genes. HyB_BR was also applied to seven human diseases with 4890 individuals genotyped for around 300 K SNPs in a case/control design, from the Welcome Trust Case Control Consortium (WTCCC). In this data set, the results demonstrated again that HyB_BR performed as well as Bayesian mixture models with full MCMC for genomic predictions and genetic architecture inference while reducing the computational time from 45 h with full MCMC to 3 h with HyB_BR. The results for quantitative traits in cattle and disease in humans demonstrate that HyB_BR can perform equally well as Bayesian mixture models implemented with full MCMC in terms of prediction accuracy, but with up to 17 times faster than the full MCMC implementations. The HyB_BR algorithm makes simultaneous genomic prediction, QTL mapping and inference of genetic architecture feasible in large genomic data sets.
Bayesian Group Bridge for Bi-level Variable Selection.

PubMed

Mallick, Himel; Yi, Nengjun

2017-06-01

A Bayesian bi-level variable selection method (BAGB: Bayesian Analysis of Group Bridge) is developed for regularized regression and classification. This new development is motivated by grouped data, where generic variables can be divided into multiple groups, with variables in the same group being mechanistically related or statistically correlated. As an alternative to frequentist group variable selection methods, BAGB incorporates structural information among predictors through a group-wise shrinkage prior. Posterior computation proceeds via an efficient MCMC algorithm. In addition to the usual ease-of-interpretation of hierarchical linear models, the Bayesian formulation produces valid standard errors, a feature that is notably absent in the frequentist framework. Empirical evidence of the attractiveness of the method is illustrated by extensive Monte Carlo simulations and real data analysis. Finally, several extensions of this new approach are presented, providing a unified framework for bi-level variable selection in general models with flexible penalties.
Bayesian Orbit Computation Tools for Objects on Geocentric Orbits

NASA Astrophysics Data System (ADS)

Virtanen, J.; Granvik, M.; Muinonen, K.; Oszkiewicz, D.

2013-08-01

We consider the space-debris orbital inversion problem via the concept of Bayesian inference. The methodology has been put forward for the orbital analysis of solar system small bodies in early 1990's [7] and results in a full solution of the statistical inverse problem given in terms of a posteriori probability density function (PDF) for the orbital parameters. We demonstrate the applicability of our statistical orbital analysis software to Earth orbiting objects, both using well-established Monte Carlo (MC) techniques (for a review, see e.g. [13] as well as recently developed Markov-chain MC (MCMC) techniques (e.g., [9]). In particular, we exploit the novel virtual observation MCMC method [8], which is based on the characterization of the phase-space volume of orbital solutions before the actual MCMC sampling. Our statistical methods and the resulting PDFs immediately enable probabilistic impact predictions to be carried out. Furthermore, this can be readily done also for very sparse data sets and data sets of poor quality - providing that some a priori information on the observational uncertainty is available. For asteroids, impact probabilities with the Earth from the discovery night onwards have been provided, e.g., by [11] and [10], the latter study includes the sampling of the observational-error standard deviation as a random variable.
Training-Image Based Geostatistical Inversion Using a Spatial Generative Adversarial Neural Network

NASA Astrophysics Data System (ADS)

Laloy, Eric; Hérault, Romain; Jacques, Diederik; Linde, Niklas

2018-01-01

Probabilistic inversion within a multiple-point statistics framework is often computationally prohibitive for high-dimensional problems. To partly address this, we introduce and evaluate a new training-image based inversion approach for complex geologic media. Our approach relies on a deep neural network of the generative adversarial network (GAN) type. After training using a training image (TI), our proposed spatial GAN (SGAN) can quickly generate 2-D and 3-D unconditional realizations. A key characteristic of our SGAN is that it defines a (very) low-dimensional parameterization, thereby allowing for efficient probabilistic inversion using state-of-the-art Markov chain Monte Carlo (MCMC) methods. In addition, available direct conditioning data can be incorporated within the inversion. Several 2-D and 3-D categorical TIs are first used to analyze the performance of our SGAN for unconditional geostatistical simulation. Training our deep network can take several hours. After training, realizations containing a few millions of pixels/voxels can be produced in a matter of seconds. This makes it especially useful for simulating many thousands of realizations (e.g., for MCMC inversion) as the relative cost of the training per realization diminishes with the considered number of realizations. Synthetic inversion case studies involving 2-D steady state flow and 3-D transient hydraulic tomography with and without direct conditioning data are used to illustrate the effectiveness of our proposed SGAN-based inversion. For the 2-D case, the inversion rapidly explores the posterior model distribution. For the 3-D case, the inversion recovers model realizations that fit the data close to the target level and visually resemble the true model well.
Bayesian Peptide Peak Detection for High Resolution TOF Mass Spectrometry.

PubMed

Zhang, Jianqiu; Zhou, Xiaobo; Wang, Honghui; Suffredini, Anthony; Zhang, Lin; Huang, Yufei; Wong, Stephen

2010-11-01

In this paper, we address the issue of peptide ion peak detection for high resolution time-of-flight (TOF) mass spectrometry (MS) data. A novel Bayesian peptide ion peak detection method is proposed for TOF data with resolution of 10 000-15 000 full width at half-maximum (FWHW). MS spectra exhibit distinct characteristics at this resolution, which are captured in a novel parametric model. Based on the proposed parametric model, a Bayesian peak detection algorithm based on Markov chain Monte Carlo (MCMC) sampling is developed. The proposed algorithm is tested on both simulated and real datasets. The results show a significant improvement in detection performance over a commonly employed method. The results also agree with expert's visual inspection. Moreover, better detection consistency is achieved across MS datasets from patients with identical pathological condition.
Bayesian Peptide Peak Detection for High Resolution TOF Mass Spectrometry

PubMed Central

Zhang, Jianqiu; Zhou, Xiaobo; Wang, Honghui; Suffredini, Anthony; Zhang, Lin; Huang, Yufei; Wong, Stephen

2011-01-01

In this paper, we address the issue of peptide ion peak detection for high resolution time-of-flight (TOF) mass spectrometry (MS) data. A novel Bayesian peptide ion peak detection method is proposed for TOF data with resolution of 10 000–15 000 full width at half-maximum (FWHW). MS spectra exhibit distinct characteristics at this resolution, which are captured in a novel parametric model. Based on the proposed parametric model, a Bayesian peak detection algorithm based on Markov chain Monte Carlo (MCMC) sampling is developed. The proposed algorithm is tested on both simulated and real datasets. The results show a significant improvement in detection performance over a commonly employed method. The results also agree with expert’s visual inspection. Moreover, better detection consistency is achieved across MS datasets from patients with identical pathological condition. PMID:21544266
DNA motif alignment by evolving a population of Markov chains.

PubMed

Bi, Chengpeng

2009-01-30

Deciphering cis-regulatory elements or de novo motif-finding in genomes still remains elusive although much algorithmic effort has been expended. The Markov chain Monte Carlo (MCMC) method such as Gibbs motif samplers has been widely employed to solve the de novo motif-finding problem through sequence local alignment. Nonetheless, the MCMC-based motif samplers still suffer from local maxima like EM. Therefore, as a prerequisite for finding good local alignments, these motif algorithms are often independently run a multitude of times, but without information exchange between different chains. Hence it would be worth a new algorithm design enabling such information exchange. This paper presents a novel motif-finding algorithm by evolving a population of Markov chains with information exchange (PMC), each of which is initialized as a random alignment and run by the Metropolis-Hastings sampler (MHS). It is progressively updated through a series of local alignments stochastically sampled. Explicitly, the PMC motif algorithm performs stochastic sampling as specified by a population-based proposal distribution rather than individual ones, and adaptively evolves the population as a whole towards a global maximum. The alignment information exchange is accomplished by taking advantage of the pooled motif site distributions. A distinct method for running multiple independent Markov chains (IMC) without information exchange, or dubbed as the IMC motif algorithm, is also devised to compare with its PMC counterpart. Experimental studies demonstrate that the performance could be improved if pooled information were used to run a population of motif samplers. The new PMC algorithm was able to improve the convergence and outperformed other popular algorithms tested using simulated and biological motif sequences.

A Detailed History of Intron-rich Eukaryotic Ancestors Inferred from a Global Survey of 100 Complete Genomes

PubMed Central

Csuros, Miklos; Rogozin, Igor B.; Koonin, Eugene V.

2011-01-01

Protein-coding genes in eukaryotes are interrupted by introns, but intron densities widely differ between eukaryotic lineages. Vertebrates, some invertebrates and green plants have intron-rich genes, with 6–7 introns per kilobase of coding sequence, whereas most of the other eukaryotes have intron-poor genes. We reconstructed the history of intron gain and loss using a probabilistic Markov model (Markov Chain Monte Carlo, MCMC) on 245 orthologous genes from 99 genomes representing the three of the five supergroups of eukaryotes for which multiple genome sequences are available. Intron-rich ancestors are confidently reconstructed for each major group, with 53 to 74% of the human intron density inferred with 95% confidence for the Last Eukaryotic Common Ancestor (LECA). The results of the MCMC reconstruction are compared with the reconstructions obtained using Maximum Likelihood (ML) and Dollo parsimony methods. An excellent agreement between the MCMC and ML inferences is demonstrated whereas Dollo parsimony introduces a noticeable bias in the estimations, typically yielding lower ancestral intron densities than MCMC and ML. Evolution of eukaryotic genes was dominated by intron loss, with substantial gain only at the bases of several major branches including plants and animals. The highest intron density, 120 to 130% of the human value, is inferred for the last common ancestor of animals. The reconstruction shows that the entire line of descent from LECA to mammals was intron-rich, a state conducive to the evolution of alternative splicing. PMID:21935348
Assessing the pollution risk of a groundwater source field at western Laizhou Bay under seawater intrusion

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zeng, Xiankui; Wu, Jichun; Wang, Dong, E-mail: wangdong@nju.edu.cn

Coastal areas have great significance for human living, economy and society development in the world. With the rapid increase of pressures from human activities and climate change, the safety of groundwater resource is under the threat of seawater intrusion in coastal areas. The area of Laizhou Bay is one of the most serious seawater intruded areas in China, since seawater intrusion phenomenon was firstly recognized in the middle of 1970s. This study assessed the pollution risk of a groundwater source filed of western Laizhou Bay area by inferring the probability distribution of groundwater Cl{sup −} concentration. The numerical model ofmore » seawater intrusion process is built by using SEAWAT4. The parameter uncertainty of this model is evaluated by Markov Chain Monte Carlo (MCMC) simulation, and DREAM{sub (ZS)} is used as sampling algorithm. Then, the predictive distribution of Cl{sup -} concentration at groundwater source field is inferred by using the samples of model parameters obtained from MCMC. After that, the pollution risk of groundwater source filed is assessed by the predictive quantiles of Cl{sup -} concentration. The results of model calibration and verification demonstrate that the DREAM{sub (ZS)} based MCMC is efficient and reliable to estimate model parameters under current observation. Under the condition of 95% confidence level, the groundwater source point will not be polluted by seawater intrusion in future five years (2015–2019). In addition, the 2.5% and 97.5% predictive quantiles show that the Cl{sup −} concentration of groundwater source field always vary between 175 mg/l and 200 mg/l. - Highlights: • The parameter uncertainty of seawater intrusion model is evaluated by MCMC. • Groundwater source field won’t be polluted by seawater intrusion in future 5 years. • The pollution risk is assessed by the predictive quantiles of Cl{sup −} concentration.« less
Nanoscale simulation of shale transport properties using the lattice Boltzmann method: Permeability and diffusivity

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Li; Zhang, Lei; Kang, Qinjun

Here, porous structures of shales are reconstructed using the markov chain monte carlo (MCMC) method based on scanning electron microscopy (SEM) images of shale samples from Sichuan Basin, China. Characterization analysis of the reconstructed shales is performed, including porosity, pore size distribution, specific surface area and pore connectivity. The lattice Boltzmann method (LBM) is adopted to simulate fluid flow and Knudsen diffusion within the reconstructed shales. Simulation results reveal that the tortuosity of the shales is much higher than that commonly employed in the Bruggeman equation, and such high tortuosity leads to extremely low intrinsic permeability. Correction of the intrinsicmore » permeability is performed based on the dusty gas model (DGM) by considering the contribution of Knudsen diffusion to the total flow flux, resulting in apparent permeability. The correction factor over a range of Knudsen number and pressure is estimated and compared with empirical correlations in the literature. We find that for the wide pressure range investigated, the correction factor is always greater than 1, indicating Knudsen diffusion always plays a role on shale gas transport mechanisms in the reconstructed shales. Specifically, we found that most of the values of correction factor fall in the slip and transition regime, with no Darcy flow regime observed.« less
Nanoscale simulation of shale transport properties using the lattice Boltzmann method: Permeability and diffusivity

DOE PAGES

Chen, Li; Zhang, Lei; Kang, Qinjun; ...

2015-01-28

Here, porous structures of shales are reconstructed using the markov chain monte carlo (MCMC) method based on scanning electron microscopy (SEM) images of shale samples from Sichuan Basin, China. Characterization analysis of the reconstructed shales is performed, including porosity, pore size distribution, specific surface area and pore connectivity. The lattice Boltzmann method (LBM) is adopted to simulate fluid flow and Knudsen diffusion within the reconstructed shales. Simulation results reveal that the tortuosity of the shales is much higher than that commonly employed in the Bruggeman equation, and such high tortuosity leads to extremely low intrinsic permeability. Correction of the intrinsicmore » permeability is performed based on the dusty gas model (DGM) by considering the contribution of Knudsen diffusion to the total flow flux, resulting in apparent permeability. The correction factor over a range of Knudsen number and pressure is estimated and compared with empirical correlations in the literature. We find that for the wide pressure range investigated, the correction factor is always greater than 1, indicating Knudsen diffusion always plays a role on shale gas transport mechanisms in the reconstructed shales. Specifically, we found that most of the values of correction factor fall in the slip and transition regime, with no Darcy flow regime observed.« less
Nanoscale simulation of shale transport properties using the lattice Boltzmann method: permeability and diffusivity

PubMed Central

Chen, Li; Zhang, Lei; Kang, Qinjun; Viswanathan, Hari S.; Yao, Jun; Tao, Wenquan

2015-01-01

Porous structures of shales are reconstructed using the markov chain monte carlo (MCMC) method based on scanning electron microscopy (SEM) images of shale samples from Sichuan Basin, China. Characterization analysis of the reconstructed shales is performed, including porosity, pore size distribution, specific surface area and pore connectivity. The lattice Boltzmann method (LBM) is adopted to simulate fluid flow and Knudsen diffusion within the reconstructed shales. Simulation results reveal that the tortuosity of the shales is much higher than that commonly employed in the Bruggeman equation, and such high tortuosity leads to extremely low intrinsic permeability. Correction of the intrinsic permeability is performed based on the dusty gas model (DGM) by considering the contribution of Knudsen diffusion to the total flow flux, resulting in apparent permeability. The correction factor over a range of Knudsen number and pressure is estimated and compared with empirical correlations in the literature. For the wide pressure range investigated, the correction factor is always greater than 1, indicating Knudsen diffusion always plays a role on shale gas transport mechanisms in the reconstructed shales. Specifically, we found that most of the values of correction factor fall in the slip and transition regime, with no Darcy flow regime observed. PMID:25627247
Modelling maximum river flow by using Bayesian Markov Chain Monte Carlo

NASA Astrophysics Data System (ADS)

Cheong, R. Y.; Gabda, D.

2017-09-01

Analysis of flood trends is vital since flooding threatens human living in terms of financial, environment and security. The data of annual maximum river flows in Sabah were fitted into generalized extreme value (GEV) distribution. Maximum likelihood estimator (MLE) raised naturally when working with GEV distribution. However, previous researches showed that MLE provide unstable results especially in small sample size. In this study, we used different Bayesian Markov Chain Monte Carlo (MCMC) based on Metropolis-Hastings algorithm to estimate GEV parameters. Bayesian MCMC method is a statistical inference which studies the parameter estimation by using posterior distribution based on Bayes’ theorem. Metropolis-Hastings algorithm is used to overcome the high dimensional state space faced in Monte Carlo method. This approach also considers more uncertainty in parameter estimation which then presents a better prediction on maximum river flow in Sabah.
Mass change distribution inverted from space-borne gravimetric data using a Monte Carlo method

NASA Astrophysics Data System (ADS)

Zhou, X.; Sun, X.; Wu, Y.; Sun, W.

2017-12-01

Mass estimate plays a key role in using temporally satellite gravimetric data to quantify the terrestrial water storage change. GRACE (Gravity Recovery and Climate Experiment) only observes the low degree gravity field changes, which can be used to estimate the total surface density or equivalent water height (EWH) variation, with a limited spatial resolution of 300 km. There are several methods to estimate the mass variation in an arbitrary region, such as averaging kernel, forward modelling and mass concentration (mascon). Mascon method can isolate the local mass from the gravity change at a large scale through solving the observation equation (objective function) which represents the relationship between unknown masses and the measurements. To avoid the unreasonable local mass inverted from smoothed gravity change map, regularization has to be used in the inversion. We herein give a Markov chain Monte Carlo (MCMC) method to objectively determine the regularization parameter for the non-negative mass inversion problem. We first apply this approach to the mass inversion from synthetic data. Result show MCMC can effectively reproduce the local mass variation taking GRACE measurement error into consideration. We then use MCMC to estimate the ground water change rate of North China Plain from GRACE gravity change rate from 2003 to 2014 under a supposition of the continuous ground water loss in this region. Inversion result show that the ground water loss rate in North China Plain is 7.6±0.2Gt/yr during past 12 years which is coincident with that from previous researches.
MCMC-ODPR: primer design optimization using Markov Chain Monte Carlo sampling.

PubMed

Kitchen, James L; Moore, Jonathan D; Palmer, Sarah A; Allaby, Robin G

2012-11-05

Next generation sequencing technologies often require numerous primer designs that require good target coverage that can be financially costly. We aimed to develop a system that would implement primer reuse to design degenerate primers that could be designed around SNPs, thus find the fewest necessary primers and the lowest cost whilst maintaining an acceptable coverage and provide a cost effective solution. We have implemented Metropolis-Hastings Markov Chain Monte Carlo for optimizing primer reuse. We call it the Markov Chain Monte Carlo Optimized Degenerate Primer Reuse (MCMC-ODPR) algorithm. After repeating the program 1020 times to assess the variance, an average of 17.14% fewer primers were found to be necessary using MCMC-ODPR for an equivalent coverage without implementing primer reuse. The algorithm was able to reuse primers up to five times. We compared MCMC-ODPR with single sequence primer design programs Primer3 and Primer-BLAST and achieved a lower primer cost per amplicon base covered of 0.21 and 0.19 and 0.18 primer nucleotides on three separate gene sequences, respectively. With multiple sequences, MCMC-ODPR achieved a lower cost per base covered of 0.19 than programs BatchPrimer3 and PAMPS, which achieved 0.25 and 0.64 primer nucleotides, respectively. MCMC-ODPR is a useful tool for designing primers at various melting temperatures at good target coverage. By combining degeneracy with optimal primer reuse the user may increase coverage of sequences amplified by the designed primers at significantly lower costs. Our analyses showed that overall MCMC-ODPR outperformed the other primer-design programs in our study in terms of cost per covered base.
MCMC-ODPR: Primer design optimization using Markov Chain Monte Carlo sampling

PubMed Central

2012-01-01

Background Next generation sequencing technologies often require numerous primer designs that require good target coverage that can be financially costly. We aimed to develop a system that would implement primer reuse to design degenerate primers that could be designed around SNPs, thus find the fewest necessary primers and the lowest cost whilst maintaining an acceptable coverage and provide a cost effective solution. We have implemented Metropolis-Hastings Markov Chain Monte Carlo for optimizing primer reuse. We call it the Markov Chain Monte Carlo Optimized Degenerate Primer Reuse (MCMC-ODPR) algorithm. Results After repeating the program 1020 times to assess the variance, an average of 17.14% fewer primers were found to be necessary using MCMC-ODPR for an equivalent coverage without implementing primer reuse. The algorithm was able to reuse primers up to five times. We compared MCMC-ODPR with single sequence primer design programs Primer3 and Primer-BLAST and achieved a lower primer cost per amplicon base covered of 0.21 and 0.19 and 0.18 primer nucleotides on three separate gene sequences, respectively. With multiple sequences, MCMC-ODPR achieved a lower cost per base covered of 0.19 than programs BatchPrimer3 and PAMPS, which achieved 0.25 and 0.64 primer nucleotides, respectively. Conclusions MCMC-ODPR is a useful tool for designing primers at various melting temperatures at good target coverage. By combining degeneracy with optimal primer reuse the user may increase coverage of sequences amplified by the designed primers at significantly lower costs. Our analyses showed that overall MCMC-ODPR outperformed the other primer-design programs in our study in terms of cost per covered base. PMID:23126469
IMa2p - Parallel MCMC and inference of ancient demography under the Isolation with Migration (IM) model

PubMed Central

Sethuraman, Arun; Hey, Jody

2015-01-01

IMa2 and related programs are used to study the divergence of closely related species and of populations within species. These methods are based on the sampling of genealogies using MCMC, and they can proceed quite slowly for larger data sets. We describe a parallel implementation, called IMa2p, that provides a nearly linear increase in genealogy sampling rate with the number of processors in use. IMa2p is written in OpenMPI and C++, and scales well for demographic analyses of a large number of loci and populations, which are difficult to study using the serial version of the program. PMID:26059786
Bayesian Analysis for Exponential Random Graph Models Using the Adaptive Exchange Sampler.

PubMed

Jin, Ick Hoon; Yuan, Ying; Liang, Faming

2013-10-01

Exponential random graph models have been widely used in social network analysis. However, these models are extremely difficult to handle from a statistical viewpoint, because of the intractable normalizing constant and model degeneracy. In this paper, we consider a fully Bayesian analysis for exponential random graph models using the adaptive exchange sampler, which solves the intractable normalizing constant and model degeneracy issues encountered in Markov chain Monte Carlo (MCMC) simulations. The adaptive exchange sampler can be viewed as a MCMC extension of the exchange algorithm, and it generates auxiliary networks via an importance sampling procedure from an auxiliary Markov chain running in parallel. The convergence of this algorithm is established under mild conditions. The adaptive exchange sampler is illustrated using a few social networks, including the Florentine business network, molecule synthetic network, and dolphins network. The results indicate that the adaptive exchange algorithm can produce more accurate estimates than approximate exchange algorithms, while maintaining the same computational efficiency.
Optimal Bayesian Adaptive Design for Test-Item Calibration.

PubMed

van der Linden, Wim J; Ren, Hao

2015-06-01

An optimal adaptive design for test-item calibration based on Bayesian optimality criteria is presented. The design adapts the choice of field-test items to the examinees taking an operational adaptive test using both the information in the posterior distributions of their ability parameters and the current posterior distributions of the field-test parameters. Different criteria of optimality based on the two types of posterior distributions are possible. The design can be implemented using an MCMC scheme with alternating stages of sampling from the posterior distributions of the test takers' ability parameters and the parameters of the field-test items while reusing samples from earlier posterior distributions of the other parameters. Results from a simulation study demonstrated the feasibility of the proposed MCMC implementation for operational item calibration. A comparison of performances for different optimality criteria showed faster calibration of substantial numbers of items for the criterion of D-optimality relative to A-optimality, a special case of c-optimality, and random assignment of items to the test takers.
Assessing Model Fitting of Megamaser Disks with Simulated Observations

NASA Astrophysics Data System (ADS)

Han, Jiwon; Braatz, James; Pesce, Dominic

2018-01-01

The Megamaser Cosmology Project (MCP) measures the Hubble Constant by determining distances to galaxies with observations of 22 GHz H20 megamasers. The megamasers arise in the circumnuclear accretion disks of active galaxies. In this research, we aim to improve the estimation of systematic errors in MCP measurements. Currently, the MCP fits a disk model to the observed maser data with a Markov Chain Monte Carlo (MCMC) code. The disk model is described by up to 14 global parameters, including up to 6 that describe the disk warping. We first assess the model by generating synthetic datasets in which the locations and dynamics of the maser spots are exactly known, and fitting the model to these data. By doing so, we can also test the effects of unmodeled substructure on the estimated uncertainties. Furthermore, in order to gain better understanding of the physics behind accretion disk warping, we develop a physics-driven model for the warp and test it with the MCMC approach.
Wireless Interconnects for Intra-chip & Inter-chip Transmission

NASA Astrophysics Data System (ADS)

Narde, Rounak Singh

With the emergence of Internet of Things and information revolution, the demand of high performance computing systems is increasing. The copper interconnects inside the computing chips have evolved into a sophisticated network of interconnects known as Network on Chip (NoC) comprising of routers, switches, repeaters, just like computer networks. When network on chip is implemented on a large scale like in Multicore Multichip (MCMC) systems for High Performance Computing (HPC) systems, length of interconnects increases and so are the problems like power dissipation, interconnect delays, clock synchronization and electrical noise. In this thesis, wireless interconnects are chosen as the substitute for wired copper interconnects. Wireless interconnects offer easy integration with CMOS fabrication and chip packaging. Using wireless interconnects working at unlicensed mm-wave band (57-64GHz), high data rate of Gbps can be achieved. This thesis presents study of transmission between zigzag antennas as wireless interconnects for Multichip multicores (MCMC) systems and 3D IC. For MCMC systems, a four-chips 16-cores model is analyzed with only four wireless interconnects in three configurations with different antenna orientations and locations. Return loss and transmission coefficients are simulated in ANSYS HFSS. Moreover, wireless interconnects are designed, fabricated and tested on a 6'' silicon wafer with resistivity of 55O-cm using a basic standard CMOS process. Wireless interconnect are designed to work at 30GHz using ANSYS HFSS. The fabricated antennas are resonating around 20GHz with a return loss of less than -10dB. The transmission coefficients between antenna pair within a 20mm x 20mm silicon die is found to be varying between -45dB to -55dB. Furthermore, wireless interconnect approach is extended for 3D IC. Wireless interconnects are implemented as zigzag antenna. This thesis extends the work of analyzing the wireless interconnects in 3D IC with different configurations of antenna orientations and coolants. The return loss and transmission coefficients are simulated using ANSYS HFSS.
BigFoot: Bayesian alignment and phylogenetic footprinting with MCMC

PubMed Central

Satija, Rahul; Novák, Ádám; Miklós, István; Lyngsø, Rune; Hein, Jotun

2009-01-01

Background We have previously combined statistical alignment and phylogenetic footprinting to detect conserved functional elements without assuming a fixed alignment. Considering a probability-weighted distribution of alignments removes sensitivity to alignment errors, properly accommodates regions of alignment uncertainty, and increases the accuracy of functional element prediction. Our method utilized standard dynamic programming hidden markov model algorithms to analyze up to four sequences. Results We present a novel approach, implemented in the software package BigFoot, for performing phylogenetic footprinting on greater numbers of sequences. We have developed a Markov chain Monte Carlo (MCMC) approach which samples both sequence alignments and locations of slowly evolving regions. We implement our method as an extension of the existing StatAlign software package and test it on well-annotated regions controlling the expression of the even-skipped gene in Drosophila and the α-globin gene in vertebrates. The results exhibit how adding additional sequences to the analysis has the potential to improve the accuracy of functional predictions, and demonstrate how BigFoot outperforms existing alignment-based phylogenetic footprinting techniques. Conclusion BigFoot extends a combined alignment and phylogenetic footprinting approach to analyze larger amounts of sequence data using MCMC. Our approach is robust to alignment error and uncertainty and can be applied to a variety of biological datasets. The source code and documentation are publicly available for download from PMID:19715598
BigFoot: Bayesian alignment and phylogenetic footprinting with MCMC.

PubMed

Satija, Rahul; Novák, Adám; Miklós, István; Lyngsø, Rune; Hein, Jotun

2009-08-28

We have previously combined statistical alignment and phylogenetic footprinting to detect conserved functional elements without assuming a fixed alignment. Considering a probability-weighted distribution of alignments removes sensitivity to alignment errors, properly accommodates regions of alignment uncertainty, and increases the accuracy of functional element prediction. Our method utilized standard dynamic programming hidden markov model algorithms to analyze up to four sequences. We present a novel approach, implemented in the software package BigFoot, for performing phylogenetic footprinting on greater numbers of sequences. We have developed a Markov chain Monte Carlo (MCMC) approach which samples both sequence alignments and locations of slowly evolving regions. We implement our method as an extension of the existing StatAlign software package and test it on well-annotated regions controlling the expression of the even-skipped gene in Drosophila and the alpha-globin gene in vertebrates. The results exhibit how adding additional sequences to the analysis has the potential to improve the accuracy of functional predictions, and demonstrate how BigFoot outperforms existing alignment-based phylogenetic footprinting techniques. BigFoot extends a combined alignment and phylogenetic footprinting approach to analyze larger amounts of sequence data using MCMC. Our approach is robust to alignment error and uncertainty and can be applied to a variety of biological datasets. The source code and documentation are publicly available for download from http://www.stats.ox.ac.uk/~satija/BigFoot/
Measurements of Kepler Planet Masses and Eccentricities from Transit Timing Variations: Analytic and N-body Results

NASA Astrophysics Data System (ADS)

Hadden, Sam; Lithwick, Yoram

2015-12-01

Several Kepler planets reside in multi-planet systems where gravitational interactions result in transit timing variations (TTVs) that provide exquisitely sensitive probes of their masses of and orbits. Measuring these planets' masses and orbits constrains their bulk compositions and can provide clues about their formation. However, inverting TTV measurements in order to infer planet properties can be challenging: it involves fitting a nonlinear model with a large number of parameters to noisy data, often with significant degeneracies between parameters. I present results from two complementary approaches to TTV inversion: Markov chain Monte Carlo simulations that use N-body integrations to compute transit times and a simplified analytic model for computing the TTVs of planets near mean motion resonances. The analytic model allows for straightforward interpretations of N-body results and provides an independent estimate of parameter uncertainties that can be compared to MCMC results which may be sensitive to factors such as priors. We have conducted extensive MCMC simulations along with analytic fits to model the TTVs of dozens of Kepler multi-planet systems. We find that the bulk of these sub-Jovian planets have low densities that necessitate significant gaseous envelopes. We also find that the planets' eccentricities are generally small but often definitively non-zero.
Estimating a Markovian Epidemic Model Using Household Serial Interval Data from the Early Phase of an Epidemic

PubMed Central

Black, Andrew J.; Ross, Joshua V.

2013-01-01

The clinical serial interval of an infectious disease is the time between date of symptom onset in an index case and the date of symptom onset in one of its secondary cases. It is a quantity which is commonly collected during a pandemic and is of fundamental importance to public health policy and mathematical modelling. In this paper we present a novel method for calculating the serial interval distribution for a Markovian model of household transmission dynamics. This allows the use of Bayesian MCMC methods, with explicit evaluation of the likelihood, to fit to serial interval data and infer parameters of the underlying model. We use simulated and real data to verify the accuracy of our methodology and illustrate the importance of accounting for household size. The output of our approach can be used to produce posterior distributions of population level epidemic characteristics. PMID:24023679
Modeling error distributions of growth curve models through Bayesian methods.

PubMed

Zhang, Zhiyong

2016-06-01

Growth curve models are widely used in social and behavioral sciences. However, typical growth curve models often assume that the errors are normally distributed although non-normal data may be even more common than normal data. In order to avoid possible statistical inference problems in blindly assuming normality, a general Bayesian framework is proposed to flexibly model normal and non-normal data through the explicit specification of the error distributions. A simulation study shows when the distribution of the error is correctly specified, one can avoid the loss in the efficiency of standard error estimates. A real example on the analysis of mathematical ability growth data from the Early Childhood Longitudinal Study, Kindergarten Class of 1998-99 is used to show the application of the proposed methods. Instructions and code on how to conduct growth curve analysis with both normal and non-normal error distributions using the the MCMC procedure of SAS are provided.
Bayesian median regression for temporal gene expression data

NASA Astrophysics Data System (ADS)

Yu, Keming; Vinciotti, Veronica; Liu, Xiaohui; 't Hoen, Peter A. C.

2007-09-01

Most of the existing methods for the identification of biologically interesting genes in a temporal expression profiling dataset do not fully exploit the temporal ordering in the dataset and are based on normality assumptions for the gene expression. In this paper, we introduce a Bayesian median regression model to detect genes whose temporal profile is significantly different across a number of biological conditions. The regression model is defined by a polynomial function where both time and condition effects as well as interactions between the two are included. MCMC-based inference returns the posterior distribution of the polynomial coefficients. From this a simple Bayes factor test is proposed to test for significance. The estimation of the median rather than the mean, and within a Bayesian framework, increases the robustness of the method compared to a Hotelling T2-test previously suggested. This is shown on simulated data and on muscular dystrophy gene expression data.

Systematic evaluation of sequential geostatistical resampling within MCMC for posterior sampling of near-surface geophysical inverse problems

NASA Astrophysics Data System (ADS)

Ruggeri, Paolo; Irving, James; Holliger, Klaus

2015-08-01

We critically examine the performance of sequential geostatistical resampling (SGR) as a model proposal mechanism for Bayesian Markov-chain-Monte-Carlo (MCMC) solutions to near-surface geophysical inverse problems. Focusing on a series of simple yet realistic synthetic crosshole georadar tomographic examples characterized by different numbers of data, levels of data error and degrees of model parameter spatial correlation, we investigate the efficiency of three different resampling strategies with regard to their ability to generate statistically independent realizations from the Bayesian posterior distribution. Quite importantly, our results show that, no matter what resampling strategy is employed, many of the examined test cases require an unreasonably high number of forward model runs to produce independent posterior samples, meaning that the SGR approach as currently implemented will not be computationally feasible for a wide range of problems. Although use of a novel gradual-deformation-based proposal method can help to alleviate these issues, it does not offer a full solution. Further, we find that the nature of the SGR is found to strongly influence MCMC performance; however no clear rule exists as to what set of inversion parameters and/or overall proposal acceptance rate will allow for the most efficient implementation. We conclude that although the SGR methodology is highly attractive as it allows for the consideration of complex geostatistical priors as well as conditioning to hard and soft data, further developments are necessary in the context of novel or hybrid MCMC approaches for it to be considered generally suitable for near-surface geophysical inversions.
Fitting Residual Error Structures for Growth Models in SAS PROC MCMC

ERIC Educational Resources Information Center

McNeish, Daniel

2017-01-01

In behavioral sciences broadly, estimating growth models with Bayesian methods is becoming increasingly common, especially to combat small samples common with longitudinal data. Although Mplus is becoming an increasingly common program for applied research employing Bayesian methods, the limited selection of prior distributions for the elements of…
Parameter identifiability and regional calibration for reservoir inflow prediction

NASA Astrophysics Data System (ADS)

Kolberg, Sjur; Engeland, Kolbjørn; Tøfte, Lena S.; Bruland, Oddbjørn

2013-04-01

The large hydropower producer Statkraft is currently testing regional, distributed models for operational reservoir inflow prediction. The need for simultaneous forecasts and consistent updating in a large number of catchments supports the shift from catchment-oriented to regional models. Low-quality naturalized inflow series in the reservoir catchments further encourages the use of donor catchments and regional simulation for calibration purposes. MCMC based parameter estimation (the Dream algorithm; Vrugt et al, 2009) is adapted to regional parameter estimation, and implemented within the open source ENKI framework. The likelihood is based on the concept of effectively independent number of observations, spatially as well as in time. Marginal and conditional (around an optimum) parameter distributions for each catchment may be extracted, even though the MCMC algorithm itself is guided only by the regional likelihood surface. Early results indicate that the average performance loss associated with regional calibration (difference in Nash-Sutcliffe R2 between regionally and locally optimal parameters) is in the range of 0.06. The importance of the seasonal snow storage and melt in Norwegian mountain catchments probably contributes to the high degree of similarity among catchments. The evaluation continues for several regions, focusing on posterior parameter uncertainty and identifiability. Vrugt, J. A., C. J. F. ter Braak, C. G. H. Diks, B. A. Robinson, J. M. Hyman and D. Higdon: Accelerating Markov Chain Monte Carlo Simulation by Differential Evolution with Self-Adaptive Randomized Subspace Sampling. Int. J. of nonlinear sciences and numerical simulation 10, 3, 273-290, 2009.
Extraction of global 21-cm signal from simulated data for the Dark Ages Radio Explorer (DARE) using an MCMC pipeline

NASA Astrophysics Data System (ADS)

Tauscher, Keith A.; Burns, Jack O.; Rapetti, David; Mirocha, Jordan; Monsalve, Raul A.

2017-01-01

The Dark Ages Radio Explorer (DARE) is a mission concept proposed to NASA in which a crossed dipole antenna collects low frequency (40-120 MHz) radio measurements above the farside of the Moon to detect and characterize the global 21-cm signal from the early (z~35-11) Universe's neutral hydrogen. Simulated data for DARE includes: 1) the global signal modeled using the ares code, 2) spectrally smooth Galactic foregrounds with spatial structure taken from multiple radio foreground maps averaged over a large, well characterized beam, 3) systematics introduced in the data by antenna/receiver reflections, and 4) the Moon. This simulated data is fed into a signal extraction pipeline. As the signal is 4-5 orders of magnitude below the Galactic synchrotron contribution, it is best extracted from the data using Bayesian techniques which take full advantage of prior knowledge of the instrument and foregrounds. For the DARE pipeline, we use the affine-invariant MCMC algorithm implemented in the Python package, emcee. The pipeline also employs singular value decomposition to use known spectral features of the antenna and receiver to form a natural basis with which to fit instrumental systematics. Taking advantage of high-fidelity measurements of the antenna beam (to ~20 ppm) and precise calibration of the instrument, the pipeline extracts the global 21-cm signal with an average RMS error of 10-15 mK for multiple signal models.
Vertically Integrated Seismological Analysis II : Inference

NASA Astrophysics Data System (ADS)

Arora, N. S.; Russell, S.; Sudderth, E.

2009-12-01

Methods for automatically associating detected waveform features with hypothesized seismic events, and localizing those events, are a critical component of efforts to verify the Comprehensive Test Ban Treaty (CTBT). As outlined in our companion abstract, we have developed a hierarchical model which views detection, association, and localization as an integrated probabilistic inference problem. In this abstract, we provide more details on the Markov chain Monte Carlo (MCMC) methods used to solve this inference task. MCMC generates samples from a posterior distribution π(x) over possible worlds x by defining a Markov chain whose states are the worlds x, and whose stationary distribution is π(x). In the Metropolis-Hastings (M-H) method, transitions in the Markov chain are constructed in two steps. First, given the current state x, a candidate next state x‧ is generated from a proposal distribution q(x‧ | x), which may be (more or less) arbitrary. Second, the transition to x‧ is not automatic, but occurs with an acceptance probability—α(x‧ | x) = min(1, π(x‧)q(x | x‧)/π(x)q(x‧ | x)). The seismic event model outlined in our companion abstract is quite similar to those used in multitarget tracking, for which MCMC has proved very effective. In this model, each world x is defined by a collection of events, a list of properties characterizing those events (times, locations, magnitudes, and types), and the association of each event to a set of observed detections. The target distribution π(x) = P(x | y), the posterior distribution over worlds x given the observed waveform data y at all stations. Proposal distributions then implement several types of moves between worlds. For example, birth moves create new events; death moves delete existing events; split moves partition the detections for an event into two new events; merge moves combine event pairs; swap moves modify the properties and assocations for pairs of events. Importantly, the rules for accepting such complex moves need not be hand-designed. Instead, they are automatically determined by the underlying probabilistic model, which is in turn calibrated via historical data and scientific knowledge. Consider a small seismic event which generates weak signals at several different stations, which might independently be mistaken for noise. A birth move may nevertheless hypothesize an event jointly explaining these detections. If the corresponding waveform data then aligns with the seismological knowledge encoded in the probabilistic model, the event may be detected even though no single station observes it unambiguously. Alternatively, if a large outlier reading is produced at a single station, moves which instantiate a corresponding (false) event would be rejected because of the absence of plausible detections at other sensors. More broadly, one of the main advantages of our MCMC approach is its consistent handling of the relative uncertainties in different information sources. By avoiding low-level thresholds, we expect to improve accuracy and robustness. At the conference, we will present results quantitatively validating our approach, using ground-truth associations and locations provided either by simulation or human analysts.
SCoPE: an efficient method of Cosmological Parameter Estimation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Das, Santanu; Souradeep, Tarun, E-mail: santanud@iucaa.ernet.in, E-mail: tarun@iucaa.ernet.in

Markov Chain Monte Carlo (MCMC) sampler is widely used for cosmological parameter estimation from CMB and other data. However, due to the intrinsic serial nature of the MCMC sampler, convergence is often very slow. Here we present a fast and independently written Monte Carlo method for cosmological parameter estimation named as Slick Cosmological Parameter Estimator (SCoPE), that employs delayed rejection to increase the acceptance rate of a chain, and pre-fetching that helps an individual chain to run on parallel CPUs. An inter-chain covariance update is also incorporated to prevent clustering of the chains allowing faster and better mixing of themore » chains. We use an adaptive method for covariance calculation to calculate and update the covariance automatically as the chains progress. Our analysis shows that the acceptance probability of each step in SCoPE is more than 95% and the convergence of the chains are faster. Using SCoPE, we carry out some cosmological parameter estimations with different cosmological models using WMAP-9 and Planck results. One of the current research interests in cosmology is quantifying the nature of dark energy. We analyze the cosmological parameters from two illustrative commonly used parameterisations of dark energy models. We also asses primordial helium fraction in the universe can be constrained by the present CMB data from WMAP-9 and Planck. The results from our MCMC analysis on the one hand helps us to understand the workability of the SCoPE better, on the other hand it provides a completely independent estimation of cosmological parameters from WMAP-9 and Planck data.« less
Hybrid Gibbs Sampling and MCMC for CMB Analysis at Small Angular Scales

NASA Technical Reports Server (NTRS)

Jewell, Jeffrey B.; Eriksen, H. K.; Wandelt, B. D.; Gorski, K. M.; Huey, G.; O'Dwyer, I. J.; Dickinson, C.; Banday, A. J.; Lawrence, C. R.

2008-01-01

A) Gibbs Sampling has now been validated as an efficient, statistically exact, and practically useful method for "low-L" (as demonstrated on WMAP temperature polarization data). B) We are extending Gibbs sampling to directly propagate uncertainties in both foreground and instrument models to total uncertainty in cosmological parameters for the entire range of angular scales relevant for Planck. C) Made possible by inclusion of foreground model parameters in Gibbs sampling and hybrid MCMC and Gibbs sampling for the low signal to noise (high-L) regime. D) Future items to be included in the Bayesian framework include: 1) Integration with Hybrid Likelihood (or posterior) code for cosmological parameters; 2) Include other uncertainties in instrumental systematics? (I.e. beam uncertainties, noise estimation, calibration errors, other).
Emulation: A fast stochastic Bayesian method to eliminate model space

NASA Astrophysics Data System (ADS)

Roberts, Alan; Hobbs, Richard; Goldstein, Michael

2010-05-01

Joint inversion of large 3D datasets has been the goal of geophysicists ever since the datasets first started to be produced. There are two broad approaches to this kind of problem, traditional deterministic inversion schemes and more recently developed Bayesian search methods, such as MCMC (Markov Chain Monte Carlo). However, using both these kinds of schemes has proved prohibitively expensive, both in computing power and time cost, due to the normally very large model space which needs to be searched using forward model simulators which take considerable time to run. At the heart of strategies aimed at accomplishing this kind of inversion is the question of how to reliably and practicably reduce the size of the model space in which the inversion is to be carried out. Here we present a practical Bayesian method, known as emulation, which can address this issue. Emulation is a Bayesian technique used with considerable success in a number of technical fields, such as in astronomy, where the evolution of the universe has been modelled using this technique, and in the petroleum industry where history matching is carried out of hydrocarbon reservoirs. The method of emulation involves building a fast-to-compute uncertainty-calibrated approximation to a forward model simulator. We do this by modelling the output data from a number of forward simulator runs by a computationally cheap function, and then fitting the coefficients defining this function to the model parameters. By calibrating the error of the emulator output with respect to the full simulator output, we can use this to screen out large areas of model space which contain only implausible models. For example, starting with what may be considered a geologically reasonable prior model space of 10000 models, using the emulator we can quickly show that only models which lie within 10% of that model space actually produce output data which is plausibly similar in character to an observed dataset. We can thus much more tightly constrain the input model space for a deterministic inversion or MCMC method. By using this technique jointly on several datasets (specifically seismic, gravity, and magnetotelluric (MT) describing the same region), we can include in our modelling uncertainties in the data measurements, the relationships between the various physical parameters involved, as well as the model representation uncertainty, and at the same time further reduce the range of plausible models to several percent of the original model space. Being stochastic in nature, the output posterior parameter distributions also allow our understanding of/beliefs about a geological region can be objectively updated, with full assessment of uncertainties, and so the emulator is also an inversion-type tool in it's own right, with the advantage (as with any Bayesian method) that our uncertainties from all sources (both data and model) can be fully evaluated.
Low Frequency Flats for Imaging Cameras on the Hubble Space Telescope

NASA Astrophysics Data System (ADS)

Kossakowski, Diana; Avila, Roberto J.; Borncamp, David; Grogin, Norman A.

2017-01-01

We created a revamped Low Frequency Flat (L-Flat) algorithm for the Hubble Space Telescope (HST) and all of its imaging cameras. The current program that makes these calibration files does not compile on modern computer systems and it requires translation to Python. We took the opportunity to explore various methods that reduce the scatter of photometric observations using chi-squared optimizers along with Markov Chain Monte Carlo (MCMC). We created simulations to validate the algorithms and then worked with the UV photometry of the globular cluster NGC6681 to update the calibration files for the Advanced Camera for Surveys (ACS) and Solar Blind Channel (SBC). The new software was made for general usage and therefore can be applied to any of the current imaging cameras on HST.
Honest Importance Sampling with Multiple Markov Chains

PubMed Central

Tan, Aixin; Doss, Hani; Hobert, James P.

2017-01-01

Importance sampling is a classical Monte Carlo technique in which a random sample from one probability density, π1, is used to estimate an expectation with respect to another, π. The importance sampling estimator is strongly consistent and, as long as two simple moment conditions are satisfied, it obeys a central limit theorem (CLT). Moreover, there is a simple consistent estimator for the asymptotic variance in the CLT, which makes for routine computation of standard errors. Importance sampling can also be used in the Markov chain Monte Carlo (MCMC) context. Indeed, if the random sample from π1 is replaced by a Harris ergodic Markov chain with invariant density π1, then the resulting estimator remains strongly consistent. There is a price to be paid however, as the computation of standard errors becomes more complicated. First, the two simple moment conditions that guarantee a CLT in the iid case are not enough in the MCMC context. Second, even when a CLT does hold, the asymptotic variance has a complex form and is difficult to estimate consistently. In this paper, we explain how to use regenerative simulation to overcome these problems. Actually, we consider a more general set up, where we assume that Markov chain samples from several probability densities, π1, …, πk, are available. We construct multiple-chain importance sampling estimators for which we obtain a CLT based on regeneration. We show that if the Markov chains converge to their respective target distributions at a geometric rate, then under moment conditions similar to those required in the iid case, the MCMC-based importance sampling estimator obeys a CLT. Furthermore, because the CLT is based on a regenerative process, there is a simple consistent estimator of the asymptotic variance. We illustrate the method with two applications in Bayesian sensitivity analysis. The first concerns one-way random effects models under different priors. The second involves Bayesian variable selection in linear regression, and for this application, importance sampling based on multiple chains enables an empirical Bayes approach to variable selection. PMID:28701855
Honest Importance Sampling with Multiple Markov Chains.

PubMed

Tan, Aixin; Doss, Hani; Hobert, James P

2015-01-01

Importance sampling is a classical Monte Carlo technique in which a random sample from one probability density, π 1 , is used to estimate an expectation with respect to another, π . The importance sampling estimator is strongly consistent and, as long as two simple moment conditions are satisfied, it obeys a central limit theorem (CLT). Moreover, there is a simple consistent estimator for the asymptotic variance in the CLT, which makes for routine computation of standard errors. Importance sampling can also be used in the Markov chain Monte Carlo (MCMC) context. Indeed, if the random sample from π 1 is replaced by a Harris ergodic Markov chain with invariant density π 1 , then the resulting estimator remains strongly consistent. There is a price to be paid however, as the computation of standard errors becomes more complicated. First, the two simple moment conditions that guarantee a CLT in the iid case are not enough in the MCMC context. Second, even when a CLT does hold, the asymptotic variance has a complex form and is difficult to estimate consistently. In this paper, we explain how to use regenerative simulation to overcome these problems. Actually, we consider a more general set up, where we assume that Markov chain samples from several probability densities, π 1 , …, π k , are available. We construct multiple-chain importance sampling estimators for which we obtain a CLT based on regeneration. We show that if the Markov chains converge to their respective target distributions at a geometric rate, then under moment conditions similar to those required in the iid case, the MCMC-based importance sampling estimator obeys a CLT. Furthermore, because the CLT is based on a regenerative process, there is a simple consistent estimator of the asymptotic variance. We illustrate the method with two applications in Bayesian sensitivity analysis. The first concerns one-way random effects models under different priors. The second involves Bayesian variable selection in linear regression, and for this application, importance sampling based on multiple chains enables an empirical Bayes approach to variable selection.
State and parameter estimation of two land surface models using the ensemble Kalman filter and the particle filter

NASA Astrophysics Data System (ADS)

Zhang, Hongjuan; Hendricks Franssen, Harrie-Jan; Han, Xujun; Vrugt, Jasper A.; Vereecken, Harry

2017-09-01

Land surface models (LSMs) use a large cohort of parameters and state variables to simulate the water and energy balance at the soil-atmosphere interface. Many of these model parameters cannot be measured directly in the field, and require calibration against measured fluxes of carbon dioxide, sensible and/or latent heat, and/or observations of the thermal and/or moisture state of the soil. Here, we evaluate the usefulness and applicability of four different data assimilation methods for joint parameter and state estimation of the Variable Infiltration Capacity Model (VIC-3L) and the Community Land Model (CLM) using a 5-month calibration (assimilation) period (March-July 2012) of areal-averaged SPADE soil moisture measurements at 5, 20, and 50 cm depths in the Rollesbroich experimental test site in the Eifel mountain range in western Germany. We used the EnKF with state augmentation or dual estimation, respectively, and the residual resampling PF with a simple, statistically deficient, or more sophisticated, MCMC-based parameter resampling method. The performance of the calibrated LSM models was investigated using SPADE water content measurements of a 5-month evaluation period (August-December 2012). As expected, all DA methods enhance the ability of the VIC and CLM models to describe spatiotemporal patterns of moisture storage within the vadose zone of the Rollesbroich site, particularly if the maximum baseflow velocity (VIC) or fractions of sand, clay, and organic matter of each layer (CLM) are estimated jointly with the model states of each soil layer. The differences between the soil moisture simulations of VIC-3L and CLM are much larger than the discrepancies among the four data assimilation methods. The EnKF with state augmentation or dual estimation yields the best performance of VIC-3L and CLM during the calibration and evaluation period, yet results are in close agreement with the PF using MCMC resampling. Overall, CLM demonstrated the best performance for the Rollesbroich site. The large systematic underestimation of water storage at 50 cm depth by VIC-3L during the first few months of the evaluation period questions, in part, the validity of its fixed water table depth at the bottom of the modeled soil domain.
The cosmological analysis of X-ray cluster surveys. IV. Testing ASpiX with template-based cosmological simulations

NASA Astrophysics Data System (ADS)

Valotti, A.; Pierre, M.; Farahi, A.; Evrard, A.; Faccioli, L.; Sauvageot, J.-L.; Clerc, N.; Pacaud, F.

2018-06-01

Context. This paper is the fourth of a series evaluating the ASpiX cosmological method, based on X-ray diagrams, which are constructed from simple cluster observable quantities, namely: count rate (CR), hardness ratio (HR), core radius (rc), and redshift. Aims: Following extensive tests on analytical toy catalogues (Paper III), we present the results of a more realistic study over a 711 deg2 template-based maps derived from a cosmological simulation. Methods: Dark matter haloes from the Aardvark simulation have been ascribed luminosities, temperatures, and core radii, using local scaling relations and assuming self-similar evolution. The predicted X-ray sky-maps were converted into XMM event lists, using a detailed instrumental simulator. The XXL pipeline runs on the resulting sky images, produces an observed cluster catalogue over which the tests have been performed. This allowed us to investigate the relative power of various combinations of the CR, HR, rc, and redshift information. Two fitting methods were used: a traditional Markov chain Monte Carlo (MCMC) approach and a simple minimisation procedure (Amoeba) whose mean uncertainties are a posteriori evaluated by means of synthetic catalogues. The results were analysed and compared to the predictions from the Fisher analysis (FA). Results: For this particular catalogue realisation, assuming that the scaling relations are perfectly known, the CR-HR combination gives σ8 and Ωm at the 10% level, while CR-HR-rc-z improves this to ≤3%. Adding a second HR improves the results from the CR-HR1-rc combination, but to a lesser extent than when adding the redshift information. When all coefficients of the mass-temperature relation (M-T, including scatter) are also fitted, the cosmological parameters are constrained to within 5-10% and larger for the M-T coefficients (up to a factor of two for the scatter). The errors returned by the MCMC, those by Amoeba and the FA predictions are in most cases in excellent agreement and always within a factor of two. We also study the impact of the scatter of the mass-size relation (M-Rc) on the number of detected clusters: for the cluster typical sizes usually assumed, the larger the scatter, the lower the number of detected objects. Conclusions: The present study confirms and extends the trends outlined in our previous analyses, namely the power of X-ray observable diagrams to successfully and easily fit at the same time, the cosmological parameters, cluster physics, and the survey selection, by involving all detected clusters. The accuracy levels quoted should not be considered as definitive. A number of simplifying hypotheses were made for the testing purpose, but this should affect any method in the same way. The next publication will consider in greater detail the impact of cluster shapes (selection and measurements) and of cluster physics on the final error budget by means of hydrodynamical simulations.
Item Response Theory Equating Using Bayesian Informative Priors.

ERIC Educational Resources Information Center

de la Torre, Jimmy; Patz, Richard J.

This paper seeks to extend the application of Markov chain Monte Carlo (MCMC) methods in item response theory (IRT) to include the estimation of equating relationships along with the estimation of test item parameters. A method is proposed that incorporates estimation of the equating relationship in the item calibration phase. Item parameters from…
Active Subspace Methods for Data-Intensive Inverse Problems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wang, Qiqi

2017-04-27

The project has developed theory and computational tools to exploit active subspaces to reduce the dimension in statistical calibration problems. This dimension reduction enables MCMC methods to calibrate otherwise intractable models. The same theoretical and computational tools can also reduce the measurement dimension for calibration problems that use large stores of data.
A Bayesian Retrieval of Greenland Ice Sheet Internal Temperature from Ultra-wideband Software-defined Microwave Radiometer (UWBRAD) Measurements

NASA Astrophysics Data System (ADS)

Duan, Y.; Durand, M. T.; Jezek, K. C.; Yardim, C.; Bringer, A.; Aksoy, M.; Johnson, J. T.

2017-12-01

The ultra-wideband software-defined microwave radiometer (UWBRAD) is designed to provide ice sheet internal temperature product via measuring low frequency microwave emission. Twelve channels ranging from 0.5 to 2.0 GHz are covered by the instrument. A Greenland air-borne demonstration was demonstrated in September 2016, provided first demonstration of Ultra-wideband radiometer observations of geophysical scenes, including ice sheets. Another flight is planned for September 2017 for acquiring measurements in central ice sheet. A Bayesian framework is designed to retrieve the ice sheet internal temperature from simulated UWBRAD brightness temperature (Tb) measurements over Greenland flight path with limited prior information of the ground. A 1-D heat-flow model, the Robin Model, was used to model the ice sheet internal temperature profile with ground information. Synthetic UWBRAD Tb observations was generated via the partially coherent radiation transfer model, which utilizes the Robin model temperature profile and an exponential fit of ice density from Borehole measurement as input, and corrupted with noise. The effective surface temperature, geothermal heat flux, the variance of upper layer ice density, and the variance of fine scale density variation at deeper ice sheet were treated as unknown variables within the retrieval framework. Each parameter is defined with its possible range and set to be uniformly distributed. The Markov Chain Monte Carlo (MCMC) approach is applied to make the unknown parameters randomly walk in the parameter space. We investigate whether the variables can be improved over priors using the MCMC approach and contribute to the temperature retrieval theoretically. UWBRAD measurements near camp century from 2016 was also treated with the MCMC to examine the framework with scattering effect. The fine scale density fluctuation is an important parameter. It is the most sensitive yet highly unknown parameter in the estimation framework. Including the fine scale density fluctuation greatly improved the retrieval results. The ice sheet vertical temperature profile, especially the 10m temperature, can be well retrieved via the MCMC process. Future retrieval work will apply the Bayesian approach to UWBRAD airborne measurements.
Population forecasts for Bangladesh, using a Bayesian methodology.

PubMed

Mahsin, Md; Hossain, Syed Shahadat

2012-12-01

Population projection for many developing countries could be quite a challenging task for the demographers mostly due to lack of availability of enough reliable data. The objective of this paper is to present an overview of the existing methods for population forecasting and to propose an alternative based on the Bayesian statistics, combining the formality of inference. The analysis has been made using Markov Chain Monte Carlo (MCMC) technique for Bayesian methodology available with the software WinBUGS. Convergence diagnostic techniques available with the WinBUGS software have been applied to ensure the convergence of the chains necessary for the implementation of MCMC. The Bayesian approach allows for the use of observed data and expert judgements by means of appropriate priors, and a more realistic population forecasts, along with associated uncertainty, has been possible.
Assessment of parameter uncertainty in hydrological model using a Markov-Chain-Monte-Carlo-based multilevel-factorial-analysis method

NASA Astrophysics Data System (ADS)

Zhang, Junlong; Li, Yongping; Huang, Guohe; Chen, Xi; Bao, Anming

2016-07-01

Without a realistic assessment of parameter uncertainty, decision makers may encounter difficulties in accurately describing hydrologic processes and assessing relationships between model parameters and watershed characteristics. In this study, a Markov-Chain-Monte-Carlo-based multilevel-factorial-analysis (MCMC-MFA) method is developed, which can not only generate samples of parameters from a well constructed Markov chain and assess parameter uncertainties with straightforward Bayesian inference, but also investigate the individual and interactive effects of multiple parameters on model output through measuring the specific variations of hydrological responses. A case study is conducted for addressing parameter uncertainties in the Kaidu watershed of northwest China. Effects of multiple parameters and their interactions are quantitatively investigated using the MCMC-MFA with a three-level factorial experiment (totally 81 runs). A variance-based sensitivity analysis method is used to validate the results of parameters' effects. Results disclose that (i) soil conservation service runoff curve number for moisture condition II (CN2) and fraction of snow volume corresponding to 50% snow cover (SNO50COV) are the most significant factors to hydrological responses, implying that infiltration-excess overland flow and snow water equivalent represent important water input to the hydrological system of the Kaidu watershed; (ii) saturate hydraulic conductivity (SOL_K) and soil evaporation compensation factor (ESCO) have obvious effects on hydrological responses; this implies that the processes of percolation and evaporation would impact hydrological process in this watershed; (iii) the interactions of ESCO and SNO50COV as well as CN2 and SNO50COV have an obvious effect, implying that snow cover can impact the generation of runoff on land surface and the extraction of soil evaporative demand in lower soil layers. These findings can help enhance the hydrological model's capability for simulating/predicting water resources.
Real-time characterization of partially observed epidemics using surrogate models.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Safta, Cosmin; Ray, Jaideep; Lefantzi, Sophia

We present a statistical method, predicated on the use of surrogate models, for the 'real-time' characterization of partially observed epidemics. Observations consist of counts of symptomatic patients, diagnosed with the disease, that may be available in the early epoch of an ongoing outbreak. Characterization, in this context, refers to estimation of epidemiological parameters that can be used to provide short-term forecasts of the ongoing epidemic, as well as to provide gross information on the dynamics of the etiologic agent in the affected population e.g., the time-dependent infection rate. The characterization problem is formulated as a Bayesian inverse problem, and epidemiologicalmore » parameters are estimated as distributions using a Markov chain Monte Carlo (MCMC) method, thus quantifying the uncertainty in the estimates. In some cases, the inverse problem can be computationally expensive, primarily due to the epidemic simulator used inside the inversion algorithm. We present a method, based on replacing the epidemiological model with computationally inexpensive surrogates, that can reduce the computational time to minutes, without a significant loss of accuracy. The surrogates are created by projecting the output of an epidemiological model on a set of polynomial chaos bases; thereafter, computations involving the surrogate model reduce to evaluations of a polynomial. We find that the epidemic characterizations obtained with the surrogate models is very close to that obtained with the original model. We also find that the number of projections required to construct a surrogate model is O(10)-O(10{sup 2}) less than the number of samples required by the MCMC to construct a stationary posterior distribution; thus, depending upon the epidemiological models in question, it may be possible to omit the offline creation and caching of surrogate models, prior to their use in an inverse problem. The technique is demonstrated on synthetic data as well as observations from the 1918 influenza pandemic collected at Camp Custer, Michigan.« less
Two new methods to fit models for network meta-analysis with random inconsistency effects.

PubMed

Law, Martin; Jackson, Dan; Turner, Rebecca; Rhodes, Kirsty; Viechtbauer, Wolfgang

2016-07-28

Meta-analysis is a valuable tool for combining evidence from multiple studies. Network meta-analysis is becoming more widely used as a means to compare multiple treatments in the same analysis. However, a network meta-analysis may exhibit inconsistency, whereby the treatment effect estimates do not agree across all trial designs, even after taking between-study heterogeneity into account. We propose two new estimation methods for network meta-analysis models with random inconsistency effects. The model we consider is an extension of the conventional random-effects model for meta-analysis to the network meta-analysis setting and allows for potential inconsistency using random inconsistency effects. Our first new estimation method uses a Bayesian framework with empirically-based prior distributions for both the heterogeneity and the inconsistency variances. We fit the model using importance sampling and thereby avoid some of the difficulties that might be associated with using Markov Chain Monte Carlo (MCMC). However, we confirm the accuracy of our importance sampling method by comparing the results to those obtained using MCMC as the gold standard. The second new estimation method we describe uses a likelihood-based approach, implemented in the metafor package, which can be used to obtain (restricted) maximum-likelihood estimates of the model parameters and profile likelihood confidence intervals of the variance components. We illustrate the application of the methods using two contrasting examples. The first uses all-cause mortality as an outcome, and shows little evidence of between-study heterogeneity or inconsistency. The second uses "ear discharge" as an outcome, and exhibits substantial between-study heterogeneity and inconsistency. Both new estimation methods give results similar to those obtained using MCMC. The extent of heterogeneity and inconsistency should be assessed and reported in any network meta-analysis. Our two new methods can be used to fit models for network meta-analysis with random inconsistency effects. They are easily implemented using the accompanying R code in the Additional file 1. Using these estimation methods, the extent of inconsistency can be assessed and reported.

Stochastic Inversion of 2D Magnetotelluric Data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Jinsong

2010-07-01

The algorithm is developed to invert 2D magnetotelluric (MT) data based on sharp boundary parametrization using a Bayesian framework. Within the algorithm, we consider the locations and the resistivity of regions formed by the interfaces are as unknowns. We use a parallel, adaptive finite-element algorithm to forward simulate frequency-domain MT responses of 2D conductivity structure. Those unknown parameters are spatially correlated and are described by a geostatistical model. The joint posterior probability distribution function is explored by Markov Chain Monte Carlo (MCMC) sampling methods. The developed stochastic model is effective for estimating the interface locations and resistivity. Most importantly, itmore » provides details uncertainty information on each unknown parameter. Hardware requirements: PC, Supercomputer, Multi-platform, Workstation; Software requirements C and Fortan; Operation Systems/version is Linux/Unix or Windows« less
Bayesian analysis of the flutter margin method in aeroelasticity

DOE PAGES

Khalil, Mohammad; Poirel, Dominique; Sarkar, Abhijit

2016-08-27

A Bayesian statistical framework is presented for Zimmerman and Weissenburger flutter margin method which considers the uncertainties in aeroelastic modal parameters. The proposed methodology overcomes the limitations of the previously developed least-square based estimation technique which relies on the Gaussian approximation of the flutter margin probability density function (pdf). Using the measured free-decay responses at subcritical (preflutter) airspeeds, the joint non-Gaussain posterior pdf of the modal parameters is sampled using the Metropolis–Hastings (MH) Markov chain Monte Carlo (MCMC) algorithm. The posterior MCMC samples of the modal parameters are then used to obtain the flutter margin pdfs and finally the fluttermore » speed pdf. The usefulness of the Bayesian flutter margin method is demonstrated using synthetic data generated from a two-degree-of-freedom pitch-plunge aeroelastic model. The robustness of the statistical framework is demonstrated using different sets of measurement data. In conclusion, it will be shown that the probabilistic (Bayesian) approach reduces the number of test points required in providing a flutter speed estimate for a given accuracy and precision.« less
Bayesian additive decision trees of biomarker by treatment interactions for predictive biomarker detection and subgroup identification.

PubMed

Zhao, Yang; Zheng, Wei; Zhuo, Daisy Y; Lu, Yuefeng; Ma, Xiwen; Liu, Hengchang; Zeng, Zhen; Laird, Glen

2017-10-11

Personalized medicine, or tailored therapy, has been an active and important topic in recent medical research. Many methods have been proposed in the literature for predictive biomarker detection and subgroup identification. In this article, we propose a novel decision tree-based approach applicable in randomized clinical trials. We model the prognostic effects of the biomarkers using additive regression trees and the biomarker-by-treatment effect using a single regression tree. Bayesian approach is utilized to periodically revise the split variables and the split rules of the decision trees, which provides a better overall fitting. Gibbs sampler is implemented in the MCMC procedure, which updates the prognostic trees and the interaction tree separately. We use the posterior distribution of the interaction tree to construct the predictive scores of the biomarkers and to identify the subgroup where the treatment is superior to the control. Numerical simulations show that our proposed method performs well under various settings comparing to existing methods. We also demonstrate an application of our method in a real clinical trial.
Bayesian Treed Multivariate Gaussian Process with Adaptive Design: Application to a Carbon Capture Unit

DOE Office of Scientific and Technical Information (OSTI.GOV)

Konomi, Bledar A.; Karagiannis, Georgios; Sarkar, Avik

2014-05-16

Computer experiments (numerical simulations) are widely used in scientific research to study and predict the behavior of complex systems, which usually have responses consisting of a set of distinct outputs. The computational cost of the simulations at high resolution are often expensive and become impractical for parametric studies at different input values. To overcome these difficulties we develop a Bayesian treed multivariate Gaussian process (BTMGP) as an extension of the Bayesian treed Gaussian process (BTGP) in order to model and evaluate a multivariate process. A suitable choice of covariance function and the prior distributions facilitates the different Markov chain Montemore » Carlo (MCMC) movements. We utilize this model to sequentially sample the input space for the most informative values, taking into account model uncertainty and expertise gained. A simulation study demonstrates the use of the proposed method and compares it with alternative approaches. We apply the sequential sampling technique and BTMGP to model the multiphase flow in a full scale regenerator of a carbon capture unit. The application presented in this paper is an important tool for research into carbon dioxide emissions from thermal power plants.« less
Reciprocal Sliding Friction Model for an Electro-Deposited Coating and Its Parameter Estimation Using Markov Chain Monte Carlo Method

PubMed Central

Kim, Kyungmok; Lee, Jaewook

2016-01-01

This paper describes a sliding friction model for an electro-deposited coating. Reciprocating sliding tests using ball-on-flat plate test apparatus are performed to determine an evolution of the kinetic friction coefficient. The evolution of the friction coefficient is classified into the initial running-in period, steady-state sliding, and transition to higher friction. The friction coefficient during the initial running-in period and steady-state sliding is expressed as a simple linear function. The friction coefficient in the transition to higher friction is described with a mathematical model derived from Kachanov-type damage law. The model parameters are then estimated using the Markov Chain Monte Carlo (MCMC) approach. It is identified that estimated friction coefficients obtained by MCMC approach are in good agreement with measured ones. PMID:28773359
The inner mass power spectrum of galaxies using strong gravitational lensing: beyond linear approximation

NASA Astrophysics Data System (ADS)

Chatterjee, Saikat; Koopmans, Léon V. E.

2018-02-01

In the last decade, the detection of individual massive dark matter sub-haloes has been possible using potential correction formalism in strong gravitational lens imaging. Here, we propose a statistical formalism to relate strong gravitational lens surface brightness anomalies to the lens potential fluctuations arising from dark matter distribution in the lens galaxy. We consider these fluctuations as a Gaussian random field in addition to the unperturbed smooth lens model. This is very similar to weak lensing formalism and we show that in this way we can measure the power spectrum of these perturbations to the potential. We test the method by applying it to simulated mock lenses of different geometries and by performing an MCMC analysis of the theoretical power spectra. This method can measure density fluctuations in early type galaxies on scales of 1-10 kpc at typical rms levels of a per cent, using a single lens system observed with the Hubble Space Telescope with typical signal-to-noise ratios obtained in a single orbit.
Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geostatistical Datasets.

PubMed

Datta, Abhirup; Banerjee, Sudipto; Finley, Andrew O; Gelfand, Alan E

2016-01-01

Spatial process models for analyzing geostatistical data entail computations that become prohibitive as the number of spatial locations become large. This article develops a class of highly scalable nearest-neighbor Gaussian process (NNGP) models to provide fully model-based inference for large geostatistical datasets. We establish that the NNGP is a well-defined spatial process providing legitimate finite-dimensional Gaussian densities with sparse precision matrices. We embed the NNGP as a sparsity-inducing prior within a rich hierarchical modeling framework and outline how computationally efficient Markov chain Monte Carlo (MCMC) algorithms can be executed without storing or decomposing large matrices. The floating point operations (flops) per iteration of this algorithm is linear in the number of spatial locations, thereby rendering substantial scalability. We illustrate the computational and inferential benefits of the NNGP over competing methods using simulation studies and also analyze forest biomass from a massive U.S. Forest Inventory dataset at a scale that precludes alternative dimension-reducing methods. Supplementary materials for this article are available online.
Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geostatistical Datasets

PubMed Central

Datta, Abhirup; Banerjee, Sudipto; Finley, Andrew O.; Gelfand, Alan E.

2018-01-01

Spatial process models for analyzing geostatistical data entail computations that become prohibitive as the number of spatial locations become large. This article develops a class of highly scalable nearest-neighbor Gaussian process (NNGP) models to provide fully model-based inference for large geostatistical datasets. We establish that the NNGP is a well-defined spatial process providing legitimate finite-dimensional Gaussian densities with sparse precision matrices. We embed the NNGP as a sparsity-inducing prior within a rich hierarchical modeling framework and outline how computationally efficient Markov chain Monte Carlo (MCMC) algorithms can be executed without storing or decomposing large matrices. The floating point operations (flops) per iteration of this algorithm is linear in the number of spatial locations, thereby rendering substantial scalability. We illustrate the computational and inferential benefits of the NNGP over competing methods using simulation studies and also analyze forest biomass from a massive U.S. Forest Inventory dataset at a scale that precludes alternative dimension-reducing methods. Supplementary materials for this article are available online. PMID:29720777
Linking big models to big data: efficient ecosystem model calibration through Bayesian model emulation

NASA Astrophysics Data System (ADS)

Fer, I.; Kelly, R.; Andrews, T.; Dietze, M.; Richardson, A. D.

2016-12-01

Our ability to forecast ecosystems is limited by how well we parameterize ecosystem models. Direct measurements for all model parameters are not always possible and inverse estimation of these parameters through Bayesian methods is computationally costly. A solution to computational challenges of Bayesian calibration is to approximate the posterior probability surface using a Gaussian Process that emulates the complex process-based model. Here we report the integration of this method within an ecoinformatics toolbox, Predictive Ecosystem Analyzer (PEcAn), and its application with two ecosystem models: SIPNET and ED2.1. SIPNET is a simple model, allowing application of MCMC methods both to the model itself and to its emulator. We used both approaches to assimilate flux (CO2 and latent heat), soil respiration, and soil carbon data from Bartlett Experimental Forest. This comparison showed that emulator is reliable in terms of convergence to the posterior distribution. A 10000-iteration MCMC analysis with SIPNET itself required more than two orders of magnitude greater computation time than an MCMC run of same length with its emulator. This difference would be greater for a more computationally demanding model. Validation of the emulator-calibrated SIPNET against both the assimilated data and out-of-sample data showed improved fit and reduced uncertainty around model predictions. We next applied the validated emulator method to the ED2, whose complexity precludes standard Bayesian data assimilation. We used the ED2 emulator to assimilate demographic data from a network of inventory plots. For validation of the calibrated ED2, we compared the model to results from Empirical Succession Mapping (ESM), a novel synthesis of successional patterns in Forest Inventory and Analysis data. Our results revealed that while the pre-assimilation ED2 formulation cannot capture the emergent demographic patterns from ESM analysis, constrained model parameters controlling demographic processes increased their agreement considerably.
A Bayesian approach to modeling 2D gravity data using polygon states

NASA Astrophysics Data System (ADS)

Titus, W. J.; Titus, S.; Davis, J. R.

2015-12-01

We present a Bayesian Markov chain Monte Carlo (MCMC) method for the 2D gravity inversion of a localized subsurface object with constant density contrast. Our models have four parameters: the density contrast, the number of vertices in a polygonal approximation of the object, an upper bound on the ratio of the perimeter squared to the area, and the vertices of a polygon container that bounds the object. Reasonable parameter values can be estimated prior to inversion using a forward model and geologic information. In addition, we assume that the field data have a common random uncertainty that lies between two bounds but that it has no systematic uncertainty. Finally, we assume that there is no uncertainty in the spatial locations of the measurement stations. For any set of model parameters, we use MCMC methods to generate an approximate probability distribution of polygons for the object. We then compute various probability distributions for the object, including the variance between the observed and predicted fields (an important quantity in the MCMC method), the area, the center of area, and the occupancy probability (the probability that a spatial point lies within the object). In addition, we compare probabilities of different models using parallel tempering, a technique which also mitigates trapping in local optima that can occur in certain model geometries. We apply our method to several synthetic data sets generated from objects of varying shape and location. We also analyze a natural data set collected across the Rio Grande Gorge Bridge in New Mexico, where the object (i.e. the air below the bridge) is known and the canyon is approximately 2D. Although there are many ways to view results, the occupancy probability proves quite powerful. We also find that the choice of the container is important. In particular, large containers should be avoided, because the more closely a container confines the object, the better the predictions match properties of object.
An adaptive sparse-grid high-order stochastic collocation method for Bayesian inference in groundwater reactive transport modeling

NASA Astrophysics Data System (ADS)

Zhang, Guannan; Lu, Dan; Ye, Ming; Gunzburger, Max; Webster, Clayton

2013-10-01

Bayesian analysis has become vital to uncertainty quantification in groundwater modeling, but its application has been hindered by the computational cost associated with numerous model executions required by exploring the posterior probability density function (PPDF) of model parameters. This is particularly the case when the PPDF is estimated using Markov Chain Monte Carlo (MCMC) sampling. In this study, a new approach is developed to improve the computational efficiency of Bayesian inference by constructing a surrogate of the PPDF, using an adaptive sparse-grid high-order stochastic collocation (aSG-hSC) method. Unlike previous works using first-order hierarchical basis, this paper utilizes a compactly supported higher-order hierarchical basis to construct the surrogate system, resulting in a significant reduction in the number of required model executions. In addition, using the hierarchical surplus as an error indicator allows locally adaptive refinement of sparse grids in the parameter space, which further improves computational efficiency. To efficiently build the surrogate system for the PPDF with multiple significant modes, optimization techniques are used to identify the modes, for which high-probability regions are defined and components of the aSG-hSC approximation are constructed. After the surrogate is determined, the PPDF can be evaluated by sampling the surrogate system directly without model execution, resulting in improved efficiency of the surrogate-based MCMC compared with conventional MCMC. The developed method is evaluated using two synthetic groundwater reactive transport models. The first example involves coupled linear reactions and demonstrates the accuracy of our high-order hierarchical basis approach in approximating high-dimensional posteriori distribution. The second example is highly nonlinear because of the reactions of uranium surface complexation, and demonstrates how the iterative aSG-hSC method is able to capture multimodal and non-Gaussian features of PPDF caused by model nonlinearity. Both experiments show that aSG-hSC is an effective and efficient tool for Bayesian inference.
Inference of multi-Gaussian property fields by probabilistic inversion of crosshole ground penetrating radar data using an improved dimensionality reduction

NASA Astrophysics Data System (ADS)

Hunziker, Jürg; Laloy, Eric; Linde, Niklas

2016-04-01

Deterministic inversion procedures can often explain field data, but they only deliver one final subsurface model that depends on the initial model and regularization constraints. This leads to poor insights about the uncertainties associated with the inferred model properties. In contrast, probabilistic inversions can provide an ensemble of model realizations that accurately span the range of possible models that honor the available calibration data and prior information allowing a quantitative description of model uncertainties. We reconsider the problem of inferring the dielectric permittivity (directly related to radar velocity) structure of the subsurface by inversion of first-arrival travel times from crosshole ground penetrating radar (GPR) measurements. We rely on the DREAM_(ZS) algorithm that is a state-of-the-art Markov chain Monte Carlo (MCMC) algorithm. Such algorithms need several orders of magnitude more forward simulations than deterministic algorithms and often become infeasible in high parameter dimensions. To enable high-resolution imaging with MCMC, we use a recently proposed dimensionality reduction approach that allows reproducing 2D multi-Gaussian fields with far fewer parameters than a classical grid discretization. We consider herein a dimensionality reduction from 5000 to 257 unknowns. The first 250 parameters correspond to a spectral representation of random and uncorrelated spatial fluctuations while the remaining seven geostatistical parameters are (1) the standard deviation of the data error, (2) the mean and (3) the variance of the relative electric permittivity, (4) the integral scale along the major axis of anisotropy, (5) the anisotropy angle, (6) the ratio of the integral scale along the minor axis of anisotropy to the integral scale along the major axis of anisotropy and (7) the shape parameter of the Matérn function. The latter essentially defines the type of covariance function (e.g., exponential, Whittle, Gaussian). We present an improved formulation of the dimensionality reduction, and numerically show how it reduces artifacts in the generated models and provides better posterior estimation of the subsurface geostatistical structure. We next show that the results of the method compare very favorably against previous deterministic and stochastic inversion results obtained at the South Oyster Bacterial Transport Site in Virginia, USA. The long-term goal of this work is to enable MCMC-based full waveform inversion of crosshole GPR data.
MCMC genome rearrangement.

PubMed

Miklós, István

2003-10-01

As more and more genomes have been sequenced, genomic data is rapidly accumulating. Genome-wide mutations are believed more neutral than local mutations such as substitutions, insertions and deletions, therefore phylogenetic investigations based on inversions, transpositions and inverted transpositions are less biased by the hypothesis on neutral evolution. Although efficient algorithms exist for obtaining the inversion distance of two signed permutations, there is no reliable algorithm when both inversions and transpositions are considered. Moreover, different type of mutations happen with different rates, and it is not clear how to weight them in a distance based approach. We introduce a Markov Chain Monte Carlo method to genome rearrangement based on a stochastic model of evolution, which can estimate the number of different evolutionary events needed to sort a signed permutation. The performance of the method was tested on simulated data, and the estimated numbers of different types of mutations were reliable. Human and Drosophila mitochondrial data were also analysed with the new method. The mixing time of the Markov Chain is short both in terms of CPU times and number of proposals. The source code in C is available on request from the author.
Monte Carlo Bayesian inference on a statistical model of sub-gridcolumn moisture variability using high-resolution cloud observations. Part 1: Method.

PubMed

Norris, Peter M; da Silva, Arlindo M

2016-07-01

A method is presented to constrain a statistical model of sub-gridcolumn moisture variability using high-resolution satellite cloud data. The method can be used for large-scale model parameter estimation or cloud data assimilation. The gridcolumn model includes assumed probability density function (PDF) intra-layer horizontal variability and a copula-based inter-layer correlation model. The observables used in the current study are Moderate Resolution Imaging Spectroradiometer (MODIS) cloud-top pressure, brightness temperature and cloud optical thickness, but the method should be extensible to direct cloudy radiance assimilation for a small number of channels. The algorithm is a form of Bayesian inference with a Markov chain Monte Carlo (MCMC) approach to characterizing the posterior distribution. This approach is especially useful in cases where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach is not gradient-based and allows jumps into regions of non-zero cloud probability. The current study uses a skewed-triangle distribution for layer moisture. The article also includes a discussion of the Metropolis and multiple-try Metropolis versions of MCMC.
Monte Carlo Bayesian Inference on a Statistical Model of Sub-Gridcolumn Moisture Variability Using High-Resolution Cloud Observations. Part 1: Method

NASA Technical Reports Server (NTRS)

Norris, Peter M.; Da Silva, Arlindo M.

2016-01-01

A method is presented to constrain a statistical model of sub-gridcolumn moisture variability using high-resolution satellite cloud data. The method can be used for large-scale model parameter estimation or cloud data assimilation. The gridcolumn model includes assumed probability density function (PDF) intra-layer horizontal variability and a copula-based inter-layer correlation model. The observables used in the current study are Moderate Resolution Imaging Spectroradiometer (MODIS) cloud-top pressure, brightness temperature and cloud optical thickness, but the method should be extensible to direct cloudy radiance assimilation for a small number of channels. The algorithm is a form of Bayesian inference with a Markov chain Monte Carlo (MCMC) approach to characterizing the posterior distribution. This approach is especially useful in cases where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach is not gradient-based and allows jumps into regions of non-zero cloud probability. The current study uses a skewed-triangle distribution for layer moisture. The article also includes a discussion of the Metropolis and multiple-try Metropolis versions of MCMC.
Monte Carlo Bayesian inference on a statistical model of sub-gridcolumn moisture variability using high-resolution cloud observations. Part 1: Method

PubMed Central

Norris, Peter M.; da Silva, Arlindo M.

2018-01-01

A method is presented to constrain a statistical model of sub-gridcolumn moisture variability using high-resolution satellite cloud data. The method can be used for large-scale model parameter estimation or cloud data assimilation. The gridcolumn model includes assumed probability density function (PDF) intra-layer horizontal variability and a copula-based inter-layer correlation model. The observables used in the current study are Moderate Resolution Imaging Spectroradiometer (MODIS) cloud-top pressure, brightness temperature and cloud optical thickness, but the method should be extensible to direct cloudy radiance assimilation for a small number of channels. The algorithm is a form of Bayesian inference with a Markov chain Monte Carlo (MCMC) approach to characterizing the posterior distribution. This approach is especially useful in cases where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach is not gradient-based and allows jumps into regions of non-zero cloud probability. The current study uses a skewed-triangle distribution for layer moisture. The article also includes a discussion of the Metropolis and multiple-try Metropolis versions of MCMC. PMID:29618847
Bayesian inference for OPC modeling

NASA Astrophysics Data System (ADS)

Burbine, Andrew; Sturtevant, John; Fryer, David; Smith, Bruce W.

2016-03-01

The use of optical proximity correction (OPC) demands increasingly accurate models of the photolithographic process. Model building and inference techniques in the data science community have seen great strides in the past two decades which make better use of available information. This paper aims to demonstrate the predictive power of Bayesian inference as a method for parameter selection in lithographic models by quantifying the uncertainty associated with model inputs and wafer data. Specifically, the method combines the model builder's prior information about each modelling assumption with the maximization of each observation's likelihood as a Student's t-distributed random variable. Through the use of a Markov chain Monte Carlo (MCMC) algorithm, a model's parameter space is explored to find the most credible parameter values. During parameter exploration, the parameters' posterior distributions are generated by applying Bayes' rule, using a likelihood function and the a priori knowledge supplied. The MCMC algorithm used, an affine invariant ensemble sampler (AIES), is implemented by initializing many walkers which semiindependently explore the space. The convergence of these walkers to global maxima of the likelihood volume determine the parameter values' highest density intervals (HDI) to reveal champion models. We show that this method of parameter selection provides insights into the data that traditional methods do not and outline continued experiments to vet the method.
Time-domain induced polarization - an analysis of Cole-Cole parameter resolution and correlation using Markov Chain Monte Carlo inversion

NASA Astrophysics Data System (ADS)

Madsen, Line Meldgaard; Fiandaca, Gianluca; Auken, Esben; Christiansen, Anders Vest

2017-12-01

The application of time-domain induced polarization (TDIP) is increasing with advances in acquisition techniques, data processing and spectral inversion schemes. An inversion of TDIP data for the spectral Cole-Cole parameters is a non-linear problem, but by applying a 1-D Markov Chain Monte Carlo (MCMC) inversion algorithm, a full non-linear uncertainty analysis of the parameters and the parameter correlations can be accessed. This is essential to understand to what degree the spectral Cole-Cole parameters can be resolved from TDIP data. MCMC inversions of synthetic TDIP data, which show bell-shaped probability distributions with a single maximum, show that the Cole-Cole parameters can be resolved from TDIP data if an acquisition range above two decades in time is applied. Linear correlations between the Cole-Cole parameters are observed and by decreasing the acquisitions ranges, the correlations increase and become non-linear. It is further investigated how waveform and parameter values influence the resolution of the Cole-Cole parameters. A limiting factor is the value of the frequency exponent, C. As C decreases, the resolution of all the Cole-Cole parameters decreases and the results become increasingly non-linear. While the values of the time constant, τ, must be in the acquisition range to resolve the parameters well, the choice between a 50 per cent and a 100 per cent duty cycle for the current injection does not have an influence on the parameter resolution. The limits of resolution and linearity are also studied in a comparison between the MCMC and a linearized gradient-based inversion approach. The two methods are consistent for resolved models, but the linearized approach tends to underestimate the uncertainties for poorly resolved parameters due to the corresponding non-linear features. Finally, an MCMC inversion of 1-D field data verifies that spectral Cole-Cole parameters can also be resolved from TD field measurements.
Analysing child mortality in Nigeria with geoadditive discrete-time survival models.

PubMed

Adebayo, Samson B; Fahrmeir, Ludwig

2005-03-15

Child mortality reflects a country's level of socio-economic development and quality of life. In developing countries, mortality rates are not only influenced by socio-economic, demographic and health variables but they also vary considerably across regions and districts. In this paper, we analysed child mortality in Nigeria with flexible geoadditive discrete-time survival models. This class of models allows us to measure small-area district-specific spatial effects simultaneously with possibly non-linear or time-varying effects of other factors. Inference is fully Bayesian and uses computationally efficient Markov chain Monte Carlo (MCMC) simulation techniques. The application is based on the 1999 Nigeria Demographic and Health Survey. Our method assesses effects at a high level of temporal and spatial resolution not available with traditional parametric models, and the results provide some evidence on how to reduce child mortality by improving socio-economic and public health conditions. Copyright (c) 2004 John Wiley & Sons, Ltd.
Cultural Consensus Theory: Aggregating Continuous Responses in a Finite Interval

NASA Astrophysics Data System (ADS)

Batchelder, William H.; Strashny, Alex; Romney, A. Kimball

Cultural consensus theory (CCT) consists of cognitive models for aggregating responses of "informants" to test items about some domain of their shared cultural knowledge. This paper develops a CCT model for items requiring bounded numerical responses, e.g. probability estimates, confidence judgments, or similarity judgments. The model assumes that each item generates a latent random representation in each informant, with mean equal to the consensus answer and variance depending jointly on the informant and the location of the consensus answer. The manifest responses may reflect biases of the informants. Markov Chain Monte Carlo (MCMC) methods were used to estimate the model, and simulation studies validated the approach. The model was applied to an existing cross-cultural dataset involving native Japanese and English speakers judging the similarity of emotion terms. The results sharpened earlier studies that showed that both cultures appear to have very similar cognitive representations of emotion terms.

Bayesian comparison of protein structures using partial Procrustes distance.

PubMed

Ejlali, Nasim; Faghihi, Mohammad Reza; Sadeghi, Mehdi

2017-09-26

An important topic in bioinformatics is the protein structure alignment. Some statistical methods have been proposed for this problem, but most of them align two protein structures based on the global geometric information without considering the effect of neighbourhood in the structures. In this paper, we provide a Bayesian model to align protein structures, by considering the effect of both local and global geometric information of protein structures. Local geometric information is incorporated to the model through the partial Procrustes distance of small substructures. These substructures are composed of β-carbon atoms from the side chains. Parameters are estimated using a Markov chain Monte Carlo (MCMC) approach. We evaluate the performance of our model through some simulation studies. Furthermore, we apply our model to a real dataset and assess the accuracy and convergence rate. Results show that our model is much more efficient than previous approaches.
Inference of Epidemiological Dynamics Based on Simulated Phylogenies Using Birth-Death and Coalescent Models

PubMed Central

Boskova, Veronika; Bonhoeffer, Sebastian; Stadler, Tanja

2014-01-01

Quantifying epidemiological dynamics is crucial for understanding and forecasting the spread of an epidemic. The coalescent and the birth-death model are used interchangeably to infer epidemiological parameters from the genealogical relationships of the pathogen population under study, which in turn are inferred from the pathogen genetic sequencing data. To compare the performance of these widely applied models, we performed a simulation study. We simulated phylogenetic trees under the constant rate birth-death model and the coalescent model with a deterministic exponentially growing infected population. For each tree, we re-estimated the epidemiological parameters using both a birth-death and a coalescent based method, implemented as an MCMC procedure in BEAST v2.0. In our analyses that estimate the growth rate of an epidemic based on simulated birth-death trees, the point estimates such as the maximum a posteriori/maximum likelihood estimates are not very different. However, the estimates of uncertainty are very different. The birth-death model had a higher coverage than the coalescent model, i.e. contained the true value in the highest posterior density (HPD) interval more often (2–13% vs. 31–75% error). The coverage of the coalescent decreases with decreasing basic reproductive ratio and increasing sampling probability of infecteds. We hypothesize that the biases in the coalescent are due to the assumption of deterministic rather than stochastic population size changes. Both methods performed reasonably well when analyzing trees simulated under the coalescent. The methods can also identify other key epidemiological parameters as long as one of the parameters is fixed to its true value. In summary, when using genetic data to estimate epidemic dynamics, our results suggest that the birth-death method will be less sensitive to population fluctuations of early outbreaks than the coalescent method that assumes a deterministic exponentially growing infected population. PMID:25375100
Experiences with Markov Chain Monte Carlo Convergence Assessment in Two Psychometric Examples

ERIC Educational Resources Information Center

Sinharay, Sandip

2004-01-01

There is an increasing use of Markov chain Monte Carlo (MCMC) algorithms for fitting statistical models in psychometrics, especially in situations where the traditional estimation techniques are very difficult to apply. One of the disadvantages of using an MCMC algorithm is that it is not straightforward to determine the convergence of the…
A Technology Solution Strengthens Comprehensive Environmental Management

DTIC Science & Technology

2012-05-23

General Navigation  Chemical Approval Example  NEPA Coordination Example  Safety PPE Example  Summary Marine Corps Support Facility...coordination, completion and documentation through automated workflows of various business processes  Chemical Approval  NEPA Coordination  Safety ...Completion Diagram Government Employee/M CMC MCMC Chemical Manager MCMC HS&E Specialist IMO Chemical Safety Specialist IMO Chemical Environmental
Locating hazardous gas leaks in the atmosphere via modified genetic, MCMC and particle swarm optimization algorithms

NASA Astrophysics Data System (ADS)

Wang, Ji; Zhang, Ru; Yan, Yuting; Dong, Xiaoqiang; Li, Jun Ming

2017-05-01

Hazardous gas leaks in the atmosphere can cause significant economic losses in addition to environmental hazards, such as fires and explosions. A three-stage hazardous gas leak source localization method was developed that uses movable and stationary gas concentration sensors. The method calculates a preliminary source inversion with a modified genetic algorithm (MGA) and has the potential to crossover with eliminated individuals from the population, following the selection of the best candidate. The method then determines a search zone using Markov Chain Monte Carlo (MCMC) sampling, utilizing a partial evaluation strategy. The leak source is then accurately localized using a modified guaranteed convergence particle swarm optimization algorithm with several bad-performing individuals, following selection of the most successful individual with dynamic updates. The first two stages are based on data collected by motionless sensors, and the last stage is based on data from movable robots with sensors. The measurement error adaptability and the effect of the leak source location were analyzed. The test results showed that this three-stage localization process can localize a leak source within 1.0 m of the source for different leak source locations, with measurement error standard deviation smaller than 2.0.
Triadic split-merge sampler

NASA Astrophysics Data System (ADS)

van Rossum, Anne C.; Lin, Hai Xiang; Dubbeldam, Johan; van der Herik, H. Jaap

2018-04-01

In machine vision typical heuristic methods to extract parameterized objects out of raw data points are the Hough transform and RANSAC. Bayesian models carry the promise to optimally extract such parameterized objects given a correct definition of the model and the type of noise at hand. A category of solvers for Bayesian models are Markov chain Monte Carlo methods. Naive implementations of MCMC methods suffer from slow convergence in machine vision due to the complexity of the parameter space. Towards this blocked Gibbs and split-merge samplers have been developed that assign multiple data points to clusters at once. In this paper we introduce a new split-merge sampler, the triadic split-merge sampler, that perform steps between two and three randomly chosen clusters. This has two advantages. First, it reduces the asymmetry between the split and merge steps. Second, it is able to propose a new cluster that is composed out of data points from two different clusters. Both advantages speed up convergence which we demonstrate on a line extraction problem. We show that the triadic split-merge sampler outperforms the conventional split-merge sampler. Although this new MCMC sampler is demonstrated in this machine vision context, its application extend to the very general domain of statistical inference.
An Application of Bayesian Approach in Modeling Risk of Death in an Intensive Care Unit

PubMed Central

Wong, Rowena Syn Yin; Ismail, Noor Azina

2016-01-01

Background and Objectives There are not many studies that attempt to model intensive care unit (ICU) risk of death in developing countries, especially in South East Asia. The aim of this study was to propose and describe application of a Bayesian approach in modeling in-ICU deaths in a Malaysian ICU. Methods This was a prospective study in a mixed medical-surgery ICU in a multidisciplinary tertiary referral hospital in Malaysia. Data collection included variables that were defined in Acute Physiology and Chronic Health Evaluation IV (APACHE IV) model. Bayesian Markov Chain Monte Carlo (MCMC) simulation approach was applied in the development of four multivariate logistic regression predictive models for the ICU, where the main outcome measure was in-ICU mortality risk. The performance of the models were assessed through overall model fit, discrimination and calibration measures. Results from the Bayesian models were also compared against results obtained using frequentist maximum likelihood method. Results The study involved 1,286 consecutive ICU admissions between January 1, 2009 and June 30, 2010, of which 1,111 met the inclusion criteria. Patients who were admitted to the ICU were generally younger, predominantly male, with low co-morbidity load and mostly under mechanical ventilation. The overall in-ICU mortality rate was 18.5% and the overall mean Acute Physiology Score (APS) was 68.5. All four models exhibited good discrimination, with area under receiver operating characteristic curve (AUC) values approximately 0.8. Calibration was acceptable (Hosmer-Lemeshow p-values > 0.05) for all models, except for model M3. Model M1 was identified as the model with the best overall performance in this study. Conclusion Four prediction models were proposed, where the best model was chosen based on its overall performance in this study. This study has also demonstrated the promising potential of the Bayesian MCMC approach as an alternative in the analysis and modeling of in-ICU mortality outcomes. PMID:27007413
Real-time individual predictions of prostate cancer recurrence using joint models

PubMed Central

Taylor, Jeremy M. G.; Park, Yongseok; Ankerst, Donna P.; Proust-Lima, Cecile; Williams, Scott; Kestin, Larry; Bae, Kyoungwha; Pickles, Tom; Sandler, Howard

2012-01-01

Summary Patients who were previously treated for prostate cancer with radiation therapy are monitored at regular intervals using a laboratory test called Prostate Specific Antigen (PSA). If the value of the PSA test starts to rise, this is an indication that the prostate cancer is more likely to recur, and the patient may wish to initiate new treatments. Such patients could be helped in making medical decisions by an accurate estimate of the probability of recurrence of the cancer in the next few years. In this paper, we describe the methodology for giving the probability of recurrence for a new patient, as implemented on a web-based calculator. The methods use a joint longitudinal survival model. The model is developed on a training dataset of 2,386 patients and tested on a dataset of 846 patients. Bayesian estimation methods are used with one Markov chain Monte Carlo (MCMC) algorithm developed for estimation of the parameters from the training dataset and a second quick MCMC developed for prediction of the risk of recurrence that uses the longitudinal PSA measures from a new patient. PMID:23379600
Simultaneous fitting of genomic-BLUP and Bayes-C components in a genomic prediction model.

PubMed

Iheshiulor, Oscar O M; Woolliams, John A; Svendsen, Morten; Solberg, Trygve; Meuwissen, Theo H E

2017-08-24

The rapid adoption of genomic selection is due to two key factors: availability of both high-throughput dense genotyping and statistical methods to estimate and predict breeding values. The development of such methods is still ongoing and, so far, there is no consensus on the best approach. Currently, the linear and non-linear methods for genomic prediction (GP) are treated as distinct approaches. The aim of this study was to evaluate the implementation of an iterative method (called GBC) that incorporates aspects of both linear [genomic-best linear unbiased prediction (G-BLUP)] and non-linear (Bayes-C) methods for GP. The iterative nature of GBC makes it less computationally demanding similar to other non-Markov chain Monte Carlo (MCMC) approaches. However, as a Bayesian method, GBC differs from both MCMC- and non-MCMC-based methods by combining some aspects of G-BLUP and Bayes-C methods for GP. Its relative performance was compared to those of G-BLUP and Bayes-C. We used an imputed 50 K single-nucleotide polymorphism (SNP) dataset based on the Illumina Bovine50K BeadChip, which included 48,249 SNPs and 3244 records. Daughter yield deviations for somatic cell count, fat yield, milk yield, and protein yield were used as response variables. GBC was frequently (marginally) superior to G-BLUP and Bayes-C in terms of prediction accuracy and was significantly better than G-BLUP only for fat yield. On average across the four traits, GBC yielded a 0.009 and 0.006 increase in prediction accuracy over G-BLUP and Bayes-C, respectively. Computationally, GBC was very much faster than Bayes-C and similar to G-BLUP. Our results show that incorporating some aspects of G-BLUP and Bayes-C in a single model can improve accuracy of GP over the commonly used method: G-BLUP. Generally, GBC did not statistically perform better than G-BLUP and Bayes-C, probably due to the close relationships between reference and validation individuals. Nevertheless, it is a flexible tool, in the sense, that it simultaneously incorporates some aspects of linear and non-linear models for GP, thereby exploiting family relationships while also accounting for linkage disequilibrium between SNPs and genes with large effects. The application of GBC in GP merits further exploration.
Mathematical modeling, analysis and Markov Chain Monte Carlo simulation of Ebola epidemics

NASA Astrophysics Data System (ADS)

Tulu, Thomas Wetere; Tian, Boping; Wu, Zunyou

Ebola virus infection is a severe infectious disease with the highest case fatality rate which become the global public health treat now. What makes the disease the worst of all is no specific effective treatment available, its dynamics is not much researched and understood. In this article a new mathematical model incorporating both vaccination and quarantine to study the dynamics of Ebola epidemic has been developed and comprehensively analyzed. The existence as well as uniqueness of the solution to the model is also verified and the basic reproduction number is calculated. Besides, stability conditions are also checked and finally simulation is done using both Euler method and one of the top ten most influential algorithm known as Markov Chain Monte Carlo (MCMC) method. Different rates of vaccination to predict the effect of vaccination on the infected individual over time and that of quarantine are discussed. The results show that quarantine and vaccination are very effective ways to control Ebola epidemic. From our study it was also seen that there is less possibility of an individual for getting Ebola virus for the second time if they survived his/her first infection. Last but not least real data has been fitted to the model, showing that it can used to predict the dynamic of Ebola epidemic.
A Bayesian method for detecting pairwise associations in compositional data

PubMed Central

Ventz, Steffen; Huttenhower, Curtis

2017-01-01

Compositional data consist of vectors of proportions normalized to a constant sum from a basis of unobserved counts. The sum constraint makes inference on correlations between unconstrained features challenging due to the information loss from normalization. However, such correlations are of long-standing interest in fields including ecology. We propose a novel Bayesian framework (BAnOCC: Bayesian Analysis of Compositional Covariance) to estimate a sparse precision matrix through a LASSO prior. The resulting posterior, generated by MCMC sampling, allows uncertainty quantification of any function of the precision matrix, including the correlation matrix. We also use a first-order Taylor expansion to approximate the transformation from the unobserved counts to the composition in order to investigate what characteristics of the unobserved counts can make the correlations more or less difficult to infer. On simulated datasets, we show that BAnOCC infers the true network as well as previous methods while offering the advantage of posterior inference. Larger and more realistic simulated datasets further showed that BAnOCC performs well as measured by type I and type II error rates. Finally, we apply BAnOCC to a microbial ecology dataset from the Human Microbiome Project, which in addition to reproducing established ecological results revealed unique, competition-based roles for Proteobacteria in multiple distinct habitats. PMID:29140991
Potential uncertainty reduction in model-averaged benchmark dose estimates informed by an additional dose study.

PubMed

Shao, Kan; Small, Mitchell J

2011-10-01

A methodology is presented for assessing the information value of an additional dosage experiment in existing bioassay studies. The analysis demonstrates the potential reduction in the uncertainty of toxicity metrics derived from expanded studies, providing insights for future studies. Bayesian methods are used to fit alternative dose-response models using Markov chain Monte Carlo (MCMC) simulation for parameter estimation and Bayesian model averaging (BMA) is used to compare and combine the alternative models. BMA predictions for benchmark dose (BMD) are developed, with uncertainty in these predictions used to derive the lower bound BMDL. The MCMC and BMA results provide a basis for a subsequent Monte Carlo analysis that backcasts the dosage where an additional test group would have been most beneficial in reducing the uncertainty in the BMD prediction, along with the magnitude of the expected uncertainty reduction. Uncertainty reductions are measured in terms of reduced interval widths of predicted BMD values and increases in BMDL values that occur as a result of this reduced uncertainty. The methodology is illustrated using two existing data sets for TCDD carcinogenicity, fitted with two alternative dose-response models (logistic and quantal-linear). The example shows that an additional dose at a relatively high value would have been most effective for reducing the uncertainty in BMA BMD estimates, with predicted reductions in the widths of uncertainty intervals of approximately 30%, and expected increases in BMDL values of 5-10%. The results demonstrate that dose selection for studies that subsequently inform dose-response models can benefit from consideration of how these models will be fit, combined, and interpreted. © 2011 Society for Risk Analysis.
Development of engine activity cycles for the prime movers of unconventional natural gas well development.

PubMed

Johnson, Derek; Heltzel, Robert; Nix, Andrew; Barrow, Rebekah

2017-03-01

With the advent of unconventional natural gas resources, new research focuses on the efficiency and emissions of the prime movers powering these fleets. These prime movers also play important roles in emissions inventories for this sector. Industry seeks to reduce operating costs by decreasing the required fuel demands of these high horsepower engines but conducting in-field or full-scale research on new technologies is cost prohibitive. As such, this research completed extensive in-use data collection efforts for the engines powering over-the-road trucks, drilling engines, and hydraulic stimulation pump engines. These engine activity data were processed in order to make representative test cycles using a Markov Chain, Monte Carlo (MCMC) simulation method. Such cycles can be applied under controlled environments on scaled engines for future research. In addition to MCMC, genetic algorithms were used to improve the overall performance values for the test cycles and smoothing was applied to ensure regression criteria were met during implementation on a test engine and dynamometer. The variations in cycle and in-use statistics are presented along with comparisons to conventional test cycles used for emissions compliance. Development of representative, engine dynamometer test cycles, from in-use activity data, is crucial in understanding fuel efficiency and emissions for engine operating modes that are different from cycles mandated by the Code of Federal Regulations. Representative cycles were created for the prime movers of unconventional well development-over-the-road (OTR) trucks and drilling and hydraulic fracturing engines. The representative cycles are implemented on scaled engines to reduce fuel consumption during research and development of new technologies in controlled laboratory environments.
An MCMC determination of the primordial helium abundance

NASA Astrophysics Data System (ADS)

Aver, Erik; Olive, Keith A.; Skillman, Evan D.

2012-04-01

Spectroscopic observations of the chemical abundances in metal-poor H II regions provide an independent method for estimating the primordial helium abundance. H II regions are described by several physical parameters such as electron density, electron temperature, and reddening, in addition to y, the ratio of helium to hydrogen. It had been customary to estimate or determine self-consistently these parameters to calculate y. Frequentist analyses of the parameter space have been shown to be successful in these parameter determinations, and Markov Chain Monte Carlo (MCMC) techniques have proven to be very efficient in sampling this parameter space. Nevertheless, accurate determination of the primordial helium abundance from observations of H II regions is constrained by both systematic and statistical uncertainties. In an attempt to better reduce the latter, and continue to better characterize the former, we apply MCMC methods to the large dataset recently compiled by Izotov, Thuan, & Stasińska (2007). To improve the reliability of the determination, a high quality dataset is needed. In pursuit of this, a variety of cuts are explored. The efficacy of the He I λ4026 emission line as a constraint on the solutions is first examined, revealing the introduction of systematic bias through its absence. As a clear measure of the quality of the physical solution, a χ2 analysis proves instrumental in the selection of data compatible with the theoretical model. Nearly two-thirds of the observations fall outside a standard 95% confidence level cut, which highlights the care necessary in selecting systems and warrants further investigation into potential deficiencies of the model or data. In addition, the method also allows us to exclude systems for which parameter estimations are statistical outliers. As a result, the final selected dataset gains in reliability and exhibits improved consistency. Regression to zero metallicity yields Yp = 0.2534 ± 0.0083, in broad agreement with the WMAP result. The inclusion of more observations shows promise for further reducing the uncertainty, but more high quality spectra are required.
Using Bayesian neural networks to classify forest scenes

NASA Astrophysics Data System (ADS)

Vehtari, Aki; Heikkonen, Jukka; Lampinen, Jouko; Juujarvi, Jouni

1998-10-01

We present results that compare the performance of Bayesian learning methods for neural networks on the task of classifying forest scenes into trees and background. Classification task is demanding due to the texture richness of the trees, occlusions of the forest scene objects and diverse lighting conditions under operation. This makes it difficult to determine which are optimal image features for the classification. A natural way to proceed is to extract many different types of potentially suitable features, and to evaluate their usefulness in later processing stages. One approach to cope with large number of features is to use Bayesian methods to control the model complexity. Bayesian learning uses a prior on model parameters, combines this with evidence from a training data, and the integrates over the resulting posterior to make predictions. With this method, we can use large networks and many features without fear of overfitting. For this classification task we compare two Bayesian learning methods for multi-layer perceptron (MLP) neural networks: (1) The evidence framework of MacKay uses a Gaussian approximation to the posterior weight distribution and maximizes with respect to hyperparameters. (2) In a Markov Chain Monte Carlo (MCMC) method due to Neal, the posterior distribution of the network parameters is numerically integrated using the MCMC method. As baseline classifiers for comparison we use (3) MLP early stop committee, (4) K-nearest-neighbor and (5) Classification And Regression Tree.
Neural Network Emulation of Reionization Simulations

NASA Astrophysics Data System (ADS)

Schmit, Claude J.; Pritchard, Jonathan R.

2018-05-01

Next generation radio experiments such as LOFAR, HERA and SKA are expected to probe the Epoch of Reionization and claim a first direct detection of the cosmic 21cm signal within the next decade. One of the major challenges for these experiments will be dealing with enormous incoming data volumes. Machine learning is key to increasing our data analysis efficiency. We consider the use of an artificial neural network to emulate 21cmFAST simulations and use it in a Bayesian parameter inference study. We then compare the network predictions to a direct evaluation of the EoR simulations and analyse the dependence of the results on the training set size. We find that the use of a training set of size 100 samples can recover the error contours of a full scale MCMC analysis which evaluates the model at each step.
A Bayesian approach to the modelling of α Cen A

NASA Astrophysics Data System (ADS)

Bazot, M.; Bourguignon, S.; Christensen-Dalsgaard, J.

2012-12-01

Determining the physical characteristics of a star is an inverse problem consisting of estimating the parameters of models for the stellar structure and evolution, and knowing certain observable quantities. We use a Bayesian approach to solve this problem for α Cen A, which allows us to incorporate prior information on the parameters to be estimated, in order to better constrain the problem. Our strategy is based on the use of a Markov chain Monte Carlo (MCMC) algorithm to estimate the posterior probability densities of the stellar parameters: mass, age, initial chemical composition, etc. We use the stellar evolutionary code ASTEC to model the star. To constrain this model both seismic and non-seismic observations were considered. Several different strategies were tested to fit these values, using either two free parameters or five free parameters in ASTEC. We are thus able to show evidence that MCMC methods become efficient with respect to more classical grid-based strategies when the number of parameters increases. The results of our MCMC algorithm allow us to derive estimates for the stellar parameters and robust uncertainties thanks to the statistical analysis of the posterior probability densities. We are also able to compute odds for the presence of a convective core in α Cen A. When using core-sensitive seismic observational constraints, these can rise above ˜40 per cent. The comparison of results to previous studies also indicates that these seismic constraints are of critical importance for our knowledge of the structure of this star.
Estimation of interplate coupling along Nankai trough considering the block motion model based on onland GNSS and seafloor GPS/A observation data using MCMC method

NASA Astrophysics Data System (ADS)

Kimura, H.; Ito, T.; Tadokoro, K.

2017-12-01

Introduction In southwest Japan, Philippine sea plate is subducting under the overriding plate such as Amurian plate, and mega interplate earthquakes has occurred at about 100 years interval. There is no occurrence of mega interplate earthquakes in southwest Japan, although it has passed about 70 years since the last mega interplate earthquakes: 1944 and 1946 along Nankai trough, meaning that the strain has been accumulated at plate interface. Therefore, it is essential to reveal the interplate coupling more precisely for predicting or understanding the mechanism of next occurring mega interplate earthquake. Recently, seafloor geodetic observation revealed the detailed interplate coupling distribution in expected source region of Nankai trough earthquake (e.g., Yokota et al. [2016]). In this study, we estimated interplate coupling in southwest Japan, considering block motion model and using seafloor geodetic observation data as well as onland GNSS observation data, based on Markov Chain Monte Carlo (MCMC) method. Method Observed crustal deformation is assumed that sum of rigid block motion and elastic deformation due to coupling at block boundaries. We modeled this relationship as a non-linear inverse problem that the unknown parameters are Euler pole of each block and coupling at each subfault, and solved them simultaneously based on MCMC method. Input data we used in this study are 863 onland GNSS observation data and 24 seafloor GPS/A observation data. We made some block division models based on the map of active fault tracing and selected the best model based on Akaike's Information Criterion (AIC): that is consist of 12 blocks. Result We find that the interplate coupling along Nankai trough has heterogeneous spatial distribution, strong at the depth of 0 to 20km at off Tokai region, and 0 to 30km at off Shikoku region. Moreover, we find that observed crustal deformation at off Tokai region is well explained by elastic deformation due to subducting Izu Micro Plate. We will present more details of our result, and discuss about not only interplate coupling but also rigid block motion, elastic deformation due to inland fault coupling, and resolution of estimated parameters.
A modelling approach to assessing the timescale uncertainties in proxy series with chronological errors

NASA Astrophysics Data System (ADS)

Divine, D. V.; Godtliebsen, F.; Rue, H.

2012-01-01

The paper proposes an approach to assessment of timescale errors in proxy-based series with chronological uncertainties. The method relies on approximation of the physical process(es) forming a proxy archive by a random Gamma process. Parameters of the process are partly data-driven and partly determined from prior assumptions. For a particular case of a linear accumulation model and absolutely dated tie points an analytical solution is found suggesting the Beta-distributed probability density on age estimates along the length of a proxy archive. In a general situation of uncertainties in the ages of the tie points the proposed method employs MCMC simulations of age-depth profiles yielding empirical confidence intervals on the constructed piecewise linear best guess timescale. It is suggested that the approach can be further extended to a more general case of a time-varying expected accumulation between the tie points. The approach is illustrated by using two ice and two lake/marine sediment cores representing the typical examples of paleoproxy archives with age models based on tie points of mixed origin.
Investigation of error sources in regional inverse estimates of greenhouse gas emissions in Canada

NASA Astrophysics Data System (ADS)

Chan, E.; Chan, D.; Ishizawa, M.; Vogel, F.; Brioude, J.; Delcloo, A.; Wu, Y.; Jin, B.

2015-08-01

Inversion models can use atmospheric concentration measurements to estimate surface fluxes. This study is an evaluation of the errors in a regional flux inversion model for different provinces of Canada, Alberta (AB), Saskatchewan (SK) and Ontario (ON). Using CarbonTracker model results as the target, the synthetic data experiment analyses examined the impacts of the errors from the Bayesian optimisation method, prior flux distribution and the atmospheric transport model, as well as their interactions. The scaling factors for different sub-regions were estimated by the Markov chain Monte Carlo (MCMC) simulation and cost function minimization (CFM) methods. The CFM method results are sensitive to the relative size of the assumed model-observation mismatch and prior flux error variances. Experiment results show that the estimation error increases with the number of sub-regions using the CFM method. For the region definitions that lead to realistic flux estimates, the numbers of sub-regions for the western region of AB/SK combined and the eastern region of ON are 11 and 4 respectively. The corresponding annual flux estimation errors for the western and eastern regions using the MCMC (CFM) method are -7 and -3 % (0 and 8 %) respectively, when there is only prior flux error. The estimation errors increase to 36 and 94 % (40 and 232 %) resulting from transport model error alone. When prior and transport model errors co-exist in the inversions, the estimation errors become 5 and 85 % (29 and 201 %). This result indicates that estimation errors are dominated by the transport model error and can in fact cancel each other and propagate to the flux estimates non-linearly. In addition, it is possible for the posterior flux estimates having larger differences than the prior compared to the target fluxes, and the posterior uncertainty estimates could be unrealistically small that do not cover the target. The systematic evaluation of the different components of the inversion model can help in the understanding of the posterior estimates and percentage errors. Stable and realistic sub-regional and monthly flux estimates for western region of AB/SK can be obtained, but not for the eastern region of ON. This indicates that it is likely a real observation-based inversion for the annual provincial emissions will work for the western region whereas; improvements are needed with the current inversion setup before real inversion is performed for the eastern region.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hou Fengji; Hogg, David W.; Goodman, Jonathan

Markov chain Monte Carlo (MCMC) proves to be powerful for Bayesian inference and in particular for exoplanet radial velocity fitting because MCMC provides more statistical information and makes better use of data than common approaches like chi-square fitting. However, the nonlinear density functions encountered in these problems can make MCMC time-consuming. In this paper, we apply an ensemble sampler respecting affine invariance to orbital parameter extraction from radial velocity data. This new sampler has only one free parameter, and does not require much tuning for good performance, which is important for automatization. The autocorrelation time of this sampler is approximatelymore » the same for all parameters and far smaller than Metropolis-Hastings, which means it requires many fewer function calls to produce the same number of independent samples. The affine-invariant sampler speeds up MCMC by hundreds of times compared with Metropolis-Hastings in the same computing situation. This novel sampler would be ideal for projects involving large data sets such as statistical investigations of planet distribution. The biggest obstacle to ensemble samplers is the existence of multiple local optima; we present a clustering technique to deal with local optima by clustering based on the likelihood of the walkers in the ensemble. We demonstrate the effectiveness of the sampler on real radial velocity data.« less
Enhancing hydrologic data assimilation by evolutionary Particle Filter and Markov Chain Monte Carlo

NASA Astrophysics Data System (ADS)

Abbaszadeh, Peyman; Moradkhani, Hamid; Yan, Hongxiang

2018-01-01

Particle Filters (PFs) have received increasing attention by researchers from different disciplines including the hydro-geosciences, as an effective tool to improve model predictions in nonlinear and non-Gaussian dynamical systems. The implication of dual state and parameter estimation using the PFs in hydrology has evolved since 2005 from the PF-SIR (sampling importance resampling) to PF-MCMC (Markov Chain Monte Carlo), and now to the most effective and robust framework through evolutionary PF approach based on Genetic Algorithm (GA) and MCMC, the so-called EPFM. In this framework, the prior distribution undergoes an evolutionary process based on the designed mutation and crossover operators of GA. The merit of this approach is that the particles move to an appropriate position by using the GA optimization and then the number of effective particles is increased by means of MCMC, whereby the particle degeneracy is avoided and the particle diversity is improved. In this study, the usefulness and effectiveness of the proposed EPFM is investigated by applying the technique on a conceptual and highly nonlinear hydrologic model over four river basins located in different climate and geographical regions of the United States. Both synthetic and real case studies demonstrate that the EPFM improves both the state and parameter estimation more effectively and reliably as compared with the PF-MCMC.
EXOFIT: orbital parameters of extrasolar planets from radial velocities

NASA Astrophysics Data System (ADS)

Balan, Sreekumar T.; Lahav, Ofer

2009-04-01

Retrieval of orbital parameters of extrasolar planets poses considerable statistical challenges. Due to sparse sampling, measurement errors, parameters degeneracy and modelling limitations, there are no unique values of basic parameters, such as period and eccentricity. Here, we estimate the orbital parameters from radial velocity data in a Bayesian framework by utilizing Markov Chain Monte Carlo (MCMC) simulations with the Metropolis-Hastings algorithm. We follow a methodology recently proposed by Gregory and Ford. Our implementation of MCMC is based on the object-oriented approach outlined by Graves. We make our resulting code, EXOFIT, publicly available with this paper. It can search for either one or two planets as illustrated on mock data. As an example we re-analysed the orbital solution of companions to HD 187085 and HD 159868 from the published radial velocity data. We confirm the degeneracy reported for orbital parameters of the companion to HD 187085, and show that a low-eccentricity orbit is more probable for this planet. For HD 159868, we obtained slightly different orbital solution and a relatively high `noise' factor indicating the presence of an unaccounted signal in the radial velocity data. EXOFIT is designed in such a way that it can be extended for a variety of probability models, including different Bayesian priors.
Bayesian Monte Carlo and Maximum Likelihood Approach for ...

EPA Pesticide Factsheets

Model uncertainty estimation and risk assessment is essential to environmental management and informed decision making on pollution mitigation strategies. In this study, we apply a probabilistic methodology, which combines Bayesian Monte Carlo simulation and Maximum Likelihood estimation (BMCML) to calibrate a lake oxygen recovery model. We first derive an analytical solution of the differential equation governing lake-averaged oxygen dynamics as a function of time-variable wind speed. Statistical inferences on model parameters and predictive uncertainty are then drawn by Bayesian conditioning of the analytical solution on observed daily wind speed and oxygen concentration data obtained from an earlier study during two recovery periods on a eutrophic lake in upper state New York. The model is calibrated using oxygen recovery data for one year and statistical inferences were validated using recovery data for another year. Compared with essentially two-step, regression and optimization approach, the BMCML results are more comprehensive and performed relatively better in predicting the observed temporal dissolved oxygen levels (DO) in the lake. BMCML also produced comparable calibration and validation results with those obtained using popular Markov Chain Monte Carlo technique (MCMC) and is computationally simpler and easier to implement than the MCMC. Next, using the calibrated model, we derive an optimal relationship between liquid film-transfer coefficien
Preparation of biocompatible magnetite-carboxymethyl cellulose nanocomposite: characterization of nanocomposite by FTIR, XRD, FESEM and TEM.

PubMed

Habibi, Neda

2014-10-15

The preparation and characterization of magnetite-carboxymethyl cellulose nano-composite (M-CMC) material is described. Magnetite nano-particles were synthesized by a modified co-precipitation method using ferrous chloride tetrahydrate and ferric chloride hexahydrate in ammonium hydroxide solution. The M-CMC nano-composite particles were synthesized by embedding the magnetite nanoparticles inside carboxymethyl cellulose (CMC) using a freshly prepared mixture of Fe3O4 with CMC precursor. Morphology, particle size, and structural properties of magnetite-carboxymethyl cellulose nano-composite was accomplished using X-ray powder diffraction (XRD), transmission electron microscopy (TEM), Fourier transformed infrared (FTIR) and field emission scanning electron microscopy (FESEM) analysis. As a result, magnetite nano-particles with an average size of 35nm were obtained. The biocompatible Fe3O4-carboxymethyl cellulose nano-composite particles obtained from the natural CMC polymers have a potential range of application in biomedical field. Copyright © 2014 Elsevier B.V. All rights reserved.
Asteroid mass estimation using Markov-chain Monte Carlo

NASA Astrophysics Data System (ADS)

Siltala, Lauri; Granvik, Mikael

2017-11-01

Estimates for asteroid masses are based on their gravitational perturbations on the orbits of other objects such as Mars, spacecraft, or other asteroids and/or their satellites. In the case of asteroid-asteroid perturbations, this leads to an inverse problem in at least 13 dimensions where the aim is to derive the mass of the perturbing asteroid(s) and six orbital elements for both the perturbing asteroid(s) and the test asteroid(s) based on astrometric observations. We have developed and implemented three different mass estimation algorithms utilizing asteroid-asteroid perturbations: the very rough 'marching' approximation, in which the asteroids' orbital elements are not fitted, thereby reducing the problem to a one-dimensional estimation of the mass, an implementation of the Nelder-Mead simplex method, and most significantly, a Markov-chain Monte Carlo (MCMC) approach. We describe each of these algorithms with particular focus on the MCMC algorithm, and present example results using both synthetic and real data. Our results agree with the published mass estimates, but suggest that the published uncertainties may be misleading as a consequence of using linearized mass-estimation methods. Finally, we discuss remaining challenges with the algorithms as well as future plans.
Learning Weight Uncertainty with Stochastic Gradient MCMC for Shape Classification

DOE Office of Scientific and Technical Information (OSTI.GOV)

Li, Chunyuan; Stevens, Andrew J.; Chen, Changyou

2016-08-10

Learning the representation of shape cues in 2D & 3D objects for recognition is a fundamental task in computer vision. Deep neural networks (DNNs) have shown promising performance on this task. Due to the large variability of shapes, accurate recognition relies on good estimates of model uncertainty, ignored in traditional training of DNNs, typically learned via stochastic optimization. This paper leverages recent advances in stochastic gradient Markov Chain Monte Carlo (SG-MCMC) to learn weight uncertainty in DNNs. It yields principled Bayesian interpretations for the commonly used Dropout/DropConnect techniques and incorporates them into the SG-MCMC framework. Extensive experiments on 2D &more » 3D shape datasets and various DNN models demonstrate the superiority of the proposed approach over stochastic optimization. Our approach yields higher recognition accuracy when used in conjunction with Dropout and Batch-Normalization.« less
Neural Dynamics as Sampling: A Model for Stochastic Computation in Recurrent Networks of Spiking Neurons

PubMed Central

Buesing, Lars; Bill, Johannes; Nessler, Bernhard; Maass, Wolfgang

2011-01-01

The organization of computations in networks of spiking neurons in the brain is still largely unknown, in particular in view of the inherently stochastic features of their firing activity and the experimentally observed trial-to-trial variability of neural systems in the brain. In principle there exists a powerful computational framework for stochastic computations, probabilistic inference by sampling, which can explain a large number of macroscopic experimental data in neuroscience and cognitive science. But it has turned out to be surprisingly difficult to create a link between these abstract models for stochastic computations and more detailed models of the dynamics of networks of spiking neurons. Here we create such a link and show that under some conditions the stochastic firing activity of networks of spiking neurons can be interpreted as probabilistic inference via Markov chain Monte Carlo (MCMC) sampling. Since common methods for MCMC sampling in distributed systems, such as Gibbs sampling, are inconsistent with the dynamics of spiking neurons, we introduce a different approach based on non-reversible Markov chains that is able to reflect inherent temporal processes of spiking neuronal activity through a suitable choice of random variables. We propose a neural network model and show by a rigorous theoretical analysis that its neural activity implements MCMC sampling of a given distribution, both for the case of discrete and continuous time. This provides a step towards closing the gap between abstract functional models of cortical computation and more detailed models of networks of spiking neurons. PMID:22096452
Markov switching multinomial logit model: An application to accident-injury severities.

PubMed

Malyshkina, Nataliya V; Mannering, Fred L

2009-07-01

In this study, two-state Markov switching multinomial logit models are proposed for statistical modeling of accident-injury severities. These models assume Markov switching over time between two unobserved states of roadway safety as a means of accounting for potential unobserved heterogeneity. The states are distinct in the sense that in different states accident-severity outcomes are generated by separate multinomial logit processes. To demonstrate the applicability of the approach, two-state Markov switching multinomial logit models are estimated for severity outcomes of accidents occurring on Indiana roads over a four-year time period. Bayesian inference methods and Markov Chain Monte Carlo (MCMC) simulations are used for model estimation. The estimated Markov switching models result in a superior statistical fit relative to the standard (single-state) multinomial logit models for a number of roadway classes and accident types. It is found that the more frequent state of roadway safety is correlated with better weather conditions and that the less frequent state is correlated with adverse weather conditions.
Bayes Factor Covariance Testing in Item Response Models.

PubMed

Fox, Jean-Paul; Mulder, Joris; Sinharay, Sandip

2017-12-01

Two marginal one-parameter item response theory models are introduced, by integrating out the latent variable or random item parameter. It is shown that both marginal response models are multivariate (probit) models with a compound symmetry covariance structure. Several common hypotheses concerning the underlying covariance structure are evaluated using (fractional) Bayes factor tests. The support for a unidimensional factor (i.e., assumption of local independence) and differential item functioning are evaluated by testing the covariance components. The posterior distribution of common covariance components is obtained in closed form by transforming latent responses with an orthogonal (Helmert) matrix. This posterior distribution is defined as a shifted-inverse-gamma, thereby introducing a default prior and a balanced prior distribution. Based on that, an MCMC algorithm is described to estimate all model parameters and to compute (fractional) Bayes factor tests. Simulation studies are used to show that the (fractional) Bayes factor tests have good properties for testing the underlying covariance structure of binary response data. The method is illustrated with two real data studies.
Technical Note: Approximate Bayesian parameterization of a complex tropical forest model

NASA Astrophysics Data System (ADS)

Hartig, F.; Dislich, C.; Wiegand, T.; Huth, A.

2013-08-01

Inverse parameter estimation of process-based models is a long-standing problem in ecology and evolution. A key problem of inverse parameter estimation is to define a metric that quantifies how well model predictions fit to the data. Such a metric can be expressed by general cost or objective functions, but statistical inversion approaches are based on a particular metric, the probability of observing the data given the model, known as the likelihood. Deriving likelihoods for dynamic models requires making assumptions about the probability for observations to deviate from mean model predictions. For technical reasons, these assumptions are usually derived without explicit consideration of the processes in the simulation. Only in recent years have new methods become available that allow generating likelihoods directly from stochastic simulations. Previous applications of these approximate Bayesian methods have concentrated on relatively simple models. Here, we report on the application of a simulation-based likelihood approximation for FORMIND, a parameter-rich individual-based model of tropical forest dynamics. We show that approximate Bayesian inference, based on a parametric likelihood approximation placed in a conventional MCMC, performs well in retrieving known parameter values from virtual field data generated by the forest model. We analyze the results of the parameter estimation, examine the sensitivity towards the choice and aggregation of model outputs and observed data (summary statistics), and show results from using this method to fit the FORMIND model to field data from an Ecuadorian tropical forest. Finally, we discuss differences of this approach to Approximate Bayesian Computing (ABC), another commonly used method to generate simulation-based likelihood approximations. Our results demonstrate that simulation-based inference, which offers considerable conceptual advantages over more traditional methods for inverse parameter estimation, can successfully be applied to process-based models of high complexity. The methodology is particularly suited to heterogeneous and complex data structures and can easily be adjusted to other model types, including most stochastic population and individual-based models. Our study therefore provides a blueprint for a fairly general approach to parameter estimation of stochastic process-based models in ecology and evolution.
SPOTting model parameters using a ready-made Python package

NASA Astrophysics Data System (ADS)

Houska, Tobias; Kraft, Philipp; Breuer, Lutz

2015-04-01

The selection and parameterization of reliable process descriptions in ecological modelling is driven by several uncertainties. The procedure is highly dependent on various criteria, like the used algorithm, the likelihood function selected and the definition of the prior parameter distributions. A wide variety of tools have been developed in the past decades to optimize parameters. Some of the tools are closed source. Due to this, the choice for a specific parameter estimation method is sometimes more dependent on its availability than the performance. A toolbox with a large set of methods can support users in deciding about the most suitable method. Further, it enables to test and compare different methods. We developed the SPOT (Statistical Parameter Optimization Tool), an open source python package containing a comprehensive set of modules, to analyze and optimize parameters of (environmental) models. SPOT comes along with a selected set of algorithms for parameter optimization and uncertainty analyses (Monte Carlo, MC; Latin Hypercube Sampling, LHS; Maximum Likelihood, MLE; Markov Chain Monte Carlo, MCMC; Scuffled Complex Evolution, SCE-UA; Differential Evolution Markov Chain, DE-MCZ), together with several likelihood functions (Bias, (log-) Nash-Sutcliff model efficiency, Correlation Coefficient, Coefficient of Determination, Covariance, (Decomposed-, Relative-, Root-) Mean Squared Error, Mean Absolute Error, Agreement Index) and prior distributions (Binomial, Chi-Square, Dirichlet, Exponential, Laplace, (log-, multivariate-) Normal, Pareto, Poisson, Cauchy, Uniform, Weibull) to sample from. The model-independent structure makes it suitable to analyze a wide range of applications. We apply all algorithms of the SPOT package in three different case studies. Firstly, we investigate the response of the Rosenbrock function, where the MLE algorithm shows its strengths. Secondly, we study the Griewank function, which has a challenging response surface for optimization methods. Here we see simple algorithms like the MCMC struggling to find the global optimum of the function, while algorithms like SCE-UA and DE-MCZ show their strengths. Thirdly, we apply an uncertainty analysis of a one-dimensional physically based hydrological model build with the Catchment Modelling Framework (CMF). The model is driven by meteorological and groundwater data from a Free Air Carbon Enrichment (FACE) experiment in Linden (Hesse, Germany). Simulation results are evaluated with measured soil moisture data. We search for optimal parameter sets of the van Genuchten-Mualem function and find different equally optimal solutions with some of the algorithms. The case studies reveal that the implemented SPOT methods work sufficiently well. They further show the benefit of having one tool at hand that includes a number of parameter search methods, likelihood functions and a priori parameter distributions within one platform independent package.
Inverse Modeling of Hydrologic Parameters Using Surface Flux and Runoff Observations in the Community Land Model

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sun, Yu; Hou, Zhangshuan; Huang, Maoyi

2013-12-10

This study demonstrates the possibility of inverting hydrologic parameters using surface flux and runoff observations in version 4 of the Community Land Model (CLM4). Previous studies showed that surface flux and runoff calculations are sensitive to major hydrologic parameters in CLM4 over different watersheds, and illustrated the necessity and possibility of parameter calibration. Two inversion strategies, the deterministic least-square fitting and stochastic Markov-Chain Monte-Carlo (MCMC) - Bayesian inversion approaches, are evaluated by applying them to CLM4 at selected sites. The unknowns to be estimated include surface and subsurface runoff generation parameters and vadose zone soil water parameters. We find thatmore » using model parameters calibrated by the least-square fitting provides little improvements in the model simulations but the sampling-based stochastic inversion approaches are consistent - as more information comes in, the predictive intervals of the calibrated parameters become narrower and the misfits between the calculated and observed responses decrease. In general, parameters that are identified to be significant through sensitivity analyses and statistical tests are better calibrated than those with weak or nonlinear impacts on flux or runoff observations. Temporal resolution of observations has larger impacts on the results of inverse modeling using heat flux data than runoff data. Soil and vegetation cover have important impacts on parameter sensitivities, leading to the different patterns of posterior distributions of parameters at different sites. Overall, the MCMC-Bayesian inversion approach effectively and reliably improves the simulation of CLM under different climates and environmental conditions. Bayesian model averaging of the posterior estimates with different reference acceptance probabilities can smooth the posterior distribution and provide more reliable parameter estimates, but at the expense of wider uncertainty bounds.« less
Inverse modeling of hydrologic parameters using surface flux and runoff observations in the Community Land Model

NASA Astrophysics Data System (ADS)

Sun, Y.; Hou, Z.; Huang, M.; Tian, F.; Leung, L. Ruby

2013-12-01

This study demonstrates the possibility of inverting hydrologic parameters using surface flux and runoff observations in version 4 of the Community Land Model (CLM4). Previous studies showed that surface flux and runoff calculations are sensitive to major hydrologic parameters in CLM4 over different watersheds, and illustrated the necessity and possibility of parameter calibration. Both deterministic least-square fitting and stochastic Markov-chain Monte Carlo (MCMC)-Bayesian inversion approaches are evaluated by applying them to CLM4 at selected sites with different climate and soil conditions. The unknowns to be estimated include surface and subsurface runoff generation parameters and vadose zone soil water parameters. We find that using model parameters calibrated by the sampling-based stochastic inversion approaches provides significant improvements in the model simulations compared to using default CLM4 parameter values, and that as more information comes in, the predictive intervals (ranges of posterior distributions) of the calibrated parameters become narrower. In general, parameters that are identified to be significant through sensitivity analyses and statistical tests are better calibrated than those with weak or nonlinear impacts on flux or runoff observations. Temporal resolution of observations has larger impacts on the results of inverse modeling using heat flux data than runoff data. Soil and vegetation cover have important impacts on parameter sensitivities, leading to different patterns of posterior distributions of parameters at different sites. Overall, the MCMC-Bayesian inversion approach effectively and reliably improves the simulation of CLM under different climates and environmental conditions. Bayesian model averaging of the posterior estimates with different reference acceptance probabilities can smooth the posterior distribution and provide more reliable parameter estimates, but at the expense of wider uncertainty bounds.
Improved Assimilation of Streamflow and Satellite Soil Moisture with the Evolutionary Particle Filter and Geostatistical Modeling

NASA Astrophysics Data System (ADS)

Yan, Hongxiang; Moradkhani, Hamid; Abbaszadeh, Peyman

2017-04-01

Assimilation of satellite soil moisture and streamflow data into hydrologic models using has received increasing attention over the past few years. Currently, these observations are increasingly used to improve the model streamflow and soil moisture predictions. However, the performance of this land data assimilation (DA) system still suffers from two limitations: 1) satellite data scarcity and quality; and 2) particle weight degeneration. In order to overcome these two limitations, we propose two possible solutions in this study. First, the general Gaussian geostatistical approach is proposed to overcome the limitation in the space/time resolution of satellite soil moisture products thus improving their accuracy at uncovered/biased grid cells. Secondly, an evolutionary PF approach based on Genetic Algorithm (GA) and Markov Chain Monte Carlo (MCMC), the so-called EPF-MCMC, is developed to further reduce weight degeneration and improve the robustness of the land DA system. This study provides a detailed analysis of the joint and separate assimilation of streamflow and satellite soil moisture into a distributed Sacramento Soil Moisture Accounting (SAC-SMA) model, with the use of recently developed EPF-MCMC and the general Gaussian geostatistical approach. Performance is assessed over several basins in the USA selected from Model Parameter Estimation Experiment (MOPEX) and located in different climate regions. The results indicate that: 1) the general Gaussian approach can predict the soil moisture at uncovered grid cells within the expected satellite data quality threshold; 2) assimilation of satellite soil moisture inferred from the general Gaussian model can significantly improve the soil moisture predictions; and 3) in terms of both deterministic and probabilistic measures, the EPF-MCMC can achieve better streamflow predictions. These results recommend that the geostatistical model is a helpful tool to aid the remote sensing technique and the EPF-MCMC is a reliable and effective DA approach in hydrologic applications.
CALIBRATION OF SEMI-ANALYTIC MODELS OF GALAXY FORMATION USING PARTICLE SWARM OPTIMIZATION

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ruiz, Andrés N.; Domínguez, Mariano J.; Yaryura, Yamila

2015-03-10

We present a fast and accurate method to select an optimal set of parameters in semi-analytic models of galaxy formation and evolution (SAMs). Our approach compares the results of a model against a set of observables applying a stochastic technique called Particle Swarm Optimization (PSO), a self-learning algorithm for localizing regions of maximum likelihood in multidimensional spaces that outperforms traditional sampling methods in terms of computational cost. We apply the PSO technique to the SAG semi-analytic model combined with merger trees extracted from a standard Lambda Cold Dark Matter N-body simulation. The calibration is performed using a combination of observedmore » galaxy properties as constraints, including the local stellar mass function and the black hole to bulge mass relation. We test the ability of the PSO algorithm to find the best set of free parameters of the model by comparing the results with those obtained using a MCMC exploration. Both methods find the same maximum likelihood region, however, the PSO method requires one order of magnitude fewer evaluations. This new approach allows a fast estimation of the best-fitting parameter set in multidimensional spaces, providing a practical tool to test the consequences of including other astrophysical processes in SAMs.« less
An efficient assisted history matching and uncertainty quantification workflow using Gaussian processes proxy models and variogram based sensitivity analysis: GP-VARS

NASA Astrophysics Data System (ADS)

Rana, Sachin; Ertekin, Turgay; King, Gregory R.

2018-05-01

Reservoir history matching is frequently viewed as an optimization problem which involves minimizing misfit between simulated and observed data. Many gradient and evolutionary strategy based optimization algorithms have been proposed to solve this problem which typically require a large number of numerical simulations to find feasible solutions. Therefore, a new methodology referred to as GP-VARS is proposed in this study which uses forward and inverse Gaussian processes (GP) based proxy models combined with a novel application of variogram analysis of response surface (VARS) based sensitivity analysis to efficiently solve high dimensional history matching problems. Empirical Bayes approach is proposed to optimally train GP proxy models for any given data. The history matching solutions are found via Bayesian optimization (BO) on forward GP models and via predictions of inverse GP model in an iterative manner. An uncertainty quantification method using MCMC sampling in conjunction with GP model is also presented to obtain a probabilistic estimate of reservoir properties and estimated ultimate recovery (EUR). An application of the proposed GP-VARS methodology on PUNQ-S3 reservoir is presented in which it is shown that GP-VARS provides history match solutions in approximately four times less numerical simulations as compared to the differential evolution (DE) algorithm. Furthermore, a comparison of uncertainty quantification results obtained by GP-VARS, EnKF and other previously published methods shows that the P50 estimate of oil EUR obtained by GP-VARS is in close agreement to the true values for the PUNQ-S3 reservoir.
On the validity of cosmological Fisher matrix forecasts

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wolz, Laura; Kilbinger, Martin; Weller, Jochen

2012-09-01

We present a comparison of Fisher matrix forecasts for cosmological probes with Monte Carlo Markov Chain (MCMC) posterior likelihood estimation methods. We analyse the performance of future Dark Energy Task Force (DETF) stage-III and stage-IV dark-energy surveys using supernovae, baryon acoustic oscillations and weak lensing as probes. We concentrate in particular on the dark-energy equation of state parameters w{sub 0} and w{sub a}. For purely geometrical probes, and especially when marginalising over w{sub a}, we find considerable disagreement between the two methods, since in this case the Fisher matrix can not reproduce the highly non-elliptical shape of the likelihood function.more » More quantitatively, the Fisher method underestimates the marginalized errors for purely geometrical probes between 30%-70%. For cases including structure formation such as weak lensing, we find that the posterior probability contours from the Fisher matrix estimation are in good agreement with the MCMC contours and the forecasted errors only changing on the 5% level. We then explore non-linear transformations resulting in physically-motivated parameters and investigate whether these parameterisations exhibit a Gaussian behaviour. We conclude that for the purely geometrical probes and, more generally, in cases where it is not known whether the likelihood is close to Gaussian, the Fisher matrix is not the appropriate tool to produce reliable forecasts.« less
Bayesian calibration of terrestrial ecosystem models: A study of advanced Markov chain Monte Carlo methods

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lu, Dan; Ricciuto, Daniel; Walker, Anthony

Calibration of terrestrial ecosystem models is important but challenging. Bayesian inference implemented by Markov chain Monte Carlo (MCMC) sampling provides a comprehensive framework to estimate model parameters and associated uncertainties using their posterior distributions. The effectiveness and efficiency of the method strongly depend on the MCMC algorithm used. In this study, a Differential Evolution Adaptive Metropolis (DREAM) algorithm was used to estimate posterior distributions of 21 parameters for the data assimilation linked ecosystem carbon (DALEC) model using 14 years of daily net ecosystem exchange data collected at the Harvard Forest Environmental Measurement Site eddy-flux tower. The DREAM is a multi-chainmore » method and uses differential evolution technique for chain movement, allowing it to be efficiently applied to high-dimensional problems, and can reliably estimate heavy-tailed and multimodal distributions that are difficult for single-chain schemes using a Gaussian proposal distribution. The results were evaluated against the popular Adaptive Metropolis (AM) scheme. DREAM indicated that two parameters controlling autumn phenology have multiple modes in their posterior distributions while AM only identified one mode. The calibration of DREAM resulted in a better model fit and predictive performance compared to the AM. DREAM provides means for a good exploration of the posterior distributions of model parameters. Lastly, it reduces the risk of false convergence to a local optimum and potentially improves the predictive performance of the calibrated model.« less
Bayesian calibration of terrestrial ecosystem models: A study of advanced Markov chain Monte Carlo methods

DOE PAGES

Lu, Dan; Ricciuto, Daniel; Walker, Anthony; ...

2017-02-22

Calibration of terrestrial ecosystem models is important but challenging. Bayesian inference implemented by Markov chain Monte Carlo (MCMC) sampling provides a comprehensive framework to estimate model parameters and associated uncertainties using their posterior distributions. The effectiveness and efficiency of the method strongly depend on the MCMC algorithm used. In this study, a Differential Evolution Adaptive Metropolis (DREAM) algorithm was used to estimate posterior distributions of 21 parameters for the data assimilation linked ecosystem carbon (DALEC) model using 14 years of daily net ecosystem exchange data collected at the Harvard Forest Environmental Measurement Site eddy-flux tower. The DREAM is a multi-chainmore » method and uses differential evolution technique for chain movement, allowing it to be efficiently applied to high-dimensional problems, and can reliably estimate heavy-tailed and multimodal distributions that are difficult for single-chain schemes using a Gaussian proposal distribution. The results were evaluated against the popular Adaptive Metropolis (AM) scheme. DREAM indicated that two parameters controlling autumn phenology have multiple modes in their posterior distributions while AM only identified one mode. The calibration of DREAM resulted in a better model fit and predictive performance compared to the AM. DREAM provides means for a good exploration of the posterior distributions of model parameters. Lastly, it reduces the risk of false convergence to a local optimum and potentially improves the predictive performance of the calibrated model.« less

Iterative Importance Sampling Algorithms for Parameter Estimation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Grout, Ray W; Morzfeld, Matthias; Day, Marcus S.

In parameter estimation problems one computes a posterior distribution over uncertain parameters defined jointly by a prior distribution, a model, and noisy data. Markov chain Monte Carlo (MCMC) is often used for the numerical solution of such problems. An alternative to MCMC is importance sampling, which can exhibit near perfect scaling with the number of cores on high performance computing systems because samples are drawn independently. However, finding a suitable proposal distribution is a challenging task. Several sampling algorithms have been proposed over the past years that take an iterative approach to constructing a proposal distribution. We investigate the applicabilitymore » of such algorithms by applying them to two realistic and challenging test problems, one in subsurface flow, and one in combustion modeling. More specifically, we implement importance sampling algorithms that iterate over the mean and covariance matrix of Gaussian or multivariate t-proposal distributions. Our implementation leverages massively parallel computers, and we present strategies to initialize the iterations using 'coarse' MCMC runs or Gaussian mixture models.« less
Technical Note: Approximate Bayesian parameterization of a process-based tropical forest model

NASA Astrophysics Data System (ADS)

Hartig, F.; Dislich, C.; Wiegand, T.; Huth, A.

2014-02-01

Inverse parameter estimation of process-based models is a long-standing problem in many scientific disciplines. A key question for inverse parameter estimation is how to define the metric that quantifies how well model predictions fit to the data. This metric can be expressed by general cost or objective functions, but statistical inversion methods require a particular metric, the probability of observing the data given the model parameters, known as the likelihood. For technical and computational reasons, likelihoods for process-based stochastic models are usually based on general assumptions about variability in the observed data, and not on the stochasticity generated by the model. Only in recent years have new methods become available that allow the generation of likelihoods directly from stochastic simulations. Previous applications of these approximate Bayesian methods have concentrated on relatively simple models. Here, we report on the application of a simulation-based likelihood approximation for FORMIND, a parameter-rich individual-based model of tropical forest dynamics. We show that approximate Bayesian inference, based on a parametric likelihood approximation placed in a conventional Markov chain Monte Carlo (MCMC) sampler, performs well in retrieving known parameter values from virtual inventory data generated by the forest model. We analyze the results of the parameter estimation, examine its sensitivity to the choice and aggregation of model outputs and observed data (summary statistics), and demonstrate the application of this method by fitting the FORMIND model to field data from an Ecuadorian tropical forest. Finally, we discuss how this approach differs from approximate Bayesian computation (ABC), another method commonly used to generate simulation-based likelihood approximations. Our results demonstrate that simulation-based inference, which offers considerable conceptual advantages over more traditional methods for inverse parameter estimation, can be successfully applied to process-based models of high complexity. The methodology is particularly suitable for heterogeneous and complex data structures and can easily be adjusted to other model types, including most stochastic population and individual-based models. Our study therefore provides a blueprint for a fairly general approach to parameter estimation of stochastic process-based models.
Towards a formal genealogical classification of the Lezgian languages (North Caucasus): testing various phylogenetic methods on lexical data.

PubMed

Kassian, Alexei

2015-01-01

A lexicostatistical classification is proposed for 20 languages and dialects of the Lezgian group of the North Caucasian family, based on meticulously compiled 110-item wordlists, published as part of the Global Lexicostatistical Database project. The lexical data have been subsequently analyzed with the aid of the principal phylogenetic methods, both distance-based and character-based: Starling neighbor joining (StarlingNJ), Neighbor joining (NJ), Unweighted pair group method with arithmetic mean (UPGMA), Bayesian Markov chain Monte Carlo (MCMC), Unweighted maximum parsimony (UMP). Cognation indexes within the input matrix were marked by two different algorithms: traditional etymological approach and phonetic similarity, i.e., the automatic method of consonant classes (Levenshtein distances). Due to certain reasons (first of all, high lexicographic quality of the wordlists and a consensus about the Lezgian phylogeny among Caucasologists), the Lezgian database is a perfect testing area for appraisal of phylogenetic methods. For the etymology-based input matrix, all the phylogenetic methods, with the possible exception of UMP, have yielded trees that are sufficiently compatible with each other to generate a consensus phylogenetic tree of the Lezgian lects. The obtained consensus tree agrees with the traditional expert classification as well as some of the previously proposed formal classifications of this linguistic group. Contrary to theoretical expectations, the UMP method has suggested the least plausible tree of all. In the case of the phonetic similarity-based input matrix, the distance-based methods (StarlingNJ, NJ, UPGMA) have produced the trees that are rather close to the consensus etymology-based tree and the traditional expert classification, whereas the character-based methods (Bayesian MCMC, UMP) have yielded less likely topologies.
Towards a Formal Genealogical Classification of the Lezgian Languages (North Caucasus): Testing Various Phylogenetic Methods on Lexical Data

PubMed Central

Kassian, Alexei

2015-01-01

A lexicostatistical classification is proposed for 20 languages and dialects of the Lezgian group of the North Caucasian family, based on meticulously compiled 110-item wordlists, published as part of the Global Lexicostatistical Database project. The lexical data have been subsequently analyzed with the aid of the principal phylogenetic methods, both distance-based and character-based: Starling neighbor joining (StarlingNJ), Neighbor joining (NJ), Unweighted pair group method with arithmetic mean (UPGMA), Bayesian Markov chain Monte Carlo (MCMC), Unweighted maximum parsimony (UMP). Cognation indexes within the input matrix were marked by two different algorithms: traditional etymological approach and phonetic similarity, i.e., the automatic method of consonant classes (Levenshtein distances). Due to certain reasons (first of all, high lexicographic quality of the wordlists and a consensus about the Lezgian phylogeny among Caucasologists), the Lezgian database is a perfect testing area for appraisal of phylogenetic methods. For the etymology-based input matrix, all the phylogenetic methods, with the possible exception of UMP, have yielded trees that are sufficiently compatible with each other to generate a consensus phylogenetic tree of the Lezgian lects. The obtained consensus tree agrees with the traditional expert classification as well as some of the previously proposed formal classifications of this linguistic group. Contrary to theoretical expectations, the UMP method has suggested the least plausible tree of all. In the case of the phonetic similarity-based input matrix, the distance-based methods (StarlingNJ, NJ, UPGMA) have produced the trees that are rather close to the consensus etymology-based tree and the traditional expert classification, whereas the character-based methods (Bayesian MCMC, UMP) have yielded less likely topologies. PMID:25719456
Single neuron modeling and data assimilation in BNST neurons

NASA Astrophysics Data System (ADS)

Farsian, Reza

Neurons, although tiny in size, are vastly complicated systems, which are responsible for the most basic yet essential functions of any nervous system. Even the most simple models of single neurons are usually high dimensional, nonlinear, and contain many parameters and states which are unobservable in a typical neurophysiological experiment. One of the most fundamental problems in experimental neurophysiology is the estimation of these parameters and states, since knowing their values is essential in identification, model construction, and forward prediction of biological neurons. Common methods of parameter and state estimation do not perform well for neural models due to their high dimensionality and nonlinearity. In this dissertation, two alternative approaches for parameters and state estimation of biological neurons have been demonstrated: dynamical parameter estimation (DPE) and a Markov Chain Monte Carlo (MCMC) method. The first method uses elements of chaos control and synchronization theory for parameter and state estimation. MCMC is a statistical approach which uses a path integral formulation to evaluate a mean and an error bound for these unobserved parameters and states. These methods have been applied to biological system of neurons in Bed Nucleus of Stria Termialis neurons (BNST) of rats. State and parameters of neurons in both systems were estimated, and their value were used for recreating a realistic model and predicting the behavior of the neurons successfully. The knowledge of biological parameters can ultimately provide a better understanding of the internal dynamics of a neuron in order to build robust models of neuron networks.
LATTE Linking Acoustic Tests and Tagging Using Statistical Estimation

DTIC Science & Technology

2015-09-30

the complexity of the model: (from simplest to most complex) Kalman filter , Markov chain Monte-Carlo (MCMC) and ABC. Many of these methods have been...using SMMs fitted using Kalman filters . Therefore, using the DTAG data, we can estimate the distributions associated with 2D horizontal displacement...speed (a key problem in the previous Kalman filter implementation). This new approach also allows the animal’s horizontal movement direction to differ
Asteroid mass estimation using Markov-Chain Monte Carlo techniques

NASA Astrophysics Data System (ADS)

Siltala, Lauri; Granvik, Mikael

2016-10-01

Estimates for asteroid masses are based on their gravitational perturbations on the orbits of other objects such as Mars, spacecraft, or other asteroids and/or their satellites. In the case of asteroid-asteroid perturbations, this leads to a 13-dimensional inverse problem where the aim is to derive the mass of the perturbing asteroid and six orbital elements for both the perturbing asteroid and the test asteroid using astrometric observations. We have developed and implemented three different mass estimation algorithms utilizing asteroid-asteroid perturbations into the OpenOrb asteroid-orbit-computation software: the very rough 'marching' approximation, in which the asteroid orbits are fixed at a given epoch, reducing the problem to a one-dimensional estimation of the mass, an implementation of the Nelder-Mead simplex method, and most significantly, a Markov-Chain Monte Carlo (MCMC) approach. We will introduce each of these algorithms with particular focus on the MCMC algorithm, and present example results for both synthetic and real data. Our results agree with the published mass estimates, but suggest that the published uncertainties may be misleading as a consequence of using linearized mass-estimation methods. Finally, we discuss remaining challenges with the algorithms as well as future plans, particularly in connection with ESA's Gaia mission.
Parameter Estimation of Partial Differential Equation Models.

PubMed

Xun, Xiaolei; Cao, Jiguo; Mallick, Bani; Carroll, Raymond J; Maity, Arnab

2013-01-01

Partial differential equation (PDE) models are commonly used to model complex dynamic systems in applied sciences such as biology and finance. The forms of these PDE models are usually proposed by experts based on their prior knowledge and understanding of the dynamic system. Parameters in PDE models often have interesting scientific interpretations, but their values are often unknown, and need to be estimated from the measurements of the dynamic system in the present of measurement errors. Most PDEs used in practice have no analytic solutions, and can only be solved with numerical methods. Currently, methods for estimating PDE parameters require repeatedly solving PDEs numerically under thousands of candidate parameter values, and thus the computational load is high. In this article, we propose two methods to estimate parameters in PDE models: a parameter cascading method and a Bayesian approach. In both methods, the underlying dynamic process modeled with the PDE model is represented via basis function expansion. For the parameter cascading method, we develop two nested levels of optimization to estimate the PDE parameters. For the Bayesian method, we develop a joint model for data and the PDE, and develop a novel hierarchical model allowing us to employ Markov chain Monte Carlo (MCMC) techniques to make posterior inference. Simulation studies show that the Bayesian method and parameter cascading method are comparable, and both outperform other available methods in terms of estimation accuracy. The two methods are demonstrated by estimating parameters in a PDE model from LIDAR data.
Application of advanced data assimilation techniques to the study of cloud and precipitation feedbacks in the tropical climate system

NASA Astrophysics Data System (ADS)

Posselt, Derek J.

The research documented in this study centers around two topics: evaluation of the response of precipitating cloud systems to changes in the tropical climate system, and assimilation of cloud and precipitation information from remote-sensing platforms. The motivation for this work proceeds from the following outstanding problems: (1) Use of models to study the response of clouds to perturbations in the climate system is hampered by uncertainties in cloud microphysical parameterizations. (2) Though there is an ever-growing set of available observations, cloud and precipitation assimilation remains a difficult problem, particularly in the tropics. (3) Though it is widely acknowledged that cloud and precipitation processes play a key role in regulating the Earth's response to surface warming, the response of the tropical hydrologic cycle to climate perturbations remains largely unknown. The above issues are addressed in the following manner. First, Markov chain Monte Carlo (MCMC) methods are used to quantify the sensitivity of the NASA Goddard Cumulus Ensemble (GCE) cloud resolving model (CRM) to changes in its cloud odcrnpbymiC8l parameters. TRMM retrievals of precipitation rate, cloud properties, and radiative fluxes and heating rates over the South China Sea are then assimilated into the GCE model to constrain cloud microphysical parameters to values characteristic of convection in the tropics, and the resulting observation-constrained model is used to assess the response of the tropical hydrologic cycle to surface warming. The major findings of this study are the following: (1) MCMC provides an effective tool with which to evaluate both model parameterizations and the assumption of Gaussian statistics used in optimal estimation procedures. (2) Statistics of the tropical radiation budget and hydrologic cycle can be used to effectively constrain CRM cloud microphysical parameters. (3) For 2D CRM simulations run with and without shear, the precipitation efficiency of cloud systems increases with increasing sea surface temperature, while the high cloud fraction and outgoing shortwave radiation decrease.
An Application of Bayesian Approach in Modeling Risk of Death in an Intensive Care Unit.

PubMed

Wong, Rowena Syn Yin; Ismail, Noor Azina

2016-01-01

There are not many studies that attempt to model intensive care unit (ICU) risk of death in developing countries, especially in South East Asia. The aim of this study was to propose and describe application of a Bayesian approach in modeling in-ICU deaths in a Malaysian ICU. This was a prospective study in a mixed medical-surgery ICU in a multidisciplinary tertiary referral hospital in Malaysia. Data collection included variables that were defined in Acute Physiology and Chronic Health Evaluation IV (APACHE IV) model. Bayesian Markov Chain Monte Carlo (MCMC) simulation approach was applied in the development of four multivariate logistic regression predictive models for the ICU, where the main outcome measure was in-ICU mortality risk. The performance of the models were assessed through overall model fit, discrimination and calibration measures. Results from the Bayesian models were also compared against results obtained using frequentist maximum likelihood method. The study involved 1,286 consecutive ICU admissions between January 1, 2009 and June 30, 2010, of which 1,111 met the inclusion criteria. Patients who were admitted to the ICU were generally younger, predominantly male, with low co-morbidity load and mostly under mechanical ventilation. The overall in-ICU mortality rate was 18.5% and the overall mean Acute Physiology Score (APS) was 68.5. All four models exhibited good discrimination, with area under receiver operating characteristic curve (AUC) values approximately 0.8. Calibration was acceptable (Hosmer-Lemeshow p-values > 0.05) for all models, except for model M3. Model M1 was identified as the model with the best overall performance in this study. Four prediction models were proposed, where the best model was chosen based on its overall performance in this study. This study has also demonstrated the promising potential of the Bayesian MCMC approach as an alternative in the analysis and modeling of in-ICU mortality outcomes.
Evaluating impacts using a BACI design, ratios, and a Bayesian approach with a focus on restoration.

PubMed

Conner, Mary M; Saunders, W Carl; Bouwes, Nicolaas; Jordan, Chris

2015-10-01

Before-after-control-impact (BACI) designs are an effective method to evaluate natural and human-induced perturbations on ecological variables when treatment sites cannot be randomly chosen. While effect sizes of interest can be tested with frequentist methods, using Bayesian Markov chain Monte Carlo (MCMC) sampling methods, probabilities of effect sizes, such as a ≥20 % increase in density after restoration, can be directly estimated. Although BACI and Bayesian methods are used widely for assessing natural and human-induced impacts for field experiments, the application of hierarchal Bayesian modeling with MCMC sampling to BACI designs is less common. Here, we combine these approaches and extend the typical presentation of results with an easy to interpret ratio, which provides an answer to the main study question-"How much impact did a management action or natural perturbation have?" As an example of this approach, we evaluate the impact of a restoration project, which implemented beaver dam analogs, on survival and density of juvenile steelhead. Results indicated the probabilities of a ≥30 % increase were high for survival and density after the dams were installed, 0.88 and 0.99, respectively, while probabilities for a higher increase of ≥50 % were variable, 0.17 and 0.82, respectively. This approach demonstrates a useful extension of Bayesian methods that can easily be generalized to other study designs from simple (e.g., single factor ANOVA, paired t test) to more complicated block designs (e.g., crossover, split-plot). This approach is valuable for estimating the probabilities of restoration impacts or other management actions.
Inclusion of historical information in flood frequency analysis using a Bayesian MCMC technique: a case study for the power dam Orlík, Czech Republic

NASA Astrophysics Data System (ADS)

Gaál, Ladislav; Szolgay, Ján; Kohnová, Silvia; Hlavčová, Kamila; Viglione, Alberto

2010-01-01

The paper deals with at-site flood frequency estimation in the case when also information on hydrological events from the past with extraordinary magnitude are available. For the joint frequency analysis of systematic observations and historical data, respectively, the Bayesian framework is chosen, which, through adequately defined likelihood functions, allows for incorporation of different sources of hydrological information, e.g., maximum annual flood peaks, historical events as well as measurement errors. The distribution of the parameters of the fitted distribution function and the confidence intervals of the flood quantiles are derived by means of the Markov chain Monte Carlo simulation (MCMC) technique. The paper presents a sensitivity analysis related to the choice of the most influential parameters of the statistical model, which are the length of the historical period h and the perception threshold X0. These are involved in the statistical model under the assumption that except for the events termed as ‘historical’ ones, none of the (unknown) peak discharges from the historical period h should have exceeded the threshold X0. Both higher values of h and lower values of X0 lead to narrower confidence intervals of the estimated flood quantiles; however, it is emphasized that one should be prudent of selecting those parameters, in order to avoid making inferences with wrong assumptions on the unknown hydrological events having occurred in the past. The Bayesian MCMC methodology is presented on the example of the maximum discharges observed during the warm half year at the station Vltava-Kamýk (Czech Republic) in the period 1877-2002. Although the 2002 flood peak, which is related to the vast flooding that affected a large part of Central Europe at that time, occurred in the near past, in the analysis it is treated virtually as a ‘historical’ event in order to illustrate some crucial aspects of including information on extreme historical floods into at-site flood frequency analyses.
A Bayesian approach to modeling diffraction profiles and application to ferroelectric materials

DOE PAGES

Iamsasri, Thanakorn; Guerrier, Jonathon; Esteves, Giovanni; ...

2017-02-01

A new statistical approach for modeling diffraction profiles is introduced, using Bayesian inference and a Markov chain Monte Carlo (MCMC) algorithm. This method is demonstrated by modeling the degenerate reflections during application of an electric field to two different ferroelectric materials: thin-film lead zirconate titanate (PZT) of composition PbZr 0.3Ti 0.7O 3and a bulk commercial PZT polycrystalline ferroelectric. Here, the new method offers a unique uncertainty quantification of the model parameters that can be readily propagated into new calculated parameters.
Bayesian inversion of surface-wave data for radial and azimuthal shear-wave anisotropy, with applications to central Mongolia and west-central Italy

NASA Astrophysics Data System (ADS)

Ravenna, Matteo; Lebedev, Sergei

2018-04-01

Seismic anisotropy provides important information on the deformation history of the Earth's interior. Rayleigh and Love surface-waves are sensitive to and can be used to determine both radial and azimuthal shear-wave anisotropies at depth, but parameter trade-offs give rise to substantial model non-uniqueness. Here, we explore the trade-offs between isotropic and anisotropic structure parameters and present a suite of methods for the inversion of surface-wave, phase-velocity curves for radial and azimuthal anisotropies. One Markov chain Monte Carlo (McMC) implementation inverts Rayleigh and Love dispersion curves for a radially anisotropic shear velocity profile of the crust and upper mantle. Another McMC implementation inverts Rayleigh phase velocities and their azimuthal anisotropy for profiles of vertically polarized shear velocity and its depth-dependent azimuthal anisotropy. The azimuthal anisotropy inversion is fully non-linear, with the forward problem solved numerically at different azimuths for every model realization, which ensures that any linearization biases are avoided. The computations are performed in parallel, in order to reduce the computing time. The often challenging issue of data noise estimation is addressed by means of a Hierarchical Bayesian approach, with the variance of the noise treated as an unknown during the radial anisotropy inversion. In addition to the McMC inversions, we also present faster, non-linear gradient-search inversions for the same anisotropic structure. The results of the two approaches are mutually consistent; the advantage of the McMC inversions is that they provide a measure of uncertainty of the models. Applying the method to broad-band data from the Baikal-central Mongolia region, we determine radial anisotropy from the crust down to the transition-zone depths. Robust negative anisotropy (Vsh < Vsv) in the asthenosphere, at 100-300 km depths, presents strong new evidence for a vertical component of asthenospheric flow. This is consistent with an upward flow from below the thick lithosphere of the Siberian Craton to below the thinner lithosphere of central Mongolia, likely to give rise to decompression melting and the scattered, sporadic volcanism observed in the Baikal Rift area, as proposed previously. Inversion of phase-velocity data from west-central Italy for azimuthal anisotropy reveals a clear change in the shear-wave fast-propagation direction at 70-100 km depths, near the lithosphere-asthenosphere boundary. The orientation of the fabric in the lithosphere is roughly E-W, parallel to the direction of stretching over the last 10 m.y. The orientation of the fabric in the asthenosphere is NW-SE, matching the fast directions inferred from shear-wave splitting and probably indicating the direction of the asthenospheric flow.
Regression without truth with Markov chain Monte-Carlo

NASA Astrophysics Data System (ADS)

Madan, Hennadii; Pernuš, Franjo; Likar, Boštjan; Å piclin, Žiga

2017-03-01

Regression without truth (RWT) is a statistical technique for estimating error model parameters of each method in a group of methods used for measurement of a certain quantity. A very attractive aspect of RWT is that it does not rely on a reference method or "gold standard" data, which is otherwise difficult RWT was used for a reference-free performance comparison of several methods for measuring left ventricular ejection fraction (EF), i.e. a percentage of blood leaving the ventricle each time the heart contracts, and has since been applied for various other quantitative imaging biomarkerss (QIBs). Herein, we show how Markov chain Monte-Carlo (MCMC), a computational technique for drawing samples from a statistical distribution with probability density function known only up to a normalizing coefficient, can be used to augment RWT to gain a number of important benefits compared to the original approach based on iterative optimization. For instance, the proposed MCMC-based RWT enables the estimation of joint posterior distribution of the parameters of the error model, straightforward quantification of uncertainty of the estimates, estimation of true value of the measurand and corresponding credible intervals (CIs), does not require a finite support for prior distribution of the measureand generally has a much improved robustness against convergence to non-global maxima. The proposed approach is validated using synthetic data that emulate the EF data for 45 patients measured with 8 different methods. The obtained results show that 90% CI of the corresponding parameter estimates contain the true values of all error model parameters and the measurand. A potential real-world application is to take measurements of a certain QIB several different methods and then use the proposed framework to compute the estimates of the true values and their uncertainty, a vital information for diagnosis based on QIB.
Comparison of sampling techniques for Bayesian parameter estimation

NASA Astrophysics Data System (ADS)

Allison, Rupert; Dunkley, Joanna

2014-02-01

The posterior probability distribution for a set of model parameters encodes all that the data have to tell us in the context of a given model; it is the fundamental quantity for Bayesian parameter estimation. In order to infer the posterior probability distribution we have to decide how to explore parameter space. Here we compare three prescriptions for how parameter space is navigated, discussing their relative merits. We consider Metropolis-Hasting sampling, nested sampling and affine-invariant ensemble Markov chain Monte Carlo (MCMC) sampling. We focus on their performance on toy-model Gaussian likelihoods and on a real-world cosmological data set. We outline the sampling algorithms themselves and elaborate on performance diagnostics such as convergence time, scope for parallelization, dimensional scaling, requisite tunings and suitability for non-Gaussian distributions. We find that nested sampling delivers high-fidelity estimates for posterior statistics at low computational cost, and should be adopted in favour of Metropolis-Hastings in many cases. Affine-invariant MCMC is competitive when computing clusters can be utilized for massive parallelization. Affine-invariant MCMC and existing extensions to nested sampling naturally probe multimodal and curving distributions.
Bayesian Population Genomic Inference of Crossing Over and Gene Conversion

PubMed Central

Padhukasahasram, Badri; Rannala, Bruce

2011-01-01

Meiotic recombination is a fundamental cellular mechanism in sexually reproducing organisms and its different forms, crossing over and gene conversion both play an important role in shaping genetic variation in populations. Here, we describe a coalescent-based full-likelihood Markov chain Monte Carlo (MCMC) method for jointly estimating the crossing-over, gene-conversion, and mean tract length parameters from population genomic data under a Bayesian framework. Although computationally more expensive than methods that use approximate likelihoods, the relative efficiency of our method is expected to be optimal in theory. Furthermore, it is also possible to obtain a posterior sample of genealogies for the data using this method. We first check the performance of the new method on simulated data and verify its correctness. We also extend the method for inference under models with variable gene-conversion and crossing-over rates and demonstrate its ability to identify recombination hotspots. Then, we apply the method to two empirical data sets that were sequenced in the telomeric regions of the X chromosome of Drosophila melanogaster. Our results indicate that gene conversion occurs more frequently than crossing over in the su-w and su-s gene sequences while the local rates of crossing over as inferred by our program are not low. The mean tract lengths for gene-conversion events are estimated to be ∼70 bp and 430 bp, respectively, for these data sets. Finally, we discuss ideas and optimizations for reducing the execution time of our algorithm. PMID:21840857
Approximate likelihood calculation on a phylogeny for Bayesian estimation of divergence times.

PubMed

dos Reis, Mario; Yang, Ziheng

2011-07-01

The molecular clock provides a powerful way to estimate species divergence times. If information on some species divergence times is available from the fossil or geological record, it can be used to calibrate a phylogeny and estimate divergence times for all nodes in the tree. The Bayesian method provides a natural framework to incorporate different sources of information concerning divergence times, such as information in the fossil and molecular data. Current models of sequence evolution are intractable in a Bayesian setting, and Markov chain Monte Carlo (MCMC) is used to generate the posterior distribution of divergence times and evolutionary rates. This method is computationally expensive, as it involves the repeated calculation of the likelihood function. Here, we explore the use of Taylor expansion to approximate the likelihood during MCMC iteration. The approximation is much faster than conventional likelihood calculation. However, the approximation is expected to be poor when the proposed parameters are far from the likelihood peak. We explore the use of parameter transforms (square root, logarithm, and arcsine) to improve the approximation to the likelihood curve. We found that the new methods, particularly the arcsine-based transform, provided very good approximations under relaxed clock models and also under the global clock model when the global clock is not seriously violated. The approximation is poorer for analysis under the global clock when the global clock is seriously wrong and should thus not be used. The results suggest that the approximate method may be useful for Bayesian dating analysis using large data sets.
Emulation of simulations of atmospheric dispersion at Fukushima for Sobol' sensitivity analysis

NASA Astrophysics Data System (ADS)

Girard, Sylvain; Korsakissok, Irène; Mallet, Vivien

2015-04-01

Polyphemus/Polair3D, from which derives IRSN's operational model ldX, was used to simulate the atmospheric dispersion at the Japan scale of radionuclides after the Fukushima disaster. A previous study with the screening method of Morris had shown that - The sensitivities depend a lot on the considered output; - Only a few of the inputs are non-influential on all considered outputs; - Most influential inputs have either non-linear effects or are interacting. These preliminary results called for a more detailed sensitivity analysis, especially regarding the characterization of interactions. The method of Sobol' allows for a precise evaluation of interactions but requires large simulation samples. Gaussian process emulators for each considered outputs were built in order to relieve this computational burden. Globally aggregated outputs proved to be easy to emulate with high accuracy, and associated Sobol' indices are in broad agreement with previous results obtained with the Morris method. More localized outputs, such as temporal averages of gamma dose rates at measurement stations, resulted in lesser emulator performances: tests simulations could not satisfactorily be reproduced by some emulators. These outputs are of special interest because they can be compared to available observations, for instance for calibration purpose. A thorough inspection of prediction residuals hinted that the model response to wind perturbations often behaved in very distinct regimes relatively to some thresholds. Complementing the initial sample with wind perturbations set to the extreme values allowed for sensible improvement of some of the emulators while other remained too unreliable to be used in a sensitivity analysis. Adaptive sampling or regime-wise emulation could be tried to circumvent this issue. Sobol' indices for local outputs revealed interesting patterns, mostly dominated by the winds, with very high interactions. The emulators will be useful for subsequent studies. Indeed, our goal is to characterize the model output uncertainty but too little information is available about input uncertainties. Hence, calibration of the input distributions with observation and a Bayesian approach seem necessary. This would probably involve methods such as MCMC which would be intractable without emulators.
Bayesian calibration of terrestrial ecosystem models: a study of advanced Markov chain Monte Carlo methods

NASA Astrophysics Data System (ADS)

Lu, Dan; Ricciuto, Daniel; Walker, Anthony; Safta, Cosmin; Munger, William

2017-09-01

Calibration of terrestrial ecosystem models is important but challenging. Bayesian inference implemented by Markov chain Monte Carlo (MCMC) sampling provides a comprehensive framework to estimate model parameters and associated uncertainties using their posterior distributions. The effectiveness and efficiency of the method strongly depend on the MCMC algorithm used. In this work, a differential evolution adaptive Metropolis (DREAM) algorithm is used to estimate posterior distributions of 21 parameters for the data assimilation linked ecosystem carbon (DALEC) model using 14 years of daily net ecosystem exchange data collected at the Harvard Forest Environmental Measurement Site eddy-flux tower. The calibration of DREAM results in a better model fit and predictive performance compared to the popular adaptive Metropolis (AM) scheme. Moreover, DREAM indicates that two parameters controlling autumn phenology have multiple modes in their posterior distributions while AM only identifies one mode. The application suggests that DREAM is very suitable to calibrate complex terrestrial ecosystem models, where the uncertain parameter size is usually large and existence of local optima is always a concern. In addition, this effort justifies the assumptions of the error model used in Bayesian calibration according to the residual analysis. The result indicates that a heteroscedastic, correlated, Gaussian error model is appropriate for the problem, and the consequent constructed likelihood function can alleviate the underestimation of parameter uncertainty that is usually caused by using uncorrelated error models.

User's guide for mapIMG 3--Map image re-projection software package

USGS Publications Warehouse

Finn, Michael P.; Mattli, David M.

2012-01-01

Version 0.0 (1995), Dan Steinwand, U.S. Geological Survey (USGS)/Earth Resources Observation Systems (EROS) Data Center (EDC)--Version 0.0 was a command line version for UNIX that required four arguments: the input metadata, the output metadata, the input data file, and the output destination path. Version 1.0 (2003), Stephen Posch and Michael P. Finn, USGS/Mid-Continent Mapping Center (MCMC--Version 1.0 added a GUI interface that was built using the Qt library for cross platform development. Version 1.01 (2004), Jason Trent and Michael P. Finn, USGS/MCMC--Version 1.01 suggested bounds for the parameters of each projection. Support was added for larger input files, storage of the last used input and output folders, and for TIFF/ GeoTIFF input images. Version 2.0 (2005), Robert Buehler, Jason Trent, and Michael P. Finn, USGS/National Geospatial Technical Operations Center (NGTOC)--Version 2.0 added Resampling Methods (Mean, Mode, Min, Max, and Sum), updated the GUI design, and added the viewer/pre-viewer. The metadata style was changed to XML and was switched to a new naming convention. Version 3.0 (2009), David Mattli and Michael P. Finn, USGS/Center of Excellence for Geospatial Information Science (CEGIS)--Version 3.0 brings optimized resampling methods, an updated GUI, support for less than global datasets, UTM support and the whole codebase was ported to Qt4.
The Efficacy of Consensus Tree Methods for Summarizing Phylogenetic Relationships from a Posterior Sample of Trees Estimated from Morphological Data.

PubMed

O'Reilly, Joseph E; Donoghue, Philip C J

2018-03-01

Consensus trees are required to summarize trees obtained through MCMC sampling of a posterior distribution, providing an overview of the distribution of estimated parameters such as topology, branch lengths, and divergence times. Numerous consensus tree construction methods are available, each presenting a different interpretation of the tree sample. The rise of morphological clock and sampled-ancestor methods of divergence time estimation, in which times and topology are coestimated, has increased the popularity of the maximum clade credibility (MCC) consensus tree method. The MCC method assumes that the sampled, fully resolved topology with the highest clade credibility is an adequate summary of the most probable clades, with parameter estimates from compatible sampled trees used to obtain the marginal distributions of parameters such as clade ages and branch lengths. Using both simulated and empirical data, we demonstrate that MCC trees, and trees constructed using the similar maximum a posteriori (MAP) method, often include poorly supported and incorrect clades when summarizing diffuse posterior samples of trees. We demonstrate that the paucity of information in morphological data sets contributes to the inability of MCC and MAP trees to accurately summarise of the posterior distribution. Conversely, majority-rule consensus (MRC) trees represent a lower proportion of incorrect nodes when summarizing the same posterior samples of trees. Thus, we advocate the use of MRC trees, in place of MCC or MAP trees, in attempts to summarize the results of Bayesian phylogenetic analyses of morphological data.
The Efficacy of Consensus Tree Methods for Summarizing Phylogenetic Relationships from a Posterior Sample of Trees Estimated from Morphological Data

PubMed Central

O’Reilly, Joseph E; Donoghue, Philip C J

2018-01-01

Abstract Consensus trees are required to summarize trees obtained through MCMC sampling of a posterior distribution, providing an overview of the distribution of estimated parameters such as topology, branch lengths, and divergence times. Numerous consensus tree construction methods are available, each presenting a different interpretation of the tree sample. The rise of morphological clock and sampled-ancestor methods of divergence time estimation, in which times and topology are coestimated, has increased the popularity of the maximum clade credibility (MCC) consensus tree method. The MCC method assumes that the sampled, fully resolved topology with the highest clade credibility is an adequate summary of the most probable clades, with parameter estimates from compatible sampled trees used to obtain the marginal distributions of parameters such as clade ages and branch lengths. Using both simulated and empirical data, we demonstrate that MCC trees, and trees constructed using the similar maximum a posteriori (MAP) method, often include poorly supported and incorrect clades when summarizing diffuse posterior samples of trees. We demonstrate that the paucity of information in morphological data sets contributes to the inability of MCC and MAP trees to accurately summarise of the posterior distribution. Conversely, majority-rule consensus (MRC) trees represent a lower proportion of incorrect nodes when summarizing the same posterior samples of trees. Thus, we advocate the use of MRC trees, in place of MCC or MAP trees, in attempts to summarize the results of Bayesian phylogenetic analyses of morphological data. PMID:29106675
No control genes required: Bayesian analysis of qRT-PCR data.

PubMed

Matz, Mikhail V; Wright, Rachel M; Scott, James G

2013-01-01

Model-based analysis of data from quantitative reverse-transcription PCR (qRT-PCR) is potentially more powerful and versatile than traditional methods. Yet existing model-based approaches cannot properly deal with the higher sampling variances associated with low-abundant targets, nor do they provide a natural way to incorporate assumptions about the stability of control genes directly into the model-fitting process. In our method, raw qPCR data are represented as molecule counts, and described using generalized linear mixed models under Poisson-lognormal error. A Markov Chain Monte Carlo (MCMC) algorithm is used to sample from the joint posterior distribution over all model parameters, thereby estimating the effects of all experimental factors on the expression of every gene. The Poisson-based model allows for the correct specification of the mean-variance relationship of the PCR amplification process, and can also glean information from instances of no amplification (zero counts). Our method is very flexible with respect to control genes: any prior knowledge about the expected degree of their stability can be directly incorporated into the model. Yet the method provides sensible answers without such assumptions, or even in the complete absence of control genes. We also present a natural Bayesian analogue of the "classic" analysis, which uses standard data pre-processing steps (logarithmic transformation and multi-gene normalization) but estimates all gene expression changes jointly within a single model. The new methods are considerably more flexible and powerful than the standard delta-delta Ct analysis based on pairwise t-tests. Our methodology expands the applicability of the relative-quantification analysis protocol all the way to the lowest-abundance targets, and provides a novel opportunity to analyze qRT-PCR data without making any assumptions concerning target stability. These procedures have been implemented as the MCMC.qpcr package in R.
Forward and Inverse Modeling of Self-potential. A Tomography of Groundwater Flow and Comparison Between Deterministic and Stochastic Inversion Methods

NASA Astrophysics Data System (ADS)

Quintero-Chavarria, E.; Ochoa Gutierrez, L. H.

2016-12-01

Applications of the Self-potential Method in the fields of Hydrogeology and Environmental Sciences have had significant developments during the last two decades with a strong use on groundwater flows identification. Although only few authors deal with the forward problem's solution -especially in geophysics literature- different inversion procedures are currently being developed but in most cases they are compared with unconventional groundwater velocity fields and restricted to structured meshes. This research solves the forward problem based on the finite element method using the St. Venant's Principle to transform a point dipole, which is the field generated by a single vector, into a distribution of electrical monopoles. Then, two simple aquifer models were generated with specific boundary conditions and head potentials, velocity fields and electric potentials in the medium were computed. With the model's surface electric potential, the inverse problem is solved to retrieve the source of electric potential (vector field associated to groundwater flow) using deterministic and stochastic approaches. The first approach was carried out by implementing a Tikhonov regularization with a stabilized operator adapted to the finite element mesh while for the second a hierarchical Bayesian model based on Markov chain Monte Carlo (McMC) and Markov Random Fields (MRF) was constructed. For all implemented methods, the result between the direct and inverse models was contrasted in two ways: 1) shape and distribution of the vector field, and 2) magnitude's histogram. Finally, it was concluded that inversion procedures are improved when the velocity field's behavior is considered, thus, the deterministic method is more suitable for unconfined aquifers than confined ones. McMC has restricted applications and requires a lot of information (particularly in potentials fields) while MRF has a remarkable response especially when dealing with confined aquifers.
magicaxis: Pretty scientific plotting with minor-tick and log minor-tick support

NASA Astrophysics Data System (ADS)

Robotham, Aaron S. G.

2016-04-01

The R suite magicaxis makes useful and pretty plots for scientific plotting and includes functions for base plotting, with particular emphasis on pretty axis labelling in a number of circumstances that are often used in scientific plotting. It also includes functions for generating images and contours that reflect the 2D quantile levels of the data designed particularly for output of MCMC posteriors where visualizing the location of the 68% and 95% 2D quantiles for covariant parameters is a necessary part of the post MCMC analysis, can generate low and high error bars, and allows clipping of values, rejection of bad values, and log stretching.
MontePython 3: Parameter inference code for cosmology

NASA Astrophysics Data System (ADS)

Brinckmann, Thejs; Lesgourgues, Julien; Audren, Benjamin; Benabed, Karim; Prunet, Simon

2018-05-01

MontePython 3 provides numerous ways to explore parameter space using Monte Carlo Markov Chain (MCMC) sampling, including Metropolis-Hastings, Nested Sampling, Cosmo Hammer, and a Fisher sampling method. This improved version of the Monte Python (ascl:1307.002) parameter inference code for cosmology offers new ingredients that improve the performance of Metropolis-Hastings sampling, speeding up convergence and offering significant time improvement in difficult runs. Additional likelihoods and plotting options are available, as are post-processing algorithms such as Importance Sampling and Adding Derived Parameter.
The Full Monte Carlo: A Live Performance with Stars

NASA Astrophysics Data System (ADS)

Meng, Xiao-Li

2014-06-01

Markov chain Monte Carlo (MCMC) is being applied increasingly often in modern Astrostatistics. It is indeed incredibly powerful, but also very dangerous. It is popular because of its apparent generality (from simple to highly complex problems) and simplicity (the availability of out-of-the-box recipes). It is dangerous because it always produces something but there is no surefire way to verify or even diagnosis that the “something” is remotely close to what the MCMC theory predicts or one hopes. Using very simple models (e.g., conditionally Gaussian), this talk starts with a tutorial of the two most popular MCMC algorithms, namely, the Gibbs Sampler and the Metropolis-Hasting Algorithm, and illustratestheir good, bad, and ugly implementations via live demonstration. The talk ends with a story of how a recent advance, the Ancillary-Sufficient Interweaving Strategy (ASIS) (Yu and Meng, 2011, http://www.stat.harvard.edu/Faculty_Content/meng/jcgs.2011-article.pdf)reduces the danger. It was discovered almost by accident during a Ph.D. student’s (Yaming Yu) struggle with fitting a Cox process model for detecting changes in source intensity of photon counts observed by the Chandra X-ray telescope from a (candidate) neutron/quark star.
Modeling Well Sampled Composite Spectral Energy Distributions of Distant Galaxies via an MCMC-driven Inference Framework

NASA Astrophysics Data System (ADS)

Pasha, Imad; Kriek, Mariska; Johnson, Benjamin; Conroy, Charlie

2018-01-01

Using a novel, MCMC-driven inference framework, we have modeled the stellar and dust emission of 32 composite spectral energy distributions (SEDs), which span from the near-ultraviolet (NUV) to far infrared (FIR). The composite SEDs were originally constructed in a previous work from the photometric catalogs of the NEWFIRM Medium-Band Survey, in which SEDs of individual galaxies at 0.5 < z < 2.0 were iteratively matched and sorted into types based on their rest-frame UV-to-NIR photometry. In a subsequent work, MIPS 24 μm was added for each SED type, and in this work, PACS 100 μm, PACS160 μm, SPIRE 25 μm, and SPIRE 350 μm photometry have been added to extend the range of the composite SEDs into the FIR. We fit the composite SEDs with the Prospector code, which utilizes an MCMC sampling to explore the parameter space for models created by the Flexible Stellar Population Synthesis (FSPS) code, in order to investigate how specific star formation rate (sSFR), dust temperature, and other galaxy properties vary with SED type.This work is also being used to better constrain the SPS models within FSPS.
RECONSTRUCTING THREE-DIMENSIONAL JET GEOMETRY FROM TWO-DIMENSIONAL IMAGES

NASA Astrophysics Data System (ADS)

Avachat, Sayali; Perlman, Eric S.; Li, Kunyang; Kosak, Katie

2018-01-01

Relativistic jets in AGN are one of the most interesting and complex structures in the Universe. Some of the jets can be spread over hundreds of kilo parsecs from the central engine and display various bends, knots and hotspots. Observations of the jets can prove helpful in understanding the emission and particle acceleration processes from sub-arcsec to kilo parsec scales and the role of magnetic field in it. The M87 jet has many bright knots as well as regions of small and large bends. We attempt to model the jet geometry using the observed 2 dimensional structure. The radio and optical images of the jet show evidence of presence of helical magnetic field throughout. Using the observed structure in the sky frame, our goal is to gain an insight into the intrinsic 3 dimensional geometry in the jets frame. The structure of the bends in jet's frame may be quite different than what we see in the sky frame. The knowledge of the intrinsic structure will be helpful in understanding the appearance of the magnetic field and hence polarization morphology. To achieve this, we are using numerical methods to solve the non-linear equations based on the jet geometry. We are using the Log Likelihood method and algorithm based on Markov Chain Monte Carlo (MCMC) simulations.
On the estimation of the reproduction number based on misreported epidemic data.

PubMed

Azmon, Amin; Faes, Christel; Hens, Niel

2014-03-30

Epidemic data often suffer from underreporting and delay in reporting. In this paper, we investigated the impact of delays and underreporting on estimates of reproduction number. We used a thinned version of the epidemic renewal equation to describe the epidemic process while accounting for the underlying reporting system. Assuming a constant reporting parameter, we used different delay patterns to represent the delay structure in our model. Instead of assuming a fixed delay distribution, we estimated the delay parameters while assuming a smooth function for the reproduction number over time. In order to estimate the parameters, we used a Bayesian semiparametric approach with penalized splines, allowing both flexibility and exact inference provided by MCMC. To show the performance of our method, we performed different simulation studies. We conducted sensitivity analyses to investigate the impact of misspecification of the delay pattern and the impact of assuming nonconstant reporting parameters on the estimates of the reproduction numbers. We showed that, whenever available, additional information about time-dependent underreporting can be taken into account. As an application of our method, we analyzed confirmed daily A(H1N1) v2009 cases made publicly available by the World Health Organization for Mexico and the USA. Copyright © 2013 John Wiley & Sons, Ltd.
Retrievals of abundances of hydrocarbon and nitrile species in Titan’s upper atmosphere

NASA Astrophysics Data System (ADS)

Yung, Yuk; Fan, Siteng; Shemansky, D. E.; Li, Cheng; Gao, Peter

2017-10-01

We develop an innovative retrieval method for Titan occultation measurements by the Cassini UVIS experiment. The T35 occultation is analyzed to illustrate the methodology. A significant number of occultations observed using the UVIS spectrographs show loss of pointing control required for correction of the spectral vectors. Consequently, only three stellar occultations have been analyzed to date. We use the Markov Chain Monte-Carlo (MCMC) method to retrieve the abundances or upper limits of thirteen hydrocarbon and nitrile species (N2, CH4, C2H2, C2H4, C2H6, HCN, C4H2, C6N2, C6H6, tholin, HC3N, C2N2, NH3) along with the pointing error using the Cassini/UVIS simulator. These numbers are derived for the fast T35 occultation, which has never been analyzed because of large pointing errors. Uncertainty in the retrievals is determined using an intrinsic fitting probability distribution function. The Caltech/JPL photochemical and kinetics model, KINETICS, is used to calculate the atmospheric aforementioned species. Comparisons between model and observations reveal gaps in our current understanding of the chemical kinetics of hydrocarbons and nitrile species, especially for C6H6.
SAChES: Scalable Adaptive Chain-Ensemble Sampling.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Swiler, Laura Painton; Ray, Jaideep; Ebeida, Mohamed Salah

We present the development of a parallel Markov Chain Monte Carlo (MCMC) method called SAChES, Scalable Adaptive Chain-Ensemble Sampling. This capability is targed to Bayesian calibration of com- putationally expensive simulation models. SAChES involves a hybrid of two methods: Differential Evo- lution Monte Carlo followed by Adaptive Metropolis. Both methods involve parallel chains. Differential evolution allows one to explore high-dimensional parameter spaces using loosely coupled (i.e., largely asynchronous) chains. Loose coupling allows the use of large chain ensembles, with far more chains than the number of parameters to explore. This reduces per-chain sampling burden, enables high-dimensional inversions and the usemore » of computationally expensive forward models. The large number of chains can also ameliorate the impact of silent-errors, which may affect only a few chains. The chain ensemble can also be sampled to provide an initial condition when an aberrant chain is re-spawned. Adaptive Metropolis takes the best points from the differential evolution and efficiently hones in on the poste- rior density. The multitude of chains in SAChES is leveraged to (1) enable efficient exploration of the parameter space; and (2) ensure robustness to silent errors which may be unavoidable in extreme-scale computational platforms of the future. This report outlines SAChES, describes four papers that are the result of the project, and discusses some additional results.« less
Comprehensive cosmographic analysis by Markov chain method

NASA Astrophysics Data System (ADS)

Capozziello, S.; Lazkoz, R.; Salzano, V.

2011-12-01

We study the possibility of extracting model independent information about the dynamics of the Universe by using cosmography. We intend to explore it systematically, to learn about its limitations and its real possibilities. Here we are sticking to the series expansion approach on which cosmography is based. We apply it to different data sets: Supernovae type Ia (SNeIa), Hubble parameter extracted from differential galaxy ages, gamma ray bursts, and the baryon acoustic oscillations data. We go beyond past results in the literature extending the series expansion up to the fourth order in the scale factor, which implies the analysis of the deceleration q0, the jerk j0, and the snap s0. We use the Markov chain Monte Carlo method (MCMC) to analyze the data statistically. We also try to relate direct results from cosmography to dark energy (DE) dynamical models parametrized by the Chevallier-Polarski-Linder model, extracting clues about the matter content and the dark energy parameters. The main results are: (a) even if relying on a mathematical approximate assumption such as the scale factor series expansion in terms of time, cosmography can be extremely useful in assessing dynamical properties of the Universe; (b) the deceleration parameter clearly confirms the present acceleration phase; (c) the MCMC method can help giving narrower constraints in parameter estimation, in particular for higher order cosmographic parameters (the jerk and the snap), with respect to the literature; and (d) both the estimation of the jerk and the DE parameters reflect the possibility of a deviation from the ΛCDM cosmological model.
On the applicability of surrogate-based Markov chain Monte Carlo-Bayesian inversion to the Community Land Model: Case studies at flux tower sites: SURROGATE-BASED MCMC FOR CLM

DOE Office of Scientific and Technical Information (OSTI.GOV)

Huang, Maoyi; Ray, Jaideep; Hou, Zhangshuan

2016-07-04

The Community Land Model (CLM) has been widely used in climate and Earth system modeling. Accurate estimation of model parameters is needed for reliable model simulations and predictions under current and future conditions, respectively. In our previous work, a subset of hydrological parameters has been identified to have significant impact on surface energy fluxes at selected flux tower sites based on parameter screening and sensitivity analysis, which indicate that the parameters could potentially be estimated from surface flux observations at the towers. To date, such estimates do not exist. In this paper, we assess the feasibility of applying a Bayesianmore » model calibration technique to estimate CLM parameters at selected flux tower sites under various site conditions. The parameters are estimated as a joint probability density function (PDF) that provides estimates of uncertainty of the parameters being inverted, conditional on climatologically-average latent heat fluxes derived from observations. We find that the simulated mean latent heat fluxes from CLM using the calibrated parameters are generally improved at all sites when compared to those obtained with CLM simulations using default parameter sets. Further, our calibration method also results in credibility bounds around the simulated mean fluxes which bracket the measured data. The modes (or maximum a posteriori values) and 95% credibility intervals of the site-specific posterior PDFs are tabulated as suggested parameter values for each site. Analysis of relationships between the posterior PDFs and site conditions suggests that the parameter values are likely correlated with the plant functional type, which needs to be confirmed in future studies by extending the approach to more sites.« less
On the applicability of surrogate-based MCMC-Bayesian inversion to the Community Land Model: Case studies at Flux tower sites

DOE PAGES

Huang, Maoyi; Ray, Jaideep; Hou, Zhangshuan; ...

2016-06-01

The Community Land Model (CLM) has been widely used in climate and Earth system modeling. Accurate estimation of model parameters is needed for reliable model simulations and predictions under current and future conditions, respectively. In our previous work, a subset of hydrological parameters has been identified to have significant impact on surface energy fluxes at selected flux tower sites based on parameter screening and sensitivity analysis, which indicate that the parameters could potentially be estimated from surface flux observations at the towers. To date, such estimates do not exist. In this paper, we assess the feasibility of applying a Bayesianmore » model calibration technique to estimate CLM parameters at selected flux tower sites under various site conditions. The parameters are estimated as a joint probability density function (PDF) that provides estimates of uncertainty of the parameters being inverted, conditional on climatologically average latent heat fluxes derived from observations. We find that the simulated mean latent heat fluxes from CLM using the calibrated parameters are generally improved at all sites when compared to those obtained with CLM simulations using default parameter sets. Further, our calibration method also results in credibility bounds around the simulated mean fluxes which bracket the measured data. The modes (or maximum a posteriori values) and 95% credibility intervals of the site-specific posterior PDFs are tabulated as suggested parameter values for each site. As a result, analysis of relationships between the posterior PDFs and site conditions suggests that the parameter values are likely correlated with the plant functional type, which needs to be confirmed in future studies by extending the approach to more sites.« less
On the applicability of surrogate-based MCMC-Bayesian inversion to the Community Land Model: Case studies at Flux tower sites

DOE Office of Scientific and Technical Information (OSTI.GOV)

Huang, Maoyi; Ray, Jaideep; Hou, Zhangshuan

The Community Land Model (CLM) has been widely used in climate and Earth system modeling. Accurate estimation of model parameters is needed for reliable model simulations and predictions under current and future conditions, respectively. In our previous work, a subset of hydrological parameters has been identified to have significant impact on surface energy fluxes at selected flux tower sites based on parameter screening and sensitivity analysis, which indicate that the parameters could potentially be estimated from surface flux observations at the towers. To date, such estimates do not exist. In this paper, we assess the feasibility of applying a Bayesianmore » model calibration technique to estimate CLM parameters at selected flux tower sites under various site conditions. The parameters are estimated as a joint probability density function (PDF) that provides estimates of uncertainty of the parameters being inverted, conditional on climatologically average latent heat fluxes derived from observations. We find that the simulated mean latent heat fluxes from CLM using the calibrated parameters are generally improved at all sites when compared to those obtained with CLM simulations using default parameter sets. Further, our calibration method also results in credibility bounds around the simulated mean fluxes which bracket the measured data. The modes (or maximum a posteriori values) and 95% credibility intervals of the site-specific posterior PDFs are tabulated as suggested parameter values for each site. As a result, analysis of relationships between the posterior PDFs and site conditions suggests that the parameter values are likely correlated with the plant functional type, which needs to be confirmed in future studies by extending the approach to more sites.« less
A Bayesian connectivity-based approach to constructing probabilistic gene regulatory networks.

PubMed

Zhou, Xiaobo; Wang, Xiaodong; Pal, Ranadip; Ivanov, Ivan; Bittner, Michael; Dougherty, Edward R

2004-11-22

We have hypothesized that the construction of transcriptional regulatory networks using a method that optimizes connectivity would lead to regulation consistent with biological expectations. A key expectation is that the hypothetical networks should produce a few, very strong attractors, highly similar to the original observations, mimicking biological state stability and determinism. Another central expectation is that, since it is expected that the biological control is distributed and mutually reinforcing, interpretation of the observations should lead to a very small number of connection schemes. We propose a fully Bayesian approach to constructing probabilistic gene regulatory networks (PGRNs) that emphasizes network topology. The method computes the possible parent sets of each gene, the corresponding predictors and the associated probabilities based on a nonlinear perceptron model, using a reversible jump Markov chain Monte Carlo (MCMC) technique, and an MCMC method is employed to search the network configurations to find those with the highest Bayesian scores to construct the PGRN. The Bayesian method has been used to construct a PGRN based on the observed behavior of a set of genes whose expression patterns vary across a set of melanoma samples exhibiting two very different phenotypes with respect to cell motility and invasiveness. Key biological features have been faithfully reflected in the model. Its steady-state distribution contains attractors that are either identical or very similar to the states observed in the data, and many of the attractors are singletons, which mimics the biological propensity to stably occupy a given state. Most interestingly, the connectivity rules for the most optimal generated networks constituting the PGRN are remarkably similar, as would be expected for a network operating on a distributed basis, with strong interactions between the components.
Probabilistic Magnetotelluric Inversion with Adaptive Regularisation Using the No-U-Turns Sampler

NASA Astrophysics Data System (ADS)

Conway, Dennis; Simpson, Janelle; Didana, Yohannes; Rugari, Joseph; Heinson, Graham

2018-04-01

We present the first inversion of magnetotelluric (MT) data using a Hamiltonian Monte Carlo algorithm. The inversion of MT data is an underdetermined problem which leads to an ensemble of feasible models for a given dataset. A standard approach in MT inversion is to perform a deterministic search for the single solution which is maximally smooth for a given data-fit threshold. An alternative approach is to use Markov Chain Monte Carlo (MCMC) methods, which have been used in MT inversion to explore the entire solution space and produce a suite of likely models. This approach has the advantage of assigning confidence to resistivity models, leading to better geological interpretations. Recent advances in MCMC techniques include the No-U-Turns Sampler (NUTS), an efficient and rapidly converging method which is based on Hamiltonian Monte Carlo. We have implemented a 1D MT inversion which uses the NUTS algorithm. Our model includes a fixed number of layers of variable thickness and resistivity, as well as probabilistic smoothing constraints which allow sharp and smooth transitions. We present the results of a synthetic study and show the accuracy of the technique, as well as the fast convergence, independence of starting models, and sampling efficiency. Finally, we test our technique on MT data collected from a site in Boulia, Queensland, Australia to show its utility in geological interpretation and ability to provide probabilistic estimates of features such as depth to basement.
Bayesian calibration of terrestrial ecosystem models: a study of advanced Markov chain Monte Carlo methods

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lu, Dan; Ricciuto, Daniel M.; Walker, Anthony P.

Calibration of terrestrial ecosystem models is important but challenging. Bayesian inference implemented by Markov chain Monte Carlo (MCMC) sampling provides a comprehensive framework to estimate model parameters and associated uncertainties using their posterior distributions. The effectiveness and efficiency of the method strongly depend on the MCMC algorithm used. In this work, a differential evolution adaptive Metropolis (DREAM) algorithm is used to estimate posterior distributions of 21 parameters for the data assimilation linked ecosystem carbon (DALEC) model using 14 years of daily net ecosystem exchange data collected at the Harvard Forest Environmental Measurement Site eddy-flux tower. The calibration of DREAM results inmore » a better model fit and predictive performance compared to the popular adaptive Metropolis (AM) scheme. Moreover, DREAM indicates that two parameters controlling autumn phenology have multiple modes in their posterior distributions while AM only identifies one mode. The application suggests that DREAM is very suitable to calibrate complex terrestrial ecosystem models, where the uncertain parameter size is usually large and existence of local optima is always a concern. In addition, this effort justifies the assumptions of the error model used in Bayesian calibration according to the residual analysis. Here, the result indicates that a heteroscedastic, correlated, Gaussian error model is appropriate for the problem, and the consequent constructed likelihood function can alleviate the underestimation of parameter uncertainty that is usually caused by using uncorrelated error models.« less

Bayesian calibration of terrestrial ecosystem models: a study of advanced Markov chain Monte Carlo methods

DOE PAGES

Lu, Dan; Ricciuto, Daniel M.; Walker, Anthony P.; ...

2017-09-27

Calibration of terrestrial ecosystem models is important but challenging. Bayesian inference implemented by Markov chain Monte Carlo (MCMC) sampling provides a comprehensive framework to estimate model parameters and associated uncertainties using their posterior distributions. The effectiveness and efficiency of the method strongly depend on the MCMC algorithm used. In this work, a differential evolution adaptive Metropolis (DREAM) algorithm is used to estimate posterior distributions of 21 parameters for the data assimilation linked ecosystem carbon (DALEC) model using 14 years of daily net ecosystem exchange data collected at the Harvard Forest Environmental Measurement Site eddy-flux tower. The calibration of DREAM results inmore » a better model fit and predictive performance compared to the popular adaptive Metropolis (AM) scheme. Moreover, DREAM indicates that two parameters controlling autumn phenology have multiple modes in their posterior distributions while AM only identifies one mode. The application suggests that DREAM is very suitable to calibrate complex terrestrial ecosystem models, where the uncertain parameter size is usually large and existence of local optima is always a concern. In addition, this effort justifies the assumptions of the error model used in Bayesian calibration according to the residual analysis. Here, the result indicates that a heteroscedastic, correlated, Gaussian error model is appropriate for the problem, and the consequent constructed likelihood function can alleviate the underestimation of parameter uncertainty that is usually caused by using uncorrelated error models.« less
Multiple-Event Seismic Location Using the Markov-Chain Monte Carlo Technique

NASA Astrophysics Data System (ADS)

Myers, S. C.; Johannesson, G.; Hanley, W.

2005-12-01

We develop a new multiple-event location algorithm (MCMCloc) that utilizes the Markov-Chain Monte Carlo (MCMC) method. Unlike most inverse methods, the MCMC approach produces a suite of solutions, each of which is consistent with observations and prior estimates of data and model uncertainties. Model parameters in MCMCloc consist of event hypocenters, and travel-time predictions. Data are arrival time measurements and phase assignments. Posteriori estimates of event locations, path corrections, pick errors, and phase assignments are made through analysis of the posteriori suite of acceptable solutions. Prior uncertainty estimates include correlations between travel-time predictions, correlations between measurement errors, the probability of misidentifying one phase for another, and the probability of spurious data. Inclusion of prior constraints on location accuracy allows direct utilization of ground-truth locations or well-constrained location parameters (e.g. from InSAR) that aid in the accuracy of the solution. Implementation of a correlation structure for travel-time predictions allows MCMCloc to operate over arbitrarily large geographic areas. Transition in behavior between a multiple-event locator for tightly clustered events and a single-event locator for solitary events is controlled by the spatial correlation of travel-time predictions. We test the MCMC locator on a regional data set of Nevada Test Site nuclear explosions. Event locations and origin times are known for these events, allowing us to test the features of MCMCloc using a high-quality ground truth data set. Preliminary tests suggest that MCMCloc provides excellent relative locations, often outperforming traditional multiple-event location algorithms, and excellent absolute locations are attained when constraints from one or more ground truth event are included. When phase assignments are switched, we find that MCMCloc properly corrects the error when predicted arrival times are separated by several seconds. In cases where the predicted arrival times are within the combined uncertainty of prediction and measurement errors, MCMCloc determines the probability of one or the other phase assignment and propagates this uncertainty into all model parameters. We find that MCMCloc is a promising method for simultaneously locating large, geographically distributed data sets. Because we incorporate prior knowledge on many parameters, MCMCloc is ideal for combining trusted data with data of unknown reliability. This work was performed under the auspices of the U.S. Department of Energy by the University of California Lawrence Livermore National Laboratory under contract No. W-7405-Eng-48, Contribution UCRL-ABS-215048
Near-optimal alternative generation using modified hit-and-run sampling for non-linear, non-convex problems

NASA Astrophysics Data System (ADS)

Rosenberg, D. E.; Alafifi, A.

2016-12-01

Water resources systems analysis often focuses on finding optimal solutions. Yet an optimal solution is optimal only for the modelled issues and managers often seek near-optimal alternatives that address un-modelled objectives, preferences, limits, uncertainties, and other issues. Early on, Modelling to Generate Alternatives (MGA) formalized near-optimal as the region comprising the original problem constraints plus a new constraint that allowed performance within a specified tolerance of the optimal objective function value. MGA identified a few maximally-different alternatives from the near-optimal region. Subsequent work applied Markov Chain Monte Carlo (MCMC) sampling to generate a larger number of alternatives that span the near-optimal region of linear problems or select portions for non-linear problems. We extend the MCMC Hit-And-Run method to generate alternatives that span the full extent of the near-optimal region for non-linear, non-convex problems. First, start at a feasible hit point within the near-optimal region, then run a random distance in a random direction to a new hit point. Next, repeat until generating the desired number of alternatives. The key step at each iterate is to run a random distance along the line in the specified direction to a new hit point. If linear equity constraints exist, we construct an orthogonal basis and use a null space transformation to confine hits and runs to a lower-dimensional space. Linear inequity constraints define the convex bounds on the line that runs through the current hit point in the specified direction. We then use slice sampling to identify a new hit point along the line within bounds defined by the non-linear inequity constraints. This technique is computationally efficient compared to prior near-optimal alternative generation techniques such MGA, MCMC Metropolis-Hastings, evolutionary, or firefly algorithms because search at each iteration is confined to the hit line, the algorithm can move in one step to any point in the near-optimal region, and each iterate generates a new, feasible alternative. We use the method to generate alternatives that span the near-optimal regions of simple and more complicated water management problems and may be preferred to optimal solutions. We also discuss extensions to handle non-linear equity constraints.
Probabilistic Damage Characterization Using the Computationally-Efficient Bayesian Approach

NASA Technical Reports Server (NTRS)

Warner, James E.; Hochhalter, Jacob D.

2016-01-01

This work presents a computationally-ecient approach for damage determination that quanti es uncertainty in the provided diagnosis. Given strain sensor data that are polluted with measurement errors, Bayesian inference is used to estimate the location, size, and orientation of damage. This approach uses Bayes' Theorem to combine any prior knowledge an analyst may have about the nature of the damage with information provided implicitly by the strain sensor data to form a posterior probability distribution over possible damage states. The unknown damage parameters are then estimated based on samples drawn numerically from this distribution using a Markov Chain Monte Carlo (MCMC) sampling algorithm. Several modi cations are made to the traditional Bayesian inference approach to provide signi cant computational speedup. First, an ecient surrogate model is constructed using sparse grid interpolation to replace a costly nite element model that must otherwise be evaluated for each sample drawn with MCMC. Next, the standard Bayesian posterior distribution is modi ed using a weighted likelihood formulation, which is shown to improve the convergence of the sampling process. Finally, a robust MCMC algorithm, Delayed Rejection Adaptive Metropolis (DRAM), is adopted to sample the probability distribution more eciently. Numerical examples demonstrate that the proposed framework e ectively provides damage estimates with uncertainty quanti cation and can yield orders of magnitude speedup over standard Bayesian approaches.
Efficient inference for genetic association studies with multiple outcomes.

PubMed

Ruffieux, Helene; Davison, Anthony C; Hager, Jorg; Irincheeva, Irina

2017-10-01

Combined inference for heterogeneous high-dimensional data is critical in modern biology, where clinical and various kinds of molecular data may be available from a single study. Classical genetic association studies regress a single clinical outcome on many genetic variants one by one, but there is an increasing demand for joint analysis of many molecular outcomes and genetic variants in order to unravel functional interactions. Unfortunately, most existing approaches to joint modeling are either too simplistic to be powerful or are impracticable for computational reasons. Inspired by Richardson and others (2010, Bayesian Statistics 9), we consider a sparse multivariate regression model that allows simultaneous selection of predictors and associated responses. As Markov chain Monte Carlo (MCMC) inference on such models can be prohibitively slow when the number of genetic variants exceeds a few thousand, we propose a variational inference approach which produces posterior information very close to that of MCMC inference, at a much reduced computational cost. Extensive numerical experiments show that our approach outperforms popular variable selection methods and tailored Bayesian procedures, dealing within hours with problems involving hundreds of thousands of genetic variants and tens to hundreds of clinical or molecular outcomes. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Modeling two strains of disease via aggregate-level infectivity curves.

PubMed

Romanescu, Razvan; Deardon, Rob

2016-04-01

Well formulated models of disease spread, and efficient methods to fit them to observed data, are powerful tools for aiding the surveillance and control of infectious diseases. Our project considers the problem of the simultaneous spread of two related strains of disease in a context where spatial location is the key driver of disease spread. We start our modeling work with the individual level models (ILMs) of disease transmission, and extend these models to accommodate the competing spread of the pathogens in a two-tier hierarchical population (whose levels we refer to as 'farm' and 'animal'). The postulated interference mechanism between the two strains is a period of cross-immunity following infection. We also present a framework for speeding up the computationally intensive process of fitting the ILM to data, typically done using Markov chain Monte Carlo (MCMC) in a Bayesian framework, by turning the inference into a two-stage process. First, we approximate the number of animals infected on a farm over time by infectivity curves. These curves are fit to data sampled from farms, using maximum likelihood estimation, then, conditional on the fitted curves, Bayesian MCMC inference proceeds for the remaining parameters. Finally, we use posterior predictive distributions of salient epidemic summary statistics, in order to assess the model fitted.
Bayesian inference of nonlinear unsteady aerodynamics from aeroelastic limit cycle oscillations

NASA Astrophysics Data System (ADS)

Sandhu, Rimple; Poirel, Dominique; Pettit, Chris; Khalil, Mohammad; Sarkar, Abhijit

2016-07-01

A Bayesian model selection and parameter estimation algorithm is applied to investigate the influence of nonlinear and unsteady aerodynamic loads on the limit cycle oscillation (LCO) of a pitching airfoil in the transitional Reynolds number regime. At small angles of attack, laminar boundary layer trailing edge separation causes negative aerodynamic damping leading to the LCO. The fluid-structure interaction of the rigid, but elastically mounted, airfoil and nonlinear unsteady aerodynamics is represented by two coupled nonlinear stochastic ordinary differential equations containing uncertain parameters and model approximation errors. Several plausible aerodynamic models with increasing complexity are proposed to describe the aeroelastic system leading to LCO. The likelihood in the posterior parameter probability density function (pdf) is available semi-analytically using the extended Kalman filter for the state estimation of the coupled nonlinear structural and unsteady aerodynamic model. The posterior parameter pdf is sampled using a parallel and adaptive Markov Chain Monte Carlo (MCMC) algorithm. The posterior probability of each model is estimated using the Chib-Jeliazkov method that directly uses the posterior MCMC samples for evidence (marginal likelihood) computation. The Bayesian algorithm is validated through a numerical study and then applied to model the nonlinear unsteady aerodynamic loads using wind-tunnel test data at various Reynolds numbers.
Bayesian inference of nonlinear unsteady aerodynamics from aeroelastic limit cycle oscillations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sandhu, Rimple; Poirel, Dominique; Pettit, Chris

2016-07-01

A Bayesian model selection and parameter estimation algorithm is applied to investigate the influence of nonlinear and unsteady aerodynamic loads on the limit cycle oscillation (LCO) of a pitching airfoil in the transitional Reynolds number regime. At small angles of attack, laminar boundary layer trailing edge separation causes negative aerodynamic damping leading to the LCO. The fluid–structure interaction of the rigid, but elastically mounted, airfoil and nonlinear unsteady aerodynamics is represented by two coupled nonlinear stochastic ordinary differential equations containing uncertain parameters and model approximation errors. Several plausible aerodynamic models with increasing complexity are proposed to describe the aeroelastic systemmore » leading to LCO. The likelihood in the posterior parameter probability density function (pdf) is available semi-analytically using the extended Kalman filter for the state estimation of the coupled nonlinear structural and unsteady aerodynamic model. The posterior parameter pdf is sampled using a parallel and adaptive Markov Chain Monte Carlo (MCMC) algorithm. The posterior probability of each model is estimated using the Chib–Jeliazkov method that directly uses the posterior MCMC samples for evidence (marginal likelihood) computation. The Bayesian algorithm is validated through a numerical study and then applied to model the nonlinear unsteady aerodynamic loads using wind-tunnel test data at various Reynolds numbers.« less
Randomized Trial of a Peer-Led, Telephone-Based Empowerment Intervention for Persons With Chronic Spinal Cord Injury Improves Health Self-Management.

PubMed

Houlihan, Bethlyn Vergo; Brody, Miriam; Everhart-Skeels, Sarah; Pernigotti, Diana; Burnett, Sam; Zazula, Judi; Green, Christa; Hasiotis, Stathis; Belliveau, Timothy; Seetharama, Subramani; Rosenblum, David; Jette, Alan

2017-06-01

To evaluate the impact of "My Care My Call" (MCMC), a peer-led, telephone-based health self-management intervention in adults with chronic spinal cord injury (SCI). Single-blinded randomized controlled trial. General community. Convenience sample of adults with SCI (N=84; mean time post-SCI, 9.9y; mean age, 46y; 73.8% men; 44% with paraplegia; 58% white). Trained peer health coaches applied the person-centered health self-management intervention with 42 experimental subjects over 6 months on a tapered call schedule. The 42 control subjects received usual care. Both groups received the MCMC Resource Guide. Primary outcome-health self-management as measured by the Patient Activation Measure (PAM). Secondary outcomes-global ratings of service/resource use, health-related quality of life, and quality of primary care. Intervention participants averaged 12 calls over 6 months (averaging 21.8min each), with distinct variation. At 6 months, intervention participants reported a significantly greater change in PAM scores (6mo: estimate, 7.029; 95% confidence interval, .1018-13.956; P=.0468) compared with controls, with a trend toward significance at 4 months. At 6 months, intervention participants reported a significantly greater decrease in social/role activity limitations (estimate, -.443; P=.0389), greater life satisfaction (estimate, 1.0091; P=.0522), greater services/resources awareness (estimate, 1.678; P=.0253), greater overall service use (estimate, 1.069; P=.0240), and a greater number of services used (estimate, 1.542; P=.0077). Subgroups most impacted by MCMC on PAM change scores included the following: high social support, white persons, men, 1 to 6 years postinjury, and tetraplegic. This trial demonstrates that the MCMC peer-led, health self-management intervention achieved a positive impact on self-management to prevent secondary conditions in adults with SCI. These results warrant a larger, multisite trial of its efficacy and cost-effectiveness. Copyright © 2017 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Bayesian estimation of differential transcript usage from RNA-seq data.

PubMed

Papastamoulis, Panagiotis; Rattray, Magnus

2017-11-27

Next generation sequencing allows the identification of genes consisting of differentially expressed transcripts, a term which usually refers to changes in the overall expression level. A specific type of differential expression is differential transcript usage (DTU) and targets changes in the relative within gene expression of a transcript. The contribution of this paper is to: (a) extend the use of cjBitSeq to the DTU context, a previously introduced Bayesian model which is originally designed for identifying changes in overall expression levels and (b) propose a Bayesian version of DRIMSeq, a frequentist model for inferring DTU. cjBitSeq is a read based model and performs fully Bayesian inference by MCMC sampling on the space of latent state of each transcript per gene. BayesDRIMSeq is a count based model and estimates the Bayes Factor of a DTU model against a null model using Laplace's approximation. The proposed models are benchmarked against the existing ones using a recent independent simulation study as well as a real RNA-seq dataset. Our results suggest that the Bayesian methods exhibit similar performance with DRIMSeq in terms of precision/recall but offer better calibration of False Discovery Rate.
Joint Estimation of Contamination, Error and Demography for Nuclear DNA from Ancient Humans

PubMed Central

Slatkin, Montgomery

2016-01-01

When sequencing an ancient DNA sample from a hominin fossil, DNA from present-day humans involved in excavation and extraction will be sequenced along with the endogenous material. This type of contamination is problematic for downstream analyses as it will introduce a bias towards the population of the contaminating individual(s). Quantifying the extent of contamination is a crucial step as it allows researchers to account for possible biases that may arise in downstream genetic analyses. Here, we present an MCMC algorithm to co-estimate the contamination rate, sequencing error rate and demographic parameters—including drift times and admixture rates—for an ancient nuclear genome obtained from human remains, when the putative contaminating DNA comes from present-day humans. We assume we have a large panel representing the putative contaminant population (e.g. European, East Asian or African). The method is implemented in a C++ program called ‘Demographic Inference with Contamination and Error’ (DICE). We applied it to simulations and genome data from ancient Neanderthals and modern humans. With reasonable levels of genome sequence coverage (>3X), we find we can recover accurate estimates of all these parameters, even when the contamination rate is as high as 50%. PMID:27049965
Bayesian analysis for erosion modelling of sediments in combined sewer systems.

PubMed

Kanso, A; Chebbo, G; Tassin, B

2005-01-01

Previous research has confirmed that the sediments at the bed of combined sewer systems are the main source of particulate and organic pollution during rain events contributing to combined sewer overflows. However, existing urban stormwater models utilize inappropriate sediment transport formulas initially developed from alluvial hydrodynamics. Recently, a model has been formulated and profoundly assessed based on laboratory experiments to simulate the erosion of sediments in sewer pipes taking into account the increase in strength with depth in the weak layer of deposits. In order to objectively evaluate this model, this paper presents a Bayesian analysis of the model using field data collected in sewer pipes in Paris under known hydraulic conditions. The test has been performed using a MCMC sampling method for calibration and uncertainty assessment. Results demonstrate the capacity of the model to reproduce erosion as a direct response to the increase in bed shear stress. This is due to the model description of the erosional strength in the deposits and to the shape of the measured bed shear stress. However, large uncertainties in some of the model parameters suggest that the model could be over-parameterised and necessitates a large amount of informative data for its calibration.
Bayesian linkage and segregation analysis: factoring the problem.

PubMed

Matthysse, S

2000-01-01

Complex segregation analysis and linkage methods are mathematical techniques for the genetic dissection of complex diseases. They are used to delineate complex modes of familial transmission and to localize putative disease susceptibility loci to specific chromosomal locations. The computational problem of Bayesian linkage and segregation analysis is one of integration in high-dimensional spaces. In this paper, three available techniques for Bayesian linkage and segregation analysis are discussed: Markov Chain Monte Carlo (MCMC), importance sampling, and exact calculation. The contribution of each to the overall integration will be explicitly discussed.
Coronal loop seismology using damping of standing kink oscillations by mode coupling. II. additional physical effects and Bayesian analysis

NASA Astrophysics Data System (ADS)

Pascoe, D. J.; Anfinogentov, S.; Nisticò, G.; Goddard, C. R.; Nakariakov, V. M.

2017-04-01

Context. The strong damping of kink oscillations of coronal loops can be explained by mode coupling. The damping envelope depends on the transverse density profile of the loop. Observational measurements of the damping envelope have been used to determine the transverse loop structure which is important for understanding other physical processes such as heating. Aims: The general damping envelope describing the mode coupling of kink waves consists of a Gaussian damping regime followed by an exponential damping regime. Recent observational detection of these damping regimes has been employed as a seismological tool. We extend the description of the damping behaviour to account for additional physical effects, namely a time-dependent period of oscillation, the presence of additional longitudinal harmonics, and the decayless regime of standing kink oscillations. Methods: We examine four examples of standing kink oscillations observed by the Atmospheric Imaging Assembly (AIA) onboard the Solar Dynamics Observatory (SDO). We use forward modelling of the loop position and investigate the dependence on the model parameters using Bayesian inference and Markov chain Monte Carlo (MCMC) sampling. Results: Our improvements to the physical model combined with the use of Bayesian inference and MCMC produce improved estimates of model parameters and their uncertainties. Calculation of the Bayes factor also allows us to compare the suitability of different physical models. We also use a new method based on spline interpolation of the zeroes of the oscillation to accurately describe the background trend of the oscillating loop. Conclusions: This powerful and robust method allows for accurate seismology of coronal loops, in particular the transverse density profile, and potentially reveals additional physical effects.
Assessment of parametric uncertainty for groundwater reactive transport modeling,

USGS Publications Warehouse

Shi, Xiaoqing; Ye, Ming; Curtis, Gary P.; Miller, Geoffery L.; Meyer, Philip D.; Kohler, Matthias; Yabusaki, Steve; Wu, Jichun

2014-01-01

The validity of using Gaussian assumptions for model residuals in uncertainty quantification of a groundwater reactive transport model was evaluated in this study. Least squares regression methods explicitly assume Gaussian residuals, and the assumption leads to Gaussian likelihood functions, model parameters, and model predictions. While the Bayesian methods do not explicitly require the Gaussian assumption, Gaussian residuals are widely used. This paper shows that the residuals of the reactive transport model are non-Gaussian, heteroscedastic, and correlated in time; characterizing them requires using a generalized likelihood function such as the formal generalized likelihood function developed by Schoups and Vrugt (2010). For the surface complexation model considered in this study for simulating uranium reactive transport in groundwater, parametric uncertainty is quantified using the least squares regression methods and Bayesian methods with both Gaussian and formal generalized likelihood functions. While the least squares methods and Bayesian methods with Gaussian likelihood function produce similar Gaussian parameter distributions, the parameter distributions of Bayesian uncertainty quantification using the formal generalized likelihood function are non-Gaussian. In addition, predictive performance of formal generalized likelihood function is superior to that of least squares regression and Bayesian methods with Gaussian likelihood function. The Bayesian uncertainty quantification is conducted using the differential evolution adaptive metropolis (DREAM(zs)) algorithm; as a Markov chain Monte Carlo (MCMC) method, it is a robust tool for quantifying uncertainty in groundwater reactive transport models. For the surface complexation model, the regression-based local sensitivity analysis and Morris- and DREAM(ZS)-based global sensitivity analysis yield almost identical ranking of parameter importance. The uncertainty analysis may help select appropriate likelihood functions, improve model calibration, and reduce predictive uncertainty in other groundwater reactive transport and environmental modeling.
No Control Genes Required: Bayesian Analysis of qRT-PCR Data

PubMed Central

Matz, Mikhail V.; Wright, Rachel M.; Scott, James G.

2013-01-01

Background Model-based analysis of data from quantitative reverse-transcription PCR (qRT-PCR) is potentially more powerful and versatile than traditional methods. Yet existing model-based approaches cannot properly deal with the higher sampling variances associated with low-abundant targets, nor do they provide a natural way to incorporate assumptions about the stability of control genes directly into the model-fitting process. Results In our method, raw qPCR data are represented as molecule counts, and described using generalized linear mixed models under Poisson-lognormal error. A Markov Chain Monte Carlo (MCMC) algorithm is used to sample from the joint posterior distribution over all model parameters, thereby estimating the effects of all experimental factors on the expression of every gene. The Poisson-based model allows for the correct specification of the mean-variance relationship of the PCR amplification process, and can also glean information from instances of no amplification (zero counts). Our method is very flexible with respect to control genes: any prior knowledge about the expected degree of their stability can be directly incorporated into the model. Yet the method provides sensible answers without such assumptions, or even in the complete absence of control genes. We also present a natural Bayesian analogue of the “classic” analysis, which uses standard data pre-processing steps (logarithmic transformation and multi-gene normalization) but estimates all gene expression changes jointly within a single model. The new methods are considerably more flexible and powerful than the standard delta-delta Ct analysis based on pairwise t-tests. Conclusions Our methodology expands the applicability of the relative-quantification analysis protocol all the way to the lowest-abundance targets, and provides a novel opportunity to analyze qRT-PCR data without making any assumptions concerning target stability. These procedures have been implemented as the MCMC.qpcr package in R. PMID:23977043
A Bayesian trans-dimensional approach for the fusion of multiple geophysical datasets

NASA Astrophysics Data System (ADS)

JafarGandomi, Arash; Binley, Andrew

2013-09-01

We propose a Bayesian fusion approach to integrate multiple geophysical datasets with different coverage and sensitivity. The fusion strategy is based on the capability of various geophysical methods to provide enough resolution to identify either subsurface material parameters or subsurface structure, or both. We focus on electrical resistivity as the target material parameter and electrical resistivity tomography (ERT), electromagnetic induction (EMI), and ground penetrating radar (GPR) as the set of geophysical methods. However, extending the approach to different sets of geophysical parameters and methods is straightforward. Different geophysical datasets are entered into a trans-dimensional Markov chain Monte Carlo (McMC) search-based joint inversion algorithm. The trans-dimensional property of the McMC algorithm allows dynamic parameterisation of the model space, which in turn helps to avoid bias of the post-inversion results towards a particular model. Given that we are attempting to develop an approach that has practical potential, we discretize the subsurface into an array of one-dimensional earth-models. Accordingly, the ERT data that are collected by using two-dimensional acquisition geometry are re-casted to a set of equivalent vertical electric soundings. Different data are inverted either individually or jointly to estimate one-dimensional subsurface models at discrete locations. We use Shannon's information measure to quantify the information obtained from the inversion of different combinations of geophysical datasets. Information from multiple methods is brought together via introducing joint likelihood function and/or constraining the prior information. A Bayesian maximum entropy approach is used for spatial fusion of spatially dispersed estimated one-dimensional models and mapping of the target parameter. We illustrate the approach with a synthetic dataset and then apply it to a field dataset. We show that the proposed fusion strategy is successful not only in enhancing the subsurface information but also as a survey design tool to identify the appropriate combination of the geophysical tools and show whether application of an individual method for further investigation of a specific site is beneficial.
Correlation between centre offsets and gas velocity dispersion of galaxy clusters in cosmological simulations

NASA Astrophysics Data System (ADS)

Li, Ming-Hua; Zhu, Weishan; Zhao, Dong

2018-05-01

The gas is the dominant component of baryonic matter in most galaxy groups and clusters. The spatial offsets of gas centre from the halo centre could be an indicator of the dynamical state of cluster. Knowledge of such offsets is important for estimate the uncertainties when using clusters as cosmological probes. In this paper, we study the centre offsets roff between the gas and that of all the matter within halo systems in ΛCDM cosmological hydrodynamic simulations. We focus on two kinds of centre offsets: one is the three-dimensional PB offsets between the gravitational potential minimum of the entire halo and the barycentre of the ICM, and the other is the two-dimensional PX offsets between the potential minimum of the halo and the iterative centroid of the projected synthetic X-ray emission of the halo. Haloes at higher redshifts tend to have larger values of rescaled offsets roff/r200 and larger gas velocity dispersion σ v^gas/σ _{200}. For both types of offsets, we find that the correlation between the rescaled centre offsets roff/r200 and the rescaled 3D gas velocity dispersion, σ _v^gas/σ _{200} can be approximately described by a quadratic function as r_{off}/r_{200} ∝ (σ v^gas/σ _{200} - k_2)2. A Bayesian analysis with MCMC method is employed to estimate the model parameters. Dependence of the correlation relation on redshifts and the gas mass fraction are also investigated.
Uncertainty analysis for fluorescence tomography with Monte Carlo method

NASA Astrophysics Data System (ADS)

Reinbacher-Köstinger, Alice; Freiberger, Manuel; Scharfetter, Hermann

2011-07-01

Fluorescence tomography seeks to image an inaccessible fluorophore distribution inside an object like a small animal by injecting light at the boundary and measuring the light emitted by the fluorophore. Optical parameters (e.g. the conversion efficiency or the fluorescence life-time) of certain fluorophores depend on physiologically interesting quantities like the pH value or the oxygen concentration in the tissue, which allows functional rather than just anatomical imaging. To reconstruct the concentration and the life-time from the boundary measurements, a nonlinear inverse problem has to be solved. It is, however, difficult to estimate the uncertainty of the reconstructed parameters in case of iterative algorithms and a large number of degrees of freedom. Uncertainties in fluorescence tomography applications arise from model inaccuracies, discretization errors, data noise and a priori errors. Thus, a Markov chain Monte Carlo method (MCMC) was used to consider all these uncertainty factors exploiting Bayesian formulation of conditional probabilities. A 2-D simulation experiment was carried out for a circular object with two inclusions. Both inclusions had a 2-D Gaussian distribution of the concentration and constant life-time inside of a representative area of the inclusion. Forward calculations were done with the diffusion approximation of Boltzmann's transport equation. The reconstruction results show that the percent estimation error of the lifetime parameter is by a factor of approximately 10 lower than that of the concentration. This finding suggests that lifetime imaging may provide more accurate information than concentration imaging only. The results must be interpreted with caution, however, because the chosen simulation setup represents a special case and a more detailed analysis remains to be done in future to clarify if the findings can be generalized.
Convergence analysis of surrogate-based methods for Bayesian inverse problems

NASA Astrophysics Data System (ADS)

Yan, Liang; Zhang, Yuan-Xiang

2017-12-01

The major challenges in the Bayesian inverse problems arise from the need for repeated evaluations of the forward model, as required by Markov chain Monte Carlo (MCMC) methods for posterior sampling. Many attempts at accelerating Bayesian inference have relied on surrogates for the forward model, typically constructed through repeated forward simulations that are performed in an offline phase. Although such approaches can be quite effective at reducing computation cost, there has been little analysis of the approximation on posterior inference. In this work, we prove error bounds on the Kullback-Leibler (KL) distance between the true posterior distribution and the approximation based on surrogate models. Our rigorous error analysis show that if the forward model approximation converges at certain rate in the prior-weighted L 2 norm, then the posterior distribution generated by the approximation converges to the true posterior at least two times faster in the KL sense. The error bound on the Hellinger distance is also provided. To provide concrete examples focusing on the use of the surrogate model based methods, we present an efficient technique for constructing stochastic surrogate models to accelerate the Bayesian inference approach. The Christoffel least squares algorithms, based on generalized polynomial chaos, are used to construct a polynomial approximation of the forward solution over the support of the prior distribution. The numerical strategy and the predicted convergence rates are then demonstrated on the nonlinear inverse problems, involving the inference of parameters appearing in partial differential equations.

Intercoalescence time distribution of incomplete gene genealogies in temporally varying populations, and applications in population genetic inference.

PubMed

Chen, Hua

2013-03-01

Tracing back to a specific time T in the past, the genealogy of a sample of haplotypes may not have reached their common ancestor and may leave m lineages extant. For such an incomplete genealogy truncated at a specific time T in the past, the distribution and expectation of the intercoalescence times conditional on T are derived in an exact form in this paper for populations of deterministically time-varying sizes, specifically, for populations growing exponentially. The derived intercoalescence time distribution can be integrated to the coalescent-based joint allele frequency spectrum (JAFS) theory, and is useful for population genetic inference from large-scale genomic data, without relying on computationally intensive approaches, such as importance sampling and Markov Chain Monte Carlo (MCMC) methods. The inference of several important parameters relying on this derived conditional distribution is demonstrated: quantifying population growth rate and onset time, and estimating the number of ancestral lineages at a specific ancient time. Simulation studies confirm validity of the derivation and statistical efficiency of the methods using the derived intercoalescence time distribution. Two examples of real data are given to show the inference of the population growth rate of a European sample from the NIEHS Environmental Genome Project, and the number of ancient lineages of 31 mitochondrial genomes from Tibetan populations. © 2013 Blackwell Publishing Ltd/University College London.
Comparison of statistical sampling methods with ScannerBit, the GAMBIT scanning module

NASA Astrophysics Data System (ADS)

Martinez, Gregory D.; McKay, James; Farmer, Ben; Scott, Pat; Roebber, Elinore; Putze, Antje; Conrad, Jan

2017-11-01

We introduce ScannerBit, the statistics and sampling module of the public, open-source global fitting framework GAMBIT. ScannerBit provides a standardised interface to different sampling algorithms, enabling the use and comparison of multiple computational methods for inferring profile likelihoods, Bayesian posteriors, and other statistical quantities. The current version offers random, grid, raster, nested sampling, differential evolution, Markov Chain Monte Carlo (MCMC) and ensemble Monte Carlo samplers. We also announce the release of a new standalone differential evolution sampler, Diver, and describe its design, usage and interface to ScannerBit. We subject Diver and three other samplers (the nested sampler MultiNest, the MCMC GreAT, and the native ScannerBit implementation of the ensemble Monte Carlo algorithm T-Walk) to a battery of statistical tests. For this we use a realistic physical likelihood function, based on the scalar singlet model of dark matter. We examine the performance of each sampler as a function of its adjustable settings, and the dimensionality of the sampling problem. We evaluate performance on four metrics: optimality of the best fit found, completeness in exploring the best-fit region, number of likelihood evaluations, and total runtime. For Bayesian posterior estimation at high resolution, T-Walk provides the most accurate and timely mapping of the full parameter space. For profile likelihood analysis in less than about ten dimensions, we find that Diver and MultiNest score similarly in terms of best fit and speed, outperforming GreAT and T-Walk; in ten or more dimensions, Diver substantially outperforms the other three samplers on all metrics.
The Joker: A Custom Monte Carlo Sampler for Binary-star and Exoplanet Radial Velocity Data

NASA Astrophysics Data System (ADS)

Price-Whelan, Adrian M.; Hogg, David W.; Foreman-Mackey, Daniel; Rix, Hans-Walter

2017-03-01

Given sparse or low-quality radial velocity measurements of a star, there are often many qualitatively different stellar or exoplanet companion orbit models that are consistent with the data. The consequent multimodality of the likelihood function leads to extremely challenging search, optimization, and Markov chain Monte Carlo (MCMC) posterior sampling over the orbital parameters. Here we create a custom Monte Carlo sampler for sparse or noisy radial velocity measurements of two-body systems that can produce posterior samples for orbital parameters even when the likelihood function is poorly behaved. The six standard orbital parameters for a binary system can be split into four nonlinear parameters (period, eccentricity, argument of pericenter, phase) and two linear parameters (velocity amplitude, barycenter velocity). We capitalize on this by building a sampling method in which we densely sample the prior probability density function (pdf) in the nonlinear parameters and perform rejection sampling using a likelihood function marginalized over the linear parameters. With sparse or uninformative data, the sampling obtained by this rejection sampling is generally multimodal and dense. With informative data, the sampling becomes effectively unimodal but too sparse: in these cases we follow the rejection sampling with standard MCMC. The method produces correct samplings in orbital parameters for data that include as few as three epochs. The Joker can therefore be used to produce proper samplings of multimodal pdfs, which are still informative and can be used in hierarchical (population) modeling. We give some examples that show how the posterior pdf depends sensitively on the number and time coverage of the observations and their uncertainties.
Monte Carlo Bayesian Inference on a Statistical Model of Sub-Gridcolumn Moisture Variability using High-Resolution Cloud Observations

NASA Astrophysics Data System (ADS)

Norris, P. M.; da Silva, A. M., Jr.

2016-12-01

Norris and da Silva recently published a method to constrain a statistical model of sub-gridcolumn moisture variability using high-resolution satellite cloud data. The method can be used for large-scale model parameter estimation or cloud data assimilation (CDA). The gridcolumn model includes assumed-PDF intra-layer horizontal variability and a copula-based inter-layer correlation model. The observables used are MODIS cloud-top pressure, brightness temperature and cloud optical thickness, but the method should be extensible to direct cloudy radiance assimilation for a small number of channels. The algorithm is a form of Bayesian inference with a Markov chain Monte Carlo (MCMC) approach to characterizing the posterior distribution. This approach is especially useful in cases where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach is not gradient-based and allows jumps into regions of non-zero cloud probability. In the example provided, the method is able to restore marine stratocumulus near the Californian coast where the background state has a clear swath. The new approach not only significantly reduces mean and standard deviation biases with respect to the assimilated observables, but also improves the simulated rotational-Ramman scattering cloud optical centroid pressure against independent (non-assimilated) retrievals from the OMI instrument. One obvious difficulty for the method, and other CDA methods, is the lack of information content in passive cloud observables on cloud vertical structure, beyond cloud-top and thickness, thus necessitating strong dependence on the background vertical moisture structure. It is found that a simple flow-dependent correlation modification due to Riishojgaard is helpful, better honoring inversion structures in the background state.
Bayesian forecasting and uncertainty quantifying of stream flows using Metropolis-Hastings Markov Chain Monte Carlo algorithm

NASA Astrophysics Data System (ADS)

Wang, Hongrui; Wang, Cheng; Wang, Ying; Gao, Xiong; Yu, Chen

2017-06-01

This paper presents a Bayesian approach using Metropolis-Hastings Markov Chain Monte Carlo algorithm and applies this method for daily river flow rate forecast and uncertainty quantification for Zhujiachuan River using data collected from Qiaotoubao Gage Station and other 13 gage stations in Zhujiachuan watershed in China. The proposed method is also compared with the conventional maximum likelihood estimation (MLE) for parameter estimation and quantification of associated uncertainties. While the Bayesian method performs similarly in estimating the mean value of daily flow rate, it performs over the conventional MLE method on uncertainty quantification, providing relatively narrower reliable interval than the MLE confidence interval and thus more precise estimation by using the related information from regional gage stations. The Bayesian MCMC method might be more favorable in the uncertainty analysis and risk management.
Characterizing the Trade Space Between Capability and Complexity in Next Generation Cloud and Precipitation Observing Systems Using Markov Chain Monte Carlos Techniques

NASA Astrophysics Data System (ADS)

Xu, Z.; Mace, G. G.; Posselt, D. J.

2017-12-01

As we begin to contemplate the next generation atmospheric observing systems, it will be critically important that we are able to make informed decisions regarding the trade space between scientific capability and the need to keep complexity and cost within definable limits. To explore this trade space as it pertains to understanding key cloud and precipitation processes, we are developing a Markov Chain Monte Carlo (MCMC) algorithm suite that allows us to arbitrarily define the specifications of candidate observing systems and then explore how the uncertainties in key retrieved geophysical parameters respond to that observing system. MCMC algorithms produce a more complete posterior solution space, and allow for an objective examination of information contained in measurements. In our initial implementation, MCMC experiments are performed to retrieve vertical profiles of cloud and precipitation properties from a spectrum of active and passive measurements collected by aircraft during the ACE Radiation Definition Experiments (RADEX). Focusing on shallow cumulus clouds observed during the Integrated Precipitation and Hydrology EXperiment (IPHEX), observing systems in this study we consider W and Ka-band radar reflectivity, path-integrated attenuation at those frequencies, 31 and 94 GHz brightness temperatures as well as visible and near-infrared reflectance. By varying the sensitivity and uncertainty of these measurements, we quantify the capacity of various combinations of observations to characterize the physical properties of clouds and precipitation.
RadVel: The Radial Velocity Modeling Toolkit

NASA Astrophysics Data System (ADS)

Fulton, Benjamin J.; Petigura, Erik A.; Blunt, Sarah; Sinukoff, Evan

2018-04-01

RadVel is an open-source Python package for modeling Keplerian orbits in radial velocity (RV) timeseries. RadVel provides a convenient framework to fit RVs using maximum a posteriori optimization and to compute robust confidence intervals by sampling the posterior probability density via Markov Chain Monte Carlo (MCMC). RadVel allows users to float or fix parameters, impose priors, and perform Bayesian model comparison. We have implemented real-time MCMC convergence tests to ensure adequate sampling of the posterior. RadVel can output a number of publication-quality plots and tables. Users may interface with RadVel through a convenient command-line interface or directly from Python. The code is object-oriented and thus naturally extensible. We encourage contributions from the community. Documentation is available at http://radvel.readthedocs.io.
Hydrogeophysical Assessment of Aquifer Uncertainty Using Simulated Annealing driven MRF-Based Stochastic Joint Inversion

NASA Astrophysics Data System (ADS)

Oware, E. K.

2017-12-01

Geophysical quantification of hydrogeological parameters typically involve limited noisy measurements coupled with inadequate understanding of the target phenomenon. Hence, a deterministic solution is unrealistic in light of the largely uncertain inputs. Stochastic imaging (SI), in contrast, provides multiple equiprobable realizations that enable probabilistic assessment of aquifer properties in a realistic manner. Generation of geologically realistic prior models is central to SI frameworks. Higher-order statistics for representing prior geological features in SI are, however, usually borrowed from training images (TIs), which may produce undesirable outcomes if the TIs are unpresentatitve of the target structures. The Markov random field (MRF)-based SI strategy provides a data-driven alternative to TI-based SI algorithms. In the MRF-based method, the simulation of spatial features is guided by Gibbs energy (GE) minimization. Local configurations with smaller GEs have higher likelihood of occurrence and vice versa. The parameters of the Gibbs distribution for computing the GE are estimated from the hydrogeophysical data, thereby enabling the generation of site-specific structures in the absence of reliable TIs. In Metropolis-like SI methods, the variance of the transition probability controls the jump-size. The procedure is a standard Markov chain Monte Carlo (McMC) method when a constant variance is assumed, and becomes simulated annealing (SA) when the variance (cooling temperature) is allowed to decrease gradually with time. We observe that in certain problems, the large variance typically employed at the beginning to hasten burn-in may be unideal for sampling at the equilibrium state. The powerfulness of SA stems from its flexibility to adaptively scale the variance at different stages of the sampling. Degeneration of results were reported in a previous implementation of the MRF-based SI strategy based on a constant variance. Here, we present an updated version of the algorithm based on SA that appears to resolve the degeneration problem with seemingly improved results. We illustrate the performance of the SA version with a joint inversion of time-lapse concentration and electrical resistivity measurements in a hypothetical trinary hydrofacies aquifer characterization problem.
Comparison of Filtering Methods for the Modeling and Retrospective Forecasting of Influenza Epidemics

PubMed Central

Yang, Wan; Karspeck, Alicia; Shaman, Jeffrey

2014-01-01

A variety of filtering methods enable the recursive estimation of system state variables and inference of model parameters. These methods have found application in a range of disciplines and settings, including engineering design and forecasting, and, over the last two decades, have been applied to infectious disease epidemiology. For any system of interest, the ideal filter depends on the nonlinearity and complexity of the model to which it is applied, the quality and abundance of observations being entrained, and the ultimate application (e.g. forecast, parameter estimation, etc.). Here, we compare the performance of six state-of-the-art filter methods when used to model and forecast influenza activity. Three particle filters—a basic particle filter (PF) with resampling and regularization, maximum likelihood estimation via iterated filtering (MIF), and particle Markov chain Monte Carlo (pMCMC)—and three ensemble filters—the ensemble Kalman filter (EnKF), the ensemble adjustment Kalman filter (EAKF), and the rank histogram filter (RHF)—were used in conjunction with a humidity-forced susceptible-infectious-recovered-susceptible (SIRS) model and weekly estimates of influenza incidence. The modeling frameworks, first validated with synthetic influenza epidemic data, were then applied to fit and retrospectively forecast the historical incidence time series of seven influenza epidemics during 2003–2012, for 115 cities in the United States. Results suggest that when using the SIRS model the ensemble filters and the basic PF are more capable of faithfully recreating historical influenza incidence time series, while the MIF and pMCMC do not perform as well for multimodal outbreaks. For forecast of the week with the highest influenza activity, the accuracies of the six model-filter frameworks are comparable; the three particle filters perform slightly better predicting peaks 1–5 weeks in the future; the ensemble filters are more accurate predicting peaks in the past. PMID:24762780
Occupancy in community-level studies

USGS Publications Warehouse

MacKenzie, Darryl I.; Nichols, James; Royle, Andy; Pollock, Kenneth H.; Bailey, Larissa L.; Hines, James

2018-01-01

Another type of multi-species studies, are those focused on community-level metrics such as species richness. In this chapter we detail how some of the single-species occupancy models described in earlier chapters have been applied, or extended, for use in such studies, while accounting for imperfect detection. We highlight how Bayesian methods using MCMC are particularly useful in such settings to easily calculate relevant community-level summaries based on presence/absence data. These modeling approaches can be used to assess richness at a single point in time, or to investigate changes in the species pool over time.
MC3: Multi-core Markov-chain Monte Carlo code

NASA Astrophysics Data System (ADS)

Cubillos, Patricio; Harrington, Joseph; Lust, Nate; Foster, AJ; Stemm, Madison; Loredo, Tom; Stevenson, Kevin; Campo, Chris; Hardin, Matt; Hardy, Ryan

2016-10-01

MC3 (Multi-core Markov-chain Monte Carlo) is a Bayesian statistics tool that can be executed from the shell prompt or interactively through the Python interpreter with single- or multiple-CPU parallel computing. It offers Markov-chain Monte Carlo (MCMC) posterior-distribution sampling for several algorithms, Levenberg-Marquardt least-squares optimization, and uniform non-informative, Jeffreys non-informative, or Gaussian-informative priors. MC3 can share the same value among multiple parameters and fix the value of parameters to constant values, and offers Gelman-Rubin convergence testing and correlated-noise estimation with time-averaging or wavelet-based likelihood estimation methods.
Estimation for coefficient of variation of an extension of the exponential distribution under type-II censoring scheme

NASA Astrophysics Data System (ADS)

Bakoban, Rana A.

2017-08-01

The coefficient of variation [CV] has several applications in applied statistics. So in this paper, we adopt Bayesian and non-Bayesian approaches for the estimation of CV under type-II censored data from extension exponential distribution [EED]. The point and interval estimate of the CV are obtained for each of the maximum likelihood and parametric bootstrap techniques. Also the Bayesian approach with the help of MCMC method is presented. A real data set is presented and analyzed, hence the obtained results are used to assess the obtained theoretical results.
Fitting mechanistic epidemic models to data: A comparison of simple Markov chain Monte Carlo approaches.

PubMed

Li, Michael; Dushoff, Jonathan; Bolker, Benjamin M

2018-07-01

Simple mechanistic epidemic models are widely used for forecasting and parameter estimation of infectious diseases based on noisy case reporting data. Despite the widespread application of models to emerging infectious diseases, we know little about the comparative performance of standard computational-statistical frameworks in these contexts. Here we build a simple stochastic, discrete-time, discrete-state epidemic model with both process and observation error and use it to characterize the effectiveness of different flavours of Bayesian Markov chain Monte Carlo (MCMC) techniques. We use fits to simulated data, where parameters (and future behaviour) are known, to explore the limitations of different platforms and quantify parameter estimation accuracy, forecasting accuracy, and computational efficiency across combinations of modeling decisions (e.g. discrete vs. continuous latent states, levels of stochasticity) and computational platforms (JAGS, NIMBLE, Stan).
MCMC multilocus lod scores: application of a new approach.

PubMed

George, Andrew W; Wijsman, Ellen M; Thompson, Elizabeth A

2005-01-01

On extended pedigrees with extensive missing data, the calculation of multilocus likelihoods for linkage analysis is often beyond the computational bounds of exact methods. Growing interest therefore surrounds the implementation of Monte Carlo estimation methods. In this paper, we demonstrate the speed and accuracy of a new Markov chain Monte Carlo method for the estimation of linkage likelihoods through an analysis of real data from a study of early-onset Alzheimer's disease. For those data sets where comparison with exact analysis is possible, we achieved up to a 100-fold increase in speed. Our approach is implemented in the program lm_bayes within the framework of the freely available MORGAN 2.6 package for Monte Carlo genetic analysis (http://www.stat.washington.edu/thompson/Genepi/MORGAN/Morgan.shtml).
A Bayesian Uncertainty Framework for Conceptual Snowmelt and Hydrologic Models Applied to the Tenderfoot Creek Experimental Forest

NASA Astrophysics Data System (ADS)

Smith, T.; Marshall, L.

2007-12-01

In many mountainous regions, the single most important parameter in forecasting the controls on regional water resources is snowpack (Williams et al., 1999). In an effort to bridge the gap between theoretical understanding and functional modeling of snow-driven watersheds, a flexible hydrologic modeling framework is being developed. The aim is to create a suite of models that move from parsimonious structures, concentrated on aggregated watershed response, to those focused on representing finer scale processes and distributed response. This framework will operate as a tool to investigate the link between hydrologic model predictive performance, uncertainty, model complexity, and observable hydrologic processes. Bayesian methods, and particularly Markov chain Monte Carlo (MCMC) techniques, are extremely useful in uncertainty assessment and parameter estimation of hydrologic models. However, these methods have some difficulties in implementation. In a traditional Bayesian setting, it can be difficult to reconcile multiple data types, particularly those offering different spatial and temporal coverage, depending on the model type. These difficulties are also exacerbated by sensitivity of MCMC algorithms to model initialization and complex parameter interdependencies. As a way of circumnavigating some of the computational complications, adaptive MCMC algorithms have been developed to take advantage of the information gained from each successive iteration. Two adaptive algorithms are compared is this study, the Adaptive Metropolis (AM) algorithm, developed by Haario et al (2001), and the Delayed Rejection Adaptive Metropolis (DRAM) algorithm, developed by Haario et al (2006). While neither algorithm is truly Markovian, it has been proven that each satisfies the desired ergodicity and stationarity properties of Markov chains. Both algorithms were implemented as the uncertainty and parameter estimation framework for a conceptual rainfall-runoff model based on the Probability Distributed Model (PDM), developed by Moore (1985). We implement the modeling framework in Stringer Creek watershed in the Tenderfoot Creek Experimental Forest (TCEF), Montana. The snowmelt-driven watershed offers that additional challenge of modeling snow accumulation and melt and current efforts are aimed at developing a temperature- and radiation-index snowmelt model. Auxiliary data available from within TCEF's watersheds are used to support in the understanding of information value as it relates to predictive performance. Because the model is based on lumped parameters, auxiliary data are hard to incorporate directly. However, these additional data offer benefits through the ability to inform prior distributions of the lumped, model parameters. By incorporating data offering different information into the uncertainty assessment process, a cross-validation technique is engaged to better ensure that modeled results reflect real process complexity.
Bayesian forecasting and uncertainty quantifying of stream flows using Metropolis–Hastings Markov Chain Monte Carlo algorithm

DOE PAGES

Wang, Hongrui; Wang, Cheng; Wang, Ying; ...

2017-04-05

This paper presents a Bayesian approach using Metropolis-Hastings Markov Chain Monte Carlo algorithm and applies this method for daily river flow rate forecast and uncertainty quantification for Zhujiachuan River using data collected from Qiaotoubao Gage Station and other 13 gage stations in Zhujiachuan watershed in China. The proposed method is also compared with the conventional maximum likelihood estimation (MLE) for parameter estimation and quantification of associated uncertainties. While the Bayesian method performs similarly in estimating the mean value of daily flow rate, it performs over the conventional MLE method on uncertainty quantification, providing relatively narrower reliable interval than the MLEmore » confidence interval and thus more precise estimation by using the related information from regional gage stations. As a result, the Bayesian MCMC method might be more favorable in the uncertainty analysis and risk management.« less
Monte Carlo Analysis of Reservoir Models Using Seismic Data and Geostatistical Models

NASA Astrophysics Data System (ADS)

Zunino, A.; Mosegaard, K.; Lange, K.; Melnikova, Y.; Hansen, T. M.

2013-12-01

We present a study on the analysis of petroleum reservoir models consistent with seismic data and geostatistical constraints performed on a synthetic reservoir model. Our aim is to invert directly for structure and rock bulk properties of the target reservoir zone. To infer the rock facies, porosity and oil saturation seismology alone is not sufficient but a rock physics model must be taken into account, which links the unknown properties to the elastic parameters. We then combine a rock physics model with a simple convolutional approach for seismic waves to invert the "measured" seismograms. To solve this inverse problem, we employ a Markov chain Monte Carlo (MCMC) method, because it offers the possibility to handle non-linearity, complex and multi-step forward models and provides realistic estimates of uncertainties. However, for large data sets the MCMC method may be impractical because of a very high computational demand. To face this challenge one strategy is to feed the algorithm with realistic models, hence relying on proper prior information. To address this problem, we utilize an algorithm drawn from geostatistics to generate geologically plausible models which represent samples of the prior distribution. The geostatistical algorithm learns the multiple-point statistics from prototype models (in the form of training images), then generates thousands of different models which are accepted or rejected by a Metropolis sampler. To further reduce the computation time we parallelize the software and run it on multi-core machines. The solution of the inverse problem is then represented by a collection of reservoir models in terms of facies, porosity and oil saturation, which constitute samples of the posterior distribution. We are finally able to produce probability maps of the properties we are interested in by performing statistical analysis on the collection of solutions.
Finite element model updating using the shadow hybrid Monte Carlo technique

NASA Astrophysics Data System (ADS)

Boulkaibet, I.; Mthembu, L.; Marwala, T.; Friswell, M. I.; Adhikari, S.

2015-02-01

Recent research in the field of finite element model updating (FEM) advocates the adoption of Bayesian analysis techniques to dealing with the uncertainties associated with these models. However, Bayesian formulations require the evaluation of the Posterior Distribution Function which may not be available in analytical form. This is the case in FEM updating. In such cases sampling methods can provide good approximations of the Posterior distribution when implemented in the Bayesian context. Markov Chain Monte Carlo (MCMC) algorithms are the most popular sampling tools used to sample probability distributions. However, the efficiency of these algorithms is affected by the complexity of the systems (the size of the parameter space). The Hybrid Monte Carlo (HMC) offers a very important MCMC approach to dealing with higher-dimensional complex problems. The HMC uses the molecular dynamics (MD) steps as the global Monte Carlo (MC) moves to reach areas of high probability where the gradient of the log-density of the Posterior acts as a guide during the search process. However, the acceptance rate of HMC is sensitive to the system size as well as the time step used to evaluate the MD trajectory. To overcome this limitation we propose the use of the Shadow Hybrid Monte Carlo (SHMC) algorithm. The SHMC algorithm is a modified version of the Hybrid Monte Carlo (HMC) and designed to improve sampling for large-system sizes and time steps. This is done by sampling from a modified Hamiltonian function instead of the normal Hamiltonian function. In this paper, the efficiency and accuracy of the SHMC method is tested on the updating of two real structures; an unsymmetrical H-shaped beam structure and a GARTEUR SM-AG19 structure and is compared to the application of the HMC algorithm on the same structures.
[Economic Evaluation of mFOLFOX6-based First-line Regimens for Unresectable Advanced or Recurrent Colorectal Cancer Using Clinical Decision Analysis].

PubMed

Shida, Toshihiro; Endo, Yuji; Shiraishi, Tadashi; Yoshioka, Takashi; Suzuki, Kaoru; Kobayashi, Yuka; Ono, Yuki; Ito, Toshinori; Inoue, Tadao

2018-01-01

We evaluated four representative chemotherapy regimens for unresectable advanced or recurrent KRAS-wild type colorectal cancer: mFOLFOX6, mFOLFOX6+bevacizumab (Bmab), cetuximab (Cmab), or panitumumab (Pmab). We employed a decision analysis method in combination with clinical and economic evidence. The health outcomes of the regimens were analyzed on the basis of overall and progression-free survival. The data were drawn from the literature on randomized controlled clinical trials of the above-mentioned drugs. The total costs of the regimens were calculated on the basis of direct costs obtained from the medical records of patients diagnosed with unresectable advanced or recurrent colorectal cancer at Yamagata University Hospital and Yamagata Prefecture Central Hospital. Cost effectiveness was analyzed using a Markov chain Monte Carlo (MCMC) method. The study was designed from the viewpoint of public medical care. The MCMC analysis revealed that expected life months and expected cost were 20 months/3,527,119 yen for mFOLFOX6, 27 months/8,270,625 yen for mFOLFOX6+Bmab, 29 months/13,174,6297 yen for mFOLFOX6+Cmab, and 6 months/12,613,445 yen for mFOLFOX6+Pmab. Incremental costs per effectiveness ratios per life month against mFOLFOX6 were 637,592 yen for mFOLFOX6+Bmab, 1,075,162 yen for mFOLFOX6+Cmab, and 587,455 yen for mFOLFOX6+Pmab. Compared to the conventional mFOLFOX6 regimen, molecular-targeted drug regimens provide better health outcomes, but the cost increases accordingly. mFOLFOX 6+Pmab is the most cost-effective regimen among those surveyed in this study.
Demonstration of emulator-based Bayesian calibration of safety analysis codes: Theory and formulation

DOE PAGES

Yurko, Joseph P.; Buongiorno, Jacopo; Youngblood, Robert

2015-05-28

System codes for simulation of safety performance of nuclear plants may contain parameters whose values are not known very accurately. New information from tests or operating experience is incorporated into safety codes by a process known as calibration, which reduces uncertainty in the output of the code and thereby improves its support for decision-making. The work reported here implements several improvements on classic calibration techniques afforded by modern analysis techniques. The key innovation has come from development of code surrogate model (or code emulator) construction and prediction algorithms. Use of a fast emulator makes the calibration processes used here withmore » Markov Chain Monte Carlo (MCMC) sampling feasible. This study uses Gaussian Process (GP) based emulators, which have been used previously to emulate computer codes in the nuclear field. The present work describes the formulation of an emulator that incorporates GPs into a factor analysis-type or pattern recognition-type model. This “function factorization” Gaussian Process (FFGP) model allows overcoming limitations present in standard GP emulators, thereby improving both accuracy and speed of the emulator-based calibration process. Calibration of a friction-factor example using a Method of Manufactured Solution is performed to illustrate key properties of the FFGP based process.« less

How Much Can We Learn from a Single Chromatographic Experiment? A Bayesian Perspective.

PubMed

Wiczling, Paweł; Kaliszan, Roman

2016-01-05

In this work, we proposed and investigated a Bayesian inference procedure to find the desired chromatographic conditions based on known analyte properties (lipophilicity, pKa, and polar surface area) using one preliminary experiment. A previously developed nonlinear mixed effect model was used to specify the prior information about a new analyte with known physicochemical properties. Further, the prior (no preliminary data) and posterior predictive distribution (prior + one experiment) were determined sequentially to search towards the desired separation. The following isocratic high-performance reversed-phase liquid chromatographic conditions were sought: (1) retention time of a single analyte within the range of 4-6 min and (2) baseline separation of two analytes with retention times within the range of 4-10 min. The empirical posterior Bayesian distribution of parameters was estimated using the "slice sampling" Markov Chain Monte Carlo (MCMC) algorithm implemented in Matlab. The simulations with artificial analytes and experimental data of ketoprofen and papaverine were used to test the proposed methodology. The simulation experiment showed that for a single and two randomly selected analytes, there is 97% and 74% probability of obtaining a successful chromatogram using none or one preliminary experiment. The desired separation for ketoprofen and papaverine was established based on a single experiment. It was confirmed that the search for a desired separation rarely requires a large number of chromatographic analyses at least for a simple optimization problem. The proposed Bayesian-based optimization scheme is a powerful method of finding a desired chromatographic separation based on a small number of preliminary experiments.
Global validation of a process-based model on vegetation gross primary production using eddy covariance observations.

PubMed

Liu, Dan; Cai, Wenwen; Xia, Jiangzhou; Dong, Wenjie; Zhou, Guangsheng; Chen, Yang; Zhang, Haicheng; Yuan, Wenping

2014-01-01

Gross Primary Production (GPP) is the largest flux in the global carbon cycle. However, large uncertainties in current global estimations persist. In this study, we examined the performance of a process-based model (Integrated BIosphere Simulator, IBIS) at 62 eddy covariance sites around the world. Our results indicated that the IBIS model explained 60% of the observed variation in daily GPP at all validation sites. Comparison with a satellite-based vegetation model (Eddy Covariance-Light Use Efficiency, EC-LUE) revealed that the IBIS simulations yielded comparable GPP results as the EC-LUE model. Global mean GPP estimated by the IBIS model was 107.50±1.37 Pg C year(-1) (mean value ± standard deviation) across the vegetated area for the period 2000-2006, consistent with the results of the EC-LUE model (109.39±1.48 Pg C year(-1)). To evaluate the uncertainty introduced by the parameter Vcmax, which represents the maximum photosynthetic capacity, we inversed Vcmax using Markov Chain-Monte Carlo (MCMC) procedures. Using the inversed Vcmax values, the simulated global GPP increased by 16.5 Pg C year(-1), indicating that IBIS model is sensitive to Vcmax, and large uncertainty exists in model parameterization.
On thinning of chains in MCMC

USGS Publications Warehouse

Link, William A.; Eaton, Mitchell J.

2012-01-01

4. We discuss the background and prevalence of thinning, illustrate its consequences, discuss circumstances when it might be regarded as a reasonable option and recommend against routine thinning of chains unless necessitated by computer memory limitations.
Cosmological parameter estimation using Particle Swarm Optimization

NASA Astrophysics Data System (ADS)

Prasad, J.; Souradeep, T.

2014-03-01

Constraining parameters of a theoretical model from observational data is an important exercise in cosmology. There are many theoretically motivated models, which demand greater number of cosmological parameters than the standard model of cosmology uses, and make the problem of parameter estimation challenging. It is a common practice to employ Bayesian formalism for parameter estimation for which, in general, likelihood surface is probed. For the standard cosmological model with six parameters, likelihood surface is quite smooth and does not have local maxima, and sampling based methods like Markov Chain Monte Carlo (MCMC) method are quite successful. However, when there are a large number of parameters or the likelihood surface is not smooth, other methods may be more effective. In this paper, we have demonstrated application of another method inspired from artificial intelligence, called Particle Swarm Optimization (PSO) for estimating cosmological parameters from Cosmic Microwave Background (CMB) data taken from the WMAP satellite.
An efficient Bayesian meta-analysis approach for studying cross-phenotype genetic associations

PubMed Central

Majumdar, Arunabha; Haldar, Tanushree; Bhattacharya, Sourabh; Witte, John S.

2018-01-01

Simultaneous analysis of genetic associations with multiple phenotypes may reveal shared genetic susceptibility across traits (pleiotropy). For a locus exhibiting overall pleiotropy, it is important to identify which specific traits underlie this association. We propose a Bayesian meta-analysis approach (termed CPBayes) that uses summary-level data across multiple phenotypes to simultaneously measure the evidence of aggregate-level pleiotropic association and estimate an optimal subset of traits associated with the risk locus. This method uses a unified Bayesian statistical framework based on a spike and slab prior. CPBayes performs a fully Bayesian analysis by employing the Markov Chain Monte Carlo (MCMC) technique Gibbs sampling. It takes into account heterogeneity in the size and direction of the genetic effects across traits. It can be applied to both cohort data and separate studies of multiple traits having overlapping or non-overlapping subjects. Simulations show that CPBayes can produce higher accuracy in the selection of associated traits underlying a pleiotropic signal than the subset-based meta-analysis ASSET. We used CPBayes to undertake a genome-wide pleiotropic association study of 22 traits in the large Kaiser GERA cohort and detected six independent pleiotropic loci associated with at least two phenotypes. This includes a locus at chromosomal region 1q24.2 which exhibits an association simultaneously with the risk of five different diseases: Dermatophytosis, Hemorrhoids, Iron Deficiency, Osteoporosis and Peripheral Vascular Disease. We provide an R-package ‘CPBayes’ implementing the proposed method. PMID:29432419
A modelling approach to assessing the timescale uncertainties in proxy series with chronological errors

NASA Astrophysics Data System (ADS)

Divine, D.; Godtliebsen, F.; Rue, H.

2012-04-01

Detailed knowledge of past climate variations is of high importance for gaining a better insight into the possible future climate scenarios. The relative shortness of available high quality instrumental climate data conditions the use of various climate proxy archives in making inference about past climate evolution. It, however, requires an accurate assessment of timescale errors in proxy-based paleoclimatic reconstructions. We here propose an approach to assessment of timescale errors in proxy-based series with chronological uncertainties. The method relies on approximation of the physical process(es) forming a proxy archive by a random Gamma process. Parameters of the process are partly data-driven and partly determined from prior assumptions. For a particular case of a linear accumulation model and absolutely dated tie points an analytical solution is found suggesting the Beta-distributed probability density on age estimates along the length of a proxy archive. In a general situation of uncertainties in the ages of the tie points the proposed method employs MCMC simulations of age-depth profiles yielding empirical confidence intervals on the constructed piecewise linear best guess timescale. It is suggested that the approach can be further extended to a more general case of a time-varying expected accumulation between the tie points. The approach is illustrated by using two ice and two lake/marine sediment cores representing the typical examples of paleoproxy archives with age models constructed using tie points of mixed origin.
Stochastic approach for radionuclides quantification

NASA Astrophysics Data System (ADS)

Clement, A.; Saurel, N.; Perrin, G.

2018-01-01

Gamma spectrometry is a passive non-destructive assay used to quantify radionuclides present in more or less complex objects. Basic methods using empirical calibration with a standard in order to quantify the activity of nuclear materials by determining the calibration coefficient are useless on non-reproducible, complex and single nuclear objects such as waste packages. Package specifications as composition or geometry change from one package to another and involve a high variability of objects. Current quantification process uses numerical modelling of the measured scene with few available data such as geometry or composition. These data are density, material, screen, geometric shape, matrix composition, matrix and source distribution. Some of them are strongly dependent on package data knowledge and operator backgrounds. The French Commissariat à l'Energie Atomique (CEA) is developing a new methodology to quantify nuclear materials in waste packages and waste drums without operator adjustment and internal package configuration knowledge. This method suggests combining a global stochastic approach which uses, among others, surrogate models available to simulate the gamma attenuation behaviour, a Bayesian approach which considers conditional probability densities of problem inputs, and Markov Chains Monte Carlo algorithms (MCMC) which solve inverse problems, with gamma ray emission radionuclide spectrum, and outside dimensions of interest objects. The methodology is testing to quantify actinide activity in different kind of matrix, composition, and configuration of sources standard in terms of actinide masses, locations and distributions. Activity uncertainties are taken into account by this adjustment methodology.
Development of Cloud and Precipitation Property Retrieval Algorithms and Measurement Simulators from ASR Data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mace, Gerald G.

What has made the ASR program unique is the amount of information that is available. The suite of recently deployed instruments significantly expands the scope of the program (Mather and Voyles, 2013). The breadth of this information allows us to pose sophisticated process-level questions. Our ASR project, now entering its third year, has been about developing algorithms that use this information in ways that fully exploit the new capacity of the ARM data streams. Using optimal estimation (OE) and Markov Chain Monte Carlo (MCMC) inversion techniques, we have developed methodologies that allow us to use multiple radar frequency Doppler spectramore » along with lidar and passive constraints where data streams can be added or subtracted efficiently and algorithms can be reformulated for various combinations of hydrometeors by exchanging sets of empirical coefficients. These methodologies have been applied to boundary layer clouds, mixed phase snow cloud systems, and cirrus.« less
Bayesian inversion of a CRN depth profile to infer Quaternary erosion of the northwestern Campine Plateau (NE Belgium)

NASA Astrophysics Data System (ADS)

Laloy, Eric; Beerten, Koen; Vanacker, Veerle; Christl, Marcus; Rogiers, Bart; Wouters, Laurent

2017-07-01

The rate at which low-lying sandy areas in temperate regions, such as the Campine Plateau (NE Belgium), have been eroding during the Quaternary is a matter of debate. Current knowledge on the average pace of landscape evolution in the Campine area is largely based on geological inferences and modern analogies. We performed a Bayesian inversion of an in situ-produced 10Be concentration depth profile to infer the average long-term erosion rate together with two other parameters: the surface exposure age and the inherited 10Be concentration. Compared to the latest advances in probabilistic inversion of cosmogenic radionuclide (CRN) data, our approach has the following two innovative components: it (1) uses Markov chain Monte Carlo (MCMC) sampling and (2) accounts (under certain assumptions) for the contribution of model errors to posterior uncertainty. To investigate to what extent our approach differs from the state of the art in practice, a comparison against the Bayesian inversion method implemented in the CRONUScalc program is made. Both approaches identify similar maximum a posteriori (MAP) parameter values, but posterior parameter and predictive uncertainty derived using the method taken in CRONUScalc is moderately underestimated. A simple way for producing more consistent uncertainty estimates with the CRONUScalc-like method in the presence of model errors is therefore suggested. Our inferred erosion rate of 39 ± 8. 9 mm kyr-1 (1σ) is relatively large in comparison with landforms that erode under comparable (paleo-)climates elsewhere in the world. We evaluate this value in the light of the erodibility of the substrate and sudden base level lowering during the Middle Pleistocene. A denser sampling scheme of a two-nuclide concentration depth profile would allow for better inferred erosion rate resolution, and including more uncertain parameters in the MCMC inversion.
Modeling and estimating the jump risk of exchange rates: Applications to RMB

NASA Astrophysics Data System (ADS)

Wang, Yiming; Tong, Hanfei

2008-11-01

In this paper we propose a new type of continuous-time stochastic volatility model, SVDJ, for the spot exchange rate of RMB, and other foreign currencies. In the model, we assume that the change of exchange rate can be decomposed into two components. One is the normally small-cope innovation driven by the diffusion motion; the other is a large drop or rise engendered by the Poisson counting process. Furthermore, we develop a MCMC method to estimate our model. Empirical results indicate the significant existence of jumps in the exchange rate. Jump components explain a large proportion of the exchange rate change.
Simultaneous inference of phylogenetic and transmission trees in infectious disease outbreaks

PubMed Central

2017-01-01

Whole-genome sequencing of pathogens from host samples becomes more and more routine during infectious disease outbreaks. These data provide information on possible transmission events which can be used for further epidemiologic analyses, such as identification of risk factors for infectivity and transmission. However, the relationship between transmission events and sequence data is obscured by uncertainty arising from four largely unobserved processes: transmission, case observation, within-host pathogen dynamics and mutation. To properly resolve transmission events, these processes need to be taken into account. Recent years have seen much progress in theory and method development, but existing applications make simplifying assumptions that often break up the dependency between the four processes, or are tailored to specific datasets with matching model assumptions and code. To obtain a method with wider applicability, we have developed a novel approach to reconstruct transmission trees with sequence data. Our approach combines elementary models for transmission, case observation, within-host pathogen dynamics, and mutation, under the assumption that the outbreak is over and all cases have been observed. We use Bayesian inference with MCMC for which we have designed novel proposal steps to efficiently traverse the posterior distribution, taking account of all unobserved processes at once. This allows for efficient sampling of transmission trees from the posterior distribution, and robust estimation of consensus transmission trees. We implemented the proposed method in a new R package phybreak. The method performs well in tests of both new and published simulated data. We apply the model to five datasets on densely sampled infectious disease outbreaks, covering a wide range of epidemiological settings. Using only sampling times and sequences as data, our analyses confirmed the original results or improved on them: the more realistic infection times place more confidence in the inferred transmission trees. PMID:28545083
Simultaneous inference of phylogenetic and transmission trees in infectious disease outbreaks.

PubMed

Klinkenberg, Don; Backer, Jantien A; Didelot, Xavier; Colijn, Caroline; Wallinga, Jacco

2017-05-01

Whole-genome sequencing of pathogens from host samples becomes more and more routine during infectious disease outbreaks. These data provide information on possible transmission events which can be used for further epidemiologic analyses, such as identification of risk factors for infectivity and transmission. However, the relationship between transmission events and sequence data is obscured by uncertainty arising from four largely unobserved processes: transmission, case observation, within-host pathogen dynamics and mutation. To properly resolve transmission events, these processes need to be taken into account. Recent years have seen much progress in theory and method development, but existing applications make simplifying assumptions that often break up the dependency between the four processes, or are tailored to specific datasets with matching model assumptions and code. To obtain a method with wider applicability, we have developed a novel approach to reconstruct transmission trees with sequence data. Our approach combines elementary models for transmission, case observation, within-host pathogen dynamics, and mutation, under the assumption that the outbreak is over and all cases have been observed. We use Bayesian inference with MCMC for which we have designed novel proposal steps to efficiently traverse the posterior distribution, taking account of all unobserved processes at once. This allows for efficient sampling of transmission trees from the posterior distribution, and robust estimation of consensus transmission trees. We implemented the proposed method in a new R package phybreak. The method performs well in tests of both new and published simulated data. We apply the model to five datasets on densely sampled infectious disease outbreaks, covering a wide range of epidemiological settings. Using only sampling times and sequences as data, our analyses confirmed the original results or improved on them: the more realistic infection times place more confidence in the inferred transmission trees.
A Bayesian approach for parameter estimation and prediction using a computationally intensive model

DOE PAGES

Higdon, Dave; McDonnell, Jordan D.; Schunck, Nicolas; ...

2015-02-05

Bayesian methods have been successful in quantifying uncertainty in physics-based problems in parameter estimation and prediction. In these cases, physical measurements y are modeled as the best fit of a physics-based modelmore » $$\\eta (\\theta )$$, where θ denotes the uncertain, best input setting. Hence the statistical model is of the form $$y=\\eta (\\theta )+\\epsilon ,$$ where $$\\epsilon $$ accounts for measurement, and possibly other, error sources. When nonlinearity is present in $$\\eta (\\cdot )$$, the resulting posterior distribution for the unknown parameters in the Bayesian formulation is typically complex and nonstandard, requiring computationally demanding computational approaches such as Markov chain Monte Carlo (MCMC) to produce multivariate draws from the posterior. Although generally applicable, MCMC requires thousands (or even millions) of evaluations of the physics model $$\\eta (\\cdot )$$. This requirement is problematic if the model takes hours or days to evaluate. To overcome this computational bottleneck, we present an approach adapted from Bayesian model calibration. This approach combines output from an ensemble of computational model runs with physical measurements, within a statistical formulation, to carry out inference. A key component of this approach is a statistical response surface, or emulator, estimated from the ensemble of model runs. We demonstrate this approach with a case study in estimating parameters for a density functional theory model, using experimental mass/binding energy measurements from a collection of atomic nuclei. Lastly, we also demonstrate how this approach produces uncertainties in predictions for recent mass measurements obtained at Argonne National Laboratory.« less
Bayesian hierarchical modelling of continuous non‐negative longitudinal data with a spike at zero: An application to a study of birds visiting gardens in winter

PubMed Central

Buckland, Stephen T.; King, Ruth; Toms, Mike P.

2015-01-01

The development of methods for dealing with continuous data with a spike at zero has lagged behind those for overdispersed or zero‐inflated count data. We consider longitudinal ecological data corresponding to an annual average of 26 weekly maximum counts of birds, and are hence effectively continuous, bounded below by zero but also with a discrete mass at zero. We develop a Bayesian hierarchical Tweedie regression model that can directly accommodate the excess number of zeros common to this type of data, whilst accounting for both spatial and temporal correlation. Implementation of the model is conducted in a Markov chain Monte Carlo (MCMC) framework, using reversible jump MCMC to explore uncertainty across both parameter and model spaces. This regression modelling framework is very flexible and removes the need to make strong assumptions about mean‐variance relationships a priori. It can also directly account for the spike at zero, whilst being easily applicable to other types of data and other model formulations. Whilst a correlative study such as this cannot prove causation, our results suggest that an increase in an avian predator may have led to an overall decrease in the number of one of its prey species visiting garden feeding stations in the United Kingdom. This may reflect a change in behaviour of house sparrows to avoid feeding stations frequented by sparrowhawks, or a reduction in house sparrow population size as a result of sparrowhawk increase. PMID:25737026
Stochastic static fault slip inversion from geodetic data with non-negativity and bound constraints

NASA Astrophysics Data System (ADS)

Nocquet, J.-M.

2018-07-01

Despite surface displacements observed by geodesy are linear combinations of slip at faults in an elastic medium, determining the spatial distribution of fault slip remains a ill-posed inverse problem. A widely used approach to circumvent the illness of the inversion is to add regularization constraints in terms of smoothing and/or damping so that the linear system becomes invertible. However, the choice of regularization parameters is often arbitrary, and sometimes leads to significantly different results. Furthermore, the resolution analysis is usually empirical and cannot be made independently of the regularization. The stochastic approach of inverse problems provides a rigorous framework where the a priori information about the searched parameters is combined with the observations in order to derive posterior probabilities of the unkown parameters. Here, I investigate an approach where the prior probability density function (pdf) is a multivariate Gaussian function, with single truncation to impose positivity of slip or double truncation to impose positivity and upper bounds on slip for interseismic modelling. I show that the joint posterior pdf is similar to the linear untruncated Gaussian case and can be expressed as a truncated multivariate normal (TMVN) distribution. The TMVN form can then be used to obtain semi-analytical formulae for the single, 2-D or n-D marginal pdf. The semi-analytical formula involves the product of a Gaussian by an integral term that can be evaluated using recent developments in TMVN probabilities calculations. Posterior mean and covariance can also be efficiently derived. I show that the maximum posterior (MAP) can be obtained using a non-negative least-squares algorithm for the single truncated case or using the bounded-variable least-squares algorithm for the double truncated case. I show that the case of independent uniform priors can be approximated using TMVN. The numerical equivalence to Bayesian inversions using Monte Carlo Markov chain (MCMC) sampling is shown for a synthetic example and a real case for interseismic modelling in Central Peru. The TMVN method overcomes several limitations of the Bayesian approach using MCMC sampling. First, the need of computer power is largely reduced. Second, unlike Bayesian MCMC-based approach, marginal pdf, mean, variance or covariance are obtained independently one from each other. Third, the probability and cumulative density functions can be obtained with any density of points. Finally, determining the MAP is extremely fast.
Bayesian Atmospheric Radiative Transfer (BART): Model, Statistics Driver, and Application to HD 209458b

NASA Astrophysics Data System (ADS)

Cubillos, Patricio; Harrington, Joseph; Blecic, Jasmina; Stemm, Madison M.; Lust, Nate B.; Foster, Andrew S.; Rojo, Patricio M.; Loredo, Thomas J.

2014-11-01

Multi-wavelength secondary-eclipse and transit depths probe the thermo-chemical properties of exoplanets. In recent years, several research groups have developed retrieval codes to analyze the existing data and study the prospects of future facilities. However, the scientific community has limited access to these packages. Here we premiere the open-source Bayesian Atmospheric Radiative Transfer (BART) code. We discuss the key aspects of the radiative-transfer algorithm and the statistical package. The radiation code includes line databases for all HITRAN molecules, high-temperature H2O, TiO, and VO, and includes a preprocessor for adding additional line databases without recompiling the radiation code. Collision-induced absorption lines are available for H2-H2 and H2-He. The parameterized thermal and molecular abundance profiles can be modified arbitrarily without recompilation. The generated spectra are integrated over arbitrary bandpasses for comparison to data. BART's statistical package, Multi-core Markov-chain Monte Carlo (MC3), is a general-purpose MCMC module. MC3 implements the Differental-evolution Markov-chain Monte Carlo algorithm (ter Braak 2006, 2009). MC3 converges 20-400 times faster than the usual Metropolis-Hastings MCMC algorithm, and in addition uses the Message Passing Interface (MPI) to parallelize the MCMC chains. We apply the BART retrieval code to the HD 209458b data set to estimate the planet's temperature profile and molecular abundances. This work was supported by NASA Planetary Atmospheres grant NNX12AI69G and NASA Astrophysics Data Analysis Program grant NNX13AF38G. JB holds a NASA Earth and Space Science Fellowship.
Adaptive MCMC in Bayesian phylogenetics: an application to analyzing partitioned data in BEAST.

PubMed

Baele, Guy; Lemey, Philippe; Rambaut, Andrew; Suchard, Marc A

2017-06-15

Advances in sequencing technology continue to deliver increasingly large molecular sequence datasets that are often heavily partitioned in order to accurately model the underlying evolutionary processes. In phylogenetic analyses, partitioning strategies involve estimating conditionally independent models of molecular evolution for different genes and different positions within those genes, requiring a large number of evolutionary parameters that have to be estimated, leading to an increased computational burden for such analyses. The past two decades have also seen the rise of multi-core processors, both in the central processing unit (CPU) and Graphics processing unit processor markets, enabling massively parallel computations that are not yet fully exploited by many software packages for multipartite analyses. We here propose a Markov chain Monte Carlo (MCMC) approach using an adaptive multivariate transition kernel to estimate in parallel a large number of parameters, split across partitioned data, by exploiting multi-core processing. Across several real-world examples, we demonstrate that our approach enables the estimation of these multipartite parameters more efficiently than standard approaches that typically use a mixture of univariate transition kernels. In one case, when estimating the relative rate parameter of the non-coding partition in a heterochronous dataset, MCMC integration efficiency improves by > 14-fold. Our implementation is part of the BEAST code base, a widely used open source software package to perform Bayesian phylogenetic inference. guy.baele@kuleuven.be. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Sequential Designs Based on Bayesian Uncertainty Quantification in Sparse Representation Surrogate Modeling

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Ray -Bing; Wang, Weichung; Jeff Wu, C. F.

A numerical method, called OBSM, was recently proposed which employs overcomplete basis functions to achieve sparse representations. While the method can handle non-stationary response without the need of inverting large covariance matrices, it lacks the capability to quantify uncertainty in predictions. We address this issue by proposing a Bayesian approach which first imposes a normal prior on the large space of linear coefficients, then applies the MCMC algorithm to generate posterior samples for predictions. From these samples, Bayesian credible intervals can then be obtained to assess prediction uncertainty. A key application for the proposed method is the efficient construction ofmore » sequential designs. Several sequential design procedures with different infill criteria are proposed based on the generated posterior samples. As a result, numerical studies show that the proposed schemes are capable of solving problems of positive point identification, optimization, and surrogate fitting.« less
Sequential Designs Based on Bayesian Uncertainty Quantification in Sparse Representation Surrogate Modeling

DOE PAGES

Chen, Ray -Bing; Wang, Weichung; Jeff Wu, C. F.

2017-04-12

A numerical method, called OBSM, was recently proposed which employs overcomplete basis functions to achieve sparse representations. While the method can handle non-stationary response without the need of inverting large covariance matrices, it lacks the capability to quantify uncertainty in predictions. We address this issue by proposing a Bayesian approach which first imposes a normal prior on the large space of linear coefficients, then applies the MCMC algorithm to generate posterior samples for predictions. From these samples, Bayesian credible intervals can then be obtained to assess prediction uncertainty. A key application for the proposed method is the efficient construction ofmore » sequential designs. Several sequential design procedures with different infill criteria are proposed based on the generated posterior samples. As a result, numerical studies show that the proposed schemes are capable of solving problems of positive point identification, optimization, and surrogate fitting.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)

Nitao, J J

The goal of the Event Reconstruction Project is to find the location and strength of atmospheric release points, both stationary and moving. Source inversion relies on observational data as input. The methodology is sufficiently general to allow various forms of data. In this report, the authors will focus primarily on concentration measurements obtained at point monitoring locations at various times. The algorithms being investigated in the Project are the MCMC (Markov Chain Monte Carlo), SMC (Sequential Monte Carlo) Methods, classical inversion methods, and hybrids of these. They refer the reader to the report by Johannesson et al. (2004) for explanationsmore » of these methods. These methods require computing the concentrations at all monitoring locations for a given ''proposed'' source characteristic (locations and strength history). It is anticipated that the largest portion of the CPU time will take place performing this computation. MCMC and SMC will require this computation to be done at least tens of thousands of times. Therefore, an efficient means of computing forward model predictions is important to making the inversion practical. In this report they show how Green's functions and reciprocal Green's functions can significantly accelerate forward model computations. First, instead of computing a plume for each possible source strength history, they can compute plumes from unit impulse sources only. By using linear superposition, they can obtain the response for any strength history. This response is given by the forward Green's function. Second, they may use the law of reciprocity. Suppose that they require the concentration at a single monitoring point x{sub m} due to a potential (unit impulse) source that is located at x{sub s}. instead of computing a plume with source location x{sub s}, they compute a ''reciprocal plume'' whose (unit impulse) source is at the monitoring locations x{sub m}. The reciprocal plume is computed using a reversed-direction wind field. The wind field and transport coefficients must also be appropriately time-reversed. Reciprocity says that the concentration of reciprocal plume at x{sub s} is related to the desired concentration at x{sub m}. Since there are many less monitoring points than potential source locations, the number of forward model computations is drastically reduced.« less

Resolution analysis of marine seismic full waveform data by Bayesian inversion

NASA Astrophysics Data System (ADS)

Ray, A.; Sekar, A.; Hoversten, G. M.; Albertin, U.

2015-12-01

The Bayesian posterior density function (PDF) of earth models that fit full waveform seismic data convey information on the uncertainty with which the elastic model parameters are resolved. In this work, we apply the trans-dimensional reversible jump Markov Chain Monte Carlo method (RJ-MCMC) for the 1D inversion of noisy synthetic full-waveform seismic data in the frequency-wavenumber domain. While seismic full waveform inversion (FWI) is a powerful method for characterizing subsurface elastic parameters, the uncertainty in the inverted models has remained poorly known, if at all and is highly initial model dependent. The Bayesian method we use is trans-dimensional in that the number of model layers is not fixed, and flexible such that the layer boundaries are free to move around. The resulting parameterization does not require regularization to stabilize the inversion. Depth resolution is traded off with the number of layers, providing an estimate of uncertainty in elastic parameters (compressional and shear velocities Vp and Vs as well as density) with depth. We find that in the absence of additional constraints, Bayesian inversion can result in a wide range of posterior PDFs on Vp, Vs and density. These PDFs range from being clustered around the true model, to those that contain little resolution of any particular features other than those in the near surface, depending on the particular data and target geometry. We present results for a suite of different frequencies and offset ranges, examining the differences in the posterior model densities thus derived. Though these results are for a 1D earth, they are applicable to areas with simple, layered geology and provide valuable insight into the resolving capabilities of FWI, as well as highlight the challenges in solving a highly non-linear problem. The RJ-MCMC method also presents a tantalizing possibility for extension to 2D and 3D Bayesian inversion of full waveform seismic data in the future, as it objectively tackles the problem of model selection (i.e., the number of layers or cells for parameterization), which could ease the computational burden of evaluating forward models with many parameters.
SVD/MCMC Data Analysis Pipeline for Global Redshifted 21-cm Spectrum Observations of the Cosmic Dawn and Dark Ages

NASA Astrophysics Data System (ADS)

Burns, Jack O.; Tauscher, Keith; Rapetti, David; Mirocha, Jordan; Switzer, Eric

2018-01-01

We have designed a complete data analysis pipeline for constraining Cosmic Dawn physics using sky-averaged spectra in the VHF range (40-200 MHz) obtained either from the ground (e.g., the Experiment to Detect Global Epoch of Reionization Signal, EDGES; and the Cosmic Twilight Polarimeter, CTP) or from orbit above the lunar farside (e.g., the Dark Ages Radio Explorer, DARE). In the case of DARE, we avoid Earth-based RFI, ionospheric effects, and radio solar emissions (when observing at night). To extract the 21-cm spectrum, we parametrize the cosmological signal and systematics with two separate sets of modes defined through Singular Value Decomposition (SVD) of training set curves. The training set for the 21-cm spin-flip brightness temperatures is composed of theoretical models of the first stars, galaxies and black holes created by varying physical parameters within the ares code. The systematics training set is created using sky and beam data to model the beam-weighted foregrounds (which are about four orders of magnitude larger than the signal) as well as expected lab data to model receiver systematics. To constrain physical parameters determining the 21-cm spectrum, we apply to the extracted signal a series of consecutive fitting techniques including two usages of a Markov Chain Monte Carlo (MCMC) algorithm. Importantly, our pipeline efficiently utilizes the significant differences between the foreground and the 21-cm signal in spatial and spectral variations. In addition, it incorporates for the first time polarization data, dramatically improving the constraining power. We are currently validating this end-to-end pipeline using detailed simulations of the signal, foregrounds and instruments. This work was directly supported by the NASA Solar System Exploration Research Virtual Institute cooperative agreement number 80ARC017M0006 and funding from the NASA Ames Research Center cooperative agreement NNX16AF59G.
A general method to determine sampling windows for nonlinear mixed effects models with an application to population pharmacokinetic studies.

PubMed

Foo, Lee Kien; McGree, James; Duffull, Stephen

2012-01-01

Optimal design methods have been proposed to determine the best sampling times when sparse blood sampling is required in clinical pharmacokinetic studies. However, the optimal blood sampling time points may not be feasible in clinical practice. Sampling windows, a time interval for blood sample collection, have been proposed to provide flexibility in blood sampling times while preserving efficient parameter estimation. Because of the complexity of the population pharmacokinetic models, which are generally nonlinear mixed effects models, there is no analytical solution available to determine sampling windows. We propose a method for determination of sampling windows based on MCMC sampling techniques. The proposed method attains a stationary distribution rapidly and provides time-sensitive windows around the optimal design points. The proposed method is applicable to determine sampling windows for any nonlinear mixed effects model although our work focuses on an application to population pharmacokinetic models. Copyright © 2012 John Wiley & Sons, Ltd.
Application of Bayesian Approach to Cost-Effectiveness Analysis of Antiviral Treatments in Chronic Hepatitis B.

PubMed

Zhang, Hua; Huo, Mingdong; Chao, Jianqian; Liu, Pei

2016-01-01

Hepatitis B virus (HBV) infection is a major problem for public health; timely antiviral treatment can significantly prevent the progression of liver damage from HBV by slowing down or stopping the virus from reproducing. In the study we applied Bayesian approach to cost-effectiveness analysis, using Markov Chain Monte Carlo (MCMC) simulation methods for the relevant evidence input into the model to evaluate cost-effectiveness of entecavir (ETV) and lamivudine (LVD) therapy for chronic hepatitis B (CHB) in Jiangsu, China, thus providing information to the public health system in the CHB therapy. Eight-stage Markov model was developed, a hypothetical cohort of 35-year-old HBeAg-positive patients with CHB was entered into the model. Treatment regimens were LVD100mg daily and ETV 0.5 mg daily. The transition parameters were derived either from systematic reviews of the literature or from previous economic studies. The outcome measures were life-years, quality-adjusted lifeyears (QALYs), and expected costs associated with the treatments and disease progression. For the Bayesian models all the analysis was implemented by using WinBUGS version 1.4. Expected cost, life expectancy, QALYs decreased with age. Cost-effectiveness increased with age. Expected cost of ETV was less than LVD, while life expectancy and QALYs were higher than that of LVD, ETV strategy was more cost-effective. Costs and benefits of the Monte Carlo simulation were very close to the results of exact form among the group, but standard deviation of each group indicated there was a big difference between individual patients. Compared with lamivudine, entecavir is the more cost-effective option. CHB patients should accept antiviral treatment as soon as possible as the lower age the more cost-effective. Monte Carlo simulation obtained costs and effectiveness distribution, indicate our Markov model is of good robustness.
EFFICIENT MODEL-FITTING AND MODEL-COMPARISON FOR HIGH-DIMENSIONAL BAYESIAN GEOSTATISTICAL MODELS. (R826887)

EPA Science Inventory

Geostatistical models are appropriate for spatially distributed data measured at irregularly spaced locations. We propose an efficient Markov chain Monte Carlo (MCMC) algorithm for fitting Bayesian geostatistical models with substantial numbers of unknown parameters to sizable...
MULTIVARIATE RECEPTOR MODELING FOR TEMPORALLY CORRELATED DATA BY USING MCMC. (R826238)

EPA Science Inventory

The perspectives, information and conclusions conveyed in research project abstracts, progress reports, final reports, journal abstracts and journal publications convey the viewpoints of the principal investigator and may not represent the views and policies of ORD and EPA. Concl...
Estimation of under-reporting in epidemics using approximations.

PubMed

Gamado, Kokouvi; Streftaris, George; Zachary, Stan

2017-06-01

Under-reporting in epidemics, when it is ignored, leads to under-estimation of the infection rate and therefore of the reproduction number. In the case of stochastic models with temporal data, a usual approach for dealing with such issues is to apply data augmentation techniques through Bayesian methodology. Departing from earlier literature approaches implemented using reversible jump Markov chain Monte Carlo (RJMCMC) techniques, we make use of approximations to obtain faster estimation with simple MCMC. Comparisons among the methods developed here, and with the RJMCMC approach, are carried out and highlight that approximation-based methodology offers useful alternative inference tools for large epidemics, with a good trade-off between time cost and accuracy.
A STUDY OF HIERARCHICAL ORGANIZATION LOCATION MODEL FOR SERVICE INDUSTRY ENTERPRISE

NASA Astrophysics Data System (ADS)

Okumura, Makoto; Takada, Naoki; Okubo, Kazuaki

The service industry has come to participate in many economic activities, but there are few studies, which analyze on the hierarchical organization location of a service industry enterprise that treats information. We propose a hierarchical organized location model, which endogenously determines the number of hierarchies. Furtheremore, We propose a MCMC based statistical method to obtain parameter distributions corresponding to the observed macro emplowment distribution as well as the saralies paid. By using our model, we analyzed the regional disparities about the number of employees and the wage. We found that the change of the regional disparities is caused by internal factors such as the industrial structural change, rather than external factors such as traffic condition changes.
Update on ɛK with lattice QCD inputs

NASA Astrophysics Data System (ADS)

Jang, Yong-Chull; Lee, Weonjong; Lee, Sunkyu; Leem, Jaehoon

2018-03-01

We report updated results for ɛK, the indirect CP violation parameter in neutral kaons, which is evaluated directly from the standard model with lattice QCD inputs. We use lattice QCD inputs to fix B\\hatk,|Vcb|,ξ0,ξ2,|Vus|, and mc(mc). Since Lattice 2016, the UTfit group has updated the Wolfenstein parameters in the angle-only-fit method, and the HFLAV group has also updated |Vcb|. Our results show that the evaluation of ɛK with exclusive |Vcb| (lattice QCD inputs) has 4.0σ tension with the experimental value, while that with inclusive |Vcb| (heavy quark expansion based on OPE and QCD sum rules) shows no tension.
Cosmological Constraint on Brans-Dicke Theory

NASA Astrophysics Data System (ADS)

Chen, Xuelei; Wu, Fengquan

We develop the covariant formalism of the cosmological perturbation theory for the Brans-Dicke gravity, and use it to calculate the cosmic microwave background (CMB) anisotropy and large scale structure (LSS) power spectrum. We introduce a new parameter ζ which is related to the Brans-Dicke parameter ζ = ln(1/ω + 1), and use the Markov-Chain Monte Carlo (MCMC) method to explore the parameter space. Using the latest CMB data published by WMAP, ACBAR, CBI, Boomerang teams, and the LSS data from the SDSS survey DR4, we find that the the 2σ (95.5%) bound on ζ is about |ζ| > 10-2, or |ω| > 102, the precise limit depends somewhat on the prior used.
Inferring epidemiological parameters from phylogenies using regression-ABC: A comparative study

PubMed Central

Gascuel, Olivier

2017-01-01

Inferring epidemiological parameters such as the R0 from time-scaled phylogenies is a timely challenge. Most current approaches rely on likelihood functions, which raise specific issues that range from computing these functions to finding their maxima numerically. Here, we present a new regression-based Approximate Bayesian Computation (ABC) approach, which we base on a large variety of summary statistics intended to capture the information contained in the phylogeny and its corresponding lineage-through-time plot. The regression step involves the Least Absolute Shrinkage and Selection Operator (LASSO) method, which is a robust machine learning technique. It allows us to readily deal with the large number of summary statistics, while avoiding resorting to Markov Chain Monte Carlo (MCMC) techniques. To compare our approach to existing ones, we simulated target trees under a variety of epidemiological models and settings, and inferred parameters of interest using the same priors. We found that, for large phylogenies, the accuracy of our regression-ABC is comparable to that of likelihood-based approaches involving birth-death processes implemented in BEAST2. Our approach even outperformed these when inferring the host population size with a Susceptible-Infected-Removed epidemiological model. It also clearly outperformed a recent kernel-ABC approach when assuming a Susceptible-Infected epidemiological model with two host types. Lastly, by re-analyzing data from the early stages of the recent Ebola epidemic in Sierra Leone, we showed that regression-ABC provides more realistic estimates for the duration parameters (latency and infectiousness) than the likelihood-based method. Overall, ABC based on a large variety of summary statistics and a regression method able to perform variable selection and avoid overfitting is a promising approach to analyze large phylogenies. PMID:28263987
Skill (or lack thereof) of data-model fusion techniques to provide an early warning signal for an approaching tipping point.

PubMed

Singh, Riddhi; Quinn, Julianne D; Reed, Patrick M; Keller, Klaus

2018-01-01

Many coupled human-natural systems have the potential to exhibit a highly nonlinear threshold response to external forcings resulting in fast transitions to undesirable states (such as eutrophication in a lake). Often, there are considerable uncertainties that make identifying the threshold challenging. Thus, rapid learning is critical for guiding management actions to avoid abrupt transitions. Here, we adopt the shallow lake problem as a test case to compare the performance of four common data assimilation schemes to predict an approaching transition. In order to demonstrate the complex interactions between management strategies and the ability of the data assimilation schemes to predict eutrophication, we also analyze our results across two different management strategies governing phosphorus emissions into the shallow lake. The compared data assimilation schemes are: ensemble Kalman filtering (EnKF), particle filtering (PF), pre-calibration (PC), and Markov Chain Monte Carlo (MCMC) estimation. While differing in their core assumptions, each data assimilation scheme is based on Bayes' theorem and updates prior beliefs about a system based on new information. For large computational investments, EnKF, PF and MCMC show similar skill in capturing the observed phosphorus in the lake (measured as expected root mean squared prediction error). EnKF, followed by PF, displays the highest learning rates at low computational cost, thus providing a more reliable signal of an impending transition. MCMC approaches the true probability of eutrophication only after a strong signal of an impending transition emerges from the observations. Overall, we find that learning rates are greatest near regions of abrupt transitions, posing a challenge to early learning and preemptive management of systems with such abrupt transitions.
Skill (or lack thereof) of data-model fusion techniques to provide an early warning signal for an approaching tipping point

PubMed Central

Quinn, Julianne D.; Reed, Patrick M.; Keller, Klaus

2018-01-01

Many coupled human-natural systems have the potential to exhibit a highly nonlinear threshold response to external forcings resulting in fast transitions to undesirable states (such as eutrophication in a lake). Often, there are considerable uncertainties that make identifying the threshold challenging. Thus, rapid learning is critical for guiding management actions to avoid abrupt transitions. Here, we adopt the shallow lake problem as a test case to compare the performance of four common data assimilation schemes to predict an approaching transition. In order to demonstrate the complex interactions between management strategies and the ability of the data assimilation schemes to predict eutrophication, we also analyze our results across two different management strategies governing phosphorus emissions into the shallow lake. The compared data assimilation schemes are: ensemble Kalman filtering (EnKF), particle filtering (PF), pre-calibration (PC), and Markov Chain Monte Carlo (MCMC) estimation. While differing in their core assumptions, each data assimilation scheme is based on Bayes’ theorem and updates prior beliefs about a system based on new information. For large computational investments, EnKF, PF and MCMC show similar skill in capturing the observed phosphorus in the lake (measured as expected root mean squared prediction error). EnKF, followed by PF, displays the highest learning rates at low computational cost, thus providing a more reliable signal of an impending transition. MCMC approaches the true probability of eutrophication only after a strong signal of an impending transition emerges from the observations. Overall, we find that learning rates are greatest near regions of abrupt transitions, posing a challenge to early learning and preemptive management of systems with such abrupt transitions. PMID:29389938
A Robust Deconvolution Method based on Transdimensional Hierarchical Bayesian Inference

NASA Astrophysics Data System (ADS)

Kolb, J.; Lekic, V.

2012-12-01

Analysis of P-S and S-P conversions allows us to map receiver side crustal and lithospheric structure. This analysis often involves deconvolution of the parent wave field from the scattered wave field as a means of suppressing source-side complexity. A variety of deconvolution techniques exist including damped spectral division, Wiener filtering, iterative time-domain deconvolution, and the multitaper method. All of these techniques require estimates of noise characteristics as input parameters. We present a deconvolution method based on transdimensional Hierarchical Bayesian inference in which both noise magnitude and noise correlation are used as parameters in calculating the likelihood probability distribution. Because the noise for P-S and S-P conversion analysis in terms of receiver functions is a combination of both background noise - which is relatively easy to characterize - and signal-generated noise - which is much more difficult to quantify - we treat measurement errors as an known quantity, characterized by a probability density function whose mean and variance are model parameters. This transdimensional Hierarchical Bayesian approach has been successfully used previously in the inversion of receiver functions in terms of shear and compressional wave speeds of an unknown number of layers [1]. In our method we used a Markov chain Monte Carlo (MCMC) algorithm to find the receiver function that best fits the data while accurately assessing the noise parameters. In order to parameterize the receiver function we model the receiver function as an unknown number of Gaussians of unknown amplitude and width. The algorithm takes multiple steps before calculating the acceptance probability of a new model, in order to avoid getting trapped in local misfit minima. Using both observed and synthetic data, we show that the MCMC deconvolution method can accurately obtain a receiver function as well as an estimate of the noise parameters given the parent and daughter components. Furthermore, we demonstrate that this new approach is far less susceptible to generating spurious features even at high noise levels. Finally, the method yields not only the most-likely receiver function, but also quantifies its full uncertainty. [1] Bodin, T., M. Sambridge, H. Tkalčić, P. Arroucau, K. Gallagher, and N. Rawlinson (2012), Transdimensional inversion of receiver functions and surface wave dispersion, J. Geophys. Res., 117, B02301
Modelling the effect of bednet coverage on malaria transmission in South Sudan.

PubMed

Mukhtar, Abdulaziz Y A; Munyakazi, Justin B; Ouifki, Rachid; Clark, Allan E

2018-01-01

A campaign for malaria control, using Long Lasting Insecticide Nets (LLINs) was launched in South Sudan in 2009. The success of such a campaign often depends upon adequate available resources and reliable surveillance data which help officials understand existing infections. An optimal allocation of resources for malaria control at a sub-national scale is therefore paramount to the success of efforts to reduce malaria prevalence. In this paper, we extend an existing SIR mathematical model to capture the effect of LLINs on malaria transmission. Available data on malaria is utilized to determine realistic parameter values of this model using a Bayesian approach via Markov Chain Monte Carlo (MCMC) methods. Then, we explore the parasite prevalence on a continued rollout of LLINs in three different settings in order to create a sub-national projection of malaria. Further, we calculate the model's basic reproductive number and study its sensitivity to LLINs' coverage and its efficacy. From the numerical simulation results, we notice a basic reproduction number, [Formula: see text], confirming a substantial increase of incidence cases if no form of intervention takes place in the community. This work indicates that an effective use of LLINs may reduce [Formula: see text] and hence malaria transmission. We hope that this study will provide a basis for recommending a scaling-up of the entry point of LLINs' distribution that targets households in areas at risk of malaria.
Toward a probabilistic acoustic emission source location algorithm: A Bayesian approach

NASA Astrophysics Data System (ADS)

Schumacher, Thomas; Straub, Daniel; Higgins, Christopher

2012-09-01

Acoustic emissions (AE) are stress waves initiated by sudden strain releases within a solid body. These can be caused by internal mechanisms such as crack opening or propagation, crushing, or rubbing of crack surfaces. One application for the AE technique in the field of Structural Engineering is Structural Health Monitoring (SHM). With piezo-electric sensors mounted to the surface of the structure, stress waves can be detected, recorded, and stored for later analysis. An important step in quantitative AE analysis is the estimation of the stress wave source locations. Commonly, source location results are presented in a rather deterministic manner as spatial and temporal points, excluding information about uncertainties and errors. Due to variability in the material properties and uncertainty in the mathematical model, measures of uncertainty are needed beyond best-fit point solutions for source locations. This paper introduces a novel holistic framework for the development of a probabilistic source location algorithm. Bayesian analysis methods with Markov Chain Monte Carlo (MCMC) simulation are employed where all source location parameters are described with posterior probability density functions (PDFs). The proposed methodology is applied to an example employing data collected from a realistic section of a reinforced concrete bridge column. The selected approach is general and has the advantage that it can be extended and refined efficiently. Results are discussed and future steps to improve the algorithm are suggested.
Hierarchical animal movement models for population-level inference

USGS Publications Warehouse

Hooten, Mevin B.; Buderman, Frances E.; Brost, Brian M.; Hanks, Ephraim M.; Ivans, Jacob S.

2016-01-01

New methods for modeling animal movement based on telemetry data are developed regularly. With advances in telemetry capabilities, animal movement models are becoming increasingly sophisticated. Despite a need for population-level inference, animal movement models are still predominantly developed for individual-level inference. Most efforts to upscale the inference to the population level are either post hoc or complicated enough that only the developer can implement the model. Hierarchical Bayesian models provide an ideal platform for the development of population-level animal movement models but can be challenging to fit due to computational limitations or extensive tuning required. We propose a two-stage procedure for fitting hierarchical animal movement models to telemetry data. The two-stage approach is statistically rigorous and allows one to fit individual-level movement models separately, then resample them using a secondary MCMC algorithm. The primary advantages of the two-stage approach are that the first stage is easily parallelizable and the second stage is completely unsupervised, allowing for an automated fitting procedure in many cases. We demonstrate the two-stage procedure with two applications of animal movement models. The first application involves a spatial point process approach to modeling telemetry data, and the second involves a more complicated continuous-time discrete-space animal movement model. We fit these models to simulated data and real telemetry data arising from a population of monitored Canada lynx in Colorado, USA.
Using informative priors in facies inversion: The case of C-ISR method

NASA Astrophysics Data System (ADS)

Valakas, G.; Modis, K.

2016-08-01

Inverse problems involving the characterization of hydraulic properties of groundwater flow systems by conditioning on observations of the state variables are mathematically ill-posed because they have multiple solutions and are sensitive to small changes in the data. In the framework of McMC methods for nonlinear optimization and under an iterative spatial resampling transition kernel, we present an algorithm for narrowing the prior and thus producing improved proposal realizations. To achieve this goal, we cosimulate the facies distribution conditionally to facies observations and normal scores transformed hydrologic response measurements, assuming a linear coregionalization model. The approach works by creating an importance sampling effect that steers the process to selected areas of the prior. The effectiveness of our approach is demonstrated by an example application on a synthetic underdetermined inverse problem in aquifer characterization.
Bayesian random-effect model for predicting outcome fraught with heterogeneity--an illustration with episodes of 44 patients with intractable epilepsy.

PubMed

Yen, A M-F; Liou, H-H; Lin, H-L; Chen, T H-H

2006-01-01

The study aimed to develop a predictive model to deal with data fraught with heterogeneity that cannot be explained by sampling variation or measured covariates. The random-effect Poisson regression model was first proposed to deal with over-dispersion for data fraught with heterogeneity after making allowance for measured covariates. Bayesian acyclic graphic model in conjunction with Markov Chain Monte Carlo (MCMC) technique was then applied to estimate the parameters of both relevant covariates and random effect. Predictive distribution was then generated to compare the predicted with the observed for the Bayesian model with and without random effect. Data from repeated measurement of episodes among 44 patients with intractable epilepsy were used as an illustration. The application of Poisson regression without taking heterogeneity into account to epilepsy data yielded a large value of heterogeneity (heterogeneity factor = 17.90, deviance = 1485, degree of freedom (df) = 83). After taking the random effect into account, the value of heterogeneity factor was greatly reduced (heterogeneity factor = 0.52, deviance = 42.5, df = 81). The Pearson chi2 for the comparison between the expected seizure frequencies and the observed ones at two and three months of the model with and without random effect were 34.27 (p = 1.00) and 1799.90 (p < 0.0001), respectively. The Bayesian acyclic model using the MCMC method was demonstrated to have great potential for disease prediction while data show over-dispersion attributed either to correlated property or to subject-to-subject variability.
Bayesian hierarchical modelling of continuous non-negative longitudinal data with a spike at zero: An application to a study of birds visiting gardens in winter.

PubMed

Swallow, Ben; Buckland, Stephen T; King, Ruth; Toms, Mike P

2016-03-01

The development of methods for dealing with continuous data with a spike at zero has lagged behind those for overdispersed or zero-inflated count data. We consider longitudinal ecological data corresponding to an annual average of 26 weekly maximum counts of birds, and are hence effectively continuous, bounded below by zero but also with a discrete mass at zero. We develop a Bayesian hierarchical Tweedie regression model that can directly accommodate the excess number of zeros common to this type of data, whilst accounting for both spatial and temporal correlation. Implementation of the model is conducted in a Markov chain Monte Carlo (MCMC) framework, using reversible jump MCMC to explore uncertainty across both parameter and model spaces. This regression modelling framework is very flexible and removes the need to make strong assumptions about mean-variance relationships a priori. It can also directly account for the spike at zero, whilst being easily applicable to other types of data and other model formulations. Whilst a correlative study such as this cannot prove causation, our results suggest that an increase in an avian predator may have led to an overall decrease in the number of one of its prey species visiting garden feeding stations in the United Kingdom. This may reflect a change in behaviour of house sparrows to avoid feeding stations frequented by sparrowhawks, or a reduction in house sparrow population size as a result of sparrowhawk increase. © 2015 The Author. Biometrical Journal published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

Model averaging in linkage analysis.

PubMed

Matthysse, Steven

2006-06-05

Methods for genetic linkage analysis are traditionally divided into "model-dependent" and "model-independent," but there may be a useful place for an intermediate class, in which a broad range of possible models is considered as a parametric family. It is possible to average over model space with an empirical Bayes prior that weights models according to their goodness of fit to epidemiologic data, such as the frequency of the disease in the population and in first-degree relatives (and correlations with other traits in the pleiotropic case). For averaging over high-dimensional spaces, Markov chain Monte Carlo (MCMC) has great appeal, but it has a near-fatal flaw: it is not possible, in most cases, to provide rigorous sufficient conditions to permit the user safely to conclude that the chain has converged. A way of overcoming the convergence problem, if not of solving it, rests on a simple application of the principle of detailed balance. If the starting point of the chain has the equilibrium distribution, so will every subsequent point. The first point is chosen according to the target distribution by rejection sampling, and subsequent points by an MCMC process that has the target distribution as its equilibrium distribution. Model averaging with an empirical Bayes prior requires rapid estimation of likelihoods at many points in parameter space. Symbolic polynomials are constructed before the random walk over parameter space begins, to make the actual likelihood computations at each step of the random walk very fast. Power analysis in an illustrative case is described. (c) 2006 Wiley-Liss, Inc.
Using Stan for Item Response Theory Models

ERIC Educational Resources Information Center

Ames, Allison J.; Au, Chi Hang

2018-01-01

Stan is a flexible probabilistic programming language providing full Bayesian inference through Hamiltonian Monte Carlo algorithms. The benefits of Hamiltonian Monte Carlo include improved efficiency and faster inference, when compared to other MCMC software implementations. Users can interface with Stan through a variety of computing…
pyblocxs: Bayesian Low-Counts X-ray Spectral Analysis in Sherpa

NASA Astrophysics Data System (ADS)

Siemiginowska, A.; Kashyap, V.; Refsdal, B.; van Dyk, D.; Connors, A.; Park, T.

2011-07-01

Typical X-ray spectra have low counts and should be modeled using the Poisson distribution. However, χ2 statistic is often applied as an alternative and the data are assumed to follow the Gaussian distribution. A variety of weights to the statistic or a binning of the data is performed to overcome the low counts issues. However, such modifications introduce biases or/and a loss of information. Standard modeling packages such as XSPEC and Sherpa provide the Poisson likelihood and allow computation of rudimentary MCMC chains, but so far do not allow for setting a full Bayesian model. We have implemented a sophisticated Bayesian MCMC-based algorithm to carry out spectral fitting of low counts sources in the Sherpa environment. The code is a Python extension to Sherpa and allows to fit a predefined Sherpa model to high-energy X-ray spectral data and other generic data. We present the algorithm and discuss several issues related to the implementation, including flexible definition of priors and allowing for variations in the calibration information.
Kepler Uniform Modeling of KOIs: MCMC Notes for Data Release 25

NASA Technical Reports Server (NTRS)

Hoffman, Kelsey L.; Rowe, Jason F.

2017-01-01

This document describes data products related to the reported planetary parameters and uncertainties for the Kepler Objects of Interest (KOIs) based on a Markov-Chain-Monte-Carlo (MCMC) analysis. Reported parameters, uncertainties and data products can be found at the NASA Exoplanet Archive . The codes used for this data analysis are available on the Github website (Rowe 2016). The relevant paper for details of the calculations is Rowe et al. (2015). The main differences between the model fits discussed here and those in the DR24 catalogue are that the DR25 light curves were used in the analysis, our processing of the MAST light curves took into account different data flags, the number of chains calculated was doubled to 200 000, and the parameters which are reported are based on a damped least-squares fit, instead of the median value from the Markov chain or the chain with the lowest 2 as reported in the past.
Applying species-tree analyses to deep phylogenetic histories: challenges and potential suggested from a survey of empirical phylogenetic studies.

PubMed

Lanier, Hayley C; Knowles, L Lacey

2015-02-01

Coalescent-based methods for species-tree estimation are becoming a dominant approach for reconstructing species histories from multi-locus data, with most of the studies examining these methodologies focused on recently diverged species. However, deeper phylogenies, such as the datasets that comprise many Tree of Life (ToL) studies, also exhibit gene-tree discordance. This discord may also arise from the stochastic sorting of gene lineages during the speciation process (i.e., reflecting the random coalescence of gene lineages in ancestral populations). It remains unknown whether guidelines regarding methodologies and numbers of loci established by simulation studies at shallow tree depths translate into accurate species relationships for deeper phylogenetic histories. We address this knowledge gap and specifically identify the challenges and limitations of species-tree methods that account for coalescent variance for deeper phylogenies. Using simulated data with characteristics informed by empirical studies, we evaluate both the accuracy of estimated species trees and the characteristics associated with recalcitrant nodes, with a specific focus on whether coalescent variance is generally responsible for the lack of resolution. By determining the proportion of coalescent genealogies that support a particular node, we demonstrate that (1) species-tree methods account for coalescent variance at deep nodes and (2) mutational variance - not gene-tree discord arising from the coalescent - posed the primary challenge for accurate reconstruction across the tree. For example, many nodes were accurately resolved despite predicted discord from the random coalescence of gene lineages and nodes with poor support were distributed across a range of depths (i.e., they were not restricted to a particular recent divergences). Given their broad taxonomic scope and large sampling of taxa, deep level phylogenies pose several potential methodological complications including difficulties with MCMC convergence and estimation of requisite population genetic parameters for coalescent-based approaches. Despite these difficulties, the findings generally support the utility of species-tree analyses for the estimation of species relationships throughout the ToL. We discuss strategies for successful application of species-tree approaches to deep phylogenies. Copyright © 2014 Elsevier Inc. All rights reserved.
Stochastic inversion of time-lapse geophysical data to characterize the vadose zone at the Arrenaes field site (Denmark)

NASA Astrophysics Data System (ADS)

Marie, S.; Irving, J. D.; Looms, M. C.; Nielsen, L.; Holliger, K.

2011-12-01

Geophysical methods such as ground-penetrating radar (GPR) can provide valuable information on the hydrological properties of the vadose zone. In particular, there is evidence to suggest that the stochastic inversion of such data may allow for significant reductions in uncertainty regarding subsurface van-Genuchten-Mualem (VGM) parameters, which characterize unsaturated hydrodynamic behaviour as defined by the combination of the water retention and hydraulic conductivity functions. A significant challenge associated with the use of geophysical methods in a hydrological context is that they generally exhibit an indirect and/or weak sensitivity to the hydraulic parameters of interest. A novel and increasingly popular means of addressing this issue involves the acquisition of geophysical data in a time-lapse fashion while changes occur in the hydrological condition of the probed subsurface region. Another significant challenge when attempting to use geophysical data for the estimation of subsurface hydrological properties is the inherent non-linearity and non-uniqueness of the corresponding inverse problems. Stochastic inversion approaches have the advantage of providing a comprehensive exploration of the model space, which makes them ideally suited for addressing such issues. In this work, we present the stochastic inversion of time-lapse zero-offset-profile (ZOP) crosshole GPR traveltime data, collected during a forced infiltration experiment at the Arreneas field site in Denmark, in order to estimate subsurface VGM parameters and their corresponding uncertainties. We do this using a Bayesian Markov-chain-Monte-Carlo (MCMC) inversion approach. We find that the Bayesian-MCMC methodology indeed allows for a substantial refinement in the inferred posterior parameter distributions of the VGM parameters as compared to the corresponding priors. To further understand the potential impact on capturing the underlying hydrological behaviour, we also explore how the posterior VGM parameter distributions affect the hydrodynamic characteristics. In doing so, we find clear evidence that the approach pursued in this study allows for effective characterization of the hydrological behaviour of the probed subsurface region.
Logistic random effects regression models: a comparison of statistical packages for binary and ordinal outcomes

PubMed Central

2011-01-01

Background Logistic random effects models are a popular tool to analyze multilevel also called hierarchical data with a binary or ordinal outcome. Here, we aim to compare different statistical software implementations of these models. Methods We used individual patient data from 8509 patients in 231 centers with moderate and severe Traumatic Brain Injury (TBI) enrolled in eight Randomized Controlled Trials (RCTs) and three observational studies. We fitted logistic random effects regression models with the 5-point Glasgow Outcome Scale (GOS) as outcome, both dichotomized as well as ordinal, with center and/or trial as random effects, and as covariates age, motor score, pupil reactivity or trial. We then compared the implementations of frequentist and Bayesian methods to estimate the fixed and random effects. Frequentist approaches included R (lme4), Stata (GLLAMM), SAS (GLIMMIX and NLMIXED), MLwiN ([R]IGLS) and MIXOR, Bayesian approaches included WinBUGS, MLwiN (MCMC), R package MCMCglmm and SAS experimental procedure MCMC. Three data sets (the full data set and two sub-datasets) were analysed using basically two logistic random effects models with either one random effect for the center or two random effects for center and trial. For the ordinal outcome in the full data set also a proportional odds model with a random center effect was fitted. Results The packages gave similar parameter estimates for both the fixed and random effects and for the binary (and ordinal) models for the main study and when based on a relatively large number of level-1 (patient level) data compared to the number of level-2 (hospital level) data. However, when based on relatively sparse data set, i.e. when the numbers of level-1 and level-2 data units were about the same, the frequentist and Bayesian approaches showed somewhat different results. The software implementations differ considerably in flexibility, computation time, and usability. There are also differences in the availability of additional tools for model evaluation, such as diagnostic plots. The experimental SAS (version 9.2) procedure MCMC appeared to be inefficient. Conclusions On relatively large data sets, the different software implementations of logistic random effects regression models produced similar results. Thus, for a large data set there seems to be no explicit preference (of course if there is no preference from a philosophical point of view) for either a frequentist or Bayesian approach (if based on vague priors). The choice for a particular implementation may largely depend on the desired flexibility, and the usability of the package. For small data sets the random effects variances are difficult to estimate. In the frequentist approaches the MLE of this variance was often estimated zero with a standard error that is either zero or could not be determined, while for Bayesian methods the estimates could depend on the chosen "non-informative" prior of the variance parameter. The starting value for the variance parameter may be also critical for the convergence of the Markov chain. PMID:21605357
Orbital fitting of imaged planetary companions with high eccentricities and unbound orbits. Their application to Fomalhaut b and PZ Telecopii B

NASA Astrophysics Data System (ADS)

Beust, H.; Bonnefoy, M.; Maire, A.-L.; Ehrenreich, D.; Lagrange, A.-M.; Chauvin, G.

2016-03-01

Context. Regular follow-up of imaged companions to main-sequence stars often allows a projected orbital motion to be detected. Markov chain Monte Carlo (MCMC) has become very popular recent years for fitting and constraining their orbits. Some of these imaged companions appear to move on very eccentric, possibly unbound orbits. This is, in particular, the case for the exoplanet Fomalhaut b and the brown dwarf companion PZ Tel B on which we focus here. Aims: For these orbits, standard MCMC codes that assume only bound orbits may be inappropriate. Our goal is to develop a new MCMC implementation that is able to handle both bound and unbound orbits in a continuous manner, and to apply this to the cases of Fomalhaut b and PZ Tel B. Methods: We present here this code, based on the use of universal Keplerian variables and Stumpff functions. We present two versions of this code, the second one using a different set of angular variables that were designed to avoid degeneracies arising when the projected orbital motion is quasi-radial, as is the case for PZ Tel B. We also present additional observations of PZ Tel B. Results: The code is applied to Fomalhaut b and PZ Tel B. We confirm previous results in relation to, but we show that on the sole basis of the astrometric data, open orbital solutions are also possible. The eccentricity distribution nevertheless still peaks around ~0.9 in the bound regime. We present a first successful orbital fit of PZ Tel B, which shows in particular that, while both bound and unbound orbital solutions are equally possible, the eccentricity distribution presents a sharp peak very close to e = 1, meaning a quasi-parabolic orbit. Conclusions: It has recently been suggested that the presence of unseen inner companions to imaged ones may lead orbital fitting algorithms to artificially give very high eccentricities. We show that this caveat is unlikely to apply to Fomalhaut b. Concerning PZ Tel B, we derive a possible solution, which involves an inner ~12 MJup companion, that would mimic a e = 1 orbit, despite a real eccentricity of around 0.7, but a dynamical analysis reveals that this type of system would not be stable. We thus conclude that our orbital fit is robust. Based on observations collected at the European Organisation for Astronomical Research in the Southern Hemisphere, Chile (Program ID: 085.C-0867(B) and 085.C-0277(B)).
Stochastic static fault slip inversion from geodetic data with non-negativity and bounds constraints

NASA Astrophysics Data System (ADS)

Nocquet, J.-M.

2018-04-01

Despite surface displacements observed by geodesy are linear combinations of slip at faults in an elastic medium, determining the spatial distribution of fault slip remains a ill-posed inverse problem. A widely used approach to circumvent the illness of the inversion is to add regularization constraints in terms of smoothing and/or damping so that the linear system becomes invertible. However, the choice of regularization parameters is often arbitrary, and sometimes leads to significantly different results. Furthermore, the resolution analysis is usually empirical and cannot be made independently of the regularization. The stochastic approach of inverse problems (Tarantola & Valette 1982; Tarantola 2005) provides a rigorous framework where the a priori information about the searched parameters is combined with the observations in order to derive posterior probabilities of the unkown parameters. Here, I investigate an approach where the prior probability density function (pdf) is a multivariate Gaussian function, with single truncation to impose positivity of slip or double truncation to impose positivity and upper bounds on slip for interseismic modeling. I show that the joint posterior pdf is similar to the linear untruncated Gaussian case and can be expressed as a Truncated Multi-Variate Normal (TMVN) distribution. The TMVN form can then be used to obtain semi-analytical formulas for the single, two-dimensional or n-dimensional marginal pdf. The semi-analytical formula involves the product of a Gaussian by an integral term that can be evaluated using recent developments in TMVN probabilities calculations (e.g. Genz & Bretz 2009). Posterior mean and covariance can also be efficiently derived. I show that the Maximum Posterior (MAP) can be obtained using a Non-Negative Least-Squares algorithm (Lawson & Hanson 1974) for the single truncated case or using the Bounded-Variable Least-Squares algorithm (Stark & Parker 1995) for the double truncated case. I show that the case of independent uniform priors can be approximated using TMVN. The numerical equivalence to Bayesian inversions using Monte Carlo Markov Chain (MCMC) sampling is shown for a synthetic example and a real case for interseismic modeling in Central Peru. The TMVN method overcomes several limitations of the Bayesian approach using MCMC sampling. First, the need of computer power is largely reduced. Second, unlike Bayesian MCMC based approach, marginal pdf, mean, variance or covariance are obtained independently one from each other. Third, the probability and cumulative density functions can be obtained with any density of points. Finally, determining the Maximum Posterior (MAP) is extremely fast.
Phyx: phylogenetic tools for unix.

PubMed

Brown, Joseph W; Walker, Joseph F; Smith, Stephen A

2017-06-15

The ease with which phylogenomic data can be generated has drastically escalated the computational burden for even routine phylogenetic investigations. To address this, we present phyx : a collection of programs written in C ++ to explore, manipulate, analyze and simulate phylogenetic objects (alignments, trees and MCMC logs). Modelled after Unix/GNU/Linux command line tools, individual programs perform a single task and operate on standard I/O streams that can be piped to quickly and easily form complex analytical pipelines. Because of the stream-centric paradigm, memory requirements are minimized (often only a single tree or sequence in memory at any instance), and hence phyx is capable of efficiently processing very large datasets. phyx runs on POSIX-compliant operating systems. Source code, installation instructions, documentation and example files are freely available under the GNU General Public License at https://github.com/FePhyFoFum/phyx. eebsmith@umich.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.
Phyx: phylogenetic tools for unix

PubMed Central

Brown, Joseph W.; Walker, Joseph F.; Smith, Stephen A.

2017-01-01

Abstract Summary: The ease with which phylogenomic data can be generated has drastically escalated the computational burden for even routine phylogenetic investigations. To address this, we present phyx: a collection of programs written in C ++ to explore, manipulate, analyze and simulate phylogenetic objects (alignments, trees and MCMC logs). Modelled after Unix/GNU/Linux command line tools, individual programs perform a single task and operate on standard I/O streams that can be piped to quickly and easily form complex analytical pipelines. Because of the stream-centric paradigm, memory requirements are minimized (often only a single tree or sequence in memory at any instance), and hence phyx is capable of efficiently processing very large datasets. Availability and Implementation: phyx runs on POSIX-compliant operating systems. Source code, installation instructions, documentation and example files are freely available under the GNU General Public License at https://github.com/FePhyFoFum/phyx Contact: eebsmith@umich.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28174903
A Bayesian Poisson-lognormal Model for Count Data for Multiple-Trait Multiple-Environment Genomic-Enabled Prediction

PubMed Central

Montesinos-López, Osval A.; Montesinos-López, Abelardo; Crossa, José; Toledo, Fernando H.; Montesinos-López, José C.; Singh, Pawan; Juliana, Philomin; Salinas-Ruiz, Josafhat

2017-01-01

When a plant scientist wishes to make genomic-enabled predictions of multiple traits measured in multiple individuals in multiple environments, the most common strategy for performing the analysis is to use a single trait at a time taking into account genotype × environment interaction (G × E), because there is a lack of comprehensive models that simultaneously take into account the correlated counting traits and G × E. For this reason, in this study we propose a multiple-trait and multiple-environment model for count data. The proposed model was developed under the Bayesian paradigm for which we developed a Markov Chain Monte Carlo (MCMC) with noninformative priors. This allows obtaining all required full conditional distributions of the parameters leading to an exact Gibbs sampler for the posterior distribution. Our model was tested with simulated data and a real data set. Results show that the proposed multi-trait, multi-environment model is an attractive alternative for modeling multiple count traits measured in multiple environments. PMID:28364037
Spectral decompositions of multiple time series: a Bayesian non-parametric approach.

PubMed

Macaro, Christian; Prado, Raquel

2014-01-01

We consider spectral decompositions of multiple time series that arise in studies where the interest lies in assessing the influence of two or more factors. We write the spectral density of each time series as a sum of the spectral densities associated to the different levels of the factors. We then use Whittle's approximation to the likelihood function and follow a Bayesian non-parametric approach to obtain posterior inference on the spectral densities based on Bernstein-Dirichlet prior distributions. The prior is strategically important as it carries identifiability conditions for the models and allows us to quantify our degree of confidence in such conditions. A Markov chain Monte Carlo (MCMC) algorithm for posterior inference within this class of frequency-domain models is presented.We illustrate the approach by analyzing simulated and real data via spectral one-way and two-way models. In particular, we present an analysis of functional magnetic resonance imaging (fMRI) brain responses measured in individuals who participated in a designed experiment to study pain perception in humans.
Measurement of the low-energy quenching factor in germanium using an Y 88 / Be photoneutron source

DOE Office of Scientific and Technical Information (OSTI.GOV)

Scholz, B. J.; Chavarria, A. E.; Collar, J. I.

2016-12-01

We employ an 88Y/Be photoneutron source to derive the quenching factor for neutron-induced nuclear recoils in germanium, probing recoil energies from a few hundred eVnr to 8.5 keV nr. A comprehensive Monte Carlo simulation of our setup is compared to experimental data employing a Lindhard model with a free electronic energy loss k and an adiabatic correction for sub-keVnr nuclear recoils. The best fit k=0.179±0.001 obtained using a Monte Carlo Markov chain (MCMC) ensemble sampler is in good agreement with previous measurements, confirming the adequacy of the Lindhard model to describe the stopping of few-keV ions in germanium crystals at a temperaturemore » of ~77 K. This value of k corresponds to a quenching factor of 13.7% to 25.3% for nuclear recoil energies between 0.3 and 8.5 keV nr, respectively.« less
Uncertainty analysis of gross primary production partitioned from net ecosystem exchange measurements

NASA Astrophysics Data System (ADS)

Raj, Rahul; Hamm, Nicholas Alexander Samuel; van der Tol, Christiaan; Stein, Alfred

2016-03-01

Gross primary production (GPP) can be separated from flux tower measurements of net ecosystem exchange (NEE) of CO2. This is used increasingly to validate process-based simulators and remote-sensing-derived estimates of simulated GPP at various time steps. Proper validation includes the uncertainty associated with this separation. In this study, uncertainty assessment was done in a Bayesian framework. It was applied to data from the Speulderbos forest site, The Netherlands. We estimated the uncertainty in GPP at half-hourly time steps, using a non-rectangular hyperbola (NRH) model for its separation from the flux tower measurements. The NRH model provides a robust empirical relationship between radiation and GPP. It includes the degree of curvature of the light response curve, radiation and temperature. Parameters of the NRH model were fitted to the measured NEE data for every 10-day period during the growing season (April to October) in 2009. We defined the prior distribution of each NRH parameter and used Markov chain Monte Carlo (MCMC) simulation to estimate the uncertainty in the separated GPP from the posterior distribution at half-hourly time steps. This time series also allowed us to estimate the uncertainty at daily time steps. We compared the informative with the non-informative prior distributions of the NRH parameters and found that both choices produced similar posterior distributions of GPP. This will provide relevant and important information for the validation of process-based simulators in the future. Furthermore, the obtained posterior distributions of NEE and the NRH parameters are of interest for a range of applications.
Logistic random effects regression models: a comparison of statistical packages for binary and ordinal outcomes.

PubMed

Li, Baoyue; Lingsma, Hester F; Steyerberg, Ewout W; Lesaffre, Emmanuel

2011-05-23

Logistic random effects models are a popular tool to analyze multilevel also called hierarchical data with a binary or ordinal outcome. Here, we aim to compare different statistical software implementations of these models. We used individual patient data from 8509 patients in 231 centers with moderate and severe Traumatic Brain Injury (TBI) enrolled in eight Randomized Controlled Trials (RCTs) and three observational studies. We fitted logistic random effects regression models with the 5-point Glasgow Outcome Scale (GOS) as outcome, both dichotomized as well as ordinal, with center and/or trial as random effects, and as covariates age, motor score, pupil reactivity or trial. We then compared the implementations of frequentist and Bayesian methods to estimate the fixed and random effects. Frequentist approaches included R (lme4), Stata (GLLAMM), SAS (GLIMMIX and NLMIXED), MLwiN ([R]IGLS) and MIXOR, Bayesian approaches included WinBUGS, MLwiN (MCMC), R package MCMCglmm and SAS experimental procedure MCMC.Three data sets (the full data set and two sub-datasets) were analysed using basically two logistic random effects models with either one random effect for the center or two random effects for center and trial. For the ordinal outcome in the full data set also a proportional odds model with a random center effect was fitted. The packages gave similar parameter estimates for both the fixed and random effects and for the binary (and ordinal) models for the main study and when based on a relatively large number of level-1 (patient level) data compared to the number of level-2 (hospital level) data. However, when based on relatively sparse data set, i.e. when the numbers of level-1 and level-2 data units were about the same, the frequentist and Bayesian approaches showed somewhat different results. The software implementations differ considerably in flexibility, computation time, and usability. There are also differences in the availability of additional tools for model evaluation, such as diagnostic plots. The experimental SAS (version 9.2) procedure MCMC appeared to be inefficient. On relatively large data sets, the different software implementations of logistic random effects regression models produced similar results. Thus, for a large data set there seems to be no explicit preference (of course if there is no preference from a philosophical point of view) for either a frequentist or Bayesian approach (if based on vague priors). The choice for a particular implementation may largely depend on the desired flexibility, and the usability of the package. For small data sets the random effects variances are difficult to estimate. In the frequentist approaches the MLE of this variance was often estimated zero with a standard error that is either zero or could not be determined, while for Bayesian methods the estimates could depend on the chosen "non-informative" prior of the variance parameter. The starting value for the variance parameter may be also critical for the convergence of the Markov chain.
What are hierarchical models and how do we analyze them?

USGS Publications Warehouse

Royle, Andy

2016-01-01

In this chapter we provide a basic definition of hierarchical models and introduce the two canonical hierarchical models in this book: site occupancy and N-mixture models. The former is a hierarchical extension of logistic regression and the latter is a hierarchical extension of Poisson regression. We introduce basic concepts of probability modeling and statistical inference including likelihood and Bayesian perspectives. We go through the mechanics of maximizing the likelihood and characterizing the posterior distribution by Markov chain Monte Carlo (MCMC) methods. We give a general perspective on topics such as model selection and assessment of model fit, although we demonstrate these topics in practice in later chapters (especially Chapters 5, 6, 7, and 10 Chapter 5 Chapter 6 Chapter 7 Chapter 10)
INFERENCE FOR INDIVIDUAL-LEVEL MODELS OF INFECTIOUS DISEASES IN LARGE POPULATIONS.

PubMed

Deardon, Rob; Brooks, Stephen P; Grenfell, Bryan T; Keeling, Matthew J; Tildesley, Michael J; Savill, Nicholas J; Shaw, Darren J; Woolhouse, Mark E J

2010-01-01

Individual Level Models (ILMs), a new class of models, are being applied to infectious epidemic data to aid in the understanding of the spatio-temporal dynamics of infectious diseases. These models are highly flexible and intuitive, and can be parameterised under a Bayesian framework via Markov chain Monte Carlo (MCMC) methods. Unfortunately, this parameterisation can be difficult to implement due to intense computational requirements when calculating the full posterior for large, or even moderately large, susceptible populations, or when missing data are present. Here we detail a methodology that can be used to estimate parameters for such large, and/or incomplete, data sets. This is done in the context of a study of the UK 2001 foot-and-mouth disease (FMD) epidemic.
MPN estimation of qPCR target sequence recoveries from whole cell calibrator samples.

PubMed

Sivaganesan, Mano; Siefring, Shawn; Varma, Manju; Haugland, Richard A

2011-12-01

DNA extracts from enumerated target organism cells (calibrator samples) have been used for estimating Enterococcus cell equivalent densities in surface waters by a comparative cycle threshold (Ct) qPCR analysis method. To compare surface water Enterococcus density estimates from different studies by this approach, either a consistent source of calibrator cells must be used or the estimates must account for any differences in target sequence recoveries from different sources of calibrator cells. In this report we describe two methods for estimating target sequence recoveries from whole cell calibrator samples based on qPCR analyses of their serially diluted DNA extracts and most probable number (MPN) calculation. The first method employed a traditional MPN calculation approach. The second method employed a Bayesian hierarchical statistical modeling approach and a Monte Carlo Markov Chain (MCMC) simulation method to account for the uncertainty in these estimates associated with different individual samples of the cell preparations, different dilutions of the DNA extracts and different qPCR analytical runs. The two methods were applied to estimate mean target sequence recoveries per cell from two different lots of a commercially available source of enumerated Enterococcus cell preparations. The mean target sequence recovery estimates (and standard errors) per cell from Lot A and B cell preparations by the Bayesian method were 22.73 (3.4) and 11.76 (2.4), respectively, when the data were adjusted for potential false positive results. Means were similar for the traditional MPN approach which cannot comparably assess uncertainty in the estimates. Cell numbers and estimates of recoverable target sequences in calibrator samples prepared from the two cell sources were also used to estimate cell equivalent and target sequence quantities recovered from surface water samples in a comparative Ct method. Our results illustrate the utility of the Bayesian method in accounting for uncertainty, the high degree of precision attainable by the MPN approach and the need to account for the differences in target sequence recoveries from different calibrator sample cell sources when they are used in the comparative Ct method. Published by Elsevier B.V.
Markov chain Monte Carlo techniques applied to parton distribution functions determination: Proof of concept

NASA Astrophysics Data System (ADS)

Gbedo, Yémalin Gabin; Mangin-Brinet, Mariane

2017-07-01

We present a new procedure to determine parton distribution functions (PDFs), based on Markov chain Monte Carlo (MCMC) methods. The aim of this paper is to show that we can replace the standard χ2 minimization by procedures grounded on statistical methods, and on Bayesian inference in particular, thus offering additional insight into the rich field of PDFs determination. After a basic introduction to these techniques, we introduce the algorithm we have chosen to implement—namely Hybrid (or Hamiltonian) Monte Carlo. This algorithm, initially developed for Lattice QCD, turns out to be very interesting when applied to PDFs determination by global analyses; we show that it allows us to circumvent the difficulties due to the high dimensionality of the problem, in particular concerning the acceptance. A first feasibility study is performed and presented, which indicates that Markov chain Monte Carlo can successfully be applied to the extraction of PDFs and of their uncertainties.

Study of optical and electronic properties of nickel from reflection electron energy loss spectra

NASA Astrophysics Data System (ADS)

Xu, H.; Yang, L. H.; Da, B.; Tóth, J.; Tőkési, K.; Ding, Z. J.

2017-09-01

We use the classical Monte Carlo transport model of electrons moving near the surface and inside solids to reproduce the measured reflection electron energy-loss spectroscopy (REELS) spectra. With the combination of the classical transport model and the Markov chain Monte Carlo (MCMC) sampling of oscillator parameters the so-called reverse Monte Carlo (RMC) method was developed, and used to obtain optical constants of Ni in this work. A systematic study of the electronic and optical properties of Ni has been performed in an energy loss range of 0-200 eV from the measured REELS spectra at primary energies of 1000 eV, 2000 eV and 3000 eV. The reliability of our method was tested by comparing our results with the previous data. Moreover, the accuracy of our optical data has been confirmed by applying oscillator strength-sum rule and perfect-screening-sum rule.
Segregation and recombination of a multipartite mitochondrial DNA in populations of the potato cyst nematode Globodera pallida.

PubMed

Armstrong, Miles R; Husmeier, Dirk; Phillips, Mark S; Blok, Vivian C

2007-06-01

The discovery that the potato cyst nematode Globodera pallida has a multipartite mitochondrial DNA (mtDNA) composed, at least in part, of six small circular mtDNAs (scmtDNAs) raised a number of questions concerning the population-level processes that might act on such a complex genome. Here we report our observations on the distribution of some scmtDNAs among a sample of European and South American G. pallida populations. The occurrence of sequence variants of scmtDNA IV in population P4A from South America, and that particular sequence variants are common to the individuals within a single cyst, is described. Evidence for recombination of sequence variants of scmtDNA IV in P4A is also reported. The mosaic structure of P4A scmtDNA IV sequences was revealed using several detection methods and recombination breakpoints were independently detected by maximum likelihood and Bayesian MCMC methods.
Application of Bayesian Inversion for Multilayer Reservoir Mapping while Drilling Measurements

NASA Astrophysics Data System (ADS)

Wang, J.; Chen, H.; Wang, X.

2017-12-01

Real-time geosteering technology plays a key role in horizontal well development, which keeps the wellbore trajectories within target zones to maximize reservoir contact. The new generation logging while drilling (LWD) resistivity tools have longer spacing and deeper investigation depth, but meanwhile bring a new challenge to inversion of logging data that is formation model not be restricted to few possible numbers of layer such as typical three layers model. If the inappropriate starting models of deterministic and gradient-based methods are adopted may mislead geophysicists in interpretation of subsurface structure. For this purpose, to take advantage of richness of the measurements and deep depth of investigation across multiple formation boundaries, a trans-dimensional Markov chain Monte Carlo(MCMC) inversion algorithm has been developed that combines phase and attenuation measurements at various frequencies and spacings. Unlike conventional gradient-based inversion approaches, MCMC algorithm does not introduce bias from prior information and require any subjective choice of regularization parameter. A synthetic three layers model example demonstrates how the algorithm can be used to image the subsurface using the LWD data. When the tool is far from top boundary, the inversion clearly resolves the boundary position; that is where the boundary histogram shows a large peak. But the measurements cannot resolve the bottom boundary; the large spread between quantiles reflects the uncertainty associated with the bed resolution. As the tool moves closer to the top boundary, the middle layer and bottom layer are resolved and retained models are more similar, the uncertainty associated with these two beds decreases. From the spread observed between models, we can evaluate actual depth of investigation, uncertainty, and sensitivity, which is more useful then just a single best model.
Accounting for model error in Bayesian solutions to hydrogeophysical inverse problems using a local basis approach

NASA Astrophysics Data System (ADS)

Köpke, Corinna; Irving, James; Elsheikh, Ahmed H.

2018-06-01

Bayesian solutions to geophysical and hydrological inverse problems are dependent upon a forward model linking subsurface physical properties to measured data, which is typically assumed to be perfectly known in the inversion procedure. However, to make the stochastic solution of the inverse problem computationally tractable using methods such as Markov-chain-Monte-Carlo (MCMC), fast approximations of the forward model are commonly employed. This gives rise to model error, which has the potential to significantly bias posterior statistics if not properly accounted for. Here, we present a new methodology for dealing with the model error arising from the use of approximate forward solvers in Bayesian solutions to hydrogeophysical inverse problems. Our approach is geared towards the common case where this error cannot be (i) effectively characterized through some parametric statistical distribution; or (ii) estimated by interpolating between a small number of computed model-error realizations. To this end, we focus on identification and removal of the model-error component of the residual during MCMC using a projection-based approach, whereby the orthogonal basis employed for the projection is derived in each iteration from the K-nearest-neighboring entries in a model-error dictionary. The latter is constructed during the inversion and grows at a specified rate as the iterations proceed. We demonstrate the performance of our technique on the inversion of synthetic crosshole ground-penetrating radar travel-time data considering three different subsurface parameterizations of varying complexity. Synthetic data are generated using the eikonal equation, whereas a straight-ray forward model is assumed for their inversion. In each case, our developed approach enables us to remove posterior bias and obtain a more realistic characterization of uncertainty.
Impact of Federal drug law enforcement on the supply of heroin in Australia.

PubMed

Smithson, Michael; McFadden, Michael; Mwesigye, Sue-Ellen

2005-08-01

To conduct an empirical investigation of the efficacy of law enforcement in reducing heroin supply in Australia. Specifically, this paper addresses the question of whether heroin purity levels in the Australian Capital Territory (ACT) could be predicted by heroin seizures at the national level by the Australian Federal Police (AFP) in the preceding year. We considered two forms of evidence. First, a Bayesian Markov Chain Monte Carlo (MCMC) change-point model was used to discover (a) if there was a substantial increase in heroin seizures by the AFP, (b) when the increase began and (c) whether it occurred after increased funding to the Australian Federal Police for the purpose of drug law enforcement. Second, standard time-series methods were used to ascertain whether fluctuations in heroin seizure weights or the frequency of large-scale seizures after the aforementioned changes in seizure levels predicted fluctuations in heroin purity levels in the ACT after autocorrelation had been removed from the purity series. A Bayesian MCMC change-point model supported the hypothesis that heroin seizures rapidly increased about a year before the estimated decline in heroin purity and after the increased funding of AFP. The autoregression models suggested that 10-20% of the variance in the residuals of the heroin purity series was predicted by appropriately lagged residuals of the seizure-number and log-weight series, after autocorrelation had been removed. The overall results are consistent with the hypothesis that large-scale heroin seizures by the AFP reduce street-level heroin supply a year or so later, although the short-term dynamics suggest an 'opponent' response to residual fluctuations in seizures. To our knowledge, this is first time a connection has been identified between large-scale heroin seizures and street-level supply.
Fast-NPS-A Markov Chain Monte Carlo-based analysis tool to obtain structural information from single-molecule FRET measurements

NASA Astrophysics Data System (ADS)

Eilert, Tobias; Beckers, Maximilian; Drechsler, Florian; Michaelis, Jens

2017-10-01

The analysis tool and software package Fast-NPS can be used to analyse smFRET data to obtain quantitative structural information about macromolecules in their natural environment. In the algorithm a Bayesian model gives rise to a multivariate probability distribution describing the uncertainty of the structure determination. Since Fast-NPS aims to be an easy-to-use general-purpose analysis tool for a large variety of smFRET networks, we established an MCMC based sampling engine that approximates the target distribution and requires no parameter specification by the user at all. For an efficient local exploration we automatically adapt the multivariate proposal kernel according to the shape of the target distribution. In order to handle multimodality, the sampler is equipped with a parallel tempering scheme that is fully adaptive with respect to temperature spacing and number of chains. Since the molecular surrounding of a dye molecule affects its spatial mobility and thus the smFRET efficiency, we introduce dye models which can be selected for every dye molecule individually. These models allow the user to represent the smFRET network in great detail leading to an increased localisation precision. Finally, a tool to validate the chosen model combination is provided. Programme Files doi:http://dx.doi.org/10.17632/7ztzj63r68.1 Licencing provisions: Apache-2.0 Programming language: GUI in MATLAB (The MathWorks) and the core sampling engine in C++ Nature of problem: Sampling of highly diverse multivariate probability distributions in order to solve for macromolecular structures from smFRET data. Solution method: MCMC algorithm with fully adaptive proposal kernel and parallel tempering scheme.
WMAP7 constraints on oscillations in the primordial power spectrum

NASA Astrophysics Data System (ADS)

Meerburg, P. Daniel; Wijers, Ralph A. M. J.; van der Schaar, Jan Pieter

2012-03-01

We use the 7-year Wilkinson Microwave Anisotropy Probe (WMAP7) data to place constraints on oscillations supplementing an almost scale-invariant primordial power spectrum. Such oscillations are predicted by a variety of models, some of which amount to assuming that there is some non-trivial choice of the vacuum state at the onset of inflation. In this paper, we will explore data-driven constraints on two distinct models of initial state modifications. In both models, the frequency, phase and amplitude are degrees of freedom of the theory for which the theoretical bounds are rather weak: both the amplitude and frequency have allowed values ranging over several orders of magnitude. This requires many computationally expensive evaluations of the model cosmic microwave background (CMB) spectra and their goodness of fit, even in a Markov chain Monte Carlo (MCMC), normally the most efficient fitting method for such a problem. To search more efficiently, we first run a densely-spaced grid, with only three varying parameters: the frequency, the amplitude and the baryon density. We obtain the optimal frequency and run an MCMC at the best-fitting frequency, randomly varying all other relevant parameters. To reduce the computational time of each power spectrum computation, we adjust both comoving momentum integration and spline interpolation (in l) as a function of frequency and amplitude of the primordial power spectrum. Applying this to the WMAP7 data allows us to improve existing constraints on the presence of oscillations. We confirm earlier findings that certain frequencies can improve the fitting over a model without oscillations. For those frequencies we compute the posterior probability, allowing us to put some constraints on the primordial parameter space of both models.
Parameter-expanded data augmentation for Bayesian analysis of capture-recapture models

USGS Publications Warehouse

Royle, J. Andrew; Dorazio, Robert M.

2012-01-01

Data augmentation (DA) is a flexible tool for analyzing closed and open population models of capture-recapture data, especially models which include sources of hetereogeneity among individuals. The essential concept underlying DA, as we use the term, is based on adding "observations" to create a dataset composed of a known number of individuals. This new (augmented) dataset, which includes the unknown number of individuals N in the population, is then analyzed using a new model that includes a reformulation of the parameter N in the conventional model of the observed (unaugmented) data. In the context of capture-recapture models, we add a set of "all zero" encounter histories which are not, in practice, observable. The model of the augmented dataset is a zero-inflated version of either a binomial or a multinomial base model. Thus, our use of DA provides a general approach for analyzing both closed and open population models of all types. In doing so, this approach provides a unified framework for the analysis of a huge range of models that are treated as unrelated "black boxes" and named procedures in the classical literature. As a practical matter, analysis of the augmented dataset by MCMC is greatly simplified compared to other methods that require specialized algorithms. For example, complex capture-recapture models of an augmented dataset can be fitted with popular MCMC software packages (WinBUGS or JAGS) by providing a concise statement of the model's assumptions that usually involves only a few lines of pseudocode. In this paper, we review the basic technical concepts of data augmentation, and we provide examples of analyses of closed-population models (M 0, M h , distance sampling, and spatial capture-recapture models) and open-population models (Jolly-Seber) with individual effects.
A COMBINED SPECTROSCOPIC AND PHOTOMETRIC STELLAR ACTIVITY STUDY OF EPSILON ERIDANI

DOE Office of Scientific and Technical Information (OSTI.GOV)

Giguere, Matthew J.; Fischer, Debra A.; Zhang, Cyril X. Y.

2016-06-20

We present simultaneous ground-based radial velocity (RV) measurements and space-based photometric measurements of the young and active K dwarf Epsilon Eridani. These measurements provide a data set for exploring methods of identifying and ultimately distinguishing stellar photospheric velocities from Keplerian motion. We compare three methods we have used in exploring this data set: Dalmatian, an MCMC spot modeling code that fits photometric and RV measurements simultaneously; the FF′ method, which uses photometric measurements to predict the stellar activity signal in simultaneous RV measurements; and H α analysis. We show that our H α measurements are strongly correlated with the Microvariabilitymore » and Oscillations of STars telescope ( MOST ) photometry, which led to a promising new method based solely on the spectroscopic observations. This new method, which we refer to as the HH′ method, uses H α measurements as input into the FF′ model. While the Dalmatian spot modeling analysis and the FF′ method with MOST space-based photometry are currently more robust, the HH′ method only makes use of one of the thousands of stellar lines in the visible spectrum. By leveraging additional spectral activity indicators, we believe the HH′ method may prove quite useful in disentangling stellar signals.« less
xspec_emcee: XSPEC-friendly interface for the emcee package

NASA Astrophysics Data System (ADS)

Sanders, Jeremy

2018-05-01

XSPEC_EMCEE is an XSPEC-friendly interface for emcee (ascl:1303.002). It carries out MCMC analyses of X-ray spectra in the X-ray spectral fitting program XSPEC (ascl:9910.005). It can run multiple xspec processes simultaneously, speeding up the analysis, and can switch to parameterizing norm parameters in log space.
Bayesian Estimation of the Logistic Positive Exponent IRT Model

ERIC Educational Resources Information Center

Bolfarine, Heleno; Bazan, Jorge Luis

2010-01-01

A Bayesian inference approach using Markov Chain Monte Carlo (MCMC) is developed for the logistic positive exponent (LPE) model proposed by Samejima and for a new skewed Logistic Item Response Theory (IRT) model, named Reflection LPE model. Both models lead to asymmetric item characteristic curves (ICC) and can be appropriate because a symmetric…
Evaluation of the Multiple Careers Magnet and Assessment Centers at William B. Carrell, 1978-79.

ERIC Educational Resources Information Center

Maples, Wayne; And Others

The report evaluates Texas' Multiple Careers Magnet Center (MCMC), a part time program to provide special education secondary students with career training. It is explained that students enter one of six career education clusters: furniture repair and upholstery, general construction trades, building and grounds maintenance, laundry and dry…
Explicitly integrating parameter, input, and structure uncertainties into Bayesian Neural Networks for probabilistic hydrologic forecasting

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Xuesong; Liang, Faming; Yu, Beibei

2011-11-09

Estimating uncertainty of hydrologic forecasting is valuable to water resources and other relevant decision making processes. Recently, Bayesian Neural Networks (BNNs) have been proved powerful tools for quantifying uncertainty of streamflow forecasting. In this study, we propose a Markov Chain Monte Carlo (MCMC) framework to incorporate the uncertainties associated with input, model structure, and parameter into BNNs. This framework allows the structure of the neural networks to change by removing or adding connections between neurons and enables scaling of input data by using rainfall multipliers. The results show that the new BNNs outperform the BNNs that only consider uncertainties associatedmore » with parameter and model structure. Critical evaluation of posterior distribution of neural network weights, number of effective connections, rainfall multipliers, and hyper-parameters show that the assumptions held in our BNNs are not well supported. Further understanding of characteristics of different uncertainty sources and including output error into the MCMC framework are expected to enhance the application of neural networks for uncertainty analysis of hydrologic forecasting.« less
Bridging the gap between uncertainty analysis for complex watershed models and decision-making for watershed-scale water management

NASA Astrophysics Data System (ADS)

Zheng, Y.; Han, F.; Wu, B.

2013-12-01

Process-based, spatially distributed and dynamic models provide desirable resolutions to watershed-scale water management. However, their reliability in solving real management problems has been seriously questioned, since the model simulation usually involves significant uncertainty with complicated origins. Uncertainty analysis (UA) for complex hydrological models has been a hot topic in the past decade, and a variety of UA approaches have been developed, but mostly in a theoretical setting. Whether and how a UA could benefit real management decisions remains to be critical questions. We have conducted a series of studies to investigate the applicability of classic approaches, such as GLUE and Markov Chain Monte Carlo (MCMC) methods, in real management settings, unravel the difficulties encountered by such methods, and tailor the methods to better serve the management. Frameworks and new algorithms, such as Probabilistic Collocation Method (PCM)-based approaches, were also proposed for specific management issues. This presentation summarize our past and ongoing studies on the role of UA in real water management. Challenges and potential strategies to bridge the gap between UA for complex models and decision-making for management will be discussed. Future directions for the research in this field will also be suggested. Two common water management settings were examined. One is the Total Maximum Daily Loads (TMDLs) management for surface water quality protection. The other is integrated water resources management for watershed sustainability. For the first setting, nutrients and pesticides TMDLs in the Newport Bay Watershed (Orange Country, California, USA) were discussed. It is a highly urbanized region with a semi-arid Mediterranean climate, typical of the western U.S. For the second setting, the water resources management in the Zhangye Basin (the midstream part of Heihe Baisn, China), where the famous 'Silk Road' came through, was investigated. The Zhangye Basin has a Gobi-oasis system typical of the western China, with extensive agriculture in its oasis.
Variational methods to estimate terrestrial ecosystem model parameters

NASA Astrophysics Data System (ADS)

Delahaies, Sylvain; Roulstone, Ian

2016-04-01

Carbon is at the basis of the chemistry of life. Its ubiquity in the Earth system is the result of complex recycling processes. Present in the atmosphere in the form of carbon dioxide it is adsorbed by marine and terrestrial ecosystems and stored within living biomass and decaying organic matter. Then soil chemistry and a non negligible amount of time transform the dead matter into fossil fuels. Throughout this cycle, carbon dioxide is released in the atmosphere through respiration and combustion of fossils fuels. Model-data fusion techniques allow us to combine our understanding of these complex processes with an ever-growing amount of observational data to help improving models and predictions. The data assimilation linked ecosystem carbon (DALEC) model is a simple box model simulating the carbon budget allocation for terrestrial ecosystems. Over the last decade several studies have demonstrated the relative merit of various inverse modelling strategies (MCMC, ENKF, 4DVAR) to estimate model parameters and initial carbon stocks for DALEC and to quantify the uncertainty in the predictions. Despite its simplicity, DALEC represents the basic processes at the heart of more sophisticated models of the carbon cycle. Using adjoint based methods we study inverse problems for DALEC with various data streams (8 days MODIS LAI, monthly MODIS LAI, NEE). The framework of constraint optimization allows us to incorporate ecological common sense into the variational framework. We use resolution matrices to study the nature of the inverse problems and to obtain data importance and information content for the different type of data. We study how varying the time step affect the solutions, and we show how "spin up" naturally improves the conditioning of the inverse problems.
A menu-driven software package of Bayesian nonparametric (and parametric) mixed models for regression analysis and density estimation.

PubMed

Karabatsos, George

2017-02-01

Most of applied statistics involves regression analysis of data. In practice, it is important to specify a regression model that has minimal assumptions which are not violated by data, to ensure that statistical inferences from the model are informative and not misleading. This paper presents a stand-alone and menu-driven software package, Bayesian Regression: Nonparametric and Parametric Models, constructed from MATLAB Compiler. Currently, this package gives the user a choice from 83 Bayesian models for data analysis. They include 47 Bayesian nonparametric (BNP) infinite-mixture regression models; 5 BNP infinite-mixture models for density estimation; and 31 normal random effects models (HLMs), including normal linear models. Each of the 78 regression models handles either a continuous, binary, or ordinal dependent variable, and can handle multi-level (grouped) data. All 83 Bayesian models can handle the analysis of weighted observations (e.g., for meta-analysis), and the analysis of left-censored, right-censored, and/or interval-censored data. Each BNP infinite-mixture model has a mixture distribution assigned one of various BNP prior distributions, including priors defined by either the Dirichlet process, Pitman-Yor process (including the normalized stable process), beta (two-parameter) process, normalized inverse-Gaussian process, geometric weights prior, dependent Dirichlet process, or the dependent infinite-probits prior. The software user can mouse-click to select a Bayesian model and perform data analysis via Markov chain Monte Carlo (MCMC) sampling. After the sampling completes, the software automatically opens text output that reports MCMC-based estimates of the model's posterior distribution and model predictive fit to the data. Additional text and/or graphical output can be generated by mouse-clicking other menu options. This includes output of MCMC convergence analyses, and estimates of the model's posterior predictive distribution, for selected functionals and values of covariates. The software is illustrated through the BNP regression analysis of real data.
Bayesian structured additive regression modeling of epidemic data: application to cholera

PubMed Central

2012-01-01

Background A significant interest in spatial epidemiology lies in identifying associated risk factors which enhances the risk of infection. Most studies, however, make no, or limited use of the spatial structure of the data, as well as possible nonlinear effects of the risk factors. Methods We develop a Bayesian Structured Additive Regression model for cholera epidemic data. Model estimation and inference is based on fully Bayesian approach via Markov Chain Monte Carlo (MCMC) simulations. The model is applied to cholera epidemic data in the Kumasi Metropolis, Ghana. Proximity to refuse dumps, density of refuse dumps, and proximity to potential cholera reservoirs were modeled as continuous functions; presence of slum settlers and population density were modeled as fixed effects, whereas spatial references to the communities were modeled as structured and unstructured spatial effects. Results We observe that the risk of cholera is associated with slum settlements and high population density. The risk of cholera is equal and lower for communities with fewer refuse dumps, but variable and higher for communities with more refuse dumps. The risk is also lower for communities distant from refuse dumps and potential cholera reservoirs. The results also indicate distinct spatial variation in the risk of cholera infection. Conclusion The study highlights the usefulness of Bayesian semi-parametric regression model analyzing public health data. These findings could serve as novel information to help health planners and policy makers in making effective decisions to control or prevent cholera epidemics. PMID:22866662
Detecting consistent patterns of directional adaptation using differential selection codon models.

PubMed

Parto, Sahar; Lartillot, Nicolas

2017-06-23

Phylogenetic codon models are often used to characterize the selective regimes acting on protein-coding sequences. Recent methodological developments have led to models explicitly accounting for the interplay between mutation and selection, by modeling the amino acid fitness landscape along the sequence. However, thus far, most of these models have assumed that the fitness landscape is constant over time. Fluctuations of the fitness landscape may often be random or depend on complex and unknown factors. However, some organisms may be subject to systematic changes in selective pressure, resulting in reproducible molecular adaptations across independent lineages subject to similar conditions. Here, we introduce a codon-based differential selection model, which aims to detect and quantify the fine-grained consistent patterns of adaptation at the protein-coding level, as a function of external conditions experienced by the organism under investigation. The model parameterizes the global mutational pressure, as well as the site- and condition-specific amino acid selective preferences. This phylogenetic model is implemented in a Bayesian MCMC framework. After validation with simulations, we applied our method to a dataset of HIV sequences from patients with known HLA genetic background. Our differential selection model detects and characterizes differentially selected coding positions specifically associated with two different HLA alleles. Our differential selection model is able to identify consistent molecular adaptations as a function of repeated changes in the environment of the organism. These models can be applied to many other problems, ranging from viral adaptation to evolution of life-history strategies in plants or animals.
Exact posterior computation in non-conjugate Gaussian location-scale parameters models

NASA Astrophysics Data System (ADS)

Andrade, J. A. A.; Rathie, P. N.

2017-12-01

In Bayesian analysis the class of conjugate models allows to obtain exact posterior distributions, however this class quite restrictive in the sense that it involves only a few distributions. In fact, most of the practical applications involves non-conjugate models, thus approximate methods, such as the MCMC algorithms, are required. Although these methods can deal with quite complex structures, some practical problems can make their applications quite time demanding, for example, when we use heavy-tailed distributions, convergence may be difficult, also the Metropolis-Hastings algorithm can become very slow, in addition to the extra work inevitably required on choosing efficient candidate generator distributions. In this work, we draw attention to the special functions as a tools for Bayesian computation, we propose an alternative method for obtaining the posterior distribution in Gaussian non-conjugate models in an exact form. We use complex integration methods based on the H-function in order to obtain the posterior distribution and some of its posterior quantities in an explicit computable form. Two examples are provided in order to illustrate the theory.
A semiparametric Bayesian proportional hazards model for interval censored data with frailty effects.

PubMed

Henschel, Volkmar; Engel, Jutta; Hölzel, Dieter; Mansmann, Ulrich

2009-02-10

Multivariate analysis of interval censored event data based on classical likelihood methods is notoriously cumbersome. Likelihood inference for models which additionally include random effects are not available at all. Developed algorithms bear problems for practical users like: matrix inversion, slow convergence, no assessment of statistical uncertainty. MCMC procedures combined with imputation are used to implement hierarchical models for interval censored data within a Bayesian framework. Two examples from clinical practice demonstrate the handling of clustered interval censored event times as well as multilayer random effects for inter-institutional quality assessment. The software developed is called survBayes and is freely available at CRAN. The proposed software supports the solution of complex analyses in many fields of clinical epidemiology as well as health services research.

The Joker: A custom Monte Carlo sampler for binary-star and exoplanet radial velocity data

NASA Astrophysics Data System (ADS)

Price-Whelan, Adrian M.; Hogg, David W.; Foreman-Mackey, Daniel; Rix, Hans-Walter

2017-01-01

Given sparse or low-quality radial-velocity measurements of a star, there are often many qualitatively different stellar or exoplanet companion orbit models that are consistent with the data. The consequent multimodality of the likelihood function leads to extremely challenging search, optimization, and MCMC posterior sampling over the orbital parameters. The Joker is a custom-built Monte Carlo sampler that can produce a posterior sampling for orbital parameters given sparse or noisy radial-velocity measurements, even when the likelihood function is poorly behaved. The method produces correct samplings in orbital parameters for data that include as few as three epochs. The Joker can therefore be used to produce proper samplings of multimodal pdfs, which are still highly informative and can be used in hierarchical (population) modeling.
A new approach for handling longitudinal count data with zero-inflation and overdispersion: poisson geometric process model.

PubMed

Wan, Wai-Yin; Chan, Jennifer S K

2009-08-01

For time series of count data, correlated measurements, clustering as well as excessive zeros occur simultaneously in biomedical applications. Ignoring such effects might contribute to misleading treatment outcomes. A generalized mixture Poisson geometric process (GMPGP) model and a zero-altered mixture Poisson geometric process (ZMPGP) model are developed from the geometric process model, which was originally developed for modelling positive continuous data and was extended to handle count data. These models are motivated by evaluating the trend development of new tumour counts for bladder cancer patients as well as by identifying useful covariates which affect the count level. The models are implemented using Bayesian method with Markov chain Monte Carlo (MCMC) algorithms and are assessed using deviance information criterion (DIC).
Building an ACT-R Reader for Eye-Tracking Corpus Data.

PubMed

Dotlačil, Jakub

2018-01-01

Cognitive architectures have often been applied to data from individual experiments. In this paper, I develop an ACT-R reader that can model a much larger set of data, eye-tracking corpus data. It is shown that the resulting model has a good fit to the data for the considered low-level processes. Unlike previous related works (most prominently, Engelmann, Vasishth, Engbert & Kliegl, ), the model achieves the fit by estimating free parameters of ACT-R using Bayesian estimation and Markov-Chain Monte Carlo (MCMC) techniques, rather than by relying on the mix of manual selection + default values. The method used in the paper is generalizable beyond this particular model and data set and could be used on other ACT-R models. Copyright © 2017 Cognitive Science Society, Inc.
Directional data analysis under the general projected normal distribution

PubMed Central

Wang, Fangpo; Gelfand, Alan E.

2013-01-01

The projected normal distribution is an under-utilized model for explaining directional data. In particular, the general version provides flexibility, e.g., asymmetry and possible bimodality along with convenient regression specification. Here, we clarify the properties of this general class. We also develop fully Bayesian hierarchical models for analyzing circular data using this class. We show how they can be fit using MCMC methods with suitable latent variables. We show how posterior inference for distributional features such as the angular mean direction and concentration can be implemented as well as how prediction within the regression setting can be handled. With regard to model comparison, we argue for an out-of-sample approach using both a predictive likelihood scoring loss criterion and a cumulative rank probability score criterion. PMID:24046539
MCMC Sampling for a Multilevel Model with Nonindependent Residuals within and between Cluster Units

ERIC Educational Resources Information Center

Browne, William; Goldstein, Harvey

2010-01-01

In this article, we discuss the effect of removing the independence assumptions between the residuals in two-level random effect models. We first consider removing the independence between the Level 2 residuals and instead assume that the vector of all residuals at the cluster level follows a general multivariate normal distribution. We…
A Markov Chain Monte Carlo Approach to Confirmatory Item Factor Analysis

ERIC Educational Resources Information Center

Edwards, Michael C.

2010-01-01

Item factor analysis has a rich tradition in both the structural equation modeling and item response theory frameworks. The goal of this paper is to demonstrate a novel combination of various Markov chain Monte Carlo (MCMC) estimation routines to estimate parameters of a wide variety of confirmatory item factor analysis models. Further, I show…
Exploring nonlinear feature space dimension reduction and data representation in breast Cadx with Laplacian eigenmaps and t-SNE.

PubMed

Jamieson, Andrew R; Giger, Maryellen L; Drukker, Karen; Li, Hui; Yuan, Yading; Bhooshan, Neha

2010-01-01

In this preliminary study, recently developed unsupervised nonlinear dimension reduction (DR) and data representation techniques were applied to computer-extracted breast lesion feature spaces across three separate imaging modalities: Ultrasound (U.S.) with 1126 cases, dynamic contrast enhanced magnetic resonance imaging with 356 cases, and full-field digital mammography with 245 cases. Two methods for nonlinear DR were explored: Laplacian eigenmaps [M. Belkin and P. Niyogi, "Laplacian eigenmaps for dimensionality reduction and data representation," Neural Comput. 15, 1373-1396 (2003)] and t-distributed stochastic neighbor embedding (t-SNE) [L. van der Maaten and G. Hinton, "Visualizing data using t-SNE," J. Mach. Learn. Res. 9, 2579-2605 (2008)]. These methods attempt to map originally high dimensional feature spaces to more human interpretable lower dimensional spaces while preserving both local and global information. The properties of these methods as applied to breast computer-aided diagnosis (CADx) were evaluated in the context of malignancy classification performance as well as in the visual inspection of the sparseness within the two-dimensional and three-dimensional mappings. Classification performance was estimated by using the reduced dimension mapped feature output as input into both linear and nonlinear classifiers: Markov chain Monte Carlo based Bayesian artificial neural network (MCMC-BANN) and linear discriminant analysis. The new techniques were compared to previously developed breast CADx methodologies, including automatic relevance determination and linear stepwise (LSW) feature selection, as well as a linear DR method based on principal component analysis. Using ROC analysis and 0.632+bootstrap validation, 95% empirical confidence intervals were computed for the each classifier's AUC performance. In the large U.S. data set, sample high performance results include, AUC0.632+ = 0.88 with 95% empirical bootstrap interval [0.787;0.895] for 13 ARD selected features and AUC0.632+ = 0.87 with interval [0.817;0.906] for four LSW selected features compared to 4D t-SNE mapping (from the original 81D feature space) giving AUC0.632+ = 0.90 with interval [0.847;0.919], all using the MCMC-BANN. Preliminary results appear to indicate capability for the new methods to match or exceed classification performance of current advanced breast lesion CADx algorithms. While not appropriate as a complete replacement of feature selection in CADx problems, DR techniques offer a complementary approach, which can aid elucidation of additional properties associated with the data. Specifically, the new techniques were shown to possess the added benefit of delivering sparse lower dimensional representations for visual interpretation, revealing intricate data structure of the feature space.
Snow Water Equivalent Retrieval By Markov Chain Monte Carlo Based on Memls and Hut Snow Emission Model

NASA Astrophysics Data System (ADS)

Pan, J.; Durand, M. T.; Vanderjagt, B. J.

2014-12-01

The Markov chain Monte Carlo (MCMC) method had been proved to be successful in snow water equivalent retrieval based on synthetic point-scale passive microwave brightness temperature (TB) observations. This method needs only general prior information about distribution of snow parameters, and could estimate layered snow properties, including the thickness, temperature, density and snow grain size (or exponential correlation length) of each layer. In this study, the multi-layer HUT (Helsinki University of Technology) model and the MEMLS (Microwave Emission Model of Layered Snowpacks) will be used as observation models to assimilate the observed TB into snow parameter prediction. Previous studies had shown that the multi-layer HUT model tends to underestimate TB at 37 GHz for deep snow, while the MEMLS does not show sensitivity of model bias to snow depth. Therefore, results using HUT model and MEMLS will be compared to see how the observation model will influence the retrieval of snow parameters. The radiometric measurements at 10.65, 18.7, 36.5 and 90 GHz at Sodankyla, Finland will be used as MCMC input, and the statistics of all snow property measurement will be used to calculate the prior information. 43 dry snowpits with complete measurements of all snow parameters will be used for validation. The entire dataset are from NorSREx (Nordic Snow Radar Experiment) experiments carried out by Juha Lemmetyinen, Anna Kontu and Jouni Pulliainen in FMI in 2009-2011 winters, and continued two more winters from 2011 to Spring of 2013. Besides the snow thickness and snow density that are directly related to snow water equivalent, other parameters will be compared with observations, too. For thin snow, the previous studies showed that influence of underlying soil is considerable, especially when the soil is half frozen with part of unfrozen liquid water and part of ice. Therefore, this study will also try to employ a simple frozen soil permittivity model to improve the performance of retrieval. The behavior of the Markov chain in soil parameters will be studied.
Cross-Border Sexual Transmission of the Newly Emerging HIV-1 Clade CRF51_01B

PubMed Central

Cheong, Hui Ting; Ng, Kim Tien; Ong, Lai Yee; Chook, Jack Bee; Chan, Kok Gan; Takebe, Yutaka; Kamarulzaman, Adeeba; Tee, Kok Keng

2014-01-01

A novel HIV-1 recombinant clade (CRF51_01B) was recently identified among men who have sex with men (MSM) in Singapore. As cases of sexually transmitted HIV-1 infection increase concurrently in two socioeconomically intimate countries such as Malaysia and Singapore, cross transmission of HIV-1 between said countries is highly probable. In order to investigate the timeline for the emergence of HIV-1 CRF51_01B in Singapore and its possible introduction into Malaysia, 595 HIV-positive subjects recruited in Kuala Lumpur from 2008 to 2012 were screened. Phylogenetic relationship of 485 amplified polymerase gene sequences was determined through neighbour-joining method. Next, near-full length sequences were amplified for genomic sequences inferred to be CRF51_01B and subjected to further analysis implemented through Bayesian Markov chain Monte Carlo (MCMC) sampling and maximum likelihood methods. Based on the near full length genomes, two isolates formed a phylogenetic cluster with CRF51_01B sequences of Singapore origin, sharing identical recombination structure. Spatial and temporal information from Bayesian MCMC coalescent and maximum likelihood analysis of the protease, gp120 and gp41 genes suggest that Singapore is probably the country of origin of CRF51_01B (as early as in the mid-1990s) and featured a Malaysian who acquired the infection through heterosexual contact as host for its ancestral lineages. CRF51_01B then spread rapidly among the MSM in Singapore and Malaysia. Although the importation of CRF51_01B from Singapore to Malaysia is supported by coalescence analysis, the narrow timeframe of the transmission event indicates a closely linked epidemic. Discrepancies in the estimated divergence times suggest that CRF51_01B may have arisen through multiple recombination events from more than one parental lineage. We report the cross transmission of a novel CRF51_01B lineage between countries that involved different sexual risk groups. Understanding the cross-border transmission of HIV-1 involving sexual networks is crucial for effective intervention strategies in the region. PMID:25340817
Seismic imaging of Q structures by a trans-dimensional coda-wave analysis

NASA Astrophysics Data System (ADS)

Takahashi, Tsutomu

2017-04-01

Wave scattering and intrinsic attenuation are important processes to describe incoherent and complex wave trains of high frequency seismic wave (>1Hz). The multiple lapse time window analysis (MLTWA) has been used to estimate scattering and intrinsic Q values by assuming constant Q in a study area (e.g., Hoshiba 1993). This study generalizes this MLTWA to estimate lateral variations of Q values under the Bayesian framework in dimension variable space. Study area is partitioned into small areas by means of the Voronoi tessellation. Scattering and intrinsic Q in each small area are constant. We define a misfit function for spatiotemporal variations of wave energy as with the original MLTWA, and maximize the posterior probability with changing not only Q values but the number and spatial layout of the Voronoi cells. This maximization is conducted by means of the reversible jump Markov chain Monte Carlo (rjMCMC) (Green 1995) since the number of unknown parameters (i.e., dimension of posterior probability) is variable. After a convergence to the maximum posterior, we estimate Q structures from the ensemble averages of MCMC samples around the maximum posterior probability. Synthetic tests showed stable reconstructions of input structures with reasonable error distributions. We applied this method for seismic waveform data recorded by ocean bottom seismograms at the outer-rise area off Tohoku, and estimated Q values at 4-8Hz, 8-16Hz and 16-32Hz. Intrinsic Q are nearly constant at all frequency bands, and scattering Q shows two distinct strong scattering regions at petit spot area and high seismicity area. These strong scattering are probably related to magma inclusions and fractured structure, respectively. Difference between these two areas becomes clear at high frequencies. It means that scale dependences of inhomogeneities or smaller scale inhomogeneity is important to discuss medium property and origins of structural variations. While the generalized MLTWA is based on a classical waveform modeling in constant Q medium, this method can be a fundamental basis for Q structure imaging in the crust.
Cross-border sexual transmission of the newly emerging HIV-1 clade CRF51_01B.

PubMed

Cheong, Hui Ting; Ng, Kim Tien; Ong, Lai Yee; Chook, Jack Bee; Chan, Kok Gan; Takebe, Yutaka; Kamarulzaman, Adeeba; Tee, Kok Keng

2014-01-01

A novel HIV-1 recombinant clade (CRF51_01B) was recently identified among men who have sex with men (MSM) in Singapore. As cases of sexually transmitted HIV-1 infection increase concurrently in two socioeconomically intimate countries such as Malaysia and Singapore, cross transmission of HIV-1 between said countries is highly probable. In order to investigate the timeline for the emergence of HIV-1 CRF51_01B in Singapore and its possible introduction into Malaysia, 595 HIV-positive subjects recruited in Kuala Lumpur from 2008 to 2012 were screened. Phylogenetic relationship of 485 amplified polymerase gene sequences was determined through neighbour-joining method. Next, near-full length sequences were amplified for genomic sequences inferred to be CRF51_01B and subjected to further analysis implemented through Bayesian Markov chain Monte Carlo (MCMC) sampling and maximum likelihood methods. Based on the near full length genomes, two isolates formed a phylogenetic cluster with CRF51_01B sequences of Singapore origin, sharing identical recombination structure. Spatial and temporal information from Bayesian MCMC coalescent and maximum likelihood analysis of the protease, gp120 and gp41 genes suggest that Singapore is probably the country of origin of CRF51_01B (as early as in the mid-1990s) and featured a Malaysian who acquired the infection through heterosexual contact as host for its ancestral lineages. CRF51_01B then spread rapidly among the MSM in Singapore and Malaysia. Although the importation of CRF51_01B from Singapore to Malaysia is supported by coalescence analysis, the narrow timeframe of the transmission event indicates a closely linked epidemic. Discrepancies in the estimated divergence times suggest that CRF51_01B may have arisen through multiple recombination events from more than one parental lineage. We report the cross transmission of a novel CRF51_01B lineage between countries that involved different sexual risk groups. Understanding the cross-border transmission of HIV-1 involving sexual networks is crucial for effective intervention strategies in the region.
Bayesian spatiotemporal model of fMRI data using transfer functions.

PubMed

Quirós, Alicia; Diez, Raquel Montes; Wilson, Simon P

2010-09-01

This research describes a new Bayesian spatiotemporal model to analyse BOLD fMRI studies. In the temporal dimension, we describe the shape of the hemodynamic response function (HRF) with a transfer function model. The spatial continuity and local homogeneity of the evoked responses are modelled by a Gaussian Markov random field prior on the parameter indicating activations. The proposal constitutes an extension of the spatiotemporal model presented in a previous approach [Quirós, A., Montes Diez, R. and Gamerman, D., 2010. Bayesian spatiotemporal model of fMRI data, Neuroimage, 49: 442-456], offering more flexibility in the estimation of the HRF and computational advantages in the resulting MCMC algorithm. Simulations from the model are performed in order to ascertain the performance of the sampling scheme and the ability of the posterior to estimate model parameters, as well as to check the model sensitivity to signal to noise ratio. Results are shown on synthetic data and on a real data set from a block-design fMRI experiment. Copyright (c) 2010 Elsevier Inc. All rights reserved.
Simultaneously constraining the astrophysics of reionisation and the epoch of heating with 21CMMC

NASA Astrophysics Data System (ADS)

Greig, Bradley; Mesinger, Andrei

2018-05-01

We extend our MCMC sampler of 3D EoR simulations, 21CMMC, to perform parameter estimation directly on light-cones of the cosmic 21cm signal. This brings theoretical analysis one step closer to matching the expected 21-cm signal from next generation interferometers like HERA and the SKA. Using the light-cone version of 21CMMC, we quantify biases in the recovered astrophysical parameters obtained from the 21cm power spectrum when using the co-eval approximation to fit a mock 3D light-cone observation. While ignoring the light-cone effect does not bias the parameters under most assumptions, it can still underestimate their uncertainties. However, significant biases (~few - 10 σ) are possible if all of the following conditions are met: (i) foreground removal is very efficient, allowing large physical scales (k ~ 0.1 Mpc-1) to be used in the analysis; (ii) theoretical modelling is accurate to ~10 per cent in the power spectrum amplitude; and (iii) the 21cm signal evolves rapidly (i.e. the epochs of reionisation and heating overlap significantly
A Bayesian Poisson-lognormal Model for Count Data for Multiple-Trait Multiple-Environment Genomic-Enabled Prediction.

PubMed

Montesinos-López, Osval A; Montesinos-López, Abelardo; Crossa, José; Toledo, Fernando H; Montesinos-López, José C; Singh, Pawan; Juliana, Philomin; Salinas-Ruiz, Josafhat

2017-05-05

When a plant scientist wishes to make genomic-enabled predictions of multiple traits measured in multiple individuals in multiple environments, the most common strategy for performing the analysis is to use a single trait at a time taking into account genotype × environment interaction (G × E), because there is a lack of comprehensive models that simultaneously take into account the correlated counting traits and G × E. For this reason, in this study we propose a multiple-trait and multiple-environment model for count data. The proposed model was developed under the Bayesian paradigm for which we developed a Markov Chain Monte Carlo (MCMC) with noninformative priors. This allows obtaining all required full conditional distributions of the parameters leading to an exact Gibbs sampler for the posterior distribution. Our model was tested with simulated data and a real data set. Results show that the proposed multi-trait, multi-environment model is an attractive alternative for modeling multiple count traits measured in multiple environments. Copyright © 2017 Montesinos-López et al.
A review and comparison of Bayesian and likelihood-based inferences in beta regression and zero-or-one-inflated beta regression.

PubMed

Liu, Fang; Eugenio, Evercita C

2018-04-01

Beta regression is an increasingly popular statistical technique in medical research for modeling of outcomes that assume values in (0, 1), such as proportions and patient reported outcomes. When outcomes take values in the intervals [0,1), (0,1], or [0,1], zero-or-one-inflated beta (zoib) regression can be used. We provide a thorough review on beta regression and zoib regression in the modeling, inferential, and computational aspects via the likelihood-based and Bayesian approaches. We demonstrate the statistical and practical importance of correctly modeling the inflation at zero/one rather than ad hoc replacing them with values close to zero/one via simulation studies; the latter approach can lead to biased estimates and invalid inferences. We show via simulation studies that the likelihood-based approach is computationally faster in general than MCMC algorithms used in the Bayesian inferences, but runs the risk of non-convergence, large biases, and sensitivity to starting values in the optimization algorithm especially with clustered/correlated data, data with sparse inflation at zero and one, and data that warrant regularization of the likelihood. The disadvantages of the regular likelihood-based approach make the Bayesian approach an attractive alternative in these cases. Software packages and tools for fitting beta and zoib regressions in both the likelihood-based and Bayesian frameworks are also reviewed.
Impacts of uncertainties in weather and streamflow observations in calibration and evaluation of an elevation distributed HBV-model

NASA Astrophysics Data System (ADS)

Engeland, K.; Steinsland, I.; Petersen-Øverleir, A.; Johansen, S.

2012-04-01

The aim of this study is to assess the uncertainties in streamflow simulations when uncertainties in both observed inputs (precipitation and temperature) and streamflow observations used in the calibration of the hydrological model are explicitly accounted for. To achieve this goal we applied the elevation distributed HBV model operating on daily time steps to a small catchment in high elevation in Southern Norway where the seasonal snow cover is important. The uncertainties in precipitation inputs were quantified using conditional simulation. This procedure accounts for the uncertainty related to the density of the precipitation network, but neglects uncertainties related to measurement bias/errors and eventual elevation gradients in precipitation. The uncertainties in temperature inputs were quantified using a Bayesian temperature interpolation procedure where the temperature lapse rate is re-estimated every day. The uncertainty in the lapse rate was accounted for whereas the sampling uncertainty related to network density was neglected. For every day a random sample of precipitation and temperature inputs were drawn to be applied as inputs to the hydrologic model. The uncertainties in observed streamflow were assessed based on the uncertainties in the rating curve model. A Bayesian procedure was applied to estimate the probability for rating curve models with 1 to 3 segments and the uncertainties in their parameters. This method neglects uncertainties related to errors in observed water levels. Note that one rating curve was drawn to make one realisation of a whole time series of streamflow, thus the rating curve errors lead to a systematic bias in the streamflow observations. All these uncertainty sources were linked together in both calibration and evaluation of the hydrologic model using a DREAM based MCMC routine. Effects of having less information (e.g. missing one streamflow measurement for defining the rating curve or missing one precipitation station) was also investigated.
Probability density of spatially distributed soil moisture inferred from crosshole georadar traveltime measurements

NASA Astrophysics Data System (ADS)

Linde, N.; Vrugt, J. A.

2009-04-01

Geophysical models are increasingly used in hydrological simulations and inversions, where they are typically treated as an artificial data source with known uncorrelated "data errors". The model appraisal problem in classical deterministic linear and non-linear inversion approaches based on linearization is often addressed by calculating model resolution and model covariance matrices. These measures offer only a limited potential to assign a more appropriate "data covariance matrix" for future hydrological applications, simply because the regularization operators used to construct a stable inverse solution bear a strong imprint on such estimates and because the non-linearity of the geophysical inverse problem is not explored. We present a parallelized Markov Chain Monte Carlo (MCMC) scheme to efficiently derive the posterior spatially distributed radar slowness and water content between boreholes given first-arrival traveltimes. This method is called DiffeRential Evolution Adaptive Metropolis (DREAM_ZS) with snooker updater and sampling from past states. Our inverse scheme does not impose any smoothness on the final solution, and uses uniform prior ranges of the parameters. The posterior distribution of radar slowness is converted into spatially distributed soil moisture values using a petrophysical relationship. To benchmark the performance of DREAM_ZS, we first apply our inverse method to a synthetic two-dimensional infiltration experiment using 9421 traveltimes contaminated with Gaussian errors and 80 different model parameters, corresponding to a model discretization of 0.3 m × 0.3 m. After this, the method is applied to field data acquired in the vadose zone during snowmelt. This work demonstrates that fully non-linear stochastic inversion can be applied with few limiting assumptions to a range of common two-dimensional tomographic geophysical problems. The main advantage of DREAM_ZS is that it provides a full view of the posterior distribution of spatially distributed soil moisture, which is key to appropriately treat geophysical parameter uncertainty and infer hydrologic models.
ExoSOFT: Exoplanet Simple Orbit Fitting Toolbox

NASA Astrophysics Data System (ADS)

Mede, Kyle; Brandt, Timothy D.

2017-08-01

ExoSOFT provides orbital analysis of exoplanets and binary star systems. It fits any combination of astrometric and radial velocity data, and offers four parameter space exploration techniques, including MCMC. It is packaged with an automated set of post-processing and plotting routines to summarize results, and is suitable for performing orbital analysis during surveys with new radial velocity and direct imaging instruments.
Just Another Gibbs Sampler (JAGS): Flexible Software for MCMC Implementation

ERIC Educational Resources Information Center

Depaoli, Sarah; Clifton, James P.; Cobb, Patrice R.

2016-01-01

A review of the software Just Another Gibbs Sampler (JAGS) is provided. We cover aspects related to history and development and the elements a user needs to know to get started with the program, including (a) definition of the data, (b) definition of the model, (c) compilation of the model, and (d) initialization of the model. An example using a…
Development and Tuning of a 3D Stochastic Inversion Methodology to the European Arctic

DTIC Science & Technology

2010-09-01

from previous studies covering the region, in particular from Breivik et al. (2002). Our MCMC algorithm shown in Figure 3 has two major components...criteria, Geophys. J. Int., 156: 483–496, doi:10.1111/j.1365-246X.2004.570 02070.x. Breivik , A., R. Mjelde, P. Grogan, H. Shimamura, Y. Murai, Y

DOE Office of Scientific and Technical Information (OSTI.GOV)

Vrugt, Jasper A; Robinson, Bruce A; Ter Braak, Cajo J F

In recent years, a strong debate has emerged in the hydrologic literature regarding what constitutes an appropriate framework for uncertainty estimation. Particularly, there is strong disagreement whether an uncertainty framework should have its roots within a proper statistical (Bayesian) context, or whether such a framework should be based on a different philosophy and implement informal measures and weaker inference to summarize parameter and predictive distributions. In this paper, we compare a formal Bayesian approach using Markov Chain Monte Carlo (MCMC) with generalized likelihood uncertainty estimation (GLUE) for assessing uncertainty in conceptual watershed modeling. Our formal Bayesian approach is implemented usingmore » the recently developed differential evolution adaptive metropolis (DREAM) MCMC scheme with a likelihood function that explicitly considers model structural, input and parameter uncertainty. Our results demonstrate that DREAM and GLUE can generate very similar estimates of total streamflow uncertainty. This suggests that formal and informal Bayesian approaches have more common ground than the hydrologic literature and ongoing debate might suggest. The main advantage of formal approaches is, however, that they attempt to disentangle the effect of forcing, parameter and model structural error on total predictive uncertainty. This is key to improving hydrologic theory and to better understand and predict the flow of water through catchments.« less
On the Bayesian Treed Multivariate Gaussian Process with Linear Model of Coregionalization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Konomi, Bledar A.; Karagiannis, Georgios; Lin, Guang

2015-02-01

The Bayesian treed Gaussian process (BTGP) has gained popularity in recent years because it provides a straightforward mechanism for modeling non-stationary data and can alleviate computational demands by fitting models to less data. The extension of BTGP to the multivariate setting requires us to model the cross-covariance and to propose efficient algorithms that can deal with trans-dimensional MCMC moves. In this paper we extend the cross-covariance of the Bayesian treed multivariate Gaussian process (BTMGP) to that of linear model of Coregionalization (LMC) cross-covariances. Different strategies have been developed to improve the MCMC mixing and invert smaller matrices in the Bayesianmore » inference. Moreover, we compare the proposed BTMGP with existing multiple BTGP and BTMGP in test cases and multiphase flow computer experiment in a full scale regenerator of a carbon capture unit. The use of the BTMGP with LMC cross-covariance helped to predict the computer experiments relatively better than existing competitors. The proposed model has a wide variety of applications, such as computer experiments and environmental data. In the case of computer experiments we also develop an adaptive sampling strategy for the BTMGP with LMC cross-covariance function.« less
Dependence of paracentric inversion rate on tract length.

PubMed

York, Thomas L; Durrett, Rick; Nielsen, Rasmus

2007-04-03

We develop a Bayesian method based on MCMC for estimating the relative rates of pericentric and paracentric inversions from marker data from two species. The method also allows estimation of the distribution of inversion tract lengths. We apply the method to data from Drosophila melanogaster and D. yakuba. We find that pericentric inversions occur at a much lower rate compared to paracentric inversions. The average paracentric inversion tract length is approx. 4.8 Mb with small inversions being more frequent than large inversions. If the two breakpoints defining a paracentric inversion tract are uniformly and independently distributed over chromosome arms there will be more short tract-length inversions than long; we find an even greater preponderance of short tract lengths than this would predict. Thus there appears to be a correlation between the positions of breakpoints which favors shorter tract lengths. The method developed in this paper provides the first statistical estimator for estimating the distribution of inversion tract lengths from marker data. Application of this method for a number of data sets may help elucidate the relationship between the length of an inversion and the chance that it will get accepted.
Dependence of paracentric inversion rate on tract length

PubMed Central

York, Thomas L; Durrett, Rick; Nielsen, Rasmus

2007-01-01

Background We develop a Bayesian method based on MCMC for estimating the relative rates of pericentric and paracentric inversions from marker data from two species. The method also allows estimation of the distribution of inversion tract lengths. Results We apply the method to data from Drosophila melanogaster and D. yakuba. We find that pericentric inversions occur at a much lower rate compared to paracentric inversions. The average paracentric inversion tract length is approx. 4.8 Mb with small inversions being more frequent than large inversions. If the two breakpoints defining a paracentric inversion tract are uniformly and independently distributed over chromosome arms there will be more short tract-length inversions than long; we find an even greater preponderance of short tract lengths than this would predict. Thus there appears to be a correlation between the positions of breakpoints which favors shorter tract lengths. Conclusion The method developed in this paper provides the first statistical estimator for estimating the distribution of inversion tract lengths from marker data. Application of this method for a number of data sets may help elucidate the relationship between the length of an inversion and the chance that it will get accepted. PMID:17407601
Quantum Enhanced Inference in Markov Logic Networks

NASA Astrophysics Data System (ADS)

Wittek, Peter; Gogolin, Christian

2017-04-01

Markov logic networks (MLNs) reconcile two opposing schools in machine learning and artificial intelligence: causal networks, which account for uncertainty extremely well, and first-order logic, which allows for formal deduction. An MLN is essentially a first-order logic template to generate Markov networks. Inference in MLNs is probabilistic and it is often performed by approximate methods such as Markov chain Monte Carlo (MCMC) Gibbs sampling. An MLN has many regular, symmetric structures that can be exploited at both first-order level and in the generated Markov network. We analyze the graph structures that are produced by various lifting methods and investigate the extent to which quantum protocols can be used to speed up Gibbs sampling with state preparation and measurement schemes. We review different such approaches, discuss their advantages, theoretical limitations, and their appeal to implementations. We find that a straightforward application of a recent result yields exponential speedup compared to classical heuristics in approximate probabilistic inference, thereby demonstrating another example where advanced quantum resources can potentially prove useful in machine learning.
PyDREAM: high-dimensional parameter inference for biological models in python.

PubMed

Shockley, Erin M; Vrugt, Jasper A; Lopez, Carlos F; Valencia, Alfonso

2018-02-15

Biological models contain many parameters whose values are difficult to measure directly via experimentation and therefore require calibration against experimental data. Markov chain Monte Carlo (MCMC) methods are suitable to estimate multivariate posterior model parameter distributions, but these methods may exhibit slow or premature convergence in high-dimensional search spaces. Here, we present PyDREAM, a Python implementation of the (Multiple-Try) Differential Evolution Adaptive Metropolis [DREAM(ZS)] algorithm developed by Vrugt and ter Braak (2008) and Laloy and Vrugt (2012). PyDREAM achieves excellent performance for complex, parameter-rich models and takes full advantage of distributed computing resources, facilitating parameter inference and uncertainty estimation of CPU-intensive biological models. PyDREAM is freely available under the GNU GPLv3 license from the Lopez lab GitHub repository at http://github.com/LoLab-VU/PyDREAM. c.lopez@vanderbilt.edu. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
Multivariate Copula Analysis Toolbox (MvCAT): Describing dependence and underlying uncertainty using a Bayesian framework

NASA Astrophysics Data System (ADS)

Sadegh, Mojtaba; Ragno, Elisa; AghaKouchak, Amir

2017-06-01

We present a newly developed Multivariate Copula Analysis Toolbox (MvCAT) which includes a wide range of copula families with different levels of complexity. MvCAT employs a Bayesian framework with a residual-based Gaussian likelihood function for inferring copula parameters and estimating the underlying uncertainties. The contribution of this paper is threefold: (a) providing a Bayesian framework to approximate the predictive uncertainties of fitted copulas, (b) introducing a hybrid-evolution Markov Chain Monte Carlo (MCMC) approach designed for numerical estimation of the posterior distribution of copula parameters, and (c) enabling the community to explore a wide range of copulas and evaluate them relative to the fitting uncertainties. We show that the commonly used local optimization methods for copula parameter estimation often get trapped in local minima. The proposed method, however, addresses this limitation and improves describing the dependence structure. MvCAT also enables evaluation of uncertainties relative to the length of record, which is fundamental to a wide range of applications such as multivariate frequency analysis.
Quantum Enhanced Inference in Markov Logic Networks.

PubMed

Wittek, Peter; Gogolin, Christian

2017-04-19

Markov logic networks (MLNs) reconcile two opposing schools in machine learning and artificial intelligence: causal networks, which account for uncertainty extremely well, and first-order logic, which allows for formal deduction. An MLN is essentially a first-order logic template to generate Markov networks. Inference in MLNs is probabilistic and it is often performed by approximate methods such as Markov chain Monte Carlo (MCMC) Gibbs sampling. An MLN has many regular, symmetric structures that can be exploited at both first-order level and in the generated Markov network. We analyze the graph structures that are produced by various lifting methods and investigate the extent to which quantum protocols can be used to speed up Gibbs sampling with state preparation and measurement schemes. We review different such approaches, discuss their advantages, theoretical limitations, and their appeal to implementations. We find that a straightforward application of a recent result yields exponential speedup compared to classical heuristics in approximate probabilistic inference, thereby demonstrating another example where advanced quantum resources can potentially prove useful in machine learning.
Quantum Enhanced Inference in Markov Logic Networks

PubMed Central

Wittek, Peter; Gogolin, Christian

2017-01-01

Markov logic networks (MLNs) reconcile two opposing schools in machine learning and artificial intelligence: causal networks, which account for uncertainty extremely well, and first-order logic, which allows for formal deduction. An MLN is essentially a first-order logic template to generate Markov networks. Inference in MLNs is probabilistic and it is often performed by approximate methods such as Markov chain Monte Carlo (MCMC) Gibbs sampling. An MLN has many regular, symmetric structures that can be exploited at both first-order level and in the generated Markov network. We analyze the graph structures that are produced by various lifting methods and investigate the extent to which quantum protocols can be used to speed up Gibbs sampling with state preparation and measurement schemes. We review different such approaches, discuss their advantages, theoretical limitations, and their appeal to implementations. We find that a straightforward application of a recent result yields exponential speedup compared to classical heuristics in approximate probabilistic inference, thereby demonstrating another example where advanced quantum resources can potentially prove useful in machine learning. PMID:28422093
On The Value at Risk Using Bayesian Mixture Laplace Autoregressive Approach for Modelling the Islamic Stock Risk Investment

NASA Astrophysics Data System (ADS)

Miftahurrohmah, Brina; Iriawan, Nur; Fithriasari, Kartika

2017-06-01

Stocks are known as the financial instruments traded in the capital market which have a high level of risk. Their risks are indicated by their uncertainty of their return which have to be accepted by investors in the future. The higher the risk to be faced, the higher the return would be gained. Therefore, the measurements need to be made against the risk. Value at Risk (VaR) as the most popular risk measurement method, is frequently ignore when the pattern of return is not uni-modal Normal. The calculation of the risks using VaR method with the Normal Mixture Autoregressive (MNAR) approach has been considered. This paper proposes VaR method couple with the Mixture Laplace Autoregressive (MLAR) that would be implemented for analysing the first three biggest capitalization Islamic stock return in JII, namely PT. Astra International Tbk (ASII), PT. Telekomunikasi Indonesia Tbk (TLMK), and PT. Unilever Indonesia Tbk (UNVR). Parameter estimation is performed by employing Bayesian Markov Chain Monte Carlo (MCMC) approaches.
Arbitrage and Volatility in Chinese Stock's Markets

NASA Astrophysics Data System (ADS)

Lu, Shu Quan; Ito, Takao; Zhang, Jianbo

From the point of view of no-arbitrage pricing, what matters is how much volatility the stock has, for volatility measures the amount of profit that can be made from shorting stocks and purchasing options. With the short-sales constraints or in the absence of options, however, high volatility is likely to mean arbitrage from stock market. As emerging stock markets for China, investors are increasingly concerned about volatilities of Chinese two stock markets. We estimate volatility's models for Chinese stock markets' indexes using Markov chain Monte Carlo (MCMC) method and GARCH. We find that estimated values of volatility parameters are very high for all data frequencies. It suggests that stock returns are extremely volatile even at long term intervals in Chinese markets. Furthermore, this result could be considered that there seems to be arbitrage opportunities in Chinese stock markets.
Numerical Demons in Monte Carlo Estimation of Bayesian Model Evidence with Application to Soil Respiration Models

NASA Astrophysics Data System (ADS)

Elshall, A. S.; Ye, M.; Niu, G. Y.; Barron-Gafford, G.

2016-12-01

Bayesian multimodel inference is increasingly being used in hydrology. Estimating Bayesian model evidence (BME) is of central importance in many Bayesian multimodel analysis such as Bayesian model averaging and model selection. BME is the overall probability of the model in reproducing the data, accounting for the trade-off between the goodness-of-fit and the model complexity. Yet estimating BME is challenging, especially for high dimensional problems with complex sampling space. Estimating BME using the Monte Carlo numerical methods is preferred, as the methods yield higher accuracy than semi-analytical solutions (e.g. Laplace approximations, BIC, KIC, etc.). However, numerical methods are prone the numerical demons arising from underflow of round off errors. Although few studies alluded to this issue, to our knowledge this is the first study that illustrates these numerical demons. We show that the precision arithmetic can become a threshold on likelihood values and Metropolis acceptance ratio, which results in trimming parameter regions (when likelihood function is less than the smallest floating point number that a computer can represent) and corrupting of the empirical measures of the random states of the MCMC sampler (when using log-likelihood function). We consider two of the most powerful numerical estimators of BME that are the path sampling method of thermodynamic integration (TI) and the importance sampling method of steppingstone sampling (SS). We also consider the two most widely used numerical estimators, which are the prior sampling arithmetic mean (AS) and posterior sampling harmonic mean (HM). We investigate the vulnerability of these four estimators to the numerical demons. Interesting, the most biased estimator, namely the HM, turned out to be the least vulnerable. While it is generally assumed that AM is a bias-free estimator that will always approximate the true BME by investing in computational effort, we show that arithmetic underflow can hamper AM resulting in severe underestimation of BME. TI turned out to be the most vulnerable, resulting in BME overestimation. Finally, we show how SS can be largely invariant to rounding errors, yielding the most accurate and computational efficient results. These research results are useful for MC simulations to estimate Bayesian model evidence.
Sworn testimony of the model evidence: Gaussian Mixture Importance (GAME) sampling

NASA Astrophysics Data System (ADS)

Volpi, Elena; Schoups, Gerrit; Firmani, Giovanni; Vrugt, Jasper A.

2017-07-01

What is the "best" model? The answer to this question lies in part in the eyes of the beholder, nevertheless a good model must blend rigorous theory with redeeming qualities such as parsimony and quality of fit. Model selection is used to make inferences, via weighted averaging, from a set of K candidate models, Mk; k=>(1,…,K>), and help identify which model is most supported by the observed data, Y>˜=>(y˜1,…,y˜n>). Here, we introduce a new and robust estimator of the model evidence, p>(Y>˜|Mk>), which acts as normalizing constant in the denominator of Bayes' theorem and provides a single quantitative measure of relative support for each hypothesis that integrates model accuracy, uncertainty, and complexity. However, p>(Y>˜|Mk>) is analytically intractable for most practical modeling problems. Our method, coined GAussian Mixture importancE (GAME) sampling, uses bridge sampling of a mixture distribution fitted to samples of the posterior model parameter distribution derived from MCMC simulation. We benchmark the accuracy and reliability of GAME sampling by application to a diverse set of multivariate target distributions (up to 100 dimensions) with known values of p>(Y>˜|Mk>) and to hypothesis testing using numerical modeling of the rainfall-runoff transformation of the Leaf River watershed in Mississippi, USA. These case studies demonstrate that GAME sampling provides robust and unbiased estimates of the evidence at a relatively small computational cost outperforming commonly used estimators. The GAME sampler is implemented in the MATLAB package of DREAM and simplifies considerably scientific inquiry through hypothesis testing and model selection.
A General Model for Estimating Macroevolutionary Landscapes.

PubMed

Boucher, Florian C; Démery, Vincent; Conti, Elena; Harmon, Luke J; Uyeda, Josef

2018-03-01

The evolution of quantitative characters over long timescales is often studied using stochastic diffusion models. The current toolbox available to students of macroevolution is however limited to two main models: Brownian motion and the Ornstein-Uhlenbeck process, plus some of their extensions. Here, we present a very general model for inferring the dynamics of quantitative characters evolving under both random diffusion and deterministic forces of any possible shape and strength, which can accommodate interesting evolutionary scenarios like directional trends, disruptive selection, or macroevolutionary landscapes with multiple peaks. This model is based on a general partial differential equation widely used in statistical mechanics: the Fokker-Planck equation, also known in population genetics as the Kolmogorov forward equation. We thus call the model FPK, for Fokker-Planck-Kolmogorov. We first explain how this model can be used to describe macroevolutionary landscapes over which quantitative traits evolve and, more importantly, we detail how it can be fitted to empirical data. Using simulations, we show that the model has good behavior both in terms of discrimination from alternative models and in terms of parameter inference. We provide R code to fit the model to empirical data using either maximum-likelihood or Bayesian estimation, and illustrate the use of this code with two empirical examples of body mass evolution in mammals. FPK should greatly expand the set of macroevolutionary scenarios that can be studied since it opens the way to estimating macroevolutionary landscapes of any conceivable shape. [Adaptation; bounds; diffusion; FPK model; macroevolution; maximum-likelihood estimation; MCMC methods; phylogenetic comparative data; selection.].
Kernel-Phase Interferometry for Super-Resolution Detection of Faint Companions

NASA Astrophysics Data System (ADS)

Factor, Samuel

2016-10-01

Direct detection of close in companions (binary systems or exoplanets) is notoriously difficult. While chronagraphs and point spread function (PSF) subtraction can be used to reduce contrast and dig out signals of companions under the PSF, there are still significant limitations in separation and contrast. While non-redundant aperture masking (NRM) interferometry can be used to detect companions well inside the PSF of a diffraction limited image, the mask discards 95% of the light gathered by the telescope and thus the technique is severely flux limited. Kernel-phase analysis applies interferometric techniques similar to NRM though utilizing the full aperture. Instead of closure-phases, kernel-phases are constructed from a grid of points on the full aperture, simulating a redundant interferometer. I propose to develop my own faint companion detection pipeline which utilizes an MCMC analysis of kernel-phases. I will search for new companions in archival images from NIC1 and ACS/HRC in order to constrain binary and planet formation models at separations inaccessible to previous techniques. Using this method, it is possible to detect a companion well within the classical l/D Rayleigh diffraction limit using a fraction of the telescope time as NRM. This technique can easily be applied to archival data as no mask is needed and will thus make the detection of close in companions cheap and simple as no additional observations are needed. Since the James Webb Space Telescope (JWST) will be able to perform NRM observations, further development and characterization of kernel-phase analysis will allow efficient use of highly competitive JWST telescope time.
Estimation of spatially varying heat transfer coefficient from a flat plate with flush mounted heat sources using Bayesian inference

NASA Astrophysics Data System (ADS)

Jakkareddy, Pradeep S.; Balaji, C.

2016-09-01

This paper employs the Bayesian based Metropolis Hasting - Markov Chain Monte Carlo algorithm to solve inverse heat transfer problem of determining the spatially varying heat transfer coefficient from a flat plate with flush mounted discrete heat sources with measured temperatures at the bottom of the plate. The Nusselt number is assumed to be of the form Nu = aReb(x/l)c . To input reasonable values of ’a’ and ‘b’ into the inverse problem, first limited two dimensional conjugate convection simulations were done with Comsol. Based on the guidance from this different values of ‘a’ and ‘b’ are input to a computationally less complex problem of conjugate conduction in the flat plate (15mm thickness) and temperature distributions at the bottom of the plate which is a more convenient location for measuring the temperatures without disturbing the flow were obtained. Since the goal of this work is to demonstrate the eficiacy of the Bayesian approach to accurately retrieve ‘a’ and ‘b’, numerically generated temperatures with known values of ‘a’ and ‘b’ are treated as ‘surrogate’ experimental data. The inverse problem is then solved by repeatedly using the forward solutions together with the MH-MCMC aprroach. To speed up the estimation, the forward model is replaced by an artificial neural network. The mean, maximum-a-posteriori and standard deviation of the estimated parameters ‘a’ and ‘b’ are reported. The robustness of the proposed method is examined, by synthetically adding noise to the temperatures.
Charming dark matter

NASA Astrophysics Data System (ADS)

Jubb, Thomas; Kirk, Matthew; Lenz, Alexander

2017-12-01

We have considered a model of Dark Minimal Flavour Violation (DMFV), in which a triplet of dark matter particles couple to right-handed up-type quarks via a heavy colour-charged scalar mediator. By studying a large spectrum of possible constraints, and assessing the entire parameter space using a Markov Chain Monte Carlo (MCMC), we can place strong restrictions on the allowed parameter space for dark matter models of this type.
Exploring nonlinear feature space dimension reduction and data representation in breast CADx with Laplacian eigenmaps and t-SNE

PubMed Central

Jamieson, Andrew R.; Giger, Maryellen L.; Drukker, Karen; Li, Hui; Yuan, Yading; Bhooshan, Neha

2010-01-01

Purpose: In this preliminary study, recently developed unsupervised nonlinear dimension reduction (DR) and data representation techniques were applied to computer-extracted breast lesion feature spaces across three separate imaging modalities: Ultrasound (U.S.) with 1126 cases, dynamic contrast enhanced magnetic resonance imaging with 356 cases, and full-field digital mammography with 245 cases. Two methods for nonlinear DR were explored: Laplacian eigenmaps [M. Belkin and P. Niyogi, “Laplacian eigenmaps for dimensionality reduction and data representation,” Neural Comput. 15, 1373–1396 (2003)] and t-distributed stochastic neighbor embedding (t-SNE) [L. van der Maaten and G. Hinton, “Visualizing data using t-SNE,” J. Mach. Learn. Res. 9, 2579–2605 (2008)]. Methods: These methods attempt to map originally high dimensional feature spaces to more human interpretable lower dimensional spaces while preserving both local and global information. The properties of these methods as applied to breast computer-aided diagnosis (CADx) were evaluated in the context of malignancy classification performance as well as in the visual inspection of the sparseness within the two-dimensional and three-dimensional mappings. Classification performance was estimated by using the reduced dimension mapped feature output as input into both linear and nonlinear classifiers: Markov chain Monte Carlo based Bayesian artificial neural network (MCMC-BANN) and linear discriminant analysis. The new techniques were compared to previously developed breast CADx methodologies, including automatic relevance determination and linear stepwise (LSW) feature selection, as well as a linear DR method based on principal component analysis. Using ROC analysis and 0.632+bootstrap validation, 95% empirical confidence intervals were computed for the each classifier’s AUC performance. Results: In the large U.S. data set, sample high performance results include, AUC0.632+=0.88 with 95% empirical bootstrap interval [0.787;0.895] for 13 ARD selected features and AUC0.632+=0.87 with interval [0.817;0.906] for four LSW selected features compared to 4D t-SNE mapping (from the original 81D feature space) giving AUC0.632+=0.90 with interval [0.847;0.919], all using the MCMC-BANN. Conclusions: Preliminary results appear to indicate capability for the new methods to match or exceed classification performance of current advanced breast lesion CADx algorithms. While not appropriate as a complete replacement of feature selection in CADx problems, DR techniques offer a complementary approach, which can aid elucidation of additional properties associated with the data. Specifically, the new techniques were shown to possess the added benefit of delivering sparse lower dimensional representations for visual interpretation, revealing intricate data structure of the feature space. PMID:20175497
Use of Markov Chain Monte Carlo analysis with a physiologically-based pharmacokinetic model of methylmercury to estimate exposures in US women of childbearing age.

PubMed

Allen, Bruce C; Hack, C Eric; Clewell, Harvey J

2007-08-01

A Bayesian approach, implemented using Markov Chain Monte Carlo (MCMC) analysis, was applied with a physiologically-based pharmacokinetic (PBPK) model of methylmercury (MeHg) to evaluate the variability of MeHg exposure in women of childbearing age in the U.S. population. The analysis made use of the newly available National Health and Nutrition Survey (NHANES) blood and hair mercury concentration data for women of age 16-49 years (sample size, 1,582). Bayesian analysis was performed to estimate the population variability in MeHg exposure (daily ingestion rate) implied by the variation in blood and hair concentrations of mercury in the NHANES database. The measured variability in the NHANES blood and hair data represents the result of a process that includes interindividual variation in exposure to MeHg and interindividual variation in the pharmacokinetics (distribution, clearance) of MeHg. The PBPK model includes a number of pharmacokinetic parameters (e.g., tissue volumes, partition coefficients, rate constants for metabolism and elimination) that can vary from individual to individual within the subpopulation of interest. Using MCMC analysis, it was possible to combine prior distributions of the PBPK model parameters with the NHANES blood and hair data, as well as with kinetic data from controlled human exposures to MeHg, to derive posterior distributions that refine the estimates of both the population exposure distribution and the pharmacokinetic parameters. In general, based on the populations surveyed by NHANES, the results of the MCMC analysis indicate that a small fraction, less than 1%, of the U.S. population of women of childbearing age may have mercury exposures greater than the EPA RfD for MeHg of 0.1 microg/kg/day, and that there are few, if any, exposures greater than the ATSDR MRL of 0.3 microg/kg/day. The analysis also indicates that typical exposures may be greater than previously estimated from food consumption surveys, but that the variability in exposure within the population of U.S. women of childbearing age may be less than previously assumed.
X-Ray Luminosity Functions of Normal Galaxies in the Great Observatories Origins Deep Survey

NASA Astrophysics Data System (ADS)

Ptak, Andrew; Mobasher, Bahram; Hornschemeier, Ann; Bauer, Franz; Norman, Colin

2007-10-01

We present soft (0.5-2 keV) X-ray luminosity functions (XLFs) in the Great Observatories Origins Deep Survey (GOODS) fields derived for galaxies at z~0.25 and 0.75. SED fitting was used to estimate photometric redshifts and separate galaxy types, resulting in a sample of 40 early-type galaxies and 46 late-type galaxies. We estimate k-corrections for both the X-ray/optical and X-ray/NIR flux ratios, which facilitates the separation of AGNs from the normal/starburst galaxies. We fit the XLFs with a power-law model using both traditional and Markov-Chain Monte Carlo (MCMC) procedures. A key advantage of the MCMC approach is that it explicitly takes into account upper limits and allows errors on ``derived'' quantities, such as luminosity densities, to be computed directly (i.e., without potentially questionable assumptions concerning the propagation of errors). The slopes of the early-type galaxy XLFs tend to be slightly flatter than the late-type galaxy XLFs, although the effect is significant at only the 90% and 97% levels for z~0.25 and 0.75. The XLFs differ between z<0.5 and z>0.5 at >99% significance levels for early-type, late-type, and all (early- and late-type) galaxies. We also fit Schechter and lognormal models to the XLFs, fitting the low- and high-redshift XLFs for a given sample simultaneously assuming only pure luminosity evolution. In the case of lognormal fits, the results of MCMC fitting of the local FIR luminosity function were used as priors for the faint- and bright-end slopes (similar to ``fixing'' these parameters at the FIR values, except here the FIR uncertainty is included). The best-fit values of the change in logL* with redshift were ΔlogL*=0.23+/-0.16 dex (for early-type galaxies) and 0.34+/-0.12 dex (for late-type galaxies), corresponding to (1+z)1.6 and (1+z)2.3. These results were insensitive to whether the Schechter or lognormal function was adopted.

Achieving sustainable ground-water management by using GIS-integrated simulation tools: the EU H2020 FREEWAT platform

NASA Astrophysics Data System (ADS)

Rossetto, Rudy; De Filippis, Giovanna; Borsi, Iacopo; Foglia, Laura; Toegl, Anja; Cannata, Massimiliano; Neumann, Jakob; Vazquez-Sune, Enric; Criollo, Rotman

2017-04-01

In order to achieve sustainable and participated ground-water management, innovative software built on the integration of numerical models within GIS software is a perfect candidate to provide a full characterization of quantitative and qualitative aspects of ground- and surface-water resources maintaining the time and spatial dimension. The EU H2020 FREEWAT project (FREE and open source software tools for WATer resource management; Rossetto et al., 2015) aims at simplifying the application of EU water-related Directives through an open-source and public-domain, GIS-integrated simulation platform for planning and management of ground- and surface-water resources. The FREEWAT platform allows to simulate the whole hydrological cycle, coupling the power of GIS geo-processing and post-processing tools in spatial data analysis with that of process-based simulation models. This results in a modeling environment where large spatial datasets can be stored, managed and visualized and where several simulation codes (mainly belonging to the USGS MODFLOW family) are integrated to simulate multiple hydrological, hydrochemical or economic processes. So far, the FREEWAT platform is a large plugin for the QGIS GIS desktop software and it integrates the following capabilities: • the AkvaGIS module allows to produce plots and statistics for the analysis and interpretation of hydrochemical and hydrogeological data; • the Observation Analysis Tool, to facilitate the import, analysis and visualization of time-series data and the use of these data to support model construction and calibration; • groundwater flow simulation in the saturated and unsaturated zones may be simulated using MODFLOW-2005 (Harbaugh, 2005); • multi-species advective-dispersive transport in the saturated zone can be simulated using MT3DMS (Zheng & Wang, 1999); the possibility to simulate viscosity- and density-dependent flows is further accomplished through SEAWAT (Langevin et al., 2007); • sustainable management of combined use of ground- and surface-water resources in rural environments is accomplished by the Farm Process module embedded in MODFLOW-OWHM (Hanson et al., 2014), which allows to dynamically integrate crop water demand and supply from ground- and surface-water; • UCODE_2014 (Poeter et al., 2014) is implemented to perform sensitivity analysis and parameter estimation to improve the model fit through an inverse, regression method based on the evaluation of an objective function. Through creating a common environment among water research/professionals, policy makers and implementers, FREEWAT aims at enhancing science and participatory approach and evidence-based decision making in water resource management, hence producing relevant outcomes for policy implementation. Acknowledgements This paper is presented within the framework of the project FREEWAT, which has received funding from the European Union's HORIZON 2020 research and innovation programme under Grant Agreement n. 642224. References Hanson, R.T., Boyce, S.E., Schmid, W., Hughes, J.D., Mehl, S.M., Leake, S.A., Maddock, T., Niswonger, R.G. One-Water Hydrologic Flow Model (MODFLOW-OWHM), U.S. Geological Survey, Techniques and Methods 6-A51, 2014 134 p. Harbaugh A.W. (2005) - MODFLOW-2005, The U.S. Geological Survey Modular Ground-Water Model - the Ground-Water Flow Process. U.S. Geological Survey, Techniques and Methods 6-A16, 253 p. Langevin C.D., Thorne D.T. Jr., Dausman A.M., Sukop M.C. & Guo Weixing (2007) - SEAWAT Version 4: A Computer Program for Simulation of Multi-Species Solute and Heat Transport. U.S. Geological Survey Techniques and Methods 6-A22, 39 pp. Poeter E.P., Hill M.C., Lu D., Tiedeman C.R. & Mehl S. (2014) - UCODE_2014, with new capabilities to define parameters unique to predictions, calculate weights using simulated values, estimate parameters with SVD, evaluate uncertainty with MCMC, and more. Integrated Groundwater Modeling Center Report Number GWMI 2014-02. Rossetto, R., Borsi, I. & Foglia, L. FREEWAT: FREE and open source software tools for WATer resource management, Rendiconti Online Società Geologica Italiana, 2015, 35, 252-255. Zheng C. & Wang P.P. (1999) - MT3DMS, A modular three-dimensional multi-species transport model for simulation of advection, dispersion and chemical reactions of contaminants in groundwater systems. U.S. Army Engineer Research and Development Center Contract Report SERDP-99-1, Vicksburg, MS, 202 pp.
Estimating the Properties of Hard X-Ray Solar Flares by Constraining Model Parameters

NASA Technical Reports Server (NTRS)

Ireland, J.; Tolbert, A. K.; Schwartz, R. A.; Holman, G. D.; Dennis, B. R.

2013-01-01

We wish to better constrain the properties of solar flares by exploring how parameterized models of solar flares interact with uncertainty estimation methods. We compare four different methods of calculating uncertainty estimates in fitting parameterized models to Ramaty High Energy Solar Spectroscopic Imager X-ray spectra, considering only statistical sources of error. Three of the four methods are based on estimating the scale-size of the minimum in a hypersurface formed by the weighted sum of the squares of the differences between the model fit and the data as a function of the fit parameters, and are implemented as commonly practiced. The fourth method is also based on the difference between the data and the model, but instead uses Bayesian data analysis and Markov chain Monte Carlo (MCMC) techniques to calculate an uncertainty estimate. Two flare spectra are modeled: one from the Geostationary Operational Environmental Satellite X1.3 class flare of 2005 January 19, and the other from the X4.8 flare of 2002 July 23.We find that the four methods give approximately the same uncertainty estimates for the 2005 January 19 spectral fit parameters, but lead to very different uncertainty estimates for the 2002 July 23 spectral fit. This is because each method implements different analyses of the hypersurface, yielding method-dependent results that can differ greatly depending on the shape of the hypersurface. The hypersurface arising from the 2005 January 19 analysis is consistent with a normal distribution; therefore, the assumptions behind the three non- Bayesian uncertainty estimation methods are satisfied and similar estimates are found. The 2002 July 23 analysis shows that the hypersurface is not consistent with a normal distribution, indicating that the assumptions behind the three non-Bayesian uncertainty estimation methods are not satisfied, leading to differing estimates of the uncertainty. We find that the shape of the hypersurface is crucial in understanding the output from each uncertainty estimation technique, and that a crucial factor determining the shape of hypersurface is the location of the low-energy cutoff relative to energies where the thermal emission dominates. The Bayesian/MCMC approach also allows us to provide detailed information on probable values of the low-energy cutoff, Ec, a crucial parameter in defining the energy content of the flare-accelerated electrons. We show that for the 2002 July 23 flare data, there is a 95% probability that Ec lies below approximately 40 keV, and a 68% probability that it lies in the range 7-36 keV. Further, the low-energy cutoff is more likely to be in the range 25-35 keV than in any other 10 keV wide energy range. The low-energy cutoff for the 2005 January 19 flare is more tightly constrained to 107 +/- 4 keV with 68% probability.
Parameter estimation of multivariate multiple regression model using bayesian with non-informative Jeffreys’ prior distribution

NASA Astrophysics Data System (ADS)

Saputro, D. R. S.; Amalia, F.; Widyaningsih, P.; Affan, R. C.

2018-05-01

Bayesian method is a method that can be used to estimate the parameters of multivariate multiple regression model. Bayesian method has two distributions, there are prior and posterior distributions. Posterior distribution is influenced by the selection of prior distribution. Jeffreys’ prior distribution is a kind of Non-informative prior distribution. This prior is used when the information about parameter not available. Non-informative Jeffreys’ prior distribution is combined with the sample information resulting the posterior distribution. Posterior distribution is used to estimate the parameter. The purposes of this research is to estimate the parameters of multivariate regression model using Bayesian method with Non-informative Jeffreys’ prior distribution. Based on the results and discussion, parameter estimation of β and Σ which were obtained from expected value of random variable of marginal posterior distribution function. The marginal posterior distributions for β and Σ are multivariate normal and inverse Wishart. However, in calculation of the expected value involving integral of a function which difficult to determine the value. Therefore, approach is needed by generating of random samples according to the posterior distribution characteristics of each parameter using Markov chain Monte Carlo (MCMC) Gibbs sampling algorithm.
Bayesian inversion of refraction seismic traveltime data

NASA Astrophysics Data System (ADS)

Ryberg, T.; Haberland, Ch

2018-03-01

We apply a Bayesian Markov chain Monte Carlo (McMC) formalism to the inversion of refraction seismic, traveltime data sets to derive 2-D velocity models below linear arrays (i.e. profiles) of sources and seismic receivers. Typical refraction data sets, especially when using the far-offset observations, are known as having experimental geometries which are very poor, highly ill-posed and far from being ideal. As a consequence, the structural resolution quickly degrades with depth. Conventional inversion techniques, based on regularization, potentially suffer from the choice of appropriate inversion parameters (i.e. number and distribution of cells, starting velocity models, damping and smoothing constraints, data noise level, etc.) and only local model space exploration. McMC techniques are used for exhaustive sampling of the model space without the need of prior knowledge (or assumptions) of inversion parameters, resulting in a large number of models fitting the observations. Statistical analysis of these models allows to derive an average (reference) solution and its standard deviation, thus providing uncertainty estimates of the inversion result. The highly non-linear character of the inversion problem, mainly caused by the experiment geometry, does not allow to derive a reference solution and error map by a simply averaging procedure. We present a modified averaging technique, which excludes parts of the prior distribution in the posterior values due to poor ray coverage, thus providing reliable estimates of inversion model properties even in those parts of the models. The model is discretized by a set of Voronoi polygons (with constant slowness cells) or a triangulated mesh (with interpolation within the triangles). Forward traveltime calculations are performed by a fast, finite-difference-based eikonal solver. The method is applied to a data set from a refraction seismic survey from Northern Namibia and compared to conventional tomography. An inversion test for a synthetic data set from a known model is also presented.
Benefits of seasonal forecasts of crop yields

NASA Astrophysics Data System (ADS)

Sakurai, G.; Okada, M.; Nishimori, M.; Yokozawa, M.

2017-12-01

Major factors behind recent fluctuations in food prices include increased biofuel production and oil price fluctuations. In addition, several extreme climate events that reduced worldwide food production coincided with upward spikes in food prices. The stabilization of crop yields is one of the most important tasks to stabilize food prices and thereby enhance food security. Recent development of technologies related to crop modeling and seasonal weather forecasting has made it possible to forecast future crop yields for maize and soybean. However, the effective use of these technologies remains limited. Here we present the potential benefits of seasonal crop-yield forecasts on a global scale for choice of planting day. For this purpose, we used a model (PRYSBI-2) that can well replicate past crop yields both for maize and soybean. This model system uses a Bayesian statistical approach to estimate the parameters of a basic process-based model of crop growth. The spatial variability of model parameters was considered by estimating the posterior distribution of the parameters from historical yield data by using the Markov-chain Monte Carlo (MCMC) method with a resolution of 1.125° × 1.125°. The posterior distributions of model parameters were estimated for each spatial grid with 30 000 MCMC steps of 10 chains each. By using this model and the estimated parameter distributions, we were able to estimate not only crop yield but also levels of associated uncertainty. We found that the global average crop yield increased about 30% as the result of the optimal selection of planting day and that the seasonal forecast of crop yield had a large benefit in and near the eastern part of Brazil and India for maize and the northern area of China for soybean. In these countries, the effects of El Niño and Indian Ocean dipole are large. The results highlight the importance of developing a system to forecast global crop yields.
Characterising the Dense Molecular Gas in Exceptional Local Galaxies

NASA Astrophysics Data System (ADS)

Tunnard, Richard C. A.

2016-08-01

The interferometric facilities now coming online (the Atacama Large Millimetre Array (ALMA) and the NOrthern Extended Millimeter Array (NOEMA)) and those planned for the coming decade (the Next Generation Very Large Array (ngVLA) and the Square Kilometre Array (SKA)) in the radio to sub-millimetre regimes are opening a window to the molecular gas in high-redshift galaxies. However, our understanding of similar galaxies in the local universe is still far from complete and the data analysis techniques and tools needed to interpret the observations in consistent and comparable ways are yet to be developed. I first describe the Monte Carlo Markov Chain (MCMC) script developed to empower a public radiative transfer code. I characterise both the public code and MCMC script, including an exploration of the effect of observing molecular lines at high redshift where the Cosmic Microwave Background (CMB) can provide a significant background, as well as the effect this can have on well-known local correlations. I present two studies of ultraluminous infrared galaxies (ULIRGs) in the local universe making use of literature and collaborator data. In the first of these, NGC6240, I use the wealth of available data and the geometry of the source to develop a multi-phase, multi-species model, finding evidence for a complex medium of hot diffuse and cold dense gas in pressure equilibrium. Next, I study the prototypical ULIRG Arp 220; an extraordinary galaxy rendered especially interesting by the controversy over the power source of the western of the two merger nuclei and its immense luminosity and dust obscuration. Using traditional grid based methods I explore the molecular gas conditions within the nuclei and find evidence for chemical differentiation between the two nuclei, potentially related to the obscured power source. Finally, I investigate the potential evolution of proto-clusters over cosmic time with sub-millimetre observations of 14 radio galaxies, unexpectedly finding little to no evidence for cluster evolution.
Phylodynamic Analysis Reveals CRF01_AE Dissemination between Japan and Neighboring Asian Countries and the Role of Intravenous Drug Use in Transmission

PubMed Central

Shiino, Teiichiro; Hattori, Junko; Yokomaku, Yoshiyuki; Iwatani, Yasumasa; Sugiura, Wataru

2014-01-01

Background One major circulating HIV-1 subtype in Southeast Asian countries is CRF01_AE, but little is known about its epidemiology in Japan. We conducted a molecular phylodynamic study of patients newly diagnosed with CRF01_AE from 2003 to 2010. Methods Plasma samples from patients registered in Japanese Drug Resistance HIV-1 Surveillance Network were analyzed for protease-reverse transcriptase sequences; all sequences undergo subtyping and phylogenetic analysis using distance-matrix-based, maximum likelihood and Bayesian coalescent Markov Chain Monte Carlo (MCMC) phylogenetic inferences. Transmission clusters were identified using interior branch test and depth-first searches for sub-tree partitions. Times of most recent common ancestor (tMRCAs) of significant clusters were estimated using Bayesian MCMC analysis. Results Among 3618 patient registered in our network, 243 were infected with CRF01_AE. The majority of individuals with CRF01_AE were Japanese, predominantly male, and reported heterosexual contact as their risk factor. We found 5 large clusters with ≥5 members and 25 small clusters consisting of pairs of individuals with highly related CRF01_AE strains. The earliest cluster showed a tMRCA of 1996, and consisted of individuals with their known risk as heterosexual contacts. The other four large clusters showed later tMRCAs between 2000 and 2002 with members including intravenous drug users (IVDU) and non-Japanese, but not men who have sex with men (MSM). In contrast, small clusters included a high frequency of individuals reporting MSM risk factors. Phylogenetic analysis also showed that some individuals infected with HIV strains spread in East and South-eastern Asian countries. Conclusions Introduction of CRF01_AE viruses into Japan is estimated to have occurred in the 1990s. CFR01_AE spread via heterosexual behavior, then among persons connected with non-Japanese, IVDU, and MSM. Phylogenetic analysis demonstrated that some viral variants are largely restricted to Japan, while others have a broad geographic distribution. PMID:25025900
Accounting for model error in Bayesian solutions to hydrogeophysical inverse problems using a local basis approach

NASA Astrophysics Data System (ADS)

Irving, J.; Koepke, C.; Elsheikh, A. H.

2017-12-01

Bayesian solutions to geophysical and hydrological inverse problems are dependent upon a forward process model linking subsurface parameters to measured data, which is typically assumed to be known perfectly in the inversion procedure. However, in order to make the stochastic solution of the inverse problem computationally tractable using, for example, Markov-chain-Monte-Carlo (MCMC) methods, fast approximations of the forward model are commonly employed. This introduces model error into the problem, which has the potential to significantly bias posterior statistics and hamper data integration efforts if not properly accounted for. Here, we present a new methodology for addressing the issue of model error in Bayesian solutions to hydrogeophysical inverse problems that is geared towards the common case where these errors cannot be effectively characterized globally through some parametric statistical distribution or locally based on interpolation between a small number of computed realizations. Rather than focusing on the construction of a global or local error model, we instead work towards identification of the model-error component of the residual through a projection-based approach. In this regard, pairs of approximate and detailed model runs are stored in a dictionary that grows at a specified rate during the MCMC inversion procedure. At each iteration, a local model-error basis is constructed for the current test set of model parameters using the K-nearest neighbour entries in the dictionary, which is then used to separate the model error from the other error sources before computing the likelihood of the proposed set of model parameters. We demonstrate the performance of our technique on the inversion of synthetic crosshole ground-penetrating radar traveltime data for three different subsurface parameterizations of varying complexity. The synthetic data are generated using the eikonal equation, whereas a straight-ray forward model is assumed in the inversion procedure. In each case, the developed model-error approach enables to remove posterior bias and obtain a more realistic characterization of uncertainty.
Inferences of biogeographical histories within subfamily Hyacinthoideae using S-DIVA and Bayesian binary MCMC analysis implemented in RASP (Reconstruct Ancestral State in Phylogenies)

PubMed Central

Ali, Syed Shujait; Yu, Yan; Pfosser, Martin; Wetschnig, Wolfgang

2012-01-01

Background and Aims Subfamily Hyacinthoideae (Hyacinthaceae) comprises more than 400 species. Members are distributed in sub-Saharan Africa, Madagascar, India, eastern Asia, the Mediterranean region and Eurasia. Hyacinthoideae, like many other plant lineages, show disjunct distribution patterns. The aim of this study was to reconstruct the biogeographical history of Hyacinthoideae based on phylogenetic analyses, to find the possible ancestral range of Hyacinthoideae and to identify factors responsible for the current disjunct distribution pattern. Methods Parsimony and Bayesian approaches were applied to obtain phylogenetic trees, based on sequences of the trnL-F region. Biogeographical inferences were obtained by applying statistical dispersal-vicariance analysis (S-DIVA) and Bayesian binary MCMC (BBM) analysis implemented in RASP (Reconstruct Ancestral State in Phylogenies). Key Results S-DIVA and BBM analyses suggest that the Hyacinthoideae clade seem to have originated in sub-Saharan Africa. Dispersal and vicariance played vital roles in creating the disjunct distribution pattern. Results also suggest an early dispersal to the Mediterranean region, and thus the northward route (from sub-Saharan Africa to Mediterranean) of dispersal is plausible for members of subfamily Hyacinthoideae. Conclusions Biogeographical analyses reveal that subfamily Hyacinthoideae has originated in sub-Saharan Africa. S-DIVA indicates an early dispersal event to the Mediterranean region followed by a vicariance event, which resulted in Hyacintheae and Massonieae tribes. By contrast, BBM analysis favours dispersal to the Mediterranean region, eastern Asia and Europe. Biogeographical analysis suggests that sub-Saharan Africa and the Mediterranean region have played vital roles as centres of diversification and radiation within subfamily Hyacinthoideae. In this bimodal distribution pattern, sub-Saharan Africa is the primary centre of diversity and the Mediterranean region is the secondary centre of diversity. Sub-Saharan Africa was the source area for radiation toward Madagascar, the Mediterranean region and India. Radiations occurred from the Mediterranean region to eastern Asia, Europe, western Asia and India. PMID:22039008
Discovering Social Circles in Ego Networks (Author’s Manuscript)

DTIC Science & Technology

2013-01-10

ego-network. We expect that circles are formed by densely-connected sets of alters ( Newman , 2006). However, different circles overlap heavily, i.e...umbrella of community detection (Lancichinetti and Fortunato, 2009a; Schaeffer, 2007; Leskovec et al., 2010; Porter et al., 2009; Newman , 2004). While...MCMC) sampler ( Newman and Barkema, 1999) which efficiently updates node-community memberships by ‘collapsing’ nodes that have common features and
Evaluation of Electromagnetic Induction (EMI) Resistivity Technologies for Assessing Permafrost Geomorphologies

DTIC Science & Technology

2016-08-01

Structures Laboratory MCMC Markov Chain Monte Carlo NAPL Non -Aqueous Phase Liquids ppm Parts per Million R&D Research and Development SERDP... Research Engineering Labora- tory (CRREL) and Fridon Shubitidze at Dartmouth College lead a research group that has constructed several research -grade...by Bar- rowes’ research group to obtain EC over lines or areas. Another possibility is to use unmanned helicopters to acquire data over larger areas
Modelling past land use using archaeological and pollen data

NASA Astrophysics Data System (ADS)

Pirzamanbein, Behnaz; Lindström, johan; Poska, Anneli; Gaillard-Lemdahl, Marie-José

2016-04-01

Accurate maps of past land use are necessary for studying the impact of anthropogenic land-cover changes on climate and biodiversity. We develop a Bayesian hierarchical model to reconstruct the land use using Gaussian Markov random fields. The model uses two observations sets: 1) archaeological data, representing human settlements, urbanization and agricultural findings; and 2) pollen-based land estimates of the three land-cover types Coniferous forest, Broadleaved forest and Unforested/Open land. The pollen based estimates are obtained from the REVEALS model, based on pollen counts from lakes and bogs. Our developed model uses the sparse pollen-based estimations to reconstruct the spatial continuous cover of three land cover types. Using the open-land component and the archaeological data, the extent of land-use is reconstructed. The model is applied on three time periods - centred around 1900 CE, 1000 and, 4000 BCE over Sweden for which both pollen-based estimates and archaeological data are available. To estimate the model parameters and land use, a block updated Markov chain Monte Carlo (MCMC) algorithm is applied. Using the MCMC posterior samples uncertainties in land-use predictions are computed. Due to lack of good historic land use data, model results are evaluated by cross-validation. Keywords. Spatial reconstruction, Gaussian Markov random field, Fossil pollen records, Archaeological data, Human land-use, Prediction uncertainty
Bayesian inversion of seismic and electromagnetic data for marine gas reservoir characterization using multi-chain Markov chain Monte Carlo sampling

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ren, Huiying; Ray, Jaideep; Hou, Zhangshuan

In this study we developed an efficient Bayesian inversion framework for interpreting marine seismic amplitude versus angle (AVA) and controlled source electromagnetic (CSEM) data for marine reservoir characterization. The framework uses a multi-chain Markov-chain Monte Carlo (MCMC) sampler, which is a hybrid of DiffeRential Evolution Adaptive Metropolis (DREAM) and Adaptive Metropolis (AM) samplers. The inversion framework is tested by estimating reservoir-fluid saturations and porosity based on marine seismic and CSEM data. The multi-chain MCMC is scalable in terms of the number of chains, and is useful for computationally demanding Bayesian model calibration in scientific and engineering problems. As a demonstration,more » the approach is used to efficiently and accurately estimate the porosity and saturations in a representative layered synthetic reservoir. The results indicate that the seismic AVA and CSEM joint inversion provides better estimation of reservoir saturations than the seismic AVA-only inversion, especially for the parameters in deep layers. The performance of the inversion approach for various levels of noise in observational data was evaluated – reasonable estimates can be obtained with noise levels up to 25%. Sampling efficiency due to the use of multiple chains was also checked and was found to have almost linear scalability.« less
Forecast and analysis of the cosmological redshift drift.

PubMed

Lazkoz, Ruth; Leanizbarrutia, Iker; Salzano, Vincenzo

2018-01-01

The cosmological redshift drift could lead to the next step in high-precision cosmic geometric observations, becoming a direct and irrefutable test for cosmic acceleration. In order to test the viability and possible properties of this effect, also called Sandage-Loeb (SL) test, we generate a model-independent mock data set in order to compare its constraining power with that of the future mock data sets of Type Ia Supernovae (SNe) and Baryon Acoustic Oscillations (BAO). The performance of those data sets is analyzed by testing several cosmological models with the Markov chain Monte Carlo (MCMC) method, both independently as well as combining all data sets. Final results show that, in general, SL data sets allow for remarkable constraints on the matter density parameter today [Formula: see text] on every tested model, showing also a great complementarity with SNe and BAO data regarding dark energy parameters.
Bayesian analysis of stochastic volatility-in-mean model with leverage and asymmetrically heavy-tailed error using generalized hyperbolic skew Student’s t-distribution*

PubMed Central

Leão, William L.; Chen, Ming-Hui

2017-01-01

A stochastic volatility-in-mean model with correlated errors using the generalized hyperbolic skew Student-t (GHST) distribution provides a robust alternative to the parameter estimation for daily stock returns in the absence of normality. An efficient Markov chain Monte Carlo (MCMC) sampling algorithm is developed for parameter estimation. The deviance information, the Bayesian predictive information and the log-predictive score criterion are used to assess the fit of the proposed model. The proposed method is applied to an analysis of the daily stock return data from the Standard & Poor’s 500 index (S&P 500). The empirical results reveal that the stochastic volatility-in-mean model with correlated errors and GH-ST distribution leads to a significant improvement in the goodness-of-fit for the S&P 500 index returns dataset over the usual normal model. PMID:29333210
Transmission Parameters of the 2001 Foot and Mouth Epidemic in Great Britain

PubMed Central

Chis Ster, Irina; Ferguson, Neil M.

2007-01-01

Despite intensive ongoing research, key aspects of the spatial-temporal evolution of the 2001 foot and mouth disease (FMD) epidemic in Great Britain (GB) remain unexplained. Here we develop a Markov Chain Monte Carlo (MCMC) method for estimating epidemiological parameters of the 2001 outbreak for a range of simple transmission models. We make the simplifying assumption that infectious farms were completely observed in 2001, equivalent to assuming that farms that were proactively culled but not diagnosed with FMD were not infectious, even if some were infected. We estimate how transmission parameters varied through time, highlighting the impact of the control measures on the progression of the epidemic. We demonstrate statistically significant evidence for assortative contact patterns between animals of the same species. Predictive risk maps of the transmission potential in different geographic areas of GB are presented for the fitted models. PMID:17551582
Selectivity curves of the capture of mangrove crab (Ucides cordatus) on the northern coast of Brazil using bayesian inference.

PubMed

Furtado-Junior, I; Abrunhosa, F A; Holanda, F C A F; Tavares, M C S

2016-06-01

Fishing selectivity of the mangrove crab Ucides cordatus in the north coast of Brazil can be defined as the fisherman's ability to capture and select individuals from a certain size or sex (or a combination of these factors) which suggests an empirical selectivity. Considering this hypothesis, we calculated the selectivity curves for males and females crabs using the logit function of the logistic model in the formulation. The Bayesian inference consisted of obtaining the posterior distribution by applying the Markov chain Monte Carlo (MCMC) method to software R using the OpenBUGS, BRugs, and R2WinBUGS libraries. The estimated results of width average carapace selection for males and females compared with previous studies reporting the average width of the carapace of sexual maturity allow us to confirm the hypothesis that most mature individuals do not suffer from fishing pressure; thus, ensuring their sustainability.
Bayesian analysis of stochastic volatility-in-mean model with leverage and asymmetrically heavy-tailed error using generalized hyperbolic skew Student's t-distribution.

PubMed

Leão, William L; Abanto-Valle, Carlos A; Chen, Ming-Hui

2017-01-01

A stochastic volatility-in-mean model with correlated errors using the generalized hyperbolic skew Student-t (GHST) distribution provides a robust alternative to the parameter estimation for daily stock returns in the absence of normality. An efficient Markov chain Monte Carlo (MCMC) sampling algorithm is developed for parameter estimation. The deviance information, the Bayesian predictive information and the log-predictive score criterion are used to assess the fit of the proposed model. The proposed method is applied to an analysis of the daily stock return data from the Standard & Poor's 500 index (S&P 500). The empirical results reveal that the stochastic volatility-in-mean model with correlated errors and GH-ST distribution leads to a significant improvement in the goodness-of-fit for the S&P 500 index returns dataset over the usual normal model.
Bayesian analysis of Jolly-Seber type models

USGS Publications Warehouse

Matechou, Eleni; Nicholls, Geoff K.; Morgan, Byron J. T.; Collazo, Jaime A.; Lyons, James E.

2016-01-01

We propose the use of finite mixtures of continuous distributions in modelling the process by which new individuals, that arrive in groups, become part of a wildlife population. We demonstrate this approach using a data set of migrating semipalmated sandpipers (Calidris pussila) for which we extend existing stopover models to allow for individuals to have different behaviour in terms of their stopover duration at the site. We demonstrate the use of reversible jump MCMC methods to derive posterior distributions for the model parameters and the models, simultaneously. The algorithm moves between models with different numbers of arrival groups as well as between models with different numbers of behavioural groups. The approach is shown to provide new ecological insights about the stopover behaviour of semipalmated sandpipers but is generally applicable to any population in which animals arrive in groups and potentially exhibit heterogeneity in terms of one or more other processes.
A Bayesian Alternative for Multi-objective Ecohydrological Model Specification

NASA Astrophysics Data System (ADS)

Tang, Y.; Marshall, L. A.; Sharma, A.; Ajami, H.

2015-12-01

Process-based ecohydrological models combine the study of hydrological, physical, biogeochemical and ecological processes of the catchments, which are usually more complex and parametric than conceptual hydrological models. Thus, appropriate calibration objectives and model uncertainty analysis are essential for ecohydrological modeling. In recent years, Bayesian inference has become one of the most popular tools for quantifying the uncertainties in hydrological modeling with the development of Markov Chain Monte Carlo (MCMC) techniques. Our study aims to develop appropriate prior distributions and likelihood functions that minimize the model uncertainties and bias within a Bayesian ecohydrological framework. In our study, a formal Bayesian approach is implemented in an ecohydrological model which combines a hydrological model (HyMOD) and a dynamic vegetation model (DVM). Simulations focused on one objective likelihood (Streamflow/LAI) and multi-objective likelihoods (Streamflow and LAI) with different weights are compared. Uniform, weakly informative and strongly informative prior distributions are used in different simulations. The Kullback-leibler divergence (KLD) is used to measure the dis(similarity) between different priors and corresponding posterior distributions to examine the parameter sensitivity. Results show that different prior distributions can strongly influence posterior distributions for parameters, especially when the available data is limited or parameters are insensitive to the available data. We demonstrate differences in optimized parameters and uncertainty limits in different cases based on multi-objective likelihoods vs. single objective likelihoods. We also demonstrate the importance of appropriately defining the weights of objectives in multi-objective calibration according to different data types.

Baking a mass-spectrometry data PIE with McMC and simulated annealing: predicting protein post-translational modifications from integrated top-down and bottom-up data.

PubMed

Jefferys, Stuart R; Giddings, Morgan C

2011-03-15

Post-translational modifications are vital to the function of proteins, but are hard to study, especially since several modified isoforms of a protein may be present simultaneously. Mass spectrometers are a great tool for investigating modified proteins, but the data they provide is often incomplete, ambiguous and difficult to interpret. Combining data from multiple experimental techniques-especially bottom-up and top-down mass spectrometry-provides complementary information. When integrated with background knowledge this allows a human expert to interpret what modifications are present and where on a protein they are located. However, the process is arduous and for high-throughput applications needs to be automated. This article explores a data integration methodology based on Markov chain Monte Carlo and simulated annealing. Our software, the Protein Inference Engine (the PIE) applies these algorithms using a modular approach, allowing multiple types of data to be considered simultaneously and for new data types to be added as needed. Even for complicated data representing multiple modifications and several isoforms, the PIE generates accurate modification predictions, including location. When applied to experimental data collected on the L7/L12 ribosomal protein the PIE was able to make predictions consistent with manual interpretation for several different L7/L12 isoforms using a combination of bottom-up data with experimentally identified intact masses. Software, demo projects and source can be downloaded from http://pie.giddingslab.org/
Spatiotemporal dynamics of landscape pattern and hydrologic process in watershed systems

NASA Astrophysics Data System (ADS)

Randhir, Timothy O.; Tsvetkova, Olga

2011-06-01

SummaryLand use change is influenced by spatial and temporal factors that interact with watershed resources. Modeling these changes is critical to evaluate emerging land use patterns and to predict variation in water quantity and quality. The objective of this study is to model the nature and emergence of spatial patterns in land use and water resource impacts using a spatially explicit and dynamic landscape simulation. Temporal changes are predicted using a probabilistic Markovian process and spatial interaction through cellular automation. The MCMC (Monte Carlo Markov Chain) analysis with cellular automation is linked to hydrologic equations to simulate landscape patterns and processes. The spatiotemporal watershed dynamics (SWD) model is applied to a subwatershed in the Blackstone River watershed of Massachusetts to predict potential land use changes and expected runoff and sediment loading. Changes in watershed land use and water resources are evaluated over 100 years at a yearly time step. Results show high potential for rapid urbanization that could result in lowering of groundwater recharge and increased storm water peaks. The watershed faces potential decreases in agricultural and forest area that affect open space and pervious cover of the watershed system. Water quality deteriorated due to increased runoff which can also impact stream morphology. While overland erosion decreased, instream erosion increased from increased runoff from urban areas. Use of urban best management practices (BMPs) in sensitive locations, preventive strategies, and long-term conservation planning will be useful in sustaining the watershed system.
Bayesian Treed Calibration: An Application to Carbon Capture With AX Sorbent

DOE Office of Scientific and Technical Information (OSTI.GOV)

Konomi, Bledar A.; Karagiannis, Georgios; Lai, Kevin

2017-01-02

In cases where field or experimental measurements are not available, computer models can model real physical or engineering systems to reproduce their outcomes. They are usually calibrated in light of experimental data to create a better representation of the real system. Statistical methods, based on Gaussian processes, for calibration and prediction have been especially important when the computer models are expensive and experimental data limited. In this paper, we develop the Bayesian treed calibration (BTC) as an extension of standard Gaussian process calibration methods to deal with non-stationarity computer models and/or their discrepancy from the field (or experimental) data. Ourmore » proposed method partitions both the calibration and observable input space, based on a binary tree partitioning, into sub-regions where existing model calibration methods can be applied to connect a computer model with the real system. The estimation of the parameters in the proposed model is carried out using Markov chain Monte Carlo (MCMC) computational techniques. Different strategies have been applied to improve mixing. We illustrate our method in two artificial examples and a real application that concerns the capture of carbon dioxide with AX amine based sorbents. The source code and the examples analyzed in this paper are available as part of the supplementary materials.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)

Li, Tony Y.; Wechsler, Risa H.; Devaraj, Kiruthika

Intensity mapping, which images a single spectral line from unresolved galaxies across cosmological volumes, is a promising technique for probing the early universe. Here we present predictions for the intensity map and power spectrum of the CO(1–0) line from galaxies atmore » $$z\\sim 2.4$$–2.8, based on a parameterized model for the galaxy–halo connection, and demonstrate the extent to which properties of high-redshift galaxies can be directly inferred from such observations. We find that our fiducial prediction should be detectable by a realistic experiment. Motivated by significant modeling uncertainties, we demonstrate the effect on the power spectrum of varying each parameter in our model. Using simulated observations, we infer constraints on our model parameter space with an MCMC procedure, and show corresponding constraints on the $${L}_{\\mathrm{IR}}$$–$${L}_{\\mathrm{CO}}$$ relation and the CO luminosity function. These constraints would be complementary to current high-redshift galaxy observations, which can detect the brightest galaxies but not complete samples from the faint end of the luminosity function. Furthermore, by probing these populations in aggregate, CO intensity mapping could be a valuable tool for probing molecular gas and its relation to star formation in high-redshift galaxies.« less
Exploring Hot Exoplanet Atmospheres with JWST/NIRSpec and a Hybrid Version of NEMESIS

NASA Astrophysics Data System (ADS)

Badhan, Mahmuda A.; Mandell, Avi; Batalha, Natasha; Irwin, Patrick GJ; Barstow, Joanna; Garland, Ryan; Deming, Drake; Hesman, Brigette E.; Nixon, Conor A.

2016-01-01

Understanding the formation environments and evolution scenarios of hot Jupiters demands robust measures for constraining their atmospheric physical properties and transit observations at unprecedented resolutions. Here we have utilized a combination of two different approaches, Optimal Estimation (OE) and Markov Chain Monte Carlo (MCMC), as part of the extensively validated NEMESIS atmospheric retrieval code, to infer pressure-temperature (P-T) profiles & gas mixing ratios (VMR) of H2O, CO2, CH4 and CO, from a series of tests conducted on JWST/NIRSpec simulations of the dayside thermal emission spectra (secondary eclipse) of H2-dominated hot-Jupiter candidates. To keep the number of parameters low and henceforth retrieve more plausible profile shapes, we have used a parametrized form of the temperature profile based upon the analytic radiative equilibrium derivation in Guillot et al. 2010. For the purpose of testing and validation, we also show some preliminary work on published dataset from the Hubble Space Telescope (HST) and Spitzer missions. Finally, high-temperature (T> 1000K) spectroscopic line lists are slowly but continually being improved by the atmospheric retrieval community. Since this carries the potential of impacting hot Jupiter atmospheric models quite significantly, we compare results from different databases.
A spatio-temporal analysis of BSE cases born before and after the reinforced feed ban in France.

PubMed

Ducrot, Christian; Abrial, David; Calavas, Didier; Carpenter, Tim

2005-01-01

A spatio-temporal analysis was carried out to see how the risk distribution of bovine spongiform encephalopathy (BSE) in France changed depending on the period of birth. The data concerned the 539 BSE cases born in France after the ban (BAB) of meat and bone meal (MBM) in 1990 and detected between July 1, 2001 and December 31, 2003, when the surveillance of BSE was comprehensive. Seventy-two of these cases were born after the reinforced (second) ban (BASB) in 1996, which involved the removal of BSE-risk materials and cadavers from the processing of MBM. The Ederer-Myers-Mantel (EMM) time and space cluster test was applied, after classifying the cases by trimester and region of birth, BAB or BASB status, and dairy or beef status. Then disease mapping was performed for four successive birth periods, three for the BAB cases (January 1991 through June 1994, July 1994 through June 1995, July 1995 through June 1996), and one for the BASB (July 1996 through October 1998). It was elaborated with the Bayesian graphical modelling methods and based on a Poisson distribution with spatial smoothing. The parameters were estimated by a Markov Chain Monte Carlo (MCMC) simulation method. The main finding was that the areas with the highest risk of BSE changed largely from one birth period to another; from the west, it reached the east of France for birth cohort 1994-1995 and the southwest for birth cohort 1995-1996. The EMM test identified a peak risk in this region both for dairy and beef cattle in the fall 1995. The spatial distribution of the risk for the BASB cases matched the spatial pattern of risk for the preceding BAB birth cohort quite well; this was in favour of a common origin of the infection of the BAB and BASB cases, despite the complementary control measures.
Uncertainty analysis of gross primary production partitioned from net ecosystem exchange measurements

NASA Astrophysics Data System (ADS)

Raj, R.; Hamm, N. A. S.; van der Tol, C.; Stein, A.

2015-08-01

Gross primary production (GPP), separated from flux tower measurements of net ecosystem exchange (NEE) of CO2, is used increasingly to validate process-based simulators and remote sensing-derived estimates of simulated GPP at various time steps. Proper validation should include the uncertainty associated with this separation at different time steps. This can be achieved by using a Bayesian framework. In this study, we estimated the uncertainty in GPP at half hourly time steps. We used a non-rectangular hyperbola (NRH) model to separate GPP from flux tower measurements of NEE at the Speulderbos forest site, The Netherlands. The NRH model included the variables that influence GPP, in particular radiation, and temperature. In addition, the NRH model provided a robust empirical relationship between radiation and GPP by including the degree of curvature of the light response curve. Parameters of the NRH model were fitted to the measured NEE data for every 10-day period during the growing season (April to October) in 2009. Adopting a Bayesian approach, we defined the prior distribution of each NRH parameter. Markov chain Monte Carlo (MCMC) simulation was used to update the prior distribution of each NRH parameter. This allowed us to estimate the uncertainty in the separated GPP at half-hourly time steps. This yielded the posterior distribution of GPP at each half hour and allowed the quantification of uncertainty. The time series of posterior distributions thus obtained allowed us to estimate the uncertainty at daily time steps. We compared the informative with non-informative prior distributions of the NRH parameters. The results showed that both choices of prior produced similar posterior distributions GPP. This will provide relevant and important information for the validation of process-based simulators in the future. Furthermore, the obtained posterior distributions of NEE and the NRH parameters are of interest for a range of applications.
Extracting Lane Geometry and Topology Information from Vehicle Fleet Trajectories in Complex Urban Scenarios Using a Reversible Jump Mcmc Method

NASA Astrophysics Data System (ADS)

Roeth, O.; Zaum, D.; Brenner, C.

2017-05-01

Highly automated driving (HAD) requires maps not only of high spatial precision but also of yet unprecedented actuality. Traditionally small highly specialized fleets of measurement vehicles are used to generate such maps. Nevertheless, for achieving city-wide or even nation-wide coverage, automated map update mechanisms based on very large vehicle fleet data gain importance since highly frequent measurements are only to be obtained using such an approach. Furthermore, the processing of imprecise mass data in contrast to few dedicated highly accurate measurements calls for a high degree of automation. We present a method for the generation of lane-accurate road network maps from vehicle trajectory data (GPS or better). Our approach therefore allows for exploiting today's connected vehicle fleets for the generation of HAD maps. The presented algorithm is based on elementary building blocks which guarantees useful lane models and uses a Reversible Jump Markov chain Monte Carlo method to explore the models parameters in order to reconstruct the one most likely emitting the input data. The approach is applied to a challenging urban real-world scenario of different trajectory accuracy levels and is evaluated against a LIDAR-based ground truth map.
RadVel: General toolkit for modeling Radial Velocities

NASA Astrophysics Data System (ADS)

Fulton, Benjamin J.; Petigura, Erik A.; Blunt, Sarah; Sinukoff, Evan

2018-01-01

RadVel models Keplerian orbits in radial velocity (RV) time series. The code is written in Python with a fast Kepler's equation solver written in C. It provides a framework for fitting RVs using maximum a posteriori optimization and computing robust confidence intervals by sampling the posterior probability density via Markov Chain Monte Carlo (MCMC). RadVel can perform Bayesian model comparison and produces publication quality plots and LaTeX tables.
Systemic Console: Advanced analysis of exoplanetary data

NASA Astrophysics Data System (ADS)

Meschiari, Stefano; Wolf, Aaron S.; Rivera, Eugenio; Laughlin, Gregory; Vogt, Steve; Butler, Paul

2012-10-01

Systemic Console is a tool for advanced analysis of exoplanetary data. It comprises a graphical tool for fitting radial velocity and transits datasets and a library of routines for non-interactive calculations. Among its features are interactive plotting of RV curves and transits, combined fitting of RV and transit timing (primary and secondary), interactive periodograms and FAP estimation, and bootstrap and MCMC error estimation. The console package includes public radial velocity and transit data.
Markov Chain Monte Carlo Inversion of Mantle Temperature and Composition, with Application to Iceland

NASA Astrophysics Data System (ADS)

Brown, Eric; Petersen, Kenni; Lesher, Charles

2017-04-01

Basalts are formed by adiabatic decompression melting of the asthenosphere, and thus provide records of the thermal, chemical and dynamical state of the upper mantle. However, uniquely constraining the importance of these factors through the lens of melting is challenging given the inevitability that primary basalts are the product of variable mixing of melts derived from distinct lithologies having different melting behaviors (e.g. peridotite vs. pyroxenite). Forward mantle melting models, such as REEBOX PRO [1], are useful tools in this regard, because they can account for differences in melting behavior and melt pooling processes, and provide estimates of bulk crust composition and volume that can be compared with geochemical and geophysical constraints, respectively. Nevertheless, these models require critical assumptions regarding mantle temperature, and lithologic abundance(s)/composition(s), all of which are poorly constrained. To provide better constraints on these parameters and their uncertainties, we have coupled a Markov Chain Monte Carlo (MCMC) sampling technique with the REEBOX PRO melting model. The MCMC method systematically samples distributions of key REEBOX PRO input parameters (mantle potential temperature, and initial abundances and compositions of the source lithologies) based on a likelihood function that describes the 'fit' of the model outputs (bulk crust composition and volume and end-member peridotite and pyroxenite melts) relative to geochemical and geophysical constraints and their associated uncertainties. As a case study, we have tested and applied the model to magmatism along Reykjanes Peninsula in Iceland, where pyroxenite has been inferred to be present in the mantle source. This locale is ideal because there exist sufficient geochemical and geophysical data to estimate bulk crust compositions and volumes, as well as the range of near-parental melts derived from the mantle. We find that for the case of passive upwelling, the models that best fit the geochemical and geophysical observables require elevated mantle potential temperatures ( 120 °C above ambient mantle), and 5% pyroxenite. The modeled peridotite source has a trace element composition similar to depleted MORB mantle, whereas the trace element composition of the pyroxenite is similar to enriched mid-ocean ridge basalt. These results highlight the promise of this method for efficiently exploring the range of mantle temperatures, lithologic abundances, and mantle source compositions that are most consistent with available observational constraints in individual volcanic systems. 1 Brown and Lesher (2016), G-cubed, 17, 3929-3968
Rotational evolution of slow-rotator sequence stars

NASA Astrophysics Data System (ADS)

Lanzafame, A. C.; Spada, F.

2015-12-01

Context. The observed relationship between mass, age and rotation in open clusters shows the progressive development of a slow-rotator sequence among stars possessing a radiative interior and a convective envelope during their pre-main sequence and main-sequence evolution. After 0.6 Gyr, most cluster members of this type have settled on this sequence. Aims: The observed clustering on this sequence suggests that it corresponds to some equilibrium or asymptotic condition that still lacks a complete theoretical interpretation, and which is crucial to our understanding of the stellar angular momentum evolution. Methods: We couple a rotational evolution model, which takes internal differential rotation into account, with classical and new proposals for the wind braking law, and fit models to the data using a Monte Carlo Markov chain (MCMC) method tailored to the problem at hand. We explore to what extent these models are able to reproduce the mass and time dependence of the stellar rotational evolution on the slow-rotator sequence. Results: The description of the evolution of the slow-rotator sequence requires taking the transfer of angular momentum from the radiative core to the convective envelope into account. We find that, in the mass range 0.85-1.10 M⊙, the core-envelope coupling timescale for stars in the slow-rotator sequence scales as M-7.28. Quasi-solid body rotation is achieved only after 1-2 Gyr, depending on stellar mass, which implies that observing small deviations from the Skumanich law (P ∝ √{t}) would require period data of older open clusters than is available to date. The observed evolution in the 0.1-2.5 Gyr age range and in the 0.85-1.10 M⊙ mass range is best reproduced by assuming an empirical mass dependence of the wind angular momentum loss proportional to the convective turnover timescale and to the stellar moment of inertia. Period isochrones based on our MCMC fit provide a tool for inferring stellar ages of solar-like main-sequence stars from their mass and rotation period that is largely independent of the wind braking model adopted. These effectively represent gyro-chronology relationships that take the physics of the two-zone model for the stellar angular momentum evolution into account.
Estimating the Attack Ratio of Dengue Epidemics under Time-varying Force of Infection using Aggregated Notification Data

NASA Astrophysics Data System (ADS)

Coelho, Flavio Codeço; Carvalho, Luiz Max De

2015-12-01

Quantifying the attack ratio of disease is key to epidemiological inference and public health planning. For multi-serotype pathogens, however, different levels of serotype-specific immunity make it difficult to assess the population at risk. In this paper we propose a Bayesian method for estimation of the attack ratio of an epidemic and the initial fraction of susceptibles using aggregated incidence data. We derive the probability distribution of the effective reproductive number, Rt, and use MCMC to obtain posterior distributions of the parameters of a single-strain SIR transmission model with time-varying force of infection. Our method is showcased in a data set consisting of 18 years of dengue incidence in the city of Rio de Janeiro, Brazil. We demonstrate that it is possible to learn about the initial fraction of susceptibles and the attack ratio even in the absence of serotype specific data. On the other hand, the information provided by this approach is limited, stressing the need for detailed serological surveys to characterise the distribution of serotype-specific immunity in the population.
Efficient Bayesian experimental design for contaminant source identification

NASA Astrophysics Data System (ADS)

Zhang, Jiangjiang; Zeng, Lingzao; Chen, Cheng; Chen, Dingjiang; Wu, Laosheng

2015-01-01

In this study, an efficient full Bayesian approach is developed for the optimal sampling well location design and source parameters identification of groundwater contaminants. An information measure, i.e., the relative entropy, is employed to quantify the information gain from concentration measurements in identifying unknown parameters. In this approach, the sampling locations that give the maximum expected relative entropy are selected as the optimal design. After the sampling locations are determined, a Bayesian approach based on Markov Chain Monte Carlo (MCMC) is used to estimate unknown parameters. In both the design and estimation, the contaminant transport equation is required to be solved many times to evaluate the likelihood. To reduce the computational burden, an interpolation method based on the adaptive sparse grid is utilized to construct a surrogate for the contaminant transport equation. The approximated likelihood can be evaluated directly from the surrogate, which greatly accelerates the design and estimation process. The accuracy and efficiency of our approach are demonstrated through numerical case studies. It is shown that the methods can be used to assist in both single sampling location and monitoring network design for contaminant source identifications in groundwater.
Uncertainty quantification for environmental models

USGS Publications Warehouse

Hill, Mary C.; Lu, Dan; Kavetski, Dmitri; Clark, Martyn P.; Ye, Ming

2012-01-01

Environmental models are used to evaluate the fate of fertilizers in agricultural settings (including soil denitrification), the degradation of hydrocarbons at spill sites, and water supply for people and ecosystems in small to large basins and cities—to mention but a few applications of these models. They also play a role in understanding and diagnosing potential environmental impacts of global climate change. The models are typically mildly to extremely nonlinear. The persistent demand for enhanced dynamics and resolution to improve model realism [17] means that lengthy individual model execution times will remain common, notwithstanding continued enhancements in computer power. In addition, high-dimensional parameter spaces are often defined, which increases the number of model runs required to quantify uncertainty [2]. Some environmental modeling projects have access to extensive funding and computational resources; many do not. The many recent studies of uncertainty quantification in environmental model predictions have focused on uncertainties related to data error and sparsity of data, expert judgment expressed mathematically through prior information, poorly known parameter values, and model structure (see, for example, [1,7,9,10,13,18]). Approaches for quantifying uncertainty include frequentist (potentially with prior information [7,9]), Bayesian [13,18,19], and likelihood-based. A few of the numerous methods, including some sensitivity and inverse methods with consequences for understanding and quantifying uncertainty, are as follows: Bayesian hierarchical modeling and Bayesian model averaging; single-objective optimization with error-based weighting [7] and multi-objective optimization [3]; methods based on local derivatives [2,7,10]; screening methods like OAT (one at a time) and the method of Morris [14]; FAST (Fourier amplitude sensitivity testing) [14]; the Sobol' method [14]; randomized maximum likelihood [10]; Markov chain Monte Carlo (MCMC) [10]. There are also bootstrapping and cross-validation approaches.Sometimes analyses are conducted using surrogate models [12]. The availability of so many options can be confusing. Categorizing methods based on fundamental questions assists in communicating the essential results of uncertainty analyses to stakeholders. Such questions can focus on model adequacy (e.g., How well does the model reproduce observed system characteristics and dynamics?) and sensitivity analysis (e.g., What parameters can be estimated with available data? What observations are important to parameters and predictions? What parameters are important to predictions?), as well as on the uncertainty quantification (e.g., How accurate and precise are the predictions?). The methods can also be classified by the number of model runs required: few (10s to 1000s) or many (10,000s to 1,000,000s). Of the methods listed above, the most computationally frugal are generally those based on local derivatives; MCMC methods tend to be among the most computationally demanding. Surrogate models (emulators)do not necessarily produce computational frugality because many runs of the full model are generally needed to create a meaningful surrogate model. With this categorization, we can, in general, address all the fundamental questions mentioned above using either computationally frugal or demanding methods. Model development and analysis can thus be conducted consistently using either computation-ally frugal or demanding methods; alternatively, different fundamental questions can be addressed using methods that require different levels of effort. Based on this perspective, we pose the question: Can computationally frugal methods be useful companions to computationally demanding meth-ods? The reliability of computationally frugal methods generally depends on the model being reasonably linear, which usually means smooth nonlin-earities and the assumption of Gaussian errors; both tend to be more valid with more linear
Medical imaging feasibility in body fluids using Markov chains

NASA Astrophysics Data System (ADS)

Kavehrad, M.; Armstrong, A. D.

2017-02-01

A relatively wide field-of-view and high resolution imaging is necessary for navigating the scope within the body, inspecting tissue, diagnosing disease, and guiding surgical interventions. As the large number of modes available in the multimode fibers (MMF) provides higher resolution, MMFs could replace the millimeters-thick bundles of fibers and lenses currently used in endoscopes. However, attributes of body fluids and obscurants such as blood, impose perennial limitations on resolution and reliability of optical imaging inside human body. To design and evaluate optimum imaging techniques that operate under realistic body fluids conditions, a good understanding of the channel (medium) behavior is necessary. In most prior works, Monte-Carlo Ray Tracing (MCRT) algorithm has been used to analyze the channel behavior. This task is quite numerically intensive. The focus of this paper is on investigating the possibility of simplifying this task by a direct extraction of state transition matrices associated with standard Markov modeling from the MCRT computer simulations programs. We show that by tracing a photon's trajectory in the body fluids via a Markov chain model, the angular distribution can be calculated by simple matrix multiplications. We also demonstrate that the new approach produces result that are close to those obtained by MCRT and other known methods. Furthermore, considering the fact that angular, spatial, and temporal distributions of energy are inter-related, mixing time of Monte- Carlo Markov Chain (MCMC) for different types of liquid concentrations is calculated based on Eigen-analysis of the state transition matrix and possibility of imaging in scattering media are investigated. To this end, we have started to characterize the body fluids that reduce the resolution of imaging [1].
Inferring the Spatio-temporal Patterns of Dengue Transmission from Surveillance Data in Guangzhou, China

PubMed Central

Liu, Jiming; Tan, Qi; Shi, Benyun

2016-01-01

Background Dengue is a serious vector-borne disease, and incidence rates have significantly increased during the past few years, particularly in 2014 in Guangzhou. The current situation is more complicated, due to various factors such as climate warming, urbanization, population increase, and human mobility. The purpose of this study is to detect dengue transmission patterns and identify the disease dispersion dynamics in Guangzhou, China. Methodology We conducted surveys in 12 districts of Guangzhou, and collected daily data of Breteau index (BI) and reported cases between September and November 2014 from the public health authority reports. Based on the available data and the Ross-Macdonald theory, we propose a dengue transmission model that systematically integrates entomologic, demographic, and environmental information. In this model, we use (1) BI data and geographic variables to evaluate the spatial heterogeneities of Aedes mosquitoes, (2) a radiation model to simulate the daily mobility of humans, and (3) a Markov chain Monte Carlo (MCMC) method to estimate the model parameters. Results/Conclusions By implementing our proposed model, we can (1) estimate the incidence rates of dengue, and trace the infection time and locations, (2) assess risk factors and evaluate the infection threat in a city, and (3) evaluate the primary diffusion process in different districts. From the results, we can see that dengue infections exhibited a spatial and temporal variation during 2014 in Guangzhou. We find that urbanization, vector activities, and human behavior play significant roles in shaping the dengue outbreak and the patterns of its spread. This study offers useful information on dengue dynamics, which can help policy makers improve control and prevention measures. PMID:27105350
OCO-2 Column Carbon Dioxide and Biometric Data Jointly Constrain Parameterization and Projection of a Global Land Model

NASA Astrophysics Data System (ADS)

Shi, Z.; Crowell, S.; Luo, Y.; Rayner, P. J.; Moore, B., III

2015-12-01

Uncertainty in predicted carbon-climate feedback largely stems from poor parameterization of global land models. However, calibration of global land models with observations has been extremely challenging at least for two reasons. First we lack global data products from systematical measurements of land surface processes. Second, computational demand is insurmountable for estimation of model parameter due to complexity of global land models. In this project, we will use OCO-2 retrievals of dry air mole fraction XCO2 and solar induced fluorescence (SIF) to independently constrain estimation of net ecosystem exchange (NEE) and gross primary production (GPP). The constrained NEE and GPP will be combined with data products of global standing biomass, soil organic carbon and soil respiration to improve the community land model version 4.5 (CLM4.5). Specifically, we will first develop a high fidelity emulator of CLM4.5 according to the matrix representation of the terrestrial carbon cycle. It has been shown that the emulator fully represents the original model and can be effectively used for data assimilation to constrain parameter estimation. We will focus on calibrating those key model parameters (e.g., maximum carboxylation rate, turnover time and transfer coefficients of soil carbon pools, and temperature sensitivity of respiration) for carbon cycle. The Bayesian Markov chain Monte Carlo method (MCMC) will be used to assimilate the global databases into the high fidelity emulator to constrain the model parameters, which will be incorporated back to the original CLM4.5. The calibrated CLM4.5 will be used to make scenario-based projections. In addition, we will conduct observing system simulation experiments (OSSEs) to evaluate how the sampling frequency and length could affect the model constraining and prediction.
Mapping malaria incidence distribution that accounts for environmental factors in Maputo Province - Mozambique

PubMed Central

2010-01-01

Background The objective was to study if an association exists between the incidence of malaria and some weather parameters in tropical Maputo province, Mozambique. Methods A Bayesian hierarchical model to malaria count data aggregated at district level over a two years period is formulated. This model made it possible to account for spatial area variations. The model was extended to include environmental covariates temperature and rainfall. Study period was then divided into two climate conditions: rainy and dry seasons. The incidences of malaria between the two seasons were compared. Parameter estimation and inference were carried out using MCMC simulation techniques based on Poisson variation. Model comparisons are made using DIC. Results For winter season, in 2001 the temperature covariate with estimated value of -8.88 shows no association to malaria incidence. In year 2002, the parameter estimation of the same covariate resulted in 5.498 of positive level of association. In both years rainfall covariate determines no dependency to malaria incidence. Malaria transmission is higher in wet season with both covariates positively related to malaria with posterior means 1.99 and 2.83 in year 2001. For 2002 only temperature is associated to malaria incidence with estimated value 2.23. Conclusions The incidence of malaria in year 2001, presents an independent spatial pattern for temperature in summer and for rainfall in winter seasons respectively. In year 2002 temperature determines the spatial pattern of malaria incidence in the region. Temperature influences the model in cases where both covariates are introduced in winter and summer season. Its influence is extended to the summer model with temperature covariate only. It is reasonable to state that with the occurrence of high temperatures, malaria incidence had certainly escalated in this year. PMID:20302674
Nonstationary Extreme Value Analysis in a Changing Climate: A Software Package

NASA Astrophysics Data System (ADS)

Cheng, L.; AghaKouchak, A.; Gilleland, E.

2013-12-01

Numerous studies show that climatic extremes have increased substantially in the second half of the 20th century. For this reason, analysis of extremes under a nonstationary assumption has received a great deal of attention. This paper presents a software package developed for estimation of return levels, return periods, and risks of climatic extremes in a changing climate. This MATLAB software package offers tools for analysis of climate extremes under both stationary and non-stationary assumptions. The Nonstationary Extreme Value Analysis (hereafter, NEVA) provides an efficient and generalized framework for analyzing extremes using Bayesian inference. NEVA estimates the extreme value parameters using a Differential Evolution Markov Chain (DE-MC) which utilizes the genetic algorithm Differential Evolution (DE) for global optimization over the real parameter space with the Markov Chain Monte Carlo (MCMC) approach and has the advantage of simplicity, speed of calculation and convergence over conventional MCMC. NEVA also offers the confidence interval and uncertainty bounds of estimated return levels based on the sampled parameters. NEVA integrates extreme value design concepts, data analysis tools, optimization and visualization, explicitly designed to facilitate analysis extremes in geosciences. The generalized input and output files of this software package make it attractive for users from across different fields. Both stationary and nonstationary components of the package are validated for a number of case studies using empirical return levels. The results show that NEVA reliably describes extremes and their return levels.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Stroeer, Alexander; Veitch, John

The Laser Interferometer Space Antenna (LISA) defines new demands on data analysis efforts in its all-sky gravitational wave survey, recording simultaneously thousands of galactic compact object binary foreground sources and tens to hundreds of background sources like binary black hole mergers and extreme-mass ratio inspirals. We approach this problem with an adaptive and fully automatic Reversible Jump Markov Chain Monte Carlo sampler, able to sample from the joint posterior density function (as established by Bayes theorem) for a given mixture of signals ''out of the box'', handling the total number of signals as an additional unknown parameter beside the unknownmore » parameters of each individual source and the noise floor. We show in examples from the LISA Mock Data Challenge implementing the full response of LISA in its TDI description that this sampler is able to extract monochromatic Double White Dwarf signals out of colored instrumental noise and additional foreground and background noise successfully in a global fitting approach. We introduce 2 examples with fixed number of signals (MCMC sampling), and 1 example with unknown number of signals (RJ-MCMC), the latter further promoting the idea behind an experimental adaptation of the model indicator proposal densities in the main sampling stage. We note that the experienced runtimes and degeneracies in parameter extraction limit the shown examples to the extraction of a low but realistic number of signals.« less
Parameter Identification and Uncertainty Analysis for Visual MODFLOW based Groundwater Flow Model in a Small River Basin, Eastern India

NASA Astrophysics Data System (ADS)

Jena, S.

2015-12-01

The overexploitation of groundwater resulted in abandoning many shallow tube wells in the river Basin in Eastern India. For the sustainability of groundwater resources, basin-scale modelling of groundwater flow is essential for the efficient planning and management of the water resources. The main intent of this study is to develope a 3-D groundwater flow model of the study basin using the Visual MODFLOW package and successfully calibrate and validate it using 17 years of observed data. The sensitivity analysis was carried out to quantify the susceptibility of aquifer system to the river bank seepage, recharge from rainfall and agriculture practices, horizontal and vertical hydraulic conductivities, and specific yield. To quantify the impact of parameter uncertainties, Sequential Uncertainty Fitting Algorithm (SUFI-2) and Markov chain Monte Carlo (MCMC) techniques were implemented. Results from the two techniques were compared and the advantages and disadvantages were analysed. Nash-Sutcliffe coefficient (NSE) and coefficient of determination (R2) were adopted as two criteria during calibration and validation of the developed model. NSE and R2 values of groundwater flow model for calibration and validation periods were in acceptable range. Also, the MCMC technique was able to provide more reasonable results than SUFI-2. The calibrated and validated model will be useful to identify the aquifer properties, analyse the groundwater flow dynamics and the change in groundwater levels in future forecasts.
Bayesian inversions of a dynamic vegetation model in four European grassland sites

NASA Astrophysics Data System (ADS)

Minet, J.; Laloy, E.; Tychon, B.; François, L.

2015-01-01

Eddy covariance data from four European grassland sites are used to probabilistically invert the CARAIB dynamic vegetation model (DVM) with ten unknown parameters, using the DREAM(ZS) Markov chain Monte Carlo (MCMC) sampler. We compare model inversions considering both homoscedastic and heteroscedastic eddy covariance residual errors, with variances either fixed a~priori or jointly inferred with the model parameters. Agreements between measured and simulated data during calibration are comparable with previous studies, with root-mean-square error (RMSE) of simulated daily gross primary productivity (GPP), ecosystem respiration (RECO) and evapotranspiration (ET) ranging from 1.73 to 2.19 g C m-2 day-1, 1.04 to 1.56 g C m-2 day-1, and 0.50 to 1.28 mm day-1, respectively. In validation, mismatches between measured and simulated data are larger, but still with Nash-Sutcliffe efficiency scores above 0.5 for three out of the four sites. Although measurement errors associated with eddy covariance data are known to be heteroscedastic, we showed that assuming a classical linear heteroscedastic model of the residual errors in the inversion do not fully remove heteroscedasticity. Since the employed heteroscedastic error model allows for larger deviations between simulated and measured data as the magnitude of the measured data increases, this error model expectedly lead to poorer data fitting compared to inversions considering a constant variance of the residual errors. Furthermore, sampling the residual error variances along with model parameters results in overall similar model parameter posterior distributions as those obtained by fixing these variances beforehand, while slightly improving model performance. Despite the fact that the calibrated model is generally capable of fitting the data within measurement errors, systematic bias in the model simulations are observed. These are likely due to model inadequacies such as shortcomings in the photosynthesis modelling. Besides model behaviour, difference between model parameter posterior distributions among the four grassland sites are also investigated. It is shown that the marginal distributions of the specific leaf area and characteristic mortality time parameters can be explained by site-specific ecophysiological characteristics. Lastly, the possibility of finding a common set of parameters among the four experimental sites is discussed.
Frequency assessment of spatially distributed generations of flood scenarios: an application on Italian territory

NASA Astrophysics Data System (ADS)

Lomazzi, M.; Roth, G.; Rudari, R.; Taramasso, A. C.; Ghizzoni, T.; Benedetti, R.; Espa, G.; Terpessi, C.

2009-12-01

The flooding risk impact on society cannot be understated: it influences land use and territorial planning and development at both physical and regulatory levels. To cope with it, a variety of actions can be put in place, involving multidisciplinary competences. Mitigation measures goes from the improvement of monitoring systems to the development of hydraulic structures, throughout land use restrictions, civil protection and insurance plans. All of those options present social and economic impacts, either positive or negative, whose proper estimate should rely on the assumption of appropriate - present and future - scenarios, i.e. quantitative event descriptions in terms of i) the flood hazard, with its probability of occurrence, extension, intensity, and duration, ii) the exposed values and iii) their vulnerability. At present, initial attention has been devoted to the design of flood scenarios, or ensembles of them, and to the evaluation of their frequency of occurrence. In the present work, a model for spatially distributed flood scenarios generation and frequency assessment is proposed and applied to the Italian territory. The study area has been divided into homogeneous regions according to their hydrologic, orographic and meteoclimatic characteristics. A statistical model for flood scenarios simulation has been implemented throughout a conditional approach based on MCMC simulations by using i) a historical flood events catalogue; ii) a homogeneous regions correlation matrix; and iii) an auxiliary variables data set. In this framework, the role of the information stored in the historical flood events catalogue "Aree Vulnerate Italiane" (AVI, http://avi.gndci.cnr.it/), produced by the Italian National Research Council, is of crucial importance.
Robust Bayesian Analysis of Heavy-tailed Stochastic Volatility Models using Scale Mixtures of Normal Distributions

PubMed Central

Abanto-Valle, C. A.; Bandyopadhyay, D.; Lachos, V. H.; Enriquez, I.

2009-01-01

A Bayesian analysis of stochastic volatility (SV) models using the class of symmetric scale mixtures of normal (SMN) distributions is considered. In the face of non-normality, this provides an appealing robust alternative to the routine use of the normal distribution. Specific distributions examined include the normal, student-t, slash and the variance gamma distributions. Using a Bayesian paradigm, an efficient Markov chain Monte Carlo (MCMC) algorithm is introduced for parameter estimation. Moreover, the mixing parameters obtained as a by-product of the scale mixture representation can be used to identify outliers. The methods developed are applied to analyze daily stock returns data on S&P500 index. Bayesian model selection criteria as well as out-of- sample forecasting results reveal that the SV models based on heavy-tailed SMN distributions provide significant improvement in model fit as well as prediction to the S&P500 index data over the usual normal model. PMID:20730043
Observational constraints on holographic dark energy with varying gravitational constant

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lu, Jianbo; Xu, Lixin; Saridakis, Emmanuel N.

2010-03-01

We use observational data from Type Ia Supernovae (SN), Baryon Acoustic Oscillations (BAO), Cosmic Microwave Background (CMB) and observational Hubble data (OHD), and the Markov Chain Monte Carlo (MCMC) method, to constrain the cosmological scenario of holographic dark energy with varying gravitational constant. We consider both flat and non-flat background geometry, and we present the corresponding constraints and contour-plots of the model parameters. We conclude that the scenario is compatible with observations. In 1σ we find Ω{sub Λ0} = 0.72{sup +0.03}{sub −0.03}, Ω{sub k0} = −0.0013{sup +0.0130}{sub −0.0040}, c = 0.80{sup +0.19}{sub −0.14} and Δ{sub G}≡G'/G = −0.0025{sup +0.0080}{sub −0.0050},more » while for the present value of the dark energy equation-of-state parameter we obtain w{sub 0} = −1.04{sup +0.15}{sub −0.20}.« less
Bayesian pedigree inference with small numbers of single nucleotide polymorphisms via a factor-graph representation.

PubMed

Anderson, Eric C; Ng, Thomas C

2016-02-01

We develop a computational framework for addressing pedigree inference problems using small numbers (80-400) of single nucleotide polymorphisms (SNPs). Our approach relaxes the assumptions, which are commonly made, that sampling is complete with respect to the pedigree and that there is no genotyping error. It relies on representing the inferred pedigree as a factor graph and invoking the Sum-Product algorithm to compute and store quantities that allow the joint probability of the data to be rapidly computed under a large class of rearrangements of the pedigree structure. This allows efficient MCMC sampling over the space of pedigrees, and, hence, Bayesian inference of pedigree structure. In this paper we restrict ourselves to inference of pedigrees without loops using SNPs assumed to be unlinked. We present the methodology in general for multigenerational inference, and we illustrate the method by applying it to the inference of full sibling groups in a large sample (n=1157) of Chinook salmon typed at 95 SNPs. The results show that our method provides a better point estimate and estimate of uncertainty than the currently best-available maximum-likelihood sibling reconstruction method. Extensions of this work to more complex scenarios are briefly discussed. Published by Elsevier Inc.
Assessment of Agricultural Water Management in Punjab, India using Bayesian Methods

NASA Astrophysics Data System (ADS)

Russo, T. A.; Devineni, N.; Lall, U.; Sidhu, R.

2013-12-01

The success of the Green Revolution in Punjab, India is threatened by the declining water table (approx. 1 m/yr). Punjab, a major agricultural supplier for the rest of India, supports irrigation with a canal system and groundwater, which is vastly over-exploited. Groundwater development in many districts is greater than 200% the annual recharge rate. The hydrologic data required to complete a mass-balance model are not available for this region, therefore we use Bayesian methods to estimate hydrologic properties and irrigation requirements. Using the known values of precipitation, total canal water delivery, crop yield, and water table elevation, we solve for each unknown parameter (often a coefficient) using a Markov chain Monte Carlo (MCMC) algorithm. Results provide regional estimates of irrigation requirements and groundwater recharge rates under observed climate conditions (1972 to 2002). Model results are used to estimate future water availability and demand to help inform agriculture management decisions under projected climate conditions. We find that changing cropping patterns for the region can maintain food production while balancing groundwater pumping with natural recharge. This computational method can be applied in data-scarce regions across the world, where agricultural water management is required to resolve competition between food security and changing resource availability.
Receptive Field Inference with Localized Priors

PubMed Central

Park, Mijung; Pillow, Jonathan W.

2011-01-01

The linear receptive field describes a mapping from sensory stimuli to a one-dimensional variable governing a neuron's spike response. However, traditional receptive field estimators such as the spike-triggered average converge slowly and often require large amounts of data. Bayesian methods seek to overcome this problem by biasing estimates towards solutions that are more likely a priori, typically those with small, smooth, or sparse coefficients. Here we introduce a novel Bayesian receptive field estimator designed to incorporate locality, a powerful form of prior information about receptive field structure. The key to our approach is a hierarchical receptive field model that flexibly adapts to localized structure in both spacetime and spatiotemporal frequency, using an inference method known as empirical Bayes. We refer to our method as automatic locality determination (ALD), and show that it can accurately recover various types of smooth, sparse, and localized receptive fields. We apply ALD to neural data from retinal ganglion cells and V1 simple cells, and find it achieves error rates several times lower than standard estimators. Thus, estimates of comparable accuracy can be achieved with substantially less data. Finally, we introduce a computationally efficient Markov Chain Monte Carlo (MCMC) algorithm for fully Bayesian inference under the ALD prior, yielding accurate Bayesian confidence intervals for small or noisy datasets. PMID:22046110
The Search for Hydrologic Signatures: The Effect of Data Transformations on Bayesian Model Calibration

NASA Astrophysics Data System (ADS)

Sadegh, M.; Vrugt, J. A.

2011-12-01

In the past few years, several contributions have begun to appear in the hydrologic literature that introduced and analyzed the benefits of using a signature based approach to watershed analysis. This signature-based approach abandons the standard single criteria model-data fitting paradigm in favor of a diagnostic approach that better extracts the available information from the available data. Despite the prospects of this new viewpoint, rather ad-hoc criteria have hitherto been proposed to improve watershed modeling. Here, we aim to provide a proper mathematical foundation to signature based analysis. We analyze the information content of different data transformation by analyzing their convergence speed with Markov Chain Monte Carlo (MCMC) simulation using the Generalized Likelihood function of Schousp and Vrugt (2010). We compare the information content of the original discharge data against a simple square root and Box-Cox transformation of the streamflow data. We benchmark these results against wavelet and flow duration curve transformations that temporally disaggregate the discharge data. Our results conclusive demonstrate that wavelet transformations and flow duration curves significantly reduce the information content of the streamflow data and consequently unnecessarily increase the uncertainty of the HYMOD model parameters. Hydrologic signatures thus need to be found in the original data, without temporal disaggregation.
Modelling HIV/AIDS Epidemic among Men Who Have Sex with Men in China

PubMed Central

Sun, Xiaodan; Xiao, Yanni; Peng, Zhihang; Wang, Ning

2013-01-01

A compartmental model with antiviral therapy was proposed to identify the important factors that influence HIV infection among gay men in China and suggest some effective control strategies. We proved that the disease will be eradicated if the reproduction number is less than one. Based on the number of annual reported HIV/AIDS among MSM we used the Markov-Chain Monte-Carlo (MCMC) simulation to estimate the unknown parameters. We estimated a mean reproduction number of 3.88 (95% CI: 3.69–4.07). The estimation results showed that there were a higher transmission rate and a lower diagnose rate among MSM than those for another high-risk population. We compared the current treatment policy and immediate therapy once people are diagnosed with HIV, and numerical studies indicated that immediate antiviral therapy would lead to few HIV new infections conditional upon relatively low infectiousness; otherwise the current treatment policy would result in low HIV new infection. Further, increasing treatment coverage rate may lead to decline in HIV new infections and be beneficial to disease control, depending on the infectiousness of the infected individuals with antiviral therapy. The finding suggested that treatment efficacy (directly affecting infectiousness), behavior changes, and interventions greatly affect HIV new infection; strengthening intensity will contribute to the disease control. PMID:24195071
Analysis of Blood Transfusion Data Using Bivariate Zero-Inflated Poisson Model: A Bayesian Approach.

PubMed

Mohammadi, Tayeb; Kheiri, Soleiman; Sedehi, Morteza

2016-01-01

Recognizing the factors affecting the number of blood donation and blood deferral has a major impact on blood transfusion. There is a positive correlation between the variables "number of blood donation" and "number of blood deferral": as the number of return for donation increases, so does the number of blood deferral. On the other hand, due to the fact that many donors never return to donate, there is an extra zero frequency for both of the above-mentioned variables. In this study, in order to apply the correlation and to explain the frequency of the excessive zero, the bivariate zero-inflated Poisson regression model was used for joint modeling of the number of blood donation and number of blood deferral. The data was analyzed using the Bayesian approach applying noninformative priors at the presence and absence of covariates. Estimating the parameters of the model, that is, correlation, zero-inflation parameter, and regression coefficients, was done through MCMC simulation. Eventually double-Poisson model, bivariate Poisson model, and bivariate zero-inflated Poisson model were fitted on the data and were compared using the deviance information criteria (DIC). The results showed that the bivariate zero-inflated Poisson regression model fitted the data better than the other models.
Analysis of Blood Transfusion Data Using Bivariate Zero-Inflated Poisson Model: A Bayesian Approach

PubMed Central

Mohammadi, Tayeb; Sedehi, Morteza

2016-01-01

Recognizing the factors affecting the number of blood donation and blood deferral has a major impact on blood transfusion. There is a positive correlation between the variables “number of blood donation” and “number of blood deferral”: as the number of return for donation increases, so does the number of blood deferral. On the other hand, due to the fact that many donors never return to donate, there is an extra zero frequency for both of the above-mentioned variables. In this study, in order to apply the correlation and to explain the frequency of the excessive zero, the bivariate zero-inflated Poisson regression model was used for joint modeling of the number of blood donation and number of blood deferral. The data was analyzed using the Bayesian approach applying noninformative priors at the presence and absence of covariates. Estimating the parameters of the model, that is, correlation, zero-inflation parameter, and regression coefficients, was done through MCMC simulation. Eventually double-Poisson model, bivariate Poisson model, and bivariate zero-inflated Poisson model were fitted on the data and were compared using the deviance information criteria (DIC). The results showed that the bivariate zero-inflated Poisson regression model fitted the data better than the other models. PMID:27703493
Estimating thermal performance curves from repeated field observations

USGS Publications Warehouse

Childress, Evan; Letcher, Benjamin H.

2017-01-01

Estimating thermal performance of organisms is critical for understanding population distributions and dynamics and predicting responses to climate change. Typically, performance curves are estimated using laboratory studies to isolate temperature effects, but other abiotic and biotic factors influence temperature-performance relationships in nature reducing these models' predictive ability. We present a model for estimating thermal performance curves from repeated field observations that includes environmental and individual variation. We fit the model in a Bayesian framework using MCMC sampling, which allowed for estimation of unobserved latent growth while propagating uncertainty. Fitting the model to simulated data varying in sampling design and parameter values demonstrated that the parameter estimates were accurate, precise, and unbiased. Fitting the model to individual growth data from wild trout revealed high out-of-sample predictive ability relative to laboratory-derived models, which produced more biased predictions for field performance. The field-based estimates of thermal maxima were lower than those based on laboratory studies. Under warming temperature scenarios, field-derived performance models predicted stronger declines in body size than laboratory-derived models, suggesting that laboratory-based models may underestimate climate change effects. The presented model estimates true, realized field performance, avoiding assumptions required for applying laboratory-based models to field performance, which should improve estimates of performance under climate change and advance thermal ecology.
Connecting CO intensity mapping to molecular gas and star formation in the epoch of galaxy assembly

DOE PAGES

Li, Tony Y.; Wechsler, Risa H.; Devaraj, Kiruthika; ...

2016-01-29

Intensity mapping, which images a single spectral line from unresolved galaxies across cosmological volumes, is a promising technique for probing the early universe. Here we present predictions for the intensity map and power spectrum of the CO(1–0) line from galaxies atmore » $$z\\sim 2.4$$–2.8, based on a parameterized model for the galaxy–halo connection, and demonstrate the extent to which properties of high-redshift galaxies can be directly inferred from such observations. We find that our fiducial prediction should be detectable by a realistic experiment. Motivated by significant modeling uncertainties, we demonstrate the effect on the power spectrum of varying each parameter in our model. Using simulated observations, we infer constraints on our model parameter space with an MCMC procedure, and show corresponding constraints on the $${L}_{\\mathrm{IR}}$$–$${L}_{\\mathrm{CO}}$$ relation and the CO luminosity function. These constraints would be complementary to current high-redshift galaxy observations, which can detect the brightest galaxies but not complete samples from the faint end of the luminosity function. Furthermore, by probing these populations in aggregate, CO intensity mapping could be a valuable tool for probing molecular gas and its relation to star formation in high-redshift galaxies.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)

Zheng, Xiaogang; Biesiada, Marek; Cao, Shuo

A new compilation of 012 angular-size/redshift data for compact radio quasars from very-long-baseline interferometry (VLBI) surveys motivates us to revisit the interaction between dark energy and dark matter with these probes reaching high redshifts z ∼ 3.0. In this paper, we investigate observational constraints on different phenomenological interacting dark energy (IDE) models with the intermediate-luminosity radio quasars acting as individual standard rulers, combined with the newest BAO and CMB observation from Planck results acting as statistical rulers. The results obtained from the MCMC method and other statistical methods including figure of Merit and Information Criteria show that: (1) Compared withmore » the current standard candle data and standard clock data, the intermediate-luminosity radio quasar standard rulers , probing much higher redshifts, could provide comparable constraints on different IDE scenarios. (2) The strong degeneracies between the interaction term and Hubble constant may contribute to alleviate the tension of H {sub 0} between the recent Planck and HST measurements. (3) Concerning the ranking of competing dark energy models, IDE with more free parameters are substantially penalized by the BIC criterion, which agrees very well with the previous results derived from other cosmological probes.« less
Appraisal of jump distributions in ensemble-based sampling algorithms

NASA Astrophysics Data System (ADS)

Dejanic, Sanda; Scheidegger, Andreas; Rieckermann, Jörg; Albert, Carlo

2017-04-01

Sampling Bayesian posteriors of model parameters is often required for making model-based probabilistic predictions. For complex environmental models, standard Monte Carlo Markov Chain (MCMC) methods are often infeasible because they require too many sequential model runs. Therefore, we focused on ensemble methods that use many Markov chains in parallel, since they can be run on modern cluster architectures. Little is known about how to choose the best performing sampler, for a given application. A poor choice can lead to an inappropriate representation of posterior knowledge. We assessed two different jump moves, the stretch and the differential evolution move, underlying, respectively, the software packages EMCEE and DREAM, which are popular in different scientific communities. For the assessment, we used analytical posteriors with features as they often occur in real posteriors, namely high dimensionality, strong non-linear correlations or multimodality. For posteriors with non-linear features, standard convergence diagnostics based on sample means can be insufficient. Therefore, we resorted to an entropy-based convergence measure. We assessed the samplers by means of their convergence speed, robustness and effective sample sizes. For posteriors with strongly non-linear features, we found that the stretch move outperforms the differential evolution move, w.r.t. all three aspects.
Strong Ligand-Protein Interactions Derived from Diffuse Ligand Interactions with Loose Binding Sites.

PubMed

Marsh, Lorraine

2015-01-01

Many systems in biology rely on binding of ligands to target proteins in a single high-affinity conformation with a favorable ΔG. Alternatively, interactions of ligands with protein regions that allow diffuse binding, distributed over multiple sites and conformations, can exhibit favorable ΔG because of their higher entropy. Diffuse binding may be biologically important for multidrug transporters and carrier proteins. A fine-grained computational method for numerical integration of total binding ΔG arising from diffuse regional interaction of a ligand in multiple conformations using a Markov Chain Monte Carlo (MCMC) approach is presented. This method yields a metric that quantifies the influence on overall ligand affinity of ligand binding to multiple, distinct sites within a protein binding region. This metric is essentially a measure of dispersion in equilibrium ligand binding and depends on both the number of potential sites of interaction and the distribution of their individual predicted affinities. Analysis of test cases indicates that, for some ligand/protein pairs involving transporters and carrier proteins, diffuse binding contributes greatly to total affinity, whereas in other cases the influence is modest. This approach may be useful for studying situations where "nonspecific" interactions contribute to biological function.
Assessment of myocardial metabolic rate of glucose by means of Bayesian ICA and Markov Chain Monte Carlo methods in small animal PET imaging

NASA Astrophysics Data System (ADS)

Berradja, Khadidja; Boughanmi, Nabil

2016-09-01

In dynamic cardiac PET FDG studies the assessment of myocardial metabolic rate of glucose (MMRG) requires the knowledge of the blood input function (IF). IF can be obtained by manual or automatic blood sampling and cross calibrated with PET. These procedures are cumbersome, invasive and generate uncertainties. The IF is contaminated by spillover of radioactivity from the adjacent myocardium and this could cause important error in the estimated MMRG. In this study, we show that the IF can be extracted from the images in a rat heart study with 18F-fluorodeoxyglucose (18F-FDG) by means of Independent Component Analysis (ICA) based on Bayesian theory and Markov Chain Monte Carlo (MCMC) sampling method (BICA). Images of the heart from rats were acquired with the Sherbrooke small animal PET scanner. A region of interest (ROI) was drawn around the rat image and decomposed into blood and tissue using BICA. The Statistical study showed that there is a significant difference (p < 0.05) between MMRG obtained with IF extracted by BICA with respect to IF extracted from measured images corrupted with spillover.
Model-based Clustering of Categorical Time Series with Multinomial Logit Classification

NASA Astrophysics Data System (ADS)

Frühwirth-Schnatter, Sylvia; Pamminger, Christoph; Winter-Ebmer, Rudolf; Weber, Andrea

2010-09-01

A common problem in many areas of applied statistics is to identify groups of similar time series in a panel of time series. However, distance-based clustering methods cannot easily be extended to time series data, where an appropriate distance-measure is rather difficult to define, particularly for discrete-valued time series. Markov chain clustering, proposed by Pamminger and Frühwirth-Schnatter [6], is an approach for clustering discrete-valued time series obtained by observing a categorical variable with several states. This model-based clustering method is based on finite mixtures of first-order time-homogeneous Markov chain models. In order to further explain group membership we present an extension to the approach of Pamminger and Frühwirth-Schnatter [6] by formulating a probabilistic model for the latent group indicators within the Bayesian classification rule by using a multinomial logit model. The parameters are estimated for a fixed number of clusters within a Bayesian framework using an Markov chain Monte Carlo (MCMC) sampling scheme representing a (full) Gibbs-type sampler which involves only draws from standard distributions. Finally, an application to a panel of Austrian wage mobility data is presented which leads to an interesting segmentation of the Austrian labour market.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.