Sample records for MCMC inference methods

  1. Data Analysis Recipes: Using Markov Chain Monte Carlo

    NASA Astrophysics Data System (ADS)

    Hogg, David W.; Foreman-Mackey, Daniel

    2018-05-01

    Markov Chain Monte Carlo (MCMC) methods for sampling probability density functions (combined with abundant computational resources) have transformed the sciences, especially in performing probabilistic inferences, or fitting models to data. In this primarily pedagogical contribution, we give a brief overview of the most basic MCMC method and some practical advice for the use of MCMC in real inference problems. We give advice on method choice, tuning for performance, methods for initialization, tests of convergence, troubleshooting, and use of the chain output to produce or report parameter estimates with associated uncertainties. We argue that autocorrelation time is the most important test for convergence, as it directly connects to the uncertainty on the sampling estimate of any quantity of interest. We emphasize that sampling is a method for doing integrals; this guides our thinking about how MCMC output is best used.
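
    A minimal sketch of the autocorrelation-time idea stressed in this abstract: estimate the integrated autocorrelation time (IAT) of a chain and convert it into an effective sample size. The AR(1) test chain, the FFT-based estimator, and Sokal's windowing constant are illustrative assumptions, not the authors' exact prescription.

```python
# Estimate the integrated autocorrelation time (IAT) of a 1-D MCMC chain and
# the corresponding effective sample size. Illustrative sketch only.
import numpy as np

def autocorr_time(chain, c=5.0):
    """Return the integrated autocorrelation time of a 1-D chain."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Autocovariance via FFT (zero-padded), normalized to an autocorrelation.
    f = np.fft.rfft(x, n=2 * n)
    acf = np.fft.irfft(f * np.conj(f))[:n]
    acf /= acf[0]
    taus = 2.0 * np.cumsum(acf) - 1.0           # running IAT estimate
    # Sokal's automatic windowing: stop once the window exceeds c * tau.
    window = np.arange(n) < c * taus
    m = int(np.argmin(window)) if not window.all() else n - 1
    return taus[m]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # AR(1) chain with known IAT = (1 + rho) / (1 - rho) = 19 for rho = 0.9.
    rho, n = 0.9, 100_000
    chain = np.empty(n)
    chain[0] = 0.0
    for i in range(1, n):
        chain[i] = rho * chain[i - 1] + rng.normal()
    tau = autocorr_time(chain)
    print(f"estimated IAT ~ {tau:.1f}, effective sample size ~ {n / tau:.0f}")
```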

  2. Assessing an ensemble Kalman filter inference of Manning's n coefficient of an idealized tidal inlet against a polynomial chaos-based MCMC

    NASA Astrophysics Data System (ADS)

    Siripatana, Adil; Mayo, Talea; Sraj, Ihab; Knio, Omar; Dawson, Clint; Le Maitre, Olivier; Hoteit, Ibrahim

    2017-08-01

    Bayesian estimation/inversion is commonly used to quantify and reduce modeling uncertainties in coastal ocean models, especially in the framework of parameter estimation. Based on Bayes' rule, the posterior probability distribution function (pdf) of the estimated quantities is obtained conditioned on available data. It can be computed either directly, using a Markov chain Monte Carlo (MCMC) approach, or by sequentially processing the data following a data assimilation approach, which is heavily exploited in large dimensional state estimation problems. The advantage of data assimilation schemes over MCMC-type methods arises from the ability to algorithmically accommodate a large number of uncertain quantities without a significant increase in the computational requirements. However, only approximate estimates are generally obtained by this approach due to the restricted Gaussian prior and noise assumptions that are generally imposed in these methods. This contribution aims at evaluating the effectiveness of utilizing an ensemble Kalman-based data assimilation method for parameter estimation of a coastal ocean model against an MCMC polynomial chaos (PC)-based scheme. We focus on quantifying the uncertainties of a coastal ocean ADvanced CIRCulation (ADCIRC) model with respect to the Manning's n coefficients. Based on a realistic framework of observation system simulation experiments (OSSEs), we apply an ensemble Kalman filter and the MCMC method employing a surrogate of ADCIRC constructed by a non-intrusive PC expansion for evaluating the likelihood, and test both approaches under identical scenarios. We study the sensitivity of the estimated posteriors with respect to the parameters of the inference methods, including ensemble size, inflation factor, and PC order. A full analysis of both methods, in the context of coastal ocean modeling, suggests that an ensemble Kalman filter with appropriate ensemble size and well-tuned inflation provides reliable mean estimates and uncertainties of Manning's n coefficients compared to the full posterior distributions inferred by MCMC.
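
    The ensemble Kalman side of the comparison can be illustrated in isolation. Below is a minimal sketch of a stochastic EnKF update of a single Manning-like parameter observed through a toy forward model; the model, observation error, ensemble size, and inflation value are placeholder assumptions, not the paper's ADCIRC/OSSE configuration.

```python
# Stochastic ensemble Kalman filter (EnKF) update for a scalar parameter
# observed indirectly through a toy forward model. Illustrative sketch only.
import numpy as np

rng = np.random.default_rng(1)

def forward(n):
    """Toy forward model mapping the coefficient to three 'water level' observations."""
    return np.array([2.0 * n, 5.0 * n ** 2, np.sin(5.0 * n)])

n_true, obs_err = 0.03, 0.005
y_obs = forward(n_true) + rng.normal(0.0, obs_err, size=3)

# Prior ensemble of the parameter, with multiplicative covariance inflation.
N_ens, inflation = 64, 1.05
ens = rng.normal(0.05, 0.02, size=N_ens)
ens = ens.mean() + inflation * (ens - ens.mean())

Y = np.array([forward(n) for n in ens])                      # predicted obs, (N_ens, 3)
y_pert = y_obs + rng.normal(0.0, obs_err, size=(N_ens, 3))   # perturbed observations

# Kalman gain built from ensemble (cross-)covariances.
A = ens - ens.mean()
Yc = Y - Y.mean(axis=0)
C_xy = A @ Yc / (N_ens - 1)                                  # parameter-observation covariance
C_yy = Yc.T @ Yc / (N_ens - 1) + obs_err ** 2 * np.eye(3)    # observation covariance
K = np.linalg.solve(C_yy, C_xy)

posterior = ens + (y_pert - Y) @ K
print(f"prior mean {ens.mean():.4f}, posterior mean {posterior.mean():.4f} "
      f"+/- {posterior.std():.4f} (truth {n_true})")
```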

  3. On an adaptive preconditioned Crank-Nicolson MCMC algorithm for infinite dimensional Bayesian inference

    NASA Astrophysics Data System (ADS)

    Hu, Zixi; Yao, Zhewei; Li, Jinglai

    2017-03-01

    Many scientific and engineering problems require Bayesian inference for unknowns of infinite dimension. In such problems, many standard Markov Chain Monte Carlo (MCMC) algorithms become arbitrarily slow under mesh refinement, which is referred to as being dimension dependent. To this end, a family of dimension-independent MCMC algorithms, known as the preconditioned Crank-Nicolson (pCN) methods, was proposed to sample the infinite dimensional parameters. In this work we develop an adaptive version of the pCN algorithm, where the covariance operator of the proposal distribution is adjusted based on sampling history to improve the simulation efficiency. We show that the proposed algorithm satisfies an important ergodicity condition under some mild assumptions. Finally, we provide numerical examples to demonstrate the performance of the proposed method.
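
    A minimal sketch of the (non-adaptive) pCN proposal the paper builds on: with a Gaussian prior N(0, C), the proposal u' = sqrt(1 - beta^2) u + beta xi with xi ~ N(0, C) leaves the prior invariant, so only the likelihood enters the acceptance ratio. The discretized prior covariance, toy observations, and step size are assumptions for illustration; the paper's adaptation of the proposal covariance is not shown.

```python
# Basic preconditioned Crank-Nicolson (pCN) sampler on a discretized function.
import numpy as np

rng = np.random.default_rng(2)
d = 50
# Discretized Gaussian prior N(0, C) with a squared-exponential covariance on a 1-D grid.
grid = np.linspace(0.0, 1.0, d)
C = np.exp(-0.5 * (grid[:, None] - grid[None, :]) ** 2 / 0.1 ** 2) + 1e-8 * np.eye(d)
L = np.linalg.cholesky(C)

# Toy likelihood: noisy observations of a few components of u.
obs_idx = np.array([5, 20, 35])
y = np.array([0.5, -0.3, 0.8])
sigma = 0.1

def log_like(u):
    return -0.5 * np.sum((u[obs_idx] - y) ** 2) / sigma ** 2

beta, n_steps = 0.2, 20_000
u = L @ rng.normal(size=d)                               # start from a prior draw
ll = log_like(u)
samples = np.empty((n_steps, d))
accepted = 0
for k in range(n_steps):
    xi = L @ rng.normal(size=d)
    u_prop = np.sqrt(1.0 - beta ** 2) * u + beta * xi    # pCN proposal
    ll_prop = log_like(u_prop)
    if np.log(rng.uniform()) < ll_prop - ll:             # prior terms cancel for pCN
        u, ll = u_prop, ll_prop
        accepted += 1
    samples[k] = u

post_mean = samples[n_steps // 2:].mean(axis=0)
print(f"acceptance rate {accepted / n_steps:.2f}; "
      f"posterior mean at observed nodes {np.round(post_mean[obs_idx], 2)}")
```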

  4. Model-based Bayesian inference for ROC data analysis

    NASA Astrophysics Data System (ADS)

    Lei, Tianhu; Bae, K. Ty

    2013-03-01

    This paper presents a study of model-based Bayesian inference applied to Receiver Operating Characteristic (ROC) data. The model is a simple version of a general non-linear regression model. Unlike the Dorfman model, it uses a probit link function with a binary (zero-one) covariate to express the binormal distributions in a single formula. The model also includes a scale parameter. Bayesian inference is implemented by the Markov Chain Monte Carlo (MCMC) method, carried out by Bayesian analysis Using Gibbs Sampling (BUGS). In contrast to classical statistical theory, the Bayesian approach considers model parameters as random variables characterized by prior distributions. With a substantial number of simulated samples generated by the sampling algorithm, posterior distributions of parameters as well as parameters themselves can be accurately estimated. MCMC-based BUGS adopts the Adaptive Rejection Sampling (ARS) protocol, which requires the probability density function (pdf) from which samples are drawn to be log-concave with respect to the targeted parameters. Our study corrects a common misconception and proves that the pdf of this regression model is log-concave with respect to its scale parameter. Therefore, ARS's requirement is satisfied, and a Gaussian prior, which is conjugate and possesses many analytic and computational advantages, is assigned to the scale parameter. A cohort of 20 simulated data sets and 20 simulations from each data set are used in our study. Output analysis and convergence diagnostics for the MCMC method are assessed with the CODA package. Models and methods using a continuous Gaussian prior and a discrete categorical prior are compared. Intensive simulations and performance measures are given to illustrate our practice in the framework of model-based Bayesian inference using the MCMC method.

  5. Bayesian posterior distributions without Markov chains.

    PubMed

    Cole, Stephen R; Chu, Haitao; Greenland, Sander; Hamra, Ghassan; Richardson, David B

    2012-03-01

    Bayesian posterior parameter distributions are often simulated using Markov chain Monte Carlo (MCMC) methods. However, MCMC methods are not always necessary and do not help the uninitiated understand Bayesian inference. As a bridge to understanding Bayesian inference, the authors illustrate a transparent rejection sampling method. In example 1, they illustrate rejection sampling using 36 cases and 198 controls from a case-control study (1976-1983) assessing the relation between residential exposure to magnetic fields and the development of childhood cancer. Results from rejection sampling (odds ratio (OR) = 1.69, 95% posterior interval (PI): 0.57, 5.00) were similar to MCMC results (OR = 1.69, 95% PI: 0.58, 4.95) and approximations from data-augmentation priors (OR = 1.74, 95% PI: 0.60, 5.06). In example 2, the authors apply rejection sampling to a cohort study of 315 human immunodeficiency virus seroconverters (1984-1998) to assess the relation between viral load after infection and 5-year incidence of acquired immunodeficiency syndrome, adjusting for (continuous) age at seroconversion and race. In this more complex example, rejection sampling required a notably longer run time than MCMC sampling but remained feasible and again yielded similar results. The transparency of the proposed approach comes at a price of being less broadly applicable than MCMC.
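
    A sketch of the transparent rejection-sampling mechanism described here, applied to a made-up 2x2 case-control table (the exposure counts below are placeholders, not the study's data): draw the two exposure probabilities from flat priors and accept each draw with probability L(theta)/L_max, then summarize the implied odds ratio.

```python
# Rejection sampling of a posterior for a 2x2 case-control table.
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(3)

# Hypothetical counts: exposed / total among cases and controls (placeholders).
x_case, n_case = 8, 36
x_ctrl, n_ctrl = 30, 198

def likelihood(p1, p0):
    return binom.pmf(x_case, n_case, p1) * binom.pmf(x_ctrl, n_ctrl, p0)

L_max = likelihood(x_case / n_case, x_ctrl / n_ctrl)      # maximum at the MLEs

# Draw from flat priors and accept with probability L / L_max (vectorized).
N = 200_000
p1 = rng.uniform(size=N)
p0 = rng.uniform(size=N)
accept = rng.uniform(size=N) < likelihood(p1, p0) / L_max
or_draws = (p1[accept] / (1 - p1[accept])) / (p0[accept] / (1 - p0[accept]))

lo, hi = np.percentile(or_draws, [2.5, 97.5])
print(f"{accept.sum()} accepted draws; posterior median OR {np.median(or_draws):.2f}, "
      f"95% PI ({lo:.2f}, {hi:.2f})")
```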

  6. An NCME Instructional Module on Estimating Item Response Theory Models Using Markov Chain Monte Carlo Methods

    ERIC Educational Resources Information Center

    Kim, Jee-Seon; Bolt, Daniel M.

    2007-01-01

    The purpose of this ITEMS module is to provide an introduction to Markov chain Monte Carlo (MCMC) estimation for item response models. A brief description of Bayesian inference is followed by an overview of the various facets of MCMC algorithms, including discussion of prior specification, sampling procedures, and methods for evaluating chain…

  7. Of bugs and birds: Markov Chain Monte Carlo for hierarchical modeling in wildlife research

    USGS Publications Warehouse

    Link, W.A.; Cam, E.; Nichols, J.D.; Cooch, E.G.

    2002-01-01

    Markov chain Monte Carlo (MCMC) is a statistical innovation that allows researchers to fit far more complex models to data than is feasible using conventional methods. Despite its widespread use in a variety of scientific fields, MCMC appears to be underutilized in wildlife applications. This may be due to a misconception that MCMC requires the adoption of a subjective Bayesian analysis, or perhaps simply to its lack of familiarity among wildlife researchers. We introduce the basic ideas of MCMC and software BUGS (Bayesian inference using Gibbs sampling), stressing that a simple and satisfactory intuition for MCMC does not require extraordinary mathematical sophistication. We illustrate the use of MCMC with an analysis of the association between latent factors governing individual heterogeneity in breeding and survival rates of kittiwakes (Rissa tridactyla). We conclude with a discussion of the importance of individual heterogeneity for understanding population dynamics and designing management plans.

  8. Exact Bayesian Inference for Phylogenetic Birth-Death Models.

    PubMed

    Parag, K V; Pybus, O G

    2018-04-26

    Inferring the rates of change of a population from a reconstructed phylogeny of genetic sequences is a central problem in macro-evolutionary biology, epidemiology, and many other disciplines. A popular solution involves estimating the parameters of a birth-death process (BDP), which links the shape of the phylogeny to its birth and death rates. Modern BDP estimators rely on random Markov chain Monte Carlo (MCMC) sampling to infer these rates. Such methods, while powerful and scalable, cannot be guaranteed to converge, leading to results that may be hard to replicate or difficult to validate. We present a conceptually and computationally different parametric BDP inference approach using flexible and easy-to-implement Snyder filter (SF) algorithms. This method is deterministic so its results are provable, guaranteed, and reproducible. We validate the SF on constant-rate BDPs and find that it solves BDP likelihoods known to produce robust estimates. We then examine more complex BDPs with time-varying rates. Our estimates compare well with a recently developed parametric MCMC inference method. Lastly, we perform model selection on an empirical Agamid species phylogeny, obtaining results consistent with the literature. The SF makes no approximations, beyond those required for parameter quantisation and numerical integration, and directly computes the posterior distribution of model parameters. It is a promising alternative inference algorithm that may serve either as a standalone Bayesian estimator or as a useful diagnostic reference for validating more involved MCMC strategies. The Snyder filter is implemented in Matlab and the time-varying BDP models are simulated in R. The source code and data are freely available at https://github.com/kpzoo/snyder-birth-death-code. kris.parag@zoo.ox.ac.uk. Supplementary material is available at Bioinformatics online.

  9. A Detailed History of Intron-rich Eukaryotic Ancestors Inferred from a Global Survey of 100 Complete Genomes

    PubMed Central

    Csuros, Miklos; Rogozin, Igor B.; Koonin, Eugene V.

    2011-01-01

    Protein-coding genes in eukaryotes are interrupted by introns, but intron densities widely differ between eukaryotic lineages. Vertebrates, some invertebrates and green plants have intron-rich genes, with 6–7 introns per kilobase of coding sequence, whereas most of the other eukaryotes have intron-poor genes. We reconstructed the history of intron gain and loss using a probabilistic Markov model (Markov Chain Monte Carlo, MCMC) on 245 orthologous genes from 99 genomes representing three of the five supergroups of eukaryotes for which multiple genome sequences are available. Intron-rich ancestors are confidently reconstructed for each major group, with 53 to 74% of the human intron density inferred with 95% confidence for the Last Eukaryotic Common Ancestor (LECA). The results of the MCMC reconstruction are compared with the reconstructions obtained using Maximum Likelihood (ML) and Dollo parsimony methods. An excellent agreement between the MCMC and ML inferences is demonstrated, whereas Dollo parsimony introduces a noticeable bias in the estimations, typically yielding lower ancestral intron densities than MCMC and ML. Evolution of eukaryotic genes was dominated by intron loss, with substantial gain only at the bases of several major branches including plants and animals. The highest intron density, 120 to 130% of the human value, is inferred for the last common ancestor of animals. The reconstruction shows that the entire line of descent from LECA to mammals was intron-rich, a state conducive to the evolution of alternative splicing. PMID:21935348

  10. Multi-chain Markov chain Monte Carlo methods for computationally expensive models

    NASA Astrophysics Data System (ADS)

    Huang, M.; Ray, J.; Ren, H.; Hou, Z.; Bao, J.

    2017-12-01

    Markov chain Monte Carlo (MCMC) methods are used to infer model parameters from observational data. The parameters are inferred as probability densities, thus capturing estimation error due to sparsity of the data, and the shortcomings of the model. Multiple communicating chains executing the MCMC method have the potential to explore the parameter space better, and conceivably accelerate the convergence to the final distribution. We present results from tests conducted with the multi-chain method to show how the acceleration occurs, i.e., for loose convergence tolerances, the multiple chains do not make much of a difference. The ensemble of chains also seems to have the ability to accelerate the convergence of a few chains that might start from suboptimal starting points. Finally, we show the performance of the chains in the estimation of O(10) parameters using computationally expensive forward models such as the Community Land Model, where the sampling burden is distributed over multiple chains.
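
    A minimal sketch of the multi-chain idea: run several independent Metropolis chains from deliberately dispersed starting points and monitor between-chain agreement with the Gelman-Rubin statistic. The banana-shaped toy target and tuning constants are illustrative assumptions, not the Community Land Model setup.

```python
# Several independent Metropolis chains plus a Gelman-Rubin (R-hat) diagnostic.
import numpy as np

rng = np.random.default_rng(4)

def log_target(theta):
    x, y = theta
    return -0.5 * (x ** 2 / 10.0 + (y - 0.1 * x ** 2) ** 2)   # banana-shaped density

def run_chain(start, n_steps, step=0.8):
    theta = np.array(start, dtype=float)
    lp = log_target(theta)
    out = np.empty((n_steps, 2))
    for i in range(n_steps):
        prop = theta + step * rng.normal(size=2)
        lp_prop = log_target(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        out[i] = theta
    return out

def gelman_rubin(chains):
    """R-hat for one parameter; `chains` has shape (n_chains, n_samples)."""
    m, n = chains.shape
    means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()          # within-chain variance
    B = n * means.var(ddof=1)                      # between-chain variance
    var_hat = (n - 1) / n * W + B / n
    return np.sqrt(var_hat / W)

starts = [(-8, 0), (8, 0), (0, 8), (0, -8)]        # dispersed starting points
chains = np.stack([run_chain(s, 5_000) for s in starts])   # (4, 5000, 2)
burn = 1_000
for j, name in enumerate(["x", "y"]):
    print(f"R-hat for {name}: {gelman_rubin(chains[:, burn:, j]):.3f}")
```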

  11. Synthesis of MCMC and Belief Propagation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ahn, Sungsoo; Chertkov, Michael; Shin, Jinwoo

    Markov Chain Monte Carlo (MCMC) and Belief Propagation (BP) are the most popular algorithms for computational inference in Graphical Models (GM). In principle, MCMC is an exact probabilistic method which, however, often suffers from exponentially slow mixing. In contrast, BP is a deterministic method, which is typically fast, empirically very successful, however in general lacking control of accuracy over loopy graphs. In this paper, we introduce MCMC algorithms correcting the approximation error of BP, i.e., we provide a way to compensate for BP errors via a consecutive BP-aware MCMC. Our framework is based on the Loop Calculus (LC) approach, which allows us to express the BP error as a sum of weighted generalized loops. Although the full series is computationally intractable, it is known that a truncated series, summing up all 2-regular loops, is computable in polynomial-time for planar pair-wise binary GMs and it also provides a highly accurate approximation empirically. Motivated by this, we first propose a polynomial-time approximation MCMC scheme for the truncated series of general (non-planar) pair-wise binary models. Our main idea here is to use the Worm algorithm, known to provide fast mixing in other (related) problems, and then design an appropriate rejection scheme to sample 2-regular loops. Furthermore, we also design an efficient rejection-free MCMC scheme for approximating the full series. The main novelty underlying our design is in utilizing the concept of cycle basis, which provides an efficient decomposition of the generalized loops. In essence, the proposed MCMC schemes run on a transformed GM built upon the non-trivial BP solution, and our experiments show that this synthesis of BP and MCMC outperforms both direct MCMC and bare BP schemes.

  12. Birth-death prior on phylogeny and speed dating

    PubMed Central

    2008-01-01

    Background: In recent years there has been a trend of leaving the strict molecular clock in order to infer dating of speciations and other evolutionary events. Explicit modeling of substitution rates and divergence times makes formulation of informative prior distributions for branch lengths possible. Models with birth-death priors on tree branching and auto-correlated or iid substitution rates among lineages have been proposed, enabling simultaneous inference of substitution rates and divergence times. This problem has, however, mainly been analysed in the Markov chain Monte Carlo (MCMC) framework, an approach requiring computation times of hours or days when applied to large phylogenies. Results: We demonstrate that a hill-climbing maximum a posteriori (MAP) adaptation of the MCMC scheme results in considerable gain in computational efficiency. We demonstrate also that a novel dynamic programming (DP) algorithm for branch length factorization, useful both in the hill-climbing and in the MCMC setting, further reduces computation time. For the problem of inferring rates and times parameters on a fixed tree, we perform simulations, comparisons between hill-climbing and MCMC on a plant rbcL gene dataset, and dating analysis on an animal mtDNA dataset, showing that our methodology enables efficient, highly accurate analysis of very large trees. Datasets requiring a computation time of several days with MCMC can with our MAP algorithm be accurately analysed in less than a minute. From the results of our example analyses, we conclude that our methodology generally avoids getting trapped early in local optima. For the cases where this can nevertheless be a problem, for instance when we also infer the tree topology in addition to the parameters, we show that the problem can be evaded by using a simulated-annealing-like (SAL) method in which we favour tree swaps early in the inference while biasing our focus towards rate and time parameter changes later on. Conclusion: Our contribution leaves the field open for fast and accurate dating analysis of nucleotide sequence data. Modeling branch substitution rates and divergence times separately allows us to include birth-death priors on the times without the assumption of a molecular clock. The methodology is easily adapted to take data from fossil records into account and it can be used together with a broad range of rate and substitution models. PMID:18318893

  13. Birth-death prior on phylogeny and speed dating.

    PubMed

    Akerborg, Orjan; Sennblad, Bengt; Lagergren, Jens

    2008-03-04

    In recent years there has been a trend of leaving the strict molecular clock in order to infer dating of speciations and other evolutionary events. Explicit modeling of substitution rates and divergence times makes formulation of informative prior distributions for branch lengths possible. Models with birth-death priors on tree branching and auto-correlated or iid substitution rates among lineages have been proposed, enabling simultaneous inference of substitution rates and divergence times. This problem has, however, mainly been analysed in the Markov chain Monte Carlo (MCMC) framework, an approach requiring computation times of hours or days when applied to large phylogenies. We demonstrate that a hill-climbing maximum a posteriori (MAP) adaptation of the MCMC scheme results in considerable gain in computational efficiency. We demonstrate also that a novel dynamic programming (DP) algorithm for branch length factorization, useful both in the hill-climbing and in the MCMC setting, further reduces computation time. For the problem of inferring rates and times parameters on a fixed tree, we perform simulations, comparisons between hill-climbing and MCMC on a plant rbcL gene dataset, and dating analysis on an animal mtDNA dataset, showing that our methodology enables efficient, highly accurate analysis of very large trees. Datasets requiring a computation time of several days with MCMC can with our MAP algorithm be accurately analysed in less than a minute. From the results of our example analyses, we conclude that our methodology generally avoids getting trapped early in local optima. For the cases where this can nevertheless be a problem, for instance when we also infer the tree topology in addition to the parameters, we show that the problem can be evaded by using a simulated-annealing-like (SAL) method in which we favour tree swaps early in the inference while biasing our focus towards rate and time parameter changes later on. Our contribution leaves the field open for fast and accurate dating analysis of nucleotide sequence data. Modeling branch substitution rates and divergence times separately allows us to include birth-death priors on the times without the assumption of a molecular clock. The methodology is easily adapted to take data from fossil records into account and it can be used together with a broad range of rate and substitution models.

  14. Bayesian inference for OPC modeling

    NASA Astrophysics Data System (ADS)

    Burbine, Andrew; Sturtevant, John; Fryer, David; Smith, Bruce W.

    2016-03-01

    The use of optical proximity correction (OPC) demands increasingly accurate models of the photolithographic process. Model building and inference techniques in the data science community have seen great strides in the past two decades which make better use of available information. This paper aims to demonstrate the predictive power of Bayesian inference as a method for parameter selection in lithographic models by quantifying the uncertainty associated with model inputs and wafer data. Specifically, the method combines the model builder's prior information about each modelling assumption with the maximization of each observation's likelihood as a Student's t-distributed random variable. Through the use of a Markov chain Monte Carlo (MCMC) algorithm, a model's parameter space is explored to find the most credible parameter values. During parameter exploration, the parameters' posterior distributions are generated by applying Bayes' rule, using a likelihood function and the a priori knowledge supplied. The MCMC algorithm used, an affine invariant ensemble sampler (AIES), is implemented by initializing many walkers which semi-independently explore the space. The convergence of these walkers to global maxima of the likelihood volume determines the parameter values' highest density intervals (HDI) to reveal champion models. We show that this method of parameter selection provides insights into the data that traditional methods do not and outline continued experiments to vet the method.
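
    A sketch of affine-invariant ensemble sampling in the spirit of this abstract, using the third-party emcee package (an AIES implementation) on a stand-in problem: a straight-line fit with Student's t-distributed residuals. The toy data, priors, and walker settings are assumptions for illustration; this is not the paper's OPC model.

```python
# Affine-invariant ensemble sampling of a robust (Student's t) regression.
import numpy as np
import emcee          # third-party AIES sampler; pip install emcee
from scipy import stats

rng = np.random.default_rng(5)
x = np.linspace(0.0, 10.0, 40)
y = 1.5 * x + 2.0 + 0.8 * rng.standard_t(df=4, size=x.size)

def log_prob(theta):
    slope, intercept, log_scale = theta
    if not (-10.0 < log_scale < 3.0):                  # weak prior bound on the scale
        return -np.inf
    resid = y - (slope * x + intercept)
    # Student's t likelihood for each observation, as in the abstract.
    return np.sum(stats.t.logpdf(resid, df=4, scale=np.exp(log_scale)))

ndim, nwalkers = 3, 32
p0 = np.array([1.0, 0.0, 0.0]) + 1e-2 * rng.normal(size=(nwalkers, ndim))
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
sampler.run_mcmc(p0, 3000)

flat = sampler.get_chain(discard=500, thin=10, flat=True)
for name, col in zip(["slope", "intercept", "log_scale"], flat.T):
    lo, med, hi = np.percentile(col, [5, 50, 95])
    print(f"{name}: {med:.3f} (90% credible interval {lo:.3f} .. {hi:.3f})")
```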

  15. Estimating the Effective Sample Size of Tree Topologies from Bayesian Phylogenetic Analyses

    PubMed Central

    Lanfear, Robert; Hua, Xia; Warren, Dan L.

    2016-01-01

    Bayesian phylogenetic analyses estimate posterior distributions of phylogenetic tree topologies and other parameters using Markov chain Monte Carlo (MCMC) methods. Before making inferences from these distributions, it is important to assess their adequacy. To this end, the effective sample size (ESS) estimates how many truly independent samples of a given parameter the output of the MCMC represents. The ESS of a parameter is frequently much lower than the number of samples taken from the MCMC because sequential samples from the chain can be non-independent due to autocorrelation. Typically, phylogeneticists use a rule of thumb that the ESS of all parameters should be greater than 200. However, we have no method to calculate an ESS of tree topology samples, despite the fact that the tree topology is often the parameter of primary interest and is almost always central to the estimation of other parameters. That is, we lack a method to determine whether we have adequately sampled one of the most important parameters in our analyses. In this study, we address this problem by developing methods to estimate the ESS for tree topologies. We combine these methods with two new diagnostic plots for assessing posterior samples of tree topologies, and compare their performance on simulated and empirical data sets. Combined, the methods we present provide new ways to assess the mixing and convergence of phylogenetic tree topologies in Bayesian MCMC analyses. PMID:27435794

  16. Multi-objective calibration and uncertainty analysis of hydrologic models; A comparative study between formal and informal methods

    NASA Astrophysics Data System (ADS)

    Shafii, M.; Tolson, B.; Matott, L. S.

    2012-04-01

    Hydrologic modeling has benefited from significant developments over the past two decades. This has resulted in the building of higher levels of complexity into hydrologic models, which eventually makes the model evaluation process (parameter estimation via calibration and uncertainty analysis) more challenging. In order to avoid unreasonable parameter estimates, many researchers have suggested implementation of multi-criteria calibration schemes. Furthermore, for predictive hydrologic models to be useful, proper consideration of uncertainty is essential. Consequently, recent research has emphasized comprehensive model assessment procedures in which multi-criteria parameter estimation is combined with statistically-based uncertainty analysis routines such as Bayesian inference using Markov Chain Monte Carlo (MCMC) sampling. Such a procedure relies on the use of formal likelihood functions based on statistical assumptions, and moreover, the Bayesian inference structured on MCMC samplers requires a considerably large number of simulations. Due to these issues, especially in complex non-linear hydrological models, a variety of alternative informal approaches have been proposed for uncertainty analysis in the multi-criteria context. This study aims at exploring a number of such informal uncertainty analysis techniques in multi-criteria calibration of hydrological models. The informal methods addressed in this study are (i) Pareto optimality which quantifies the parameter uncertainty using the Pareto solutions, (ii) DDS-AU which uses the weighted sum of objective functions to derive the prediction limits, and (iii) GLUE which describes the total uncertainty through identification of behavioral solutions. The main objective is to compare such methods with MCMC-based Bayesian inference with respect to factors such as computational burden and predictive capacity, which are evaluated based on multiple comparative measures. The measures for comparison are calculated both for calibration and evaluation periods. The uncertainty analysis methodologies are applied to a simple 5-parameter rainfall-runoff model, called HYMOD.
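
    A minimal sketch of one of the informal methods compared here (GLUE): Monte Carlo sampling of parameters, a behavioural threshold on an informal likelihood (Nash-Sutcliffe efficiency), and likelihood-weighted prediction limits. The two-parameter exponential toy model stands in for HYMOD; all numbers are illustrative assumptions.

```python
# GLUE-style uncertainty analysis on a toy two-parameter model.
import numpy as np

rng = np.random.default_rng(6)
t = np.linspace(0, 5, 60)

def model(a, b):
    return a * np.exp(-b * t)                    # stand-in for a rainfall-runoff model

obs = model(2.0, 0.7) + rng.normal(0, 0.1, size=t.size)

def nse(sim):
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

# Uniform Monte Carlo sampling of the parameter space.
N = 10_000
a = rng.uniform(0.5, 4.0, N)
b = rng.uniform(0.1, 2.0, N)
scores = np.array([nse(model(ai, bi)) for ai, bi in zip(a, b)])

behavioural = scores > 0.7                       # GLUE behavioural threshold
w = scores[behavioural]
w = w / w.sum()                                  # informal likelihood weights
sims = np.array([model(ai, bi) for ai, bi in zip(a[behavioural], b[behavioural])])

# Weighted 5%/95% prediction limits at each time step.
order = np.argsort(sims, axis=0)
lower, upper = np.empty(t.size), np.empty(t.size)
for j in range(t.size):
    cdf = np.cumsum(w[order[:, j]])
    lower[j] = sims[order[:, j][np.searchsorted(cdf, 0.05)], j]
    upper[j] = sims[order[:, j][np.searchsorted(cdf, 0.95)], j]
print(f"{behavioural.sum()} behavioural sets; mean limit width {np.mean(upper - lower):.3f}")
```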

  17. Bayesian inference for dynamic transcriptional regulation; the Hes1 system as a case study.

    PubMed

    Heron, Elizabeth A; Finkenstädt, Bärbel; Rand, David A

    2007-10-01

    In this study, we address the problem of estimating the parameters of regulatory networks and provide the first application of Markov chain Monte Carlo (MCMC) methods to experimental data. As a case study, we consider a stochastic model of the Hes1 system expressed in terms of stochastic differential equations (SDEs) to which rigorous likelihood methods of inference can be applied. When fitting continuous-time stochastic models to discretely observed time series the lengths of the sampling intervals are important, and much of our study addresses the problem when the data are sparse. We estimate the parameters of an autoregulatory network providing results both for simulated and real experimental data from the Hes1 system. We develop an estimation algorithm using MCMC techniques which are flexible enough to allow for the imputation of latent data on a finer time scale and the presence of prior information about parameters which may be informed from other experiments as well as additional measurement error.

  18. Efficient inference for genetic association studies with multiple outcomes.

    PubMed

    Ruffieux, Helene; Davison, Anthony C; Hager, Jorg; Irincheeva, Irina

    2017-10-01

    Combined inference for heterogeneous high-dimensional data is critical in modern biology, where clinical and various kinds of molecular data may be available from a single study. Classical genetic association studies regress a single clinical outcome on many genetic variants one by one, but there is an increasing demand for joint analysis of many molecular outcomes and genetic variants in order to unravel functional interactions. Unfortunately, most existing approaches to joint modeling are either too simplistic to be powerful or are impracticable for computational reasons. Inspired by Richardson and others (2010, Bayesian Statistics 9), we consider a sparse multivariate regression model that allows simultaneous selection of predictors and associated responses. As Markov chain Monte Carlo (MCMC) inference on such models can be prohibitively slow when the number of genetic variants exceeds a few thousand, we propose a variational inference approach which produces posterior information very close to that of MCMC inference, at a much reduced computational cost. Extensive numerical experiments show that our approach outperforms popular variable selection methods and tailored Bayesian procedures, dealing within hours with problems involving hundreds of thousands of genetic variants and tens to hundreds of clinical or molecular outcomes.

  19. Approximate Bayesian computation for spatial SEIR(S) epidemic models.

    PubMed

    Brown, Grant D; Porter, Aaron T; Oleson, Jacob J; Hinman, Jessica A

    2018-02-01

    Approximate Bayesian Computation (ABC) provides an attractive approach to estimation in complex Bayesian inferential problems for which evaluation of the kernel of the posterior distribution is impossible or computationally expensive. These highly parallelizable techniques have been successfully applied to many fields, particularly in cases where more traditional approaches such as Markov chain Monte Carlo (MCMC) are impractical. In this work, we demonstrate the application of approximate Bayesian inference to spatially heterogeneous Susceptible-Exposed-Infectious-Removed (SEIR) stochastic epidemic models. Although these models have a tractable posterior distribution, MCMC techniques nevertheless become computationally infeasible for moderately sized problems. We discuss the practical implementation of these techniques via the open source ABSEIR package for R. The performance of ABC relative to traditional MCMC methods in a small problem is explored under simulation, as well as in the spatially heterogeneous context of the 2014 epidemic of Chikungunya in the Americas.
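
    A sketch of the core rejection-ABC mechanism the abstract relies on, applied to a non-spatial chain-binomial SIR model rather than the spatial SEIR machinery of ABSEIR: simulate epidemics from the prior and keep the parameter draws whose summary statistics fall closest to the observed ones. The model, summaries, and tolerance rule are illustrative assumptions.

```python
# Rejection-ABC for a stochastic chain-binomial SIR epidemic.
import numpy as np

rng = np.random.default_rng(7)

def simulate_sir(beta, gamma, n_pop=500, i0=5, n_days=60):
    """Chain-binomial SIR; returns daily new-infection counts."""
    s, i = n_pop - i0, i0
    new_cases = np.zeros(n_days, dtype=int)
    for d in range(n_days):
        p_inf = 1.0 - np.exp(-beta * i / n_pop)
        inf = rng.binomial(s, p_inf)
        rec = rng.binomial(i, 1.0 - np.exp(-gamma))
        s, i = s - inf, i + inf - rec
        new_cases[d] = inf
    return new_cases

# "Observed" epidemic generated with known parameters.
obs = simulate_sir(0.4, 0.15)
def summary(x):
    return np.array([x.sum(), x.argmax(), x.max()], dtype=float)
s_obs = summary(obs)

# ABC: sample from the prior, keep the draws with the smallest summary distance.
N = 5_000
beta = rng.uniform(0.05, 1.0, N)
gamma = rng.uniform(0.05, 0.5, N)
dists = np.empty(N)
for k, (b, g) in enumerate(zip(beta, gamma)):
    dists[k] = np.linalg.norm((summary(simulate_sir(b, g)) - s_obs) / (s_obs + 1.0))
keep = dists <= np.quantile(dists, 0.02)        # keep the closest 2% of draws
print(f"posterior means: beta={beta[keep].mean():.2f}, gamma={gamma[keep].mean():.2f} "
      f"(truth 0.40, 0.15)")
```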

  20. Accounting for the measurement error of spectroscopically inferred soil carbon data for improved precision of spatial predictions.

    PubMed

    Somarathna, P D S N; Minasny, Budiman; Malone, Brendan P; Stockmann, Uta; McBratney, Alex B

    2018-08-01

    Spatial modelling of environmental data commonly only considers spatial variability as the single source of uncertainty. In reality however, the measurement errors should also be accounted for. In recent years, infrared spectroscopy has been shown to offer low cost, yet invaluable information needed for digital soil mapping at meaningful spatial scales for land management. However, spectrally inferred soil carbon data are known to be less accurate compared to laboratory analysed measurements. This study establishes a methodology to filter out the measurement error variability by incorporating the measurement error variance in the spatial covariance structure of the model. The study was carried out in the Lower Hunter Valley, New South Wales, Australia where a combination of laboratory measured, and vis-NIR and MIR inferred topsoil and subsoil soil carbon data are available. We investigated the applicability of residual maximum likelihood (REML) and Markov Chain Monte Carlo (MCMC) simulation methods to generate parameters of the Matérn covariance function directly from the data in the presence of measurement error. The results revealed that the measurement error can be effectively filtered out through the proposed technique. When the measurement error was filtered from the data, the prediction variance almost halved, which ultimately yielded a greater certainty in spatial predictions of soil carbon. Further, the MCMC technique was successfully used to define the posterior distribution of measurement error. This is an important outcome, as the MCMC technique can be used to estimate the measurement error if it is not explicitly quantified. Although this study dealt with soil carbon data, this method is amenable to filtering the measurement error of any kind of continuous spatial environmental data.
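
    A sketch of the central idea, folding a known measurement-error variance into the spatial covariance structure: the error variance sits on the diagonal of the data covariance, while kriging targets the error-free process, so the error is filtered out of the prediction. The Matérn (nu = 3/2) function, site layout, and variance values below are illustrative assumptions, not the fitted Lower Hunter Valley model.

```python
# Simple kriging with a measurement-error (nugget) variance on the data covariance.
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(8)

def matern32(h, sill=1.0, range_=300.0):
    a = np.sqrt(3.0) * h / range_
    return sill * (1.0 + a) * np.exp(-a)

X = rng.uniform(0.0, 1000.0, size=(80, 2))    # observation sites (metres)
x0 = np.array([[500.0, 500.0]])               # prediction site

sill, tau2 = 1.0, 0.4                         # process variance, measurement-error variance

C_data = matern32(cdist(X, X), sill) + tau2 * np.eye(len(X))   # noisy-data covariance
c0 = matern32(cdist(X, x0), sill).ravel()     # covariance with the error-free target

w = np.linalg.solve(C_data, c0)               # simple-kriging weights
pred_var = sill - c0 @ w                      # variance of the error-free prediction
print(f"kriging variance of the error-free process: {pred_var:.3f}")
print(f"predictive variance for a new noisy measurement: {pred_var + tau2:.3f}")
```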

  1. An adaptive sparse-grid high-order stochastic collocation method for Bayesian inference in groundwater reactive transport modeling

    NASA Astrophysics Data System (ADS)

    Zhang, Guannan; Lu, Dan; Ye, Ming; Gunzburger, Max; Webster, Clayton

    2013-10-01

    Bayesian analysis has become vital to uncertainty quantification in groundwater modeling, but its application has been hindered by the computational cost associated with numerous model executions required by exploring the posterior probability density function (PPDF) of model parameters. This is particularly the case when the PPDF is estimated using Markov Chain Monte Carlo (MCMC) sampling. In this study, a new approach is developed to improve the computational efficiency of Bayesian inference by constructing a surrogate of the PPDF, using an adaptive sparse-grid high-order stochastic collocation (aSG-hSC) method. Unlike previous works using a first-order hierarchical basis, this paper utilizes a compactly supported higher-order hierarchical basis to construct the surrogate system, resulting in a significant reduction in the number of required model executions. In addition, using the hierarchical surplus as an error indicator allows locally adaptive refinement of sparse grids in the parameter space, which further improves computational efficiency. To efficiently build the surrogate system for the PPDF with multiple significant modes, optimization techniques are used to identify the modes, for which high-probability regions are defined and components of the aSG-hSC approximation are constructed. After the surrogate is determined, the PPDF can be evaluated by sampling the surrogate system directly without model execution, resulting in improved efficiency of the surrogate-based MCMC compared with conventional MCMC. The developed method is evaluated using two synthetic groundwater reactive transport models. The first example involves coupled linear reactions and demonstrates the accuracy of our high-order hierarchical basis approach in approximating a high-dimensional posterior distribution. The second example is highly nonlinear because of the reactions of uranium surface complexation, and demonstrates how the iterative aSG-hSC method is able to capture multimodal and non-Gaussian features of the PPDF caused by model nonlinearity. Both experiments show that aSG-hSC is an effective and efficient tool for Bayesian inference.

  2. Profile-Based LC-MS Data Alignment—A Bayesian Approach

    PubMed Central

    Tsai, Tsung-Heng; Tadesse, Mahlet G.; Wang, Yue; Ressom, Habtom W.

    2014-01-01

    A Bayesian alignment model (BAM) is proposed for alignment of liquid chromatography-mass spectrometry (LC-MS) data. BAM belongs to the category of profile-based approaches, which are composed of two major components: a prototype function and a set of mapping functions. Appropriate estimation of these functions is crucial for good alignment results. BAM uses Markov chain Monte Carlo (MCMC) methods to draw inference on the model parameters and improves on existing MCMC-based alignment methods through 1) the implementation of an efficient MCMC sampler and 2) an adaptive selection of knots. A block Metropolis-Hastings algorithm that mitigates the problem of the MCMC sampler getting stuck at local modes of the posterior distribution is used for the update of the mapping function coefficients. In addition, a stochastic search variable selection (SSVS) methodology is used to determine the number and positions of knots. We applied BAM to a simulated data set, an LC-MS proteomic data set, and two LC-MS metabolomic data sets, and compared its performance with the Bayesian hierarchical curve registration (BHCR) model, the dynamic time-warping (DTW) model, and the continuous profile model (CPM). The advantage of applying appropriate profile-based retention time correction prior to performing a feature-based approach is also demonstrated through the metabolomic data sets. PMID:23929872

  3. Bayesian Inference on Malignant Breast Cancer in Nigeria: A Diagnosis of MCMC Convergence

    PubMed Central

    Ogunsakin, Ropo Ebenezer; Siaka, Lougue

    2017-01-01

    Background: There has been no previous study classifying malignant breast tumors in detail based on Markov Chain Monte Carlo (MCMC) convergence in Western Nigeria. This study therefore aims to profile patients living with benign and malignant breast tumors in two different hospitals among women of Western Nigeria, with a focus on prognostic factors and MCMC convergence. Materials and Methods: A hospital-based record was used to identify prognostic factors for malignant breast cancer among women of Western Nigeria. This paper describes Bayesian inference and demonstrates its use for estimating the parameters of logistic regression via the Markov Chain Monte Carlo (MCMC) algorithm. The results of the Bayesian approach are compared with classical statistics. Results: The mean age of the respondents was 42.2 ± 16.6 years, with 52% of the women aged between 35-49 years. The results of both techniques suggest that age and having at least high school education are associated with a significantly higher risk of being diagnosed with malignant breast tumors than benign breast tumors. The results also indicate that a reduction of standard errors is associated with the coefficients obtained from the Bayesian approach. In addition, simulation results reveal that women with at least high school education are 1.3 times more at risk of having a malignant breast lesion in Western Nigeria compared to a benign breast lesion. Conclusion: We concluded that more efforts are required towards creating awareness and advocacy campaigns on how the prevalence of malignant breast lesions can be reduced, especially among women. The application of the Bayesian approach produces precise estimates for modeling malignant breast cancer. PMID:29072396
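
    A minimal sketch of MCMC estimation of a Bayesian logistic regression, paralleling the analysis described here (malignant vs. benign outcome regressed on age and education), using a random-walk Metropolis sampler on simulated placeholder data rather than the hospital records.

```python
# Random-walk Metropolis for Bayesian logistic regression on simulated data.
import numpy as np

rng = np.random.default_rng(9)

# Simulated design: intercept, standardized age, high-school-or-more indicator.
n = 300
age = rng.normal(size=n)
edu = rng.integers(0, 2, size=n)
X = np.column_stack([np.ones(n), age, edu])
beta_true = np.array([-0.5, 0.6, 0.3])
p = 1.0 / (1.0 + np.exp(-X @ beta_true))
y = rng.binomial(1, p)

def log_post(beta, prior_sd=10.0):
    eta = X @ beta
    # Bernoulli log-likelihood with logit link + weakly informative normal prior.
    ll = np.sum(y * eta - np.log1p(np.exp(eta)))
    lp = -0.5 * np.sum((beta / prior_sd) ** 2)
    return ll + lp

n_iter, step = 20_000, 0.15
beta = np.zeros(3)
lp = log_post(beta)
draws = np.empty((n_iter, 3))
for i in range(n_iter):
    prop = beta + step * rng.normal(size=3)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        beta, lp = prop, lp_prop
    draws[i] = beta

post = draws[5_000:]
for name, col in zip(["intercept", "age", "edu"], post.T):
    print(f"{name}: mean {col.mean():.2f}, 95% CI ({np.percentile(col, 2.5):.2f}, "
          f"{np.percentile(col, 97.5):.2f}), OR {np.exp(col.mean()):.2f}")
```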

  4. MontePython 3: Parameter inference code for cosmology

    NASA Astrophysics Data System (ADS)

    Brinckmann, Thejs; Lesgourgues, Julien; Audren, Benjamin; Benabed, Karim; Prunet, Simon

    2018-05-01

    MontePython 3 provides numerous ways to explore parameter space using Monte Carlo Markov Chain (MCMC) sampling, including Metropolis-Hastings, Nested Sampling, Cosmo Hammer, and a Fisher sampling method. This improved version of the Monte Python (ascl:1307.002) parameter inference code for cosmology offers new ingredients that improve the performance of Metropolis-Hastings sampling, speeding up convergence and offering significant time improvement in difficult runs. Additional likelihoods and plotting options are available, as are post-processing algorithms such as Importance Sampling and Adding Derived Parameter.

  5. IMa2p - Parallel MCMC and inference of ancient demography under the Isolation with Migration (IM) model

    PubMed Central

    Sethuraman, Arun; Hey, Jody

    2015-01-01

    IMa2 and related programs are used to study the divergence of closely related species and of populations within species. These methods are based on the sampling of genealogies using MCMC, and they can proceed quite slowly for larger data sets. We describe a parallel implementation, called IMa2p, that provides a nearly linear increase in genealogy sampling rate with the number of processors in use. IMa2p is written in OpenMPI and C++, and scales well for demographic analyses of a large number of loci and populations, which are difficult to study using the serial version of the program. PMID:26059786

  6. Monte Carlo Bayesian inference on a statistical model of sub-gridcolumn moisture variability using high-resolution cloud observations. Part 1: Method.

    PubMed

    Norris, Peter M; da Silva, Arlindo M

    2016-07-01

    A method is presented to constrain a statistical model of sub-gridcolumn moisture variability using high-resolution satellite cloud data. The method can be used for large-scale model parameter estimation or cloud data assimilation. The gridcolumn model includes assumed probability density function (PDF) intra-layer horizontal variability and a copula-based inter-layer correlation model. The observables used in the current study are Moderate Resolution Imaging Spectroradiometer (MODIS) cloud-top pressure, brightness temperature and cloud optical thickness, but the method should be extensible to direct cloudy radiance assimilation for a small number of channels. The algorithm is a form of Bayesian inference with a Markov chain Monte Carlo (MCMC) approach to characterizing the posterior distribution. This approach is especially useful in cases where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach is not gradient-based and allows jumps into regions of non-zero cloud probability. The current study uses a skewed-triangle distribution for layer moisture. The article also includes a discussion of the Metropolis and multiple-try Metropolis versions of MCMC.

  7. Monte Carlo Bayesian Inference on a Statistical Model of Sub-Gridcolumn Moisture Variability Using High-Resolution Cloud Observations. Part 1: Method

    NASA Technical Reports Server (NTRS)

    Norris, Peter M.; Da Silva, Arlindo M.

    2016-01-01

    A method is presented to constrain a statistical model of sub-gridcolumn moisture variability using high-resolution satellite cloud data. The method can be used for large-scale model parameter estimation or cloud data assimilation. The gridcolumn model includes assumed probability density function (PDF) intra-layer horizontal variability and a copula-based inter-layer correlation model. The observables used in the current study are Moderate Resolution Imaging Spectroradiometer (MODIS) cloud-top pressure, brightness temperature and cloud optical thickness, but the method should be extensible to direct cloudy radiance assimilation for a small number of channels. The algorithm is a form of Bayesian inference with a Markov chain Monte Carlo (MCMC) approach to characterizing the posterior distribution. This approach is especially useful in cases where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach is not gradient-based and allows jumps into regions of non-zero cloud probability. The current study uses a skewed-triangle distribution for layer moisture. The article also includes a discussion of the Metropolis and multiple-try Metropolis versions of MCMC.

  8. Monte Carlo Bayesian inference on a statistical model of sub-gridcolumn moisture variability using high-resolution cloud observations. Part 1: Method

    PubMed Central

    Norris, Peter M.; da Silva, Arlindo M.

    2018-01-01

    A method is presented to constrain a statistical model of sub-gridcolumn moisture variability using high-resolution satellite cloud data. The method can be used for large-scale model parameter estimation or cloud data assimilation. The gridcolumn model includes assumed probability density function (PDF) intra-layer horizontal variability and a copula-based inter-layer correlation model. The observables used in the current study are Moderate Resolution Imaging Spectroradiometer (MODIS) cloud-top pressure, brightness temperature and cloud optical thickness, but the method should be extensible to direct cloudy radiance assimilation for a small number of channels. The algorithm is a form of Bayesian inference with a Markov chain Monte Carlo (MCMC) approach to characterizing the posterior distribution. This approach is especially useful in cases where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach is not gradient-based and allows jumps into regions of non-zero cloud probability. The current study uses a skewed-triangle distribution for layer moisture. The article also includes a discussion of the Metropolis and multiple-try Metropolis versions of MCMC. PMID:29618847

  9. The Bayesian group lasso for confounded spatial data

    USGS Publications Warehouse

    Hefley, Trevor J.; Hooten, Mevin B.; Hanks, Ephraim M.; Russell, Robin E.; Walsh, Daniel P.

    2017-01-01

    Generalized linear mixed models for spatial processes are widely used in applied statistics. In many applications of the spatial generalized linear mixed model (SGLMM), the goal is to obtain inference about regression coefficients while achieving optimal predictive ability. When implementing the SGLMM, multicollinearity among covariates and the spatial random effects can make computation challenging and influence inference. We present a Bayesian group lasso prior with a single tuning parameter that can be chosen to optimize predictive ability of the SGLMM and jointly regularize the regression coefficients and spatial random effect. We implement the group lasso SGLMM using efficient Markov chain Monte Carlo (MCMC) algorithms and demonstrate how multicollinearity among covariates and the spatial random effect can be monitored as a derived quantity. To test our method, we compared several parameterizations of the SGLMM using simulated data and two examples from plant ecology and disease ecology. In all examples, problematic levels of multicollinearity occurred and influenced sampling efficiency and inference. We found that the group lasso prior resulted in roughly twice the effective sample size for MCMC samples of regression coefficients and can have higher and less variable predictive accuracy based on out-of-sample data when compared to the standard SGLMM.

  10. A hybrid expectation maximisation and MCMC sampling algorithm to implement Bayesian mixture model based genomic prediction and QTL mapping.

    PubMed

    Wang, Tingting; Chen, Yi-Ping Phoebe; Bowman, Phil J; Goddard, Michael E; Hayes, Ben J

    2016-09-21

    Bayesian mixture models in which the effects of SNP are assumed to come from normal distributions with different variances are attractive for simultaneous genomic prediction and QTL mapping. These models are usually implemented with Markov chain Monte Carlo (MCMC) sampling, which requires long compute times with large genomic data sets. Here, we present an efficient approach (termed HyB_BR), which is a hybrid of an Expectation-Maximisation algorithm, followed by a limited number of MCMC iterations without the requirement for burn-in. To test prediction accuracy from HyB_BR, dairy cattle and human disease trait data were used. In the dairy cattle data, there were four quantitative traits (milk volume, protein kg, fat% in milk and fertility) measured in 16,214 cattle from two breeds genotyped for 632,002 SNPs. Validation of genomic predictions was in a subset of cattle either from the reference set or in animals from a third breed that was not in the reference set. In all cases, HyB_BR gave almost identical accuracies to Bayesian mixture models implemented with full MCMC; however, computational time was reduced to as little as 1/17 of that required by full MCMC. The SNPs with high posterior probability of a non-zero effect were also very similar between full MCMC and HyB_BR, with several known genes affecting milk production in this category, as well as some novel genes. HyB_BR was also applied to seven human diseases with 4890 individuals genotyped for around 300 K SNPs in a case/control design, from the Wellcome Trust Case Control Consortium (WTCCC). In this data set, the results demonstrated again that HyB_BR performed as well as Bayesian mixture models with full MCMC for genomic predictions and genetic architecture inference while reducing the computational time from 45 h with full MCMC to 3 h with HyB_BR. The results for quantitative traits in cattle and disease in humans demonstrate that HyB_BR can perform equally well as Bayesian mixture models implemented with full MCMC in terms of prediction accuracy, but up to 17 times faster than the full MCMC implementations. The HyB_BR algorithm makes simultaneous genomic prediction, QTL mapping and inference of genetic architecture feasible in large genomic data sets.

  11. A surrogate-based sensitivity quantification and Bayesian inversion of a regional groundwater flow model

    NASA Astrophysics Data System (ADS)

    Chen, Mingjie; Izady, Azizallah; Abdalla, Osman A.; Amerjeed, Mansoor

    2018-02-01

    Bayesian inference using Markov Chain Monte Carlo (MCMC) provides an explicit framework for stochastic calibration of hydrogeologic models accounting for uncertainties; however, the MCMC sampling entails a large number of model calls, and could easily become computationally unwieldy if the high-fidelity hydrogeologic model simulation is time consuming. This study proposes a surrogate-based Bayesian framework to address this notorious issue, and illustrates the methodology by inverse modeling a regional MODFLOW model. The high-fidelity groundwater model is approximated by a fast statistical model using the Bagging Multivariate Adaptive Regression Spline (BMARS) algorithm, and hence the MCMC sampling can be efficiently performed. In this study, the MODFLOW model is developed to simulate the groundwater flow in an arid region of Oman consisting of mountain-coast aquifers, and used to run representative simulations to generate a training dataset for BMARS model construction. A BMARS-based Sobol' method is also employed to efficiently calculate input parameter sensitivities, which are used to evaluate and rank their importance for the groundwater flow model system. According to the sensitivity analysis, insensitive parameters are screened out of the Bayesian inversion of the MODFLOW model, further saving computing efforts. The posterior probability distribution of input parameters is efficiently inferred from the prescribed prior distribution using observed head data, demonstrating that the presented BMARS-based Bayesian framework is an efficient tool to reduce parameter uncertainties of a groundwater system.

  12. Implementing informative priors for heterogeneity in meta-analysis using meta-regression and pseudo data.

    PubMed

    Rhodes, Kirsty M; Turner, Rebecca M; White, Ian R; Jackson, Dan; Spiegelhalter, David J; Higgins, Julian P T

    2016-12-20

    Many meta-analyses combine results from only a small number of studies, a situation in which the between-study variance is imprecisely estimated when standard methods are applied. Bayesian meta-analysis allows incorporation of external evidence on heterogeneity, providing the potential for more robust inference on the effect size of interest. We present a method for performing Bayesian meta-analysis using data augmentation, in which we represent an informative conjugate prior for between-study variance by pseudo data and use meta-regression for estimation. To assist in this, we derive predictive inverse-gamma distributions for the between-study variance expected in future meta-analyses. These may serve as priors for heterogeneity in new meta-analyses. In a simulation study, we compare approximate Bayesian methods using meta-regression and pseudo data against fully Bayesian approaches based on importance sampling techniques and Markov chain Monte Carlo (MCMC). We compare the frequentist properties of these Bayesian methods with those of the commonly used frequentist DerSimonian and Laird procedure. The method is implemented in standard statistical software and provides a less complex alternative to standard MCMC approaches. An importance sampling approach produces almost identical results to standard MCMC approaches, and results obtained through meta-regression and pseudo data are very similar. On average, data augmentation provides closer results to MCMC, if implemented using restricted maximum likelihood estimation rather than DerSimonian and Laird or maximum likelihood estimation. The methods are applied to real datasets, and an extension to network meta-analysis is described. The proposed method facilitates Bayesian meta-analysis in a way that is accessible to applied researchers.

  13. Modelling maximum river flow by using Bayesian Markov Chain Monte Carlo

    NASA Astrophysics Data System (ADS)

    Cheong, R. Y.; Gabda, D.

    2017-09-01

    Analysis of flood trends is vital since flooding threatens human livelihoods in financial, environmental, and security terms. The annual maximum river flow data for Sabah were fitted to the generalized extreme value (GEV) distribution. The maximum likelihood estimator (MLE) arises naturally when working with the GEV distribution. However, previous research showed that MLE provides unstable results, especially for small sample sizes. In this study, we used different Bayesian Markov Chain Monte Carlo (MCMC) schemes based on the Metropolis-Hastings algorithm to estimate the GEV parameters. The Bayesian MCMC method is a statistical inference approach that estimates parameters through the posterior distribution based on Bayes’ theorem. The Metropolis-Hastings algorithm is used to overcome the high-dimensional state space faced by the Monte Carlo method. This approach also accounts for more uncertainty in parameter estimation, which yields better predictions of maximum river flow in Sabah.
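
    A minimal sketch of the approach described: fit GEV parameters to annual maxima with a random-walk Metropolis-Hastings sampler and summarize a return level from the posterior. The simulated maxima, priors, and step sizes are assumptions for illustration, not the Sabah data; note that scipy's genextreme uses c = -xi relative to the usual GEV shape convention.

```python
# Metropolis-Hastings sampling of GEV parameters for simulated annual maxima.
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(10)

# Simulated annual maximum flows (small sample, as motivates the Bayesian fit).
data = genextreme.rvs(c=-0.1, loc=100.0, scale=20.0, size=30, random_state=42)

def log_post(theta):
    mu, log_sigma, xi = theta
    sigma = np.exp(log_sigma)
    # scipy's genextreme shape c corresponds to -xi in the usual convention.
    ll = np.sum(genextreme.logpdf(data, c=-xi, loc=mu, scale=sigma))
    if not np.isfinite(ll):
        return -np.inf
    # Weak priors: flat on mu and log_sigma, normal(0, 0.5) on the shape xi.
    return ll - 0.5 * (xi / 0.5) ** 2

n_iter = 30_000
theta = np.array([np.mean(data), np.log(np.std(data)), 0.0])
lp = log_post(theta)
steps = np.array([3.0, 0.1, 0.05])
draws = np.empty((n_iter, 3))
for i in range(n_iter):
    prop = theta + steps * rng.normal(size=3)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    draws[i] = theta

post = draws[10_000:]
mu, sigma, xi = post[:, 0], np.exp(post[:, 1]), post[:, 2]
# 100-year return level from the GEV quantile function.
rl100 = mu + sigma / xi * ((-np.log(1 - 1 / 100.0)) ** (-xi) - 1.0)
print(f"100-year flow: median {np.median(rl100):.1f}, "
      f"90% CI ({np.percentile(rl100, 5):.1f}, {np.percentile(rl100, 95):.1f})")
```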

  14. Dimension-independent likelihood-informed MCMC

    DOE PAGES

    Cui, Tiangang; Law, Kody J. H.; Marzouk, Youssef M.

    2015-10-08

    Many Bayesian inference problems require exploring the posterior distribution of high-dimensional parameters that represent the discretization of an underlying function. Our work introduces a family of Markov chain Monte Carlo (MCMC) samplers that can adapt to the particular structure of a posterior distribution over functions. There are two distinct lines of research that intersect in the methods we develop here. First, we introduce a general class of operator-weighted proposal distributions that are well defined on function space, such that the performance of the resulting MCMC samplers is independent of the discretization of the function. Second, by exploiting local Hessian information and any associated low-dimensional structure in the change from prior to posterior distributions, we develop an inhomogeneous discretization scheme for the Langevin stochastic differential equation that yields operator-weighted proposals adapted to the non-Gaussian structure of the posterior. The resulting dimension-independent and likelihood-informed (DILI) MCMC samplers may be useful for a large class of high-dimensional problems where the target probability measure has a density with respect to a Gaussian reference measure. Finally, we use two nonlinear inverse problems in order to demonstrate the efficiency of these DILI samplers: an elliptic PDE coefficient inverse problem and path reconstruction in a conditioned diffusion.

  15. Using Stan for Item Response Theory Models

    ERIC Educational Resources Information Center

    Ames, Allison J.; Au, Chi Hang

    2018-01-01

    Stan is a flexible probabilistic programming language providing full Bayesian inference through Hamiltonian Monte Carlo algorithms. The benefits of Hamiltonian Monte Carlo include improved efficiency and faster inference, when compared to other MCMC software implementations. Users can interface with Stan through a variety of computing…

  16. Population forecasts for Bangladesh, using a Bayesian methodology.

    PubMed

    Mahsin, Md; Hossain, Syed Shahadat

    2012-12-01

    Population projection for many developing countries can be quite a challenging task for demographers, mostly due to the lack of sufficiently reliable data. The objective of this paper is to present an overview of the existing methods for population forecasting and to propose an alternative based on Bayesian statistics, combining the formality of statistical inference with the use of expert judgement. The analysis has been carried out using the Markov chain Monte Carlo (MCMC) technique for Bayesian inference available with the software WinBUGS. Convergence diagnostic techniques available with the WinBUGS software have been applied to ensure the convergence of the chains necessary for the implementation of MCMC. The Bayesian approach allows for the use of observed data and expert judgements by means of appropriate priors, and more realistic population forecasts, along with associated uncertainty, have been obtained.

  17. Bayesian inversion of a CRN depth profile to infer Quaternary erosion of the northwestern Campine Plateau (NE Belgium)

    NASA Astrophysics Data System (ADS)

    Laloy, Eric; Beerten, Koen; Vanacker, Veerle; Christl, Marcus; Rogiers, Bart; Wouters, Laurent

    2017-07-01

    The rate at which low-lying sandy areas in temperate regions, such as the Campine Plateau (NE Belgium), have been eroding during the Quaternary is a matter of debate. Current knowledge on the average pace of landscape evolution in the Campine area is largely based on geological inferences and modern analogies. We performed a Bayesian inversion of an in situ-produced 10Be concentration depth profile to infer the average long-term erosion rate together with two other parameters: the surface exposure age and the inherited 10Be concentration. Compared to the latest advances in probabilistic inversion of cosmogenic radionuclide (CRN) data, our approach has the following two innovative components: it (1) uses Markov chain Monte Carlo (MCMC) sampling and (2) accounts (under certain assumptions) for the contribution of model errors to posterior uncertainty. To investigate to what extent our approach differs from the state of the art in practice, a comparison against the Bayesian inversion method implemented in the CRONUScalc program is made. Both approaches identify similar maximum a posteriori (MAP) parameter values, but the posterior parameter and predictive uncertainties derived using the method taken in CRONUScalc are moderately underestimated. A simple way of producing more consistent uncertainty estimates with the CRONUScalc-like method in the presence of model errors is therefore suggested. Our inferred erosion rate of 39 ± 8.9 mm kyr-1 (1σ) is relatively large in comparison with landforms that erode under comparable (paleo-)climates elsewhere in the world. We evaluate this value in the light of the erodibility of the substrate and sudden base level lowering during the Middle Pleistocene. A denser sampling scheme of a two-nuclide concentration depth profile would allow for better resolution of the inferred erosion rate and for the inclusion of more uncertain parameters in the MCMC inversion.

  18. Bayesian Orbit Computation Tools for Objects on Geocentric Orbits

    NASA Astrophysics Data System (ADS)

    Virtanen, J.; Granvik, M.; Muinonen, K.; Oszkiewicz, D.

    2013-08-01

    We consider the space-debris orbital inversion problem via the concept of Bayesian inference. The methodology was put forward for the orbital analysis of solar system small bodies in the early 1990s [7] and results in a full solution of the statistical inverse problem given in terms of an a posteriori probability density function (PDF) for the orbital parameters. We demonstrate the applicability of our statistical orbital analysis software to Earth orbiting objects, both using well-established Monte Carlo (MC) techniques (for a review, see e.g. [13]) as well as recently developed Markov-chain MC (MCMC) techniques (e.g., [9]). In particular, we exploit the novel virtual observation MCMC method [8], which is based on the characterization of the phase-space volume of orbital solutions before the actual MCMC sampling. Our statistical methods and the resulting PDFs immediately enable probabilistic impact predictions to be carried out. Furthermore, this can readily be done also for very sparse data sets and data sets of poor quality, provided that some a priori information on the observational uncertainty is available. For asteroids, impact probabilities with the Earth from the discovery night onwards have been provided, e.g., by [11] and [10]; the latter study includes the sampling of the observational-error standard deviation as a random variable.

  19. Markov Chain Monte Carlo: an introduction for epidemiologists

    PubMed Central

    Hamra, Ghassan; MacLehose, Richard; Richardson, David

    2013-01-01

    Markov Chain Monte Carlo (MCMC) methods are increasingly popular among epidemiologists. The reason for this may in part be that MCMC offers an appealing approach to handling some difficult types of analyses. Additionally, MCMC methods are those most commonly used for Bayesian analysis. However, epidemiologists are still largely unfamiliar with MCMC. They may lack familiarity either with the implementation of MCMC or with interpretation of the resultant output. As with tutorials outlining the calculus behind maximum likelihood in previous decades, a simple description of the machinery of MCMC is needed. We provide an introduction to conducting analyses with MCMC, and show that, given the same data and under certain model specifications, the results of an MCMC simulation match those of methods based on standard maximum-likelihood estimation (MLE). In addition, we highlight examples of instances in which MCMC approaches to data analysis provide a clear advantage over MLE. We hope that this brief tutorial will encourage epidemiologists to consider MCMC approaches as part of their analytic tool-kit. PMID:23569196
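
    A minimal sketch of the MLE-versus-MCMC comparison the abstract describes: for a binomial proportion with a flat prior, the posterior mean from a simple Metropolis sampler essentially reproduces the maximum-likelihood estimate. The data and step size are hypothetical choices for illustration, not the authors' example.

```python
import numpy as np

rng = np.random.default_rng(1)
k, n = 35, 120                      # hypothetical data: 35 cases out of 120

def log_post(p):
    """Binomial log-likelihood with a flat prior on p in (0, 1)."""
    if not 0.0 < p < 1.0:
        return -np.inf
    return k * np.log(p) + (n - k) * np.log(1.0 - p)

p, lp = 0.5, log_post(0.5)
draws = []
for _ in range(50000):
    prop = p + 0.05 * rng.standard_normal()   # symmetric random-walk proposal
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        p, lp = prop, lp_prop
    draws.append(p)
draws = np.array(draws[10000:])               # discard burn-in

print("MLE:            ", k / n)
print("Posterior mean: ", draws.mean())       # close to the MLE under a flat prior
print("95% interval:   ", np.percentile(draws, [2.5, 97.5]))
```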

  20. Neural Dynamics as Sampling: A Model for Stochastic Computation in Recurrent Networks of Spiking Neurons

    PubMed Central

    Buesing, Lars; Bill, Johannes; Nessler, Bernhard; Maass, Wolfgang

    2011-01-01

    The organization of computations in networks of spiking neurons in the brain is still largely unknown, in particular in view of the inherently stochastic features of their firing activity and the experimentally observed trial-to-trial variability of neural systems in the brain. In principle there exists a powerful computational framework for stochastic computations, probabilistic inference by sampling, which can explain a large number of macroscopic experimental data in neuroscience and cognitive science. But it has turned out to be surprisingly difficult to create a link between these abstract models for stochastic computations and more detailed models of the dynamics of networks of spiking neurons. Here we create such a link and show that under some conditions the stochastic firing activity of networks of spiking neurons can be interpreted as probabilistic inference via Markov chain Monte Carlo (MCMC) sampling. Since common methods for MCMC sampling in distributed systems, such as Gibbs sampling, are inconsistent with the dynamics of spiking neurons, we introduce a different approach based on non-reversible Markov chains that is able to reflect inherent temporal processes of spiking neuronal activity through a suitable choice of random variables. We propose a neural network model and show by a rigorous theoretical analysis that its neural activity implements MCMC sampling of a given distribution, both for the case of discrete and continuous time. This provides a step towards closing the gap between abstract functional models of cortical computation and more detailed models of networks of spiking neurons. PMID:22096452

  1. Bayesian inference on EMRI signals using low frequency approximations

    NASA Astrophysics Data System (ADS)

    Ali, Asad; Christensen, Nelson; Meyer, Renate; Röver, Christian

    2012-07-01

    Extreme mass ratio inspirals (EMRIs) are thought to be one of the most exciting gravitational wave sources to be detected with LISA. Due to their complicated nature and weak amplitudes the detection and parameter estimation of such sources is a challenging task. In this paper we present a statistical methodology based on Bayesian inference in which the estimation of parameters is carried out by advanced Markov chain Monte Carlo (MCMC) algorithms such as parallel tempering MCMC. We analysed high and medium mass EMRI systems that fall well inside the low frequency range of LISA. In the context of the Mock LISA Data Challenges, our investigation and results are also the first instance in which a fully Markovian algorithm is applied for EMRI searches. Results show that our algorithm worked well in recovering EMRI signals from different (simulated) LISA data sets having single and multiple EMRI sources and holds great promise for posterior computation under more realistic conditions. The search and estimation methods presented in this paper are general in their nature, and can be applied in any other scenario such as AdLIGO, AdVIRGO and Einstein Telescope with their respective response functions.

  2. Modeling Well Sampled Composite Spectral Energy Distributions of Distant Galaxies via an MCMC-driven Inference Framework

    NASA Astrophysics Data System (ADS)

    Pasha, Imad; Kriek, Mariska; Johnson, Benjamin; Conroy, Charlie

    2018-01-01

    Using a novel, MCMC-driven inference framework, we have modeled the stellar and dust emission of 32 composite spectral energy distributions (SEDs), which span from the near-ultraviolet (NUV) to the far infrared (FIR). The composite SEDs were originally constructed in a previous work from the photometric catalogs of the NEWFIRM Medium-Band Survey, in which SEDs of individual galaxies at 0.5 < z < 2.0 were iteratively matched and sorted into types based on their rest-frame UV-to-NIR photometry. In a subsequent work, MIPS 24 μm photometry was added for each SED type, and in this work, PACS 100 μm, PACS 160 μm, SPIRE 250 μm, and SPIRE 350 μm photometry have been added to extend the range of the composite SEDs into the FIR. We fit the composite SEDs with the Prospector code, which utilizes MCMC sampling to explore the parameter space for models created by the Flexible Stellar Population Synthesis (FSPS) code, in order to investigate how specific star formation rate (sSFR), dust temperature, and other galaxy properties vary with SED type. This work is also being used to better constrain the SPS models within FSPS.

  3. Bayesian pedigree inference with small numbers of single nucleotide polymorphisms via a factor-graph representation.

    PubMed

    Anderson, Eric C; Ng, Thomas C

    2016-02-01

    We develop a computational framework for addressing pedigree inference problems using small numbers (80-400) of single nucleotide polymorphisms (SNPs). Our approach relaxes the assumptions, which are commonly made, that sampling is complete with respect to the pedigree and that there is no genotyping error. It relies on representing the inferred pedigree as a factor graph and invoking the Sum-Product algorithm to compute and store quantities that allow the joint probability of the data to be rapidly computed under a large class of rearrangements of the pedigree structure. This allows efficient MCMC sampling over the space of pedigrees, and, hence, Bayesian inference of pedigree structure. In this paper we restrict ourselves to inference of pedigrees without loops using SNPs assumed to be unlinked. We present the methodology in general for multigenerational inference, and we illustrate the method by applying it to the inference of full sibling groups in a large sample (n=1157) of Chinook salmon typed at 95 SNPs. The results show that our method provides a better point estimate and estimate of uncertainty than the currently best-available maximum-likelihood sibling reconstruction method. Extensions of this work to more complex scenarios are briefly discussed. Published by Elsevier Inc.

  4. Hydrologic Model Selection using Markov chain Monte Carlo methods

    NASA Astrophysics Data System (ADS)

    Marshall, L.; Sharma, A.; Nott, D.

    2002-12-01

    Estimation of parameter uncertainty (and in turn model uncertainty) allows assessment of the risk in likely applications of hydrological models. Bayesian statistical inference provides an ideal means of assessing parameter uncertainty whereby prior knowledge about the parameter is combined with information from the available data to produce a probability distribution (the posterior distribution) that describes uncertainty about the parameter and serves as a basis for selecting appropriate values for use in modelling applications. Widespread use of Bayesian techniques in hydrology has been hindered by difficulties in summarizing and exploring the posterior distribution. These difficulties have been largely overcome by recent advances in Markov chain Monte Carlo (MCMC) methods that involve random sampling of the posterior distribution. This study presents an adaptive MCMC sampling algorithm whose characteristics are well suited to model parameters with a high degree of correlation and interdependence, as is often evident in hydrological models. The MCMC sampling technique is used to compare six alternative configurations of a commonly used conceptual rainfall-runoff model, the Australian Water Balance Model (AWBM), using 11 years of daily rainfall-runoff data from the Bass river catchment in Australia. The alternative configurations considered fall into two classes: those that consider model errors to be independent of prior values, and those that model the errors as an autoregressive process. Each class consists of three formulations that represent increasing levels of complexity (and parameterisation) of the original model structure. The results from this study point to both the importance of using Bayesian approaches in evaluating model performance and the simplicity of the MCMC sampling framework, which has the ability to bring such approaches within the reach of the applied hydrological community.

  5. A Bayesian approach to modeling diffraction profiles and application to ferroelectric materials

    DOE PAGES

    Iamsasri, Thanakorn; Guerrier, Jonathon; Esteves, Giovanni; ...

    2017-02-01

    A new statistical approach for modeling diffraction profiles is introduced, using Bayesian inference and a Markov chain Monte Carlo (MCMC) algorithm. This method is demonstrated by modeling the degenerate reflections during application of an electric field to two different ferroelectric materials: thin-film lead zirconate titanate (PZT) of composition PbZr0.3Ti0.7O3 and a bulk commercial PZT polycrystalline ferroelectric. Here, the new method offers a unique uncertainty quantification of the model parameters that can be readily propagated into new calculated parameters.

  6. Probabilistic Damage Characterization Using the Computationally-Efficient Bayesian Approach

    NASA Technical Reports Server (NTRS)

    Warner, James E.; Hochhalter, Jacob D.

    2016-01-01

    This work presents a computationally-efficient approach for damage determination that quantifies uncertainty in the provided diagnosis. Given strain sensor data that are polluted with measurement errors, Bayesian inference is used to estimate the location, size, and orientation of damage. This approach uses Bayes' Theorem to combine any prior knowledge an analyst may have about the nature of the damage with information provided implicitly by the strain sensor data to form a posterior probability distribution over possible damage states. The unknown damage parameters are then estimated based on samples drawn numerically from this distribution using a Markov Chain Monte Carlo (MCMC) sampling algorithm. Several modifications are made to the traditional Bayesian inference approach to provide significant computational speedup. First, an efficient surrogate model is constructed using sparse grid interpolation to replace a costly finite element model that must otherwise be evaluated for each sample drawn with MCMC. Next, the standard Bayesian posterior distribution is modified using a weighted likelihood formulation, which is shown to improve the convergence of the sampling process. Finally, a robust MCMC algorithm, Delayed Rejection Adaptive Metropolis (DRAM), is adopted to sample the probability distribution more efficiently. Numerical examples demonstrate that the proposed framework effectively provides damage estimates with uncertainty quantification and can yield orders of magnitude speedup over standard Bayesian approaches.
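
    The adaptive part of DRAM re-estimates the proposal covariance from the chain history. Below is a minimal adaptive-Metropolis sketch of that single ingredient on a toy correlated Gaussian target; the delayed-rejection stage, the sparse grid surrogate, and the weighted likelihood of the paper are not reproduced, and all names and tuning constants are illustrative assumptions.

```python
import numpy as np

def adaptive_metropolis(log_post, x0, n_iter=20000, adapt_start=1000, eps=1e-6, seed=0):
    """Adaptive Metropolis sketch: the proposal covariance is periodically
    re-estimated from the chain history (one ingredient of DRAM)."""
    rng = np.random.default_rng(seed)
    d = len(x0)
    sd = 2.38 ** 2 / d                      # standard scaling factor
    cov = np.eye(d)                         # initial proposal covariance
    x, lp = np.asarray(x0, float), log_post(x0)
    chain = [x.copy()]
    for i in range(n_iter):
        prop = rng.multivariate_normal(x, cov)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop
        chain.append(x.copy())
        if i >= adapt_start and i % 100 == 0:            # adapt every 100 steps
            hist = np.array(chain)
            cov = sd * (np.cov(hist.T) + eps * np.eye(d))
    return np.array(chain)

# Example: a correlated 2D Gaussian target (hypothetical stand-in for a posterior)
Sigma_inv = np.linalg.inv(np.array([[1.0, 0.9], [0.9, 1.0]]))
target = lambda x: -0.5 * x @ Sigma_inv @ x
samples = adaptive_metropolis(target, x0=np.zeros(2))
print("sample covariance:\n", np.cov(samples[5000:].T))
```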

  7. Modeling two strains of disease via aggregate-level infectivity curves.

    PubMed

    Romanescu, Razvan; Deardon, Rob

    2016-04-01

    Well formulated models of disease spread, and efficient methods to fit them to observed data, are powerful tools for aiding the surveillance and control of infectious diseases. Our project considers the problem of the simultaneous spread of two related strains of disease in a context where spatial location is the key driver of disease spread. We start our modeling work with the individual level models (ILMs) of disease transmission, and extend these models to accommodate the competing spread of the pathogens in a two-tier hierarchical population (whose levels we refer to as 'farm' and 'animal'). The postulated interference mechanism between the two strains is a period of cross-immunity following infection. We also present a framework for speeding up the computationally intensive process of fitting the ILM to data, typically done using Markov chain Monte Carlo (MCMC) in a Bayesian framework, by turning the inference into a two-stage process. First, we approximate the number of animals infected on a farm over time by infectivity curves. These curves are fit to data sampled from farms, using maximum likelihood estimation, then, conditional on the fitted curves, Bayesian MCMC inference proceeds for the remaining parameters. Finally, we use posterior predictive distributions of salient epidemic summary statistics, in order to assess the model fitted.

  8. Quantum Enhanced Inference in Markov Logic Networks

    NASA Astrophysics Data System (ADS)

    Wittek, Peter; Gogolin, Christian

    2017-04-01

    Markov logic networks (MLNs) reconcile two opposing schools in machine learning and artificial intelligence: causal networks, which account for uncertainty extremely well, and first-order logic, which allows for formal deduction. An MLN is essentially a first-order logic template to generate Markov networks. Inference in MLNs is probabilistic and it is often performed by approximate methods such as Markov chain Monte Carlo (MCMC) Gibbs sampling. An MLN has many regular, symmetric structures that can be exploited at both first-order level and in the generated Markov network. We analyze the graph structures that are produced by various lifting methods and investigate the extent to which quantum protocols can be used to speed up Gibbs sampling with state preparation and measurement schemes. We review different such approaches, discuss their advantages, theoretical limitations, and their appeal to implementations. We find that a straightforward application of a recent result yields exponential speedup compared to classical heuristics in approximate probabilistic inference, thereby demonstrating another example where advanced quantum resources can potentially prove useful in machine learning.

  9. Quantum Enhanced Inference in Markov Logic Networks.

    PubMed

    Wittek, Peter; Gogolin, Christian

    2017-04-19

    Markov logic networks (MLNs) reconcile two opposing schools in machine learning and artificial intelligence: causal networks, which account for uncertainty extremely well, and first-order logic, which allows for formal deduction. An MLN is essentially a first-order logic template to generate Markov networks. Inference in MLNs is probabilistic and it is often performed by approximate methods such as Markov chain Monte Carlo (MCMC) Gibbs sampling. An MLN has many regular, symmetric structures that can be exploited at both first-order level and in the generated Markov network. We analyze the graph structures that are produced by various lifting methods and investigate the extent to which quantum protocols can be used to speed up Gibbs sampling with state preparation and measurement schemes. We review different such approaches, discuss their advantages, theoretical limitations, and their appeal to implementations. We find that a straightforward application of a recent result yields exponential speedup compared to classical heuristics in approximate probabilistic inference, thereby demonstrating another example where advanced quantum resources can potentially prove useful in machine learning.

  10. Quantum Enhanced Inference in Markov Logic Networks

    PubMed Central

    Wittek, Peter; Gogolin, Christian

    2017-01-01

    Markov logic networks (MLNs) reconcile two opposing schools in machine learning and artificial intelligence: causal networks, which account for uncertainty extremely well, and first-order logic, which allows for formal deduction. An MLN is essentially a first-order logic template to generate Markov networks. Inference in MLNs is probabilistic and it is often performed by approximate methods such as Markov chain Monte Carlo (MCMC) Gibbs sampling. An MLN has many regular, symmetric structures that can be exploited at both first-order level and in the generated Markov network. We analyze the graph structures that are produced by various lifting methods and investigate the extent to which quantum protocols can be used to speed up Gibbs sampling with state preparation and measurement schemes. We review different such approaches, discuss their advantages, theoretical limitations, and their appeal to implementations. We find that a straightforward application of a recent result yields exponential speedup compared to classical heuristics in approximate probabilistic inference, thereby demonstrating another example where advanced quantum resources can potentially prove useful in machine learning. PMID:28422093

  11. Triadic split-merge sampler

    NASA Astrophysics Data System (ADS)

    van Rossum, Anne C.; Lin, Hai Xiang; Dubbeldam, Johan; van der Herik, H. Jaap

    2018-04-01

    In machine vision, typical heuristic methods to extract parameterized objects out of raw data points are the Hough transform and RANSAC. Bayesian models carry the promise to optimally extract such parameterized objects given a correct definition of the model and the type of noise at hand. One category of solvers for Bayesian models are Markov chain Monte Carlo methods. Naive implementations of MCMC methods suffer from slow convergence in machine vision due to the complexity of the parameter space. To address this, blocked Gibbs and split-merge samplers have been developed that assign multiple data points to clusters at once. In this paper we introduce a new split-merge sampler, the triadic split-merge sampler, which performs steps between two and three randomly chosen clusters. This has two advantages. First, it reduces the asymmetry between the split and merge steps. Second, it is able to propose a new cluster that is composed of data points from two different clusters. Both advantages speed up convergence, which we demonstrate on a line extraction problem. We show that the triadic split-merge sampler outperforms the conventional split-merge sampler. Although this new MCMC sampler is demonstrated in this machine vision context, its applications extend to the very general domain of statistical inference.

  12. Estimating Model Probabilities using Thermodynamic Markov Chain Monte Carlo Methods

    NASA Astrophysics Data System (ADS)

    Ye, M.; Liu, P.; Beerli, P.; Lu, D.; Hill, M. C.

    2014-12-01

    Markov chain Monte Carlo (MCMC) methods are widely used to evaluate model probability for quantifying model uncertainty. In a general procedure, MCMC simulations are first conducted for each individual model, and the MCMC parameter samples are then used to approximate the marginal likelihood of the model by calculating the geometric mean of the joint likelihood of the model and its parameters. It has been found that this geometric-mean estimator suffers from the numerical problem of a low convergence rate: a simple test case shows that even millions of MCMC samples are insufficient to yield an accurate estimate of the marginal likelihood. To resolve this problem, a thermodynamic method is used in which multiple MCMC runs are performed with different values of a heating coefficient between zero and one. When the heating coefficient is zero, the MCMC run is equivalent to a random walk MC in the prior parameter space; when the heating coefficient is one, the MCMC run is the conventional one. For a simple case with an analytical form of the marginal likelihood, the thermodynamic method yields a more accurate estimate than the geometric-mean method. This is also demonstrated for a case of groundwater modeling with four alternative models postulated based on different conceptualizations of a confining layer. This groundwater example shows that model probabilities estimated using the thermodynamic method are more reasonable than those obtained using the geometric method. The thermodynamic method is general and can be used for a wide range of environmental problems for model uncertainty quantification.
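
    The thermodynamic (path sampling) estimator described above can be sketched in a few lines: run a tempered chain targeting prior × likelihood^β for a ladder of heating coefficients β in [0, 1], average the log-likelihood at each β, and integrate over β to obtain the log marginal likelihood. The toy model below (normal data with known variance, normal prior) is a hypothetical stand-in for the groundwater models in the abstract.

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(1.5, 1.0, size=30)           # hypothetical observations, known sigma = 1

def log_like(mu):
    return -0.5 * np.sum((data - mu) ** 2) - 0.5 * len(data) * np.log(2 * np.pi)

def log_prior(mu):                              # N(0, 5^2) prior on mu
    return -0.5 * (mu / 5.0) ** 2 - np.log(5.0 * np.sqrt(2 * np.pi))

def mean_loglike_at(beta, n_iter=10000):
    """Mean log-likelihood under the tempered target prior(mu) * likelihood(mu)**beta."""
    step = 2.4 / np.sqrt(beta * len(data) + 1.0 / 25.0)  # roughly matched to tempered width
    mu, ll = 0.0, log_like(0.0)
    lls = []
    for _ in range(n_iter):
        prop = mu + step * rng.standard_normal()
        ll_prop = log_like(prop)
        log_ratio = (log_prior(prop) + beta * ll_prop) - (log_prior(mu) + beta * ll)
        if np.log(rng.uniform()) < log_ratio:
            mu, ll = prop, ll_prop
        lls.append(ll)
    return np.mean(lls[n_iter // 4:])            # drop burn-in

# Thermodynamic integration: log p(y) = integral over beta of E_beta[log likelihood]
betas = np.linspace(0.0, 1.0, 21) ** 3           # ladder denser near beta = 0
means = np.array([mean_loglike_at(b) for b in betas])
log_evidence = np.sum(0.5 * (means[1:] + means[:-1]) * np.diff(betas))   # trapezoid rule
print("thermodynamic estimate of log marginal likelihood:", log_evidence)
```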

  13. Tree-Structured Infinite Sparse Factor Model

    PubMed Central

    Zhang, XianXing; Dunson, David B.; Carin, Lawrence

    2013-01-01

    A tree-structured multiplicative gamma process (TMGP) is developed, for inferring the depth of a tree-based factor-analysis model. This new model is coupled with the nested Chinese restaurant process, to nonparametrically infer the depth and width (structure) of the tree. In addition to developing the model, theoretical properties of the TMGP are addressed, and a novel MCMC sampler is developed. The structure of the inferred tree is used to learn relationships between high-dimensional data, and the model is also applied to compressive sensing and interpolation of incomplete images. PMID:25279389

  14. Bayesian inference based on dual generalized order statistics from the exponentiated Weibull model

    NASA Astrophysics Data System (ADS)

    Al Sobhi, Mashail M.

    2015-02-01

    Bayesian estimation for the two parameters and the reliability function of the exponentiated Weibull model are obtained based on dual generalized order statistics (DGOS). Also, Bayesian prediction bounds for future DGOS from exponentiated Weibull model are obtained. The symmetric and asymmetric loss functions are considered for Bayesian computations. The Markov chain Monte Carlo (MCMC) methods are used for computing the Bayes estimates and prediction bounds. The results have been specialized to the lower record values. Comparisons are made between Bayesian and maximum likelihood estimators via Monte Carlo simulation.

  15. Vertically Integrated Seismological Analysis II : Inference

    NASA Astrophysics Data System (ADS)

    Arora, N. S.; Russell, S.; Sudderth, E.

    2009-12-01

    Methods for automatically associating detected waveform features with hypothesized seismic events, and localizing those events, are a critical component of efforts to verify the Comprehensive Test Ban Treaty (CTBT). As outlined in our companion abstract, we have developed a hierarchical model which views detection, association, and localization as an integrated probabilistic inference problem. In this abstract, we provide more details on the Markov chain Monte Carlo (MCMC) methods used to solve this inference task. MCMC generates samples from a posterior distribution π(x) over possible worlds x by defining a Markov chain whose states are the worlds x, and whose stationary distribution is π(x). In the Metropolis-Hastings (M-H) method, transitions in the Markov chain are constructed in two steps. First, given the current state x, a candidate next state x′ is generated from a proposal distribution q(x′ | x), which may be (more or less) arbitrary. Second, the transition to x′ is not automatic, but occurs with an acceptance probability α(x′ | x) = min(1, π(x′)q(x | x′)/(π(x)q(x′ | x))). The seismic event model outlined in our companion abstract is quite similar to those used in multitarget tracking, for which MCMC has proved very effective. In this model, each world x is defined by a collection of events, a list of properties characterizing those events (times, locations, magnitudes, and types), and the association of each event to a set of observed detections. The target distribution π(x) = P(x | y) is the posterior distribution over worlds x given the observed waveform data y at all stations. Proposal distributions then implement several types of moves between worlds. For example, birth moves create new events; death moves delete existing events; split moves partition the detections for an event into two new events; merge moves combine event pairs; swap moves modify the properties and associations for pairs of events. Importantly, the rules for accepting such complex moves need not be hand-designed. Instead, they are automatically determined by the underlying probabilistic model, which is in turn calibrated via historical data and scientific knowledge. Consider a small seismic event which generates weak signals at several different stations, which might independently be mistaken for noise. A birth move may nevertheless hypothesize an event jointly explaining these detections. If the corresponding waveform data then align with the seismological knowledge encoded in the probabilistic model, the event may be detected even though no single station observes it unambiguously. Alternatively, if a large outlier reading is produced at a single station, moves which instantiate a corresponding (false) event would be rejected because of the absence of plausible detections at other sensors. More broadly, one of the main advantages of our MCMC approach is its consistent handling of the relative uncertainties in different information sources. By avoiding low-level thresholds, we expect to improve accuracy and robustness. At the conference, we will present results quantitatively validating our approach, using ground-truth associations and locations provided either by simulation or human analysts.
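
    The acceptance rule quoted above can be written generically; the helper below evaluates it in log space for numerical stability and applies to any move type (birth, death, split, merge, swap) once the corresponding proposal densities are supplied. It is an illustrative sketch, not the authors' code.

```python
import numpy as np

def mh_accept(log_pi_current, log_pi_proposed,
              log_q_forward, log_q_backward, rng):
    """Metropolis-Hastings test: accept x' with probability
    min(1, pi(x') q(x|x') / (pi(x) q(x'|x))), evaluated in log space."""
    log_alpha = (log_pi_proposed + log_q_backward) - (log_pi_current + log_q_forward)
    return np.log(rng.uniform()) < min(0.0, log_alpha)
```

    For symmetric moves the two proposal terms cancel and the rule reduces to the plain Metropolis test.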

  16. Efficient Bayesian parameter estimation with implicit sampling and surrogate modeling for a vadose zone hydrological problem

    NASA Astrophysics Data System (ADS)

    Liu, Y.; Pau, G. S. H.; Finsterle, S.

    2015-12-01

    Parameter inversion involves inferring the model parameter values based on sparse observations of some observables. To infer the posterior probability distributions of the parameters, Markov chain Monte Carlo (MCMC) methods are typically used. However, the large number of forward simulations needed and limited computational resources limit the complexity of the hydrological model we can use in these methods. In view of this, we studied the implicit sampling (IS) method, an efficient importance sampling technique that generates samples in the high-probability region of the posterior distribution and thus reduces the number of forward simulations that we need to run. For a pilot-point inversion of a heterogeneous permeability field based on a synthetic ponded infiltration experiment simulated with TOUGH2 (a subsurface modeling code), we showed that IS with a linear map provides an accurate Bayesian description of the parameterized permeability field at the pilot points with just approximately 500 forward simulations. We further studied the use of surrogate models to improve the computational efficiency of parameter inversion. We implemented two reduced-order models (ROMs) for the TOUGH2 forward model. One is based on polynomial chaos expansion (PCE), of which the coefficients are obtained using the sparse Bayesian learning technique to mitigate the "curse of dimensionality" of the PCE terms. The other model is Gaussian process regression (GPR), for which different covariance, likelihood and inference models are considered. Preliminary results indicate that ROMs constructed based on the prior parameter space perform poorly. It is thus impractical to replace this hydrological model by a ROM directly in an MCMC method. However, the IS method can work with a ROM constructed for parameters in the close vicinity of the maximum a posteriori probability (MAP) estimate. We will discuss the accuracy and computational efficiency of using ROMs in the implicit sampling procedure for the hydrological problem considered. This work was supported, in part, by the U.S. Dept. of Energy under Contract No. DE-AC02-05CH11231.

  17. Searching for efficient Markov chain Monte Carlo proposal kernels

    PubMed Central

    Yang, Ziheng; Rodríguez, Carlos E.

    2013-01-01

    Markov chain Monte Carlo (MCMC) or the Metropolis–Hastings algorithm is a simulation algorithm that has made modern Bayesian statistical inference possible. Nevertheless, the efficiency of different Metropolis–Hastings proposal kernels has rarely been studied except for the Gaussian proposal. Here we propose a unique class of Bactrian kernels, which avoid proposing values that are very close to the current value, and compare their efficiency with a number of proposals for simulating different target distributions, with efficiency measured by the asymptotic variance of a parameter estimate. The uniform kernel is found to be more efficient than the Gaussian kernel, whereas the Bactrian kernel is even better. When optimal scales are used for both, the Bactrian kernel is at least 50% more efficient than the Gaussian. Implementation in a Bayesian program for molecular clock dating confirms the general applicability of our results to generic MCMC algorithms. Our results refute a previous claim that all proposals had nearly identical performance and will prompt further research into efficient MCMC proposals. PMID:24218600
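
    A minimal sketch of a Bactrian-style proposal, assuming the commonly cited form of a symmetric two-component normal mixture offset by ±mσ with component variance (1 − m²)σ², so that the overall proposal variance is σ² while values very close to the current state are rarely proposed; see the paper for the exact definition.

```python
import numpy as np

def bactrian_proposal(x, sigma=1.0, m=0.95, rng=None):
    """Bactrian-style proposal: a symmetric two-humped normal mixture centred at
    x -/+ m*sigma with component s.d. sigma*sqrt(1 - m**2). (Assumed parameterization;
    the overall proposal variance is sigma**2, but values near x are rarely drawn.)"""
    rng = rng or np.random.default_rng()
    sign = 1.0 if rng.uniform() < 0.5 else -1.0
    return x + sign * m * sigma + sigma * np.sqrt(1.0 - m * m) * rng.standard_normal()

# Because the kernel is symmetric in (x, x'), the Hastings correction is 1 and the
# usual Metropolis acceptance ratio pi(x')/pi(x) applies unchanged.
```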

  18. Learning an Eddy Viscosity Model Using Shrinkage and Bayesian Calibration: A Jet-in-Crossflow Case Study

    DOE PAGES

    Ray, Jaideep; Lefantzi, Sophia; Arunajatesan, Srinivasan; ...

    2017-09-07

    In this paper, we demonstrate a statistical procedure for learning a high-order eddy viscosity model (EVM) from experimental data and using it to improve the predictive skill of a Reynolds-averaged Navier–Stokes (RANS) simulator. The method is tested in a three-dimensional (3D), transonic jet-in-crossflow (JIC) configuration. The process starts with a cubic eddy viscosity model (CEVM) developed for incompressible flows. It is fitted to limited experimental JIC data using shrinkage regression. The shrinkage process removes all the terms from the model, except an intercept, a linear term, and a quadratic one involving the square of the vorticity. The shrunk eddy viscosity model is implemented in an RANS simulator and calibrated, using vorticity measurements, to infer three parameters. The calibration is Bayesian and is solved using a Markov chain Monte Carlo (MCMC) method. A 3D probability density distribution for the inferred parameters is constructed, thus quantifying the uncertainty in the estimate. The phenomenal cost of using a 3D flow simulator inside an MCMC loop is mitigated by using surrogate models (“curve-fits”). A support vector machine classifier (SVMC) is used to impose our prior belief regarding parameter values, specifically to exclude nonphysical parameter combinations. The calibrated model is compared, in terms of its predictive skill, to simulations using uncalibrated linear and CEVMs. Finally, we find that the calibrated model, with one quadratic term, is more accurate than the uncalibrated simulator. The model is also checked at a flow condition at which the model was not calibrated.

  19. Learning an Eddy Viscosity Model Using Shrinkage and Bayesian Calibration: A Jet-in-Crossflow Case Study

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ray, Jaideep; Lefantzi, Sophia; Arunajatesan, Srinivasan

    In this paper, we demonstrate a statistical procedure for learning a high-order eddy viscosity model (EVM) from experimental data and using it to improve the predictive skill of a Reynolds-averaged Navier–Stokes (RANS) simulator. The method is tested in a three-dimensional (3D), transonic jet-in-crossflow (JIC) configuration. The process starts with a cubic eddy viscosity model (CEVM) developed for incompressible flows. It is fitted to limited experimental JIC data using shrinkage regression. The shrinkage process removes all the terms from the model, except an intercept, a linear term, and a quadratic one involving the square of the vorticity. The shrunk eddy viscosity model is implemented in an RANS simulator and calibrated, using vorticity measurements, to infer three parameters. The calibration is Bayesian and is solved using a Markov chain Monte Carlo (MCMC) method. A 3D probability density distribution for the inferred parameters is constructed, thus quantifying the uncertainty in the estimate. The phenomenal cost of using a 3D flow simulator inside an MCMC loop is mitigated by using surrogate models (“curve-fits”). A support vector machine classifier (SVMC) is used to impose our prior belief regarding parameter values, specifically to exclude nonphysical parameter combinations. The calibrated model is compared, in terms of its predictive skill, to simulations using uncalibrated linear and CEVMs. Finally, we find that the calibrated model, with one quadratic term, is more accurate than the uncalibrated simulator. The model is also checked at a flow condition at which the model was not calibrated.

  20. Coronal loop seismology using damping of standing kink oscillations by mode coupling. II. additional physical effects and Bayesian analysis

    NASA Astrophysics Data System (ADS)

    Pascoe, D. J.; Anfinogentov, S.; Nisticò, G.; Goddard, C. R.; Nakariakov, V. M.

    2017-04-01

    Context. The strong damping of kink oscillations of coronal loops can be explained by mode coupling. The damping envelope depends on the transverse density profile of the loop. Observational measurements of the damping envelope have been used to determine the transverse loop structure which is important for understanding other physical processes such as heating. Aims: The general damping envelope describing the mode coupling of kink waves consists of a Gaussian damping regime followed by an exponential damping regime. Recent observational detection of these damping regimes has been employed as a seismological tool. We extend the description of the damping behaviour to account for additional physical effects, namely a time-dependent period of oscillation, the presence of additional longitudinal harmonics, and the decayless regime of standing kink oscillations. Methods: We examine four examples of standing kink oscillations observed by the Atmospheric Imaging Assembly (AIA) onboard the Solar Dynamics Observatory (SDO). We use forward modelling of the loop position and investigate the dependence on the model parameters using Bayesian inference and Markov chain Monte Carlo (MCMC) sampling. Results: Our improvements to the physical model combined with the use of Bayesian inference and MCMC produce improved estimates of model parameters and their uncertainties. Calculation of the Bayes factor also allows us to compare the suitability of different physical models. We also use a new method based on spline interpolation of the zeroes of the oscillation to accurately describe the background trend of the oscillating loop. Conclusions: This powerful and robust method allows for accurate seismology of coronal loops, in particular the transverse density profile, and potentially reveals additional physical effects.

  1. An introduction of Markov chain Monte Carlo method to geochemical inverse problems: Reading melting parameters from REE abundances in abyssal peridotites

    NASA Astrophysics Data System (ADS)

    Liu, Boda; Liang, Yan

    2017-04-01

    Markov chain Monte Carlo (MCMC) simulation is a powerful statistical method for solving inverse problems that arise from a wide range of applications. In the Earth sciences, applications of MCMC simulation are primarily in the field of geophysics. The purpose of this study is to introduce MCMC methods to geochemical inverse problems related to trace element fractionation during mantle melting. MCMC methods have several advantages over least squares methods in deciphering melting processes from trace element abundances in basalts and mantle rocks. Here we use an MCMC method to invert for the extent of melting, the fraction of melt present during melting, and the extent of chemical disequilibrium between the melt and residual solid from REE abundances in clinopyroxene in abyssal peridotites from the Mid-Atlantic Ridge, Central Indian Ridge, Southwest Indian Ridge, Lena Trough, and American-Antarctic Ridge. We consider two melting models: one with an exact analytical solution and the other without. We solve the latter numerically in a chain of melting models according to the Metropolis-Hastings algorithm. The probability distribution of the inverted melting parameters depends on the assumptions of the physical model, knowledge of the mantle source composition, and constraints from the REE data. Results from MCMC inversion are consistent with, and provide more reliable uncertainty estimates than, results based on nonlinear least squares inversion. We show that chemical disequilibrium is likely to play an important role in fractionating LREE in residual peridotites during partial melting beneath mid-ocean ridge spreading centers. MCMC simulation is well suited for more complicated but physically more realistic melting problems that do not have analytical solutions.

  2. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hou Fengji; Hogg, David W.; Goodman, Jonathan

    Markov chain Monte Carlo (MCMC) proves to be powerful for Bayesian inference and in particular for exoplanet radial velocity fitting because MCMC provides more statistical information and makes better use of data than common approaches like chi-square fitting. However, the nonlinear density functions encountered in these problems can make MCMC time-consuming. In this paper, we apply an ensemble sampler respecting affine invariance to orbital parameter extraction from radial velocity data. This new sampler has only one free parameter, and does not require much tuning for good performance, which is important for automatization. The autocorrelation time of this sampler is approximately the same for all parameters and far smaller than Metropolis-Hastings, which means it requires many fewer function calls to produce the same number of independent samples. The affine-invariant sampler speeds up MCMC by hundreds of times compared with Metropolis-Hastings in the same computing situation. This novel sampler would be ideal for projects involving large data sets such as statistical investigations of planet distribution. The biggest obstacle to ensemble samplers is the existence of multiple local optima; we present a clustering technique to deal with local optima by clustering based on the likelihood of the walkers in the ensemble. We demonstrate the effectiveness of the sampler on real radial velocity data.
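
    A publicly available implementation of this family of affine-invariant ensemble samplers is the emcee package. The snippet below is a minimal usage sketch on a toy Gaussian target, not the radial-velocity analysis described above; the walker count and chain length are arbitrary illustrative choices.

```python
import numpy as np
import emcee

def log_prob(theta):
    """Toy isotropic Gaussian log-posterior standing in for an orbital model."""
    return -0.5 * np.sum(theta ** 2)

ndim, nwalkers = 5, 32
p0 = 1e-2 * np.random.randn(nwalkers, ndim)        # small ball of starting positions

sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
sampler.run_mcmc(p0, 5000)

# As the abstract notes, the autocorrelation time is roughly the same for all parameters.
print("autocorrelation times:", sampler.get_autocorr_time(tol=0))
samples = sampler.get_chain(discard=1000, flat=True)
print("posterior means:", samples.mean(axis=0))
```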

  3. Inference of multi-Gaussian property fields by probabilistic inversion of crosshole ground penetrating radar data using an improved dimensionality reduction

    NASA Astrophysics Data System (ADS)

    Hunziker, Jürg; Laloy, Eric; Linde, Niklas

    2016-04-01

    Deterministic inversion procedures can often explain field data, but they only deliver one final subsurface model that depends on the initial model and regularization constraints. This leads to poor insights about the uncertainties associated with the inferred model properties. In contrast, probabilistic inversions can provide an ensemble of model realizations that accurately span the range of possible models that honor the available calibration data and prior information allowing a quantitative description of model uncertainties. We reconsider the problem of inferring the dielectric permittivity (directly related to radar velocity) structure of the subsurface by inversion of first-arrival travel times from crosshole ground penetrating radar (GPR) measurements. We rely on the DREAM_(ZS) algorithm that is a state-of-the-art Markov chain Monte Carlo (MCMC) algorithm. Such algorithms need several orders of magnitude more forward simulations than deterministic algorithms and often become infeasible in high parameter dimensions. To enable high-resolution imaging with MCMC, we use a recently proposed dimensionality reduction approach that allows reproducing 2D multi-Gaussian fields with far fewer parameters than a classical grid discretization. We consider herein a dimensionality reduction from 5000 to 257 unknowns. The first 250 parameters correspond to a spectral representation of random and uncorrelated spatial fluctuations while the remaining seven geostatistical parameters are (1) the standard deviation of the data error, (2) the mean and (3) the variance of the relative electric permittivity, (4) the integral scale along the major axis of anisotropy, (5) the anisotropy angle, (6) the ratio of the integral scale along the minor axis of anisotropy to the integral scale along the major axis of anisotropy and (7) the shape parameter of the Matérn function. The latter essentially defines the type of covariance function (e.g., exponential, Whittle, Gaussian). We present an improved formulation of the dimensionality reduction, and numerically show how it reduces artifacts in the generated models and provides better posterior estimation of the subsurface geostatistical structure. We next show that the results of the method compare very favorably against previous deterministic and stochastic inversion results obtained at the South Oyster Bacterial Transport Site in Virginia, USA. The long-term goal of this work is to enable MCMC-based full waveform inversion of crosshole GPR data.

  4. Assessing the pollution risk of a groundwater source field at western Laizhou Bay under seawater intrusion

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zeng, Xiankui; Wu, Jichun; Wang, Dong, E-mail: wangdong@nju.edu.cn

    Coastal areas have great significance for human living, economy and society development in the world. With the rapid increase of pressures from human activities and climate change, the safety of groundwater resources is under the threat of seawater intrusion in coastal areas. The area of Laizhou Bay is one of the most seriously seawater-intruded areas in China, since the seawater intrusion phenomenon was first recognized in the middle of the 1970s. This study assessed the pollution risk of a groundwater source field of the western Laizhou Bay area by inferring the probability distribution of groundwater Cl− concentration. The numerical model of the seawater intrusion process is built using SEAWAT4. The parameter uncertainty of this model is evaluated by Markov Chain Monte Carlo (MCMC) simulation, and DREAM(ZS) is used as the sampling algorithm. Then, the predictive distribution of Cl− concentration at the groundwater source field is inferred by using the samples of model parameters obtained from MCMC. After that, the pollution risk of the groundwater source field is assessed by the predictive quantiles of Cl− concentration. The results of model calibration and verification demonstrate that the DREAM(ZS)-based MCMC is efficient and reliable for estimating model parameters under the current observations. At the 95% confidence level, the groundwater source point will not be polluted by seawater intrusion in the next five years (2015–2019). In addition, the 2.5% and 97.5% predictive quantiles show that the Cl− concentration of the groundwater source field always varies between 175 mg/l and 200 mg/l. Highlights: • The parameter uncertainty of the seawater intrusion model is evaluated by MCMC. • The groundwater source field won't be polluted by seawater intrusion in the next 5 years. • The pollution risk is assessed by the predictive quantiles of Cl− concentration.

  5. Gradient-free MCMC methods for dynamic causal modelling

    DOE PAGES

    Sengupta, Biswa; Friston, Karl J.; Penny, Will D.

    2015-03-14

    Here, we compare the performance of four gradient-free MCMC samplers (random walk Metropolis sampling, slice-sampling, adaptive MCMC sampling and population-based MCMC sampling with tempering) in terms of the number of independent samples they can produce per unit computational time. For the Bayesian inversion of a single-node neural mass model, both adaptive and population-based samplers are more efficient compared with random walk Metropolis sampler or slice-sampling; yet adaptive MCMC sampling is more promising in terms of compute time. Slice-sampling yields the highest number of independent samples from the target density -- albeit at almost 1000% increase in computational time, in comparison to the most efficient algorithm (i.e., the adaptive MCMC sampler).
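
    The comparison metric used here, independent samples per unit compute time, can be estimated from a chain's integrated autocorrelation time. The sketch below uses a simple truncate-at-first-negative rule and an AR(1) series standing in for sampler output; it is illustrative only, not the authors' implementation.

```python
import numpy as np

def integrated_autocorr_time(x, max_lag=None):
    """Integrated autocorrelation time of a 1-D chain, truncating the sum of
    autocorrelations at the first negative estimate (a simple rule of thumb)."""
    x = np.asarray(x, float) - np.mean(x)
    n = len(x)
    max_lag = max_lag or n // 3
    acf = np.array([np.dot(x[:n - k], x[k:]) / np.dot(x, x) for k in range(max_lag)])
    tau = 1.0
    for k in range(1, max_lag):
        if acf[k] < 0:
            break
        tau += 2.0 * acf[k]
    return tau

def independent_samples_per_second(chain, runtime_seconds):
    """Effective number of independent samples produced per second of compute."""
    return (len(chain) / integrated_autocorr_time(chain)) / runtime_seconds

# Hypothetical example: an AR(1) series standing in for MCMC output (tau ~ 19)
rng = np.random.default_rng(0)
chain = np.zeros(20000)
for t in range(1, len(chain)):
    chain[t] = 0.9 * chain[t - 1] + rng.standard_normal()
print(independent_samples_per_second(chain, runtime_seconds=1.0))  # assume a 1 s run
```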

  6. Uncertainty Quantification of GEOS-5 L-band Radiative Transfer Model Parameters Using Bayesian Inference and SMOS Observations

    NASA Technical Reports Server (NTRS)

    DeLannoy, Gabrielle J. M.; Reichle, Rolf H.; Vrugt, Jasper A.

    2013-01-01

    Uncertainties in L-band (1.4 GHz) radiative transfer modeling (RTM) affect the simulation of brightness temperatures (Tb) over land and the inversion of satellite-observed Tb into soil moisture retrievals. In particular, accurate estimates of the microwave soil roughness, vegetation opacity and scattering albedo for large-scale applications are difficult to obtain from field studies and often lack an uncertainty estimate. Here, a Markov Chain Monte Carlo (MCMC) simulation method is used to determine satellite-scale estimates of RTM parameters and their posterior uncertainty by minimizing the misfit between long-term averages and standard deviations of simulated and observed Tb at a range of incidence angles, at horizontal and vertical polarization, and for morning and evening overpasses. Tb simulations are generated with the Goddard Earth Observing System (GEOS-5) and confronted with Tb observations from the Soil Moisture Ocean Salinity (SMOS) mission. The MCMC algorithm suggests that the relative uncertainty of the RTM parameter estimates is typically less than 25% of the maximum a posteriori density (MAP) parameter value. Furthermore, the actual root-mean-square differences in long-term Tb averages and standard deviations are found to be consistent with the respective estimated total simulation and observation error standard deviations of 3.1 K and 2.4 K. It is also shown that the MAP parameter values estimated through MCMC simulation are in close agreement with those obtained with Particle Swarm Optimization (PSO).

  7. Bayesian Estimation of the Logistic Positive Exponent IRT Model

    ERIC Educational Resources Information Center

    Bolfarine, Heleno; Bazan, Jorge Luis

    2010-01-01

    A Bayesian inference approach using Markov Chain Monte Carlo (MCMC) is developed for the logistic positive exponent (LPE) model proposed by Samejima and for a new skewed Logistic Item Response Theory (IRT) model, named Reflection LPE model. Both models lead to asymmetric item characteristic curves (ICC) and can be appropriate because a symmetric…

  8. Bayesian inversion using a geologically realistic and discrete model space

    NASA Astrophysics Data System (ADS)

    Jaeggli, C.; Julien, S.; Renard, P.

    2017-12-01

    Since the early days of groundwater modeling, inverse methods have played a crucial role. Many research and engineering groups aim to infer extensive knowledge of aquifer parameters from a sparse set of observations. Despite decades of dedicated research on this topic, there are still several major issues to be solved. In the hydrogeological framework, one is often confronted with underground structures that present very sharp contrasts of geophysical properties. In particular, subsoil structures such as karst conduits, channels, faults, or lenses strongly influence groundwater flow and the transport behavior of the underground. For this reason it can be essential to identify their location and shape very precisely. Unfortunately, when inverse methods are specially trained to consider such complex features, their computational effort often becomes unaffordably high. The following work is an attempt to solve this dilemma. We present a new method that is, in some sense, a compromise between the ergodicity of Markov chain Monte Carlo (MCMC) methods and the efficient handling of data by ensemble-based Kalman filters. The realistic and complex random fields are generated by a Multiple-Point Statistics (MPS) tool; nonetheless, the method is applicable with any conditional geostatistical simulation tool. Furthermore, the algorithm is independent of any parametrization, which becomes most important when two parametric systems are equivalent (permeability and resistivity, speed and slowness, etc.). When compared to two existing MCMC schemes, the computational effort was divided by a factor of 12.

  9. Auxiliary Parameter MCMC for Exponential Random Graph Models

    NASA Astrophysics Data System (ADS)

    Byshkin, Maksym; Stivala, Alex; Mira, Antonietta; Krause, Rolf; Robins, Garry; Lomi, Alessandro

    2016-11-01

    Exponential random graph models (ERGMs) are a well-established family of statistical models for analyzing social networks. Computational complexity has so far limited the appeal of ERGMs for the analysis of large social networks. Efficient computational methods are highly desirable in order to extend the empirical scope of ERGMs. In this paper we report results of a research project on the development of snowball sampling methods for ERGMs. We propose an auxiliary parameter Markov chain Monte Carlo (MCMC) algorithm for sampling from the relevant probability distributions. The method is designed to decrease the number of allowed network states without worsening the mixing of the Markov chains, and suggests a new approach for the developments of MCMC samplers for ERGMs. We demonstrate the method on both simulated and actual (empirical) network data and show that it reduces CPU time for parameter estimation by an order of magnitude compared to current MCMC methods.

  10. Gradient-free MCMC methods for dynamic causal modelling.

    PubMed

    Sengupta, Biswa; Friston, Karl J; Penny, Will D

    2015-05-15

    In this technical note we compare the performance of four gradient-free MCMC samplers (random walk Metropolis sampling, slice-sampling, adaptive MCMC sampling and population-based MCMC sampling with tempering) in terms of the number of independent samples they can produce per unit computational time. For the Bayesian inversion of a single-node neural mass model, both adaptive and population-based samplers are more efficient compared with random walk Metropolis sampler or slice-sampling; yet adaptive MCMC sampling is more promising in terms of compute time. Slice-sampling yields the highest number of independent samples from the target density - albeit at almost 1000% increase in computational time, in comparison to the most efficient algorithm (i.e., the adaptive MCMC sampler). Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

  11. A computer program for uncertainty analysis integrating regression and Bayesian methods

    USGS Publications Warehouse

    Lu, Dan; Ye, Ming; Hill, Mary C.; Poeter, Eileen P.; Curtis, Gary

    2014-01-01

    This work develops a new functionality in UCODE_2014 to evaluate Bayesian credible intervals using the Markov Chain Monte Carlo (MCMC) method. The MCMC capability in UCODE_2014 is based on the FORTRAN version of the differential evolution adaptive Metropolis (DREAM) algorithm of Vrugt et al. (2009), which estimates the posterior probability density function of model parameters in high-dimensional and multimodal sampling problems. The UCODE MCMC capability provides eleven prior probability distributions and three ways to initialize the sampling process. It evaluates parametric and predictive uncertainties and it has parallel computing capability based on multiple chains to accelerate the sampling process. This paper tests and demonstrates the MCMC capability using a 10-dimensional multimodal mathematical function, a 100-dimensional Gaussian function, and a groundwater reactive transport model. The use of the MCMC capability is made straightforward and flexible by adopting the JUPITER API protocol. With the new MCMC capability, UCODE_2014 can be used to calculate three types of uncertainty intervals, which all can account for prior information: (1) linear confidence intervals which require linearity and Gaussian error assumptions and typically 10s–100s of highly parallelizable model runs after optimization, (2) nonlinear confidence intervals which require a smooth objective function surface and Gaussian observation error assumptions and typically 100s–1,000s of partially parallelizable model runs after optimization, and (3) MCMC Bayesian credible intervals which require few assumptions and commonly 10,000s–100,000s or more partially parallelizable model runs. Ready access allows users to select methods best suited to their work, and to compare methods in many circumstances.

  12. Comparison of sampling techniques for Bayesian parameter estimation

    NASA Astrophysics Data System (ADS)

    Allison, Rupert; Dunkley, Joanna

    2014-02-01

    The posterior probability distribution for a set of model parameters encodes all that the data have to tell us in the context of a given model; it is the fundamental quantity for Bayesian parameter estimation. In order to infer the posterior probability distribution we have to decide how to explore parameter space. Here we compare three prescriptions for how parameter space is navigated, discussing their relative merits. We consider Metropolis-Hastings sampling, nested sampling and affine-invariant ensemble Markov chain Monte Carlo (MCMC) sampling. We focus on their performance on toy-model Gaussian likelihoods and on a real-world cosmological data set. We outline the sampling algorithms themselves and elaborate on performance diagnostics such as convergence time, scope for parallelization, dimensional scaling, requisite tunings and suitability for non-Gaussian distributions. We find that nested sampling delivers high-fidelity estimates for posterior statistics at low computational cost, and should be adopted in favour of Metropolis-Hastings in many cases. Affine-invariant MCMC is competitive when computing clusters can be utilized for massive parallelization. Affine-invariant MCMC and existing extensions to nested sampling naturally probe multimodal and curving distributions.
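
    For the affine-invariant ensemble sampler discussed above, a minimal usage sketch with the widely used emcee package (an implementation of the Goodman & Weare algorithm) is shown below; the two-parameter correlated Gaussian log-probability is only a placeholder for a real cosmological likelihood.

    ```python
    import numpy as np
    import emcee  # widely used affine-invariant ensemble sampler (Goodman & Weare)

    # Placeholder log-probability: a correlated 2-D Gaussian standing in for a
    # real cosmological likelihood.
    COV_INV = np.linalg.inv(np.array([[1.0, 0.8], [0.8, 2.0]]))

    def log_prob(theta):
        return -0.5 * theta @ COV_INV @ theta

    ndim, nwalkers = 2, 32
    p0 = 1e-3 * np.random.default_rng(1).standard_normal((nwalkers, ndim))

    sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
    sampler.run_mcmc(p0, 5_000)

    # Drop burn-in and flatten all walkers into one set of posterior samples.
    flat_samples = sampler.get_chain(discard=1_000, flat=True)
    print("posterior means:", flat_samples.mean(axis=0))
    ```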

  13. A Bayesian method for detecting pairwise associations in compositional data

    PubMed Central

    Ventz, Steffen; Huttenhower, Curtis

    2017-01-01

    Compositional data consist of vectors of proportions normalized to a constant sum from a basis of unobserved counts. The sum constraint makes inference on correlations between unconstrained features challenging due to the information loss from normalization. However, such correlations are of long-standing interest in fields including ecology. We propose a novel Bayesian framework (BAnOCC: Bayesian Analysis of Compositional Covariance) to estimate a sparse precision matrix through a LASSO prior. The resulting posterior, generated by MCMC sampling, allows uncertainty quantification of any function of the precision matrix, including the correlation matrix. We also use a first-order Taylor expansion to approximate the transformation from the unobserved counts to the composition in order to investigate what characteristics of the unobserved counts can make the correlations more or less difficult to infer. On simulated datasets, we show that BAnOCC infers the true network as well as previous methods while offering the advantage of posterior inference. Larger and more realistic simulated datasets further showed that BAnOCC performs well as measured by type I and type II error rates. Finally, we apply BAnOCC to a microbial ecology dataset from the Human Microbiome Project, which in addition to reproducing established ecological results revealed unique, competition-based roles for Proteobacteria in multiple distinct habitats. PMID:29140991

  14. PyDREAM: high-dimensional parameter inference for biological models in python.

    PubMed

    Shockley, Erin M; Vrugt, Jasper A; Lopez, Carlos F; Valencia, Alfonso

    2018-02-15

    Biological models contain many parameters whose values are difficult to measure directly via experimentation and therefore require calibration against experimental data. Markov chain Monte Carlo (MCMC) methods are suitable to estimate multivariate posterior model parameter distributions, but these methods may exhibit slow or premature convergence in high-dimensional search spaces. Here, we present PyDREAM, a Python implementation of the (Multiple-Try) Differential Evolution Adaptive Metropolis [DREAM(ZS)] algorithm developed by Vrugt and ter Braak (2008) and Laloy and Vrugt (2012). PyDREAM achieves excellent performance for complex, parameter-rich models and takes full advantage of distributed computing resources, facilitating parameter inference and uncertainty estimation of CPU-intensive biological models. PyDREAM is freely available under the GNU GPLv3 license from the Lopez lab GitHub repository at http://github.com/LoLab-VU/PyDREAM. c.lopez@vanderbilt.edu. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
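
    The core move behind DREAM-family samplers is a differential-evolution proposal: a chain jumps along the difference of two other randomly chosen chains. The sketch below illustrates that idea in plain NumPy; it is not PyDREAM's API, and the scaling factor and jitter follow commonly quoted DE-MC defaults.

    ```python
    import numpy as np

    def de_proposal(chains, i, gamma=None, eps=1e-6, rng=None):
        """Differential-evolution proposal for chain i: jump along the difference
        of two other randomly chosen chains, the core move in DE-MC / DREAM-style
        samplers.  `chains` has shape (n_chains, n_dim)."""
        rng = np.random.default_rng() if rng is None else rng
        n_chains, n_dim = chains.shape
        if gamma is None:
            gamma = 2.38 / np.sqrt(2 * n_dim)        # commonly quoted DE-MC scaling
        others = [j for j in range(n_chains) if j != i]
        r1, r2 = rng.choice(others, size=2, replace=False)
        jitter = eps * rng.standard_normal(n_dim)    # small noise keeps the chain ergodic
        return chains[i] + gamma * (chains[r1] - chains[r2]) + jitter
    ```

    The proposed point is then accepted or rejected with the usual Metropolis ratio, so the ensemble of chains targets the posterior.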

  15. Geometric MCMC for infinite-dimensional inverse problems

    NASA Astrophysics Data System (ADS)

    Beskos, Alexandros; Girolami, Mark; Lan, Shiwei; Farrell, Patrick E.; Stuart, Andrew M.

    2017-04-01

    Bayesian inverse problems often involve sampling posterior distributions on infinite-dimensional function spaces. Traditional Markov chain Monte Carlo (MCMC) algorithms are characterized by deteriorating mixing times upon mesh-refinement, when the finite-dimensional approximations become more accurate. Such methods are typically forced to reduce step-sizes as the discretization gets finer, and thus are expensive as a function of dimension. Recently, a new class of MCMC methods with mesh-independent convergence times has emerged. However, few of them take into account the geometry of the posterior informed by the data. At the same time, recently developed geometric MCMC algorithms have been found to be powerful in exploring complicated distributions that deviate significantly from elliptic Gaussian laws, but are in general computationally intractable for models defined in infinite dimensions. In this work, we combine geometric methods on a finite-dimensional subspace with mesh-independent infinite-dimensional approaches. Our objective is to speed up MCMC mixing times, without significantly increasing the computational cost per step (for instance, in comparison with the vanilla preconditioned Crank-Nicolson (pCN) method). This is achieved by using ideas from geometric MCMC to probe the complex structure of an intrinsic finite-dimensional subspace where most data information concentrates, while retaining robust mixing times as the dimension grows by using pCN-like methods in the complementary subspace. The resulting algorithms are demonstrated in the context of three challenging inverse problems arising in subsurface flow, heat conduction and incompressible flow control. The algorithms exhibit up to two orders of magnitude improvement in sampling efficiency when compared with the pCN method.
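
    The mesh-independent baseline mentioned above, the preconditioned Crank-Nicolson (pCN) move, is simple to state. Below is a minimal sketch for a Gaussian prior N(0, I) with a user-supplied log-likelihood; the paper's PDE-based likelihoods and its geometric/subspace extensions are not reproduced here.

    ```python
    import numpy as np

    def pcn_step(u, log_likelihood, beta=0.2, rng=None):
        """One preconditioned Crank-Nicolson (pCN) move for a Gaussian prior N(0, I).
        The acceptance ratio involves only the likelihood, which is what keeps the
        proposal well defined (and mixing robust) as the discretization of the
        underlying function space is refined."""
        rng = np.random.default_rng() if rng is None else rng
        xi = rng.standard_normal(u.shape)              # draw from the prior
        v = np.sqrt(1.0 - beta**2) * u + beta * xi     # pCN proposal
        if np.log(rng.random()) < log_likelihood(v) - log_likelihood(u):
            return v, True                             # accepted
        return u, False                                # rejected
    ```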

  16. Selectivity curves of the capture of mangrove crab (Ucides cordatus) on the northern coast of Brazil using bayesian inference.

    PubMed

    Furtado-Junior, I; Abrunhosa, F A; Holanda, F C A F; Tavares, M C S

    2016-06-01

    Fishing selectivity of the mangrove crab Ucides cordatus on the north coast of Brazil can be defined as the fisherman's ability to capture and select individuals of a certain size or sex (or a combination of these factors), which suggests an empirical selectivity. Considering this hypothesis, we calculated selectivity curves for male and female crabs using the logit function of the logistic model. Bayesian inference consisted of obtaining the posterior distribution by applying the Markov chain Monte Carlo (MCMC) method in the R software using the OpenBUGS, BRugs, and R2WinBUGS libraries. The estimated mean carapace widths at selection for males and females, compared with previous studies reporting the mean carapace width at sexual maturity, allow us to confirm the hypothesis that most mature individuals do not suffer from fishing pressure, thus ensuring their sustainability.
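
    A hedged sketch of the selectivity curve itself: the logistic (logit) model gives the probability of retention as a function of carapace width. The parameter values below are hypothetical and only illustrate the functional form.

    ```python
    import numpy as np

    def logistic_selectivity(width, w50, slope):
        """Logistic (logit) selectivity: probability that a crab of the given
        carapace width is retained by the fishery; w50 is the width at 50%
        selection."""
        return 1.0 / (1.0 + np.exp(-slope * (width - w50)))

    # Hypothetical parameter values, evaluated over a range of carapace widths (mm).
    widths = np.linspace(40.0, 90.0, 6)
    print(logistic_selectivity(widths, w50=65.0, slope=0.3))
    ```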

  17. Estimating diversifying selection and functional constraint in the presence of recombination.

    PubMed

    Wilson, Daniel J; McVean, Gilean

    2006-03-01

    Models of molecular evolution that incorporate the ratio of nonsynonymous to synonymous polymorphism (dN/dS ratio) as a parameter can be used to identify sites that are under diversifying selection or functional constraint in a sample of gene sequences. However, when there has been recombination in the evolutionary history of the sequences, reconstructing a single phylogenetic tree is not appropriate, and inference based on a single tree can give misleading results. In the presence of high levels of recombination, the identification of sites experiencing diversifying selection can suffer from a false-positive rate as high as 90%. We present a model that uses a population genetics approximation to the coalescent with recombination and use reversible-jump MCMC to perform Bayesian inference on both the dN/dS ratio and the recombination rate, allowing each to vary along the sequence. We demonstrate that the method has the power to detect variation in the dN/dS ratio and the recombination rate and does not suffer from a high false-positive rate. We use the method to analyze the porB gene of Neisseria meningitidis and verify the inferences using prior sensitivity analysis and model criticism techniques.

  18. Estimating Diversifying Selection and Functional Constraint in the Presence of Recombination

    PubMed Central

    Wilson, Daniel J.; McVean, Gilean

    2006-01-01

    Models of molecular evolution that incorporate the ratio of nonsynonymous to synonymous polymorphism (dN/dS ratio) as a parameter can be used to identify sites that are under diversifying selection or functional constraint in a sample of gene sequences. However, when there has been recombination in the evolutionary history of the sequences, reconstructing a single phylogenetic tree is not appropriate, and inference based on a single tree can give misleading results. In the presence of high levels of recombination, the identification of sites experiencing diversifying selection can suffer from a false-positive rate as high as 90%. We present a model that uses a population genetics approximation to the coalescent with recombination and use reversible-jump MCMC to perform Bayesian inference on both the dN/dS ratio and the recombination rate, allowing each to vary along the sequence. We demonstrate that the method has the power to detect variation in the dN/dS ratio and the recombination rate and does not suffer from a high false-positive rate. We use the method to analyze the porB gene of Neisseria meningitidis and verify the inferences using prior sensitivity analysis and model criticism techniques. PMID:16387887

  19. Bayesian calibration of terrestrial ecosystem models: A study of advanced Markov chain Monte Carlo methods

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lu, Dan; Ricciuto, Daniel; Walker, Anthony

    Calibration of terrestrial ecosystem models is important but challenging. Bayesian inference implemented by Markov chain Monte Carlo (MCMC) sampling provides a comprehensive framework to estimate model parameters and associated uncertainties using their posterior distributions. The effectiveness and efficiency of the method strongly depend on the MCMC algorithm used. In this study, a Differential Evolution Adaptive Metropolis (DREAM) algorithm was used to estimate posterior distributions of 21 parameters for the data assimilation linked ecosystem carbon (DALEC) model using 14 years of daily net ecosystem exchange data collected at the Harvard Forest Environmental Measurement Site eddy-flux tower. DREAM is a multi-chain method that uses a differential evolution technique for chain movement, allowing it to be efficiently applied to high-dimensional problems, and can reliably estimate heavy-tailed and multimodal distributions that are difficult for single-chain schemes using a Gaussian proposal distribution. The results were evaluated against the popular Adaptive Metropolis (AM) scheme. DREAM indicated that two parameters controlling autumn phenology have multiple modes in their posterior distributions while AM only identified one mode. The calibration with DREAM resulted in a better model fit and predictive performance compared to the AM. DREAM provides means for a good exploration of the posterior distributions of model parameters. Lastly, it reduces the risk of false convergence to a local optimum and potentially improves the predictive performance of the calibrated model.

  20. Bayesian calibration of terrestrial ecosystem models: A study of advanced Markov chain Monte Carlo methods

    DOE PAGES

    Lu, Dan; Ricciuto, Daniel; Walker, Anthony; ...

    2017-02-22

    Calibration of terrestrial ecosystem models is important but challenging. Bayesian inference implemented by Markov chain Monte Carlo (MCMC) sampling provides a comprehensive framework to estimate model parameters and associated uncertainties using their posterior distributions. The effectiveness and efficiency of the method strongly depend on the MCMC algorithm used. In this study, a Differential Evolution Adaptive Metropolis (DREAM) algorithm was used to estimate posterior distributions of 21 parameters for the data assimilation linked ecosystem carbon (DALEC) model using 14 years of daily net ecosystem exchange data collected at the Harvard Forest Environmental Measurement Site eddy-flux tower. DREAM is a multi-chain method that uses a differential evolution technique for chain movement, allowing it to be efficiently applied to high-dimensional problems, and can reliably estimate heavy-tailed and multimodal distributions that are difficult for single-chain schemes using a Gaussian proposal distribution. The results were evaluated against the popular Adaptive Metropolis (AM) scheme. DREAM indicated that two parameters controlling autumn phenology have multiple modes in their posterior distributions while AM only identified one mode. The calibration with DREAM resulted in a better model fit and predictive performance compared to the AM. DREAM provides means for a good exploration of the posterior distributions of model parameters. Lastly, it reduces the risk of false convergence to a local optimum and potentially improves the predictive performance of the calibrated model.

  1. A comparison between Gauss-Newton and Markov chain Monte Carlo based methods for inverting spectral induced polarization data for Cole-Cole parameters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, Jinsong; Kemna, Andreas; Hubbard, Susan S.

    2008-05-15

    We develop a Bayesian model to invert spectral induced polarization (SIP) data for Cole-Cole parameters using Markov chain Monte Carlo (MCMC) sampling methods. We compare the performance of the MCMC based stochastic method with an iterative Gauss-Newton based deterministic method for Cole-Cole parameter estimation through inversion of synthetic and laboratory SIP data. The Gauss-Newton based method can provide an optimal solution for given objective functions under constraints, but the obtained optimal solution generally depends on the choice of initial values and the estimated uncertainty information is often inaccurate or insufficient. In contrast, the MCMC based inversion method provides extensive global information on unknown parameters, such as the marginal probability distribution functions, from which we can obtain better estimates and tighter uncertainty bounds of the parameters than with the deterministic method. Additionally, the results obtained with the MCMC method are independent of the choice of initial values. Because the MCMC based method does not explicitly offer a single optimal solution for given objective functions, the deterministic and stochastic methods can complement each other. For example, the stochastic method can first be used to obtain the means of the unknown parameters by starting from an arbitrary set of initial values, and the deterministic method can then be initiated using the means as starting values to obtain the optimal estimates of the Cole-Cole parameters.
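
    For reference, a sketch of the forward model being inverted: the Cole-Cole complex resistivity expression (Pelton form) commonly used for SIP data. The parameter values below are hypothetical.

    ```python
    import numpy as np

    def cole_cole_resistivity(freq_hz, rho0, m, tau, c):
        """Cole-Cole complex resistivity (Pelton form) commonly used for SIP data:
        rho(w) = rho0 * (1 - m * (1 - 1 / (1 + (i*w*tau)**c))), with DC resistivity
        rho0, chargeability m, time constant tau, and frequency exponent c."""
        w = 2.0 * np.pi * np.asarray(freq_hz)
        return rho0 * (1.0 - m * (1.0 - 1.0 / (1.0 + (1j * w * tau) ** c)))

    # Hypothetical parameter values over a typical SIP frequency band (0.01 Hz-1 kHz).
    freqs = np.logspace(-2, 3, 6)
    print(cole_cole_resistivity(freqs, rho0=100.0, m=0.2, tau=0.01, c=0.5))
    ```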

  2. An Evaluation of a Markov Chain Monte Carlo Method for the Two-Parameter Logistic Model.

    ERIC Educational Resources Information Center

    Kim, Seock-Ho; Cohen, Allan S.

    The accuracy of the Markov Chain Monte Carlo (MCMC) procedure Gibbs sampling was considered for estimation of item parameters of the two-parameter logistic model. Data for the Law School Admission Test (LSAT) Section 6 were analyzed to illustrate the MCMC procedure. In addition, simulated data sets were analyzed using the MCMC, marginal Bayesian…

  3. Estimation Methods for One-Parameter Testlet Models

    ERIC Educational Resources Information Center

    Jiao, Hong; Wang, Shudong; He, Wei

    2013-01-01

    This study demonstrated the equivalence between the Rasch testlet model and the three-level one-parameter testlet model and explored the Markov Chain Monte Carlo (MCMC) method for model parameter estimation in WINBUGS. The estimation accuracy from the MCMC method was compared with those from the marginalized maximum likelihood estimation (MMLE)…

  4. Receptive Field Inference with Localized Priors

    PubMed Central

    Park, Mijung; Pillow, Jonathan W.

    2011-01-01

    The linear receptive field describes a mapping from sensory stimuli to a one-dimensional variable governing a neuron's spike response. However, traditional receptive field estimators such as the spike-triggered average converge slowly and often require large amounts of data. Bayesian methods seek to overcome this problem by biasing estimates towards solutions that are more likely a priori, typically those with small, smooth, or sparse coefficients. Here we introduce a novel Bayesian receptive field estimator designed to incorporate locality, a powerful form of prior information about receptive field structure. The key to our approach is a hierarchical receptive field model that flexibly adapts to localized structure in both spacetime and spatiotemporal frequency, using an inference method known as empirical Bayes. We refer to our method as automatic locality determination (ALD), and show that it can accurately recover various types of smooth, sparse, and localized receptive fields. We apply ALD to neural data from retinal ganglion cells and V1 simple cells, and find it achieves error rates several times lower than standard estimators. Thus, estimates of comparable accuracy can be achieved with substantially less data. Finally, we introduce a computationally efficient Markov Chain Monte Carlo (MCMC) algorithm for fully Bayesian inference under the ALD prior, yielding accurate Bayesian confidence intervals for small or noisy datasets. PMID:22046110

  5. Modelling the spread of American foulbrood in honeybees

    PubMed Central

    Datta, Samik; Bull, James C.; Budge, Giles E.; Keeling, Matt J.

    2013-01-01

    We investigate the spread of American foulbrood (AFB), a disease caused by the bacterium Paenibacillus larvae, that affects bees and can be extremely damaging to beehives. Our dataset comes from an inspection period carried out during an AFB epidemic of honeybee colonies on the island of Jersey during the summer of 2010. The data include the number of hives of honeybees, location and owner of honeybee apiaries across the island. We use a spatial SIR model with an underlying owner network to simulate the epidemic and characterize the epidemic using a Markov chain Monte Carlo (MCMC) scheme to determine model parameters and infection times (including undetected ‘occult’ infections). Likely methods of infection spread can be inferred from the analysis, with both distance- and owner-based transmissions being found to contribute to the spread of AFB. The results of the MCMC are corroborated by simulating the epidemic using a stochastic SIR model, resulting in aggregate levels of infection that are comparable to the data. We use this stochastic SIR model to simulate the impact of different control strategies on controlling the epidemic. It is found that earlier inspections result in smaller epidemics and a higher likelihood of AFB extinction. PMID:24026473

  6. Inferring soil salinity in a drip irrigation system from multi-configuration EMI measurements using adaptive Markov chain Monte Carlo

    NASA Astrophysics Data System (ADS)

    Zaib Jadoon, Khan; Umer Altaf, Muhammad; McCabe, Matthew Francis; Hoteit, Ibrahim; Muhammad, Nisar; Moghadas, Davood; Weihermüller, Lutz

    2017-10-01

    A substantial interpretation of electromagnetic induction (EMI) measurements requires quantifying optimal model parameters and uncertainty of a nonlinear inverse problem. For this purpose, an adaptive Bayesian Markov chain Monte Carlo (MCMC) algorithm is used to assess multi-orientation and multi-offset EMI measurements in an agricultural field with non-saline and saline soil. In MCMC the posterior distribution is computed using Bayes' rule. The electromagnetic forward model, based on the full solution of Maxwell's equations, was used to simulate the apparent electrical conductivity measured with the configurations of the EMI instrument, the CMD Mini-Explorer. Uncertainty in the parameters of the three-layered earth model is investigated using synthetic data. Our results show that in the scenario of non-saline soil, the layer thickness parameters, as compared to the layer electrical conductivities, are not well constrained and are therefore difficult to resolve. Application of the proposed MCMC-based inversion to field measurements in a drip irrigation system demonstrates that the parameters of the model can be well estimated for the saline soil as compared to the non-saline soil, and provides useful insight about parameter uncertainty for the assessment of the model outputs.

  7. Intercoalescence time distribution of incomplete gene genealogies in temporally varying populations, and applications in population genetic inference.

    PubMed

    Chen, Hua

    2013-03-01

    Tracing back to a specific time T in the past, the genealogy of a sample of haplotypes may not have reached their common ancestor and may leave m lineages extant. For such an incomplete genealogy truncated at a specific time T in the past, the distribution and expectation of the intercoalescence times conditional on T are derived in an exact form in this paper for populations of deterministically time-varying sizes, specifically, for populations growing exponentially. The derived intercoalescence time distribution can be integrated into the coalescent-based joint allele frequency spectrum (JAFS) theory, and is useful for population genetic inference from large-scale genomic data, without relying on computationally intensive approaches, such as importance sampling and Markov Chain Monte Carlo (MCMC) methods. The inference of several important parameters relying on this derived conditional distribution is demonstrated: quantifying population growth rate and onset time, and estimating the number of ancestral lineages at a specific ancient time. Simulation studies confirm the validity of the derivation and the statistical efficiency of the methods using the derived intercoalescence time distribution. Two examples of real data are given to show the inference of the population growth rate of a European sample from the NIEHS Environmental Genome Project, and the number of ancient lineages of 31 mitochondrial genomes from Tibetan populations. © 2013 Blackwell Publishing Ltd/University College London.

  8. Bayesian Population Genomic Inference of Crossing Over and Gene Conversion

    PubMed Central

    Padhukasahasram, Badri; Rannala, Bruce

    2011-01-01

    Meiotic recombination is a fundamental cellular mechanism in sexually reproducing organisms and its different forms, crossing over and gene conversion both play an important role in shaping genetic variation in populations. Here, we describe a coalescent-based full-likelihood Markov chain Monte Carlo (MCMC) method for jointly estimating the crossing-over, gene-conversion, and mean tract length parameters from population genomic data under a Bayesian framework. Although computationally more expensive than methods that use approximate likelihoods, the relative efficiency of our method is expected to be optimal in theory. Furthermore, it is also possible to obtain a posterior sample of genealogies for the data using this method. We first check the performance of the new method on simulated data and verify its correctness. We also extend the method for inference under models with variable gene-conversion and crossing-over rates and demonstrate its ability to identify recombination hotspots. Then, we apply the method to two empirical data sets that were sequenced in the telomeric regions of the X chromosome of Drosophila melanogaster. Our results indicate that gene conversion occurs more frequently than crossing over in the su-w and su-s gene sequences while the local rates of crossing over as inferred by our program are not low. The mean tract lengths for gene-conversion events are estimated to be ∼70 bp and 430 bp, respectively, for these data sets. Finally, we discuss ideas and optimizations for reducing the execution time of our algorithm. PMID:21840857

  9. Convergence analysis of surrogate-based methods for Bayesian inverse problems

    NASA Astrophysics Data System (ADS)

    Yan, Liang; Zhang, Yuan-Xiang

    2017-12-01

    The major challenges in Bayesian inverse problems arise from the need for repeated evaluations of the forward model, as required by Markov chain Monte Carlo (MCMC) methods for posterior sampling. Many attempts at accelerating Bayesian inference have relied on surrogates for the forward model, typically constructed through repeated forward simulations that are performed in an offline phase. Although such approaches can be quite effective at reducing computation cost, there has been little analysis of the effect of the approximation on posterior inference. In this work, we prove error bounds on the Kullback-Leibler (KL) distance between the true posterior distribution and the approximation based on surrogate models. Our rigorous error analysis shows that if the forward model approximation converges at a certain rate in the prior-weighted L2 norm, then the posterior distribution generated by the approximation converges to the true posterior at least two times faster in the KL sense. An error bound on the Hellinger distance is also provided. To provide concrete examples focusing on the use of surrogate-model-based methods, we present an efficient technique for constructing stochastic surrogate models to accelerate the Bayesian inference approach. The Christoffel least squares algorithms, based on generalized polynomial chaos, are used to construct a polynomial approximation of the forward solution over the support of the prior distribution. The numerical strategy and the predicted convergence rates are then demonstrated on nonlinear inverse problems involving the inference of parameters appearing in partial differential equations.
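
    A simplified sketch of the offline-surrogate/online-inference split analysed above: a cheap polynomial surrogate is fitted to forward-model evaluations drawn from the prior and then used inside the likelihood. This uses an ordinary least-squares polynomial fit rather than the paper's Christoffel least-squares construction, and the forward model, prior, and noise level are placeholders.

    ```python
    import numpy as np

    def expensive_forward_model(theta):
        # Stand-in for a costly PDE solve mapping a parameter to an observable.
        return np.sin(3.0 * theta) + 0.5 * theta**2

    # Offline phase: evaluate the forward model at prior samples and fit a cheap
    # polynomial surrogate by ordinary least squares.
    rng = np.random.default_rng(0)
    theta_train = rng.uniform(-1.0, 1.0, size=50)        # draws from a uniform prior
    y_train = expensive_forward_model(theta_train)
    surrogate = np.polynomial.Polynomial.fit(theta_train, y_train, deg=8)

    # Online phase: the surrogate replaces the forward model inside the likelihood,
    # so each MCMC step costs one polynomial evaluation instead of a full solve.
    def surrogate_log_likelihood(theta, data, sigma=0.05):
        return -0.5 * ((data - surrogate(theta)) / sigma) ** 2

    print(surrogate_log_likelihood(0.3, data=expensive_forward_model(0.3)))
    ```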

  10. Uncertainty Analysis Based on Sparse Grid Collocation and Quasi-Monte Carlo Sampling with Application in Groundwater Modeling

    NASA Astrophysics Data System (ADS)

    Zhang, G.; Lu, D.; Ye, M.; Gunzburger, M.

    2011-12-01

    Markov Chain Monte Carlo (MCMC) methods have been widely used in many fields of uncertainty analysis to estimate the posterior distributions of parameters and credible intervals of predictions in the Bayesian framework. However, in practice, MCMC may be computationally unaffordable due to slow convergence and the excessive number of forward model executions required, especially when the forward model is expensive to compute. Both disadvantages arise from the curse of dimensionality, i.e., the posterior distribution is usually a multivariate function of parameters. Recently, sparse grid method has been demonstrated to be an effective technique for coping with high-dimensional interpolation or integration problems. Thus, in order to accelerate the forward model and avoid the slow convergence of MCMC, we propose a new method for uncertainty analysis based on sparse grid interpolation and quasi-Monte Carlo sampling. First, we construct a polynomial approximation of the forward model in the parameter space by using the sparse grid interpolation. This approximation then defines an accurate surrogate posterior distribution that can be evaluated repeatedly at minimal computational cost. Second, instead of using MCMC, a quasi-Monte Carlo method is applied to draw samples in the parameter space. Then, the desired probability density function of each prediction is approximated by accumulating the posterior density values of all the samples according to the prediction values. Our method has the following advantages: (1) the polynomial approximation of the forward model on the sparse grid provides a very efficient evaluation of the surrogate posterior distribution; (2) the quasi-Monte Carlo method retains the same accuracy in approximating the PDF of predictions but avoids all disadvantages of MCMC. The proposed method is applied to a controlled numerical experiment of groundwater flow modeling. The results show that our method attains the same accuracy much more efficiently than traditional MCMC.
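
    A minimal sketch of the quasi-Monte Carlo ingredient of the proposed scheme: low-discrepancy Sobol points cover the parameter space evenly and are weighted by a surrogate posterior density. The surrogate below is a placeholder Gaussian rather than a sparse-grid interpolant, and the parameter bounds are hypothetical.

    ```python
    import numpy as np
    from scipy.stats import qmc

    # Low-discrepancy (Sobol) points cover the 2-D parameter space far more evenly
    # than pseudo-random draws; here they are weighted by a surrogate posterior.
    sampler = qmc.Sobol(d=2, scramble=True, seed=0)
    unit_samples = sampler.random_base2(m=10)                  # 2**10 points in [0, 1)^2
    params = qmc.scale(unit_samples, l_bounds=[0.1, 1.0], u_bounds=[10.0, 5.0])

    def surrogate_posterior(theta):
        # Placeholder unnormalized posterior; in the proposed method this would be
        # the sparse-grid polynomial surrogate.
        return np.exp(-0.5 * np.sum((theta - np.array([3.0, 2.0])) ** 2, axis=1))

    weights = surrogate_posterior(params)
    posterior_mean = (params * weights[:, None]).sum(axis=0) / weights.sum()
    print("weighted posterior mean estimate:", posterior_mean)
    ```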

  11. Standard Error Estimation of 3PL IRT True Score Equating with an MCMC Method

    ERIC Educational Resources Information Center

    Liu, Yuming; Schulz, E. Matthew; Yu, Lei

    2008-01-01

    A Markov chain Monte Carlo (MCMC) method and a bootstrap method were compared in the estimation of standard errors of item response theory (IRT) true score equating. Three test form relationships were examined: parallel, tau-equivalent, and congeneric. Data were simulated based on Reading Comprehension and Vocabulary tests of the Iowa Tests of…

  12. [Comparison of different methods in dealing with HIV viral load data with diversified missing value mechanism on HIV positive MSM].

    PubMed

    Jiang, Z; Dou, Z; Song, W L; Xu, J; Wu, Z Y

    2017-11-10

    Objective: To compare the results of different methods for handling HIV viral load (VL) data with different missing-value mechanisms. Methods: We used SPSS 17.0 to simulate complete and missing data with different missing-value mechanisms from HIV viral load data collected from MSM in 16 cities in China in 2013. Maximum likelihood using the expectation-maximization algorithm (EM), a regression method, mean imputation, a deletion method, and Markov chain Monte Carlo (MCMC) were used to supplement the missing data. The results of the different methods were compared in terms of distribution characteristics, accuracy, and precision. Results: The HIV VL data could not be transformed to a normal distribution. All methods performed well for data that were Missing Completely at Random (MCAR). For the other types of missing data, the regression and MCMC methods preserved the main characteristics of the original data. The means of the imputed data sets from all methods were close to the original mean. EM, the regression method, mean imputation, and the deletion method under-estimated VL, while MCMC overestimated it. Conclusion: MCMC can be used as the main imputation method for missing HIV viral load data. The imputed data can be used as a reference for estimating the mean HIV VL in the investigated population.

  13. Bayesian Analysis of Evolutionary Divergence with Genomic Data under Diverse Demographic Models.

    PubMed

    Chung, Yujin; Hey, Jody

    2017-06-01

    We present a new Bayesian method for estimating demographic and phylogenetic history using population genomic data. Several key innovations are introduced that allow the study of diverse models within an Isolation-with-Migration framework. The new method implements a 2-step analysis, with an initial Markov chain Monte Carlo (MCMC) phase that samples simple coalescent trees, followed by the calculation of the joint posterior density for the parameters of a demographic model. In step 1, the MCMC sampling phase, the method uses a reduced state space, consisting of coalescent trees without migration paths, and a simple importance sampling distribution without the demography of interest. Once obtained, a single sample of trees can be used in step 2 to calculate the joint posterior density for model parameters under multiple diverse demographic models, without having to repeat MCMC runs. Because migration paths are not included in the state space of the MCMC phase, but rather are handled by analytic integration in step 2 of the analysis, the method is scalable to a large number of loci with excellent MCMC mixing properties. With an implementation of the new method in the computer program MIST, we demonstrate the method's accuracy, scalability, and other advantages using simulated data and DNA sequences of two common chimpanzee subspecies: Pan troglodytes (P. t.) troglodytes and P. t. verus. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  14. Bayesian calibration of terrestrial ecosystem models: a study of advanced Markov chain Monte Carlo methods

    NASA Astrophysics Data System (ADS)

    Lu, Dan; Ricciuto, Daniel; Walker, Anthony; Safta, Cosmin; Munger, William

    2017-09-01

    Calibration of terrestrial ecosystem models is important but challenging. Bayesian inference implemented by Markov chain Monte Carlo (MCMC) sampling provides a comprehensive framework to estimate model parameters and associated uncertainties using their posterior distributions. The effectiveness and efficiency of the method strongly depend on the MCMC algorithm used. In this work, a differential evolution adaptive Metropolis (DREAM) algorithm is used to estimate posterior distributions of 21 parameters for the data assimilation linked ecosystem carbon (DALEC) model using 14 years of daily net ecosystem exchange data collected at the Harvard Forest Environmental Measurement Site eddy-flux tower. The calibration of DREAM results in a better model fit and predictive performance compared to the popular adaptive Metropolis (AM) scheme. Moreover, DREAM indicates that two parameters controlling autumn phenology have multiple modes in their posterior distributions while AM only identifies one mode. The application suggests that DREAM is very suitable to calibrate complex terrestrial ecosystem models, where the uncertain parameter size is usually large and existence of local optima is always a concern. In addition, this effort justifies the assumptions of the error model used in Bayesian calibration according to the residual analysis. The result indicates that a heteroscedastic, correlated, Gaussian error model is appropriate for the problem, and the consequent constructed likelihood function can alleviate the underestimation of parameter uncertainty that is usually caused by using uncorrelated error models.
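
    A hedged sketch of the kind of error model argued for above: a Gaussian log-likelihood with per-observation standard deviations (heteroscedasticity) and AR(1) correlation between successive standardized residuals. The exact likelihood construction in the paper may differ; this only illustrates the structure.

    ```python
    import numpy as np

    def log_likelihood_hetero_ar1(residuals, sigma, phi):
        """Gaussian log-likelihood with per-observation standard deviations `sigma`
        (heteroscedasticity) and AR(1) correlation `phi` between successive
        standardized residuals (assumed unit marginal variance)."""
        z = np.asarray(residuals) / np.asarray(sigma)     # standardize
        innov = z[1:] - phi * z[:-1]                      # AR(1) innovations
        innov_var = 1.0 - phi**2
        ll = -0.5 * (z[0] ** 2 + np.log(2.0 * np.pi))     # first observation
        ll += -0.5 * np.sum(innov**2 / innov_var + np.log(2.0 * np.pi * innov_var))
        ll -= np.sum(np.log(sigma))                       # Jacobian of the standardization
        return ll
    ```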

  15. Estimation of under-reporting in epidemics using approximations.

    PubMed

    Gamado, Kokouvi; Streftaris, George; Zachary, Stan

    2017-06-01

    Under-reporting in epidemics, when it is ignored, leads to under-estimation of the infection rate and therefore of the reproduction number. In the case of stochastic models with temporal data, a usual approach for dealing with such issues is to apply data augmentation techniques through Bayesian methodology. Departing from earlier literature approaches implemented using reversible jump Markov chain Monte Carlo (RJMCMC) techniques, we make use of approximations to obtain faster estimation with simple MCMC. Comparisons among the methods developed here, and with the RJMCMC approach, are carried out and highlight that approximation-based methodology offers useful alternative inference tools for large epidemics, with a good trade-off between time cost and accuracy.

  16. Asteroid orbital inversion using uniform phase-space sampling

    NASA Astrophysics Data System (ADS)

    Muinonen, K.; Pentikäinen, H.; Granvik, M.; Oszkiewicz, D.; Virtanen, J.

    2014-07-01

    We review statistical inverse methods for asteroid orbit computation from a small number of astrometric observations and short time intervals of observations. With the help of Markov-chain Monte Carlo methods (MCMC), we present a novel inverse method that utilizes uniform sampling of the phase space for the orbital elements. The statistical orbital ranging method (Virtanen et al. 2001, Muinonen et al. 2001) was set out to resolve the long-lasting challenges in the initial computation of orbits for asteroids. The ranging method starts from the selection of a pair of astrometric observations. Thereafter, the topocentric ranges and angular deviations in R.A. and Decl. are randomly sampled. The two Cartesian positions allow for the computation of orbital elements and, subsequently, the computation of ephemerides for the observation dates. Candidate orbital elements are included in the sample of accepted elements if the χ^2-value between the observed and computed observations is within a pre-defined threshold. The sample orbital elements obtain weights based on a certain debiasing procedure. When the weights are available, the full sample of orbital elements allows the probabilistic assessments for, e.g., object classification and ephemeris computation as well as the computation of collision probabilities. The MCMC ranging method (Oszkiewicz et al. 2009; see also Granvik et al. 2009) replaces the original sampling algorithm described above with a proposal probability density function (p.d.f.), and a chain of sample orbital elements results in the phase space. MCMC ranging is based on a bivariate Gaussian p.d.f. for the topocentric ranges, and allows for the sampling to focus on the phase-space domain with most of the probability mass. In the virtual-observation MCMC method (Muinonen et al. 2012), the proposal p.d.f. for the orbital elements is chosen to mimic the a posteriori p.d.f. for the elements: first, random errors are simulated for each observation, resulting in a set of virtual observations; second, corresponding virtual least-squares orbital elements are derived using the Nelder-Mead downhill simplex method; third, repeating the procedure two times allows for a computation of a difference for two sets of virtual orbital elements; and, fourth, this orbital-element difference constitutes a symmetric proposal in a random-walk Metropolis-Hastings algorithm, avoiding the explicit computation of the proposal p.d.f. In a discrete approximation, the allowed proposals coincide with the differences that are based on a large number of pre-computed sets of virtual least-squares orbital elements. The virtual-observation MCMC method is thus based on the characterization of the relevant volume in the orbital-element phase space. Here we utilize MCMC to map the phase-space domain of acceptable solutions. We can make use of the proposal p.d.f.s from the MCMC ranging and virtual-observation methods. The present phase-space mapping produces, upon convergence, a uniform sampling of the solution space within a pre-defined χ^2-value. The weights of the sampled orbital elements are then computed on the basis of the corresponding χ^2-values. The present method resembles the original ranging method. On one hand, MCMC mapping is insensitive to local extrema in the phase space and efficiently maps the solution space. This is somewhat contrary to the MCMC methods described above. 
On the other hand, MCMC mapping can suffer from producing a small number of sample elements with small χ^2-values, in resemblance to the original ranging method. We apply the methods to example near-Earth, main-belt, and transneptunian objects, and highlight the utilization of the methods in the data processing and analysis pipeline of the ESA Gaia space mission.

  17. Improved Assimilation of Streamflow and Satellite Soil Moisture with the Evolutionary Particle Filter and Geostatistical Modeling

    NASA Astrophysics Data System (ADS)

    Yan, Hongxiang; Moradkhani, Hamid; Abbaszadeh, Peyman

    2017-04-01

    Assimilation of satellite soil moisture and streamflow data into hydrologic models has received increasing attention over the past few years. Currently, these observations are increasingly used to improve model streamflow and soil moisture predictions. However, the performance of this land data assimilation (DA) system still suffers from two limitations: 1) satellite data scarcity and quality; and 2) particle weight degeneration. In order to overcome these two limitations, we propose two possible solutions in this study. First, a general Gaussian geostatistical approach is proposed to overcome the limitation in the space/time resolution of satellite soil moisture products, thus improving their accuracy at uncovered/biased grid cells. Secondly, an evolutionary particle filter (PF) approach based on a Genetic Algorithm (GA) and Markov Chain Monte Carlo (MCMC), the so-called EPF-MCMC, is developed to further reduce weight degeneration and improve the robustness of the land DA system. This study provides a detailed analysis of the joint and separate assimilation of streamflow and satellite soil moisture into a distributed Sacramento Soil Moisture Accounting (SAC-SMA) model, with the use of the recently developed EPF-MCMC and the general Gaussian geostatistical approach. Performance is assessed over several basins in the USA selected from the Model Parameter Estimation Experiment (MOPEX) and located in different climate regions. The results indicate that: 1) the general Gaussian approach can predict the soil moisture at uncovered grid cells within the expected satellite data quality threshold; 2) assimilation of satellite soil moisture inferred from the general Gaussian model can significantly improve the soil moisture predictions; and 3) in terms of both deterministic and probabilistic measures, the EPF-MCMC can achieve better streamflow predictions. These results suggest that the geostatistical model is a helpful tool to aid the remote sensing technique and that the EPF-MCMC is a reliable and effective DA approach in hydrologic applications.

  18. Modeling and Bayesian parameter estimation for shape memory alloy bending actuators

    NASA Astrophysics Data System (ADS)

    Crews, John H.; Smith, Ralph C.

    2012-04-01

    In this paper, we employ a homogenized energy model (HEM) for shape memory alloy (SMA) bending actuators. Additionally, we utilize a Bayesian method for quantifying parameter uncertainty. The system consists of a SMA wire attached to a flexible beam. As the actuator is heated, the beam bends, providing endoscopic motion. The model parameters are fit to experimental data using an ordinary least-squares approach. The uncertainty in the fit model parameters is then quantified using Markov Chain Monte Carlo (MCMC) methods. The MCMC algorithm provides bounds on the parameters, which will ultimately be used in robust control algorithms. One purpose of the paper is to test the feasibility of the Random Walk Metropolis algorithm, the MCMC method used here.

  19. Bayesian Monte Carlo and Maximum Likelihood Approach for ...

    EPA Pesticide Factsheets

    Model uncertainty estimation and risk assessment is essential to environmental management and informed decision making on pollution mitigation strategies. In this study, we apply a probabilistic methodology, which combines Bayesian Monte Carlo simulation and Maximum Likelihood estimation (BMCML) to calibrate a lake oxygen recovery model. We first derive an analytical solution of the differential equation governing lake-averaged oxygen dynamics as a function of time-variable wind speed. Statistical inferences on model parameters and predictive uncertainty are then drawn by Bayesian conditioning of the analytical solution on observed daily wind speed and oxygen concentration data obtained from an earlier study during two recovery periods on a eutrophic lake in upper state New York. The model is calibrated using oxygen recovery data for one year and statistical inferences were validated using recovery data for another year. Compared with essentially two-step, regression and optimization approach, the BMCML results are more comprehensive and performed relatively better in predicting the observed temporal dissolved oxygen levels (DO) in the lake. BMCML also produced comparable calibration and validation results with those obtained using popular Markov Chain Monte Carlo technique (MCMC) and is computationally simpler and easier to implement than the MCMC. Next, using the calibrated model, we derive an optimal relationship between liquid film-transfer coefficien

  20. Bayesian inference of nonlinear unsteady aerodynamics from aeroelastic limit cycle oscillations

    NASA Astrophysics Data System (ADS)

    Sandhu, Rimple; Poirel, Dominique; Pettit, Chris; Khalil, Mohammad; Sarkar, Abhijit

    2016-07-01

    A Bayesian model selection and parameter estimation algorithm is applied to investigate the influence of nonlinear and unsteady aerodynamic loads on the limit cycle oscillation (LCO) of a pitching airfoil in the transitional Reynolds number regime. At small angles of attack, laminar boundary layer trailing edge separation causes negative aerodynamic damping leading to the LCO. The fluid-structure interaction of the rigid, but elastically mounted, airfoil and nonlinear unsteady aerodynamics is represented by two coupled nonlinear stochastic ordinary differential equations containing uncertain parameters and model approximation errors. Several plausible aerodynamic models with increasing complexity are proposed to describe the aeroelastic system leading to LCO. The likelihood in the posterior parameter probability density function (pdf) is available semi-analytically using the extended Kalman filter for the state estimation of the coupled nonlinear structural and unsteady aerodynamic model. The posterior parameter pdf is sampled using a parallel and adaptive Markov Chain Monte Carlo (MCMC) algorithm. The posterior probability of each model is estimated using the Chib-Jeliazkov method that directly uses the posterior MCMC samples for evidence (marginal likelihood) computation. The Bayesian algorithm is validated through a numerical study and then applied to model the nonlinear unsteady aerodynamic loads using wind-tunnel test data at various Reynolds numbers.

  1. Bayesian inference of nonlinear unsteady aerodynamics from aeroelastic limit cycle oscillations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sandhu, Rimple; Poirel, Dominique; Pettit, Chris

    2016-07-01

    A Bayesian model selection and parameter estimation algorithm is applied to investigate the influence of nonlinear and unsteady aerodynamic loads on the limit cycle oscillation (LCO) of a pitching airfoil in the transitional Reynolds number regime. At small angles of attack, laminar boundary layer trailing edge separation causes negative aerodynamic damping leading to the LCO. The fluid–structure interaction of the rigid, but elastically mounted, airfoil and nonlinear unsteady aerodynamics is represented by two coupled nonlinear stochastic ordinary differential equations containing uncertain parameters and model approximation errors. Several plausible aerodynamic models with increasing complexity are proposed to describe the aeroelastic system leading to LCO. The likelihood in the posterior parameter probability density function (pdf) is available semi-analytically using the extended Kalman filter for the state estimation of the coupled nonlinear structural and unsteady aerodynamic model. The posterior parameter pdf is sampled using a parallel and adaptive Markov Chain Monte Carlo (MCMC) algorithm. The posterior probability of each model is estimated using the Chib–Jeliazkov method that directly uses the posterior MCMC samples for evidence (marginal likelihood) computation. The Bayesian algorithm is validated through a numerical study and then applied to model the nonlinear unsteady aerodynamic loads using wind-tunnel test data at various Reynolds numbers.

  2. Parametric modelling of cost data in medical studies.

    PubMed

    Nixon, R M; Thompson, S G

    2004-04-30

    The cost of medical resources used is often recorded for each patient in clinical studies in order to inform decision-making. Although cost data are generally skewed to the right, interest is in making inferences about the population mean cost. Common methods for non-normal data, such as data transformation, assuming asymptotic normality of the sample mean or non-parametric bootstrapping, are not ideal. This paper describes possible parametric models for analysing cost data. Four example data sets are considered, which have different sample sizes and degrees of skewness. Normal, gamma, log-normal, and log-logistic distributions are fitted, together with three-parameter versions of the latter three distributions. Maximum likelihood estimates of the population mean are found; confidence intervals are derived by a parametric BC(a) bootstrap and checked by MCMC methods. Differences between model fits and inferences are explored. Skewed parametric distributions fit cost data better than the normal distribution, and should in principle be preferred for estimating the population mean cost. However, for some data sets, we find that models that fit badly can give similar inferences to those that fit well. Conversely, particularly when sample sizes are not large, different parametric models that fit the data equally well can lead to substantially different inferences. We conclude that inferences are sensitive to the choice of statistical model, which itself can remain uncertain unless there is enough data to model the tail of the distribution accurately. Investigating the sensitivity of conclusions to the choice of model should thus be an essential component of analysing cost data in practice. Copyright 2004 John Wiley & Sons, Ltd.
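
    A small illustration of the modelling choice discussed above: fitting two skewed parametric distributions to right-skewed cost data by maximum likelihood and comparing the implied population mean cost with the sample mean. The data below are simulated stand-ins, not the study's cost data.

    ```python
    import numpy as np
    from scipy import stats

    # Hypothetical right-skewed cost data standing in for per-patient resource use.
    rng = np.random.default_rng(0)
    costs = rng.gamma(shape=1.5, scale=800.0, size=200)

    # Fit two skewed parametric models by maximum likelihood and compare the
    # implied population mean cost with the sample mean.
    a, _, scale_g = stats.gamma.fit(costs, floc=0)
    gamma_mean = a * scale_g
    s, _, scale_ln = stats.lognorm.fit(costs, floc=0)
    lognorm_mean = scale_ln * np.exp(s**2 / 2.0)          # mean of a log-normal

    print(f"sample mean     : {costs.mean():8.1f}")
    print(f"gamma fit mean  : {gamma_mean:8.1f}")
    print(f"log-normal mean : {lognorm_mean:8.1f}")
    ```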

  3. An investigation into exoplanet transits and uncertainties

    NASA Astrophysics Data System (ADS)

    Ji, Y.; Banks, T.; Budding, E.; Rhodes, M. D.

    2017-06-01

    A simple transit model is described along with tests of this model against published results for 4 exoplanet systems (Kepler-1, 2, 8, and 77). Data from the Kepler mission are used. The Markov Chain Monte Carlo (MCMC) method is applied to obtain realistic error estimates. Optimisation of limb darkening coefficients is subject to data quality. It is more likely for MCMC to derive an empirical limb darkening coefficient for light curves with S/N (signal to noise) above 15. Finally, the model is applied to Kepler data for 4 Kepler candidate systems (KOI 760.01, 767.01, 802.01, and 824.01) with previously unpublished results. Error estimates for these systems are obtained via the MCMC method.

  4. What are hierarchical models and how do we analyze them?

    USGS Publications Warehouse

    Royle, Andy

    2016-01-01

    In this chapter we provide a basic definition of hierarchical models and introduce the two canonical hierarchical models in this book: site occupancy and N-mixture models. The former is a hierarchical extension of logistic regression and the latter is a hierarchical extension of Poisson regression. We introduce basic concepts of probability modeling and statistical inference including likelihood and Bayesian perspectives. We go through the mechanics of maximizing the likelihood and characterizing the posterior distribution by Markov chain Monte Carlo (MCMC) methods. We give a general perspective on topics such as model selection and assessment of model fit, although we demonstrate these topics in practice in later chapters (especially Chapters 5, 6, 7, and 10).

  5. INFERENCE FOR INDIVIDUAL-LEVEL MODELS OF INFECTIOUS DISEASES IN LARGE POPULATIONS.

    PubMed

    Deardon, Rob; Brooks, Stephen P; Grenfell, Bryan T; Keeling, Matthew J; Tildesley, Michael J; Savill, Nicholas J; Shaw, Darren J; Woolhouse, Mark E J

    2010-01-01

    Individual Level Models (ILMs), a new class of models, are being applied to infectious epidemic data to aid in the understanding of the spatio-temporal dynamics of infectious diseases. These models are highly flexible and intuitive, and can be parameterised under a Bayesian framework via Markov chain Monte Carlo (MCMC) methods. Unfortunately, this parameterisation can be difficult to implement due to intense computational requirements when calculating the full posterior for large, or even moderately large, susceptible populations, or when missing data are present. Here we detail a methodology that can be used to estimate parameters for such large, and/or incomplete, data sets. This is done in the context of a study of the UK 2001 foot-and-mouth disease (FMD) epidemic.

  6. Bayesian calibration of terrestrial ecosystem models: a study of advanced Markov chain Monte Carlo methods

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lu, Dan; Ricciuto, Daniel M.; Walker, Anthony P.

    Calibration of terrestrial ecosystem models is important but challenging. Bayesian inference implemented by Markov chain Monte Carlo (MCMC) sampling provides a comprehensive framework to estimate model parameters and associated uncertainties using their posterior distributions. The effectiveness and efficiency of the method strongly depend on the MCMC algorithm used. In this work, a differential evolution adaptive Metropolis (DREAM) algorithm is used to estimate posterior distributions of 21 parameters for the data assimilation linked ecosystem carbon (DALEC) model using 14 years of daily net ecosystem exchange data collected at the Harvard Forest Environmental Measurement Site eddy-flux tower. The calibration of DREAM results in a better model fit and predictive performance compared to the popular adaptive Metropolis (AM) scheme. Moreover, DREAM indicates that two parameters controlling autumn phenology have multiple modes in their posterior distributions while AM only identifies one mode. The application suggests that DREAM is very suitable to calibrate complex terrestrial ecosystem models, where the uncertain parameter size is usually large and existence of local optima is always a concern. In addition, this effort justifies the assumptions of the error model used in Bayesian calibration according to the residual analysis. Here, the result indicates that a heteroscedastic, correlated, Gaussian error model is appropriate for the problem, and the consequent constructed likelihood function can alleviate the underestimation of parameter uncertainty that is usually caused by using uncorrelated error models.

  7. Bayesian calibration of terrestrial ecosystem models: a study of advanced Markov chain Monte Carlo methods

    DOE PAGES

    Lu, Dan; Ricciuto, Daniel M.; Walker, Anthony P.; ...

    2017-09-27

    Calibration of terrestrial ecosystem models is important but challenging. Bayesian inference implemented by Markov chain Monte Carlo (MCMC) sampling provides a comprehensive framework to estimate model parameters and associated uncertainties using their posterior distributions. The effectiveness and efficiency of the method strongly depend on the MCMC algorithm used. In this work, a differential evolution adaptive Metropolis (DREAM) algorithm is used to estimate posterior distributions of 21 parameters for the data assimilation linked ecosystem carbon (DALEC) model using 14 years of daily net ecosystem exchange data collected at the Harvard Forest Environmental Measurement Site eddy-flux tower. The calibration of DREAM results in a better model fit and predictive performance compared to the popular adaptive Metropolis (AM) scheme. Moreover, DREAM indicates that two parameters controlling autumn phenology have multiple modes in their posterior distributions while AM only identifies one mode. The application suggests that DREAM is very suitable to calibrate complex terrestrial ecosystem models, where the uncertain parameter size is usually large and existence of local optima is always a concern. In addition, this effort justifies the assumptions of the error model used in Bayesian calibration according to the residual analysis. Here, the result indicates that a heteroscedastic, correlated, Gaussian error model is appropriate for the problem, and the consequent constructed likelihood function can alleviate the underestimation of parameter uncertainty that is usually caused by using uncorrelated error models.

  8. Invited commentary: Lost in estimation--searching for alternatives to markov chains to fit complex Bayesian models.

    PubMed

    Molitor, John

    2012-03-01

    Bayesian methods have seen an increase in popularity in a wide variety of scientific fields, including epidemiology. One of the main reasons for their widespread application is the power of the Markov chain Monte Carlo (MCMC) techniques generally used to fit these models. As a result, researchers often implicitly associate Bayesian models with MCMC estimation procedures. However, Bayesian models do not always require Markov-chain-based methods for parameter estimation. This is important, as MCMC estimation methods, while generally quite powerful, are complex and computationally expensive and suffer from convergence problems related to the manner in which they generate correlated samples used to estimate probability distributions for parameters of interest. In this issue of the Journal, Cole et al. (Am J Epidemiol. 2012;175(5):368-375) present an interesting paper that discusses non-Markov-chain-based approaches to fitting Bayesian models. These methods, though limited, can overcome some of the problems associated with MCMC techniques and promise to provide simpler approaches to fitting Bayesian models. Applied researchers will find these estimation approaches intuitively appealing and will gain a deeper understanding of Bayesian models through their use. However, readers should be aware that other non-Markov-chain-based methods are currently in active development and have been widely published in other fields.

  9. A Robust Deconvolution Method based on Transdimensional Hierarchical Bayesian Inference

    NASA Astrophysics Data System (ADS)

    Kolb, J.; Lekic, V.

    2012-12-01

    Analysis of P-S and S-P conversions allows us to map receiver side crustal and lithospheric structure. This analysis often involves deconvolution of the parent wave field from the scattered wave field as a means of suppressing source-side complexity. A variety of deconvolution techniques exist, including damped spectral division, Wiener filtering, iterative time-domain deconvolution, and the multitaper method. All of these techniques require estimates of noise characteristics as input parameters. We present a deconvolution method based on transdimensional Hierarchical Bayesian inference in which both noise magnitude and noise correlation are used as parameters in calculating the likelihood probability distribution. Because the noise for P-S and S-P conversion analysis in terms of receiver functions is a combination of both background noise - which is relatively easy to characterize - and signal-generated noise - which is much more difficult to quantify - we treat measurement errors as an unknown quantity, characterized by a probability density function whose mean and variance are model parameters. This transdimensional Hierarchical Bayesian approach has been successfully used previously in the inversion of receiver functions in terms of shear and compressional wave speeds of an unknown number of layers [1]. In our method we use a Markov chain Monte Carlo (MCMC) algorithm to find the receiver function that best fits the data while accurately assessing the noise parameters. To parameterize the receiver function, we model it as an unknown number of Gaussians of unknown amplitude and width. The algorithm takes multiple steps before calculating the acceptance probability of a new model, in order to avoid getting trapped in local misfit minima. Using both observed and synthetic data, we show that the MCMC deconvolution method can accurately obtain a receiver function as well as an estimate of the noise parameters given the parent and daughter components. Furthermore, we demonstrate that this new approach is far less susceptible to generating spurious features even at high noise levels. Finally, the method yields not only the most-likely receiver function, but also quantifies its full uncertainty. [1] Bodin, T., M. Sambridge, H. Tkalčić, P. Arroucau, K. Gallagher, and N. Rawlinson (2012), Transdimensional inversion of receiver functions and surface wave dispersion, J. Geophys. Res., 117, B02301

  10. Using SAS PROC MCMC for Item Response Theory Models

    PubMed Central

    Samonte, Kelli

    2014-01-01

    Interest in using Bayesian methods for estimating item response theory models has grown at a remarkable rate in recent years. This attentiveness to Bayesian estimation has also inspired a growth in available software such as WinBUGS, R packages, BMIRT, MPLUS, and SAS PROC MCMC. This article intends to provide an accessible overview of Bayesian methods in the context of item response theory to serve as a useful guide for practitioners in estimating and interpreting item response theory (IRT) models. Included is a description of the estimation procedure used by SAS PROC MCMC. Syntax is provided for estimation of both dichotomous and polytomous IRT models, as well as a discussion on how to extend the syntax to accommodate more complex IRT models. PMID:29795834

  11. Bayesian-MCMC-based parameter estimation of stealth aircraft RCS models

    NASA Astrophysics Data System (ADS)

    Xia, Wei; Dai, Xiao-Xia; Feng, Yuan

    2015-12-01

    When modeling a stealth aircraft with low RCS (Radar Cross Section), conventional parameter estimation methods may cause a deviation from the actual distribution, owing to the fact that the characteristic parameters are estimated via directly calculating the statistics of RCS. The Bayesian-Markov Chain Monte Carlo (Bayesian-MCMC) method is introduced herein to estimate the parameters so as to improve the fitting accuracies of fluctuation models. The parameter estimations of the lognormal and the Legendre polynomial models are reformulated in the Bayesian framework. The MCMC algorithm is then adopted to calculate the parameter estimates. Numerical results show that the distribution curves obtained by the proposed method exhibit improved consistency with the actual ones, compared with those fitted by the conventional method. The fitting accuracy could be improved by no less than 25% for both fluctuation models, which implies that the Bayesian-MCMC method might be a good candidate among the optimal parameter estimation methods for stealth aircraft RCS models. Project supported by the National Natural Science Foundation of China (Grant No. 61101173), the National Basic Research Program of China (Grant No. 613206), the National High Technology Research and Development Program of China (Grant No. 2012AA01A308), the State Scholarship Fund by the China Scholarship Council (CSC), and the Oversea Academic Training Funds, and University of Electronic Science and Technology of China (UESTC).
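    The abstract above describes the general recipe (a Bayesian likelihood for the RCS fluctuation model sampled with MCMC) without implementation detail. The following Python sketch, offered purely as an illustration under assumed priors, proposal scale, and simulated data, fits the two parameters of a lognormal RCS model with a random-walk Metropolis sampler; it is not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical RCS samples (m^2) drawn from a lognormal fluctuation model.
true_mu, true_sigma = -1.0, 0.6
rcs = rng.lognormal(mean=true_mu, sigma=true_sigma, size=200)

def log_posterior(theta, data):
    """Log-posterior of (mu, log_sigma) under weak Gaussian priors (assumed)."""
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)
    loglik = np.sum(-np.log(data * sigma * np.sqrt(2 * np.pi))
                    - (np.log(data) - mu) ** 2 / (2 * sigma ** 2))
    logprior = -0.5 * (mu ** 2 / 10.0 + log_sigma ** 2 / 10.0)
    return loglik + logprior

# Random-walk Metropolis sampling of the two fluctuation-model parameters.
theta = np.array([0.0, 0.0])
logp = log_posterior(theta, rcs)
chain = []
for _ in range(20000):
    proposal = theta + rng.normal(scale=0.05, size=2)
    logp_new = log_posterior(proposal, rcs)
    if np.log(rng.uniform()) < logp_new - logp:
        theta, logp = proposal, logp_new
    chain.append(theta.copy())

chain = np.array(chain[5000:])  # discard burn-in
print("posterior mean mu:", chain[:, 0].mean(),
      "posterior mean sigma:", np.exp(chain[:, 1]).mean())
```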

  12. Model Reduction via Principal Component Analysis and Markov Chain Monte Carlo (MCMC) Methods

    NASA Astrophysics Data System (ADS)

    Gong, R.; Chen, J.; Hoversten, M. G.; Luo, J.

    2011-12-01

    Geophysical and hydrogeological inverse problems often include a large number of unknown parameters, ranging from hundreds to millions, depending on parameterization and the problem undertaken. This makes inverse estimation and uncertainty quantification very challenging, especially for problems in two- or three-dimensional spatial domains. Model reduction techniques have the potential to mitigate the curse of dimensionality by reducing the total number of unknowns while describing the complex subsurface systems adequately. In this study, we explore the use of principal component analysis (PCA) and Markov chain Monte Carlo (MCMC) sampling methods for model reduction through the use of synthetic datasets. We compare the performances of three different but closely related model reduction approaches: (1) PCA methods with geometric sampling (referred to as 'Method 1'), (2) PCA methods with MCMC sampling (referred to as 'Method 2'), and (3) PCA methods with MCMC sampling and inclusion of random effects (referred to as 'Method 3'). We consider a simple convolution model with five unknown parameters, as our goal is to understand and visualize the advantages and disadvantages of each method by comparing their inversion results with the corresponding analytical solutions. We generate synthetic data with added noise and invert them under two different situations: (1) the noisy data and the covariance matrix for PCA analysis are consistent (referred to as the unbiased case), and (2) the noisy data and the covariance matrix are inconsistent (referred to as the biased case). In the unbiased case, comparison between the analytical solutions and the inversion results shows that all three methods provide good estimates of the true values and Method 1 is computationally more efficient. In terms of uncertainty quantification, Method 1 performs poorly because of the relatively small number of samples obtained, Method 2 performs best, and Method 3 overestimates uncertainty due to the inclusion of random effects. However, in the biased case, only Method 3 correctly estimates all the unknown parameters, and both Methods 1 and 2 provide wrong values for the biased parameters. The synthetic case study demonstrates that if the covariance matrix for PCA analysis is inconsistent with the true models, the PCA methods with geometric or MCMC sampling will provide incorrect estimates.
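    As a hedged sketch of the flavor of Method 2 above (PCA-based reduction followed by MCMC over the reduced coefficients), the Python fragment below builds a reduced basis from the SVD of a prior ensemble and then runs random-walk Metropolis over a handful of principal-component coefficients for a toy convolution forward model. The ensemble, kernel, noise level, and priors are invented for illustration and do not reproduce the study's setup.

```python
import numpy as np

rng = np.random.default_rng(1)

# Prior ensemble of 1-D model vectors (e.g., a discretized property field).
n_cells, n_prior, n_pc = 50, 500, 5
x_grid = np.linspace(0.0, 1.0, n_cells)
prior = np.array([rng.uniform(0.5, 1.5) * np.sin(2 * np.pi * (x_grid - rng.uniform()))
                  for _ in range(n_prior)])

# PCA of the ensemble: keep the leading components as the reduced parameters.
mean = prior.mean(axis=0)
_, s, Vt = np.linalg.svd(prior - mean, full_matrices=False)
basis = Vt[:n_pc]

def expand(coeffs):
    """Map the reduced coefficients back to the full model space."""
    return mean + coeffs @ basis

# Simple convolution forward model and synthetic noisy data (assumptions).
kernel = np.exp(-0.5 * (np.arange(-5, 6) / 2.0) ** 2)
def forward(m):
    return np.convolve(m, kernel, mode="same")

m_true = expand(rng.normal(size=n_pc))
data = forward(m_true) + rng.normal(scale=0.05, size=n_cells)

def log_post(coeffs):
    resid = data - forward(expand(coeffs))
    return -0.5 * np.sum(resid ** 2) / 0.05 ** 2 - 0.5 * np.sum(coeffs ** 2)

# Random-walk Metropolis over n_pc coefficients instead of n_cells unknowns.
c, lp, chain = np.zeros(n_pc), log_post(np.zeros(n_pc)), []
for _ in range(20000):
    prop = c + rng.normal(scale=0.05, size=n_pc)
    lp_new = log_post(prop)
    if np.log(rng.uniform()) < lp_new - lp:
        c, lp = prop, lp_new
    chain.append(c.copy())
print("posterior mean coefficients:", np.mean(chain[5000:], axis=0))
```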

  13. Hierarchical animal movement models for population-level inference

    USGS Publications Warehouse

    Hooten, Mevin B.; Buderman, Frances E.; Brost, Brian M.; Hanks, Ephraim M.; Ivans, Jacob S.

    2016-01-01

    New methods for modeling animal movement based on telemetry data are developed regularly. With advances in telemetry capabilities, animal movement models are becoming increasingly sophisticated. Despite a need for population-level inference, animal movement models are still predominantly developed for individual-level inference. Most efforts to upscale the inference to the population level are either post hoc or complicated enough that only the developer can implement the model. Hierarchical Bayesian models provide an ideal platform for the development of population-level animal movement models but can be challenging to fit due to computational limitations or extensive tuning required. We propose a two-stage procedure for fitting hierarchical animal movement models to telemetry data. The two-stage approach is statistically rigorous and allows one to fit individual-level movement models separately, then resample them using a secondary MCMC algorithm. The primary advantages of the two-stage approach are that the first stage is easily parallelizable and the second stage is completely unsupervised, allowing for an automated fitting procedure in many cases. We demonstrate the two-stage procedure with two applications of animal movement models. The first application involves a spatial point process approach to modeling telemetry data, and the second involves a more complicated continuous-time discrete-space animal movement model. We fit these models to simulated data and real telemetry data arising from a population of monitored Canada lynx in Colorado, USA.

  14. Inferring epidemiological parameters from phylogenies using regression-ABC: A comparative study

    PubMed Central

    Gascuel, Olivier

    2017-01-01

    Inferring epidemiological parameters such as the R0 from time-scaled phylogenies is a timely challenge. Most current approaches rely on likelihood functions, which raise specific issues that range from computing these functions to finding their maxima numerically. Here, we present a new regression-based Approximate Bayesian Computation (ABC) approach, which we base on a large variety of summary statistics intended to capture the information contained in the phylogeny and its corresponding lineage-through-time plot. The regression step involves the Least Absolute Shrinkage and Selection Operator (LASSO) method, which is a robust machine learning technique. It allows us to readily deal with the large number of summary statistics, while avoiding resorting to Markov Chain Monte Carlo (MCMC) techniques. To compare our approach to existing ones, we simulated target trees under a variety of epidemiological models and settings, and inferred parameters of interest using the same priors. We found that, for large phylogenies, the accuracy of our regression-ABC is comparable to that of likelihood-based approaches involving birth-death processes implemented in BEAST2. Our approach even outperformed these when inferring the host population size with a Susceptible-Infected-Removed epidemiological model. It also clearly outperformed a recent kernel-ABC approach when assuming a Susceptible-Infected epidemiological model with two host types. Lastly, by re-analyzing data from the early stages of the recent Ebola epidemic in Sierra Leone, we showed that regression-ABC provides more realistic estimates for the duration parameters (latency and infectiousness) than the likelihood-based method. Overall, ABC based on a large variety of summary statistics and a regression method able to perform variable selection and avoid overfitting is a promising approach to analyze large phylogenies. PMID:28263987
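    For concreteness, here is a minimal, hedged sketch of the regression-ABC recipe described above: rejection on summary statistics followed by a Beaumont-style local regression adjustment, with scikit-learn's LASSO providing the variable selection over statistics. The toy simulator and summary statistics are stand-ins for the phylodynamic simulators and phylogeny statistics used by the authors.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)

def simulate(theta, n=100):
    """Toy stochastic model standing in for a phylogeny simulator (assumption)."""
    return rng.exponential(scale=theta, size=n)

def summaries(x):
    """A small battery of summary statistics (illustrative)."""
    return np.array([x.mean(), x.std(), np.median(x), np.percentile(x, 90)])

s_obs = summaries(simulate(2.5))          # 'observed' summaries

# 1) Draw parameters from the prior and simulate their summary statistics.
n_sims = 5000
thetas = rng.uniform(0.1, 10.0, size=n_sims)
stats = np.array([summaries(simulate(t)) for t in thetas])

# 2) Rejection step: keep the simulations closest to the observed summaries.
dist = np.linalg.norm(stats - s_obs, axis=1)
keep = dist < np.quantile(dist, 0.05)

# 3) Local-linear regression adjustment with a LASSO fit, which also selects
#    among the summary statistics (the role LASSO plays in the paper).
reg = Lasso(alpha=0.01).fit(stats[keep] - s_obs, thetas[keep])
intercept = reg.predict(np.zeros((1, len(s_obs))))
theta_adj = thetas[keep] - reg.predict(stats[keep] - s_obs) + intercept

print("approximate posterior mean:", theta_adj.mean())
```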

  15. Estimating a Markovian Epidemic Model Using Household Serial Interval Data from the Early Phase of an Epidemic

    PubMed Central

    Black, Andrew J.; Ross, Joshua V.

    2013-01-01

    The clinical serial interval of an infectious disease is the time between date of symptom onset in an index case and the date of symptom onset in one of its secondary cases. It is a quantity which is commonly collected during a pandemic and is of fundamental importance to public health policy and mathematical modelling. In this paper we present a novel method for calculating the serial interval distribution for a Markovian model of household transmission dynamics. This allows the use of Bayesian MCMC methods, with explicit evaluation of the likelihood, to fit to serial interval data and infer parameters of the underlying model. We use simulated and real data to verify the accuracy of our methodology and illustrate the importance of accounting for household size. The output of our approach can be used to produce posterior distributions of population level epidemic characteristics. PMID:24023679

  16. Modeling error distributions of growth curve models through Bayesian methods.

    PubMed

    Zhang, Zhiyong

    2016-06-01

    Growth curve models are widely used in social and behavioral sciences. However, typical growth curve models often assume that the errors are normally distributed, although non-normal data may be even more common than normal data. In order to avoid possible statistical inference problems in blindly assuming normality, a general Bayesian framework is proposed to flexibly model normal and non-normal data through the explicit specification of the error distributions. A simulation study shows that when the distribution of the error is correctly specified, one can avoid the loss in the efficiency of standard error estimates. A real example on the analysis of mathematical ability growth data from the Early Childhood Longitudinal Study, Kindergarten Class of 1998-99 is used to show the application of the proposed methods. Instructions and code on how to conduct growth curve analysis with both normal and non-normal error distributions using the MCMC procedure of SAS are provided.

  17. Markov chain Monte Carlo techniques applied to parton distribution functions determination: Proof of concept

    NASA Astrophysics Data System (ADS)

    Gbedo, Yémalin Gabin; Mangin-Brinet, Mariane

    2017-07-01

    We present a new procedure to determine parton distribution functions (PDFs), based on Markov chain Monte Carlo (MCMC) methods. The aim of this paper is to show that we can replace the standard χ2 minimization by procedures grounded on statistical methods, and on Bayesian inference in particular, thus offering additional insight into the rich field of PDF determination. After a basic introduction to these techniques, we introduce the algorithm we have chosen to implement, namely Hybrid (or Hamiltonian) Monte Carlo. This algorithm, initially developed for Lattice QCD, turns out to be very interesting when applied to PDF determination by global analyses; we show that it allows us to circumvent the difficulties due to the high dimensionality of the problem, in particular concerning the acceptance rate. A first feasibility study is performed and presented, which indicates that Markov chain Monte Carlo can successfully be applied to the extraction of PDFs and of their uncertainties.
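    For readers unfamiliar with Hybrid (Hamiltonian) Monte Carlo, the sketch below shows a single leapfrog-based HMC update on a toy two-dimensional Gaussian target. The step size, trajectory length, and target are illustrative assumptions and are unrelated to the PDF parametrization used in the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
cov_inv = np.linalg.inv(np.array([[1.0, 0.8], [0.8, 1.0]]))

def neg_log_post(q):
    """Toy target: a correlated 2-D Gaussian standing in for a chi^2 surface."""
    return 0.5 * q @ cov_inv @ q

def grad_neg_log_post(q):
    return cov_inv @ q

def hmc_step(q, eps=0.1, n_leapfrog=20):
    """One Hybrid/Hamiltonian Monte Carlo update (illustrative tuning values)."""
    p = rng.normal(size=q.shape)                    # resample momenta
    q_new, p_new = q.copy(), p.copy()
    p_new -= 0.5 * eps * grad_neg_log_post(q_new)   # half step in momentum
    for _ in range(n_leapfrog - 1):
        q_new += eps * p_new                        # full step in position
        p_new -= eps * grad_neg_log_post(q_new)     # full step in momentum
    q_new += eps * p_new
    p_new -= 0.5 * eps * grad_neg_log_post(q_new)   # final half step
    # Metropolis accept/reject on the total Hamiltonian.
    h_old = neg_log_post(q) + 0.5 * p @ p
    h_new = neg_log_post(q_new) + 0.5 * p_new @ p_new
    return q_new if np.log(rng.uniform()) < h_old - h_new else q

q, samples = np.zeros(2), []
for _ in range(5000):
    q = hmc_step(q)
    samples.append(q)
print("sample covariance:\n", np.cov(np.array(samples).T))
```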

  18. Bayesian median regression for temporal gene expression data

    NASA Astrophysics Data System (ADS)

    Yu, Keming; Vinciotti, Veronica; Liu, Xiaohui; 't Hoen, Peter A. C.

    2007-09-01

    Most of the existing methods for the identification of biologically interesting genes in a temporal expression profiling dataset do not fully exploit the temporal ordering in the dataset and are based on normality assumptions for the gene expression. In this paper, we introduce a Bayesian median regression model to detect genes whose temporal profile is significantly different across a number of biological conditions. The regression model is defined by a polynomial function where both time and condition effects as well as interactions between the two are included. MCMC-based inference returns the posterior distribution of the polynomial coefficients. From this a simple Bayes factor test is proposed to test for significance. The estimation of the median rather than the mean, and within a Bayesian framework, increases the robustness of the method compared to a Hotelling T2-test previously suggested. This is shown on simulated data and on muscular dystrophy gene expression data.

  19. MultiNest: Efficient and Robust Bayesian Inference

    NASA Astrophysics Data System (ADS)

    Feroz, F.; Hobson, M. P.; Bridges, M.

    2011-09-01

    We present further development and the first public release of our multimodal nested sampling algorithm, called MultiNest. This Bayesian inference tool calculates the evidence, with an associated error estimate, and produces posterior samples from distributions that may contain multiple modes and pronounced (curving) degeneracies in high dimensions. The developments presented here lead to further substantial improvements in sampling efficiency and robustness, as compared to the original algorithm presented in Feroz & Hobson (2008), which itself significantly outperformed existing MCMC techniques in a wide range of astrophysical inference problems. The accuracy and economy of the MultiNest algorithm is demonstrated by application to two toy problems and to a cosmological inference problem focusing on the extension of the vanilla LambdaCDM model to include spatial curvature and a varying equation of state for dark energy. The MultiNest software is fully parallelized using MPI and includes an interface to CosmoMC. It will also be released as part of the SuperBayeS package, for the analysis of supersymmetric theories of particle physics, at this http URL.

  20. A semiparametric Bayesian proportional hazards model for interval censored data with frailty effects.

    PubMed

    Henschel, Volkmar; Engel, Jutta; Hölzel, Dieter; Mansmann, Ulrich

    2009-02-10

    Multivariate analysis of interval censored event data based on classical likelihood methods is notoriously cumbersome. Likelihood inference for models which additionally include random effects is not available at all. Available algorithms pose problems for practical users, such as matrix inversion, slow convergence, and no assessment of statistical uncertainty. MCMC procedures combined with imputation are used to implement hierarchical models for interval censored data within a Bayesian framework. Two examples from clinical practice demonstrate the handling of clustered interval censored event times as well as multilayer random effects for inter-institutional quality assessment. The software developed is called survBayes and is freely available at CRAN. The proposed software supports the solution of complex analyses in many fields of clinical epidemiology as well as health services research.

  1. Directional data analysis under the general projected normal distribution

    PubMed Central

    Wang, Fangpo; Gelfand, Alan E.

    2013-01-01

    The projected normal distribution is an under-utilized model for explaining directional data. In particular, the general version provides flexibility, e.g., asymmetry and possible bimodality along with convenient regression specification. Here, we clarify the properties of this general class. We also develop fully Bayesian hierarchical models for analyzing circular data using this class. We show how they can be fit using MCMC methods with suitable latent variables. We show how posterior inference for distributional features such as the angular mean direction and concentration can be implemented as well as how prediction within the regression setting can be handled. With regard to model comparison, we argue for an out-of-sample approach using both a predictive likelihood scoring loss criterion and a cumulative rank probability score criterion. PMID:24046539

  2. Markov Chain Monte Carlo estimation of species distributions: a case study of the swift fox in western Kansas

    USGS Publications Warehouse

    Sargeant, Glen A.; Sovada, Marsha A.; Slivinski, Christiane C.; Johnson, Douglas H.

    2005-01-01

    Accurate maps of species distributions are essential tools for wildlife research and conservation. Unfortunately, biologists often are forced to rely on maps derived from observed occurrences recorded opportunistically during observation periods of variable length. Spurious inferences are likely to result because such maps are profoundly affected by the duration and intensity of observation and by methods used to delineate distributions, especially when detection is uncertain. We conducted a systematic survey of swift fox (Vulpes velox) distribution in western Kansas, USA, and used Markov chain Monte Carlo (MCMC) image restoration to rectify these problems. During 1997–1999, we searched 355 townships (ca. 93 km²) 1–3 times each for an average cost of $7,315 per year and achieved a detection rate (probability of detecting swift foxes, if present, during a single search) of 0.69 (95% Bayesian confidence interval [BCI] = [0.60, 0.77]). Our analysis produced an estimate of the underlying distribution, rather than a map of observed occurrences, that reflected the uncertainty associated with estimates of model parameters. To evaluate our results, we analyzed simulated data with similar properties. Results of our simulations suggest negligible bias and good precision when probabilities of detection on ≥1 survey occasions (cumulative probabilities of detection) exceed 0.65. Although the use of MCMC image restoration has been limited by theoretical and computational complexities, alternatives do not possess the same advantages. Image models accommodate uncertain detection, do not require spatially independent data or a census of map units, and can be used to estimate species distributions directly from observations without relying on habitat covariates or parameters that must be estimated subjectively. These features facilitate economical surveys of large regions, the detection of temporal trends in distribution, and assessments of landscape-level relations between species and habitats. Requirements for the use of MCMC image restoration include study areas that can be partitioned into regular grids of mapping units, spatially contagious species distributions, reliable methods for identifying target species, and cumulative probabilities of detection ≥0.65.

  3. Markov chain Monte Carlo estimation of species distributions: A case study of the swift fox in western Kansas

    USGS Publications Warehouse

    Sargeant, G.A.; Sovada, M.A.; Slivinski, C.C.; Johnson, D.H.

    2005-01-01

    Accurate maps of species distributions are essential tools for wildlife research and conservation. Unfortunately, biologists often are forced to rely on maps derived from observed occurrences recorded opportunistically during observation periods of variable length. Spurious inferences are likely to result because such maps are profoundly affected by the duration and intensity of observation and by methods used to delineate distributions, especially when detection is uncertain. We conducted a systematic survey of swift fox (Vulpes velox) distribution in western Kansas, USA, and used Markov chain Monte Carlo (MCMC) image restoration to rectify these problems. During 1997-1999, we searched 355 townships (ca. 93 km²) 1-3 times each for an average cost of $7,315 per year and achieved a detection rate (probability of detecting swift foxes, if present, during a single search) of 0.69 (95% Bayesian confidence interval [BCI] = [0.60, 0.77]). Our analysis produced an estimate of the underlying distribution, rather than a map of observed occurrences, that reflected the uncertainty associated with estimates of model parameters. To evaluate our results, we analyzed simulated data with similar properties. Results of our simulations suggest negligible bias and good precision when probabilities of detection on ≥1 survey occasions (cumulative probabilities of detection) exceed 0.65. Although the use of MCMC image restoration has been limited by theoretical and computational complexities, alternatives do not possess the same advantages. Image models accommodate uncertain detection, do not require spatially independent data or a census of map units, and can be used to estimate species distributions directly from observations without relying on habitat covariates or parameters that must be estimated subjectively. These features facilitate economical surveys of large regions, the detection of temporal trends in distribution, and assessments of landscape-level relations between species and habitats. Requirements for the use of MCMC image restoration include study areas that can be partitioned into regular grids of mapping units, spatially contagious species distributions, reliable methods for identifying target species, and cumulative probabilities of detection ≥0.65.

  4. Annealed Importance Sampling Reversible Jump MCMC algorithms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Karagiannis, Georgios; Andrieu, Christophe

    2013-03-20

    It will soon be 20 years since reversible jump Markov chain Monte Carlo (RJ-MCMC) algorithms were proposed. They have significantly extended the scope of Markov chain Monte Carlo simulation methods, offering the promise of being able to routinely tackle transdimensional sampling problems, as encountered in Bayesian model selection problems for example, in a principled and flexible fashion. Their practical efficient implementation, however, still remains a challenge. A particular difficulty encountered in practice is in the choice of the dimension matching variables (both their nature and their distribution) and the reversible transformations which allow one to define the one-to-one mappings underpinning the design of these algorithms. Indeed, even seemingly sensible choices can lead to algorithms with very poor performance. The focus of this paper is the development and performance evaluation of a method, annealed importance sampling RJ-MCMC (aisRJ), which addresses this problem by mitigating the sensitivity of RJ-MCMC algorithms to the aforementioned poor design. As we shall see, the algorithm can be understood as an "exact approximation" of an idealized MCMC algorithm that would sample from the model probabilities directly in a model selection set-up. Such an idealized algorithm may have good theoretical convergence properties, but typically cannot be implemented, and our algorithms can approximate the performance of such idealized algorithms to an arbitrary degree while not introducing any bias for any degree of approximation. Our approach combines the dimension matching ideas of RJ-MCMC with annealed importance sampling and its Markov chain Monte Carlo implementation. We illustrate the performance of the algorithm with numerical simulations which indicate that, although the approach may at first appear computationally involved, it is in fact competitive.

  5. Naima: a Python package for inference of particle distribution properties from nonthermal spectra

    NASA Astrophysics Data System (ADS)

    Zabalza, V.

    2015-07-01

    The ultimate goal of the observation of nonthermal emission from astrophysical sources is to understand the underlying particle acceleration and evolution processes, yet few tools are publicly available to infer the particle distribution properties from the observed photon spectra from X-ray to VHE gamma rays. Here I present naima, an open source Python package that provides models for nonthermal radiative emission from homogeneous distributions of relativistic electrons and protons. Contributions from synchrotron, inverse Compton, nonthermal bremsstrahlung, and neutral-pion decay can be computed for a series of functional shapes of the particle energy distributions, with the possibility of using user-defined particle distribution functions. In addition, naima provides a set of functions that allow these models to be used to fit observed nonthermal spectra through an MCMC procedure, obtaining probability distribution functions for the particle distribution parameters. Here I present the models and methods available in naima and an example of their application to the understanding of a galactic nonthermal source. naima's documentation, including how to install the package, is available at http://naima.readthedocs.org.

  6. Recovery of Graded Response Model Parameters: A Comparison of Marginal Maximum Likelihood and Markov Chain Monte Carlo Estimation

    ERIC Educational Resources Information Center

    Kieftenbeld, Vincent; Natesan, Prathiba

    2012-01-01

    Markov chain Monte Carlo (MCMC) methods enable a fully Bayesian approach to parameter estimation of item response models. In this simulation study, the authors compared the recovery of graded response model parameters using marginal maximum likelihood (MML) and Gibbs sampling (MCMC) under various latent trait distributions, test lengths, and…

  7. Using SAS PROC MCMC for Item Response Theory Models

    ERIC Educational Resources Information Center

    Ames, Allison J.; Samonte, Kelli

    2015-01-01

    Interest in using Bayesian methods for estimating item response theory models has grown at a remarkable rate in recent years. This attentiveness to Bayesian estimation has also inspired a growth in available software such as WinBUGS, R packages, BMIRT, MPLUS, and SAS PROC MCMC. This article intends to provide an accessible overview of Bayesian…

  8. Comparison of missing value imputation methods in time series: the case of Turkish meteorological data

    NASA Astrophysics Data System (ADS)

    Yozgatligil, Ceylan; Aslan, Sipan; Iyigun, Cem; Batmaz, Inci

    2013-04-01

    This study aims to compare several imputation methods to complete the missing values of spatio-temporal meteorological time series. To this end, six imputation methods are assessed with respect to various criteria, including accuracy, robustness, precision, and efficiency, for artificially created missing data in monthly total precipitation and mean temperature series obtained from the Turkish State Meteorological Service. Of these methods, simple arithmetic average, normal ratio (NR), and NR weighted with correlations comprise the simple ones, whereas a multilayer perceptron type neural network and a multiple imputation strategy adopted by Markov Chain Monte Carlo based on expectation-maximization (EM-MCMC) are the computationally intensive ones. In addition, we propose a modification of the EM-MCMC method. Besides using a conventional accuracy measure based on squared errors, we also suggest the correlation dimension (CD) technique of nonlinear dynamic time series analysis, which takes spatio-temporal dependencies into account, for evaluating imputation performances. Based on the detailed graphical and quantitative analysis, it can be said that although the computational methods, particularly the EM-MCMC method, are computationally inefficient, they seem favorable for imputation of meteorological time series with respect to the different missingness periods considered, for both measures and both series studied. To conclude, using the EM-MCMC algorithm for imputing missing values before conducting any statistical analyses of meteorological data will decrease the amount of uncertainty and give more robust results. Moreover, the CD measure can be suggested for the performance evaluation of missing data imputation, particularly with computational methods, since it gives more precise results in meteorological time series.

  9. Comparison of statistical sampling methods with ScannerBit, the GAMBIT scanning module

    NASA Astrophysics Data System (ADS)

    Martinez, Gregory D.; McKay, James; Farmer, Ben; Scott, Pat; Roebber, Elinore; Putze, Antje; Conrad, Jan

    2017-11-01

    We introduce ScannerBit, the statistics and sampling module of the public, open-source global fitting framework GAMBIT. ScannerBit provides a standardised interface to different sampling algorithms, enabling the use and comparison of multiple computational methods for inferring profile likelihoods, Bayesian posteriors, and other statistical quantities. The current version offers random, grid, raster, nested sampling, differential evolution, Markov Chain Monte Carlo (MCMC) and ensemble Monte Carlo samplers. We also announce the release of a new standalone differential evolution sampler, Diver, and describe its design, usage and interface to ScannerBit. We subject Diver and three other samplers (the nested sampler MultiNest, the MCMC GreAT, and the native ScannerBit implementation of the ensemble Monte Carlo algorithm T-Walk) to a battery of statistical tests. For this we use a realistic physical likelihood function, based on the scalar singlet model of dark matter. We examine the performance of each sampler as a function of its adjustable settings, and the dimensionality of the sampling problem. We evaluate performance on four metrics: optimality of the best fit found, completeness in exploring the best-fit region, number of likelihood evaluations, and total runtime. For Bayesian posterior estimation at high resolution, T-Walk provides the most accurate and timely mapping of the full parameter space. For profile likelihood analysis in less than about ten dimensions, we find that Diver and MultiNest score similarly in terms of best fit and speed, outperforming GreAT and T-Walk; in ten or more dimensions, Diver substantially outperforms the other three samplers on all metrics.

  10. Inferences of biogeographical histories within subfamily Hyacinthoideae using S-DIVA and Bayesian binary MCMC analysis implemented in RASP (Reconstruct Ancestral State in Phylogenies)

    PubMed Central

    Ali, Syed Shujait; Yu, Yan; Pfosser, Martin; Wetschnig, Wolfgang

    2012-01-01

    Background and Aims Subfamily Hyacinthoideae (Hyacinthaceae) comprises more than 400 species. Members are distributed in sub-Saharan Africa, Madagascar, India, eastern Asia, the Mediterranean region and Eurasia. Hyacinthoideae, like many other plant lineages, show disjunct distribution patterns. The aim of this study was to reconstruct the biogeographical history of Hyacinthoideae based on phylogenetic analyses, to find the possible ancestral range of Hyacinthoideae and to identify factors responsible for the current disjunct distribution pattern. Methods Parsimony and Bayesian approaches were applied to obtain phylogenetic trees, based on sequences of the trnL-F region. Biogeographical inferences were obtained by applying statistical dispersal-vicariance analysis (S-DIVA) and Bayesian binary MCMC (BBM) analysis implemented in RASP (Reconstruct Ancestral State in Phylogenies). Key Results S-DIVA and BBM analyses suggest that the Hyacinthoideae clade seem to have originated in sub-Saharan Africa. Dispersal and vicariance played vital roles in creating the disjunct distribution pattern. Results also suggest an early dispersal to the Mediterranean region, and thus the northward route (from sub-Saharan Africa to Mediterranean) of dispersal is plausible for members of subfamily Hyacinthoideae. Conclusions Biogeographical analyses reveal that subfamily Hyacinthoideae has originated in sub-Saharan Africa. S-DIVA indicates an early dispersal event to the Mediterranean region followed by a vicariance event, which resulted in Hyacintheae and Massonieae tribes. By contrast, BBM analysis favours dispersal to the Mediterranean region, eastern Asia and Europe. Biogeographical analysis suggests that sub-Saharan Africa and the Mediterranean region have played vital roles as centres of diversification and radiation within subfamily Hyacinthoideae. In this bimodal distribution pattern, sub-Saharan Africa is the primary centre of diversity and the Mediterranean region is the secondary centre of diversity. Sub-Saharan Africa was the source area for radiation toward Madagascar, the Mediterranean region and India. Radiations occurred from the Mediterranean region to eastern Asia, Europe, western Asia and India. PMID:22039008

  11. FPGA Acceleration of the phylogenetic likelihood function for Bayesian MCMC inference methods.

    PubMed

    Zierke, Stephanie; Bakos, Jason D

    2010-04-12

    Maximum likelihood (ML)-based phylogenetic inference has become a popular method for estimating the evolutionary relationships among species based on genomic sequence data. This method is used in applications such as RAxML, GARLI, MrBayes, PAML, and PAUP. The Phylogenetic Likelihood Function (PLF) is an important kernel computation for this method. The PLF consists of a loop with no conditional behavior or dependencies between iterations. As such, it offers high potential for exploiting parallelism using micro-architectural techniques. In this paper, we describe a technique for mapping the PLF and supporting logic onto a Field Programmable Gate Array (FPGA)-based co-processor. By leveraging the FPGA's on-chip DSP modules and the high-bandwidth local memory attached to the FPGA, the resultant co-processor can accelerate ML-based methods and outperform state-of-the-art multi-core processors. We use the MrBayes 3 tool as a framework for designing our co-processor. For large datasets, we estimate that our accelerated MrBayes, if run on a current-generation FPGA, achieves a 10x speedup relative to software running on a state-of-the-art server-class microprocessor. The FPGA-based implementation achieves its performance by deeply pipelining the likelihood computations, performing multiple floating-point operations in parallel, and through a natural log approximation that is chosen specifically to leverage a deeply pipelined custom architecture. Heterogeneous computing, which combines general-purpose processors with special-purpose co-processors such as FPGAs and GPUs, is a promising approach for high-performance phylogeny inference, as shown by the growing body of literature in this field. FPGAs in particular are well-suited for this task because of their low power consumption compared to many-core processors and Graphics Processor Units (GPUs).

  12. Recovery of Item Parameters in the Nominal Response Model: A Comparison of Marginal Maximum Likelihood Estimation and Markov Chain Monte Carlo Estimation.

    ERIC Educational Resources Information Center

    Wollack, James A.; Bolt, Daniel M.; Cohen, Allan S.; Lee, Young-Sun

    2002-01-01

    Compared the quality of item parameter estimates for marginal maximum likelihood (MML) and Markov Chain Monte Carlo (MCMC) with the nominal response model using simulation. The quality of item parameter recovery was nearly identical for MML and MCMC, and both methods tended to produce good estimates. (SLD)

  13. Enhancing Data Assimilation by Evolutionary Particle Filter and Markov Chain Monte Carlo

    NASA Astrophysics Data System (ADS)

    Moradkhani, H.; Abbaszadeh, P.; Yan, H.

    2016-12-01

    Particle Filters (PFs) have received increasing attention from researchers in different disciplines of the hydro-geosciences as an effective method to improve model predictions in nonlinear and non-Gaussian dynamical systems. The implementation of dual state and parameter estimation by means of data assimilation in hydrology and geoscience has evolved since 2005 from SIR-PF to PF-MCMC, and now to the most effective and robust framework, an evolutionary PF approach based on the Genetic Algorithm (GA) and Markov Chain Monte Carlo (MCMC), the so-called EPF-MCMC. In this framework, the posterior distribution undergoes an evolutionary process to update an ensemble of prior states so that they more closely resemble the realistic posterior probability distribution. The premise of this approach is that the particles move toward optimal positions using GA optimization coupled with MCMC, increasing the number of effective particles; hence particle degeneracy is avoided while particle diversity is improved. The proposed algorithm is applied to a conceptual and highly nonlinear hydrologic model, and the effectiveness, robustness and reliability of the method in jointly estimating the states and parameters, and also in reducing the uncertainty, is demonstrated for a few river basins across the United States.
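    For context, the sketch below implements only the baseline that EPF-MCMC improves upon: a plain sequential importance resampling (SIR) particle filter on a standard toy nonlinear state-space model. The GA and MCMC refinement steps described above are not reproduced, and the model is an assumption chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy nonlinear state-space model (not the hydrologic model used in the study).
def propagate(x):
    return 0.5 * x + 25.0 * x / (1.0 + x ** 2) + rng.normal(scale=1.0, size=np.shape(x))

def observe(x):
    return x ** 2 / 20.0

# Generate a synthetic truth and noisy observations.
n_steps, n_particles = 50, 500
x_true, y_obs = 0.0, []
for _ in range(n_steps):
    x_true = float(propagate(x_true))
    y_obs.append(observe(x_true) + rng.normal(scale=1.0))

# Sequential importance resampling (SIR) filter.
particles = rng.normal(size=n_particles)
estimates = []
for y in y_obs:
    particles = propagate(particles)                       # forecast step
    log_w = -0.5 * (y - observe(particles)) ** 2           # Gaussian likelihood
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    estimates.append(np.sum(w * particles))                # posterior mean estimate
    idx = rng.choice(n_particles, size=n_particles, p=w)   # multinomial resampling
    particles = particles[idx]

print("final state estimate:", estimates[-1], "truth:", x_true)
```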

  14. Estimation of the four-wave mixing noise probability-density function by the multicanonical Monte Carlo method.

    PubMed

    Neokosmidis, Ioannis; Kamalakis, Thomas; Chipouras, Aristides; Sphicopoulos, Thomas

    2005-01-01

    The performance of high-powered wavelength-division multiplexed (WDM) optical networks can be severely degraded by four-wave-mixing- (FWM-) induced distortion. The multicanonical Monte Carlo method (MCMC) is used to calculate the probability-density function (PDF) of the decision variable of a receiver, limited by FWM noise. Compared with the conventional Monte Carlo method previously used to estimate this PDF, the MCMC method is much faster and can accurately estimate smaller error probabilities. The method takes into account the correlation between the components of the FWM noise, unlike the Gaussian model, which is shown not to provide accurate results.

  15. emcee: The MCMC Hammer

    NASA Astrophysics Data System (ADS)

    Foreman-Mackey, Daniel; Hogg, David W.; Lang, Dustin; Goodman, Jonathan

    2013-03-01

    We introduce a stable, well-tested Python implementation of the affine-invariant ensemble sampler for Markov chain Monte Carlo (MCMC) proposed by Goodman & Weare (2010). The code is open source and has already been used in several published projects in the astrophysics literature. The algorithm behind emcee has several advantages over traditional MCMC sampling methods and it has excellent performance as measured by the autocorrelation time (or function calls per independent sample). One major advantage of the algorithm is that it requires hand-tuning of only 1 or 2 parameters, compared to ~N² for a traditional algorithm in an N-dimensional parameter space. In this document, we describe the algorithm and the details of our implementation. Exploiting the parallelism of the ensemble method, emcee permits any user to take advantage of multiple CPU cores without extra effort. The code is available online at http://dan.iel.fm/emcee under the GNU General Public License v2.
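    A minimal usage sketch, assuming the emcee version 3 interface and a toy Gaussian target (the package documentation contains the authoritative examples):

```python
import numpy as np
import emcee

def log_prob(theta):
    """Toy target: a standard 3-D Gaussian."""
    return -0.5 * np.sum(theta ** 2)

ndim, nwalkers = 3, 32
p0 = np.random.randn(nwalkers, ndim)               # initial walker positions

sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
sampler.run_mcmc(p0, 5000)

# Flattened chain after discarding burn-in; the autocorrelation time is the
# convergence diagnostic emphasized by the authors.
samples = sampler.get_chain(discard=1000, flat=True)
print("integrated autocorrelation time:", sampler.get_autocorr_time())
```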

  16. Simultaneous inference of phylogenetic and transmission trees in infectious disease outbreaks

    PubMed Central

    2017-01-01

    Whole-genome sequencing of pathogens from host samples becomes more and more routine during infectious disease outbreaks. These data provide information on possible transmission events which can be used for further epidemiologic analyses, such as identification of risk factors for infectivity and transmission. However, the relationship between transmission events and sequence data is obscured by uncertainty arising from four largely unobserved processes: transmission, case observation, within-host pathogen dynamics and mutation. To properly resolve transmission events, these processes need to be taken into account. Recent years have seen much progress in theory and method development, but existing applications make simplifying assumptions that often break up the dependency between the four processes, or are tailored to specific datasets with matching model assumptions and code. To obtain a method with wider applicability, we have developed a novel approach to reconstruct transmission trees with sequence data. Our approach combines elementary models for transmission, case observation, within-host pathogen dynamics, and mutation, under the assumption that the outbreak is over and all cases have been observed. We use Bayesian inference with MCMC for which we have designed novel proposal steps to efficiently traverse the posterior distribution, taking account of all unobserved processes at once. This allows for efficient sampling of transmission trees from the posterior distribution, and robust estimation of consensus transmission trees. We implemented the proposed method in a new R package phybreak. The method performs well in tests of both new and published simulated data. We apply the model to five datasets on densely sampled infectious disease outbreaks, covering a wide range of epidemiological settings. Using only sampling times and sequences as data, our analyses confirmed the original results or improved on them: the more realistic infection times place more confidence in the inferred transmission trees. PMID:28545083

  17. Simultaneous inference of phylogenetic and transmission trees in infectious disease outbreaks.

    PubMed

    Klinkenberg, Don; Backer, Jantien A; Didelot, Xavier; Colijn, Caroline; Wallinga, Jacco

    2017-05-01

    Whole-genome sequencing of pathogens from host samples becomes more and more routine during infectious disease outbreaks. These data provide information on possible transmission events which can be used for further epidemiologic analyses, such as identification of risk factors for infectivity and transmission. However, the relationship between transmission events and sequence data is obscured by uncertainty arising from four largely unobserved processes: transmission, case observation, within-host pathogen dynamics and mutation. To properly resolve transmission events, these processes need to be taken into account. Recent years have seen much progress in theory and method development, but existing applications make simplifying assumptions that often break up the dependency between the four processes, or are tailored to specific datasets with matching model assumptions and code. To obtain a method with wider applicability, we have developed a novel approach to reconstruct transmission trees with sequence data. Our approach combines elementary models for transmission, case observation, within-host pathogen dynamics, and mutation, under the assumption that the outbreak is over and all cases have been observed. We use Bayesian inference with MCMC for which we have designed novel proposal steps to efficiently traverse the posterior distribution, taking account of all unobserved processes at once. This allows for efficient sampling of transmission trees from the posterior distribution, and robust estimation of consensus transmission trees. We implemented the proposed method in a new R package phybreak. The method performs well in tests of both new and published simulated data. We apply the model to five datasets on densely sampled infectious disease outbreaks, covering a wide range of epidemiological settings. Using only sampling times and sequences as data, our analyses confirmed the original results or improved on them: the more realistic infection times place more confidence in the inferred transmission trees.

  18. Comparison of Bootstrapping and Markov Chain Monte Carlo for Copula Analysis of Hydrological Droughts

    NASA Astrophysics Data System (ADS)

    Yang, P.; Ng, T. L.; Yang, W.

    2015-12-01

    Effective water resources management depends on the reliable estimation of the uncertainty of drought events. Confidence intervals (CIs) are commonly applied to quantify this uncertainty. A CI seeks to be of the minimal length necessary to cover the true value of the estimated variable with the desired probability. In drought analysis, where two or more variables (e.g., duration and severity) are often used to describe a drought, copulas have been found suitable for representing the joint probability behavior of these variables. However, the comprehensive assessment of the parameter uncertainties of copulas of droughts has been largely ignored, and the few studies that have recognized this issue have not explicitly compared the various methods to produce the best CIs. Thus, the objective of this study is to compare the CIs generated using two widely applied uncertainty estimation methods, bootstrapping and Markov Chain Monte Carlo (MCMC). To achieve this objective, (1) the marginal distributions lognormal, Gamma, and Generalized Extreme Value, and the copula functions Clayton, Frank, and Plackett are selected to construct joint probability functions of two drought-related variables; (2) the resulting joint functions are then fitted to 200 sets of simulated realizations of drought events with known distribution and extreme parameters; and (3) from there, using bootstrapping and MCMC, CIs of the parameters are generated and compared. The effect of an informative prior on the CIs generated by MCMC is also evaluated. CIs are produced for different sample sizes (50, 100, and 200) of the simulated drought events for fitting the joint probability functions. Preliminary results assuming lognormal marginal distributions and the Clayton copula function suggest that for cases with small or medium sample sizes (~50-100), MCMC is the superior method if an informative prior exists. Where an informative prior is unavailable, for small sample sizes (~50) both bootstrapping and MCMC yield the same level of performance, and for medium sample sizes (~100) bootstrapping is better. For cases with a large sample size (~200), there is little difference between the CIs generated using bootstrapping and MCMC, regardless of whether or not an informative prior exists.
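    As a hedged illustration of the bootstrapping side of the comparison, the sketch below estimates the Clayton copula parameter from paired drought duration/severity data via the Kendall's tau inversion theta = 2*tau/(1 - tau) and builds a nonparametric bootstrap confidence interval. The synthetic data, sample size, and estimator choice are assumptions; the MCMC counterpart with an informative prior is not shown.

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(5)

def clayton_theta(duration, severity):
    """Clayton parameter from Kendall's tau: theta = 2*tau / (1 - tau)."""
    tau, _ = kendalltau(duration, severity)
    return 2.0 * tau / (1.0 - tau)

# Hypothetical paired drought duration/severity observations (assumption).
n = 100
duration = rng.gamma(shape=2.0, scale=3.0, size=n)
severity = duration * rng.lognormal(mean=0.0, sigma=0.3, size=n)

theta_hat = clayton_theta(duration, severity)

# Nonparametric bootstrap: resample pairs with replacement, re-estimate theta.
boot = []
for _ in range(2000):
    idx = rng.integers(0, n, size=n)
    boot.append(clayton_theta(duration[idx], severity[idx]))
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])

print(f"theta = {theta_hat:.2f}, 95% bootstrap CI = [{ci_low:.2f}, {ci_high:.2f}]")
```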

  19. Bayesian inversion of surface-wave data for radial and azimuthal shear-wave anisotropy, with applications to central Mongolia and west-central Italy

    NASA Astrophysics Data System (ADS)

    Ravenna, Matteo; Lebedev, Sergei

    2018-04-01

    Seismic anisotropy provides important information on the deformation history of the Earth's interior. Rayleigh and Love surface-waves are sensitive to and can be used to determine both radial and azimuthal shear-wave anisotropies at depth, but parameter trade-offs give rise to substantial model non-uniqueness. Here, we explore the trade-offs between isotropic and anisotropic structure parameters and present a suite of methods for the inversion of surface-wave, phase-velocity curves for radial and azimuthal anisotropies. One Markov chain Monte Carlo (McMC) implementation inverts Rayleigh and Love dispersion curves for a radially anisotropic shear velocity profile of the crust and upper mantle. Another McMC implementation inverts Rayleigh phase velocities and their azimuthal anisotropy for profiles of vertically polarized shear velocity and its depth-dependent azimuthal anisotropy. The azimuthal anisotropy inversion is fully non-linear, with the forward problem solved numerically at different azimuths for every model realization, which ensures that any linearization biases are avoided. The computations are performed in parallel, in order to reduce the computing time. The often challenging issue of data noise estimation is addressed by means of a Hierarchical Bayesian approach, with the variance of the noise treated as an unknown during the radial anisotropy inversion. In addition to the McMC inversions, we also present faster, non-linear gradient-search inversions for the same anisotropic structure. The results of the two approaches are mutually consistent; the advantage of the McMC inversions is that they provide a measure of uncertainty of the models. Applying the method to broad-band data from the Baikal-central Mongolia region, we determine radial anisotropy from the crust down to the transition-zone depths. Robust negative anisotropy (Vsh < Vsv) in the asthenosphere, at 100-300 km depths, presents strong new evidence for a vertical component of asthenospheric flow. This is consistent with an upward flow from below the thick lithosphere of the Siberian Craton to below the thinner lithosphere of central Mongolia, likely to give rise to decompression melting and the scattered, sporadic volcanism observed in the Baikal Rift area, as proposed previously. Inversion of phase-velocity data from west-central Italy for azimuthal anisotropy reveals a clear change in the shear-wave fast-propagation direction at 70-100 km depths, near the lithosphere-asthenosphere boundary. The orientation of the fabric in the lithosphere is roughly E-W, parallel to the direction of stretching over the last 10 m.y. The orientation of the fabric in the asthenosphere is NW-SE, matching the fast directions inferred from shear-wave splitting and probably indicating the direction of the asthenospheric flow.

  20. The use of simple reparameterizations to improve the efficiency of Markov chain Monte Carlo estimation for multilevel models with applications to discrete time survival models.

    PubMed

    Browne, William J; Steele, Fiona; Golalizadeh, Mousa; Green, Martin J

    2009-06-01

    We consider the application of Markov chain Monte Carlo (MCMC) estimation methods to random-effects models and in particular the family of discrete time survival models. Survival models can be used in many situations in the medical and social sciences and we illustrate their use through two examples that differ in terms of both substantive area and data structure. A multilevel discrete time survival analysis involves expanding the data set so that the model can be cast as a standard multilevel binary response model. For such models it has been shown that MCMC methods have advantages in terms of reducing estimate bias. However, the data expansion results in very large data sets for which MCMC estimation is often slow and can produce chains that exhibit poor mixing. Any way of improving the mixing will result in both speeding up the methods and more confidence in the estimates that are produced. The MCMC methodological literature is full of alternative algorithms designed to improve mixing of chains and we describe three reparameterization techniques that are easy to implement in available software. We consider two examples of multilevel survival analysis: incidence of mastitis in dairy cattle and contraceptive use dynamics in Indonesia. For each application we show where the reparameterization techniques can be used and assess their performance.

  1. Corruption of accuracy and efficiency of Markov chain Monte Carlo simulation by inaccurate numerical implementation of conceptual hydrologic models

    NASA Astrophysics Data System (ADS)

    Schoups, G.; Vrugt, J. A.; Fenicia, F.; van de Giesen, N. C.

    2010-10-01

    Conceptual rainfall-runoff models have traditionally been applied without paying much attention to numerical errors induced by temporal integration of water balance dynamics. Reliance on first-order, explicit, fixed-step integration methods leads to computationally cheap simulation models that are easy to implement. Computational speed is especially desirable for estimating parameter and predictive uncertainty using Markov chain Monte Carlo (MCMC) methods. Confirming earlier work of Kavetski et al. (2003), we show here that the computational speed of first-order, explicit, fixed-step integration methods comes at a cost: for a case study with a spatially lumped conceptual rainfall-runoff model, it introduces artificial bimodality in the marginal posterior parameter distributions, which is not present in numerically accurate implementations of the same model. The resulting effects on MCMC simulation include (1) inconsistent estimates of posterior parameter and predictive distributions, (2) poor performance and slow convergence of the MCMC algorithm, and (3) unreliable convergence diagnosis using the Gelman-Rubin statistic. We studied several alternative numerical implementations to remedy these problems, including various adaptive-step finite difference schemes and an operator splitting method. Our results show that adaptive-step, second-order methods, based on either explicit finite differencing or operator splitting with analytical integration, provide the best alternative for accurate and efficient MCMC simulation. Fixed-step or adaptive-step implicit methods may also be used for increased accuracy, but they cannot match the efficiency of adaptive-step explicit finite differencing or operator splitting. Of the latter two, explicit finite differencing is more generally applicable and is preferred if the individual hydrologic flux laws cannot be integrated analytically, as the splitting method then loses its advantage.
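    The gist of the numerical issue can be reproduced on a toy linear reservoir, dS/dt = P - kS (assumed here purely for illustration): a first-order, explicit, fixed-step Euler scheme is cheap but inaccurate for large steps, whereas an adaptive-step integrator (scipy's solve_ivp) tracks the analytical solution closely.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Toy one-bucket water balance: dS/dt = P - k*S (illustrative, not the study's model).
k, P, S0, T = 0.8, 2.0, 10.0, 20.0

def dSdt(t, S):
    return P - k * S

# First-order, explicit, fixed-step (Euler) integration with a coarse step.
dt, S = 1.0, S0
for _ in range(int(T / dt)):
    S = S + dt * dSdt(0.0, S)

# Adaptive-step, higher-order integration for comparison.
sol = solve_ivp(dSdt, (0.0, T), [S0], rtol=1e-8, atol=1e-10)

# Analytical solution of the linear ODE for reference.
exact = P / k + (S0 - P / k) * np.exp(-k * T)
print("fixed-step Euler:", S, "adaptive:", sol.y[0, -1], "exact:", exact)
```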

  2. A Bayesian approach for parameter estimation and prediction using a computationally intensive model

    DOE PAGES

    Higdon, Dave; McDonnell, Jordan D.; Schunck, Nicolas; ...

    2015-02-05

    Bayesian methods have been successful in quantifying uncertainty in physics-based problems in parameter estimation and prediction. In these cases, physical measurements y are modeled as the best fit of a physics-based model η(θ), where θ denotes the uncertain, best input setting. Hence the statistical model is of the form y = η(θ) + ε, where ε accounts for measurement, and possibly other, error sources. When nonlinearity is present in η(·), the resulting posterior distribution for the unknown parameters in the Bayesian formulation is typically complex and nonstandard, requiring computationally demanding approaches such as Markov chain Monte Carlo (MCMC) to produce multivariate draws from the posterior. Although generally applicable, MCMC requires thousands (or even millions) of evaluations of the physics model η(·). This requirement is problematic if the model takes hours or days to evaluate. To overcome this computational bottleneck, we present an approach adapted from Bayesian model calibration. This approach combines output from an ensemble of computational model runs with physical measurements, within a statistical formulation, to carry out inference. A key component of this approach is a statistical response surface, or emulator, estimated from the ensemble of model runs. We demonstrate this approach with a case study in estimating parameters for a density functional theory model, using experimental mass/binding energy measurements from a collection of atomic nuclei. Lastly, we also demonstrate how this approach produces uncertainties in predictions for recent mass measurements obtained at Argonne National Laboratory.
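    A hedged sketch of the emulator idea follows: train a surrogate on an ensemble of model runs, then run MCMC against the surrogate instead of the expensive model itself. Here scikit-learn's Gaussian process regressor stands in for the statistical response surface, and the toy model, prior, and noise level are assumptions rather than the paper's density functional theory application.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(6)

def expensive_model(theta):
    """Stand-in for a costly physics model eta(theta) (assumption)."""
    return np.tanh(2.0 * theta) + 0.3 * theta

# 1) Ensemble of model runs at design points -> train the emulator.
design = np.linspace(-2.0, 2.0, 15).reshape(-1, 1)
runs = expensive_model(design).ravel()
emulator = GaussianProcessRegressor(kernel=RBF(length_scale=0.5)).fit(design, runs)

# 2) A noisy measurement y = eta(theta_true) + eps.
theta_true, noise_sd = 0.7, 0.05
y_obs = expensive_model(theta_true) + rng.normal(scale=noise_sd)

def log_post(theta):
    """Gaussian likelihood evaluated on the emulator, flat prior on [-2, 2]."""
    if abs(theta) > 2.0:
        return -np.inf
    pred = emulator.predict(np.array([[theta]]))[0]
    return -0.5 * ((y_obs - pred) / noise_sd) ** 2

# 3) Random-walk Metropolis against the cheap emulator instead of eta itself.
theta, lp, chain = 0.0, log_post(0.0), []
for _ in range(10000):
    prop = theta + rng.normal(scale=0.1)
    lp_new = log_post(prop)
    if np.log(rng.uniform()) < lp_new - lp:
        theta, lp = prop, lp_new
    chain.append(theta)

print("posterior mean theta:", np.mean(chain[2000:]), "truth:", theta_true)
```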

  3. Monte Carlo Analysis of Reservoir Models Using Seismic Data and Geostatistical Models

    NASA Astrophysics Data System (ADS)

    Zunino, A.; Mosegaard, K.; Lange, K.; Melnikova, Y.; Hansen, T. M.

    2013-12-01

    We present a study, performed on a synthetic reservoir model, of the analysis of petroleum reservoir models consistent with seismic data and geostatistical constraints. Our aim is to invert directly for the structure and bulk rock properties of the target reservoir zone. Seismology alone is not sufficient to infer the rock facies, porosity, and oil saturation; a rock physics model, which links the unknown properties to the elastic parameters, must also be taken into account. We therefore combine a rock physics model with a simple convolutional approach for seismic waves to invert the "measured" seismograms. To solve this inverse problem, we employ a Markov chain Monte Carlo (MCMC) method, because it can handle non-linearity and complex, multi-step forward models, and it provides realistic estimates of uncertainties. However, for large data sets the MCMC method may be impractical because of its very high computational demand. One strategy to face this challenge is to feed the algorithm with realistic models, hence relying on proper prior information. To this end, we utilize an algorithm drawn from geostatistics to generate geologically plausible models which represent samples of the prior distribution. The geostatistical algorithm learns the multiple-point statistics from prototype models (in the form of training images), then generates thousands of different models which are accepted or rejected by a Metropolis sampler. To further reduce the computation time we parallelize the software and run it on multi-core machines. The solution of the inverse problem is then represented by a collection of reservoir models, in terms of facies, porosity, and oil saturation, which constitute samples of the posterior distribution. We are finally able to produce probability maps of the properties of interest by performing statistical analysis on the collection of solutions.

  4. Bayesian estimation of differential transcript usage from RNA-seq data.

    PubMed

    Papastamoulis, Panagiotis; Rattray, Magnus

    2017-11-27

    Next-generation sequencing allows the identification of genes consisting of differentially expressed transcripts, a term which usually refers to changes in the overall expression level. A specific type of differential expression is differential transcript usage (DTU), which targets changes in the relative within-gene expression of a transcript. The contribution of this paper is to: (a) extend cjBitSeq, a previously introduced Bayesian model originally designed for identifying changes in overall expression levels, to the DTU context, and (b) propose a Bayesian version of DRIMSeq, a frequentist model for inferring DTU. cjBitSeq is a read-based model and performs fully Bayesian inference by MCMC sampling on the space of latent states of each transcript per gene. BayesDRIMSeq is a count-based model and estimates the Bayes factor of a DTU model against a null model using Laplace's approximation. The proposed models are benchmarked against the existing ones using a recent independent simulation study as well as a real RNA-seq dataset. Our results suggest that the Bayesian methods exhibit similar performance to DRIMSeq in terms of precision/recall but offer better calibration of the false discovery rate.

  5. Multivariate Copula Analysis Toolbox (MvCAT): Describing dependence and underlying uncertainty using a Bayesian framework

    NASA Astrophysics Data System (ADS)

    Sadegh, Mojtaba; Ragno, Elisa; AghaKouchak, Amir

    2017-06-01

    We present a newly developed Multivariate Copula Analysis Toolbox (MvCAT) which includes a wide range of copula families with different levels of complexity. MvCAT employs a Bayesian framework with a residual-based Gaussian likelihood function for inferring copula parameters and estimating the underlying uncertainties. The contribution of this paper is threefold: (a) providing a Bayesian framework to approximate the predictive uncertainties of fitted copulas, (b) introducing a hybrid-evolution Markov Chain Monte Carlo (MCMC) approach designed for numerical estimation of the posterior distribution of copula parameters, and (c) enabling the community to explore a wide range of copulas and evaluate them relative to the fitting uncertainties. We show that the commonly used local optimization methods for copula parameter estimation often get trapped in local minima. The proposed method, however, addresses this limitation and improves describing the dependence structure. MvCAT also enables evaluation of uncertainties relative to the length of record, which is fundamental to a wide range of applications such as multivariate frequency analysis.
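
    A minimal sketch of the residual-based Gaussian likelihood idea for copula fitting, assuming a Clayton copula, pseudo-observations on the unit square, and a plain random-walk Metropolis sampler in place of MvCAT's hybrid-evolution MCMC; the variable names and synthetic data are illustrative, and the exact likelihood used by MvCAT may differ.

    ```python
    # Minimal sketch: random-walk Metropolis over a Clayton copula parameter using a
    # residual-based Gaussian likelihood (empirical joint CDF vs. fitted copula CDF).
    import numpy as np

    rng = np.random.default_rng(1)

    # Pseudo-observations on (0,1)^2 (in practice: normalized ranks of paired data)
    n = 300
    u = rng.uniform(size=n)
    v = np.clip(0.6 * u + 0.4 * rng.uniform(size=n), 1e-3, 1 - 1e-3)  # dependent pair

    def clayton_cdf(u, v, theta):
        return np.maximum(u ** -theta + v ** -theta - 1.0, 1e-12) ** (-1.0 / theta)

    # Empirical copula: empirical joint CDF evaluated at the observations
    C_emp = np.array([np.mean((u <= ui) & (v <= vi)) for ui, vi in zip(u, v)])

    def log_like(params):
        theta, sigma = params
        if theta <= 0.0 or sigma <= 0.0:
            return -np.inf
        resid = C_emp - clayton_cdf(u, v, theta)    # residuals: empirical vs. fitted copula
        return -0.5 * np.sum(resid ** 2) / sigma ** 2 - n * np.log(sigma)

    state = np.array([1.0, 0.1])
    lp = log_like(state)
    samples = []
    for _ in range(10000):
        prop = state + rng.normal(scale=[0.2, 0.02])
        lp_prop = log_like(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            state, lp = prop, lp_prop
        samples.append(state.copy())
    samples = np.array(samples[2000:])
    print("posterior means: theta =", round(samples[:, 0].mean(), 2),
          " sigma =", round(samples[:, 1].mean(), 3))
    ```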

  6. Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geostatistical Datasets.

    PubMed

    Datta, Abhirup; Banerjee, Sudipto; Finley, Andrew O; Gelfand, Alan E

    2016-01-01

    Spatial process models for analyzing geostatistical data entail computations that become prohibitive as the number of spatial locations becomes large. This article develops a class of highly scalable nearest-neighbor Gaussian process (NNGP) models to provide fully model-based inference for large geostatistical datasets. We establish that the NNGP is a well-defined spatial process providing legitimate finite-dimensional Gaussian densities with sparse precision matrices. We embed the NNGP as a sparsity-inducing prior within a rich hierarchical modeling framework and outline how computationally efficient Markov chain Monte Carlo (MCMC) algorithms can be executed without storing or decomposing large matrices. The number of floating-point operations (flops) per iteration of this algorithm is linear in the number of spatial locations, thereby rendering substantial scalability. We illustrate the computational and inferential benefits of the NNGP over competing methods using simulation studies and also analyze forest biomass from a massive U.S. Forest Inventory dataset at a scale that precludes alternative dimension-reducing methods. Supplementary materials for this article are available online.
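
    A minimal sketch of the sparsity idea behind the NNGP: each location is conditioned only on its m nearest previously ordered neighbors, so the joint Gaussian log-density is evaluated from many small solves rather than one large factorization. The exponential covariance, the coordinate-based ordering, and all parameter values are illustrative assumptions, not the authors' implementation.

    ```python
    # Minimal sketch: NNGP-style log-density evaluation. Each location is conditioned
    # only on its m nearest previously ordered neighbors, so the joint density is a
    # product of small conditionals and no large matrix is ever factorized.
    import numpy as np

    rng = np.random.default_rng(2)

    def expcov(d, sigma2=1.0, phi=3.0):          # exponential covariance function
        return sigma2 * np.exp(-phi * d)

    n, m = 500, 10
    coords = rng.uniform(size=(n, 2))
    coords = coords[np.argsort(coords[:, 0])]    # simple coordinate-based ordering
    w = rng.normal(size=n)                       # values whose NNGP density we evaluate

    def nngp_logpdf(w, coords, m):
        logp = -0.5 * (np.log(2.0 * np.pi * expcov(0.0)) + w[0] ** 2 / expcov(0.0))
        for i in range(1, len(w)):
            # m nearest neighbors among the locations that precede i in the ordering
            d_prev = np.linalg.norm(coords[:i] - coords[i], axis=1)
            nbrs = np.argsort(d_prev)[:min(m, i)]
            C_nn = expcov(np.linalg.norm(coords[nbrs, None, :] - coords[None, nbrs, :], axis=2))
            c_in = expcov(d_prev[nbrs])
            b = np.linalg.solve(C_nn, c_in)      # conditional regression coefficients
            f = expcov(0.0) - c_in @ b           # conditional variance
            mu = b @ w[nbrs]
            logp += -0.5 * (np.log(2.0 * np.pi * f) + (w[i] - mu) ** 2 / f)
        return logp

    print("NNGP log-density of w:", nngp_logpdf(w, coords, m))
    ```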

  7. Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geostatistical Datasets

    PubMed Central

    Datta, Abhirup; Banerjee, Sudipto; Finley, Andrew O.; Gelfand, Alan E.

    2018-01-01

    Spatial process models for analyzing geostatistical data entail computations that become prohibitive as the number of spatial locations becomes large. This article develops a class of highly scalable nearest-neighbor Gaussian process (NNGP) models to provide fully model-based inference for large geostatistical datasets. We establish that the NNGP is a well-defined spatial process providing legitimate finite-dimensional Gaussian densities with sparse precision matrices. We embed the NNGP as a sparsity-inducing prior within a rich hierarchical modeling framework and outline how computationally efficient Markov chain Monte Carlo (MCMC) algorithms can be executed without storing or decomposing large matrices. The number of floating-point operations (flops) per iteration of this algorithm is linear in the number of spatial locations, thereby rendering substantial scalability. We illustrate the computational and inferential benefits of the NNGP over competing methods using simulation studies and also analyze forest biomass from a massive U.S. Forest Inventory dataset at a scale that precludes alternative dimension-reducing methods. Supplementary materials for this article are available online. PMID:29720777

  8. Inverse Modeling Using Markov Chain Monte Carlo Aided by Adaptive Stochastic Collocation Method with Transformation

    NASA Astrophysics Data System (ADS)

    Zhang, D.; Liao, Q.

    2016-12-01

    The Bayesian inference provides a convenient framework to solve statistical inverse problems. In this method, the parameters to be identified are treated as random variables. The prior knowledge, the system nonlinearity, and the measurement errors can be directly incorporated in the posterior probability density function (PDF) of the parameters. The Markov chain Monte Carlo (MCMC) method is a powerful tool to generate samples from the posterior PDF. However, since MCMC usually requires thousands or even millions of forward simulations, it can be a computationally intensive endeavor, particularly when faced with large-scale flow and transport models. To address this issue, we construct a surrogate system for the model responses in the form of polynomials by the stochastic collocation method. In addition, we employ interpolation based on nested sparse grids, taking into account the different importance of the parameters, to cope with high random dimensions in the stochastic space. Furthermore, in cases of low regularity, such as a discontinuous or non-smooth relation between the input parameters and the output responses, we introduce an additional transformation to improve the accuracy of the surrogate model. Once the surrogate system is built, the likelihood can be evaluated with very little computational cost. We analyze the convergence rate of the forward solution and the surrogate posterior by the Kullback-Leibler divergence, which quantifies the difference between probability distributions. Fast convergence of the forward solution implies fast convergence of the surrogate posterior to the true posterior. We also test the proposed algorithm on water-flooding two-phase flow reservoir examples. The posterior PDF calculated from a very long chain with direct forward simulation is assumed to be accurate. The posterior PDF calculated using the surrogate model is in reasonable agreement with this reference, revealing a great improvement in terms of computational efficiency.
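
    A minimal sketch of surrogate-accelerated MCMC in the spirit described above, with a plain 1-D polynomial fit standing in for the sparse-grid stochastic collocation surrogate and the additional transform omitted; the toy forward model and all names are illustrative.

    ```python
    # Minimal sketch: replace an "expensive" forward model by a polynomial surrogate
    # built from a handful of collocation-style nodes, then run Metropolis on the
    # surrogate likelihood (sparse grids and the extra transform are omitted).
    import numpy as np

    rng = np.random.default_rng(3)

    def forward(k):                              # stand-in for an expensive flow solver
        return np.exp(-k) + 0.1 * k ** 2

    nodes = np.linspace(0.0, 3.0, 9)             # collocation-style nodes
    coeffs = np.polyfit(nodes, forward(nodes), deg=6)
    surrogate = lambda k: np.polyval(coeffs, k)

    k_true, sigma = 0.5, 0.02
    y_obs = forward(k_true) + rng.normal(0.0, sigma)

    def log_post(k, model):
        if not 0.0 <= k <= 3.0:                  # uniform prior on [0, 3]
            return -np.inf
        return -0.5 * (y_obs - model(k)) ** 2 / sigma ** 2

    def metropolis(model, n_iter=20000, step=0.1):
        k, lp, out = 1.0, log_post(1.0, model), []
        for _ in range(n_iter):
            prop = k + step * rng.normal()
            lp_prop = log_post(prop, model)
            if np.log(rng.uniform()) < lp_prop - lp:
                k, lp = prop, lp_prop
            out.append(k)
        return np.array(out[5000:])

    post_exact = metropolis(forward)             # reference: direct forward simulations
    post_surr = metropolis(surrogate)            # cheap: surrogate evaluations only
    print("exact mean/sd:    ", post_exact.mean(), post_exact.std())
    print("surrogate mean/sd:", post_surr.mean(), post_surr.std())
    ```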

  9. Comparison of Filtering Methods for the Modeling and Retrospective Forecasting of Influenza Epidemics

    PubMed Central

    Yang, Wan; Karspeck, Alicia; Shaman, Jeffrey

    2014-01-01

    A variety of filtering methods enable the recursive estimation of system state variables and inference of model parameters. These methods have found application in a range of disciplines and settings, including engineering design and forecasting, and, over the last two decades, have been applied to infectious disease epidemiology. For any system of interest, the ideal filter depends on the nonlinearity and complexity of the model to which it is applied, the quality and abundance of observations being entrained, and the ultimate application (e.g. forecast, parameter estimation, etc.). Here, we compare the performance of six state-of-the-art filter methods when used to model and forecast influenza activity. Three particle filters—a basic particle filter (PF) with resampling and regularization, maximum likelihood estimation via iterated filtering (MIF), and particle Markov chain Monte Carlo (pMCMC)—and three ensemble filters—the ensemble Kalman filter (EnKF), the ensemble adjustment Kalman filter (EAKF), and the rank histogram filter (RHF)—were used in conjunction with a humidity-forced susceptible-infectious-recovered-susceptible (SIRS) model and weekly estimates of influenza incidence. The modeling frameworks, first validated with synthetic influenza epidemic data, were then applied to fit and retrospectively forecast the historical incidence time series of seven influenza epidemics during 2003–2012, for 115 cities in the United States. Results suggest that when using the SIRS model the ensemble filters and the basic PF are more capable of faithfully recreating historical influenza incidence time series, while the MIF and pMCMC do not perform as well for multimodal outbreaks. For forecast of the week with the highest influenza activity, the accuracies of the six model-filter frameworks are comparable; the three particle filters perform slightly better predicting peaks 1–5 weeks in the future; the ensemble filters are more accurate predicting peaks in the past. PMID:24762780
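
    A minimal sketch of one of the filters compared above, the basic bootstrap particle filter with multinomial resampling, applied to a toy one-dimensional state-space model rather than the humidity-forced SIRS model; all names and noise levels are illustrative.

    ```python
    # Minimal sketch: a bootstrap particle filter with multinomial resampling for a
    # toy one-dimensional linear-Gaussian state-space model (the SIRS epidemic model
    # is replaced by a scalar autoregressive state for brevity).
    import numpy as np

    rng = np.random.default_rng(4)

    T, n_part = 50, 1000
    q, r = 0.1, 0.5                             # process and observation noise std devs

    # Simulate a "true" state trajectory and noisy observations
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = 0.9 * x[t - 1] + q * rng.normal()
    y = x + r * rng.normal(size=T)

    # Bootstrap particle filter
    particles = rng.normal(size=n_part)
    filt_mean = np.zeros(T)
    for t in range(T):
        if t > 0:                               # propagate through the state equation
            particles = 0.9 * particles + q * rng.normal(size=n_part)
        logw = -0.5 * (y[t] - particles) ** 2 / r ** 2     # observation log-likelihood
        w = np.exp(logw - logw.max())
        w /= w.sum()
        filt_mean[t] = np.sum(w * particles)
        idx = rng.choice(n_part, size=n_part, p=w)         # multinomial resampling
        particles = particles[idx]

    print("RMSE of filtered mean vs. truth:", np.sqrt(np.mean((filt_mean - x) ** 2)))
    ```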

  10. Adaptive MCMC in Bayesian phylogenetics: an application to analyzing partitioned data in BEAST.

    PubMed

    Baele, Guy; Lemey, Philippe; Rambaut, Andrew; Suchard, Marc A

    2017-06-15

    Advances in sequencing technology continue to deliver increasingly large molecular sequence datasets that are often heavily partitioned in order to accurately model the underlying evolutionary processes. In phylogenetic analyses, partitioning strategies involve estimating conditionally independent models of molecular evolution for different genes and different positions within those genes, requiring a large number of evolutionary parameters to be estimated and leading to an increased computational burden for such analyses. The past two decades have also seen the rise of multi-core processors, in both the central processing unit (CPU) and graphics processing unit (GPU) markets, enabling massively parallel computations that are not yet fully exploited by many software packages for multipartite analyses. We here propose a Markov chain Monte Carlo (MCMC) approach using an adaptive multivariate transition kernel to estimate in parallel a large number of parameters, split across partitioned data, by exploiting multi-core processing. Across several real-world examples, we demonstrate that our approach enables the estimation of these multipartite parameters more efficiently than standard approaches that typically use a mixture of univariate transition kernels. In one case, when estimating the relative rate parameter of the non-coding partition in a heterochronous dataset, MCMC integration efficiency improves more than 14-fold. Our implementation is part of the BEAST code base, a widely used open-source software package to perform Bayesian phylogenetic inference. guy.baele@kuleuven.be. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
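
    A minimal sketch of an adaptive multivariate random-walk transition kernel (Haario-style adaptation of the proposal covariance to the chain history), run on a toy correlated Gaussian target; this illustrates the general idea only and is not the BEAST implementation.

    ```python
    # Minimal sketch: an adaptive multivariate random-walk kernel (Haario-style).
    # The proposal covariance is learned from the chain history, so correlated
    # parameters (e.g. partition-specific rates) are proposed jointly.
    import numpy as np

    rng = np.random.default_rng(5)
    d = 4
    A = rng.normal(size=(d, d))
    target_cov = A @ A.T + d * np.eye(d)          # toy correlated Gaussian target
    target_prec = np.linalg.inv(target_cov)

    def log_target(x):
        return -0.5 * x @ target_prec @ x

    x = np.zeros(d)
    lp = log_target(x)
    chain = [x.copy()]
    prop_cov = 0.1 * np.eye(d)                    # fixed kernel during warm-up
    for it in range(1, 10000):
        if it >= 500 and it % 100 == 0:           # periodically adapt to chain history
            prop_cov = (2.38 ** 2 / d) * np.cov(np.array(chain).T) + 1e-6 * np.eye(d)
        prop = rng.multivariate_normal(x, prop_cov)
        lp_prop = log_target(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop
        chain.append(x.copy())

    chain = np.array(chain[2000:])
    print("empirical vs. target variances:")
    print(np.round(chain.var(axis=0), 2))
    print(np.round(np.diag(target_cov), 2))
    ```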

  11. A Bootstrap Metropolis-Hastings Algorithm for Bayesian Analysis of Big Data.

    PubMed

    Liang, Faming; Kim, Jinsu; Song, Qifan

    2016-01-01

    Markov chain Monte Carlo (MCMC) methods have proven to be a very powerful tool for analyzing data of complex structures. However, their computer-intensive nature, typically requiring a large number of iterations and a complete scan of the full dataset at each iteration, precludes their use for big data analysis. In this paper, we propose the so-called bootstrap Metropolis-Hastings (BMH) algorithm, which provides a general framework for taming powerful MCMC methods for big data analysis; that is, the full-data log-likelihood is replaced by a Monte Carlo average of log-likelihoods calculated in parallel from multiple bootstrap samples. The BMH algorithm possesses an embarrassingly parallel structure and avoids repeated scans of the full dataset across iterations, and is thus feasible for big data problems. Compared to the popular divide-and-combine method, BMH can be generally more efficient as it can asymptotically integrate the whole data information into a single simulation run. The BMH algorithm is very flexible. Like the Metropolis-Hastings algorithm, it can serve as a basic building block for developing advanced MCMC algorithms that are feasible for big data problems. This is illustrated in the paper by the tempering BMH algorithm, which can be viewed as a combination of parallel tempering and the BMH algorithm. BMH can also be used for model selection and optimization by combining with reversible jump MCMC and simulated annealing, respectively.
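
    A minimal sketch of the bootstrap Metropolis-Hastings idea on a toy Gaussian-mean problem: the full-data log-likelihood is replaced by an average of log-likelihoods over bootstrap subsamples. The n/m rescaling and the sequential (rather than parallel) evaluation are illustrative simplifications; the exact weighting in the BMH paper may differ.

    ```python
    # Minimal sketch of the bootstrap Metropolis-Hastings idea on a Gaussian-mean
    # problem: the full-data log-likelihood is replaced by an average of subsample
    # log-likelihoods (evaluated sequentially here; in BMH they run in parallel).
    import numpy as np

    rng = np.random.default_rng(6)

    n, m, k = 100_000, 1_000, 20                # full data size, subsample size, no. of subsamples
    data = rng.normal(loc=2.0, scale=1.0, size=n)
    subsamples = [rng.choice(data, size=m, replace=True) for _ in range(k)]

    def sub_loglike(mu, sub):
        return -0.5 * np.sum((sub - mu) ** 2)

    def bmh_loglike(mu):
        # average over bootstrap subsamples, rescaled to the full-data size
        return (n / m) * np.mean([sub_loglike(mu, s) for s in subsamples])

    mu, lp, samples = 0.0, bmh_loglike(0.0), []
    for _ in range(5000):
        prop = mu + 0.01 * rng.normal()
        lp_prop = bmh_loglike(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            mu, lp = prop, lp_prop
        samples.append(mu)
    print("approximate posterior mean of mu:", np.mean(samples[1000:]))
    ```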

  12. A Bootstrap Metropolis–Hastings Algorithm for Bayesian Analysis of Big Data

    PubMed Central

    Kim, Jinsu; Song, Qifan

    2016-01-01

    Markov chain Monte Carlo (MCMC) methods have proven to be a very powerful tool for analyzing data of complex structures. However, their computer-intensive nature, typically requiring a large number of iterations and a complete scan of the full dataset at each iteration, precludes their use for big data analysis. In this paper, we propose the so-called bootstrap Metropolis-Hastings (BMH) algorithm, which provides a general framework for taming powerful MCMC methods for big data analysis; that is, the full-data log-likelihood is replaced by a Monte Carlo average of log-likelihoods calculated in parallel from multiple bootstrap samples. The BMH algorithm possesses an embarrassingly parallel structure and avoids repeated scans of the full dataset across iterations, and is thus feasible for big data problems. Compared to the popular divide-and-combine method, BMH can be generally more efficient as it can asymptotically integrate the whole data information into a single simulation run. The BMH algorithm is very flexible. Like the Metropolis-Hastings algorithm, it can serve as a basic building block for developing advanced MCMC algorithms that are feasible for big data problems. This is illustrated in the paper by the tempering BMH algorithm, which can be viewed as a combination of parallel tempering and the BMH algorithm. BMH can also be used for model selection and optimization by combining with reversible jump MCMC and simulated annealing, respectively. PMID:29033469

  13. Quantifying MCMC exploration of phylogenetic tree space.

    PubMed

    Whidden, Chris; Matsen, Frederick A

    2015-05-01

    In order to gain an understanding of the effectiveness of phylogenetic Markov chain Monte Carlo (MCMC), it is important to understand how quickly the empirical distribution of the MCMC converges to the posterior distribution. In this article, we investigate this problem on phylogenetic tree topologies with a metric that is especially well suited to the task: the subtree prune-and-regraft (SPR) metric. This metric directly corresponds to the minimum number of MCMC rearrangements required to move between trees in common phylogenetic MCMC implementations. We develop a novel graph-based approach to analyze tree posteriors and find that the SPR metric is much more informative than simpler metrics that are unrelated to MCMC moves. In doing so, we show conclusively that topological peaks do occur in Bayesian phylogenetic posteriors from real data sets as sampled with standard MCMC approaches, investigate the efficiency of Metropolis-coupled MCMC (MCMCMC) in traversing the valleys between peaks, and show that conditional clade distribution (CCD) can have systematic problems when there are multiple peaks. © The Author(s) 2015. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.

  14. The estimation of lower refractivity uncertainty from radar sea clutter using the Bayesian—MCMC method

    NASA Astrophysics Data System (ADS)

    Sheng, Zheng

    2013-02-01

    The estimation of lower atmospheric refractivity from radar sea clutter (RFC) is a complicated nonlinear optimization problem. This paper treats the RFC problem in a Bayesian framework. It uses the unbiased Markov chain Monte Carlo (MCMC) sampling technique, which can provide accurate posterior probability distributions of the estimated refractivity parameters, with an electromagnetic split-step fast Fourier transform terrain parabolic equation propagation model used within the Bayesian inversion framework. In contrast to global optimization algorithms, the Bayesian-MCMC approach yields not only approximate solutions but also the probability distributions of the solutions, that is, uncertainty analyses of the solutions. The Bayesian-MCMC algorithm is applied to both simulated and real radar sea-clutter data; reference refractivity profiles are taken from the simulation setup and from helicopter soundings. The inversion algorithm is assessed (i) by comparing the estimated refractivity profiles with the assumed simulation profiles and the helicopter sounding data, and (ii) by examining the one-dimensional (1D) and two-dimensional (2D) posterior probability distributions of the solutions.

  15. Analysing child mortality in Nigeria with geoadditive discrete-time survival models.

    PubMed

    Adebayo, Samson B; Fahrmeir, Ludwig

    2005-03-15

    Child mortality reflects a country's level of socio-economic development and quality of life. In developing countries, mortality rates are not only influenced by socio-economic, demographic and health variables but they also vary considerably across regions and districts. In this paper, we analysed child mortality in Nigeria with flexible geoadditive discrete-time survival models. This class of models allows us to measure small-area district-specific spatial effects simultaneously with possibly non-linear or time-varying effects of other factors. Inference is fully Bayesian and uses computationally efficient Markov chain Monte Carlo (MCMC) simulation techniques. The application is based on the 1999 Nigeria Demographic and Health Survey. Our method assesses effects at a high level of temporal and spatial resolution not available with traditional parametric models, and the results provide some evidence on how to reduce child mortality by improving socio-economic and public health conditions. Copyright (c) 2004 John Wiley & Sons, Ltd.

  16. Itô-SDE MCMC method for Bayesian characterization of errors associated with data limitations in stochastic expansion methods for uncertainty quantification

    NASA Astrophysics Data System (ADS)

    Arnst, M.; Abello Álvarez, B.; Ponthot, J.-P.; Boman, R.

    2017-11-01

    This paper is concerned with the characterization and the propagation of errors associated with data limitations in polynomial-chaos-based stochastic methods for uncertainty quantification. Such an issue can arise in uncertainty quantification when only a limited amount of data is available. When the available information does not suffice to accurately determine the probability distributions that must be assigned to the uncertain variables, the Bayesian method for assigning these probability distributions becomes attractive because it allows the stochastic model to account explicitly for insufficiency of the available information. In previous work, such applications of the Bayesian method had already been implemented by using the Metropolis-Hastings and Gibbs Markov chain Monte Carlo (MCMC) methods. In this paper, we present an alternative implementation, which uses an MCMC method built around an Itô stochastic differential equation (SDE) that is ergodic for the Bayesian posterior. We draw together from the mathematics literature a number of formal properties of this Itô SDE that lend support to its use in the implementation of the Bayesian method, and we describe its discretization, including the choice of the free parameters, by using the implicit Euler method. We demonstrate the proposed methodology on a problem of uncertainty quantification in a complex nonlinear engineering application relevant to metal forming.
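
    A minimal sketch of sampling a posterior with an Itô-SDE-based scheme: the overdamped Langevin SDE discretized with an explicit Euler step on a toy two-dimensional Gaussian target. The paper's SDE and its implicit Euler discretization are more elaborate; this only illustrates the general idea.

    ```python
    # Minimal sketch: sampling a posterior with an Ito-SDE-based scheme. We discretize
    # the overdamped Langevin SDE  dX = 0.5 * grad log pi(X) dt + dW  with an explicit
    # Euler step; the paper uses a different SDE and an implicit Euler discretization.
    import numpy as np

    rng = np.random.default_rng(7)

    # Toy posterior: correlated 2-D Gaussian
    cov = np.array([[1.0, 0.8], [0.8, 2.0]])
    prec = np.linalg.inv(cov)

    def grad_log_pi(x):
        return -prec @ x

    dt = 0.05
    x = np.zeros(2)
    samples = np.zeros((20000, 2))
    for i in range(len(samples)):
        x = x + 0.5 * dt * grad_log_pi(x) + np.sqrt(dt) * rng.normal(size=2)
        samples[i] = x

    print("sample covariance (target is [[1.0, 0.8], [0.8, 2.0]], up to step-size bias):")
    print(np.round(np.cov(samples[5000:].T), 2))
    ```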

  17. Do race, ethnicity, and psychiatric diagnoses matter in the prevalence of multiple chronic medical conditions?

    PubMed

    Cabassa, Leopoldo J; Humensky, Jennifer; Druss, Benjamin; Lewis-Fernández, Roberto; Gomes, Arminda P; Wang, Shuai; Blanco, Carlos

    2013-06-01

    The proportion of people in the United States with multiple chronic medical conditions (MCMC) is increasing. Yet, little is known about the effect that race, ethnicity, and psychiatric disorders have on the prevalence of MCMCs in the general population. This study used data from wave 2 of the National Epidemiologic Survey on Alcohol and Related Conditions (N=33,107). Multinomial logistic regression models adjusting for sociodemographic variables, body mass index, and quality of life were used to examine differences in the 12-month prevalence of MCMC by race/ethnicity, psychiatric diagnosis, and the interactions between race/ethnicity and psychiatric diagnosis. Compared to non-Hispanic Whites, Hispanics reported lower odds of MCMC and African Americans reported higher odds of MCMC after adjusting for covariates. People with psychiatric disorders reported higher odds of MCMC compared with people without psychiatric disorders. There were significant interactions between race and psychiatric diagnosis associated with rates of MCMC. In the presence of certain psychiatric disorders, the odds of MCMC were higher among African Americans with psychiatric disorders compared to non-Hispanic Whites with similar psychiatric disorders. Our study results indicate that race, ethnicity, and psychiatric disorders are associated with the prevalence of MCMC. As the rates of MCMC rise, it is critical to identify which populations are at increased risk and how to best direct services to address their health care needs.

  18. An MCMC method for the evaluation of the Fisher information matrix for non-linear mixed effect models.

    PubMed

    Riviere, Marie-Karelle; Ueckert, Sebastian; Mentré, France

    2016-10-01

    Non-linear mixed effect models (NLMEMs) are widely used for the analysis of longitudinal data. To design these studies, optimal design based on the expected Fisher information matrix (FIM) can be used instead of performing time-consuming clinical trial simulations. In recent years, estimation algorithms for NLMEMs have transitioned from linearization toward more exact higher-order methods. Optimal design, on the other hand, has mainly relied on first-order (FO) linearization to calculate the FIM. Although efficient in general, FO cannot be applied to complex non-linear models and is difficult to use in studies with discrete data. We propose an approach to evaluate the expected FIM in NLMEMs for both discrete and continuous outcomes. We used Markov Chain Monte Carlo (MCMC) to integrate the derivatives of the log-likelihood over the random effects, and Monte Carlo to evaluate its expectation w.r.t. the observations. Our method was implemented in R using Stan, which efficiently draws MCMC samples and calculates partial derivatives of the log-likelihood. Evaluated on several examples, our approach showed good performance with relative standard errors (RSEs) close to those obtained by simulations. We studied the influence of the number of MC and MCMC samples and computed the uncertainty of the FIM evaluation. We also compared our approach to Adaptive Gaussian Quadrature, Laplace approximation, and FO. Our method is available in R-package MIXFIM and can be used to evaluate the FIM, its determinant with confidence intervals (CIs), and RSEs with CIs. © The Author 2016. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  19. Assessment of parameter uncertainty in hydrological model using a Markov-Chain-Monte-Carlo-based multilevel-factorial-analysis method

    NASA Astrophysics Data System (ADS)

    Zhang, Junlong; Li, Yongping; Huang, Guohe; Chen, Xi; Bao, Anming

    2016-07-01

    Without a realistic assessment of parameter uncertainty, decision makers may encounter difficulties in accurately describing hydrologic processes and assessing relationships between model parameters and watershed characteristics. In this study, a Markov-Chain-Monte-Carlo-based multilevel-factorial-analysis (MCMC-MFA) method is developed, which can not only generate samples of parameters from a well-constructed Markov chain and assess parameter uncertainties with straightforward Bayesian inference, but also investigate the individual and interactive effects of multiple parameters on model output through measuring the specific variations of hydrological responses. A case study is conducted for addressing parameter uncertainties in the Kaidu watershed of northwest China. Effects of multiple parameters and their interactions are quantitatively investigated using the MCMC-MFA with a three-level factorial experiment (totally 81 runs). A variance-based sensitivity analysis method is used to validate the results of the parameters' effects. Results disclose that (i) the Soil Conservation Service runoff curve number for moisture condition II (CN2) and the fraction of snow volume corresponding to 50% snow cover (SNO50COV) are the most significant factors for hydrological responses, implying that infiltration-excess overland flow and snow water equivalent represent important water input to the hydrological system of the Kaidu watershed; (ii) saturated hydraulic conductivity (SOL_K) and the soil evaporation compensation factor (ESCO) have obvious effects on hydrological responses; this implies that the processes of percolation and evaporation impact the hydrological process in this watershed; (iii) the interactions of ESCO and SNO50COV as well as CN2 and SNO50COV have an obvious effect, implying that snow cover can impact the generation of runoff on the land surface and the extraction of soil evaporative demand in lower soil layers. These findings can help enhance the hydrological model's capability for simulating/predicting water resources.

  20. Inference of Epidemiological Dynamics Based on Simulated Phylogenies Using Birth-Death and Coalescent Models

    PubMed Central

    Boskova, Veronika; Bonhoeffer, Sebastian; Stadler, Tanja

    2014-01-01

    Quantifying epidemiological dynamics is crucial for understanding and forecasting the spread of an epidemic. The coalescent and the birth-death model are used interchangeably to infer epidemiological parameters from the genealogical relationships of the pathogen population under study, which in turn are inferred from the pathogen genetic sequencing data. To compare the performance of these widely applied models, we performed a simulation study. We simulated phylogenetic trees under the constant rate birth-death model and the coalescent model with a deterministic exponentially growing infected population. For each tree, we re-estimated the epidemiological parameters using both a birth-death and a coalescent based method, implemented as an MCMC procedure in BEAST v2.0. In our analyses that estimate the growth rate of an epidemic based on simulated birth-death trees, the point estimates such as the maximum a posteriori/maximum likelihood estimates are not very different. However, the estimates of uncertainty are very different. The birth-death model had a higher coverage than the coalescent model, i.e. contained the true value in the highest posterior density (HPD) interval more often (2–13% vs. 31–75% error). The coverage of the coalescent decreases with decreasing basic reproductive ratio and increasing sampling probability of infecteds. We hypothesize that the biases in the coalescent are due to the assumption of deterministic rather than stochastic population size changes. Both methods performed reasonably well when analyzing trees simulated under the coalescent. The methods can also identify other key epidemiological parameters as long as one of the parameters is fixed to its true value. In summary, when using genetic data to estimate epidemic dynamics, our results suggest that the birth-death method will be less sensitive to population fluctuations of early outbreaks than the coalescent method that assumes a deterministic exponentially growing infected population. PMID:25375100

  1. Approximate Bayesian computation in large-scale structure: constraining the galaxy-halo connection

    NASA Astrophysics Data System (ADS)

    Hahn, ChangHoon; Vakili, Mohammadjavad; Walsh, Kilian; Hearin, Andrew P.; Hogg, David W.; Campbell, Duncan

    2017-08-01

    Standard approaches to Bayesian parameter inference in large-scale structure assume a Gaussian functional form (chi-squared form) for the likelihood. This assumption, in detail, cannot be correct. Likelihood-free inference methods such as approximate Bayesian computation (ABC) relax these restrictions and make inference possible without making any assumptions on the likelihood. Instead, ABC relies on a forward generative model of the data and a metric for measuring the distance between the model and the data. In this work, we demonstrate that ABC is feasible for LSS parameter inference by using it to constrain parameters of the halo occupation distribution (HOD) model for populating dark matter haloes with galaxies. Using a specific implementation of ABC supplemented with population Monte Carlo importance sampling, a generative forward model using the HOD, and a distance metric based on the galaxy number density, two-point correlation function and galaxy group multiplicity function, we constrain the HOD parameters of a mock observation generated from selected 'true' HOD parameters. The parameter constraints we obtain from ABC are consistent with the 'true' HOD parameters, demonstrating that ABC can be reliably used for parameter inference in LSS. Furthermore, we compare our ABC constraints to constraints obtained using a pseudo-likelihood function of Gaussian form with MCMC and find consistent HOD parameter constraints. Ultimately, our results suggest that ABC can and should be applied in parameter inference for LSS analyses.
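
    A minimal sketch of likelihood-free inference by rejection ABC, with a toy Poisson generative model and a two-component summary statistic standing in for the HOD forward model and clustering summaries; the population Monte Carlo refinement is omitted and all names and tolerances are illustrative.

    ```python
    # Minimal sketch: rejection ABC with a simple summary statistic. A toy Poisson
    # generative model stands in for the HOD forward model, and the distance is a
    # normalized difference of summaries; the population Monte Carlo step is omitted.
    import numpy as np

    rng = np.random.default_rng(8)

    def forward_model(theta, n=200):            # stand-in generative (forward) model
        return rng.poisson(lam=theta, size=n)

    def summary(x):                              # summary statistics of a data set
        return np.array([x.mean(), x.var()])

    theta_true = 4.0
    s_obs = summary(forward_model(theta_true))

    n_draws, eps = 50_000, 0.25
    prior_draws = rng.uniform(0.1, 10.0, size=n_draws)       # flat prior on theta
    accepted = []
    for theta in prior_draws:
        s_sim = summary(forward_model(theta))
        if np.linalg.norm((s_sim - s_obs) / s_obs) < eps:    # normalized distance metric
            accepted.append(theta)

    accepted = np.array(accepted)
    print(f"accepted {accepted.size} of {n_draws} draws;"
          f" ABC posterior mean = {accepted.mean():.2f}")
    ```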

  2. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vrugt, Jasper A; Robinson, Bruce A; Ter Braak, Cajo J F

    In recent years, a strong debate has emerged in the hydrologic literature regarding what constitutes an appropriate framework for uncertainty estimation. Particularly, there is strong disagreement whether an uncertainty framework should have its roots within a proper statistical (Bayesian) context, or whether such a framework should be based on a different philosophy and implement informal measures and weaker inference to summarize parameter and predictive distributions. In this paper, we compare a formal Bayesian approach using Markov Chain Monte Carlo (MCMC) with generalized likelihood uncertainty estimation (GLUE) for assessing uncertainty in conceptual watershed modeling. Our formal Bayesian approach is implemented using the recently developed differential evolution adaptive Metropolis (DREAM) MCMC scheme with a likelihood function that explicitly considers model structural, input and parameter uncertainty. Our results demonstrate that DREAM and GLUE can generate very similar estimates of total streamflow uncertainty. This suggests that formal and informal Bayesian approaches have more common ground than the hydrologic literature and ongoing debate might suggest. The main advantage of formal approaches is, however, that they attempt to disentangle the effect of forcing, parameter and model structural error on total predictive uncertainty. This is key to improving hydrologic theory and to better understand and predict the flow of water through catchments.
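
    A minimal sketch of the informal (GLUE-style) side of the comparison above: parameter sets are sampled from the prior, scored with an informal likelihood (here the Nash-Sutcliffe efficiency), and the behavioural sets define prediction bounds. The toy linear-reservoir model, the 0.6 threshold, and the unweighted percentile bounds are illustrative assumptions, not the authors' DREAM scheme or hydrologic model.

    ```python
    # Minimal sketch of GLUE-style informal uncertainty estimation: sample parameter
    # sets from the prior, score them with an informal likelihood (Nash-Sutcliffe
    # efficiency), keep "behavioural" sets, and form prediction bounds from them.
    # The toy linear-reservoir model stands in for a real rainfall-runoff model.
    import numpy as np

    rng = np.random.default_rng(9)

    rain = rng.gamma(shape=0.3, scale=10.0, size=120)        # synthetic forcing

    def simulate(k, storage0=5.0):
        q, s = np.zeros(rain.size), storage0
        for t in range(rain.size):
            s += rain[t]
            q[t] = s / k                                     # linear-reservoir outflow
            s -= q[t]
        return q

    q_obs = simulate(3.0) * (1.0 + 0.1 * rng.normal(size=rain.size))   # noisy "observations"

    def nse(sim, obs):                                       # Nash-Sutcliffe efficiency
        return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

    k_prior = rng.uniform(1.0, 10.0, size=5000)              # prior samples
    scores = np.array([nse(simulate(k), q_obs) for k in k_prior])
    behavioural = scores > 0.6                               # informal acceptance threshold
    sims = np.array([simulate(k) for k in k_prior[behavioural]])

    lower, upper = np.percentile(sims, [5, 95], axis=0)      # 90% prediction band
    print("behavioural sets:", behavioural.sum(),
          "| mean 90% band width:", np.mean(upper - lower))
    ```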

  3. On the Bayesian Treed Multivariate Gaussian Process with Linear Model of Coregionalization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Konomi, Bledar A.; Karagiannis, Georgios; Lin, Guang

    2015-02-01

    The Bayesian treed Gaussian process (BTGP) has gained popularity in recent years because it provides a straightforward mechanism for modeling non-stationary data and can alleviate computational demands by fitting models to less data. The extension of BTGP to the multivariate setting requires us to model the cross-covariance and to propose efficient algorithms that can deal with trans-dimensional MCMC moves. In this paper we extend the cross-covariance of the Bayesian treed multivariate Gaussian process (BTMGP) to that of the linear model of coregionalization (LMC) cross-covariances. Different strategies have been developed to improve the MCMC mixing and to invert smaller matrices in the Bayesian inference. Moreover, we compare the proposed BTMGP with existing multiple BTGP and BTMGP in test cases and in a multiphase flow computer experiment in a full-scale regenerator of a carbon capture unit. The use of the BTMGP with LMC cross-covariance helped to predict the computer experiments relatively better than existing competitors. The proposed model has a wide variety of applications, such as computer experiments and environmental data. In the case of computer experiments we also develop an adaptive sampling strategy for the BTMGP with LMC cross-covariance function.

  4. Monte Carlo Bayesian Inference on a Statistical Model of Sub-Gridcolumn Moisture Variability using High-Resolution Cloud Observations

    NASA Astrophysics Data System (ADS)

    Norris, P. M.; da Silva, A. M., Jr.

    2016-12-01

    Norris and da Silva recently published a method to constrain a statistical model of sub-gridcolumn moisture variability using high-resolution satellite cloud data. The method can be used for large-scale model parameter estimation or cloud data assimilation (CDA). The gridcolumn model includes assumed-PDF intra-layer horizontal variability and a copula-based inter-layer correlation model. The observables used are MODIS cloud-top pressure, brightness temperature and cloud optical thickness, but the method should be extensible to direct cloudy radiance assimilation for a small number of channels. The algorithm is a form of Bayesian inference with a Markov chain Monte Carlo (MCMC) approach to characterizing the posterior distribution. This approach is especially useful in cases where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach is not gradient-based and allows jumps into regions of non-zero cloud probability. In the example provided, the method is able to restore marine stratocumulus near the Californian coast where the background state has a clear swath. The new approach not only significantly reduces mean and standard deviation biases with respect to the assimilated observables, but also improves the simulated rotational-Raman scattering cloud optical centroid pressure against independent (non-assimilated) retrievals from the OMI instrument. One obvious difficulty for the method, and other CDA methods, is the lack of information content in passive cloud observables on cloud vertical structure, beyond cloud-top and thickness, thus necessitating strong dependence on the background vertical moisture structure. It is found that a simple flow-dependent correlation modification due to Riishojgaard is helpful, better honoring inversion structures in the background state.

  5. Cosmology with the largest galaxy cluster surveys: going beyond Fisher matrix forecasts

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Khedekar, Satej; Majumdar, Subhabrata, E-mail: satej@mpa-garching.mpg.de, E-mail: subha@tifr.res.in

    2013-02-01

    We make the first detailed MCMC likelihood study of cosmological constraints that are expected from some of the largest, ongoing and proposed, cluster surveys in different wave-bands and compare the estimates to the prevalent Fisher matrix forecasts. Mock catalogs of cluster counts expected from the surveys — eROSITA, WFXT, RCS2, DES and Planck, along with a mock dataset of follow-up mass calibrations are analyzed for this purpose. A fair agreement between MCMC and Fisher results is found only in the case of minimal models. However, for many cases, the marginalized constraints obtained from Fisher and MCMC methods can differ by factors of 30-100%. The discrepancy can be alarmingly large for a time-dependent dark energy equation of state, w(a); the Fisher methods are seen to under-estimate the constraints by as much as a factor of 4-5. Typically, Fisher estimates become more and more inappropriate as we move away from ΛCDM, to a constant-w dark energy to varying-w dark energy cosmologies. Fisher analysis also predicts incorrect parameter degeneracies. There are noticeable offsets in the likelihood contours obtained from Fisher methods, caused by an asymmetry in the posterior likelihood distribution as seen through an MCMC analysis. From the standpoint of mass-calibration uncertainties, a high value of unknown scatter about the mean mass-observable relation, and its redshift dependence, is seen to have large degeneracies with the cosmological parameters σ_8 and w(a) and can degrade the cosmological constraints considerably. We find that the addition of mass-calibrated cluster datasets can improve dark energy and σ_8 constraints by factors of 2-3 from what can be obtained from CMB+SNe+BAO alone. Finally, we show that a joint analysis of datasets of two (or more) different cluster surveys would significantly tighten cosmological constraints from using clusters only. Since details of future cluster surveys are still being planned, we emphasize that optimal survey design must be done using MCMC analysis rather than Fisher forecasting.
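
    A minimal sketch contrasting a Fisher-matrix forecast with an MCMC posterior width for a toy one-parameter Poisson "cluster counts" model; the observed count is used in place of the expectation in the Fisher calculation, and all numbers are illustrative, not survey forecasts.

    ```python
    # Minimal sketch: Fisher-matrix forecast vs. MCMC posterior width for a toy
    # one-parameter Poisson "cluster counts" model. The observed count is used in
    # place of the expectation in the Fisher term (observed-information shortcut).
    import numpy as np

    rng = np.random.default_rng(10)

    def expected_counts(a):                      # toy, strongly nonlinear count model
        return 50.0 * np.exp(2.0 * a)

    a_true = 0.1
    n_obs = rng.poisson(expected_counts(a_true))

    def loglike(a):
        lam = expected_counts(a)
        return n_obs * np.log(lam) - lam         # Poisson log-likelihood (constant dropped)

    # Fisher forecast: sigma_a ~ 1 / sqrt(-d^2 lnL / da^2) at the fiducial value
    h = 1e-4
    d2 = (loglike(a_true + h) - 2.0 * loglike(a_true) + loglike(a_true - h)) / h ** 2
    sigma_fisher = 1.0 / np.sqrt(-d2)

    # MCMC posterior under a flat prior
    a, lp, chain = a_true, loglike(a_true), []
    for _ in range(20000):
        prop = a + 0.05 * rng.normal()
        lp_prop = loglike(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            a, lp = prop, lp_prop
        chain.append(a)
    chain = np.array(chain[5000:])
    print("Fisher sigma:", round(sigma_fisher, 4), "| MCMC sigma:", round(chain.std(), 4))
    ```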

  6. Bayesian random local clocks, or one rate to rule them all

    PubMed Central

    2010-01-01

    Background Relaxed molecular clock models allow divergence time dating and "relaxed phylogenetic" inference, in which a time tree is estimated in the face of unequal rates across lineages. We present a new method for relaxing the assumption of a strict molecular clock using Markov chain Monte Carlo to implement Bayesian model averaging over random local molecular clocks. The new method approaches the problem of rate variation among lineages by proposing a series of local molecular clocks, each extending over a subregion of the full phylogeny. Each branch in a phylogeny (subtending a clade) is a possible location for a change of rate from one local clock to a new one. Thus, including both the global molecular clock and the unconstrained model, there are a total of 2^(2n-2) possible rate models available for averaging, with 1, 2, ..., 2n-2 different rate categories. Results We propose an efficient method to sample this model space while simultaneously estimating the phylogeny. The new method conveniently allows a direct test of the strict molecular clock, in which one rate rules them all, against a large array of alternative local molecular clock models. We illustrate the method's utility on three example data sets involving mammal, primate and influenza evolution. Finally, we explore methods to visualize the complex posterior distribution that results from inference under such models. Conclusions The examples suggest that large sequence datasets may only require a small number of local molecular clocks to reconcile their branch lengths with a time scale. All of the analyses described here are implemented in the open access software package BEAST 1.5.4 (http://beast-mcmc.googlecode.com/). PMID:20807414

  7. Estimating the Attack Ratio of Dengue Epidemics under Time-varying Force of Infection using Aggregated Notification Data

    NASA Astrophysics Data System (ADS)

    Coelho, Flavio Codeço; Carvalho, Luiz Max De

    2015-12-01

    Quantifying the attack ratio of disease is key to epidemiological inference and public health planning. For multi-serotype pathogens, however, different levels of serotype-specific immunity make it difficult to assess the population at risk. In this paper we propose a Bayesian method for estimation of the attack ratio of an epidemic and the initial fraction of susceptibles using aggregated incidence data. We derive the probability distribution of the effective reproductive number, Rt, and use MCMC to obtain posterior distributions of the parameters of a single-strain SIR transmission model with time-varying force of infection. Our method is showcased in a data set consisting of 18 years of dengue incidence in the city of Rio de Janeiro, Brazil. We demonstrate that it is possible to learn about the initial fraction of susceptibles and the attack ratio even in the absence of serotype specific data. On the other hand, the information provided by this approach is limited, stressing the need for detailed serological surveys to characterise the distribution of serotype-specific immunity in the population.

  8. An adaptive Gaussian process-based method for efficient Bayesian experimental design in groundwater contaminant source identification problems: ADAPTIVE GAUSSIAN PROCESS-BASED INVERSION

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Jiangjiang; Li, Weixuan; Zeng, Lingzao

    Surrogate models are commonly used in Bayesian approaches such as Markov Chain Monte Carlo (MCMC) to avoid repetitive CPU-demanding model evaluations. However, the approximation error of a surrogate may lead to biased estimation of the posterior distribution. This bias can be corrected by constructing a very accurate surrogate or by implementing MCMC in a two-stage manner. Since the two-stage MCMC requires extra original model evaluations, the computational cost is still high. If measurement information is incorporated, a locally accurate approximation of the original model can be adaptively constructed at low computational cost. Based on this idea, we propose a Gaussian process (GP) surrogate-based Bayesian experimental design and parameter estimation approach for groundwater contaminant source identification problems. A major advantage of the GP surrogate is that it provides a convenient estimate of the approximation error, which can be incorporated in the Bayesian formula to avoid an over-confident estimate of the posterior distribution. The proposed approach is tested with a numerical case study. Without sacrificing estimation accuracy, the new approach achieves about a 200-fold speed-up compared to our previous work using two-stage MCMC.
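
    A minimal sketch of the key idea above: the GP surrogate's predictive variance is added to the data-error variance in the Gaussian likelihood so the posterior is not over-confident where the surrogate is poorly trained, and the design is refined once where the chain concentrates. scikit-learn's GaussianProcessRegressor, the toy model, and the single refinement round are illustrative assumptions, not the authors' adaptive scheme.

    ```python
    # Minimal sketch: GP-surrogate MCMC where the GP predictive variance is added
    # to the data-error variance in the Gaussian likelihood, so the posterior is
    # not over-confident where the surrogate is poorly trained; one refinement
    # round adds model runs where the chain concentrates.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    rng = np.random.default_rng(11)

    def model(s):                            # stand-in for an expensive transport simulator
        return np.sin(2.0 * s) + 0.3 * s

    sigma_obs, s_true = 0.05, 1.4
    y_obs = model(s_true) + rng.normal(0.0, sigma_obs)

    def fit_gp(design):
        gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-8)
        return gp.fit(design[:, None], model(design))

    def run_chain(gp, n_iter=8000):
        def log_post(s):
            if not 0.0 <= s <= 3.0:          # uniform prior on [0, 3]
                return -np.inf
            mu, sd = gp.predict(np.array([[s]]), return_std=True)
            var = sigma_obs ** 2 + sd[0] ** 2    # surrogate error inflates the variance
            return -0.5 * (y_obs - mu[0]) ** 2 / var - 0.5 * np.log(var)
        s, lp, out = 1.5, log_post(1.5), []
        for _ in range(n_iter):
            prop = s + 0.2 * rng.normal()
            lp_prop = log_post(prop)
            if np.log(rng.uniform()) < lp_prop - lp:
                s, lp = prop, lp_prop
            out.append(s)
        return np.array(out[2000:])

    design = np.linspace(0.0, 3.0, 5)            # initial coarse design
    chain = run_chain(fit_gp(design))
    design = np.concatenate([design, np.percentile(chain, [25, 50, 75])])  # refine locally
    chain = run_chain(fit_gp(design))
    print("posterior mean/sd after refinement:", chain.mean(), chain.std())
    ```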

  9. Bayesian Analysis of Biogeography when the Number of Areas is Large

    PubMed Central

    Landis, Michael J.; Matzke, Nicholas J.; Moore, Brian R.; Huelsenbeck, John P.

    2013-01-01

    Historical biogeography is increasingly studied from an explicitly statistical perspective, using stochastic models to describe the evolution of species range as a continuous-time Markov process of dispersal between and extinction within a set of discrete geographic areas. The main constraint of these methods is the computational limit on the number of areas that can be specified. We propose a Bayesian approach for inferring biogeographic history that extends the application of biogeographic models to the analysis of more realistic problems that involve a large number of areas. Our solution is based on a “data-augmentation” approach, in which we first populate the tree with a history of biogeographic events that is consistent with the observed species ranges at the tips of the tree. We then calculate the likelihood of a given history by adopting a mechanistic interpretation of the instantaneous-rate matrix, which specifies both the exponential waiting times between biogeographic events and the relative probabilities of each biogeographic change. We develop this approach in a Bayesian framework, marginalizing over all possible biogeographic histories using Markov chain Monte Carlo (MCMC). Besides dramatically increasing the number of areas that can be accommodated in a biogeographic analysis, our method allows the parameters of a given biogeographic model to be estimated and different biogeographic models to be objectively compared. Our approach is implemented in the program, BayArea. [ancestral area analysis; Bayesian biogeographic inference; data augmentation; historical biogeography; Markov chain Monte Carlo.] PMID:23736102

  10. Ascertainment correction for Markov chain Monte Carlo segregation and linkage analysis of a quantitative trait.

    PubMed

    Ma, Jianzhong; Amos, Christopher I; Warwick Daw, E

    2007-09-01

    Although extended pedigrees are often sampled through probands with extreme levels of a quantitative trait, Markov chain Monte Carlo (MCMC) methods for segregation and linkage analysis have not been able to perform ascertainment corrections. Further, the extent to which ascertainment of pedigrees leads to biases in the estimation of segregation and linkage parameters has not been previously studied for MCMC procedures. In this paper, we studied these issues with a Bayesian MCMC approach for joint segregation and linkage analysis, as implemented in the package Loki. We first simulated pedigrees ascertained through individuals with extreme values of a quantitative trait in the spirit of the sequential sampling theory of Cannings and Thompson [Cannings and Thompson [1977] Clin. Genet. 12:208-212]. Using our simulated data, we detected no bias in estimates of the trait locus location. However, in addition to allele frequencies, bias was also found in the estimation of the highest genotypic mean when the ascertainment threshold was higher than or close to the true value of this parameter. When there were multiple trait loci, this bias destroyed the additivity of the effects of the trait loci, and caused biases in the estimation of all genotypic means when a purely additive model was used for analyzing the data. To account for pedigree ascertainment with sequential sampling, we developed a Bayesian ascertainment approach and implemented Metropolis-Hastings updates in the MCMC samplers used in Loki. Ascertainment correction greatly reduced biases in parameter estimates. Our method is designed for multiple, but a fixed number of, trait loci. Copyright (c) 2007 Wiley-Liss, Inc.

  11. Markov switching multinomial logit model: An application to accident-injury severities.

    PubMed

    Malyshkina, Nataliya V; Mannering, Fred L

    2009-07-01

    In this study, two-state Markov switching multinomial logit models are proposed for statistical modeling of accident-injury severities. These models assume Markov switching over time between two unobserved states of roadway safety as a means of accounting for potential unobserved heterogeneity. The states are distinct in the sense that in different states accident-severity outcomes are generated by separate multinomial logit processes. To demonstrate the applicability of the approach, two-state Markov switching multinomial logit models are estimated for severity outcomes of accidents occurring on Indiana roads over a four-year time period. Bayesian inference methods and Markov Chain Monte Carlo (MCMC) simulations are used for model estimation. The estimated Markov switching models result in a superior statistical fit relative to the standard (single-state) multinomial logit models for a number of roadway classes and accident types. It is found that the more frequent state of roadway safety is correlated with better weather conditions and that the less frequent state is correlated with adverse weather conditions.

  12. Stochastic inversion of time-lapse geophysical data to characterize the vadose zone at the Arrenaes field site (Denmark)

    NASA Astrophysics Data System (ADS)

    Marie, S.; Irving, J. D.; Looms, M. C.; Nielsen, L.; Holliger, K.

    2011-12-01

    Geophysical methods such as ground-penetrating radar (GPR) can provide valuable information on the hydrological properties of the vadose zone. In particular, there is evidence to suggest that the stochastic inversion of such data may allow for significant reductions in uncertainty regarding subsurface van-Genuchten-Mualem (VGM) parameters, which characterize unsaturated hydrodynamic behaviour as defined by the combination of the water retention and hydraulic conductivity functions. A significant challenge associated with the use of geophysical methods in a hydrological context is that they generally exhibit an indirect and/or weak sensitivity to the hydraulic parameters of interest. A novel and increasingly popular means of addressing this issue involves the acquisition of geophysical data in a time-lapse fashion while changes occur in the hydrological condition of the probed subsurface region. Another significant challenge when attempting to use geophysical data for the estimation of subsurface hydrological properties is the inherent non-linearity and non-uniqueness of the corresponding inverse problems. Stochastic inversion approaches have the advantage of providing a comprehensive exploration of the model space, which makes them ideally suited for addressing such issues. In this work, we present the stochastic inversion of time-lapse zero-offset-profile (ZOP) crosshole GPR traveltime data, collected during a forced infiltration experiment at the Arrenaes field site in Denmark, in order to estimate subsurface VGM parameters and their corresponding uncertainties. We do this using a Bayesian Markov-chain-Monte-Carlo (MCMC) inversion approach. We find that the Bayesian-MCMC methodology indeed allows for a substantial refinement in the inferred posterior parameter distributions of the VGM parameters as compared to the corresponding priors. To further understand the potential impact on capturing the underlying hydrological behaviour, we also explore how the posterior VGM parameter distributions affect the hydrodynamic characteristics. In doing so, we find clear evidence that the approach pursued in this study allows for effective characterization of the hydrological behaviour of the probed subsurface region.

  13. Trans-dimensional MCMC methods for fully automatic motion analysis in tagged MRI.

    PubMed

    Smal, Ihor; Carranza-Herrezuelo, Noemí; Klein, Stefan; Niessen, Wiro; Meijering, Erik

    2011-01-01

    Tagged magnetic resonance imaging (tMRI) is a well-known noninvasive method allowing quantitative analysis of regional heart dynamics. Its clinical use has so far been limited, in part due to the lack of robustness and accuracy of existing tag tracking algorithms in dealing with low (and intrinsically time-varying) image quality. In this paper, we propose a novel probabilistic method for tag tracking, implemented by means of Bayesian particle filtering and a trans-dimensional Markov chain Monte Carlo (MCMC) approach, which efficiently combines information about the imaging process and tag appearance with prior knowledge about the heart dynamics obtained by means of non-rigid image registration. Experiments using synthetic image data (with ground truth) and real data (with expert manual annotation) from preclinical (small animal) and clinical (human) studies confirm that the proposed method yields higher consistency, accuracy, and intrinsic tag reliability assessment in comparison with other frequently used tag tracking methods.

  14. Multilocus lod scores in large pedigrees: combination of exact and approximate calculations.

    PubMed

    Tong, Liping; Thompson, Elizabeth

    2008-01-01

    To detect the positions of disease loci, lod scores are calculated at multiple chromosomal positions given trait and marker data on members of pedigrees. Exact lod score calculations are often impossible when the size of the pedigree and the number of markers are both large. In this case, a Markov Chain Monte Carlo (MCMC) approach provides an approximation. However, to provide accurate results, mixing performance is always a key issue in these MCMC methods. In this paper, we propose two methods to improve MCMC sampling and hence obtain more accurate lod score estimates in shorter computation time. The first improvement generalizes the block-Gibbs meiosis (M) sampler to multiple meiosis (MM) sampler in which multiple meioses are updated jointly, across all loci. The second one divides the computations on a large pedigree into several parts by conditioning on the haplotypes of some 'key' individuals. We perform exact calculations for the descendant parts where more data are often available, and combine this information with sampling of the hidden variables in the ancestral parts. Our approaches are expected to be most useful for data on a large pedigree with a lot of missing data. (c) 2007 S. Karger AG, Basel

  15. Multilocus Lod Scores in Large Pedigrees: Combination of Exact and Approximate Calculations

    PubMed Central

    Tong, Liping; Thompson, Elizabeth

    2007-01-01

    To detect the positions of disease loci, lod scores are calculated at multiple chromosomal positions given trait and marker data on members of pedigrees. Exact lod score calculations are often impossible when the size of the pedigree and the number of markers are both large. In this case, a Markov Chain Monte Carlo (MCMC) approach provides an approximation. However, to provide accurate results, mixing performance is always a key issue in these MCMC methods. In this paper, we propose two methods to improve MCMC sampling and hence obtain more accurate lod score estimates in shorter computation time. The first improvement generalizes the block-Gibbs meiosis (M) sampler to multiple meiosis (MM) sampler in which multiple meioses are updated jointly, across all loci. The second one divides the computations on a large pedigree into several parts by conditioning on the haplotypes of some ‘key’ individuals. We perform exact calculations for the descendant parts where more data are often available, and combine this information with sampling of the hidden variables in the ancestral parts. Our approaches are expected to be most useful for data on a large pedigree with a lot of missing data. PMID:17934317

  16. Combined state and parameter identification of nonlinear structural dynamical systems based on Rao-Blackwellization and Markov chain Monte Carlo simulations

    NASA Astrophysics Data System (ADS)

    Abhinav, S.; Manohar, C. S.

    2018-03-01

    The problem of combined state and parameter estimation in nonlinear state space models, based on Bayesian filtering methods, is considered. A novel approach, which combines Rao-Blackwellized particle filters for state estimation with Markov chain Monte Carlo (MCMC) simulations for parameter identification, is proposed. In order to ensure successful performance of the MCMC samplers in situations involving a large amount of dynamic measurement data and/or low measurement noise, the study employs a modified measurement model combined with an importance sampling based correction. The parameters of the process noise covariance matrix are also included as quantities to be identified. The study employs the Rao-Blackwellization step at two stages: first, in the state estimation problem within the particle filtering step, and second, in the evaluation of the likelihood ratio in the MCMC run. The satisfactory performance of the proposed method is illustrated on three dynamical systems: (a) a computational model of a nonlinear beam-moving oscillator system, (b) a laboratory scale beam traversed by a loaded trolley, and (c) an earthquake shake table study on a bending-torsion coupled nonlinear frame subjected to uniaxial support motion.

  17. Measurements of Kepler Planet Masses and Eccentricities from Transit Timing Variations: Analytic and N-body Results

    NASA Astrophysics Data System (ADS)

    Hadden, Sam; Lithwick, Yoram

    2015-12-01

    Several Kepler planets reside in multi-planet systems where gravitational interactions result in transit timing variations (TTVs) that provide exquisitely sensitive probes of their masses and orbits. Measuring these planets' masses and orbits constrains their bulk compositions and can provide clues about their formation. However, inverting TTV measurements in order to infer planet properties can be challenging: it involves fitting a nonlinear model with a large number of parameters to noisy data, often with significant degeneracies between parameters. I present results from two complementary approaches to TTV inversion: Markov chain Monte Carlo simulations that use N-body integrations to compute transit times and a simplified analytic model for computing the TTVs of planets near mean motion resonances. The analytic model allows for straightforward interpretations of N-body results and provides an independent estimate of parameter uncertainties that can be compared to MCMC results which may be sensitive to factors such as priors. We have conducted extensive MCMC simulations along with analytic fits to model the TTVs of dozens of Kepler multi-planet systems. We find that the bulk of these sub-Jovian planets have low densities that necessitate significant gaseous envelopes. We also find that the planets' eccentricities are generally small but often definitively non-zero.

  18. An Efficient MCMC Algorithm to Sample Binary Matrices with Fixed Marginals

    ERIC Educational Resources Information Center

    Verhelst, Norman D.

    2008-01-01

    Uniform sampling of binary matrices with fixed margins is known to be a difficult problem. Two classes of algorithms to sample from a distribution not too different from the uniform are studied in the literature: importance sampling and Markov chain Monte Carlo (MCMC). Existing MCMC algorithms converge slowly, require a long burn-in period and yield…
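
    As an illustration of the kind of elementary MCMC move such samplers are built on (a generic sketch in Python, not Verhelst's algorithm), the fragment below repeatedly swaps 2 x 2 'checkerboard' submatrices; this move preserves row and column sums, so the chain stays on the set of binary matrices with the fixed marginals.

      import numpy as np

      def checkerboard_step(M, rng):
          # Pick two rows and two columns; if the 2x2 submatrix is a checkerboard
          # ([[1,0],[0,1]] or [[0,1],[1,0]]), flip it. Row/column sums are preserved.
          r = rng.choice(M.shape[0], size=2, replace=False)
          c = rng.choice(M.shape[1], size=2, replace=False)
          sub = M[np.ix_(r, c)]
          if sub[0, 0] == sub[1, 1] and sub[0, 1] == sub[1, 0] and sub[0, 0] != sub[0, 1]:
              M[np.ix_(r, c)] = 1 - sub
          return M

      rng = np.random.default_rng(0)
      M = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1]])
      row_sums, col_sums = M.sum(axis=1), M.sum(axis=0)
      for _ in range(10000):            # a long burn-in is typically needed in practice
          M = checkerboard_step(M, rng)
      assert (M.sum(axis=1) == row_sums).all() and (M.sum(axis=0) == col_sums).all()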

  19. Spectral decompositions of multiple time series: a Bayesian non-parametric approach.

    PubMed

    Macaro, Christian; Prado, Raquel

    2014-01-01

    We consider spectral decompositions of multiple time series that arise in studies where the interest lies in assessing the influence of two or more factors. We write the spectral density of each time series as a sum of the spectral densities associated with the different levels of the factors. We then use Whittle's approximation to the likelihood function and follow a Bayesian non-parametric approach to obtain posterior inference on the spectral densities based on Bernstein-Dirichlet prior distributions. The prior is strategically important as it carries identifiability conditions for the models and allows us to quantify our degree of confidence in such conditions. A Markov chain Monte Carlo (MCMC) algorithm for posterior inference within this class of frequency-domain models is presented. We illustrate the approach by analyzing simulated and real data via spectral one-way and two-way models. In particular, we present an analysis of functional magnetic resonance imaging (fMRI) brain responses measured in individuals who participated in a designed experiment to study pain perception in humans.
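
    To make the role of Whittle's approximation concrete, the following is a minimal Python sketch of a Whittle log-likelihood for a single stationary series, assuming a user-supplied model spectral density; it is not the Bernstein-Dirichlet construction of the paper.

      import numpy as np

      def whittle_log_lik(x, spec_dens):
          # Whittle approximation: sum over positive Fourier frequencies of
          # -(log f(lambda_j) + I(lambda_j) / f(lambda_j)), with I the periodogram.
          n = len(x)
          dft = np.fft.rfft(x)[1:]                        # drop the zero frequency
          lam = 2.0 * np.pi * np.arange(1, len(dft) + 1) / n
          I = np.abs(dft) ** 2 / (2.0 * np.pi * n)        # periodogram ordinates
          f = spec_dens(lam)
          return -np.sum(np.log(f) + I / f)

      x = np.random.default_rng(1).normal(size=512)        # white noise, variance 1
      flat = lambda lam: np.full_like(lam, 1.0 / (2.0 * np.pi))   # its spectral density
      print(whittle_log_lik(x, flat))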

  20. Maximal compression of the redshift-space galaxy power spectrum and bispectrum

    NASA Astrophysics Data System (ADS)

    Gualdi, Davide; Manera, Marc; Joachimi, Benjamin; Lahav, Ofer

    2018-05-01

    We explore two methods of compressing the redshift-space galaxy power spectrum and bispectrum with respect to a chosen set of cosmological parameters. Both methods involve reducing the dimension of the original data vector (e.g. 1000 elements) to the number of cosmological parameters considered (e.g. seven) using the Karhunen-Loève algorithm. In the first case, we run MCMC sampling on the compressed data vector in order to recover the 1D and 2D posterior distributions. The second option, approximately 2000 times faster, works by orthogonalizing the parameter space through diagonalization of the Fisher information matrix before the compression, obtaining the posterior distributions without the need for MCMC sampling. Using these methods for future spectroscopic redshift surveys like DESI, Euclid, and PFS would drastically reduce the number of simulations needed to compute accurate covariance matrices with minimal loss of constraining power. We consider a redshift bin of a DESI-like experiment. Using the power spectrum combined with the bispectrum as a data vector, both compression methods on average recover the 68 per cent credible regions to within 0.7 per cent and 2 per cent of those resulting from standard MCMC sampling, respectively. These confidence intervals are also smaller than the ones obtained using only the power spectrum by 81 per cent, 80 per cent, and 82 per cent respectively, for the bias parameter b1, the growth rate f, and the scalar amplitude parameter As.
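
    A minimal sketch of the kind of linear Karhunen-Loève compression described above, written in Python for a Gaussian likelihood with a fixed covariance; the array sizes and fiducial derivatives are illustrative placeholders, not the survey setup of the paper.

      import numpy as np

      rng = np.random.default_rng(2)
      N, p = 1000, 7                         # data-vector length, number of parameters
      dmu = rng.normal(size=(N, p))          # d(model)/d(parameter) at the fiducial point
      C = np.diag(rng.uniform(0.5, 1.5, N))  # data covariance (assumed fixed and known)
      d = rng.normal(size=N)                 # observed data vector

      Cinv = np.linalg.inv(C)
      F = dmu.T @ Cinv @ dmu                 # Fisher matrix; diagonalizing it orthogonalizes
                                             # the parameter space (second method in the text)
      weights = Cinv @ dmu                   # one compression vector per parameter
      y = weights.T @ d                      # compressed data: length p instead of N
      print(y.shape, np.linalg.eigvalsh(F)[:3])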

  1. A menu-driven software package of Bayesian nonparametric (and parametric) mixed models for regression analysis and density estimation.

    PubMed

    Karabatsos, George

    2017-02-01

    Most of applied statistics involves regression analysis of data. In practice, it is important to specify a regression model that has minimal assumptions which are not violated by data, to ensure that statistical inferences from the model are informative and not misleading. This paper presents a stand-alone and menu-driven software package, Bayesian Regression: Nonparametric and Parametric Models, constructed from MATLAB Compiler. Currently, this package gives the user a choice from 83 Bayesian models for data analysis. They include 47 Bayesian nonparametric (BNP) infinite-mixture regression models; 5 BNP infinite-mixture models for density estimation; and 31 normal random effects models (HLMs), including normal linear models. Each of the 78 regression models handles either a continuous, binary, or ordinal dependent variable, and can handle multi-level (grouped) data. All 83 Bayesian models can handle the analysis of weighted observations (e.g., for meta-analysis), and the analysis of left-censored, right-censored, and/or interval-censored data. Each BNP infinite-mixture model has a mixture distribution assigned one of various BNP prior distributions, including priors defined by either the Dirichlet process, Pitman-Yor process (including the normalized stable process), beta (two-parameter) process, normalized inverse-Gaussian process, geometric weights prior, dependent Dirichlet process, or the dependent infinite-probits prior. The software user can mouse-click to select a Bayesian model and perform data analysis via Markov chain Monte Carlo (MCMC) sampling. After the sampling completes, the software automatically opens text output that reports MCMC-based estimates of the model's posterior distribution and model predictive fit to the data. Additional text and/or graphical output can be generated by mouse-clicking other menu options. This includes output of MCMC convergence analyses, and estimates of the model's posterior predictive distribution, for selected functionals and values of covariates. The software is illustrated through the BNP regression analysis of real data.

  2. SaaS enabled admission control for MCMC simulation in cloud computing infrastructures

    NASA Astrophysics Data System (ADS)

    Vázquez-Poletti, J. L.; Moreno-Vozmediano, R.; Han, R.; Wang, W.; Llorente, I. M.

    2017-02-01

    Markov Chain Monte Carlo (MCMC) methods are widely used in the field of simulation and modelling of materials, producing applications that require a great amount of computational resources. Cloud computing represents a seamless source for these resources in the form of HPC. However, resource over-consumption can be an important drawback, especially if the cloud provision process is not appropriately optimized. In the present contribution we propose a two-level solution that, on the one hand, takes advantage of approximate computing for reducing the resource demand and, on the other, uses admission control policies for guaranteeing an optimal provision to running applications.

  3. Multiple-try differential evolution adaptive Metropolis for efficient solution of highly parameterized models

    NASA Astrophysics Data System (ADS)

    Eric, L.; Vrugt, J. A.

    2010-12-01

    Spatially distributed hydrologic models potentially contain hundreds of parameters that need to be derived by calibration against a historical record of input-output data. The quality of this calibration strongly determines the predictive capability of the model and thus its usefulness for science-based decision making and forecasting. Unfortunately, high-dimensional optimization problems are typically difficult to solve. Here we present our recent developments to the Differential Evolution Adaptive Metropolis (DREAM) algorithm (Vrugt et al., 2009) to enable efficient solution of high-dimensional parameter estimation problems. The algorithm samples from an archive of past states (Ter Braak and Vrugt, 2008), and uses multiple-try Metropolis sampling (Liu et al., 2000) to decrease the required burn-in time for each individual chain and increase efficiency of posterior sampling. This approach is hereafter referred to as MT-DREAM. We present results for 2 synthetic mathematical case studies, and 2 real-world examples involving from 10 to 240 parameters. Results for those cases show that our multiple-try sampler, MT-DREAM, can consistently find better solutions than other Bayesian MCMC methods. Moreover, MT-DREAM is admirably suited to be implemented and run on a parallel machine and is therefore a powerful method for posterior inference.
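
    The differential-evolution jump that DREAM-type samplers build on can be sketched in a few lines of Python (a bare-bones DE-MC illustration on a toy target; DREAM's subspace sampling, outlier handling, past-state archive, and the multiple-try step described above are omitted).

      import numpy as np

      def de_proposal(chains, i, rng, b=1e-6):
          # Jump chain i by a scaled difference of two other chains plus a small jitter.
          n_chains, d = chains.shape
          others = [j for j in range(n_chains) if j != i]
          r1, r2 = rng.choice(others, size=2, replace=False)
          gamma = 2.38 / np.sqrt(2.0 * d)                 # standard DE-MC jump scale
          return chains[i] + gamma * (chains[r1] - chains[r2]) + b * rng.normal(size=d)

      def log_post(x):                                    # toy target: standard normal
          return -0.5 * np.sum(x ** 2)

      rng = np.random.default_rng(3)
      n_chains, d = 10, 5
      chains = rng.normal(size=(n_chains, d))
      for _ in range(5000):
          for i in range(n_chains):
              prop = de_proposal(chains, i, rng)
              if np.log(rng.uniform()) < log_post(prop) - log_post(chains[i]):
                  chains[i] = prop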

  4. Nonstationary Extreme Value Analysis in a Changing Climate: A Software Package

    NASA Astrophysics Data System (ADS)

    Cheng, L.; AghaKouchak, A.; Gilleland, E.

    2013-12-01

    Numerous studies show that climatic extremes have increased substantially in the second half of the 20th century. For this reason, analysis of extremes under a nonstationary assumption has received a great deal of attention. This paper presents a software package developed for estimation of return levels, return periods, and risks of climatic extremes in a changing climate. This MATLAB software package offers tools for analysis of climate extremes under both stationary and non-stationary assumptions. The Nonstationary Extreme Value Analysis (hereafter, NEVA) provides an efficient and generalized framework for analyzing extremes using Bayesian inference. NEVA estimates the extreme value parameters using a Differential Evolution Markov Chain (DE-MC) approach, which combines the genetic algorithm Differential Evolution (DE), for global optimization over the real parameter space, with the Markov Chain Monte Carlo (MCMC) approach, and which has the advantages of simplicity, speed of calculation, and convergence over conventional MCMC. NEVA also offers the confidence interval and uncertainty bounds of estimated return levels based on the sampled parameters. NEVA integrates extreme value design concepts, data analysis tools, optimization and visualization, explicitly designed to facilitate the analysis of extremes in geosciences. The generalized input and output files of this software package make it attractive for users from across different fields. Both stationary and nonstationary components of the package are validated for a number of case studies using empirical return levels. The results show that NEVA reliably describes extremes and their return levels.

  5. RevBayes: Bayesian Phylogenetic Inference Using Graphical Models and an Interactive Model-Specification Language

    PubMed Central

    Höhna, Sebastian; Landis, Michael J.

    2016-01-01

    Programs for Bayesian inference of phylogeny currently implement a unique and fixed suite of models. Consequently, users of these software packages are simultaneously forced to use a number of programs for a given study, while also lacking the freedom to explore models that have not been implemented by the developers of those programs. We developed a new open-source software package, RevBayes, to address these problems. RevBayes is entirely based on probabilistic graphical models, a powerful generic framework for specifying and analyzing statistical models. Phylogenetic-graphical models can be specified interactively in RevBayes, piece by piece, using a new succinct and intuitive language called Rev. Rev is similar to the R language and the BUGS model-specification language, and should be easy to learn for most users. The strength of RevBayes is the simplicity with which one can design, specify, and implement new and complex models. Fortunately, this tremendous flexibility does not come at the cost of slower computation; as we demonstrate, RevBayes outperforms competing software for several standard analyses. Compared with other programs, RevBayes has fewer black-box elements. Users need to explicitly specify each part of the model and analysis. Although this explicitness may initially be unfamiliar, we are convinced that this transparency will improve understanding of phylogenetic models in our field. Moreover, it will motivate the search for improvements to existing methods by brazenly exposing the model choices that we make to critical scrutiny. RevBayes is freely available at http://www.RevBayes.com. [Bayesian inference; Graphical models; MCMC; statistical phylogenetics.] PMID:27235697

  6. RevBayes: Bayesian Phylogenetic Inference Using Graphical Models and an Interactive Model-Specification Language.

    PubMed

    Höhna, Sebastian; Landis, Michael J; Heath, Tracy A; Boussau, Bastien; Lartillot, Nicolas; Moore, Brian R; Huelsenbeck, John P; Ronquist, Fredrik

    2016-07-01

    Programs for Bayesian inference of phylogeny currently implement a unique and fixed suite of models. Consequently, users of these software packages are simultaneously forced to use a number of programs for a given study, while also lacking the freedom to explore models that have not been implemented by the developers of those programs. We developed a new open-source software package, RevBayes, to address these problems. RevBayes is entirely based on probabilistic graphical models, a powerful generic framework for specifying and analyzing statistical models. Phylogenetic-graphical models can be specified interactively in RevBayes, piece by piece, using a new succinct and intuitive language called Rev. Rev is similar to the R language and the BUGS model-specification language, and should be easy to learn for most users. The strength of RevBayes is the simplicity with which one can design, specify, and implement new and complex models. Fortunately, this tremendous flexibility does not come at the cost of slower computation; as we demonstrate, RevBayes outperforms competing software for several standard analyses. Compared with other programs, RevBayes has fewer black-box elements. Users need to explicitly specify each part of the model and analysis. Although this explicitness may initially be unfamiliar, we are convinced that this transparency will improve understanding of phylogenetic models in our field. Moreover, it will motivate the search for improvements to existing methods by brazenly exposing the model choices that we make to critical scrutiny. RevBayes is freely available at http://www.RevBayes.com [Bayesian inference; Graphical models; MCMC; statistical phylogenetics.]. © The Author(s) 2016. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.

  7. Hydrologic Process Parameterization of Electrical Resistivity Imaging of Solute Plumes Using POD McMC

    NASA Astrophysics Data System (ADS)

    Awatey, M. T.; Irving, J.; Oware, E. K.

    2016-12-01

    Markov chain Monte Carlo (McMC) inversion frameworks are becoming increasingly popular in geophysics due to their ability to recover multiple equally plausible geologic features that honor the limited noisy measurements. Standard McMC methods, however, become computationally intractable with increasing dimensionality of the problem, for example, when working with spatially distributed geophysical parameter fields. We present a McMC approach based on a sparse proper orthogonal decomposition (POD) model parameterization that implicitly incorporates the physics of the underlying process. First, we generate training images (TIs) via Monte Carlo simulations of the target process constrained to a conceptual model. We then apply POD to construct basis vectors from the TIs. A small number of basis vectors can represent most of the variability in the TIs, leading to dimensionality reduction. A projection of the starting model into the reduced basis space generates the starting POD coefficients. At each iteration, only coefficients within a specified sampling window are resimulated assuming a Gaussian prior. The sampling window grows at a specified rate as the iterations progress, starting from the coefficients corresponding to the highest ranked basis vectors and extending to those of the least informative ones. We found this gradual growth of the sampling window to be more stable than resampling all the coefficients right from the first iteration. We demonstrate the performance of the algorithm with both synthetic and lab-scale electrical resistivity imaging of saline tracer experiments, employing the same set of basis vectors for all inversions. We consider two scenarios of unimodal and bimodal plumes. The unimodal plume is consistent with the hypothesis underlying the generation of the TIs, whereas bimodality in plume morphology was not theorized. We show that uncertainty quantification using McMC can proceed in the reduced dimensionality space while accounting for the physics of the underlying process.
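
    The dimensionality-reduction step can be illustrated with a short Python sketch; random placeholder 'training images' stand in for the Monte Carlo plume realizations, and the grid size and energy threshold are arbitrary choices rather than those of the study (real TIs would compress far better than random noise).

      import numpy as np

      rng = np.random.default_rng(4)
      n_ti, n_cells = 500, 40 * 40
      TIs = rng.normal(size=(n_ti, n_cells))          # placeholder for real training images

      mean_ti = TIs.mean(axis=0)
      U, s, Vt = np.linalg.svd(TIs - mean_ti, full_matrices=False)
      energy = np.cumsum(s ** 2) / np.sum(s ** 2)
      k = int(np.searchsorted(energy, 0.95)) + 1      # keep enough bases for ~95% variability
      basis = Vt[:k]                                  # k POD basis vectors, ideally k << n_cells

      m_start = TIs[0]                                # a starting model
      coeffs = basis @ (m_start - mean_ti)            # starting POD coefficients
      m_recon = mean_ti + basis.T @ coeffs            # reduced-space reconstruction
      print(k, np.linalg.norm(m_start - m_recon) / np.linalg.norm(m_start))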

  8. On how to avoid input and structural uncertainties corrupt the inference of hydrological parameters using a Bayesian framework

    NASA Astrophysics Data System (ADS)

    Hernández, Mario R.; Francés, Félix

    2015-04-01

    One phase of the hydrological model implementation process that contributes significantly to predictive uncertainty is calibration, in which values of the unknown model parameters are tuned by optimizing an objective function. An unsuitable error model (e.g., Standard Least Squares, SLS) introduces noise into the parameter estimates, the main sources of this noise being input errors and structural deficiencies of the hydrological model. The resulting biased calibrated parameters cause the model divergence phenomenon, in which the error variance of the (spatially and temporally) forecasted flows far exceeds the error variance in the fitting period, and part or all of the physical meaning of the modeled processes is lost; in other words, the calibrated hydrological model works well, but not for the right reasons. In addition, an unsuitable error model yields an unreliable predictive uncertainty assessment. To prevent these undesirable effects, this research focuses on the Bayesian joint inference (BJI) of both the hydrological and error model parameters, using a general additive (GA) error model that allows for correlation, non-stationarity (in variance and bias) and non-normality of model residuals. The hydrological model used is TETIS, a conceptual distributed model with a particular split structure of the effective model parameters. Bayesian inference is performed with a Markov Chain Monte Carlo (MCMC) algorithm called DREAM-ZS, which quantifies the uncertainty of the hydrological and error model parameters through their joint posterior probability distribution conditioned on the observed flows. The BJI methodology is a powerful and reliable tool, but it must be used correctly: if non-stationarity in error variance and bias is modeled, the Total Laws must be taken into account. The results of this research show that BJI with a GA error model improves the robustness of the hydrological parameters (diminishing the model divergence phenomenon) and the reliability of the streamflow predictive distribution, relative to an unsuitable error model such as SLS. Finally, the most likely prediction in a validation period shows similar performance for the BJI+GA and SLS error models.

  9. Parameter Estimation of Partial Differential Equation Models.

    PubMed

    Xun, Xiaolei; Cao, Jiguo; Mallick, Bani; Carroll, Raymond J; Maity, Arnab

    2013-01-01

    Partial differential equation (PDE) models are commonly used to model complex dynamic systems in applied sciences such as biology and finance. The forms of these PDE models are usually proposed by experts based on their prior knowledge and understanding of the dynamic system. Parameters in PDE models often have interesting scientific interpretations, but their values are often unknown, and need to be estimated from the measurements of the dynamic system in the presence of measurement errors. Most PDEs used in practice have no analytic solutions, and can only be solved with numerical methods. Currently, methods for estimating PDE parameters require repeatedly solving PDEs numerically under thousands of candidate parameter values, and thus the computational load is high. In this article, we propose two methods to estimate parameters in PDE models: a parameter cascading method and a Bayesian approach. In both methods, the underlying dynamic process modeled with the PDE model is represented via basis function expansion. For the parameter cascading method, we develop two nested levels of optimization to estimate the PDE parameters. For the Bayesian method, we develop a joint model for data and the PDE, and develop a novel hierarchical model allowing us to employ Markov chain Monte Carlo (MCMC) techniques to make posterior inference. Simulation studies show that the Bayesian method and parameter cascading method are comparable, and both outperform other available methods in terms of estimation accuracy. The two methods are demonstrated by estimating parameters in a PDE model from LIDAR data.

  10. Mass change distribution inverted from space-borne gravimetric data using a Monte Carlo method

    NASA Astrophysics Data System (ADS)

    Zhou, X.; Sun, X.; Wu, Y.; Sun, W.

    2017-12-01

    Mass estimation plays a key role in using time-variable satellite gravimetric data to quantify terrestrial water storage change. GRACE (Gravity Recovery and Climate Experiment) observes only low-degree gravity field changes, which can be used to estimate the total surface density or equivalent water height (EWH) variation at a limited spatial resolution of 300 km. Several methods exist to estimate the mass variation in an arbitrary region, such as averaging kernels, forward modelling and mass concentration (mascon). The mascon method can isolate the local mass from the large-scale gravity change by solving the observation equation (objective function) that relates the unknown masses to the measurements. To avoid unreasonable local masses inverted from the smoothed gravity change map, regularization has to be used in the inversion. We herein present a Markov chain Monte Carlo (MCMC) method to objectively determine the regularization parameter for the non-negative mass inversion problem. We first apply this approach to mass inversion from synthetic data. The results show that MCMC can effectively reproduce the local mass variation while taking GRACE measurement error into consideration. We then use MCMC to estimate the groundwater change rate of the North China Plain from the GRACE gravity change rate from 2003 to 2014, under the assumption of continuous groundwater loss in this region. The inversion results show that the groundwater loss rate in the North China Plain is 7.6 ± 0.2 Gt/yr over the past 12 years, consistent with previous studies.

  11. MCMC-ODPR: primer design optimization using Markov Chain Monte Carlo sampling.

    PubMed

    Kitchen, James L; Moore, Jonathan D; Palmer, Sarah A; Allaby, Robin G

    2012-11-05

    Next generation sequencing technologies often require numerous primer designs with good target coverage, which can be financially costly. We aimed to develop a system that implements primer reuse to design degenerate primers that can be placed around SNPs, thereby finding the fewest necessary primers at the lowest cost whilst maintaining acceptable coverage and providing a cost-effective solution. We have implemented Metropolis-Hastings Markov Chain Monte Carlo for optimizing primer reuse. We call it the Markov Chain Monte Carlo Optimized Degenerate Primer Reuse (MCMC-ODPR) algorithm. After repeating the program 1020 times to assess the variance, an average of 17.14% fewer primers were found to be necessary using MCMC-ODPR for equivalent coverage, compared with not implementing primer reuse. The algorithm was able to reuse primers up to five times. We compared MCMC-ODPR with the single-sequence primer design programs Primer3 and Primer-BLAST and achieved a lower primer cost per amplicon base covered of 0.21, 0.19, and 0.18 primer nucleotides on three separate gene sequences, respectively. With multiple sequences, MCMC-ODPR achieved a lower cost per base covered of 0.19 primer nucleotides, compared with 0.25 and 0.64 for BatchPrimer3 and PAMPS, respectively. MCMC-ODPR is a useful tool for designing primers at various melting temperatures with good target coverage. By combining degeneracy with optimal primer reuse the user may increase coverage of sequences amplified by the designed primers at significantly lower costs. Our analyses showed that overall MCMC-ODPR outperformed the other primer-design programs in our study in terms of cost per covered base.

  12. MCMC-ODPR: Primer design optimization using Markov Chain Monte Carlo sampling

    PubMed Central

    2012-01-01

    Background Next generation sequencing technologies often require numerous primer designs with good target coverage, which can be financially costly. We aimed to develop a system that implements primer reuse to design degenerate primers that can be placed around SNPs, thereby finding the fewest necessary primers at the lowest cost whilst maintaining acceptable coverage and providing a cost-effective solution. We have implemented Metropolis-Hastings Markov Chain Monte Carlo for optimizing primer reuse. We call it the Markov Chain Monte Carlo Optimized Degenerate Primer Reuse (MCMC-ODPR) algorithm. Results After repeating the program 1020 times to assess the variance, an average of 17.14% fewer primers were found to be necessary using MCMC-ODPR for equivalent coverage, compared with not implementing primer reuse. The algorithm was able to reuse primers up to five times. We compared MCMC-ODPR with the single-sequence primer design programs Primer3 and Primer-BLAST and achieved a lower primer cost per amplicon base covered of 0.21, 0.19, and 0.18 primer nucleotides on three separate gene sequences, respectively. With multiple sequences, MCMC-ODPR achieved a lower cost per base covered of 0.19 primer nucleotides, compared with 0.25 and 0.64 for BatchPrimer3 and PAMPS, respectively. Conclusions MCMC-ODPR is a useful tool for designing primers at various melting temperatures with good target coverage. By combining degeneracy with optimal primer reuse the user may increase coverage of sequences amplified by the designed primers at significantly lower costs. Our analyses showed that overall MCMC-ODPR outperformed the other primer-design programs in our study in terms of cost per covered base. PMID:23126469

  13. A Computationally-Efficient Inverse Approach to Probabilistic Strain-Based Damage Diagnosis

    NASA Technical Reports Server (NTRS)

    Warner, James E.; Hochhalter, Jacob D.; Leser, William P.; Leser, Patrick E.; Newman, John A

    2016-01-01

    This work presents a computationally-efficient inverse approach to probabilistic damage diagnosis. Given strain data at a limited number of measurement locations, Bayesian inference and Markov Chain Monte Carlo (MCMC) sampling are used to estimate probability distributions of the unknown location, size, and orientation of damage. Substantial computational speedup is obtained by replacing a three-dimensional finite element (FE) model with an efficient surrogate model. The approach is experimentally validated on cracked test specimens where full field strains are determined using digital image correlation (DIC). Access to full field DIC data allows for testing of different hypothetical sensor arrangements, facilitating the study of strain-based diagnosis effectiveness as the distance between damage and measurement locations increases. The ability of the framework to effectively perform both probabilistic damage localization and characterization in cracked plates is demonstrated and the impact of measurement location on uncertainty in the predictions is shown. Furthermore, the analysis time to produce these predictions is orders of magnitude less than a baseline Bayesian approach with the FE method by utilizing surrogate modeling and effective numerical sampling approaches.
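
    A generic sketch of the surrogate-accelerated sampling idea is given below in Python; the function strain_surrogate and the sensor layout are hypothetical stand-ins for the trained surrogate and the DIC-derived measurement points, not the models used in this work.

      import numpy as np

      SENSORS = np.array([[0.2, 0.2], [0.5, 0.8], [0.8, 0.3]])   # assumed sensor layout

      def strain_surrogate(theta):
          # Hypothetical cheap surrogate mapping damage parameters (x, y, size, angle)
          # to predicted strains at the sensor locations; stands in for the FE surrogate.
          x, y, size, angle = theta
          d2 = (SENSORS[:, 0] - x) ** 2 + (SENSORS[:, 1] - y) ** 2
          return size * np.exp(-d2) * (1.0 + 0.1 * np.cos(angle))

      rng = np.random.default_rng(5)
      data = strain_surrogate([0.5, 0.5, 1.0, 0.3]) + 0.01 * rng.normal(size=len(SENSORS))

      def log_post(theta, sigma=0.01):
          if not (0.0 <= theta[0] <= 1.0 and 0.0 <= theta[1] <= 1.0 and theta[2] > 0.0):
              return -np.inf                           # uniform prior on a plausible box
          r = data - strain_surrogate(theta)
          return -0.5 * np.sum(r ** 2) / sigma ** 2

      theta = np.array([0.4, 0.4, 0.8, 0.0])
      samples = []
      for _ in range(20000):                           # random-walk Metropolis on the surrogate
          prop = theta + 0.05 * rng.normal(size=4)
          if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
              theta = prop
          samples.append(theta.copy())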

  14. Effective Online Bayesian Phylogenetics via Sequential Monte Carlo with Guided Proposals

    PubMed Central

    Fourment, Mathieu; Claywell, Brian C; Dinh, Vu; McCoy, Connor; Matsen IV, Frederick A; Darling, Aaron E

    2018-01-01

    Modern infectious disease outbreak surveillance produces continuous streams of sequence data which require phylogenetic analysis as data arrives. Current software packages for Bayesian phylogenetic inference are unable to quickly incorporate new sequences as they become available, making them less useful for dynamically unfolding evolutionary stories. This limitation can be addressed by applying a class of Bayesian statistical inference algorithms called sequential Monte Carlo (SMC) to conduct online inference, wherein new data can be continuously incorporated to update the estimate of the posterior probability distribution. In this article, we describe and evaluate several different online phylogenetic sequential Monte Carlo (OPSMC) algorithms. We show that proposing new phylogenies with a density similar to the Bayesian prior suffers from poor performance, and we develop “guided” proposals that better match the proposal density to the posterior. Furthermore, we show that the simplest guided proposals can exhibit pathological behavior in some situations, leading to poor results, and that the situation can be resolved by heating the proposal density. The results demonstrate that relative to the widely used MCMC-based algorithm implemented in MrBayes, the total time required to compute a series of phylogenetic posteriors as sequences arrive can be significantly reduced by the use of OPSMC, without incurring a significant loss in accuracy. PMID:29186587

  15. BigFoot: Bayesian alignment and phylogenetic footprinting with MCMC

    PubMed Central

    Satija, Rahul; Novák, Ádám; Miklós, István; Lyngsø, Rune; Hein, Jotun

    2009-01-01

    Background We have previously combined statistical alignment and phylogenetic footprinting to detect conserved functional elements without assuming a fixed alignment. Considering a probability-weighted distribution of alignments removes sensitivity to alignment errors, properly accommodates regions of alignment uncertainty, and increases the accuracy of functional element prediction. Our method utilized standard dynamic programming hidden Markov model algorithms to analyze up to four sequences. Results We present a novel approach, implemented in the software package BigFoot, for performing phylogenetic footprinting on greater numbers of sequences. We have developed a Markov chain Monte Carlo (MCMC) approach which samples both sequence alignments and locations of slowly evolving regions. We implement our method as an extension of the existing StatAlign software package and test it on well-annotated regions controlling the expression of the even-skipped gene in Drosophila and the α-globin gene in vertebrates. The results exhibit how adding additional sequences to the analysis has the potential to improve the accuracy of functional predictions, and demonstrate how BigFoot outperforms existing alignment-based phylogenetic footprinting techniques. Conclusion BigFoot extends a combined alignment and phylogenetic footprinting approach to analyze larger amounts of sequence data using MCMC. Our approach is robust to alignment error and uncertainty and can be applied to a variety of biological datasets. The source code and documentation are publicly available for download from http://www.stats.ox.ac.uk/~satija/BigFoot/. PMID:19715598

  16. BigFoot: Bayesian alignment and phylogenetic footprinting with MCMC.

    PubMed

    Satija, Rahul; Novák, Adám; Miklós, István; Lyngsø, Rune; Hein, Jotun

    2009-08-28

    We have previously combined statistical alignment and phylogenetic footprinting to detect conserved functional elements without assuming a fixed alignment. Considering a probability-weighted distribution of alignments removes sensitivity to alignment errors, properly accommodates regions of alignment uncertainty, and increases the accuracy of functional element prediction. Our method utilized standard dynamic programming hidden Markov model algorithms to analyze up to four sequences. We present a novel approach, implemented in the software package BigFoot, for performing phylogenetic footprinting on greater numbers of sequences. We have developed a Markov chain Monte Carlo (MCMC) approach which samples both sequence alignments and locations of slowly evolving regions. We implement our method as an extension of the existing StatAlign software package and test it on well-annotated regions controlling the expression of the even-skipped gene in Drosophila and the alpha-globin gene in vertebrates. The results exhibit how adding additional sequences to the analysis has the potential to improve the accuracy of functional predictions, and demonstrate how BigFoot outperforms existing alignment-based phylogenetic footprinting techniques. BigFoot extends a combined alignment and phylogenetic footprinting approach to analyze larger amounts of sequence data using MCMC. Our approach is robust to alignment error and uncertainty and can be applied to a variety of biological datasets. The source code and documentation are publicly available for download from http://www.stats.ox.ac.uk/~satija/BigFoot/

  17. Estimating mutation parameters, population history and genealogy simultaneously from temporally spaced sequence data.

    PubMed Central

    Drummond, Alexei J; Nicholls, Geoff K; Rodrigo, Allen G; Solomon, Wiremu

    2002-01-01

    Molecular sequences obtained at different sampling times from populations of rapidly evolving pathogens and from ancient subfossil and fossil sources are increasingly available with modern sequencing technology. Here, we present a Bayesian statistical inference approach to the joint estimation of mutation rate and population size that incorporates the uncertainty in the genealogy of such temporally spaced sequences by using Markov chain Monte Carlo (MCMC) integration. The Kingman coalescent model is used to describe the time structure of the ancestral tree. We recover information about the unknown true ancestral coalescent tree, population size, and the overall mutation rate from temporally spaced data, that is, from nucleotide sequences gathered at different times, from different individuals, in an evolving haploid population. We briefly discuss the methodological implications and show what can be inferred, in various practically relevant states of prior knowledge. We develop extensions for exponentially growing population size and joint estimation of substitution model parameters. We illustrate some of the important features of this approach on a genealogy of HIV-1 envelope (env) partial sequences. PMID:12136032

  18. Estimating mutation parameters, population history and genealogy simultaneously from temporally spaced sequence data.

    PubMed

    Drummond, Alexei J; Nicholls, Geoff K; Rodrigo, Allen G; Solomon, Wiremu

    2002-07-01

    Molecular sequences obtained at different sampling times from populations of rapidly evolving pathogens and from ancient subfossil and fossil sources are increasingly available with modern sequencing technology. Here, we present a Bayesian statistical inference approach to the joint estimation of mutation rate and population size that incorporates the uncertainty in the genealogy of such temporally spaced sequences by using Markov chain Monte Carlo (MCMC) integration. The Kingman coalescent model is used to describe the time structure of the ancestral tree. We recover information about the unknown true ancestral coalescent tree, population size, and the overall mutation rate from temporally spaced data, that is, from nucleotide sequences gathered at different times, from different individuals, in an evolving haploid population. We briefly discuss the methodological implications and show what can be inferred, in various practically relevant states of prior knowledge. We develop extensions for exponentially growing population size and joint estimation of substitution model parameters. We illustrate some of the important features of this approach on a genealogy of HIV-1 envelope (env) partial sequences.

  19. Cross-Border Sexual Transmission of the Newly Emerging HIV-1 Clade CRF51_01B

    PubMed Central

    Cheong, Hui Ting; Ng, Kim Tien; Ong, Lai Yee; Chook, Jack Bee; Chan, Kok Gan; Takebe, Yutaka; Kamarulzaman, Adeeba; Tee, Kok Keng

    2014-01-01

    A novel HIV-1 recombinant clade (CRF51_01B) was recently identified among men who have sex with men (MSM) in Singapore. As cases of sexually transmitted HIV-1 infection increase concurrently in two socioeconomically intimate countries such as Malaysia and Singapore, cross transmission of HIV-1 between said countries is highly probable. In order to investigate the timeline for the emergence of HIV-1 CRF51_01B in Singapore and its possible introduction into Malaysia, 595 HIV-positive subjects recruited in Kuala Lumpur from 2008 to 2012 were screened. Phylogenetic relationship of 485 amplified polymerase gene sequences was determined through neighbour-joining method. Next, near-full length sequences were amplified for genomic sequences inferred to be CRF51_01B and subjected to further analysis implemented through Bayesian Markov chain Monte Carlo (MCMC) sampling and maximum likelihood methods. Based on the near full length genomes, two isolates formed a phylogenetic cluster with CRF51_01B sequences of Singapore origin, sharing identical recombination structure. Spatial and temporal information from Bayesian MCMC coalescent and maximum likelihood analysis of the protease, gp120 and gp41 genes suggest that Singapore is probably the country of origin of CRF51_01B (as early as in the mid-1990s) and featured a Malaysian who acquired the infection through heterosexual contact as host for its ancestral lineages. CRF51_01B then spread rapidly among the MSM in Singapore and Malaysia. Although the importation of CRF51_01B from Singapore to Malaysia is supported by coalescence analysis, the narrow timeframe of the transmission event indicates a closely linked epidemic. Discrepancies in the estimated divergence times suggest that CRF51_01B may have arisen through multiple recombination events from more than one parental lineage. We report the cross transmission of a novel CRF51_01B lineage between countries that involved different sexual risk groups. Understanding the cross-border transmission of HIV-1 involving sexual networks is crucial for effective intervention strategies in the region. PMID:25340817

  20. Cross-border sexual transmission of the newly emerging HIV-1 clade CRF51_01B.

    PubMed

    Cheong, Hui Ting; Ng, Kim Tien; Ong, Lai Yee; Chook, Jack Bee; Chan, Kok Gan; Takebe, Yutaka; Kamarulzaman, Adeeba; Tee, Kok Keng

    2014-01-01

    A novel HIV-1 recombinant clade (CRF51_01B) was recently identified among men who have sex with men (MSM) in Singapore. As cases of sexually transmitted HIV-1 infection increase concurrently in two socioeconomically intimate countries such as Malaysia and Singapore, cross transmission of HIV-1 between said countries is highly probable. In order to investigate the timeline for the emergence of HIV-1 CRF51_01B in Singapore and its possible introduction into Malaysia, 595 HIV-positive subjects recruited in Kuala Lumpur from 2008 to 2012 were screened. Phylogenetic relationship of 485 amplified polymerase gene sequences was determined through neighbour-joining method. Next, near-full length sequences were amplified for genomic sequences inferred to be CRF51_01B and subjected to further analysis implemented through Bayesian Markov chain Monte Carlo (MCMC) sampling and maximum likelihood methods. Based on the near full length genomes, two isolates formed a phylogenetic cluster with CRF51_01B sequences of Singapore origin, sharing identical recombination structure. Spatial and temporal information from Bayesian MCMC coalescent and maximum likelihood analysis of the protease, gp120 and gp41 genes suggest that Singapore is probably the country of origin of CRF51_01B (as early as in the mid-1990s) and featured a Malaysian who acquired the infection through heterosexual contact as host for its ancestral lineages. CRF51_01B then spread rapidly among the MSM in Singapore and Malaysia. Although the importation of CRF51_01B from Singapore to Malaysia is supported by coalescence analysis, the narrow timeframe of the transmission event indicates a closely linked epidemic. Discrepancies in the estimated divergence times suggest that CRF51_01B may have arisen through multiple recombination events from more than one parental lineage. We report the cross transmission of a novel CRF51_01B lineage between countries that involved different sexual risk groups. Understanding the cross-border transmission of HIV-1 involving sexual networks is crucial for effective intervention strategies in the region.

  1. Systematic evaluation of sequential geostatistical resampling within MCMC for posterior sampling of near-surface geophysical inverse problems

    NASA Astrophysics Data System (ADS)

    Ruggeri, Paolo; Irving, James; Holliger, Klaus

    2015-08-01

    We critically examine the performance of sequential geostatistical resampling (SGR) as a model proposal mechanism for Bayesian Markov-chain-Monte-Carlo (MCMC) solutions to near-surface geophysical inverse problems. Focusing on a series of simple yet realistic synthetic crosshole georadar tomographic examples characterized by different numbers of data, levels of data error and degrees of model parameter spatial correlation, we investigate the efficiency of three different resampling strategies with regard to their ability to generate statistically independent realizations from the Bayesian posterior distribution. Quite importantly, our results show that, no matter what resampling strategy is employed, many of the examined test cases require an unreasonably high number of forward model runs to produce independent posterior samples, meaning that the SGR approach as currently implemented will not be computationally feasible for a wide range of problems. Although use of a novel gradual-deformation-based proposal method can help to alleviate these issues, it does not offer a full solution. Further, we find that the nature of the SGR proposal strongly influences MCMC performance; however, no clear rule exists as to which set of inversion parameters and/or overall proposal acceptance rate will allow for the most efficient implementation. We conclude that although the SGR methodology is highly attractive as it allows for the consideration of complex geostatistical priors as well as conditioning to hard and soft data, further developments are necessary in the context of novel or hybrid MCMC approaches for it to be considered generally suitable for near-surface geophysical inversions.

  2. Reparametrization-based estimation of genetic parameters in multi-trait animal model using Integrated Nested Laplace Approximation.

    PubMed

    Mathew, Boby; Holand, Anna Marie; Koistinen, Petri; Léon, Jens; Sillanpää, Mikko J

    2016-02-01

    A novel reparametrization-based INLA approach is presented as a fast alternative to MCMC for the Bayesian estimation of genetic parameters in multivariate animal models. Multi-trait genetic parameter estimation is a relevant topic in animal and plant breeding programs because multi-trait analysis can take into account the genetic correlation between different traits, which significantly improves the accuracy of the genetic parameter estimates. Generally, multi-trait analysis is computationally demanding and requires initial estimates of the genetic and residual correlations among the traits, which are difficult to obtain. In this study, we illustrate how to reparametrize the covariance matrices of a multivariate animal model using modified Cholesky decompositions. This reparametrization-based approach is used within the Integrated Nested Laplace Approximation (INLA) methodology to estimate the genetic parameters of multivariate animal models. Immediate benefits are: (1) the difficulty of finding good starting values for the analysis, which can be a problem for example in Restricted Maximum Likelihood (REML), is avoided; and (2) Bayesian estimation of (co)variance components using INLA is faster to execute than Markov Chain Monte Carlo (MCMC), especially when realized relationship matrices are dense. A slight drawback is that priors for the covariance matrices are assigned to elements of the Cholesky factor rather than directly to the covariance matrix elements as in MCMC. Additionally, we illustrate the concordance of the INLA results with traditional methods such as MCMC and REML, and we present results obtained from simulated data sets with replicates and from field data in rice.
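
    To illustrate the reparametrization idea itself (not the specific priors or the INLA implementation of the paper), the following Python sketch maps an unconstrained parameter vector to a valid covariance matrix through a modified Cholesky factorization Sigma = L D L', with L unit lower triangular and D diagonal with positive entries.

      import numpy as np

      def vector_to_cov(params, k):
          # First k entries: log-diagonal of D; remaining entries: below-diagonal of L.
          log_d = params[:k]
          off = params[k:]
          L = np.eye(k)
          L[np.tril_indices(k, -1)] = off
          D = np.diag(np.exp(log_d))
          return L @ D @ L.T

      k = 3
      n_free = k + k * (k - 1) // 2                 # 3 log-variances + 3 off-diagonal terms
      params = np.random.default_rng(6).normal(size=n_free)
      Sigma = vector_to_cov(params, k)
      print(np.linalg.eigvalsh(Sigma))              # all positive: Sigma is positive definite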

  3. Fitting Residual Error Structures for Growth Models in SAS PROC MCMC

    ERIC Educational Resources Information Center

    McNeish, Daniel

    2017-01-01

    In behavioral sciences broadly, estimating growth models with Bayesian methods is becoming increasingly common, especially to combat small samples common with longitudinal data. Although Mplus is becoming an increasingly common program for applied research employing Bayesian methods, the limited selection of prior distributions for the elements of…

  4. Phylodynamic Analysis Reveals CRF01_AE Dissemination between Japan and Neighboring Asian Countries and the Role of Intravenous Drug Use in Transmission

    PubMed Central

    Shiino, Teiichiro; Hattori, Junko; Yokomaku, Yoshiyuki; Iwatani, Yasumasa; Sugiura, Wataru

    2014-01-01

    Background One major circulating HIV-1 subtype in Southeast Asian countries is CRF01_AE, but little is known about its epidemiology in Japan. We conducted a molecular phylodynamic study of patients newly diagnosed with CRF01_AE from 2003 to 2010. Methods Plasma samples from patients registered in the Japanese Drug Resistance HIV-1 Surveillance Network were analyzed for protease-reverse transcriptase sequences; all sequences underwent subtyping and phylogenetic analysis using distance-matrix-based, maximum likelihood and Bayesian coalescent Markov Chain Monte Carlo (MCMC) phylogenetic inferences. Transmission clusters were identified using the interior branch test and depth-first searches for sub-tree partitions. Times of most recent common ancestor (tMRCAs) of significant clusters were estimated using Bayesian MCMC analysis. Results Among 3618 patients registered in our network, 243 were infected with CRF01_AE. The majority of individuals with CRF01_AE were Japanese, predominantly male, and reported heterosexual contact as their risk factor. We found 5 large clusters with ≥5 members and 25 small clusters consisting of pairs of individuals with highly related CRF01_AE strains. The earliest cluster showed a tMRCA of 1996, and consisted of individuals whose known risk was heterosexual contact. The other four large clusters showed later tMRCAs between 2000 and 2002, with members including intravenous drug users (IVDU) and non-Japanese, but not men who have sex with men (MSM). In contrast, small clusters included a high frequency of individuals reporting MSM risk factors. Phylogenetic analysis also showed that some individuals were infected with HIV strains spreading in East and South-eastern Asian countries. Conclusions Introduction of CRF01_AE viruses into Japan is estimated to have occurred in the 1990s. CRF01_AE spread via heterosexual behavior, then among persons connected with non-Japanese, IVDU, and MSM. Phylogenetic analysis demonstrated that some viral variants are largely restricted to Japan, while others have a broad geographic distribution. PMID:25025900

  5. SCoPE: an efficient method of Cosmological Parameter Estimation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Das, Santanu; Souradeep, Tarun, E-mail: santanud@iucaa.ernet.in, E-mail: tarun@iucaa.ernet.in

    Markov Chain Monte Carlo (MCMC) samplers are widely used for cosmological parameter estimation from CMB and other data. However, due to the intrinsically serial nature of the MCMC sampler, convergence is often very slow. Here we present a fast and independently written Monte Carlo method for cosmological parameter estimation, named the Slick Cosmological Parameter Estimator (SCoPE), that employs delayed rejection to increase the acceptance rate of a chain and pre-fetching that helps an individual chain to run on parallel CPUs. An inter-chain covariance update is also incorporated to prevent clustering of the chains, allowing faster and better mixing of the chains. We use an adaptive method to calculate and update the proposal covariance automatically as the chains progress. Our analysis shows that the acceptance probability of each step in SCoPE is more than 95% and that the chains converge faster. Using SCoPE, we carry out cosmological parameter estimation for different cosmological models using WMAP-9 and Planck results. One of the current research interests in cosmology is quantifying the nature of dark energy; we analyze the cosmological parameters for two illustrative, commonly used parameterisations of dark energy models. We also assess how well the primordial helium fraction in the universe can be constrained by the present CMB data from WMAP-9 and Planck. The results from our MCMC analysis, on the one hand, help us to better understand the workability of SCoPE and, on the other hand, provide a completely independent estimation of cosmological parameters from WMAP-9 and Planck data.
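
    A minimal sketch of the adaptive proposal-covariance idea is given below (a plain Haario-style adaptive random-walk Metropolis on a toy target in Python; SCoPE's delayed rejection, pre-fetching, and inter-chain updates are not reproduced).

      import numpy as np

      def log_post(x):                               # toy posterior: correlated 2-D Gaussian
          prec = np.array([[2.0, -1.0], [-1.0, 2.0]])
          return -0.5 * x @ prec @ x

      rng = np.random.default_rng(7)
      d, n_iter, adapt_start = 2, 20000, 500
      x = np.zeros(d)
      samples = np.empty((n_iter, d))
      cov = 0.1 * np.eye(d)                          # initial proposal covariance
      for t in range(n_iter):
          prop = rng.multivariate_normal(x, cov)
          if np.log(rng.uniform()) < log_post(prop) - log_post(x):
              x = prop
          samples[t] = x
          if t >= adapt_start and t % 100 == 0:      # periodically re-estimate the covariance
              cov = 2.38 ** 2 / d * np.cov(samples[: t + 1].T) + 1e-8 * np.eye(d)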

  6. Hybrid Gibbs Sampling and MCMC for CMB Analysis at Small Angular Scales

    NASA Technical Reports Server (NTRS)

    Jewell, Jeffrey B.; Eriksen, H. K.; Wandelt, B. D.; Gorski, K. M.; Huey, G.; O'Dwyer, I. J.; Dickinson, C.; Banday, A. J.; Lawrence, C. R.

    2008-01-01

    A) Gibbs Sampling has now been validated as an efficient, statistically exact, and practically useful method for the "low-L" regime (as demonstrated on WMAP temperature and polarization data). B) We are extending Gibbs sampling to directly propagate uncertainties in both foreground and instrument models to total uncertainty in cosmological parameters for the entire range of angular scales relevant for Planck. C) This is made possible by the inclusion of foreground model parameters in Gibbs sampling and by hybrid MCMC and Gibbs sampling for the low signal-to-noise (high-L) regime. D) Future items to be included in the Bayesian framework include: 1) integration with the hybrid likelihood (or posterior) code for cosmological parameters; 2) inclusion of other uncertainties in instrumental systematics (i.e., beam uncertainties, noise estimation, calibration errors, and others).

  7. A trans-dimensional Bayesian Markov chain Monte Carlo algorithm for model assessment using frequency-domain electromagnetic data

    USGS Publications Warehouse

    Minsley, Burke J.

    2011-01-01

    A meaningful interpretation of geophysical measurements requires an assessment of the space of models that are consistent with the data, rather than just a single, ‘best’ model which does not convey information about parameter uncertainty. For this purpose, a trans-dimensional Bayesian Markov chain Monte Carlo (MCMC) algorithm is developed for assessing frequency-domain electromagnetic (FDEM) data acquired from airborne or ground-based systems. By sampling the distribution of models that are consistent with measured data and any prior knowledge, valuable inferences can be made about parameter values such as the likely depth to an interface, the distribution of possible resistivity values as a function of depth and non-unique relationships between parameters. The trans-dimensional aspect of the algorithm allows the number of layers to be a free parameter that is controlled by the data, where models with fewer layers are inherently favoured, which provides a natural measure of parsimony and a significant degree of flexibility in parametrization. The MCMC algorithm is used with synthetic examples to illustrate how the distribution of acceptable models is affected by the choice of prior information, the system geometry and configuration and the uncertainty in the measured system elevation. An airborne FDEM data set that was acquired for the purpose of hydrogeological characterization is also studied. The results compare favorably with traditional least-squares analysis, borehole resistivity and lithology logs from the site, and also provide new information about parameter uncertainty necessary for model assessment.

  8. Inclusion of historical information in flood frequency analysis using a Bayesian MCMC technique: a case study for the power dam Orlík, Czech Republic

    NASA Astrophysics Data System (ADS)

    Gaál, Ladislav; Szolgay, Ján; Kohnová, Silvia; Hlavčová, Kamila; Viglione, Alberto

    2010-01-01

    The paper deals with at-site flood frequency estimation in the case when information on hydrological events of extraordinary magnitude from the past is also available. For the joint frequency analysis of systematic observations and historical data, the Bayesian framework is chosen, which, through adequately defined likelihood functions, allows for the incorporation of different sources of hydrological information, e.g., maximum annual flood peaks and historical events, as well as measurement errors. The distribution of the parameters of the fitted distribution function and the confidence intervals of the flood quantiles are derived by means of the Markov chain Monte Carlo (MCMC) simulation technique. The paper presents a sensitivity analysis related to the choice of the most influential parameters of the statistical model, which are the length of the historical period h and the perception threshold X0. These enter the statistical model under the assumption that, except for the events termed ‘historical’, none of the (unknown) peak discharges from the historical period h exceeded the threshold X0. Both higher values of h and lower values of X0 lead to narrower confidence intervals of the estimated flood quantiles; however, it is emphasized that one should be prudent in selecting these parameters, in order to avoid making inferences based on wrong assumptions about the unknown hydrological events of the past. The Bayesian MCMC methodology is presented on the example of the maximum discharges observed during the warm half-year at the station Vltava-Kamýk (Czech Republic) in the period 1877-2002. Although the 2002 flood peak, which is related to the vast flooding that affected a large part of Central Europe at that time, occurred in the recent past, in the analysis it is treated virtually as a ‘historical’ event in order to illustrate some crucial aspects of including information on extreme historical floods in at-site flood frequency analyses.
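
    A common way of writing the combined likelihood for a systematic record plus an h-year historical period with perception threshold X0 is sketched below in Python, using a generic GEV formulation and made-up numbers; the priors, error treatment, and the actual Vltava-Kamýk data are not reproduced here.

      import numpy as np
      from scipy.stats import genextreme

      def log_lik(theta, x_sys, x_hist, h, X0):
          # Systematic annual maxima and the k historical peaks above X0 contribute
          # densities; the remaining (h - k) unknown historical years contribute the
          # probability of staying below the perception threshold X0.
          shape, loc, scale = theta
          if scale <= 0:
              return -np.inf
          k = len(x_hist)
          ll = np.sum(genextreme.logpdf(x_sys, shape, loc=loc, scale=scale))
          ll += np.sum(genextreme.logpdf(x_hist, shape, loc=loc, scale=scale))
          ll += (h - k) * genextreme.logcdf(X0, shape, loc=loc, scale=scale)
          return ll

      x_sys = np.array([820.0, 640.0, 910.0, 1500.0, 700.0, 1100.0])   # toy systematic record
      x_hist = np.array([3200.0])                    # one extraordinary historical flood
      print(log_lik((0.1, 900.0, 300.0), x_sys, x_hist, h=150, X0=2500.0))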

  9. Tutorial: Asteroseismic Stellar Modelling with AIMS

    NASA Astrophysics Data System (ADS)

    Lund, Mikkel N.; Reese, Daniel R.

    The goal of AIMS (Asteroseismic Inference on a Massive Scale) is to estimate stellar parameters and credible intervals/error bars in a Bayesian manner from a set of asteroseismic frequency data and so-called classical constraints. To achieve reliable parameter estimates and computational efficiency, it searches through a grid of pre-computed models using an MCMC algorithm—interpolation within the grid of models is performed by first tessellating the grid using a Delaunay triangulation and then performing a linear barycentric interpolation on matching simplexes. Inputs for the modelling consist of individual frequencies from peak-bagging, which can be complemented with classical spectroscopic constraints. AIMS is mostly written in Python with a modular structure to facilitate contributions from the community. Only a few computationally intensive parts have been rewritten in Fortran in order to speed up calculations.
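
    The tessellation-and-interpolation step described above can be illustrated with standard SciPy tools. The sketch below is a minimal, hypothetical example (not the AIMS code): the toy grid parameters and "frequencies" are stand-ins for a real stellar model grid.

    ```python
    import numpy as np
    from scipy.spatial import Delaunay
    from scipy.interpolate import LinearNDInterpolator

    rng = np.random.default_rng(0)

    # Toy model grid: two parameters (e.g. mass, metallicity) -> one mode frequency.
    grid_params = rng.uniform([0.8, -0.5], [1.2, 0.5], size=(200, 2))
    grid_freq = 3000.0 * grid_params[:, 0] + 50.0 * grid_params[:, 1]

    tess = Delaunay(grid_params)                     # tessellate the grid once
    interp = LinearNDInterpolator(tess, grid_freq)   # linear barycentric interpolation

    # Inside each MCMC step, a proposed parameter vector maps to a model prediction:
    theta = np.array([1.01, 0.12])
    print(interp(theta))   # NaN if theta falls outside the convex hull of the grid
    ```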

  10. Item Response Theory Equating Using Bayesian Informative Priors.

    ERIC Educational Resources Information Center

    de la Torre, Jimmy; Patz, Richard J.

    This paper seeks to extend the application of Markov chain Monte Carlo (MCMC) methods in item response theory (IRT) to include the estimation of equating relationships along with the estimation of test item parameters. A method is proposed that incorporates estimation of the equating relationship in the item calibration phase. Item parameters from…

  11. Active Subspace Methods for Data-Intensive Inverse Problems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, Qiqi

    2017-04-27

    The project has developed theory and computational tools to exploit active subspaces to reduce the dimension in statistical calibration problems. This dimension reduction enables MCMC methods to calibrate otherwise intractable models. The same theoretical and computational tools can also reduce the measurement dimension for calibration problems that use large stores of data.
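
    As a rough illustration of the general idea (not the project's actual code), an active subspace can be estimated from sampled gradients of the log-likelihood or misfit by an eigendecomposition of their average outer product; MCMC can then operate in the reduced coordinates.

    ```python
    import numpy as np

    def active_subspace(grad_samples, k):
        """grad_samples: (N, d) gradients of the log-likelihood evaluated at N
        parameter samples; returns the k leading eigenvalues and directions."""
        C = grad_samples.T @ grad_samples / grad_samples.shape[0]   # ~ E[g g^T]
        eigvals, eigvecs = np.linalg.eigh(C)
        order = np.argsort(eigvals)[::-1]
        return eigvals[order][:k], eigvecs[:, order[:k]]

    # Calibration can then work with reduced coordinates y = W.T @ theta,
    # where W holds the k leading directions.
    ```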

  12. On the estimation of the reproduction number based on misreported epidemic data.

    PubMed

    Azmon, Amin; Faes, Christel; Hens, Niel

    2014-03-30

    Epidemic data often suffer from underreporting and delay in reporting. In this paper, we investigated the impact of delays and underreporting on estimates of the reproduction number. We used a thinned version of the epidemic renewal equation to describe the epidemic process while accounting for the underlying reporting system. Assuming a constant reporting parameter, we used different delay patterns to represent the delay structure in our model. Instead of assuming a fixed delay distribution, we estimated the delay parameters while assuming a smooth function for the reproduction number over time. In order to estimate the parameters, we used a Bayesian semiparametric approach with penalized splines, allowing both flexibility and the exact inference provided by MCMC. To assess the performance of our method, we performed several simulation studies. We conducted sensitivity analyses to investigate the impact of misspecification of the delay pattern and the impact of assuming non-constant reporting parameters on the estimates of the reproduction numbers. We showed that, whenever available, additional information about time-dependent underreporting can be taken into account. As an application of our method, we analyzed confirmed daily A(H1N1)v2009 cases made publicly available by the World Health Organization for Mexico and the USA. Copyright © 2013 John Wiley & Sons, Ltd.
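
    A minimal sketch of a thinned renewal-equation likelihood of the kind described above is given below; the constant reporting probability, the Poisson observation model, and all function names are illustrative assumptions rather than the authors' exact specification.

    ```python
    import numpy as np
    from scipy.stats import poisson

    def renewal_loglik(R, rho, cases, w):
        """R: reproduction number per day; rho: constant reporting probability;
        cases: reported daily counts; w: generation-interval pmf (w[0] unused)."""
        ll = 0.0
        for t in range(1, len(cases)):
            # Reconstruct past incidence from reports and accumulate infectiousness.
            infectiousness = sum(cases[t - s] / rho * w[s]
                                 for s in range(1, min(t + 1, len(w))))
            mu = rho * R[t] * infectiousness          # expected reported cases
            ll += poisson.logpmf(cases[t], mu + 1e-12)
        return ll
    ```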

  13. Population subdivision and molecular sequence variation: theory and analysis of Drosophila ananassae data.

    PubMed

    Vogl, Claus; Das, Aparup; Beaumont, Mark; Mohanty, Sujata; Stephan, Wolfgang

    2003-11-01

    Population subdivision complicates analysis of molecular variation. Even if neutrality is assumed, three evolutionary forces need to be considered: migration, mutation, and drift. Simplification can be achieved by assuming that the process of migration among and drift within subpopulations is occurring fast compared to mutation and drift in the entire population. This allows a two-step approach in the analysis: (i) analysis of population subdivision and (ii) analysis of molecular variation in the migrant pool. We model population subdivision using an infinite island model, where we allow the migration/drift parameter Theta to vary among populations. Thus, central and peripheral populations can be differentiated. For inference of Theta, we use a coalescence approach, implemented via a Markov chain Monte Carlo (MCMC) integration method that allows estimation of allele frequencies in the migrant pool. The second step of this approach (analysis of molecular variation in the migrant pool) uses the estimated allele frequencies in the migrant pool for the study of molecular variation. We apply this method to a Drosophila ananassae sequence data set. We find little indication of isolation by distance, but large differences in the migration parameter among populations. The population as a whole seems to be expanding. A population from Bogor (Java, Indonesia) shows the highest variation and seems closest to the species center.

  14. Including non-additive genetic effects in Bayesian methods for the prediction of genetic values based on genome-wide markers

    PubMed Central

    2011-01-01

    Background Molecular marker information is a common source for drawing inferences about the relationship between genetic and phenotypic variation. Genetic effects are often modelled as additively acting marker allele effects. The true mode of biological action can, of course, be different from this plain assumption. One possibility to better understand the genetic architecture of complex traits is to include intra-locus (dominance) and inter-locus (epistasis) interactions of alleles as well as the additive genetic effects when fitting a model to a trait. Several Bayesian MCMC approaches exist for the genome-wide estimation of genetic effects with high accuracy of genetic value prediction. Including pairwise interactions for thousands of loci would probably go beyond the scope of such a sampling algorithm, because millions of effects would then have to be estimated simultaneously, leading to months of computation time. Alternative solving strategies are required when epistasis is studied. Methods We extended a fast Bayesian method (fBayesB), which was previously proposed for a purely additive model, to include non-additive effects. The fBayesB approach was used to estimate genetic effects on the basis of simulated datasets. Different scenarios were simulated to study the loss of accuracy of prediction if epistatic effects were not simulated but modelled, and vice versa. Results If 23 QTL were simulated to cause additive and dominance effects, both fBayesB and a conventional MCMC sampler, BayesB, yielded similar results in terms of accuracy of genetic value prediction and bias of variance component estimation based on a model including additive and dominance effects. Applying fBayesB to data with epistasis, accuracy could be improved by 5% when all pairwise interactions were modelled as well. The accuracy decreased by more than 20% if genetic variation was spread over 230 QTL. In this scenario, accuracy based on modelling only additive and dominance effects was generally superior to that of the complex model including epistatic effects. Conclusions This simulation study showed that the fBayesB approach is convenient for genetic value prediction. Jointly estimating additive and non-additive effects (especially dominance) has a reasonable impact on the accuracy of prediction and the proportion of genetic variation assigned to the additive genetic source. PMID:21867519

  15. A review and comparison of Bayesian and likelihood-based inferences in beta regression and zero-or-one-inflated beta regression.

    PubMed

    Liu, Fang; Eugenio, Evercita C

    2018-04-01

    Beta regression is an increasingly popular statistical technique in medical research for modeling outcomes that assume values in (0, 1), such as proportions and patient-reported outcomes. When outcomes take values in the intervals [0,1), (0,1], or [0,1], zero-or-one-inflated beta (zoib) regression can be used. We provide a thorough review of beta regression and zoib regression in the modeling, inferential, and computational aspects via the likelihood-based and Bayesian approaches. Via simulation studies, we demonstrate the statistical and practical importance of correctly modeling the inflation at zero/one rather than ad hoc replacing such values with values close to zero/one; the latter approach can lead to biased estimates and invalid inferences. We show via simulation studies that the likelihood-based approach is in general computationally faster than the MCMC algorithms used in Bayesian inference, but runs the risk of non-convergence, large biases, and sensitivity to starting values in the optimization algorithm, especially with clustered/correlated data, data with sparse inflation at zero and one, and data that warrant regularization of the likelihood. The disadvantages of the regular likelihood-based approach make the Bayesian approach an attractive alternative in these cases. Software packages and tools for fitting beta and zoib regressions in both the likelihood-based and Bayesian frameworks are also reviewed.
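
    As a generic illustration of the likelihood-based route discussed above (not the software reviewed in the paper), a minimal beta regression fit with a logit mean link and constant precision might look like:

    ```python
    import numpy as np
    from scipy.optimize import minimize
    from scipy.special import expit, gammaln

    def neg_loglik(params, X, y):
        """Beta regression with mean mu = expit(X @ beta) and precision phi."""
        beta, log_phi = params[:-1], params[-1]
        mu, phi = expit(X @ beta), np.exp(log_phi)
        a, b = mu * phi, (1.0 - mu) * phi             # beta shape parameters
        return -np.sum(gammaln(phi) - gammaln(a) - gammaln(b)
                       + (a - 1) * np.log(y) + (b - 1) * np.log(1 - y))

    # Example call (X: design matrix, y: outcomes strictly inside (0, 1)):
    # fit = minimize(neg_loglik, x0=np.zeros(X.shape[1] + 1), args=(X, y))
    ```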

  16. Two new methods to fit models for network meta-analysis with random inconsistency effects.

    PubMed

    Law, Martin; Jackson, Dan; Turner, Rebecca; Rhodes, Kirsty; Viechtbauer, Wolfgang

    2016-07-28

    Meta-analysis is a valuable tool for combining evidence from multiple studies. Network meta-analysis is becoming more widely used as a means to compare multiple treatments in the same analysis. However, a network meta-analysis may exhibit inconsistency, whereby the treatment effect estimates do not agree across all trial designs, even after taking between-study heterogeneity into account. We propose two new estimation methods for network meta-analysis models with random inconsistency effects. The model we consider is an extension of the conventional random-effects model for meta-analysis to the network meta-analysis setting and allows for potential inconsistency using random inconsistency effects. Our first new estimation method uses a Bayesian framework with empirically based prior distributions for both the heterogeneity and the inconsistency variances. We fit the model using importance sampling and thereby avoid some of the difficulties that might be associated with using Markov chain Monte Carlo (MCMC). However, we confirm the accuracy of our importance sampling method by comparing the results to those obtained using MCMC as the gold standard. The second new estimation method we describe uses a likelihood-based approach, implemented in the metafor package, which can be used to obtain (restricted) maximum-likelihood estimates of the model parameters and profile likelihood confidence intervals of the variance components. We illustrate the application of the methods using two contrasting examples. The first uses all-cause mortality as an outcome and shows little evidence of between-study heterogeneity or inconsistency. The second uses "ear discharge" as an outcome and exhibits substantial between-study heterogeneity and inconsistency. Both new estimation methods give results similar to those obtained using MCMC. The extent of heterogeneity and inconsistency should be assessed and reported in any network meta-analysis. Our two new methods can be used to fit models for network meta-analysis with random inconsistency effects. They are easily implemented using the accompanying R code in Additional file 1. Using these estimation methods, the extent of inconsistency can be assessed and reported.
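
    The importance-sampling alternative to MCMC can be sketched generically as a self-normalised estimator; the function and argument names below are illustrative and not those of the authors' implementation.

    ```python
    import numpy as np

    def importance_posterior_mean(log_target, sample_q, log_q, n=100_000, seed=0):
        """log_target: unnormalised log-posterior; sample_q/log_q: proposal draws
        and log-density. Returns weighted posterior mean and effective sample size."""
        rng = np.random.default_rng(seed)
        theta = sample_q(n, rng)                      # (n, d) draws from the proposal
        logw = log_target(theta) - log_q(theta)       # unnormalised log-weights
        w = np.exp(logw - logw.max())
        w /= w.sum()                                  # self-normalised weights
        ess = 1.0 / np.sum(w ** 2)                    # diagnostic for weight degeneracy
        return (w[:, None] * theta).sum(axis=0), ess
    ```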

  17. A General Model for Estimating Macroevolutionary Landscapes.

    PubMed

    Boucher, Florian C; Démery, Vincent; Conti, Elena; Harmon, Luke J; Uyeda, Josef

    2018-03-01

    The evolution of quantitative characters over long timescales is often studied using stochastic diffusion models. The current toolbox available to students of macroevolution is however limited to two main models: Brownian motion and the Ornstein-Uhlenbeck process, plus some of their extensions. Here, we present a very general model for inferring the dynamics of quantitative characters evolving under both random diffusion and deterministic forces of any possible shape and strength, which can accommodate interesting evolutionary scenarios like directional trends, disruptive selection, or macroevolutionary landscapes with multiple peaks. This model is based on a general partial differential equation widely used in statistical mechanics: the Fokker-Planck equation, also known in population genetics as the Kolmogorov forward equation. We thus call the model FPK, for Fokker-Planck-Kolmogorov. We first explain how this model can be used to describe macroevolutionary landscapes over which quantitative traits evolve and, more importantly, we detail how it can be fitted to empirical data. Using simulations, we show that the model has good behavior both in terms of discrimination from alternative models and in terms of parameter inference. We provide R code to fit the model to empirical data using either maximum-likelihood or Bayesian estimation, and illustrate the use of this code with two empirical examples of body mass evolution in mammals. FPK should greatly expand the set of macroevolutionary scenarios that can be studied since it opens the way to estimating macroevolutionary landscapes of any conceivable shape. [Adaptation; bounds; diffusion; FPK model; macroevolution; maximum-likelihood estimation; MCMC methods; phylogenetic comparative data; selection.].

  18. Bayesian analysis of the flutter margin method in aeroelasticity

    DOE PAGES

    Khalil, Mohammad; Poirel, Dominique; Sarkar, Abhijit

    2016-08-27

    A Bayesian statistical framework is presented for the Zimmerman and Weissenburger flutter margin method, which considers the uncertainties in aeroelastic modal parameters. The proposed methodology overcomes the limitations of the previously developed least-squares based estimation technique, which relies on a Gaussian approximation of the flutter margin probability density function (pdf). Using the measured free-decay responses at subcritical (preflutter) airspeeds, the joint non-Gaussian posterior pdf of the modal parameters is sampled using the Metropolis–Hastings (MH) Markov chain Monte Carlo (MCMC) algorithm. The posterior MCMC samples of the modal parameters are then used to obtain the flutter margin pdfs and finally the flutter speed pdf. The usefulness of the Bayesian flutter margin method is demonstrated using synthetic data generated from a two-degree-of-freedom pitch-plunge aeroelastic model. The robustness of the statistical framework is demonstrated using different sets of measurement data. It is shown that the probabilistic (Bayesian) approach reduces the number of test points required to provide a flutter speed estimate for a given accuracy and precision.
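
    The Metropolis–Hastings machinery referred to above is standard; a minimal random-walk MH sketch (illustrative only, not the authors' implementation) is:

    ```python
    import numpy as np

    def metropolis_hastings(log_post, theta0, n_steps, step_size, seed=0):
        """Random-walk MH: log_post maps a parameter vector to its log-posterior."""
        rng = np.random.default_rng(seed)
        theta = np.asarray(theta0, dtype=float)
        lp = log_post(theta)
        chain = np.empty((n_steps, theta.size))
        for i in range(n_steps):
            proposal = theta + step_size * rng.standard_normal(theta.size)
            lp_prop = log_post(proposal)
            if np.log(rng.uniform()) < lp_prop - lp:      # accept/reject step
                theta, lp = proposal, lp_prop
            chain[i] = theta
        return chain
    ```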

  19. Smoothing spline ANOVA frailty model for recurrent event data.

    PubMed

    Du, Pang; Jiang, Yihua; Wang, Yuedong

    2011-12-01

    Gap time hazard estimation is of particular interest in recurrent event data. This article proposes a fully nonparametric approach for estimating the gap time hazard. Smoothing spline analysis of variance (ANOVA) decompositions are used to model the log gap time hazard as a joint function of gap time and covariates, and general frailty is introduced to account for between-subject heterogeneity and within-subject correlation. We estimate the nonparametric gap time hazard function and parameters in the frailty distribution using a combination of the Newton-Raphson procedure, the stochastic approximation algorithm (SAA), and the Markov chain Monte Carlo (MCMC) method. The convergence of the algorithm is guaranteed by decreasing the step size of the parameter update and/or increasing the MCMC sample size along iterations. A model selection procedure is also developed to identify negligible components in a functional ANOVA decomposition of the log gap time hazard. We evaluate the proposed methods with simulation studies and illustrate their use through the analysis of bladder tumor data. © 2011, The International Biometric Society.

  20. Probability density of spatially distributed soil moisture inferred from crosshole georadar traveltime measurements

    NASA Astrophysics Data System (ADS)

    Linde, N.; Vrugt, J. A.

    2009-04-01

    Geophysical models are increasingly used in hydrological simulations and inversions, where they are typically treated as an artificial data source with known uncorrelated "data errors". The model appraisal problem in classical deterministic linear and non-linear inversion approaches based on linearization is often addressed by calculating model resolution and model covariance matrices. These measures offer only a limited potential to assign a more appropriate "data covariance matrix" for future hydrological applications, simply because the regularization operators used to construct a stable inverse solution bear a strong imprint on such estimates and because the non-linearity of the geophysical inverse problem is not explored. We present a parallelized Markov Chain Monte Carlo (MCMC) scheme to efficiently derive the posterior spatially distributed radar slowness and water content between boreholes given first-arrival traveltimes. This method is called DiffeRential Evolution Adaptive Metropolis (DREAM_ZS) with snooker updater and sampling from past states. Our inverse scheme does not impose any smoothness on the final solution, and uses uniform prior ranges of the parameters. The posterior distribution of radar slowness is converted into spatially distributed soil moisture values using a petrophysical relationship. To benchmark the performance of DREAM_ZS, we first apply our inverse method to a synthetic two-dimensional infiltration experiment using 9421 traveltimes contaminated with Gaussian errors and 80 different model parameters, corresponding to a model discretization of 0.3 m × 0.3 m. After this, the method is applied to field data acquired in the vadose zone during snowmelt. This work demonstrates that fully non-linear stochastic inversion can be applied with few limiting assumptions to a range of common two-dimensional tomographic geophysical problems. The main advantage of DREAM_ZS is that it provides a full view of the posterior distribution of spatially distributed soil moisture, which is key to appropriately treat geophysical parameter uncertainty and infer hydrologic models.

  1. Random Partition Distribution Indexed by Pairwise Information

    PubMed Central

    Dahl, David B.; Day, Ryan; Tsai, Jerry W.

    2017-01-01

    We propose a random partition distribution indexed by pairwise similarity information such that partitions compatible with the similarities are given more probability. The use of pairwise similarities, in the form of distances, is common in some clustering algorithms (e.g., hierarchical clustering), but we show how to use this type of information to define a prior partition distribution for flexible Bayesian modeling. A defining feature of the distribution is that it allocates probability among partitions within a given number of subsets, but it does not shift probability among sets of partitions with different numbers of subsets. Our distribution places more probability on partitions that group similar items yet keeps the total probability of partitions with a given number of subsets constant. The distribution of the number of subsets (and its moments) is available in closed-form and is not a function of the similarities. Our formulation has an explicit probability mass function (with a tractable normalizing constant) so the full suite of MCMC methods may be used for posterior inference. We compare our distribution with several existing partition distributions, showing that our formulation has attractive properties. We provide three demonstrations to highlight the features and relative performance of our distribution. PMID:29276318

  2. Joint Estimation of Contamination, Error and Demography for Nuclear DNA from Ancient Humans

    PubMed Central

    Slatkin, Montgomery

    2016-01-01

    When sequencing an ancient DNA sample from a hominin fossil, DNA from present-day humans involved in excavation and extraction will be sequenced along with the endogenous material. This type of contamination is problematic for downstream analyses as it will introduce a bias towards the population of the contaminating individual(s). Quantifying the extent of contamination is a crucial step as it allows researchers to account for possible biases that may arise in downstream genetic analyses. Here, we present an MCMC algorithm to co-estimate the contamination rate, sequencing error rate and demographic parameters—including drift times and admixture rates—for an ancient nuclear genome obtained from human remains, when the putative contaminating DNA comes from present-day humans. We assume we have a large panel representing the putative contaminant population (e.g. European, East Asian or African). The method is implemented in a C++ program called ‘Demographic Inference with Contamination and Error’ (DICE). We applied it to simulations and genome data from ancient Neanderthals and modern humans. With reasonable levels of genome sequence coverage (>3X), we find we can recover accurate estimates of all these parameters, even when the contamination rate is as high as 50%. PMID:27049965

  3. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Tony Y.; Wechsler, Risa H.; Devaraj, Kiruthika

    Intensity mapping, which images a single spectral line from unresolved galaxies across cosmological volumes, is a promising technique for probing the early universe. Here we present predictions for the intensity map and power spectrum of the CO(1–0) line from galaxies at z ∼ 2.4–2.8, based on a parameterized model for the galaxy–halo connection, and demonstrate the extent to which properties of high-redshift galaxies can be directly inferred from such observations. We find that our fiducial prediction should be detectable by a realistic experiment. Motivated by significant modeling uncertainties, we demonstrate the effect on the power spectrum of varying each parameter in our model. Using simulated observations, we infer constraints on our model parameter space with an MCMC procedure, and show corresponding constraints on the L_IR–L_CO relation and the CO luminosity function. These constraints would be complementary to current high-redshift galaxy observations, which can detect the brightest galaxies but not complete samples from the faint end of the luminosity function. Furthermore, by probing these populations in aggregate, CO intensity mapping could be a valuable tool for probing molecular gas and its relation to star formation in high-redshift galaxies.

  4. Technical Note: Approximate Bayesian parameterization of a process-based tropical forest model

    NASA Astrophysics Data System (ADS)

    Hartig, F.; Dislich, C.; Wiegand, T.; Huth, A.

    2014-02-01

    Inverse parameter estimation of process-based models is a long-standing problem in many scientific disciplines. A key question for inverse parameter estimation is how to define the metric that quantifies how well model predictions fit to the data. This metric can be expressed by general cost or objective functions, but statistical inversion methods require a particular metric, the probability of observing the data given the model parameters, known as the likelihood. For technical and computational reasons, likelihoods for process-based stochastic models are usually based on general assumptions about variability in the observed data, and not on the stochasticity generated by the model. Only in recent years have new methods become available that allow the generation of likelihoods directly from stochastic simulations. Previous applications of these approximate Bayesian methods have concentrated on relatively simple models. Here, we report on the application of a simulation-based likelihood approximation for FORMIND, a parameter-rich individual-based model of tropical forest dynamics. We show that approximate Bayesian inference, based on a parametric likelihood approximation placed in a conventional Markov chain Monte Carlo (MCMC) sampler, performs well in retrieving known parameter values from virtual inventory data generated by the forest model. We analyze the results of the parameter estimation, examine its sensitivity to the choice and aggregation of model outputs and observed data (summary statistics), and demonstrate the application of this method by fitting the FORMIND model to field data from an Ecuadorian tropical forest. Finally, we discuss how this approach differs from approximate Bayesian computation (ABC), another method commonly used to generate simulation-based likelihood approximations. Our results demonstrate that simulation-based inference, which offers considerable conceptual advantages over more traditional methods for inverse parameter estimation, can be successfully applied to process-based models of high complexity. The methodology is particularly suitable for heterogeneous and complex data structures and can easily be adjusted to other model types, including most stochastic population and individual-based models. Our study therefore provides a blueprint for a fairly general approach to parameter estimation of stochastic process-based models.
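
    The parametric likelihood approximation can be sketched generically as a "synthetic likelihood": repeat the stochastic simulation at a given parameter set, fit a Gaussian to the simulated summary statistics, and evaluate the observed summaries under that Gaussian. The sketch below illustrates this general idea and is not the authors' FORMIND code.

    ```python
    import numpy as np
    from scipy.stats import multivariate_normal

    def synthetic_loglik(theta, simulate, observed_summaries, n_reps=50, seed=0):
        """simulate(theta, rng) returns a vector of summary statistics from one
        stochastic model run; a Gaussian fitted to n_reps runs approximates the
        likelihood of the observed summaries."""
        rng = np.random.default_rng(seed)
        sims = np.array([simulate(theta, rng) for _ in range(n_reps)])
        mu = sims.mean(axis=0)
        cov = np.cov(sims, rowvar=False) + 1e-8 * np.eye(sims.shape[1])
        return multivariate_normal.logpdf(observed_summaries, mu, cov)

    # This approximate log-likelihood can be plugged into a standard MCMC sampler
    # in place of an analytical likelihood.
    ```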

  5. A High-Resolution Aerosol Retrieval Method for Urban Areas Using MISR Data

    NASA Astrophysics Data System (ADS)

    Moon, T.; Wang, Y.; Liu, Y.; Yu, B.

    2012-12-01

    Satellite-retrieved Aerosol Optical Depth (AOD) can provide a cost-effective way to monitor particulate air pollution without using expensive ground measurement sensors. One of the current state-of-the-art AOD retrieval methods is NASA's Multi-angle Imaging SpectroRadiometer (MISR) operational algorithm, which has a spatial resolution of 17.6 km x 17.6 km. While the MISR baseline scheme already leads to exciting research opportunities to study particle compositions at regional scale, its spatial resolution is too coarse for analyzing urban areas where the AOD level has stronger spatial variations. We develop a novel high-resolution AOD retrieval algorithm that still uses MISR's radiance observations but has a resolution of 4.4 km x 4.4 km. We achieve the high-resolution AOD retrieval by implementing a hierarchical Bayesian model and a Markov chain Monte Carlo (MCMC) inference method. Our algorithm not only improves the spatial resolution, but also extends the coverage of AOD retrieval and provides additional information on the composition of aerosol components that contribute to the AOD. We validate our method using the recent NASA DISCOVER-AQ mission data, which contain ground-measured AOD values for the Washington DC and Baltimore area. The validation result shows that, compared to the operational MISR retrievals, our scheme has 41.1% more AOD retrieval coverage for the DISCOVER-AQ data points and a 24.2% improvement in mean-squared error (MSE) with respect to the AERONET ground measurements.

  6. Neural Network Emulation of Reionization Simulations

    NASA Astrophysics Data System (ADS)

    Schmit, Claude J.; Pritchard, Jonathan R.

    2018-05-01

    Next-generation radio experiments such as LOFAR, HERA and SKA are expected to probe the Epoch of Reionization and claim a first direct detection of the cosmic 21 cm signal within the next decade. One of the major challenges for these experiments will be dealing with enormous incoming data volumes. Machine learning is key to increasing our data analysis efficiency. We consider the use of an artificial neural network to emulate 21cmFAST simulations and use it in a Bayesian parameter inference study. We then compare the network predictions to a direct evaluation of the EoR simulations and analyse the dependence of the results on the training set size. We find that a training set of 100 samples can recover the error contours of a full-scale MCMC analysis which evaluates the model at each step.
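
    A minimal sketch of the emulation idea (not the paper's network) is shown below; the training inputs and outputs are random stand-ins for pre-computed 21cmFAST parameters and power spectra.

    ```python
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(1)

    # Stand-in training set: 100 parameter vectors and fake "power spectra";
    # in practice these would come from pre-computed simulation runs.
    train_params = rng.uniform(size=(100, 3))
    train_spectra = np.sin(train_params @ rng.normal(size=(3, 20)))

    emulator = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000)
    emulator.fit(train_params, train_spectra)

    def log_likelihood(theta, data, noise_var):
        """Cheap surrogate evaluation used inside each MCMC step."""
        model = emulator.predict(np.atleast_2d(theta))[0]
        return -0.5 * np.sum((data - model) ** 2 / noise_var)
    ```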

  7. Two-dimensional probabilistic inversion of plane-wave electromagnetic data: methodology, model constraints and joint inversion with electrical resistivity data

    NASA Astrophysics Data System (ADS)

    Rosas-Carbajal, Marina; Linde, Niklas; Kalscheuer, Thomas; Vrugt, Jasper A.

    2014-03-01

    Probabilistic inversion methods based on Markov chain Monte Carlo (MCMC) simulation are well suited to quantify parameter and model uncertainty of nonlinear inverse problems. Yet, application of such methods to CPU-intensive forward models can be a daunting task, particularly if the parameter space is high dimensional. Here, we present a 2-D pixel-based MCMC inversion of plane-wave electromagnetic (EM) data. Using synthetic data, we investigate how model parameter uncertainty depends on model structure constraints using different norms of the likelihood function and the model constraints, and study the added benefits of joint inversion of EM and electrical resistivity tomography (ERT) data. Our results demonstrate that model structure constraints are necessary to stabilize the MCMC inversion results of a highly discretized model. These constraints decrease model parameter uncertainty and facilitate model interpretation. A drawback is that these constraints may lead to posterior distributions that do not fully include the true underlying model, because some of its features exhibit a low sensitivity to the EM data, and hence are difficult to resolve. This problem can be partly mitigated if the plane-wave EM data is augmented with ERT observations. The hierarchical Bayesian inverse formulation introduced and used herein is able to successfully recover the probabilistic properties of the measurement data errors and a model regularization weight. Application of the proposed inversion methodology to field data from an aquifer demonstrates that the posterior mean model realization is very similar to that derived from a deterministic inversion with similar model constraints.

  8. Joint analysis of input and parametric uncertainties in watershed water quality modeling: A formal Bayesian approach

    NASA Astrophysics Data System (ADS)

    Han, Feng; Zheng, Yi

    2018-06-01

    Significant input uncertainty is a major source of error in watershed water quality (WWQ) modeling. It remains challenging to address the input uncertainty in a rigorous Bayesian framework. This study develops the Bayesian Analysis of Input and Parametric Uncertainties (BAIPU), an approach for the joint analysis of input and parametric uncertainties through a tight coupling of Markov Chain Monte Carlo (MCMC) analysis and Bayesian Model Averaging (BMA). The formal likelihood function for this approach is derived considering a lag-1 autocorrelated, heteroscedastic, and Skew Exponential Power (SEP) distributed error model. A series of numerical experiments were performed based on a synthetic nitrate pollution case and on a real study case in the Newport Bay Watershed, California. The Soil and Water Assessment Tool (SWAT) and Differential Evolution Adaptive Metropolis (DREAM(ZS)) were used as the representative WWQ model and MCMC algorithm, respectively. The major findings include the following: (1) the BAIPU can be implemented and used to appropriately identify the uncertain parameters and characterize the predictive uncertainty; (2) the compensation effect between the input and parametric uncertainties can seriously mislead modeling-based management decisions if the input uncertainty is not explicitly accounted for; (3) the BAIPU accounts for the interaction between the input and parametric uncertainties and therefore provides more accurate calibration and uncertainty results than a sequential analysis of the uncertainties; and (4) the BAIPU quantifies the credibility of different input assumptions on a statistical basis and can be implemented as an effective inverse modeling approach for the joint inference of parameters and inputs.
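
    To make the error model concrete, a simplified residual log-likelihood with lag-1 autocorrelation and heteroscedastic spread is sketched below; for brevity it uses a Gaussian density in place of the skew exponential power (SEP) distribution actually employed in the BAIPU likelihood, and the parameter names are illustrative.

    ```python
    import numpy as np

    def residual_loglik(obs, sim, phi, sigma_a, sigma_b):
        """phi: lag-1 autocorrelation of residuals; the residual standard deviation
        is modelled as sigma_a + sigma_b * simulated value (heteroscedasticity)."""
        raw = np.asarray(obs) - np.asarray(sim)
        innov = raw[1:] - phi * raw[:-1]               # decorrelated innovations
        sigma = sigma_a + sigma_b * np.asarray(sim)[1:]
        return -0.5 * np.sum(np.log(2 * np.pi * sigma ** 2) + (innov / sigma) ** 2)
    ```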

  9. A trans-dimensional Bayesian Markov chain Monte Carlo algorithm for model assessment using frequency-domain electromagnetic data

    USGS Publications Warehouse

    Minsley, B.J.

    2011-01-01

    A meaningful interpretation of geophysical measurements requires an assessment of the space of models that are consistent with the data, rather than just a single, 'best' model which does not convey information about parameter uncertainty. For this purpose, a trans-dimensional Bayesian Markov chain Monte Carlo (MCMC) algorithm is developed for assessing frequency-domain electromagnetic (FDEM) data acquired from airborne or ground-based systems. By sampling the distribution of models that are consistent with measured data and any prior knowledge, valuable inferences can be made about parameter values such as the likely depth to an interface, the distribution of possible resistivity values as a function of depth and non-unique relationships between parameters. The trans-dimensional aspect of the algorithm allows the number of layers to be a free parameter that is controlled by the data, where models with fewer layers are inherently favoured, which provides a natural measure of parsimony and a significant degree of flexibility in parametrization. The MCMC algorithm is used with synthetic examples to illustrate how the distribution of acceptable models is affected by the choice of prior information, the system geometry and configuration and the uncertainty in the measured system elevation. An airborne FDEM data set that was acquired for the purpose of hydrogeological characterization is also studied. The results compare favourably with traditional least-squares analysis, borehole resistivity and lithology logs from the site, and also provide new information about parameter uncertainty necessary for model assessment. © 2011 Geophysical Journal International © 2011 RAS.

  10. Bayesian Regression of Thermodynamic Models of Redox Active Materials

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Johnston, Katherine

    Finding a suitable functional redox material is a critical challenge to achieving scalable, economically viable technologies for storing concentrated solar energy in the form of a defected oxide. Demonstrating effectiveness for thermal storage or solar fuel is largely accomplished by using a thermodynamic model derived from experimental data. The purpose of this project is to test the accuracy of our regression model on representative data sets. Determining the accuracy of the model includes fitting the model parameters to the data, comparing the model using different numbers of parameters, and analyzing the entropy and enthalpy calculated from the model. Three data sets were considered in this project: two demonstrating materials for solar fuels by water splitting and the other a material for thermal storage. Using Bayesian inference and Markov Chain Monte Carlo (MCMC), parameter estimation was performed on the three data sets. Good results were achieved, except for some deviations at the edges of the data input ranges. The evidence values were then calculated in a variety of ways and used to compare models with different numbers of parameters. It was believed that at least one of the parameters was unnecessary, and comparing evidence values demonstrated that the parameter was needed for one data set and not significantly helpful for another. The entropy was calculated by taking the derivative in one variable and integrating over another, and its uncertainty was also calculated by evaluating the entropy over multiple MCMC samples. Afterwards, all the parts were written up as a tutorial for the Uncertainty Quantification Toolkit (UQTk).

  11. Reciprocal Sliding Friction Model for an Electro-Deposited Coating and Its Parameter Estimation Using Markov Chain Monte Carlo Method

    PubMed Central

    Kim, Kyungmok; Lee, Jaewook

    2016-01-01

    This paper describes a sliding friction model for an electro-deposited coating. Reciprocating sliding tests using a ball-on-flat-plate test apparatus are performed to determine the evolution of the kinetic friction coefficient. The evolution of the friction coefficient is classified into the initial running-in period, steady-state sliding, and transition to higher friction. The friction coefficient during the initial running-in period and steady-state sliding is expressed as a simple linear function. The friction coefficient in the transition to higher friction is described with a mathematical model derived from a Kachanov-type damage law. The model parameters are then estimated using the Markov Chain Monte Carlo (MCMC) approach. The friction coefficients estimated with the MCMC approach are found to be in good agreement with the measured ones. PMID:28773359

  12. Linking big models to big data: efficient ecosystem model calibration through Bayesian model emulation

    NASA Astrophysics Data System (ADS)

    Fer, I.; Kelly, R.; Andrews, T.; Dietze, M.; Richardson, A. D.

    2016-12-01

    Our ability to forecast ecosystems is limited by how well we parameterize ecosystem models. Direct measurements for all model parameters are not always possible, and inverse estimation of these parameters through Bayesian methods is computationally costly. A solution to the computational challenges of Bayesian calibration is to approximate the posterior probability surface using a Gaussian process that emulates the complex process-based model. Here we report the integration of this method within an ecoinformatics toolbox, the Predictive Ecosystem Analyzer (PEcAn), and its application with two ecosystem models: SIPNET and ED2.1. SIPNET is a simple model, allowing application of MCMC methods both to the model itself and to its emulator. We used both approaches to assimilate flux (CO2 and latent heat), soil respiration, and soil carbon data from Bartlett Experimental Forest. This comparison showed that the emulator is reliable in terms of convergence to the posterior distribution. A 10000-iteration MCMC analysis with SIPNET itself required more than two orders of magnitude greater computation time than an MCMC run of the same length with its emulator. This difference would be greater for a more computationally demanding model. Validation of the emulator-calibrated SIPNET against both the assimilated data and out-of-sample data showed improved fit and reduced uncertainty around model predictions. We next applied the validated emulator method to ED2, whose complexity precludes standard Bayesian data assimilation. We used the ED2 emulator to assimilate demographic data from a network of inventory plots. For validation of the calibrated ED2, we compared the model to results from Empirical Succession Mapping (ESM), a novel synthesis of successional patterns in Forest Inventory and Analysis data. Our results revealed that while the pre-assimilation ED2 formulation cannot capture the emergent demographic patterns from the ESM analysis, constraining the model parameters that control demographic processes increased the agreement considerably.
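
    The emulator idea can be sketched with a generic Gaussian-process surrogate of the log-likelihood surface; this is an illustration under simplified assumptions, not the PEcAn implementation.

    ```python
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, ConstantKernel

    def build_emulator(design_points, loglik_values):
        """design_points: (n, d) parameter sets at which the expensive model was
        run; loglik_values: the corresponding log-likelihood values."""
        kernel = ConstantKernel() * RBF(length_scale=np.ones(design_points.shape[1]))
        gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
        gp.fit(design_points, loglik_values)
        return lambda theta: gp.predict(np.atleast_2d(theta))[0]

    # emulator = build_emulator(X, y)
    # log_post = lambda th: emulator(th) + log_prior(th)   # cheap target for MCMC
    ```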

  13. A Bayesian approach to modeling 2D gravity data using polygon states

    NASA Astrophysics Data System (ADS)

    Titus, W. J.; Titus, S.; Davis, J. R.

    2015-12-01

    We present a Bayesian Markov chain Monte Carlo (MCMC) method for the 2D gravity inversion of a localized subsurface object with constant density contrast. Our models have four parameters: the density contrast, the number of vertices in a polygonal approximation of the object, an upper bound on the ratio of the perimeter squared to the area, and the vertices of a polygon container that bounds the object. Reasonable parameter values can be estimated prior to inversion using a forward model and geologic information. In addition, we assume that the field data have a common random uncertainty that lies between two bounds but no systematic uncertainty. Finally, we assume that there is no uncertainty in the spatial locations of the measurement stations. For any set of model parameters, we use MCMC methods to generate an approximate probability distribution of polygons for the object. We then compute various probability distributions for the object, including the variance between the observed and predicted fields (an important quantity in the MCMC method), the area, the center of area, and the occupancy probability (the probability that a spatial point lies within the object). In addition, we compare probabilities of different models using parallel tempering, a technique which also mitigates trapping in local optima that can occur in certain model geometries. We apply our method to several synthetic data sets generated from objects of varying shape and location. We also analyze a natural data set collected across the Rio Grande Gorge Bridge in New Mexico, where the object (i.e. the air below the bridge) is known and the canyon is approximately 2D. Although there are many ways to view the results, the occupancy probability proves quite powerful. We also find that the choice of the container is important. In particular, large containers should be avoided, because the more closely a container confines the object, the better the predictions match properties of the object.
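
    The parallel-tempering step mentioned above can be illustrated with a minimal swap move between adjacent temperature levels (a generic sketch, not the authors' code):

    ```python
    import numpy as np

    def swap_step(states, logps, betas, rng):
        """states/logps: current state and log-posterior of each tempered chain;
        betas: inverse temperatures, with betas[0] = 1 for the target chain."""
        i = rng.integers(len(states) - 1)             # pick an adjacent pair
        log_alpha = (betas[i] - betas[i + 1]) * (logps[i + 1] - logps[i])
        if np.log(rng.uniform()) < log_alpha:         # Metropolis swap acceptance
            states[i], states[i + 1] = states[i + 1], states[i]
            logps[i], logps[i + 1] = logps[i + 1], logps[i]
        return states, logps
    ```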

  14. Technical Note: Approximate Bayesian parameterization of a complex tropical forest model

    NASA Astrophysics Data System (ADS)

    Hartig, F.; Dislich, C.; Wiegand, T.; Huth, A.

    2013-08-01

    Inverse parameter estimation of process-based models is a long-standing problem in ecology and evolution. A key problem of inverse parameter estimation is to define a metric that quantifies how well model predictions fit to the data. Such a metric can be expressed by general cost or objective functions, but statistical inversion approaches are based on a particular metric, the probability of observing the data given the model, known as the likelihood. Deriving likelihoods for dynamic models requires making assumptions about the probability for observations to deviate from mean model predictions. For technical reasons, these assumptions are usually derived without explicit consideration of the processes in the simulation. Only in recent years have new methods become available that allow generating likelihoods directly from stochastic simulations. Previous applications of these approximate Bayesian methods have concentrated on relatively simple models. Here, we report on the application of a simulation-based likelihood approximation for FORMIND, a parameter-rich individual-based model of tropical forest dynamics. We show that approximate Bayesian inference, based on a parametric likelihood approximation placed in a conventional MCMC, performs well in retrieving known parameter values from virtual field data generated by the forest model. We analyze the results of the parameter estimation, examine the sensitivity towards the choice and aggregation of model outputs and observed data (summary statistics), and show results from using this method to fit the FORMIND model to field data from an Ecuadorian tropical forest. Finally, we discuss differences of this approach to Approximate Bayesian Computing (ABC), another commonly used method to generate simulation-based likelihood approximations. Our results demonstrate that simulation-based inference, which offers considerable conceptual advantages over more traditional methods for inverse parameter estimation, can successfully be applied to process-based models of high complexity. The methodology is particularly suited to heterogeneous and complex data structures and can easily be adjusted to other model types, including most stochastic population and individual-based models. Our study therefore provides a blueprint for a fairly general approach to parameter estimation of stochastic process-based models in ecology and evolution.

  15. Markov Chain Monte Carlo Inversion of Mantle Temperature and Composition, with Application to Iceland

    NASA Astrophysics Data System (ADS)

    Brown, Eric; Petersen, Kenni; Lesher, Charles

    2017-04-01

    Basalts are formed by adiabatic decompression melting of the asthenosphere, and thus provide records of the thermal, chemical and dynamical state of the upper mantle. However, uniquely constraining the importance of these factors through the lens of melting is challenging given the inevitability that primary basalts are the product of variable mixing of melts derived from distinct lithologies having different melting behaviors (e.g. peridotite vs. pyroxenite). Forward mantle melting models, such as REEBOX PRO [1], are useful tools in this regard, because they can account for differences in melting behavior and melt pooling processes, and provide estimates of bulk crust composition and volume that can be compared with geochemical and geophysical constraints, respectively. Nevertheless, these models require critical assumptions regarding mantle temperature and lithologic abundance(s)/composition(s), all of which are poorly constrained. To provide better constraints on these parameters and their uncertainties, we have coupled a Markov Chain Monte Carlo (MCMC) sampling technique with the REEBOX PRO melting model. The MCMC method systematically samples distributions of key REEBOX PRO input parameters (mantle potential temperature, and initial abundances and compositions of the source lithologies) based on a likelihood function that describes the 'fit' of the model outputs (bulk crust composition and volume and end-member peridotite and pyroxenite melts) relative to geochemical and geophysical constraints and their associated uncertainties. As a case study, we have tested and applied the model to magmatism along the Reykjanes Peninsula in Iceland, where pyroxenite has been inferred to be present in the mantle source. This locale is ideal because there exist sufficient geochemical and geophysical data to estimate bulk crust compositions and volumes, as well as the range of near-parental melts derived from the mantle. We find that for the case of passive upwelling, the models that best fit the geochemical and geophysical observables require elevated mantle potential temperatures (approximately 120 °C above ambient mantle) and approximately 5% pyroxenite. The modeled peridotite source has a trace element composition similar to depleted MORB mantle, whereas the trace element composition of the pyroxenite is similar to enriched mid-ocean ridge basalt. These results highlight the promise of this method for efficiently exploring the range of mantle temperatures, lithologic abundances, and mantle source compositions that are most consistent with available observational constraints in individual volcanic systems. [1] Brown and Lesher (2016), G-cubed, 17, 3929-3968.

  16. Rotational evolution of slow-rotator sequence stars

    NASA Astrophysics Data System (ADS)

    Lanzafame, A. C.; Spada, F.

    2015-12-01

    Context. The observed relationship between mass, age and rotation in open clusters shows the progressive development of a slow-rotator sequence among stars possessing a radiative interior and a convective envelope during their pre-main sequence and main-sequence evolution. After 0.6 Gyr, most cluster members of this type have settled on this sequence. Aims: The observed clustering on this sequence suggests that it corresponds to some equilibrium or asymptotic condition that still lacks a complete theoretical interpretation, and which is crucial to our understanding of the stellar angular momentum evolution. Methods: We couple a rotational evolution model, which takes internal differential rotation into account, with classical and new proposals for the wind braking law, and fit models to the data using a Markov chain Monte Carlo (MCMC) method tailored to the problem at hand. We explore to what extent these models are able to reproduce the mass and time dependence of the stellar rotational evolution on the slow-rotator sequence. Results: The description of the evolution of the slow-rotator sequence requires taking the transfer of angular momentum from the radiative core to the convective envelope into account. We find that, in the mass range 0.85-1.10 M⊙, the core-envelope coupling timescale for stars in the slow-rotator sequence scales as M^-7.28. Quasi-solid body rotation is achieved only after 1-2 Gyr, depending on stellar mass, which implies that observing small deviations from the Skumanich law (P ∝ √t) would require period data of older open clusters than is available to date. The observed evolution in the 0.1-2.5 Gyr age range and in the 0.85-1.10 M⊙ mass range is best reproduced by assuming an empirical mass dependence of the wind angular momentum loss proportional to the convective turnover timescale and to the stellar moment of inertia. Period isochrones based on our MCMC fit provide a tool for inferring stellar ages of solar-like main-sequence stars from their mass and rotation period that is largely independent of the wind braking model adopted. These effectively represent gyro-chronology relationships that take the physics of the two-zone model for the stellar angular momentum evolution into account.

  17. Bayesian analysis of biogeography when the number of areas is large.

    PubMed

    Landis, Michael J; Matzke, Nicholas J; Moore, Brian R; Huelsenbeck, John P

    2013-11-01

    Historical biogeography is increasingly studied from an explicitly statistical perspective, using stochastic models to describe the evolution of species range as a continuous-time Markov process of dispersal between and extinction within a set of discrete geographic areas. The main constraint of these methods is the computational limit on the number of areas that can be specified. We propose a Bayesian approach for inferring biogeographic history that extends the application of biogeographic models to the analysis of more realistic problems that involve a large number of areas. Our solution is based on a "data-augmentation" approach, in which we first populate the tree with a history of biogeographic events that is consistent with the observed species ranges at the tips of the tree. We then calculate the likelihood of a given history by adopting a mechanistic interpretation of the instantaneous-rate matrix, which specifies both the exponential waiting times between biogeographic events and the relative probabilities of each biogeographic change. We develop this approach in a Bayesian framework, marginalizing over all possible biogeographic histories using Markov chain Monte Carlo (MCMC). Besides dramatically increasing the number of areas that can be accommodated in a biogeographic analysis, our method allows the parameters of a given biogeographic model to be estimated and different biogeographic models to be objectively compared. Our approach is implemented in the program, BayArea.

  18. Time-domain induced polarization - an analysis of Cole-Cole parameter resolution and correlation using Markov Chain Monte Carlo inversion

    NASA Astrophysics Data System (ADS)

    Madsen, Line Meldgaard; Fiandaca, Gianluca; Auken, Esben; Christiansen, Anders Vest

    2017-12-01

    The application of time-domain induced polarization (TDIP) is increasing with advances in acquisition techniques, data processing and spectral inversion schemes. An inversion of TDIP data for the spectral Cole-Cole parameters is a non-linear problem, but by applying a 1-D Markov Chain Monte Carlo (MCMC) inversion algorithm, a full non-linear uncertainty analysis of the parameters and the parameter correlations can be accessed. This is essential to understand to what degree the spectral Cole-Cole parameters can be resolved from TDIP data. MCMC inversions of synthetic TDIP data, which show bell-shaped probability distributions with a single maximum, show that the Cole-Cole parameters can be resolved from TDIP data if an acquisition range above two decades in time is applied. Linear correlations between the Cole-Cole parameters are observed, and by decreasing the acquisition ranges, the correlations increase and become non-linear. It is further investigated how waveform and parameter values influence the resolution of the Cole-Cole parameters. A limiting factor is the value of the frequency exponent, C. As C decreases, the resolution of all the Cole-Cole parameters decreases and the results become increasingly non-linear. While the values of the time constant, τ, must be in the acquisition range to resolve the parameters well, the choice between a 50 per cent and a 100 per cent duty cycle for the current injection does not have an influence on the parameter resolution. The limits of resolution and linearity are also studied in a comparison between the MCMC and a linearized gradient-based inversion approach. The two methods are consistent for resolved models, but the linearized approach tends to underestimate the uncertainties for poorly resolved parameters due to the corresponding non-linear features. Finally, an MCMC inversion of 1-D field data verifies that spectral Cole-Cole parameters can also be resolved from time-domain field measurements.
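
    For reference, a Pelton-type Cole-Cole complex resistivity model with the parameters discussed above (rho0, chargeability m, time constant tau, frequency exponent C) can be evaluated as in the sketch below; this is a generic forward evaluation, not the inversion code used in the study.

    ```python
    import numpy as np

    def cole_cole(omega, rho0, m, tau, C):
        """Pelton-style Cole-Cole complex resistivity: rho0 (ohm m), chargeability m,
        time constant tau (s), frequency exponent C."""
        return rho0 * (1.0 - m * (1.0 - 1.0 / (1.0 + (1j * omega * tau) ** C)))

    omega = 2.0 * np.pi * np.logspace(-2, 4, 50)      # angular frequencies (rad/s)
    z = cole_cole(omega, rho0=100.0, m=0.2, tau=0.01, C=0.5)
    ```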

  19. Self-consistent Bulge/Disk/Halo Galaxy Dynamical Modeling Using Integral Field Kinematics

    NASA Astrophysics Data System (ADS)

    Taranu, D. S.; Obreschkow, D.; Dubinski, J. J.; Fogarty, L. M. R.; van de Sande, J.; Catinella, B.; Cortese, L.; Moffett, A.; Robotham, A. S. G.; Allen, J. T.; Bland-Hawthorn, J.; Bryant, J. J.; Colless, M.; Croom, S. M.; D'Eugenio, F.; Davies, R. L.; Drinkwater, M. J.; Driver, S. P.; Goodwin, M.; Konstantopoulos, I. S.; Lawrence, J. S.; López-Sánchez, Á. R.; Lorente, N. P. F.; Medling, A. M.; Mould, J. R.; Owers, M. S.; Power, C.; Richards, S. N.; Tonini, C.

    2017-11-01

    We introduce a method for modeling disk galaxies designed to take full advantage of data from integral field spectroscopy (IFS). The method fits equilibrium models to simultaneously reproduce the surface brightness, rotation, and velocity dispersion profiles of a galaxy. The models are fully self-consistent 6D distribution functions for a galaxy with a Sérsic profile stellar bulge, exponential disk, and parametric dark-matter halo, generated by an updated version of GalactICS. By creating realistic flux-weighted maps of the kinematic moments (flux, mean velocity, and dispersion), we simultaneously fit photometric and spectroscopic data using both maximum-likelihood and Bayesian (MCMC) techniques. We apply the method to a GAMA spiral galaxy (G79635) with kinematics from the SAMI Galaxy Survey and deep g- and r-band photometry from the VST-KiDS survey, comparing parameter constraints with those from traditional 2D bulge-disk decomposition. Our method returns broadly consistent results for shared parameters while constraining the mass-to-light ratios of stellar components and reproducing the H I-inferred circular velocity well beyond the limits of the SAMI data. Although the method is tailored for fitting integral field kinematic data, it can use other dynamical constraints like central fiber dispersions and H I circular velocities, and is well-suited for modeling galaxies with a combination of deep imaging and H I and/or optical spectra (resolved or otherwise). Our implementation (MagRite) is computationally efficient and can generate well-resolved models and kinematic maps in under a minute on modern processors.

  20. Cognitive diagnosis modelling incorporating item response times.

    PubMed

    Zhan, Peida; Jiao, Hong; Liao, Dandan

    2018-05-01

    To provide more refined diagnostic feedback with collateral information in item response times (RTs), this study proposed joint modelling of attributes and response speed using item responses and RTs simultaneously for cognitive diagnosis. For illustration, an extended deterministic input, noisy 'and' gate (DINA) model was proposed for joint modelling of responses and RTs. Model parameter estimation was explored using the Bayesian Markov chain Monte Carlo (MCMC) method. The PISA 2012 computer-based mathematics data were analysed first. These real data estimates were treated as true values in a subsequent simulation study. A follow-up simulation study with ideal testing conditions was conducted as well to further evaluate model parameter recovery. The results indicated that model parameters could be well recovered using the MCMC approach. Further, incorporating RTs into the DINA model would improve attribute and profile correct classification rates and result in more accurate and precise estimation of the model parameters. © 2017 The British Psychological Society.

  1. Experiences with Markov Chain Monte Carlo Convergence Assessment in Two Psychometric Examples

    ERIC Educational Resources Information Center

    Sinharay, Sandip

    2004-01-01

    There is an increasing use of Markov chain Monte Carlo (MCMC) algorithms for fitting statistical models in psychometrics, especially in situations where the traditional estimation techniques are very difficult to apply. One of the disadvantages of using an MCMC algorithm is that it is not straightforward to determine the convergence of the…
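
    Although the record is truncated, the convergence assessment it refers to commonly includes the Gelman-Rubin potential scale reduction factor computed from several independent chains. The sketch below is a generic illustration on simulated chains with made-up settings, not code from the study.

```python
# Minimal sketch: Gelman-Rubin potential scale reduction factor (R-hat)
# computed from several independent chains of equal length.
import numpy as np

def gelman_rubin(chains):
    """chains: array of shape (n_chains, n_draws) for one scalar parameter."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)            # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()      # within-chain variance
    var_hat = (n - 1) / n * W + B / n          # pooled variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(2)
# Four well-mixed chains targeting the same distribution: R-hat close to 1
good = rng.normal(0.0, 1.0, size=(4, 2000))
# Four chains stuck near different values: R-hat well above 1
bad = rng.normal(np.arange(4)[:, None], 1.0, size=(4, 2000))
print("R-hat (mixed):", round(gelman_rubin(good), 3))
print("R-hat (stuck):", round(gelman_rubin(bad), 3))
```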

  2. A Technology Solution Strengthens Comprehensive Environmental Management

    DTIC Science & Technology

    2012-05-23

    General Navigation  Chemical Approval Example  NEPA Coordination Example  Safety PPE Example  Summary Marine Corps Support Facility...coordination, completion and documentation through automated workflows of various business processes  Chemical Approval  NEPA Coordination  Safety ...Completion Diagram Government Employee/M CMC MCMC Chemical Manager MCMC HS&E Specialist IMO Chemical Safety Specialist IMO Chemical Environmental

  3. Bayesian Estimation of Multidimensional Item Response Models. A Comparison of Analytic and Simulation Algorithms

    ERIC Educational Resources Information Center

    Martin-Fernandez, Manuel; Revuelta, Javier

    2017-01-01

    This study compares the performance of two estimation algorithms of new usage, the Metropolis-Hastings Robins-Monro (MHRM) and the Hamiltonian MCMC (HMC), with two consolidated algorithms in the psychometric literature, the marginal likelihood via EM algorithm (MML-EM) and the Markov chain Monte Carlo (MCMC), in the estimation of multidimensional…

  4. Locating hazardous gas leaks in the atmosphere via modified genetic, MCMC and particle swarm optimization algorithms

    NASA Astrophysics Data System (ADS)

    Wang, Ji; Zhang, Ru; Yan, Yuting; Dong, Xiaoqiang; Li, Jun Ming

    2017-05-01

    Hazardous gas leaks in the atmosphere can cause significant economic losses in addition to environmental hazards, such as fires and explosions. A three-stage hazardous gas leak source localization method was developed that uses movable and stationary gas concentration sensors. The method calculates a preliminary source inversion with a modified genetic algorithm (MGA) that allows crossover with individuals eliminated from the population, after selection of the best candidate. The method then determines a search zone using Markov Chain Monte Carlo (MCMC) sampling with a partial evaluation strategy. The leak source is then accurately localized using a modified guaranteed-convergence particle swarm optimization algorithm that retains several poorly performing individuals, following selection of the most successful individual with dynamic updates. The first two stages are based on data collected by stationary sensors, and the last stage is based on data from movable robots with sensors. The adaptability to measurement error and the effect of the leak source location were analyzed. The test results showed that this three-stage localization process can localize a leak source to within 1.0 m of the source for different leak source locations, with a measurement error standard deviation smaller than 2.0.
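
    To make the Bayesian source-inversion stage concrete, the sketch below uses a simple Gaussian plume forward model and a random-walk Metropolis sampler to localize a 2-D ground-level source from noisy sensor readings; the plume coefficients, sensor layout, and noise level are illustrative assumptions and the paper's three-stage algorithm is not reproduced.

```python
# Minimal sketch: Metropolis sampling of a ground-level leak location and
# release rate with a simple Gaussian plume forward model (wind along +x).
# All coefficients, sensor positions, and noise levels are illustrative.
import numpy as np

rng = np.random.default_rng(9)
u = 3.0                                            # wind speed, m/s

def plume(src_x, src_y, q, sens):
    """Ground-level concentration at sensor positions for a ground-level source."""
    dx = sens[:, 0] - src_x                        # downwind distance
    dy = sens[:, 1] - src_y                        # crosswind offset
    conc = np.zeros(len(sens))
    ok = dx > 1.0                                  # no upwind contribution
    sig_y = 0.22 * dx[ok] ** 0.9                   # assumed dispersion coefficients
    sig_z = 0.20 * dx[ok] ** 0.85
    conc[ok] = q / (np.pi * u * sig_y * sig_z) * np.exp(-0.5 * (dy[ok] / sig_y) ** 2)
    return conc

sensors = np.array([[x, y] for x in (60, 120, 180, 240) for y in (-60, 0, 60)], float)
true_src = (10.0, 15.0, 2.0)                       # x (m), y (m), release rate (g/s)
sigma = 1e-4                                       # measurement noise std
obs = plume(*true_src, sensors) + sigma * rng.standard_normal(len(sensors))

def log_post(theta):
    x0, y0, q = theta
    if not (-50 < x0 < 50 and -100 < y0 < 100 and 0 < q < 20):
        return -np.inf                             # flat prior within a search box
    resid = obs - plume(x0, y0, q, sensors)
    return -0.5 * np.sum((resid / sigma) ** 2)

theta = np.array([0.0, 0.0, 1.0])
step = np.array([2.0, 2.0, 0.1])
lp, chain = log_post(theta), []
for _ in range(20000):
    prop = theta + step * rng.standard_normal(3)
    lp_prop = log_post(prop)
    if np.log(rng.random()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    chain.append(theta.copy())

chain = np.array(chain[5000:])
print("posterior mean (x, y, q):", chain.mean(axis=0).round(2))
```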

  5. Time-dependent Reliability of Dynamic Systems using Subset Simulation with Splitting over a Series of Correlated Time Intervals

    DTIC Science & Technology

    2013-08-01

    cost due to potential warranty costs, repairs and loss of market share. Reliability is the probability that the system will perform its intended... MCMC and splitting sampling schemes. Our proposed SS/STP method is presented in Section 4, including accuracy bounds and computational effort

  6. Markov chain Monte Carlo linkage analysis: effect of bin width on the probability of linkage.

    PubMed

    Slager, S L; Juo, S H; Durner, M; Hodge, S E

    2001-01-01

    We analyzed part of the Genetic Analysis Workshop (GAW) 12 simulated data using Monte Carlo Markov chain (MCMC) methods that are implemented in the computer program Loki. The MCMC method reports the "probability of linkage" (PL) across the chromosomal regions of interest. The point of maximum PL can then be taken as a "location estimate" for the location of the quantitative trait locus (QTL). However, Loki does not provide a formal statistical test of linkage. In this paper, we explore how the bin width used in the calculations affects the max PL and the location estimate. We analyzed age at onset (AO) and quantitative trait number 5, Q5, from 26 replicates of the general simulated data in one region where we knew a major gene, MG5, is located. For each trait, we found the max PL and the corresponding location estimate, using four different bin widths. We found that bin width, as expected, does affect the max PL and the location estimate, and we recommend that users of Loki explore how their results vary with different bin widths.

  7. Real-time individual predictions of prostate cancer recurrence using joint models

    PubMed Central

    Taylor, Jeremy M. G.; Park, Yongseok; Ankerst, Donna P.; Proust-Lima, Cecile; Williams, Scott; Kestin, Larry; Bae, Kyoungwha; Pickles, Tom; Sandler, Howard

    2012-01-01

    Summary Patients who were previously treated for prostate cancer with radiation therapy are monitored at regular intervals using a laboratory test called Prostate Specific Antigen (PSA). If the value of the PSA test starts to rise, this is an indication that the prostate cancer is more likely to recur, and the patient may wish to initiate new treatments. Such patients could be helped in making medical decisions by an accurate estimate of the probability of recurrence of the cancer in the next few years. In this paper, we describe the methodology for giving the probability of recurrence for a new patient, as implemented on a web-based calculator. The methods use a joint longitudinal survival model. The model is developed on a training dataset of 2,386 patients and tested on a dataset of 846 patients. Bayesian estimation methods are used with one Markov chain Monte Carlo (MCMC) algorithm developed for estimation of the parameters from the training dataset and a second quick MCMC developed for prediction of the risk of recurrence that uses the longitudinal PSA measures from a new patient. PMID:23379600

  8. Variable selection models for genomic selection using whole-genome sequence data and singular value decomposition.

    PubMed

    Meuwissen, Theo H E; Indahl, Ulf G; Ødegård, Jørgen

    2017-12-27

    Non-linear Bayesian genomic prediction models such as BayesA/B/C/R involve iteration and mostly Markov chain Monte Carlo (MCMC) algorithms, which are computationally expensive, especially when whole-genome sequence (WGS) data are analyzed. Singular value decomposition (SVD) of the genotype matrix can facilitate genomic prediction in large datasets, and can be used to estimate marker effects and their prediction error variances (PEV) in a computationally efficient manner. Here, we developed, implemented, and evaluated a direct, non-iterative method for the estimation of marker effects for the BayesC genomic prediction model. The BayesC model assumes a priori that markers have normally distributed effects with probability π and no effect with probability (1 - π). Marker effects and their PEV are estimated by using SVD and the posterior probability of the marker having a non-zero effect is calculated. These posterior probabilities are used to obtain marker-specific effect variances, which are subsequently used to approximate BayesC estimates of marker effects in a linear model. A computer simulation study was conducted to compare alternative genomic prediction methods, where a single reference generation was used to estimate marker effects, which were subsequently used for 10 generations of forward prediction, for which accuracies were evaluated. SVD-based posterior probabilities of markers having non-zero effects were generally lower than MCMC-based posterior probabilities, but for some regions the opposite occurred, resulting in clear signals for QTL-rich regions. The accuracies of breeding values estimated using SVD- and MCMC-based BayesC analyses were similar across the 10 generations of forward prediction. For an intermediate number of generations (2 to 5) of forward prediction, accuracies obtained with the BayesC model tended to be slightly higher than accuracies obtained using the best linear unbiased prediction of SNP effects (SNP-BLUP model). When reducing marker density from WGS data to 30 K, SNP-BLUP tended to yield the highest accuracies, at least in the short term. Based on SVD of the genotype matrix, we developed a direct method for the calculation of BayesC estimates of marker effects. Although SVD- and MCMC-based marker effects differed slightly, their prediction accuracies were similar. Assuming that the SVD of the marker genotype matrix is already performed for other reasons (e.g. for SNP-BLUP), computation times for the BayesC predictions were comparable to those of SNP-BLUP.
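
    Written out (with π used here for the inclusion probability whose symbol was lost to the formula placeholders in the extracted abstract), the BayesC marker-effect prior is the spike-and-slab mixture below.

```latex
% Spike-and-slab prior assumed by the BayesC model for marker effect \beta_j
\beta_j \;\sim\;
\begin{cases}
  \mathcal{N}\!\left(0,\ \sigma_\beta^{2}\right) & \text{with probability } \pi,\\[4pt]
  0 & \text{with probability } 1-\pi.
\end{cases}
```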

  9. Simultaneous fitting of genomic-BLUP and Bayes-C components in a genomic prediction model.

    PubMed

    Iheshiulor, Oscar O M; Woolliams, John A; Svendsen, Morten; Solberg, Trygve; Meuwissen, Theo H E

    2017-08-24

    The rapid adoption of genomic selection is due to two key factors: availability of both high-throughput dense genotyping and statistical methods to estimate and predict breeding values. The development of such methods is still ongoing and, so far, there is no consensus on the best approach. Currently, the linear and non-linear methods for genomic prediction (GP) are treated as distinct approaches. The aim of this study was to evaluate the implementation of an iterative method (called GBC) that incorporates aspects of both linear [genomic-best linear unbiased prediction (G-BLUP)] and non-linear (Bayes-C) methods for GP. The iterative nature of GBC makes it less computationally demanding similar to other non-Markov chain Monte Carlo (MCMC) approaches. However, as a Bayesian method, GBC differs from both MCMC- and non-MCMC-based methods by combining some aspects of G-BLUP and Bayes-C methods for GP. Its relative performance was compared to those of G-BLUP and Bayes-C. We used an imputed 50 K single-nucleotide polymorphism (SNP) dataset based on the Illumina Bovine50K BeadChip, which included 48,249 SNPs and 3244 records. Daughter yield deviations for somatic cell count, fat yield, milk yield, and protein yield were used as response variables. GBC was frequently (marginally) superior to G-BLUP and Bayes-C in terms of prediction accuracy and was significantly better than G-BLUP only for fat yield. On average across the four traits, GBC yielded a 0.009 and 0.006 increase in prediction accuracy over G-BLUP and Bayes-C, respectively. Computationally, GBC was very much faster than Bayes-C and similar to G-BLUP. Our results show that incorporating some aspects of G-BLUP and Bayes-C in a single model can improve accuracy of GP over the commonly used method: G-BLUP. Generally, GBC did not statistically perform better than G-BLUP and Bayes-C, probably due to the close relationships between reference and validation individuals. Nevertheless, it is a flexible tool, in the sense, that it simultaneously incorporates some aspects of linear and non-linear models for GP, thereby exploiting family relationships while also accounting for linkage disequilibrium between SNPs and genes with large effects. The application of GBC in GP merits further exploration.

  10. Bayesian seismic tomography by parallel interacting Markov chains

    NASA Astrophysics Data System (ADS)

    Gesret, Alexandrine; Bottero, Alexis; Romary, Thomas; Noble, Mark; Desassis, Nicolas

    2014-05-01

    The velocity field estimated by first-arrival traveltime tomography is commonly used as a starting point for further seismological, mineralogical, tectonic or similar analysis. In order to interpret the results quantitatively, the tomography uncertainty values as well as their spatial distribution are required. The estimated velocity model is obtained through inverse modeling by minimizing an objective function that compares observed and computed traveltimes. This step is often performed by gradient-based optimization algorithms. The major drawback of such local optimization schemes, beyond the possibility of being trapped in a local minimum, is that they do not account for the multiple possible solutions of the inverse problem. They are therefore unable to assess the uncertainties linked to the solution. Within a Bayesian (probabilistic) framework, solving the tomography inverse problem amounts to estimating the posterior probability density function of the velocity model using a global sampling algorithm. Markov chain Monte Carlo (MCMC) methods are known to produce samples of virtually any distribution. In such a Bayesian inversion, the total number of simulations we can afford is strongly tied to the computational cost of the forward model. Although fast algorithms have recently been developed for computing first-arrival traveltimes of seismic waves, fully exploring the posterior distribution of the velocity model is rarely feasible, especially when it is high dimensional and/or multimodal. In the latter case, the chain may even stay stuck in one of the modes. In order to improve the mixing properties of a classical single MCMC chain, we propose to let several Markov chains at different temperatures interact. This method can make efficient use of large CPU clusters, without increasing the global computational cost with respect to classical MCMC, and is therefore particularly suited for Bayesian inversion. The exchanges between the chains allow a precise sampling of the high-probability zones of the model space while preventing the chains from becoming stuck in a single probability maximum. This approach thus supplies a robust way to analyze the tomography imaging uncertainties. The interacting MCMC approach is illustrated on two synthetic examples of tomography of calibration shots such as those encountered in induced microseismic studies. In the second application, a wavelet-based model parameterization is presented that significantly reduces the dimension of the problem, thus making the algorithm efficient even for a complex velocity model.
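
    The interacting-chains idea can be illustrated with a minimal parallel-tempering (replica-exchange) sketch on a deliberately bimodal 1-D target; the target, temperature ladder, and step sizes are illustrative assumptions, not the tomography setup described above.

```python
# Minimal sketch: parallel tempering (interacting Markov chains at different
# temperatures) on a bimodal 1-D target that a single chain mixes over poorly.
import numpy as np

rng = np.random.default_rng(3)

def log_target(x):
    # Mixture of two well-separated Gaussians.
    return np.logaddexp(-0.5 * (x + 5.0) ** 2, -0.5 * (x - 5.0) ** 2)

temps = np.array([1.0, 2.0, 4.0, 8.0])     # temperature ladder
x = np.zeros(len(temps))                   # one chain per temperature
step = 1.0
cold_samples = []

for it in range(50000):
    # Within-chain random-walk Metropolis updates on the tempered targets.
    for k, T in enumerate(temps):
        prop = x[k] + step * rng.standard_normal()
        if np.log(rng.random()) < (log_target(prop) - log_target(x[k])) / T:
            x[k] = prop
    # Propose a swap between a random pair of adjacent temperatures.
    k = rng.integers(len(temps) - 1)
    delta = (log_target(x[k + 1]) - log_target(x[k])) * (1.0 / temps[k] - 1.0 / temps[k + 1])
    if np.log(rng.random()) < delta:
        x[k], x[k + 1] = x[k + 1], x[k]
    cold_samples.append(x[0])

cold_samples = np.array(cold_samples[10000:])
print("cold-chain mass in left / right mode:",
      round(np.mean(cold_samples < 0), 3), round(np.mean(cold_samples > 0), 3))
```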

  11. Species delimitation using Bayes factors: simulations and application to the Sceloporus scalaris species group (Squamata: Phrynosomatidae).

    PubMed

    Grummer, Jared A; Bryson, Robert W; Reeder, Tod W

    2014-03-01

    Current molecular methods of species delimitation are limited by the types of species delimitation models and scenarios that can be tested. Bayes factors allow for more flexibility in testing non-nested species delimitation models and hypotheses of individual assignment to alternative lineages. Here, we examined the efficacy of Bayes factors in delimiting species through simulations and empirical data from the Sceloporus scalaris species group. Marginal-likelihood scores of competing species delimitation models, from which Bayes factor values were compared, were estimated with four different methods: harmonic mean estimation (HME), smoothed harmonic mean estimation (sHME), path-sampling/thermodynamic integration (PS), and stepping-stone (SS) analysis. We also performed model selection using a posterior simulation-based analog of the Akaike information criterion through Markov chain Monte Carlo analysis (AICM). Bayes factor species delimitation results from the empirical data were then compared with results from the reversible-jump MCMC (rjMCMC) coalescent-based species delimitation method Bayesian Phylogenetics and Phylogeography (BP&P). Simulation results show that HME and sHME perform poorly compared with PS and SS marginal-likelihood estimators when identifying the true species delimitation model. Furthermore, Bayes factor delimitation (BFD) of species showed improved performance when species limits are tested by reassigning individuals between species, as opposed to either lumping or splitting lineages. In the empirical data, BFD through PS and SS analyses, as well as the rjMCMC method, each provide support for the recognition of all scalaris group taxa as independent evolutionary lineages. Bayes factor species delimitation and BP&P also support the recognition of three previously undescribed lineages. In both simulated and empirical data sets, harmonic and smoothed harmonic mean marginal-likelihood estimators provided much higher marginal-likelihood estimates than PS and SS estimators. The AICM displayed poor repeatability in both simulated and empirical data sets, and produced inconsistent model rankings across replicate runs with the empirical data. Our results suggest that species delimitation through the use of Bayes factors with marginal-likelihood estimates via PS or SS analyses provide a useful and complementary alternative to existing species delimitation methods.
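
    As a hedged, generic illustration of marginal-likelihood estimation by stepping-stone sampling (one of the estimators compared above), the sketch below computes the marginal likelihood of a conjugate normal-mean toy model, for which the exact answer is available as a check; the model, temperature ladder, and sample sizes are illustrative assumptions and no phylogenetic machinery is involved.

```python
# Minimal sketch: stepping-stone estimation of a marginal likelihood for a
# conjugate normal-mean model (y_i ~ N(theta, 1), theta ~ N(0, 1)), where the
# exact value is known. Two such estimates would give a Bayes factor.
import numpy as np

rng = np.random.default_rng(4)
n = 20
y = rng.normal(1.0, 1.0, n)

def log_lik(theta):
    """Vectorised log-likelihood for an array of theta values."""
    theta = np.atleast_1d(theta)
    return (-0.5 * n * np.log(2 * np.pi)
            - 0.5 * ((y[None, :] - theta[:, None]) ** 2).sum(axis=1))

# The power posterior at inverse temperature b is conjugate here, so it can be
# sampled exactly: theta | y, b ~ N(b*sum(y)/(1 + b*n), 1/(1 + b*n)).
def sample_power_posterior(b, size):
    prec = 1.0 + b * n
    return rng.normal(b * y.sum() / prec, np.sqrt(1.0 / prec), size)

betas = np.linspace(0.0, 1.0, 33) ** 3      # ladder concentrated near the prior
m = 2000                                    # draws per stone
log_Z = 0.0
for b0, b1 in zip(betas[:-1], betas[1:]):
    ll = log_lik(sample_power_posterior(b0, m))
    w = (b1 - b0) * ll                      # log of L(theta)^(b1 - b0)
    log_Z += np.max(w) + np.log(np.mean(np.exp(w - np.max(w))))   # stable log-mean-exp

# Exact log marginal likelihood: y ~ N(0, I + 11')
Sigma = np.eye(n) + np.ones((n, n))
exact = (-0.5 * n * np.log(2 * np.pi)
         - 0.5 * np.linalg.slogdet(Sigma)[1]
         - 0.5 * y @ np.linalg.solve(Sigma, y))
print("stepping-stone estimate:", round(log_Z, 3), " exact:", round(exact, 3))
```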

  12. Mining geographic variations of Plasmodium vivax for active surveillance: a case study in China.

    PubMed

    Shi, Benyun; Tan, Qi; Zhou, Xiao-Nong; Liu, Jiming

    2015-05-27

    Geographic variations of an infectious disease characterize the spatial differentiation of disease incidences caused by various impact factors, such as environmental, demographic, and socioeconomic factors. Some factors may directly determine the force of infection of the disease (namely, explicit factors), while many other factors may indirectly affect the number of disease incidences via certain unmeasurable processes (namely, implicit factors). In this study, the impact of heterogeneous factors on geographic variations of Plasmodium vivax incidences is systematically investigated in Tengchong, Yunnan province, China. A space-time model, which combines a P. vivax transmission model with a hidden time-dependent process, is presented by taking into consideration both explicit and implicit factors. Specifically, the transmission model is built upon relevant demographic, environmental, and biophysical factors to describe the local infections of P. vivax, while the hidden time-dependent process is assessed by several socioeconomic factors to account for the imported cases of P. vivax. To quantitatively assess the impact of heterogeneous factors on geographic variations of P. vivax infections, a Markov chain Monte Carlo (MCMC) simulation method is developed to estimate the model parameters by fitting the space-time model to the reported spatial-temporal disease incidences. Since there is no ground-truth information available, the performance of the MCMC method is first evaluated against a synthetic dataset. The results show that the model parameters can be well estimated using the proposed MCMC method. Then, the proposed model is applied to investigate the geographic variations of P. vivax incidences among all 18 towns in Tengchong, Yunnan province, China. Based on the geographic variations, the 18 towns can be further classified into five groups with similar socioeconomic causality for P. vivax incidences. Although this study focuses mainly on the transmission of P. vivax, the proposed space-time model is general and can readily be extended to investigate geographic variations of other diseases. Practically, such a computational model will offer new insights into active surveillance and strategic planning for disease surveillance and control.

  13. An MCMC determination of the primordial helium abundance

    NASA Astrophysics Data System (ADS)

    Aver, Erik; Olive, Keith A.; Skillman, Evan D.

    2012-04-01

    Spectroscopic observations of the chemical abundances in metal-poor H II regions provide an independent method for estimating the primordial helium abundance. H II regions are described by several physical parameters such as electron density, electron temperature, and reddening, in addition to y, the ratio of helium to hydrogen. It had been customary to estimate or determine self-consistently these parameters to calculate y. Frequentist analyses of the parameter space have been shown to be successful in these parameter determinations, and Markov Chain Monte Carlo (MCMC) techniques have proven to be very efficient in sampling this parameter space. Nevertheless, accurate determination of the primordial helium abundance from observations of H II regions is constrained by both systematic and statistical uncertainties. In an attempt to better reduce the latter, and continue to better characterize the former, we apply MCMC methods to the large dataset recently compiled by Izotov, Thuan, & Stasińska (2007). To improve the reliability of the determination, a high quality dataset is needed. In pursuit of this, a variety of cuts are explored. The efficacy of the He I λ4026 emission line as a constraint on the solutions is first examined, revealing the introduction of systematic bias through its absence. As a clear measure of the quality of the physical solution, a χ2 analysis proves instrumental in the selection of data compatible with the theoretical model. Nearly two-thirds of the observations fall outside a standard 95% confidence level cut, which highlights the care necessary in selecting systems and warrants further investigation into potential deficiencies of the model or data. In addition, the method also allows us to exclude systems for which parameter estimations are statistical outliers. As a result, the final selected dataset gains in reliability and exhibits improved consistency. Regression to zero metallicity yields Yp = 0.2534 ± 0.0083, in broad agreement with the WMAP result. The inclusion of more observations shows promise for further reducing the uncertainty, but more high quality spectra are required.

  14. Using Bayesian neural networks to classify forest scenes

    NASA Astrophysics Data System (ADS)

    Vehtari, Aki; Heikkonen, Jukka; Lampinen, Jouko; Juujarvi, Jouni

    1998-10-01

    We present results that compare the performance of Bayesian learning methods for neural networks on the task of classifying forest scenes into trees and background. The classification task is demanding due to the texture richness of the trees, occlusions of the forest scene objects and diverse lighting conditions under operation. This makes it difficult to determine which image features are optimal for the classification. A natural way to proceed is to extract many different types of potentially suitable features and to evaluate their usefulness in later processing stages. One approach to cope with a large number of features is to use Bayesian methods to control the model complexity. Bayesian learning uses a prior on model parameters, combines this with evidence from the training data, and then integrates over the resulting posterior to make predictions. With this method, we can use large networks and many features without fear of overfitting. For this classification task we compare two Bayesian learning methods for multi-layer perceptron (MLP) neural networks: (1) The evidence framework of MacKay uses a Gaussian approximation to the posterior weight distribution and maximizes with respect to hyperparameters. (2) In a Markov Chain Monte Carlo (MCMC) method due to Neal, the posterior distribution of the network parameters is numerically integrated using the MCMC method. As baseline classifiers for comparison we use (3) an MLP early-stop committee, (4) K-nearest-neighbor and (5) Classification And Regression Tree.

  15. Phylogeny, time divergence, and historical biogeography of the South American Liolaemus alticolor-bibronii group (Iguania: Liolaemidae)

    PubMed Central

    2018-01-01

    The genus Liolaemus comprises more than 260 species and can be divided in two subgenera: Eulaemus and Liolaemus sensu stricto. In this paper, we present a phylogenetic analysis, divergence times, and ancestral distribution ranges of the Liolaemus alticolor-bibronii group (Liolaemus sensu stricto subgenus). We inferred a total evidence phylogeny combining molecular (Cytb and 12S genes) and morphological characters using Maximum Parsimony and Bayesian Inference. Divergence times were calculated using Bayesian MCMC with an uncorrelated lognormal distributed relaxed clock, calibrated with a fossil record. Ancestral ranges were estimated using the Dispersal-Extinction-Cladogenesis (DEC-Lagrange). Effects of some a priori parameters of DEC were also tested. Distribution ranged from central Perú to southern Argentina, including areas at sea level up to the high Andes. The L. alticolor-bibronii group was recovered as monophyletic, formed by two clades: L. walkeri and L. gracilis, the latter can be split in two groups. Additionally, many species candidates were recognized. We estimate that the L. alticolor-bibronii group diversified 14.5 Myr ago, during the Middle Miocene. Our results suggest that the ancestor of the Liolaemus alticolor-bibronii group was distributed in a wide area including Patagonia and Puna highlands. The speciation pattern follows the South-North Diversification Hypothesis, following the Andean uplift. PMID:29479502

  16. Connecting CO intensity mapping to molecular gas and star formation in the epoch of galaxy assembly

    DOE PAGES

    Li, Tony Y.; Wechsler, Risa H.; Devaraj, Kiruthika; ...

    2016-01-29

    Intensity mapping, which images a single spectral line from unresolved galaxies across cosmological volumes, is a promising technique for probing the early universe. Here we present predictions for the intensity map and power spectrum of the CO(1–0) line from galaxies at $z \sim 2.4$–2.8, based on a parameterized model for the galaxy–halo connection, and demonstrate the extent to which properties of high-redshift galaxies can be directly inferred from such observations. We find that our fiducial prediction should be detectable by a realistic experiment. Motivated by significant modeling uncertainties, we demonstrate the effect on the power spectrum of varying each parameter in our model. Using simulated observations, we infer constraints on our model parameter space with an MCMC procedure, and show corresponding constraints on the $L_{\mathrm{IR}}$–$L_{\mathrm{CO}}$ relation and the CO luminosity function. These constraints would be complementary to current high-redshift galaxy observations, which can detect the brightest galaxies but not complete samples from the faint end of the luminosity function. Furthermore, by probing these populations in aggregate, CO intensity mapping could be a valuable tool for probing molecular gas and its relation to star formation in high-redshift galaxies.

  17. An adaptive Bayesian inference algorithm to estimate the parameters of a hazardous atmospheric release

    NASA Astrophysics Data System (ADS)

    Rajaona, Harizo; Septier, François; Armand, Patrick; Delignon, Yves; Olry, Christophe; Albergel, Armand; Moussafir, Jacques

    2015-12-01

    In the eventuality of an accidental or intentional atmospheric release, the reconstruction of the source term using measurements from a set of sensors is an important and challenging inverse problem. A rapid and accurate estimation of the source allows faster and more efficient action for first-response teams, in addition to providing better damage assessment. This paper presents a Bayesian probabilistic approach to estimate the location and the temporal emission profile of a pointwise source. The release rate is evaluated analytically by using a Gaussian assumption on its prior distribution, and is enhanced with a positivity constraint to improve the estimation. The source location is obtained by the means of an advanced iterative Monte-Carlo technique called Adaptive Multiple Importance Sampling (AMIS), which uses a recycling process at each iteration to accelerate its convergence. The proposed methodology is tested using synthetic and real concentration data in the framework of the Fusion Field Trials 2007 (FFT-07) experiment. The quality of the obtained results is comparable to those coming from the Markov Chain Monte Carlo (MCMC) algorithm, a popular Bayesian method used for source estimation. Moreover, the adaptive processing of the AMIS provides a better sampling efficiency by reusing all the generated samples.

  18. Bayesian semi-parametric analysis of Poisson change-point regression models: application to policy making in Cali, Colombia.

    PubMed

    Park, Taeyoung; Krafty, Robert T; Sánchez, Alvaro I

    2012-07-27

    A Poisson regression model with an offset assumes a constant baseline rate after accounting for measured covariates, which may lead to biased estimates of coefficients in an inhomogeneous Poisson process. To correctly estimate the effect of time-dependent covariates, we propose a Poisson change-point regression model with an offset that allows a time-varying baseline rate. When the nonconstant pattern of a log baseline rate is modeled with a nonparametric step function, the resulting semi-parametric model involves a model component of varying dimension and thus requires a sophisticated varying-dimensional inference to obtain correct estimates of model parameters of fixed dimension. To fit the proposed varying-dimensional model, we devise a state-of-the-art MCMC-type algorithm based on partial collapse. The proposed model and methods are used to investigate an association between daily homicide rates in Cali, Colombia and policies that restrict the hours during which the legal sale of alcoholic beverages is permitted. While simultaneously identifying the latent changes in the baseline homicide rate which correspond to the incidence of sociopolitical events, we explore the effect of policies governing the sale of alcohol on homicide rates and seek a policy that balances the economic and cultural dependencies on alcohol sales to the health of the public.

  19. Volatility modeling for IDR exchange rate through APARCH model with student-t distribution

    NASA Astrophysics Data System (ADS)

    Nugroho, Didit Budi; Susanto, Bambang

    2017-08-01

    The aim of this study is to empirically investigate the performance of the APARCH(1,1) volatility model with the Student-t error distribution on five foreign currency selling rates to the Indonesian rupiah (IDR), namely the Swiss franc (CHF), the euro (EUR), the British pound (GBP), the Japanese yen (JPY), and the US dollar (USD). Daily closing rates over the period January 2010 to December 2016, a total of 1722 observations, have been analysed. Bayesian inference using the efficient independence chain Metropolis-Hastings and adaptive random walk Metropolis methods in a Markov chain Monte Carlo (MCMC) scheme has been applied to estimate the parameters of the model. According to the DIC criterion, this study finds that the APARCH(1,1) model under the Student-t distribution fits better than the model under the normal distribution for every observed rate return series. The 95% highest posterior density intervals support the APARCH specification for the IDR/JPY and IDR/USD volatilities. In particular, the IDR/JPY and IDR/USD rate returns have significant negative and positive leverage effects, respectively. Meanwhile, the optimal power coefficient of volatility is found to be statistically different from 2 for all rate return series except the IDR/EUR series.
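
    A minimal sketch of the model's likelihood, using the APARCH(1,1) recursion sigma_t^delta = omega + alpha(|r_{t-1}| - gamma r_{t-1})^delta + beta sigma_{t-1}^delta with standardized Student-t innovations, combined with a plain random-walk Metropolis step; the data, priors, tuning, and initialization are illustrative assumptions, not the paper's adaptive MCMC schemes.

```python
# Minimal sketch: APARCH(1,1) log-likelihood with standardized Student-t errors
# and a plain random-walk Metropolis update. Data, priors, and tuning are
# illustrative; the adaptive samplers described in the abstract are not used.
import numpy as np
from math import lgamma

rng = np.random.default_rng(5)
r = rng.standard_t(6, 500) * 0.01            # stand-in returns (no real GARCH structure)

def log_lik(params, r):
    omega, alpha, gamma, beta, delta, nu = params
    if min(omega, alpha, beta, delta) <= 0 or not (-1 < gamma < 1) or nu <= 2:
        return -np.inf
    sig_d = np.empty(len(r))                 # sigma_t ** delta
    sig_d[0] = np.mean(np.abs(r)) ** delta   # rough, always-positive initialization
    c = lgamma((nu + 1) / 2) - lgamma(nu / 2) - 0.5 * np.log(np.pi * (nu - 2))
    ll = 0.0
    for t in range(1, len(r)):
        sig_d[t] = omega + alpha * (abs(r[t - 1]) - gamma * r[t - 1]) ** delta \
                   + beta * sig_d[t - 1]
        sig = sig_d[t] ** (1.0 / delta)
        z = r[t] / sig
        ll += c - 0.5 * (nu + 1) * np.log1p(z * z / (nu - 2)) - np.log(sig)
    return ll

theta = np.array([1e-6, 0.05, 0.1, 0.9, 2.0, 8.0])  # omega, alpha, gamma, beta, delta, nu
step = np.array([2e-7, 0.01, 0.05, 0.01, 0.1, 0.5])
lp, keep = log_lik(theta, r), []
for _ in range(2000):                        # short chain, illustration only
    prop = theta + step * rng.standard_normal(6)
    lp_prop = log_lik(prop, r)               # flat priors subject to the constraints above
    if np.log(rng.random()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    keep.append(theta.copy())
print("posterior mean estimates:", np.array(keep[500:]).mean(axis=0).round(4))
```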

  20. Accelerating Fibre Orientation Estimation from Diffusion Weighted Magnetic Resonance Imaging Using GPUs

    PubMed Central

    Hernández, Moisés; Guerrero, Ginés D.; Cecilia, José M.; García, José M.; Inuggi, Alberto; Jbabdi, Saad; Behrens, Timothy E. J.; Sotiropoulos, Stamatios N.

    2013-01-01

    With the performance of central processing units (CPUs) having effectively reached a limit, parallel processing offers an alternative for applications with high computational demands. Modern graphics processing units (GPUs) are massively parallel processors that can execute simultaneously thousands of light-weight processes. In this study, we propose and implement a parallel GPU-based design of a popular method that is used for the analysis of brain magnetic resonance imaging (MRI). More specifically, we are concerned with a model-based approach for extracting tissue structural information from diffusion-weighted (DW) MRI data. DW-MRI offers, through tractography approaches, the only way to study brain structural connectivity, non-invasively and in-vivo. We parallelise the Bayesian inference framework for the ball & stick model, as it is implemented in the tractography toolbox of the popular FSL software package (University of Oxford). For our implementation, we utilise the Compute Unified Device Architecture (CUDA) programming model. We show that the parameter estimation, performed through Markov Chain Monte Carlo (MCMC), is accelerated by at least two orders of magnitude, when comparing a single GPU with the respective sequential single-core CPU version. We also illustrate similar speed-up factors (up to 120x) when comparing a multi-GPU with a multi-CPU implementation. PMID:23658616

  1. A Bayesian approach to the modelling of α Cen A

    NASA Astrophysics Data System (ADS)

    Bazot, M.; Bourguignon, S.; Christensen-Dalsgaard, J.

    2012-12-01

    Determining the physical characteristics of a star is an inverse problem consisting of estimating the parameters of models for the stellar structure and evolution, and knowing certain observable quantities. We use a Bayesian approach to solve this problem for α Cen A, which allows us to incorporate prior information on the parameters to be estimated, in order to better constrain the problem. Our strategy is based on the use of a Markov chain Monte Carlo (MCMC) algorithm to estimate the posterior probability densities of the stellar parameters: mass, age, initial chemical composition, etc. We use the stellar evolutionary code ASTEC to model the star. To constrain this model both seismic and non-seismic observations were considered. Several different strategies were tested to fit these values, using either two free parameters or five free parameters in ASTEC. We are thus able to show evidence that MCMC methods become efficient with respect to more classical grid-based strategies when the number of parameters increases. The results of our MCMC algorithm allow us to derive estimates for the stellar parameters and robust uncertainties thanks to the statistical analysis of the posterior probability densities. We are also able to compute odds for the presence of a convective core in α Cen A. When using core-sensitive seismic observational constraints, these can rise above ˜40 per cent. The comparison of results to previous studies also indicates that these seismic constraints are of critical importance for our knowledge of the structure of this star.

  2. Joint simulation of regional areas burned in Canadian forest fires: A Markov Chain Monte Carlo approach

    Treesearch

    Steen Magnussen

    2009-01-01

    Areas burned annually in 29 Canadian forest fire regions show a patchy and irregular correlation structure that significantly influences the distribution of annual totals for Canada and for groups of regions. A binary Monte Carlo Markov Chain (MCMC) is constructed for the purpose of joint simulation of regional areas burned in forest fires. For each year the MCMC...

  3. Modeling the Hyperdistribution of Item Parameters To Improve the Accuracy of Recovery in Estimation Procedures.

    ERIC Educational Resources Information Center

    Matthews-Lopez, Joy L.; Hombo, Catherine M.

    The purpose of this study was to examine the recovery of item parameters in simulated Automatic Item Generation (AIG) conditions, using Markov chain Monte Carlo (MCMC) estimation methods to attempt to recover the generating distributions. To do this, variability in item and ability parameters was manipulated. Realistic AIG conditions were…

  4. Marginal Maximum A Posteriori Item Parameter Estimation for the Generalized Graded Unfolding Model

    ERIC Educational Resources Information Center

    Roberts, James S.; Thompson, Vanessa M.

    2011-01-01

    A marginal maximum a posteriori (MMAP) procedure was implemented to estimate item parameters in the generalized graded unfolding model (GGUM). Estimates from the MMAP method were compared with those derived from marginal maximum likelihood (MML) and Markov chain Monte Carlo (MCMC) procedures in a recovery simulation that varied sample size,…

  5. Hierarchical multistage MCMC follow-up of continuous gravitational wave candidates

    NASA Astrophysics Data System (ADS)

    Ashton, G.; Prix, R.

    2018-05-01

    Leveraging Markov chain Monte Carlo optimization of the F statistic, we introduce a method for the hierarchical follow-up of continuous gravitational wave candidates identified by wide-parameter space semicoherent searches. We demonstrate parameter estimation for continuous wave sources and develop a framework and tools to understand and control the effective size of the parameter space, critical to the success of the method. Monte Carlo tests of simulated signals in noise demonstrate that this method is close to the theoretical optimal performance.

  6. An example of complex modelling in dentistry using Markov chain Monte Carlo (MCMC) simulation.

    PubMed

    Helfenstein, Ulrich; Menghini, Giorgio; Steiner, Marcel; Murati, Francesca

    2002-09-01

    In the usual regression setting one regression line is computed for a whole data set. In a more complex situation, each person may be observed, for example, at several points in time, and thus a regression line might be calculated for each person. Additional complexities, such as various forms of errors in covariables, may make a straightforward statistical evaluation difficult or even impossible. During recent years methods have been developed that allow convenient analysis of problems where the data and the corresponding models show these and many other forms of complexity. The methodology makes use of a Bayesian approach and Markov chain Monte Carlo (MCMC) simulations. The methods allow the construction of increasingly elaborate models by building them up from local sub-models. The essential structure of the models can be represented visually by directed acyclic graphs (DAG). This attractive property allows communication and discussion of the essential structure and the substantive meaning of a complex model without needing algebra. After the statistical methods are presented, an example from dentistry is given in order to demonstrate their application and use. The dataset of the example had a complex structure; each of a set of children was followed up over several years. The number of new fillings in permanent teeth had been recorded at several ages. The dependent variables were markedly different from the normal distribution and could not be transformed to normality. In addition, explanatory variables were assumed to be measured with different forms of error. It is illustrated how the corresponding models can be estimated conveniently via MCMC simulation, in particular 'Gibbs sampling', using the freely available software BUGS. In addition, how the measurement error may influence the estimates of the corresponding coefficients is explored. It is demonstrated that the effect of the independent variable on the dependent variable may be markedly underestimated if the measurement error is not taken into account ('regression dilution bias'). Markov chain Monte Carlo methods may be of great value to dentists in allowing analysis of data sets which exhibit a wide range of different forms of complexity.
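
    The regression dilution bias mentioned above is easy to demonstrate: when the covariate is observed with error and this is ignored, the fitted slope is attenuated. The short sketch below is a generic illustration with made-up numbers, not the dental data or the BUGS model from the study.

```python
# Minimal sketch: regression dilution (attenuation) bias when a covariate is
# measured with error and the error is ignored. Numbers are made up.
import numpy as np

rng = np.random.default_rng(6)
n = 5000
x_true = rng.normal(0.0, 1.0, n)                 # true covariate, variance 1
y = 2.0 * x_true + rng.normal(0.0, 1.0, n)       # true slope is 2.0
x_obs = x_true + rng.normal(0.0, 1.0, n)         # observed with error, variance 1

def ols_slope(x, y):
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

print("slope using true covariate:       ", round(ols_slope(x_true, y), 3))
print("slope using error-prone covariate:", round(ols_slope(x_obs, y), 3))
# Expected attenuation factor = var(x_true) / (var(x_true) + var(error)) = 0.5
```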

  7. Estimation of interplate coupling along Nankai trough considering the block motion model based on onland GNSS and seafloor GPS/A observation data using MCMC method

    NASA Astrophysics Data System (ADS)

    Kimura, H.; Ito, T.; Tadokoro, K.

    2017-12-01

    Introduction: In southwest Japan, the Philippine Sea plate is subducting beneath overriding plates such as the Amurian plate, and mega interplate earthquakes have occurred at intervals of about 100 years. No mega interplate earthquake has occurred in southwest Japan since the last events along the Nankai trough in 1944 and 1946, although about 70 years have passed, meaning that strain has been accumulating at the plate interface. Therefore, it is essential to reveal the interplate coupling more precisely for predicting, or understanding the mechanism of, the next mega interplate earthquake. Recently, seafloor geodetic observation has revealed the detailed interplate coupling distribution in the expected source region of a Nankai trough earthquake (e.g., Yokota et al. [2016]). In this study, we estimated interplate coupling in southwest Japan, considering a block motion model and using seafloor geodetic observation data as well as onland GNSS observation data, based on the Markov Chain Monte Carlo (MCMC) method. Method: Observed crustal deformation is assumed to be the sum of rigid block motion and elastic deformation due to coupling at block boundaries. We modeled this relationship as a non-linear inverse problem in which the unknown parameters are the Euler pole of each block and the coupling at each subfault, and solved for them simultaneously based on the MCMC method. The input data used in this study are 863 onland GNSS observations and 24 seafloor GPS/A observations. We constructed several block division models based on the map of active fault traces and selected the best model, which consists of 12 blocks, based on Akaike's Information Criterion (AIC). Result: We find that the interplate coupling along the Nankai trough has a heterogeneous spatial distribution, being strong at depths of 0 to 20 km off the Tokai region and 0 to 30 km off the Shikoku region. Moreover, we find that the observed crustal deformation off the Tokai region is well explained by elastic deformation due to the subducting Izu Micro Plate. We will present more details of our results, and discuss not only interplate coupling but also rigid block motion, elastic deformation due to inland fault coupling, and the resolution of the estimated parameters.

  8. Phylogeny and biogeography of the amphi-Pacific genus Aphananthe

    PubMed Central

    Yang, Mei-Qing; Li, De-Zhu; Wen, Jun; Yi, Ting-Shuang

    2017-01-01

    Aphananthe is a small genus of five species showing an intriguing amphi-Pacific distribution in eastern, southern and southeastern Asia, Australia, and Mexico, also with one species in Madagascar. The phylogenetic relationships of Aphananthe were reconstructed with two nuclear (ITS & ETS) and two plastid (psbA-trnH & trnL-trnF) regions. Clade divergence times were estimated with a Bayesian approach, and the ancestral areas were inferred using the dispersal-extinction-cladogenesis and Bayesian Binary MCMC analyses. Aphananthe was supported to be monophyletic, with the eastern Asian A. aspera resolved as sister to a clade of the remaining four species. Aphananthe was inferred to have originated in the Late Cretaceous (71.5 mya, with 95% HPD: 66.6–81.3 mya), and the crown age of the genus was dated to be in the early Miocene (19.1 mya, with 95% HPD: 12.4–28.9 mya). The fossil record indicates that Aphananthe was present in the high latitude thermophilic forests in the early Tertiary, and experienced extinctions from the middle Tertiary onwards. Aphananthe originated in Europe based on the inference that included fossil and extant species, but eastern Asia was estimated to be the ancestral area of the clade of the extant species of Aphananthe. Both the West Gondwanan vicariance hypothesis and the boreotropics hypothesis could be excluded as explanation for its amphi-Pacific distribution. Long-distance dispersals out of eastern Asia into North America, southern and southeastern Asia and Australia, and Madagascar during the Miocene account for its wide intercontinental disjunct distribution. PMID:28170425

  9. Enhancing hydrologic data assimilation by evolutionary Particle Filter and Markov Chain Monte Carlo

    NASA Astrophysics Data System (ADS)

    Abbaszadeh, Peyman; Moradkhani, Hamid; Yan, Hongxiang

    2018-01-01

    Particle Filters (PFs) have received increasing attention from researchers in different disciplines, including the hydro-geosciences, as an effective tool to improve model predictions in nonlinear and non-Gaussian dynamical systems. The implementation of dual state and parameter estimation using PFs in hydrology has evolved since 2005 from the PF-SIR (sampling importance resampling) to the PF-MCMC (Markov Chain Monte Carlo), and now to the most effective and robust framework, an evolutionary PF approach based on the Genetic Algorithm (GA) and MCMC, the so-called EPFM. In this framework, the prior distribution undergoes an evolutionary process based on the designed mutation and crossover operators of the GA. The merit of this approach is that the particles move to appropriate positions by using the GA optimization, and the number of effective particles is then increased by means of MCMC, whereby particle degeneracy is avoided and particle diversity is improved. In this study, the usefulness and effectiveness of the proposed EPFM are investigated by applying the technique to a conceptual and highly nonlinear hydrologic model over four river basins located in different climate and geographical regions of the United States. Both synthetic and real case studies demonstrate that the EPFM improves both the state and parameter estimation more effectively and reliably as compared with the PF-MCMC.
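
    A generic resample-move sketch helps picture the idea of rejuvenating particles after resampling: a bootstrap particle filter on a toy nonlinear state-space model, followed by one Metropolis move per particle. The toy model and tuning are illustrative assumptions; the GA-based evolution operators of the EPFM are not reproduced.

```python
# Minimal sketch: bootstrap particle filter with an MCMC rejuvenation move after
# resampling, on a toy nonlinear state-space model. Everything is illustrative.
import numpy as np

rng = np.random.default_rng(7)
T, N = 50, 500                                    # time steps, particles
q, r_obs = 0.5, 0.5                               # process / observation noise std

def f(x, t):                                      # toy nonlinear state transition
    return 0.5 * x + 25 * x / (1 + x**2) + 8 * np.cos(1.2 * t)

# Simulate a "true" trajectory and observations y_t = x_t^2 / 20 + noise
x_true, ys = 0.0, []
for t in range(T):
    x_true = f(x_true, t) + q * rng.standard_normal()
    ys.append(x_true**2 / 20 + r_obs * rng.standard_normal())

particles = rng.normal(0.0, 2.0, N)
estimates = []
for t, y in enumerate(ys):
    mu = f(particles, t)                                  # transition means
    moved = mu + q * rng.standard_normal(N)               # propagate
    logw = -0.5 * ((y - moved**2 / 20) / r_obs) ** 2      # likelihood weights
    w = np.exp(logw - logw.max()); w /= w.sum()
    idx = rng.choice(N, size=N, p=w)                      # resample
    x, m = moved[idx], mu[idx]
    # MCMC rejuvenation: one Metropolis step per particle targeting
    # p(y_t | x_t) * p(x_t | ancestor), which restores particle diversity.
    def log_target(xv):
        return (-0.5 * ((y - xv**2 / 20) / r_obs) ** 2
                - 0.5 * ((xv - m) / q) ** 2)
    prop = x + 0.5 * rng.standard_normal(N)
    accept = np.log(rng.random(N)) < log_target(prop) - log_target(x)
    particles = np.where(accept, prop, x)
    estimates.append(particles.mean())

print("final state estimate:", round(estimates[-1], 3), " truth:", round(x_true, 3))
```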

  10. Comparing hierarchical models via the marginalized deviance information criterion.

    PubMed

    Quintero, Adrian; Lesaffre, Emmanuel

    2018-07-20

    Hierarchical models are extensively used in pharmacokinetics and longitudinal studies. When the estimation is performed from a Bayesian approach, model comparison is often based on the deviance information criterion (DIC). In hierarchical models with latent variables, there are several versions of this statistic: the conditional DIC (cDIC) that incorporates the latent variables in the focus of the analysis and the marginalized DIC (mDIC) that integrates them out. Regardless of the asymptotic and coherency difficulties of cDIC, this alternative is usually used in Markov chain Monte Carlo (MCMC) methods for hierarchical models because of practical convenience. The mDIC criterion is more appropriate in most cases but requires integration of the likelihood, which is computationally demanding and not implemented in Bayesian software. Therefore, we consider a method to compute mDIC by generating replicate samples of the latent variables that need to be integrated out. This alternative can be easily conducted from the MCMC output of Bayesian packages and is widely applicable to hierarchical models in general. Additionally, we propose some approximations in order to reduce the computational complexity for large-sample situations. The method is illustrated with simulated data sets and 2 medical studies, evidencing that cDIC may be misleading whilst mDIC appears pertinent. Copyright © 2018 John Wiley & Sons, Ltd.

  11. A modelling approach to assessing the timescale uncertainties in proxy series with chronological errors

    NASA Astrophysics Data System (ADS)

    Divine, D.; Godtliebsen, F.; Rue, H.

    2012-04-01

    Detailed knowledge of past climate variations is of high importance for gaining a better insight into the possible future climate scenarios. The relative shortness of available high quality instrumental climate data conditions the use of various climate proxy archives in making inference about past climate evolution. It, however, requires an accurate assessment of timescale errors in proxy-based paleoclimatic reconstructions. We here propose an approach to assessment of timescale errors in proxy-based series with chronological uncertainties. The method relies on approximation of the physical process(es) forming a proxy archive by a random Gamma process. Parameters of the process are partly data-driven and partly determined from prior assumptions. For a particular case of a linear accumulation model and absolutely dated tie points an analytical solution is found suggesting the Beta-distributed probability density on age estimates along the length of a proxy archive. In a general situation of uncertainties in the ages of the tie points the proposed method employs MCMC simulations of age-depth profiles yielding empirical confidence intervals on the constructed piecewise linear best guess timescale. It is suggested that the approach can be further extended to a more general case of a time-varying expected accumulation between the tie points. The approach is illustrated by using two ice and two lake/marine sediment cores representing the typical examples of paleoproxy archives with age models constructed using tie points of mixed origin.

  12. Preparation of biocompatible magnetite-carboxymethyl cellulose nanocomposite: characterization of nanocomposite by FTIR, XRD, FESEM and TEM.

    PubMed

    Habibi, Neda

    2014-10-15

    The preparation and characterization of magnetite-carboxymethyl cellulose nano-composite (M-CMC) material is described. Magnetite nano-particles were synthesized by a modified co-precipitation method using ferrous chloride tetrahydrate and ferric chloride hexahydrate in ammonium hydroxide solution. The M-CMC nano-composite particles were synthesized by embedding the magnetite nanoparticles inside carboxymethyl cellulose (CMC) using a freshly prepared mixture of Fe3O4 with CMC precursor. Morphology, particle size, and structural properties of magnetite-carboxymethyl cellulose nano-composite was accomplished using X-ray powder diffraction (XRD), transmission electron microscopy (TEM), Fourier transformed infrared (FTIR) and field emission scanning electron microscopy (FESEM) analysis. As a result, magnetite nano-particles with an average size of 35nm were obtained. The biocompatible Fe3O4-carboxymethyl cellulose nano-composite particles obtained from the natural CMC polymers have a potential range of application in biomedical field. Copyright © 2014 Elsevier B.V. All rights reserved.

  13. Asteroid mass estimation using Markov-chain Monte Carlo

    NASA Astrophysics Data System (ADS)

    Siltala, Lauri; Granvik, Mikael

    2017-11-01

    Estimates for asteroid masses are based on their gravitational perturbations on the orbits of other objects such as Mars, spacecraft, or other asteroids and/or their satellites. In the case of asteroid-asteroid perturbations, this leads to an inverse problem in at least 13 dimensions where the aim is to derive the mass of the perturbing asteroid(s) and six orbital elements for both the perturbing asteroid(s) and the test asteroid(s) based on astrometric observations. We have developed and implemented three different mass estimation algorithms utilizing asteroid-asteroid perturbations: the very rough 'marching' approximation, in which the asteroids' orbital elements are not fitted, thereby reducing the problem to a one-dimensional estimation of the mass, an implementation of the Nelder-Mead simplex method, and most significantly, a Markov-chain Monte Carlo (MCMC) approach. We describe each of these algorithms with particular focus on the MCMC algorithm, and present example results using both synthetic and real data. Our results agree with the published mass estimates, but suggest that the published uncertainties may be misleading as a consequence of using linearized mass-estimation methods. Finally, we discuss remaining challenges with the algorithms as well as future plans.

  14. New Biogeographic insight into Bauhinia s.l. (Leguminosae): integration from fossil records and molecular analyses

    PubMed Central

    2014-01-01

    Background: Given that most species that have ever existed on earth are extinct, it stands to reason that the evolutionary history can be better understood with fossil taxa. Bauhinia is a typical genus of pantropical intercontinental disjunction among the Asian, African, and American continents. Geographic distribution patterns are better recognized when fossil records and molecular sequences are combined in the analyses. Here, we describe a new macrofossil species of Bauhinia from the Upper Miocene Xiaolongtan Formation in Wenshan County, Southeast Yunnan, China, and elucidate the biogeographic significance through the analyses of molecules and fossils. Results: Morphometric analysis demonstrates that the leaf shapes of B. acuminata, B. championii, B. chalcophylla, B. purpurea, and B. podopetala closely resemble the leaf shapes of the new finding fossil. Phylogenetic relationships among the Bauhinia species were reconstructed using maximum parsimony and Bayesian inference, which inferred that species in Bauhinia species are well-resolved into three main groups. Divergence times were estimated by the Bayesian Markov chain Monte Carlo (MCMC) method under a relaxed clock, and inferred that the stem diversification time of Bauhinia was ca. 62.7 Ma. The Asian lineage first diverged at ca. 59.8 Ma, followed by divergence of the Africa lineage starting during the late Eocene, whereas that of the neotropical lineage starting during the middle Miocene. Conclusions: Hypotheses relying on vicariance or continental history to explain pantropical disjunct distributions are dismissed because they require mostly Palaeogene and older tectonic events. We suggest that Bauhinia originated in the middle Paleocene in Laurasia, probably in Asia, implying a possible Tethys Seaway origin or an “Out of Tropical Asia”, and dispersal of legumes. Its present pantropical disjunction resulted from disruption of the boreotropical flora by climatic cooling after the Paleocene-Eocene Thermal Maximum (PETM). North Atlantic land bridges (NALB) seem the most plausible route for migration of Bauhinia from Asia to America; and additional aspects of the Bauhinia species distribution are explained by migration and long distance dispersal (LDD) from Eurasia to the African and American continents. PMID:25288346

  15. New Biogeographic insight into Bauhinia s.l. (Leguminosae): integration from fossil records and molecular analyses.

    PubMed

    Meng, Hong-Hu; Jacques, Frédéric Mb; Su, Tao; Huang, Yong-Jiang; Zhang, Shi-Tao; Ma, Hong-Jie; Zhou, Zhe-Kun

    2014-08-10

    Given that most species that have ever existed on earth are extinct, it stands to reason that the evolutionary history can be better understood with fossil taxa. Bauhinia is a typical genus of pantropical intercontinental disjunction among the Asian, African, and American continents. Geographic distribution patterns are better recognized when fossil records and molecular sequences are combined in the analyses. Here, we describe a new macrofossil species of Bauhinia from the Upper Miocene Xiaolongtan Formation in Wenshan County, Southeast Yunnan, China, and elucidate the biogeographic significance through the analyses of molecules and fossils. Morphometric analysis demonstrates that the leaf shapes of B. acuminata, B. championii, B. chalcophylla, B. purpurea, and B. podopetala closely resemble the leaf shapes of the new finding fossil. Phylogenetic relationships among the Bauhinia species were reconstructed using maximum parsimony and Bayesian inference, which inferred that species in Bauhinia species are well-resolved into three main groups. Divergence times were estimated by the Bayesian Markov chain Monte Carlo (MCMC) method under a relaxed clock, and inferred that the stem diversification time of Bauhinia was ca. 62.7 Ma. The Asian lineage first diverged at ca. 59.8 Ma, followed by divergence of the Africa lineage starting during the late Eocene, whereas that of the neotropical lineage starting during the middle Miocene. Hypotheses relying on vicariance or continental history to explain pantropical disjunct distributions are dismissed because they require mostly Palaeogene and older tectonic events. We suggest that Bauhinia originated in the middle Paleocene in Laurasia, probably in Asia, implying a possible Tethys Seaway origin or an "Out of Tropical Asia", and dispersal of legumes. Its present pantropical disjunction resulted from disruption of the boreotropical flora by climatic cooling after the Paleocene-Eocene Thermal Maximum (PETM). North Atlantic land bridges (NALB) seem the most plausible route for migration of Bauhinia from Asia to America; and additional aspects of the Bauhinia species distribution are explained by migration and long distance dispersal (LDD) from Eurasia to the African and American continents.

  16. Learning Weight Uncertainty with Stochastic Gradient MCMC for Shape Classification

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Chunyuan; Stevens, Andrew J.; Chen, Changyou

    2016-08-10

    Learning the representation of shape cues in 2D & 3D objects for recognition is a fundamental task in computer vision. Deep neural networks (DNNs) have shown promising performance on this task. Due to the large variability of shapes, accurate recognition relies on good estimates of model uncertainty, ignored in traditional training of DNNs, typically learned via stochastic optimization. This paper leverages recent advances in stochastic gradient Markov Chain Monte Carlo (SG-MCMC) to learn weight uncertainty in DNNs. It yields principled Bayesian interpretations for the commonly used Dropout/DropConnect techniques and incorporates them into the SG-MCMC framework. Extensive experiments on 2D & 3D shape datasets and various DNN models demonstrate the superiority of the proposed approach over stochastic optimization. Our approach yields higher recognition accuracy when used in conjunction with Dropout and Batch-Normalization.
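
    As a concrete illustration of the SG-MCMC family referenced above, the sketch below implements stochastic gradient Langevin dynamics (SGLD), one of its simplest members, on a toy two-parameter posterior. The network, minibatch gradient, and step size of the paper are replaced here by a stand-in gradient function and illustrative settings; this is a minimal sketch, not the authors' implementation.

      import numpy as np

      def sgld_step(theta, grad_log_post, step_size, rng):
          """One stochastic gradient Langevin dynamics (SGLD) update.

          grad_log_post should return a (possibly noisy) estimate of the gradient
          of the log posterior, e.g. computed on a minibatch and rescaled by
          N / batch_size in a real DNN setting.
          """
          noise = rng.normal(scale=np.sqrt(step_size), size=theta.shape)
          return theta + 0.5 * step_size * grad_log_post(theta) + noise

      # Toy usage: sample a 2-D standard normal posterior, whose log-density
      # gradient is simply -theta. Sample means/stds should approach 0 and 1.
      rng = np.random.default_rng(0)
      theta, samples = np.zeros(2), []
      for _ in range(20000):
          theta = sgld_step(theta, lambda th: -th, step_size=1e-2, rng=rng)
          samples.append(theta.copy())
      print(np.mean(samples, axis=0), np.std(samples, axis=0))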

  17. Lotic ecosystem response to chronic metal contamination assessed by the resazurin-resorufin smart tracer with data assimilation by the Markov chain Monte Carlo method

    NASA Astrophysics Data System (ADS)

    Stanaway, D. J.; Flores, A. N.; Haggerty, R.; Benner, S. G.; Feris, K. P.

    2011-12-01

    Concurrent assessment of biogeochemical and solute transport data (i.e. advection, dispersion, transient storage) within lotic systems remains a challenge in eco-hydrological research. Recently, the Resazurin-Resorufin Smart Tracer System (RRST) was proposed as a mechanism to measure microbial activity at the sediment-water interface [Haggerty et al., 2008, 2009] associating metabolic and hydrologic processes and allowing for the reach-scale extrapolation of biotic function in the context of a dynamic physical environment. This study presents a Markov Chain Monte Carlo (MCMC) data assimilation technique to solve the inverse model of the Raz-Rru Advection-Dispersion Equation (RRADE). The RRADE is a suite of dependent 1-D reactive ADEs, associated through the microbially mediated reduction of Raz to Rru (k12). This reduction is proportional to DO consumption (R^2=0.928). MCMC is a suite of algorithms that solve Bayes' theorem to condition uncertain model states and parameters on imperfect observations. Here, the RRST is employed to quantify the effect of chronic metal exposure on hyporheic microbial metabolism along a 100+ year-old metal contamination gradient in the Clark Fork River (CF). We hypothesized that 1) the energetic cost of metal tolerance limits heterotrophic microbial respiration in communities evolved in chronic metal-contaminated environments, with respiration inhibition directly correlated to degree of contamination (observational experiment) and 2) when experiencing acute metal stress, respiration rate inhibition of metal tolerant communities is less than that of naïve communities (manipulative experiment). To test these hypotheses, 4 replicate columns containing sediment collected from differently contaminated CF reaches and reference sites were fed a solution of RRST, NaCl, and cadmium (manipulative experiment only) within 24 hrs post collection. Column effluent was collected and measured for Raz, Rru, and EC to determine the Raz-Rru breakthrough curves (BTC), subsequently modeled by the RRADE and thereby allowing derivation of in situ rates of metabolism. RRADE parameter values are estimated through Metropolis-Hastings MCMC optimization. Unknown prior parameter distributions (PD) were constrained via a sensitivity analysis, except for the empirically estimated velocity. MCMC simulations were initiated at random points within the PD. Convergence of target distributions (TD) is achieved when the variance of the mode values of the six RRADE parameters in independent model replication is at least 10^{-3} less than the mode value. Convergence of k12, the parameter of interest, was more resolved, with modal variance of replicate simulations ranging from 10^{-4} less than the modal value to 0. The MCMC algorithm presented here offers a robust approach to solve the inverse RRST model and could be easily adapted to other inverse problems.
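
    The core of the data-assimilation step described above is a random-walk Metropolis-Hastings sampler over the model parameters. The sketch below shows that generic accept/reject loop; the Gaussian log posterior and the two-parameter "forward model" are placeholders for the actual RRADE solver and breakthrough-curve data, which are not reproduced here.

      import numpy as np

      def metropolis_hastings(log_posterior, theta0, n_iter, step, rng):
          """Random-walk Metropolis-Hastings over a parameter vector."""
          theta = np.asarray(theta0, dtype=float)
          logp = log_posterior(theta)
          chain = np.empty((n_iter, theta.size))
          for i in range(n_iter):
              proposal = theta + rng.normal(scale=step, size=theta.size)
              logp_prop = log_posterior(proposal)
              # Accept with probability min(1, posterior ratio).
              if np.log(rng.uniform()) < logp_prop - logp:
                  theta, logp = proposal, logp_prop
              chain[i] = theta
          return chain

      # Placeholder posterior: Gaussian misfit to two "observations" around a
      # stand-in forward model, with flat priors on [0, 10] for each parameter.
      def log_posterior(theta, obs=np.array([1.0, 0.5]), sigma=0.1):
          if np.any(theta < 0) or np.any(theta > 10):
              return -np.inf
          pred = theta  # a real application would run the forward model here
          return -0.5 * np.sum(((obs - pred) / sigma) ** 2)

      rng = np.random.default_rng(1)
      chain = metropolis_hastings(log_posterior, [5.0, 5.0], 20000, 0.2, rng)
      print(chain[5000:].mean(axis=0))  # posterior means after burn-in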

  18. Inference of Functionally-Relevant N-acetyltransferase Residues Based on Statistical Correlations.

    PubMed

    Neuwald, Andrew F; Altschul, Stephen F

    2016-12-01

    Over evolutionary time, members of a superfamily of homologous proteins sharing a common structural core diverge into subgroups filling various functional niches. At the sequence level, such divergence appears as correlations that arise from residue patterns distinct to each subgroup. Such a superfamily may be viewed as a population of sequences corresponding to a complex, high-dimensional probability distribution. Here we model this distribution as hierarchical interrelated hidden Markov models (hiHMMs), which describe these sequence correlations implicitly. By characterizing such correlations one may hope to obtain information regarding functionally-relevant properties that have thus far evaded detection. To do so, we infer a hiHMM distribution from sequence data using Bayes' theorem and Markov chain Monte Carlo (MCMC) sampling, which is widely recognized as the most effective approach for characterizing a complex, high dimensional distribution. Other routines then map correlated residue patterns to available structures with a view to hypothesis generation. When applied to N-acetyltransferases, this reveals sequence and structural features indicative of functionally important, yet generally unknown biochemical properties. Even for sets of proteins for which nothing is known beyond unannotated sequences and structures, this can lead to helpful insights. We describe, for example, a putative coenzyme-A-induced-fit substrate binding mechanism mediated by arginine residue switching between salt bridge and π-π stacking interactions. A suite of programs implementing this approach is available (psed.igs.umaryland.edu).

  19. Time-varying nonstationary multivariate risk analysis using a dynamic Bayesian copula

    NASA Astrophysics Data System (ADS)

    Sarhadi, Ali; Burn, Donald H.; Concepción Ausín, María.; Wiper, Michael P.

    2016-03-01

    A time-varying risk analysis is proposed for an adaptive design framework in nonstationary conditions arising from climate change. A Bayesian, dynamic conditional copula is developed for modeling the time-varying dependence structure between mixed continuous and discrete multiattributes of multidimensional hydrometeorological phenomena. Joint Bayesian inference is carried out to fit the marginals and copula in an illustrative example using an adaptive, Gibbs Markov Chain Monte Carlo (MCMC) sampler. Posterior mean estimates and credible intervals are provided for the model parameters and the Deviance Information Criterion (DIC) is used to select the model that best captures different forms of nonstationarity over time. This study also introduces a fully Bayesian, time-varying joint return period for multivariate time-dependent risk analysis in nonstationary environments. The results demonstrate that the nature and the risk of extreme-climate multidimensional processes are changed over time under the impact of climate change, and accordingly the long-term decision making strategies should be updated based on the anomalies of the nonstationary environment.

  20. Phylogenetic analysis and victim contact tracing of rabies virus from humans and dogs in Bali, Indonesia.

    PubMed

    Mahardika, G N K; Dibia, N; Budayanti, N S; Susilawathi, N M; Subrata, K; Darwinata, A E; Wignall, F S; Richt, J A; Valdivia-Granda, W A; Sudewi, A A R

    2014-06-01

    The emergence of human and animal rabies in Bali since November 2008 has attracted local, national and international interest. The potential origin and time of introduction of rabies virus to Bali is described. The nucleoprotein (N) gene of rabies virus from dog brain and human clinical specimens was sequenced using an automated DNA sequencer. Phylogenetic inference with Bayesian Markov Chain Monte Carlo (MCMC) analysis using the Bayesian Evolutionary Analysis by Sampling Trees (BEAST) v. 1.7.5 software confirmed that the outbreak of rabies in Bali was caused by an Indonesian lineage virus following a single introduction. The ancestor of Bali viruses was the descendant of a virus from Kalimantan. Contact tracing showed that the event most likely occurred in early 2008. The introduction of rabies into a large unvaccinated dog population in Bali clearly demonstrates the risk of disease transmission for government agencies and should lead to an increased preparedness and efforts for sustained risk reduction to prevent such events from occurring in future.

  1. Extreme-Scale Bayesian Inference for Uncertainty Quantification of Complex Simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Biros, George

    Uncertainty quantification (UQ)—that is, quantifying uncertainties in complex mathematical models and their large-scale computational implementations—is widely viewed as one of the outstanding challenges facing the field of CS&E over the coming decade. The EUREKA project set out to address the most difficult class of UQ problems: those for which both the underlying PDE model and the uncertain parameters are of extreme scale. In the project we worked on these extreme-scale challenges in the following four areas: 1. Scalable parallel algorithms for sampling and characterizing the posterior distribution that exploit the structure of the underlying PDEs and parameter-to-observable map. These include structure-exploiting versions of the randomized maximum likelihood method, which aims to overcome the intractability of employing conventional MCMC methods for solving extreme-scale Bayesian inversion problems by appealing to and adapting ideas from large-scale PDE-constrained optimization, which have been very successful at exploring high-dimensional spaces. 2. Scalable parallel algorithms for construction of prior and likelihood functions based on learning methods and non-parametric density estimation. Constructing problem-specific priors remains a critical challenge in Bayesian inference, and more so in high dimensions. Another challenge is construction of likelihood functions that capture unmodeled couplings between observations and parameters. We will create parallel algorithms for non-parametric density estimation using high dimensional N-body methods and combine them with supervised learning techniques for the construction of priors and likelihood functions. 3. Bayesian inadequacy models, which augment physics models with stochastic models that represent their imperfections. The success of the Bayesian inference framework depends on the ability to represent the uncertainty due to imperfections of the mathematical model of the phenomena of interest. This is a central challenge in UQ, especially for large-scale models. We propose to develop the mathematical tools to address these challenges in the context of extreme-scale problems. 4. Parallel scalable algorithms for Bayesian optimal experimental design (OED). Bayesian inversion yields quantified uncertainties in the model parameters, which can be propagated forward through the model to yield uncertainty in outputs of interest. This opens the way for designing new experiments to reduce the uncertainties in the model parameters and model predictions. Such experimental design problems have been intractable for large-scale problems using conventional methods; we will create OED algorithms that exploit the structure of the PDE model and the parameter-to-output map to overcome these challenges. Parallel algorithms for these four problems were created, analyzed, prototyped, implemented, tuned, and scaled up for leading-edge supercomputers, including UT-Austin’s own 10 petaflops Stampede system, ANL’s Mira system, and ORNL’s Titan system. While our focus is on fundamental mathematical/computational methods and algorithms, we will assess our methods on model problems derived from several DOE mission applications, including multiscale mechanics and ice sheet dynamics.

  2. Safety performance of traffic phases and phase transitions in three phase traffic theory.

    PubMed

    Xu, Chengcheng; Liu, Pan; Wang, Wei; Li, Zhibin

    2015-12-01

    Crash risk prediction models were developed to link safety to various phases and phase transitions defined by the three phase traffic theory. Results of the Bayesian conditional logit analysis showed that different traffic states differed distinctly with respect to safety performance. The random-parameter logit approach was utilized to account for the heterogeneity caused by unobserved factors. The Bayesian inference approach based on the Markov Chain Monte Carlo (MCMC) method was used for the estimation of the random-parameter logit model. The proposed approach increased the prediction performance of the crash risk models as compared with the conventional logit model. The three phase traffic theory can help us better understand the mechanism of crash occurrences in various traffic states. The contributing factors to crash likelihood can be well explained by the mechanism of phase transitions. We further discovered that the free flow state can be divided into two sub-phases on the basis of safety performance, including a true free flow state in which the interactions between vehicles are minor, and a platooned traffic state in which bunched vehicles travel in successions. The results of this study suggest that a safety perspective can be added to the three phase traffic theory. The results also suggest that the heterogeneity between different traffic states should be considered when estimating the risks of crash occurrences on freeways. Copyright © 2015 Elsevier Ltd. All rights reserved.

  3. On the validity of cosmological Fisher matrix forecasts

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wolz, Laura; Kilbinger, Martin; Weller, Jochen

    2012-09-01

    We present a comparison of Fisher matrix forecasts for cosmological probes with Monte Carlo Markov Chain (MCMC) posterior likelihood estimation methods. We analyse the performance of future Dark Energy Task Force (DETF) stage-III and stage-IV dark-energy surveys using supernovae, baryon acoustic oscillations and weak lensing as probes. We concentrate in particular on the dark-energy equation of state parameters w_0 and w_a. For purely geometrical probes, and especially when marginalising over w_a, we find considerable disagreement between the two methods, since in this case the Fisher matrix cannot reproduce the highly non-elliptical shape of the likelihood function. More quantitatively, the Fisher method underestimates the marginalized errors for purely geometrical probes by 30%-70%. For cases including structure formation such as weak lensing, we find that the posterior probability contours from the Fisher matrix estimation are in good agreement with the MCMC contours, with the forecasted errors changing only at the 5% level. We then explore non-linear transformations resulting in physically-motivated parameters and investigate whether these parameterisations exhibit a Gaussian behaviour. We conclude that for the purely geometrical probes and, more generally, in cases where it is not known whether the likelihood is close to Gaussian, the Fisher matrix is not the appropriate tool to produce reliable forecasts.
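
    To make the comparison concrete, the short sketch below shows how marginalized Fisher forecasts are read off the inverse Fisher matrix, and how they differ from conditional errors; when the true posterior is strongly non-Gaussian (as for the purely geometrical probes above), neither need agree with the MCMC contours. The 3x3 matrix is purely illustrative and is not taken from any DETF forecast.

      import numpy as np

      # Hypothetical Fisher matrix for (w_0, w_a, Omega_m); values are illustrative only.
      F = np.array([[ 40.0, -12.0,   8.0],
                    [-12.0,   6.0,  -3.0],
                    [  8.0,  -3.0,  20.0]])

      cov = np.linalg.inv(F)  # Gaussian approximation of the posterior covariance
      marginalized_sigma = np.sqrt(np.diag(cov))      # other parameters marginalized over
      conditional_sigma = 1.0 / np.sqrt(np.diag(F))   # other parameters held fixed

      for name, s_m, s_c in zip(["w_0", "w_a", "Om"], marginalized_sigma, conditional_sigma):
          print(f"{name}: marginalized sigma = {s_m:.3f}, conditional sigma = {s_c:.3f}")

    The marginalized errors are always at least as large as the conditional ones; parameter degeneracies such as that between w_0 and w_a make the difference substantial.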

  4. Iterative Importance Sampling Algorithms for Parameter Estimation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Grout, Ray W; Morzfeld, Matthias; Day, Marcus S.

    In parameter estimation problems one computes a posterior distribution over uncertain parameters defined jointly by a prior distribution, a model, and noisy data. Markov chain Monte Carlo (MCMC) is often used for the numerical solution of such problems. An alternative to MCMC is importance sampling, which can exhibit near perfect scaling with the number of cores on high performance computing systems because samples are drawn independently. However, finding a suitable proposal distribution is a challenging task. Several sampling algorithms have been proposed over the past years that take an iterative approach to constructing a proposal distribution. We investigate the applicability of such algorithms by applying them to two realistic and challenging test problems, one in subsurface flow, and one in combustion modeling. More specifically, we implement importance sampling algorithms that iterate over the mean and covariance matrix of Gaussian or multivariate t-proposal distributions. Our implementation leverages massively parallel computers, and we present strategies to initialize the iterations using 'coarse' MCMC runs or Gaussian mixture models.
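
    A minimal serial version of such an iteration is sketched below: draw from a Gaussian proposal, weight the draws by the target-to-proposal density ratio, and moment-match the proposal mean and covariance to the weighted sample. The two-dimensional toy target, the sample sizes, and the omission of the mixture/t proposals and coarse-MCMC initialization discussed above are all simplifying assumptions.

      import numpy as np

      def log_target(x):
          # Toy unnormalized posterior: a correlated 2-D Gaussian.
          mu = np.array([1.0, -2.0])
          prec = np.array([[2.0, 0.6], [0.6, 1.0]])
          d = x - mu
          return -0.5 * np.einsum("ni,ij,nj->n", d, prec, d)

      rng = np.random.default_rng(0)
      mean, cov = np.zeros(2), 4.0 * np.eye(2)  # deliberately poor initial proposal

      for it in range(10):
          x = rng.multivariate_normal(mean, cov, size=5000)  # independent draws
          d = x - mean
          log_q = -0.5 * np.einsum("ni,ij,nj->n", d, np.linalg.inv(cov), d)
          log_w = log_target(x) - log_q             # normalizing constants cancel below
          w = np.exp(log_w - log_w.max())
          w /= w.sum()
          mean = w @ x                              # moment-match the proposal mean
          cov = (x - mean).T @ ((x - mean) * w[:, None])   # ...and covariance
          ess = 1.0 / np.sum(w ** 2)                # effective sample size
          print(f"iteration {it}: ESS = {ess:.0f}, mean = {mean.round(3)}")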

  5. Towards a formal genealogical classification of the Lezgian languages (North Caucasus): testing various phylogenetic methods on lexical data.

    PubMed

    Kassian, Alexei

    2015-01-01

    A lexicostatistical classification is proposed for 20 languages and dialects of the Lezgian group of the North Caucasian family, based on meticulously compiled 110-item wordlists, published as part of the Global Lexicostatistical Database project. The lexical data have been subsequently analyzed with the aid of the principal phylogenetic methods, both distance-based and character-based: Starling neighbor joining (StarlingNJ), Neighbor joining (NJ), Unweighted pair group method with arithmetic mean (UPGMA), Bayesian Markov chain Monte Carlo (MCMC), Unweighted maximum parsimony (UMP). Cognation indexes within the input matrix were marked by two different algorithms: traditional etymological approach and phonetic similarity, i.e., the automatic method of consonant classes (Levenshtein distances). Due to certain reasons (first of all, high lexicographic quality of the wordlists and a consensus about the Lezgian phylogeny among Caucasologists), the Lezgian database is a perfect testing area for appraisal of phylogenetic methods. For the etymology-based input matrix, all the phylogenetic methods, with the possible exception of UMP, have yielded trees that are sufficiently compatible with each other to generate a consensus phylogenetic tree of the Lezgian lects. The obtained consensus tree agrees with the traditional expert classification as well as some of the previously proposed formal classifications of this linguistic group. Contrary to theoretical expectations, the UMP method has suggested the least plausible tree of all. In the case of the phonetic similarity-based input matrix, the distance-based methods (StarlingNJ, NJ, UPGMA) have produced the trees that are rather close to the consensus etymology-based tree and the traditional expert classification, whereas the character-based methods (Bayesian MCMC, UMP) have yielded less likely topologies.

  6. Towards a Formal Genealogical Classification of the Lezgian Languages (North Caucasus): Testing Various Phylogenetic Methods on Lexical Data

    PubMed Central

    Kassian, Alexei

    2015-01-01

    A lexicostatistical classification is proposed for 20 languages and dialects of the Lezgian group of the North Caucasian family, based on meticulously compiled 110-item wordlists, published as part of the Global Lexicostatistical Database project. The lexical data have been subsequently analyzed with the aid of the principal phylogenetic methods, both distance-based and character-based: Starling neighbor joining (StarlingNJ), Neighbor joining (NJ), Unweighted pair group method with arithmetic mean (UPGMA), Bayesian Markov chain Monte Carlo (MCMC), Unweighted maximum parsimony (UMP). Cognation indexes within the input matrix were marked by two different algorithms: traditional etymological approach and phonetic similarity, i.e., the automatic method of consonant classes (Levenshtein distances). Due to certain reasons (first of all, high lexicographic quality of the wordlists and a consensus about the Lezgian phylogeny among Caucasologists), the Lezgian database is a perfect testing area for appraisal of phylogenetic methods. For the etymology-based input matrix, all the phylogenetic methods, with the possible exception of UMP, have yielded trees that are sufficiently compatible with each other to generate a consensus phylogenetic tree of the Lezgian lects. The obtained consensus tree agrees with the traditional expert classification as well as some of the previously proposed formal classifications of this linguistic group. Contrary to theoretical expectations, the UMP method has suggested the least plausible tree of all. In the case of the phonetic similarity-based input matrix, the distance-based methods (StarlingNJ, NJ, UPGMA) have produced the trees that are rather close to the consensus etymology-based tree and the traditional expert classification, whereas the character-based methods (Bayesian MCMC, UMP) have yielded less likely topologies. PMID:25719456

  7. Single neuron modeling and data assimilation in BNST neurons

    NASA Astrophysics Data System (ADS)

    Farsian, Reza

    Neurons, although tiny in size, are vastly complicated systems, which are responsible for the most basic yet essential functions of any nervous system. Even the simplest models of single neurons are usually high dimensional, nonlinear, and contain many parameters and states which are unobservable in a typical neurophysiological experiment. One of the most fundamental problems in experimental neurophysiology is the estimation of these parameters and states, since knowing their values is essential in identification, model construction, and forward prediction of biological neurons. Common methods of parameter and state estimation do not perform well for neural models due to their high dimensionality and nonlinearity. In this dissertation, two alternative approaches for parameter and state estimation of biological neurons have been demonstrated: dynamical parameter estimation (DPE) and a Markov Chain Monte Carlo (MCMC) method. The first method uses elements of chaos control and synchronization theory for parameter and state estimation. MCMC is a statistical approach which uses a path integral formulation to evaluate a mean and an error bound for these unobserved parameters and states. These methods have been applied to biological neurons in the Bed Nucleus of the Stria Terminalis (BNST) of rats. States and parameters of the neurons were estimated with both approaches, and their values were used to recreate a realistic model and successfully predict the behavior of the neurons. The knowledge of biological parameters can ultimately provide a better understanding of the internal dynamics of a neuron in order to build robust models of neuron networks.

  8. LATTE Linking Acoustic Tests and Tagging Using Statistical Estimation

    DTIC Science & Technology

    2015-09-30

    the complexity of the model: (from simplest to most complex) Kalman filter, Markov chain Monte-Carlo (MCMC) and ABC. Many of these methods have been...using SMMs fitted using Kalman filters. Therefore, using the DTAG data, we can estimate the distributions associated with 2D horizontal displacement...speed (a key problem in the previous Kalman filter implementation). This new approach also allows the animal’s horizontal movement direction to differ

  9. Asteroid mass estimation using Markov-Chain Monte Carlo techniques

    NASA Astrophysics Data System (ADS)

    Siltala, Lauri; Granvik, Mikael

    2016-10-01

    Estimates for asteroid masses are based on their gravitational perturbations on the orbits of other objects such as Mars, spacecraft, or other asteroids and/or their satellites. In the case of asteroid-asteroid perturbations, this leads to a 13-dimensional inverse problem where the aim is to derive the mass of the perturbing asteroid and six orbital elements for both the perturbing asteroid and the test asteroid using astrometric observations. We have developed and implemented three different mass estimation algorithms utilizing asteroid-asteroid perturbations into the OpenOrb asteroid-orbit-computation software: the very rough 'marching' approximation, in which the asteroid orbits are fixed at a given epoch, reducing the problem to a one-dimensional estimation of the mass, an implementation of the Nelder-Mead simplex method, and most significantly, a Markov-Chain Monte Carlo (MCMC) approach. We will introduce each of these algorithms with particular focus on the MCMC algorithm, and present example results for both synthetic and real data. Our results agree with the published mass estimates, but suggest that the published uncertainties may be misleading as a consequence of using linearized mass-estimation methods. Finally, we discuss remaining challenges with the algorithms as well as future plans, particularly in connection with ESA's Gaia mission.

  10. Efficient estimation of ideal-observer performance in classification tasks involving high-dimensional complex backgrounds

    PubMed Central

    Park, Subok; Clarkson, Eric

    2010-01-01

    The Bayesian ideal observer is optimal among all observers and sets an absolute upper bound for the performance of any observer in classification tasks [Van Trees, Detection, Estimation, and Modulation Theory, Part I (Academic, 1968)]. Therefore, the ideal observer should be used for objective image quality assessment whenever possible. However, computation of ideal-observer performance is difficult in practice because this observer requires the full description of the unknown statistical properties of high-dimensional, complex data arising in real-life problems. Previously, Markov-chain Monte Carlo (MCMC) methods were developed by Kupinski et al. [J. Opt. Soc. Am. A 20, 430 (2003)] and by Park et al. [J. Opt. Soc. Am. A 24, B136 (2007) and IEEE Trans. Med. Imaging 28, 657 (2009)] to estimate the performance of the ideal observer and the channelized ideal observer (CIO), respectively, in classification tasks involving non-Gaussian random backgrounds. However, both algorithms had the disadvantage of long computation times. We propose a fast MCMC for real-time estimation of the likelihood ratio for the CIO. Our simulation results show that our method has the potential to speed up the estimation of ideal-observer performance in tasks involving complex data when efficient channels are used for the CIO. PMID:19884916

  11. Hierarchical models and the analysis of bird survey information

    USGS Publications Warehouse

    Sauer, J.R.; Link, W.A.

    2003-01-01

    Management of birds often requires analysis of collections of estimates. We describe a hierarchical modeling approach to the analysis of these data, in which parameters associated with the individual species estimates are treated as random variables, and probability statements are made about the species parameters conditioned on the data. A Markov-Chain Monte Carlo (MCMC) procedure is used to fit the hierarchical model. This approach is computer intensive, and is based upon simulation. MCMC allows for estimation both of parameters and of derived statistics. To illustrate the application of this method, we use the case in which we are interested in attributes of a collection of estimates of population change. Using data for 28 species of grassland-breeding birds from the North American Breeding Bird Survey, we estimate the number of species with increasing populations, provide precision-adjusted rankings of species trends, and describe a measure of population stability as the probability that the trend for a species is within a certain interval. Hierarchical models can be applied to a variety of bird survey applications, and we are investigating their use in estimation of population change from survey data.
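
    Because the derived quantities described above (the number of increasing species, precision-adjusted rankings, and the probability that a trend falls within a stability interval) are simple functions of the posterior draws, they can be computed directly from MCMC output. The sketch below assumes a synthetic matrix of posterior trend draws in place of actual Breeding Bird Survey output, and the ±0.5 %/yr stability interval is an illustrative choice.

      import numpy as np

      rng = np.random.default_rng(2)
      # Synthetic stand-in for MCMC output: 4000 posterior draws of the trend
      # (% change per year) for each of 28 species.
      draws = rng.normal(loc=rng.normal(-1.0, 1.5, size=28), scale=0.8, size=(4000, 28))

      # Derived statistics computed draw-by-draw from the joint posterior.
      n_increasing = (draws > 0).sum(axis=1)                    # increasing species per draw
      p_increase = (draws > 0).mean(axis=0)                     # per-species P(trend > 0)
      p_stable = ((draws > -0.5) & (draws < 0.5)).mean(axis=0)  # P(trend within +/-0.5 %/yr)

      print("posterior mean number of increasing species:", n_increasing.mean())
      print("species ranked by P(increasing):", np.argsort(-p_increase)[:5])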

  12. HMO selection and Medicare costs: Bayesian MCMC estimation of a robust panel data tobit model with survival.

    PubMed

    Hamilton, B H

    1999-08-01

    The fraction of US Medicare recipients enrolled in health maintenance organizations (HMOs) has increased substantially over the past 10 years. However, the impact of HMOs on health care costs is still hotly debated. In particular, it is argued that HMOs achieve cost reduction through 'cream-skimming' and enrolling relatively healthy patients. This paper develops a Bayesian panel data tobit model of HMO selection and Medicare expenditures for recent US retirees that accounts for mortality over the course of the panel. The model is estimated using Markov Chain Monte Carlo (MCMC) simulation methods, and is novel in that a multivariate t-link is used in place of normality to allow for the heavy-tailed distributions often found in health care expenditure data. The findings indicate that HMOs select individuals who are less likely to have positive health care expenditures prior to enrollment. However, there is no evidence that HMOs disenrol high cost patients. The results also indicate the importance of accounting for survival over the panel, since high mortality probabilities are associated with higher health care expenditures in the last year of life.

  13. Diversity, Relationships, and Biogeography of the Lambeosaurine Dinosaurs from the European Archipelago, with Description of the New Aralosaurin Canardia garonnensis

    PubMed Central

    Prieto-Márquez, Albert; Dalla Vecchia, Fabio M.; Gaete, Rodrigo; Galobart, Àngel

    2013-01-01

    We provide a thorough re-evaluation of the taxonomic diversity, phylogenetic relationships, and historical biogeography of the lambeosaurine hadrosaurids from the European Archipelago. Previously published occurrences of European Lambeosaurinae are reviewed and new specimens collected from upper Maastrichtian strata of the south-central Pyrenees are described. No support is found for the recognition of European saurolophines in the available hadrosaurid materials recovered so far from this area. A new genus and species of basal lambeosaurine, Canardia garonnensis, is described on the basis of cranial and appendicular elements collected from upper Maastrichtian strata of southern France. C. garonnensis differs from all other hadrosaurids, except Aralosaurus tuberiferus, in having maxilla with prominent subrectangular rostrodorsal flange; it differs from A. tuberiferus in a few maxillary and prefrontal characters. Together with A. tuberiferus, C. garonnensis integrates the newly recognized tribe Aralosaurini. Inference of lambeosaurine interrelationships via maximum parsimony analysis indicates that the other three known European lambeosaurines are representatives of two additional subclades (tribes) of these hadrosaurids: Tsintaosaurini (Pararhabdodon isonensis) and Lambeosaurini (the Arenysaurus ardevoli-Blasisaurus canudoi clade). The tribes Aralosaurini, Tsintaosaurini, Lambeosaurini, and Parasaurolophini are formally defined and diagnosed for the first time. Three event-based quantitative methods of ancestral range reconstruction were implemented to infer the historical biogeography of European lambeosaurines: Dispersal-Vicariance Analysis, Bayesian Binary MCMC, and Dispersal-Extinction-Cladogenesis. The results of these analyses, coupled with the absence of pre-Maastrichtian lambeosaurines in the Mesozoic vertebrate fossil record of Europe, favor the hypothesis that aralosaurins and tsintaosaurins were Asian immigrants that reached the Ibero-Armorican island via dispersal events sometime during the Maastrichtian. Less conclusive is the biogeographical history of European lambeosaurins; several scenarios, occurring sometime during the Maastrichtian, are possible, from vicariance leading to the splitting of Asian or North American from European ranges to a dispersal event from North America to the European Archipelago. PMID:23922815

  14. Evaluating impacts using a BACI design, ratios, and a Bayesian approach with a focus on restoration.

    PubMed

    Conner, Mary M; Saunders, W Carl; Bouwes, Nicolaas; Jordan, Chris

    2015-10-01

    Before-after-control-impact (BACI) designs are an effective method to evaluate natural and human-induced perturbations on ecological variables when treatment sites cannot be randomly chosen. While effect sizes of interest can be tested with frequentist methods, using Bayesian Markov chain Monte Carlo (MCMC) sampling methods, probabilities of effect sizes, such as a ≥20 % increase in density after restoration, can be directly estimated. Although BACI and Bayesian methods are used widely for assessing natural and human-induced impacts for field experiments, the application of hierarchal Bayesian modeling with MCMC sampling to BACI designs is less common. Here, we combine these approaches and extend the typical presentation of results with an easy to interpret ratio, which provides an answer to the main study question-"How much impact did a management action or natural perturbation have?" As an example of this approach, we evaluate the impact of a restoration project, which implemented beaver dam analogs, on survival and density of juvenile steelhead. Results indicated the probabilities of a ≥30 % increase were high for survival and density after the dams were installed, 0.88 and 0.99, respectively, while probabilities for a higher increase of ≥50 % were variable, 0.17 and 0.82, respectively. This approach demonstrates a useful extension of Bayesian methods that can easily be generalized to other study designs from simple (e.g., single factor ANOVA, paired t test) to more complicated block designs (e.g., crossover, split-plot). This approach is valuable for estimating the probabilities of restoration impacts or other management actions.
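
    The probability statements quoted above (e.g. the probability of a ≥30 % increase in density) are obtained by thresholding posterior draws of the before/after ratio. The sketch below does exactly that on synthetic lognormal draws standing in for MCMC output from a BACI model; the draws and thresholds are illustrative assumptions.

      import numpy as np

      rng = np.random.default_rng(3)
      # Synthetic stand-ins for posterior draws of density after and before
      # restoration (already adjusted for the control site in the BACI model).
      after = rng.lognormal(mean=np.log(12.0), sigma=0.15, size=8000)
      before = rng.lognormal(mean=np.log(8.0), sigma=0.15, size=8000)

      ratio = after / before  # posterior draws of the impact ratio
      for threshold in (1.2, 1.3, 1.5):
          print(f"P(increase >= {100 * (threshold - 1):.0f}%) = {np.mean(ratio >= threshold):.2f}")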

  15. Assessing Mediational Models: Testing and Interval Estimation for Indirect Effects.

    PubMed

    Biesanz, Jeremy C; Falk, Carl F; Savalei, Victoria

    2010-08-06

    Theoretical models specifying indirect or mediated effects are common in the social sciences. An indirect effect exists when an independent variable's influence on the dependent variable is mediated through an intervening variable. Classic approaches to assessing such mediational hypotheses (Baron & Kenny, 1986; Sobel, 1982) have in recent years been supplemented by computationally intensive methods such as bootstrapping, the distribution-of-the-product method, and hierarchical Bayesian Markov chain Monte Carlo (MCMC) methods. These different approaches for assessing mediation are illustrated using data from Dunn, Biesanz, Human, and Finn (2007). However, little is known about how these methods perform relative to each other, particularly in more challenging situations, such as with data that are incomplete and/or nonnormal. This article presents an extensive Monte Carlo simulation evaluating a host of approaches for assessing mediation. We examine Type I error rates, power, and coverage. We study normal and nonnormal data as well as complete and incomplete data. In addition, we adapt a method, recently proposed in the statistical literature, that does not rely on confidence intervals (CIs) to test the null hypothesis of no indirect effect. The results suggest that the new inferential method, the partial posterior p value, slightly outperforms existing ones in terms of maintaining Type I error rates while maximizing power, especially with incomplete data. Among confidence interval approaches, the bias-corrected accelerated (BCa) bootstrapping approach often has inflated Type I error rates and inconsistent coverage and is not recommended; in contrast, the bootstrapped percentile confidence interval and the hierarchical Bayesian MCMC method perform best overall, maintaining Type I error rates, exhibiting reasonable power, and producing stable and accurate coverage rates.
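
    Of the interval approaches compared above, the percentile bootstrap is the simplest to sketch: resample cases, refit the two regressions, and take percentiles of the resulting a*b products. The simulated data, effect sizes, and plain least-squares fits below are illustrative assumptions, not the simulation design of the article.

      import numpy as np

      rng = np.random.default_rng(4)
      n = 200
      x = rng.normal(size=n)
      m = 0.5 * x + rng.normal(size=n)              # mediator
      y = 0.4 * m + 0.1 * x + rng.normal(size=n)    # outcome

      def indirect_effect(x, m, y):
          a = np.polyfit(x, m, 1)[0]                      # effect of X on M
          X = np.column_stack([np.ones_like(x), m, x])
          b = np.linalg.lstsq(X, y, rcond=None)[0][1]     # effect of M on Y, controlling for X
          return a * b

      boot = np.empty(5000)
      for i in range(5000):
          idx = rng.integers(0, n, size=n)                # resample cases with replacement
          boot[i] = indirect_effect(x[idx], m[idx], y[idx])

      lo, hi = np.percentile(boot, [2.5, 97.5])           # 95% percentile bootstrap CI
      print(f"indirect effect = {indirect_effect(x, m, y):.3f}, CI = [{lo:.3f}, {hi:.3f}]")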

  16. Epidemic Reconstruction in a Phylogenetics Framework: Transmission Trees as Partitions of the Node Set

    PubMed Central

    Hall, Matthew; Woolhouse, Mark; Rambaut, Andrew

    2015-01-01

    The use of genetic data to reconstruct the transmission tree of infectious disease epidemics and outbreaks has been the subject of an increasing number of studies, but previous approaches have usually either made assumptions that are not fully compatible with phylogenetic inference, or, where they have based inference on a phylogeny, have employed a procedure that requires this tree to be fixed. At the same time, the coalescent-based models of the pathogen population that are employed in the methods usually used for time-resolved phylogeny reconstruction are a considerable simplification of epidemic process, as they assume that pathogen lineages mix freely. Here, we contribute a new method that is simultaneously a phylogeny reconstruction method for isolates taken from an epidemic, and a procedure for transmission tree reconstruction. We observe that, if one or more samples is taken from each host in an epidemic or outbreak and these are used to build a phylogeny, a transmission tree is equivalent to a partition of the set of nodes of this phylogeny, such that each partition element is a set of nodes that is connected in the full tree and contains all the tips corresponding to samples taken from one and only one host. We then implement a Monte Carlo Markov Chain (MCMC) procedure for simultaneous sampling from the spaces of both trees, utilising a newly-designed set of phylogenetic tree proposals that also respect node partitions. We calculate the posterior probability of these partitioned trees based on a model that acknowledges the population structure of an epidemic by employing an individual-based disease transmission model and a coalescent process taking place within each host. We demonstrate our method, first using simulated data, and then with sequences taken from the H7N7 avian influenza outbreak that occurred in the Netherlands in 2003. We show that it is superior to established coalescent methods for reconstructing the topology and node heights of the phylogeny and performs well for transmission tree reconstruction when the phylogeny is well-resolved by the genetic data, but caution that this will often not be the case in practice and that existing genetic and epidemiological data should be used to configure such analyses whenever possible. This method is available for use by the research community as part of BEAST, one of the most widely-used packages for reconstruction of dated phylogenies. PMID:26717515

  17. Regression without truth with Markov chain Monte-Carlo

    NASA Astrophysics Data System (ADS)

    Madan, Hennadii; Pernuš, Franjo; Likar, Boštjan; Špiclin, Žiga

    2017-03-01

    Regression without truth (RWT) is a statistical technique for estimating error model parameters of each method in a group of methods used for measurement of a certain quantity. A very attractive aspect of RWT is that it does not rely on a reference method or "gold standard" data, which is otherwise difficult to obtain. RWT was used for a reference-free performance comparison of several methods for measuring left ventricular ejection fraction (EF), i.e. the percentage of blood leaving the ventricle each time the heart contracts, and has since been applied for various other quantitative imaging biomarkers (QIBs). Herein, we show how Markov chain Monte-Carlo (MCMC), a computational technique for drawing samples from a statistical distribution with probability density function known only up to a normalizing coefficient, can be used to augment RWT to gain a number of important benefits compared to the original approach based on iterative optimization. For instance, the proposed MCMC-based RWT enables the estimation of the joint posterior distribution of the parameters of the error model, straightforward quantification of uncertainty of the estimates, and estimation of the true value of the measurand and corresponding credible intervals (CIs); it does not require a finite support for the prior distribution of the measurand, and generally has much improved robustness against convergence to non-global maxima. The proposed approach is validated using synthetic data that emulate the EF data for 45 patients measured with 8 different methods. The obtained results show that the 90% CIs of the corresponding parameter estimates contain the true values of all error model parameters and the measurand. A potential real-world application is to take measurements of a certain QIB with several different methods and then use the proposed framework to compute the estimates of the true values and their uncertainty, vital information for diagnosis based on QIBs.

  18. NIMROD: a program for inference via a normal approximation of the posterior in models with random effects based on ordinary differential equations.

    PubMed

    Prague, Mélanie; Commenges, Daniel; Guedj, Jérémie; Drylewicz, Julia; Thiébaut, Rodolphe

    2013-08-01

    Models based on ordinary differential equations (ODE) are widespread tools for describing dynamical systems. In biomedical sciences, data from each subject can be sparse, making it difficult to precisely estimate individual parameters by standard non-linear regression, but information can often be gained from between-subjects variability. This makes the use of mixed-effects models to estimate population parameters natural. Although the maximum likelihood approach is a valuable option, identifiability issues favour Bayesian approaches, which can incorporate prior knowledge in a flexible way. However, the combination of difficulties coming from the ODE system and from the presence of random effects raises a major numerical challenge. Computations can be simplified by making a normal approximation of the posterior to find the maximum of the posterior distribution (MAP). Here we present the NIMROD program (normal approximation inference in models with random effects based on ordinary differential equations) devoted to MAP estimation in ODE models. We describe the specific implemented features such as convergence criteria and an approximation of the leave-one-out cross-validation to assess the model quality of fit. In pharmacokinetics models, first, we evaluate the properties of this algorithm and compare it with FOCE and MCMC algorithms in simulations. Then, we illustrate the use of NIMROD on Amprenavir pharmacokinetics data from the PUZZLE clinical trial in HIV-infected patients. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  19. Application of the Markov Chain Monte Carlo method for snow water equivalent retrieval based on passive microwave measurements

    NASA Astrophysics Data System (ADS)

    Pan, J.; Durand, M. T.; Vanderjagt, B. J.

    2015-12-01

    The Markov Chain Monte Carlo (MCMC) method is a retrieval algorithm based on Bayes' rule, which starts from an initial state of snow/soil parameters and updates it to a series of new states by comparing the posterior probability of simulated snow microwave signals before and after each random-walk step. It is a realization of Bayes' rule, which gives an approximation to the probability of the snow/soil parameters conditioned on the measured microwave TB signals at different bands. Although this method can solve for all snow parameters, including depth, density, snow grain size, and temperature, at the same time, it still needs prior information on these parameters for the posterior probability calculation. How the priors influence the SWE retrieval is a major concern. Therefore, in this paper, a sensitivity test will first be carried out to study how accurate the snow emission models and how explicit the snow priors need to be to keep the SWE error within a certain amount. The synthetic TB simulated from the measured snow properties plus a 2-K observation error will be used for this purpose. The test aims to provide guidance on applying MCMC under different circumstances. Later, the method will be applied to snowpits at different sites, including Sodankyla, Finland; Churchill, Canada; and Colorado, USA, using the measured TB from ground-based radiometers at different bands. Based on the previous work, the error in these practical cases will be studied, and the error sources will be separated and quantified.

  20. Approximate likelihood calculation on a phylogeny for Bayesian estimation of divergence times.

    PubMed

    dos Reis, Mario; Yang, Ziheng

    2011-07-01

    The molecular clock provides a powerful way to estimate species divergence times. If information on some species divergence times is available from the fossil or geological record, it can be used to calibrate a phylogeny and estimate divergence times for all nodes in the tree. The Bayesian method provides a natural framework to incorporate different sources of information concerning divergence times, such as information in the fossil and molecular data. Current models of sequence evolution are intractable in a Bayesian setting, and Markov chain Monte Carlo (MCMC) is used to generate the posterior distribution of divergence times and evolutionary rates. This method is computationally expensive, as it involves the repeated calculation of the likelihood function. Here, we explore the use of Taylor expansion to approximate the likelihood during MCMC iteration. The approximation is much faster than conventional likelihood calculation. However, the approximation is expected to be poor when the proposed parameters are far from the likelihood peak. We explore the use of parameter transforms (square root, logarithm, and arcsine) to improve the approximation to the likelihood curve. We found that the new methods, particularly the arcsine-based transform, provided very good approximations under relaxed clock models and also under the global clock model when the global clock is not seriously violated. The approximation is poorer for analysis under the global clock when the global clock is seriously wrong and should thus not be used. The results suggest that the approximate method may be useful for Bayesian dating analysis using large data sets.
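
    A toy version of the idea is sketched below with a binomial log-likelihood: the expensive likelihood is replaced by its second-order Taylor expansion around the MLE, and the same expansion is repeated after an arcsine-style transform, which here (as reported above) makes the curve more nearly quadratic. The binomial example and numerical second derivative are illustrative stand-ins for the phylogenetic likelihood and the branch-length transforms of the study.

      import numpy as np

      n, k = 50, 5                      # toy data: 5 successes in 50 trials
      p_hat = k / n                     # MLE on the original scale

      def loglik(p):
          return k * np.log(p) + (n - k) * np.log(1.0 - p)

      def quad_approx(x, f, x0, h=1e-4):
          # Second-order Taylor expansion of f around its maximum x0; the first
          # derivative vanishes there, so only the curvature term remains.
          d2 = (f(x0 + h) - 2.0 * f(x0) + f(x0 - h)) / h ** 2
          return f(x0) + 0.5 * d2 * (x - x0) ** 2

      def loglik_z(z):                  # same likelihood on the arcsine scale
          return loglik(np.sin(z) ** 2)
      z_hat = np.arcsin(np.sqrt(p_hat))

      for p in (0.02, 0.05, 0.20):
          z = np.arcsin(np.sqrt(p))
          print(f"p={p:.2f}  exact={loglik(p):8.3f}  "
                f"quad(p)={quad_approx(p, loglik, p_hat):8.3f}  "
                f"quad(arcsine)={quad_approx(z, loglik_z, z_hat):8.3f}")

    In this toy example the expansion on the arcsine scale tracks the exact log-likelihood more closely away from the peak, mirroring the behaviour of the transform-based approximations described above.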

  1. Inverse models of plate coupling and mantle rheology: Towards a direct link between large-scale mantle flow and mega thrust earthquakes

    NASA Astrophysics Data System (ADS)

    Gurnis, M.; Ratnaswamy, V.; Stadler, G.; Rudi, J.; Liu, X.; Ghattas, O.

    2017-12-01

    We are developing high-resolution inverse models for plate motions and mantle flow to recover the degree of mechanical coupling between plates and the non-linear and plastic parameters governing viscous flow within the lithosphere and mantle. We have developed adjoint versions of the Stokes equations with fully non-linear viscosity with a cost function that measures the fit with plate motions and with regional constraints on effective upper mantle viscosity (from post-glacial rebound and post-seismic relaxation). In our earlier work, we demonstrated that when the temperature field is known, the strength of plate boundaries, the yield stress, and the strain rate exponent in the upper mantle are recoverable. As the plate boundary coupling drops below a threshold, the uncertainty of the inferred parameters increases due to insensitivity of plate motion to plate coupling. Comparing the trade-offs between inferred rheological parameters found from a Gaussian approximation of the parameter distribution and from MCMC sampling, we found that the Gaussian approximation, which is significantly cheaper to compute, is often a good approximation. We have extended our earlier method such that we can recover normal and shear stresses within the zones defining the interface between subducting and over-riding plates, as determined through seismic constraints (using the Slab1.0 model). We find that those subduction zones with low seismic coupling correspond with low inferred values of mechanical coupling. By fitting plate motion data in the optimization scheme, we find that Tonga and the Marianas have the lowest values of mechanical coupling while Chile and Sumatra have the highest, among the subduction zones we have studied. Moreover, because of the nature of the high-resolution adjoint models, the subduction zones with the lowest coupling have back-arc extension. Globally we find that the non-linear stress-strain exponent, n, is about 3.0 +/- 0.25 (in the upper mantle and lithosphere) and the pressure-independent yield stress is 150 +/- 25 MPa. The stress in the shear zones is just tens of MPa, and in preliminary models, we find that both the shear and the normal stresses are elevated in the coupled compared to the uncoupled subduction zones.

  2. Improving the Fitness of High-Dimensional Biomechanical Models via Data-Driven Stochastic Exploration

    PubMed Central

    Bustamante, Carlos D.; Valero-Cuevas, Francisco J.

    2010-01-01

    The field of complex biomechanical modeling has begun to rely on Monte Carlo techniques to investigate the effects of parameter variability and measurement uncertainty on model outputs, search for optimal parameter combinations, and define model limitations. However, advanced stochastic methods to perform data-driven explorations, such as Markov chain Monte Carlo (MCMC), become necessary as the number of model parameters increases. Here, we demonstrate the feasibility and, what to our knowledge is, the first use of an MCMC approach to improve the fitness of realistically large biomechanical models. We used a Metropolis–Hastings algorithm to search increasingly complex parameter landscapes (3, 8, 24, and 36 dimensions) to uncover underlying distributions of anatomical parameters of a “truth model” of the human thumb on the basis of simulated kinematic data (thumbnail location, orientation, and linear and angular velocities) polluted by zero-mean, uncorrelated multivariate Gaussian “measurement noise.” Driven by these data, ten Markov chains searched each model parameter space for the subspace that best fit the data (posterior distribution). As expected, the convergence time increased, more local minima were found, and marginal distributions broadened as the parameter space complexity increased. In the 36-D scenario, some chains found local minima but the majority of chains converged to the true posterior distribution (confirmed using a cross-validation dataset), thus demonstrating the feasibility and utility of these methods for realistically large biomechanical problems. PMID:19272906

  3. User's guide for mapIMG 3--Map image re-projection software package

    USGS Publications Warehouse

    Finn, Michael P.; Mattli, David M.

    2012-01-01

    Version 0.0 (1995), Dan Steinwand, U.S. Geological Survey (USGS)/Earth Resources Observation Systems (EROS) Data Center (EDC)--Version 0.0 was a command line version for UNIX that required four arguments: the input metadata, the output metadata, the input data file, and the output destination path. Version 1.0 (2003), Stephen Posch and Michael P. Finn, USGS/Mid-Continent Mapping Center (MCMC)--Version 1.0 added a GUI interface that was built using the Qt library for cross-platform development. Version 1.01 (2004), Jason Trent and Michael P. Finn, USGS/MCMC--Version 1.01 suggested bounds for the parameters of each projection. Support was added for larger input files, storage of the last used input and output folders, and for TIFF/GeoTIFF input images. Version 2.0 (2005), Robert Buehler, Jason Trent, and Michael P. Finn, USGS/National Geospatial Technical Operations Center (NGTOC)--Version 2.0 added Resampling Methods (Mean, Mode, Min, Max, and Sum), updated the GUI design, and added the viewer/pre-viewer. The metadata style was changed to XML, and a new naming convention was adopted. Version 3.0 (2009), David Mattli and Michael P. Finn, USGS/Center of Excellence for Geospatial Information Science (CEGIS)--Version 3.0 brought optimized resampling methods, an updated GUI, support for less-than-global datasets, UTM support, and a codebase ported to Qt4.

  4. No control genes required: Bayesian analysis of qRT-PCR data.

    PubMed

    Matz, Mikhail V; Wright, Rachel M; Scott, James G

    2013-01-01

    Model-based analysis of data from quantitative reverse-transcription PCR (qRT-PCR) is potentially more powerful and versatile than traditional methods. Yet existing model-based approaches cannot properly deal with the higher sampling variances associated with low-abundant targets, nor do they provide a natural way to incorporate assumptions about the stability of control genes directly into the model-fitting process. In our method, raw qPCR data are represented as molecule counts, and described using generalized linear mixed models under Poisson-lognormal error. A Markov Chain Monte Carlo (MCMC) algorithm is used to sample from the joint posterior distribution over all model parameters, thereby estimating the effects of all experimental factors on the expression of every gene. The Poisson-based model allows for the correct specification of the mean-variance relationship of the PCR amplification process, and can also glean information from instances of no amplification (zero counts). Our method is very flexible with respect to control genes: any prior knowledge about the expected degree of their stability can be directly incorporated into the model. Yet the method provides sensible answers without such assumptions, or even in the complete absence of control genes. We also present a natural Bayesian analogue of the "classic" analysis, which uses standard data pre-processing steps (logarithmic transformation and multi-gene normalization) but estimates all gene expression changes jointly within a single model. The new methods are considerably more flexible and powerful than the standard delta-delta Ct analysis based on pairwise t-tests. Our methodology expands the applicability of the relative-quantification analysis protocol all the way to the lowest-abundance targets, and provides a novel opportunity to analyze qRT-PCR data without making any assumptions concerning target stability. These procedures have been implemented as the MCMC.qpcr package in R.

  5. Forward and Inverse Modeling of Self-potential. A Tomography of Groundwater Flow and Comparison Between Deterministic and Stochastic Inversion Methods

    NASA Astrophysics Data System (ADS)

    Quintero-Chavarria, E.; Ochoa Gutierrez, L. H.

    2016-12-01

    Applications of the Self-potential Method in the fields of Hydrogeology and Environmental Sciences have had significant developments during the last two decades, with strong use in the identification of groundwater flows. Although only a few authors deal with the solution of the forward problem, especially in the geophysics literature, different inversion procedures are currently being developed, but in most cases they are compared with unconventional groundwater velocity fields and restricted to structured meshes. This research solves the forward problem based on the finite element method, using St. Venant's Principle to transform a point dipole, which is the field generated by a single vector, into a distribution of electrical monopoles. Then, two simple aquifer models were generated with specific boundary conditions, and head potentials, velocity fields, and electric potentials in the medium were computed. With the model's surface electric potential, the inverse problem is solved to retrieve the source of electric potential (the vector field associated with groundwater flow) using deterministic and stochastic approaches. The first approach was carried out by implementing a Tikhonov regularization with a stabilizing operator adapted to the finite element mesh, while for the second a hierarchical Bayesian model based on Markov chain Monte Carlo (McMC) and Markov Random Fields (MRF) was constructed. For all implemented methods, the results of the direct and inverse models were contrasted in two ways: 1) the shape and distribution of the vector field, and 2) the histogram of magnitudes. Finally, it was concluded that inversion procedures are improved when the velocity field's behavior is considered; thus, the deterministic method is more suitable for unconfined aquifers than confined ones. McMC has restricted applications and requires a lot of information (particularly for potential fields), while MRF has a remarkable response, especially when dealing with confined aquifers.
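
    For the deterministic branch of the comparison, the sketch below shows a standard Tikhonov-regularized least-squares solution with a first-difference stabilizing operator, m_hat = (G^T G + lambda L^T L)^{-1} G^T d. The forward operator, data, and regularization weight are synthetic placeholders rather than the finite-element self-potential model used above.

      import numpy as np

      rng = np.random.default_rng(5)
      n_data, n_model = 40, 60
      G = rng.normal(size=(n_data, n_model))              # synthetic forward operator
      m_true = np.sin(np.linspace(0.0, 3.0 * np.pi, n_model))
      d = G @ m_true + 0.05 * rng.normal(size=n_data)     # noisy observations

      # First-difference stabilizing operator (smoothness constraint).
      L = np.eye(n_model, k=1)[:-1] - np.eye(n_model)[:-1]

      lam = 1.0  # regularization weight; chosen in practice by an L-curve or cross-validation
      m_hat = np.linalg.solve(G.T @ G + lam * (L.T @ L), G.T @ d)
      print("data misfit:", np.linalg.norm(G @ m_hat - d))
      print("relative model error:", np.linalg.norm(m_hat - m_true) / np.linalg.norm(m_true))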

  6. Time-varying Concurrent Risk of Extreme Droughts and Heatwaves in California

    NASA Astrophysics Data System (ADS)

    Sarhadi, A.; Diffenbaugh, N. S.; Ausin, M. C.

    2016-12-01

    Anthropogenic global warming has changed the nature and the risk of extreme climate phenomena such as droughts and heatwaves. The concurrence of these climatic extremes may intensify undesirable consequences for human health and have destructive effects on water resources. The present study assesses the risk of concurrent extreme droughts and heatwaves under the dynamic nonstationary conditions arising from climate change in California. To do so, a generalized, fully Bayesian, time-varying multivariate risk framework is proposed that evolves through time under a dynamic, human-induced environment. In this methodology, a Bayesian dynamic extreme-value copula (Gumbel) is developed to model the time-varying dependence structure between the two climate extremes. The time-varying extreme marginals are first modeled using a Generalized Extreme Value (GEV) distribution. Bayesian Markov Chain Monte Carlo (MCMC) inference is used to estimate the parameters of the nonstationary marginals and the copula with a Gibbs sampling method. The modeled marginals and copula are then used to develop a fully Bayesian, time-varying joint return period concept for the estimation of concurrent risk. We argue that climate change has increased the chance of concurrent droughts and heatwaves over the past decades in California. It is also demonstrated that a time-varying multivariate perspective should be adopted to assess the realistic concurrent risk of these extremes for water resources planning and management in a changing climate in this area. The proposed generalized methodology can be applied to other stochastic, nature-changing compound climate extremes that are under the influence of climate change.
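
    As a minimal illustration of how GEV marginals and a Gumbel copula combine into a joint ("both extremes exceeded") return period, the sketch below uses fixed, arbitrarily chosen stationary parameters; the study itself lets these parameters vary with time and infers them by Gibbs sampling. All numbers are assumptions for illustration.

```python
import numpy as np
from scipy.stats import genextreme

# Hypothetical stationary GEV marginals for annual drought severity (X) and
# heatwave intensity (Y); in the paper these parameters vary with time.
gev_x = genextreme(c=-0.1, loc=1.0, scale=0.4)    # scipy's c is minus the usual shape xi
gev_y = genextreme(c=-0.2, loc=35.0, scale=2.0)

def gumbel_copula(u, v, theta):
    """Gumbel copula CDF C(u, v); theta >= 1 controls upper-tail dependence."""
    return np.exp(-(((-np.log(u)) ** theta + (-np.log(v)) ** theta) ** (1.0 / theta)))

def joint_and_return_period(x, y, theta):
    """Return period (years) of X > x AND Y > y in the same year."""
    u, v = gev_x.cdf(x), gev_y.cdf(y)
    p_exceed_both = 1.0 - u - v + gumbel_copula(u, v, theta)
    return 1.0 / p_exceed_both

x0, y0 = gev_x.ppf(0.98), gev_y.ppf(0.98)         # marginal 50-year levels
print("independent estimate :", 1.0 / ((1 - 0.98) * (1 - 0.98)), "years")
print("Gumbel (theta = 2.5) :", joint_and_return_period(x0, y0, 2.5), "years")
```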

  7. Bayesian structured additive regression modeling of epidemic data: application to cholera

    PubMed Central

    2012-01-01

    Background A significant interest in spatial epidemiology lies in identifying risk factors that enhance the risk of infection. Most studies, however, make no or only limited use of the spatial structure of the data and of possible nonlinear effects of the risk factors. Methods We develop a Bayesian structured additive regression model for cholera epidemic data. Model estimation and inference are based on a fully Bayesian approach via Markov Chain Monte Carlo (MCMC) simulations. The model is applied to cholera epidemic data in the Kumasi Metropolis, Ghana. Proximity to refuse dumps, density of refuse dumps, and proximity to potential cholera reservoirs were modeled as continuous functions; presence of slum settlers and population density were modeled as fixed effects, whereas spatial references to the communities were modeled as structured and unstructured spatial effects. Results We observe that the risk of cholera is associated with slum settlements and high population density. The risk of cholera is uniform and lower for communities with fewer refuse dumps, but variable and higher for communities with more refuse dumps. The risk is also lower for communities distant from refuse dumps and potential cholera reservoirs. The results also indicate distinct spatial variation in the risk of cholera infection. Conclusion The study highlights the usefulness of the Bayesian semi-parametric regression model for analyzing public health data. These findings could serve as novel information to help health planners and policy makers make effective decisions to control or prevent cholera epidemics. PMID:22866662

  8. The behavior of Metropolis-coupled Markov chains when sampling rugged phylogenetic distributions.

    PubMed

    Brown, Jeremy M; Thomson, Robert C

    2018-02-15

    Bayesian phylogenetic inference involves sampling from posterior distributions of trees, which sometimes exhibit local optima, or peaks, separated by regions of low posterior density. Markov chain Monte Carlo (MCMC) algorithms are the most widely used numerical method for generating samples from these posterior distributions, but they are susceptible to entrapment on individual optima in rugged distributions when they are unable to easily cross through or jump across regions of low posterior density. Ruggedness of posterior distributions can result from a variety of factors, including unmodeled variation in evolutionary processes and unrecognized variation in the true topology across sites or genes. Ruggedness can also become exaggerated when constraints are placed on topologies that require the presence or absence of particular bipartitions (often referred to as positive or negative constraints, respectively). These types of constraints are frequently employed when conducting tests of topological hypotheses (Bergsten et al. 2013; Brown and Thomson 2017). Negative constraints can lead to particularly rugged distributions when the data strongly support a forbidden clade, because monophyly of the clade can be disrupted by inserting outgroup taxa in many different ways. However, topological moves between the alternative disruptions are very difficult, because they require swaps between the inserted outgroup taxa while the data constrain taxa from the forbidden clade to remain close together on the tree. While this precise form of ruggedness is particular to negative constraints, trees with high posterior density can be separated by similarly complicated topological rearrangements, even in the absence of constraints.
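
    A minimal sketch of the Metropolis-coupled idea on a toy bimodal density (standing in for a rugged tree posterior): heated chains flatten the valleys between peaks, and occasional state swaps let the cold chain jump between modes. The target, temperatures, and tuning values below are illustrative assumptions, not the settings of any phylogenetics package.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_post(x):
    """A rugged (bimodal) toy posterior standing in for a tree posterior."""
    return np.logaddexp(-0.5 * ((x + 4) / 0.5) ** 2, -0.5 * ((x - 4) / 0.5) ** 2)

temps = [1.0, 0.5, 0.25, 0.1]          # chain 0 is the "cold" chain whose samples we keep
states = np.zeros(len(temps))
cold_samples = []

for it in range(50000):
    # Within-chain Metropolis updates; chain k targets post(x) ** temps[k].
    for k, beta in enumerate(temps):
        prop = states[k] + rng.normal(scale=1.0)
        if np.log(rng.uniform()) < beta * (log_post(prop) - log_post(states[k])):
            states[k] = prop
    # Propose to swap states between a random pair of adjacent chains.
    k = rng.integers(len(temps) - 1)
    dbeta = temps[k] - temps[k + 1]
    log_alpha = dbeta * (log_post(states[k + 1]) - log_post(states[k]))
    if np.log(rng.uniform()) < log_alpha:
        states[k], states[k + 1] = states[k + 1], states[k]
    cold_samples.append(states[0])

cold = np.array(cold_samples[5000:])
print("fraction of cold-chain samples near each mode:", np.mean(cold < 0), np.mean(cold > 0))
```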

  9. Conditional Spectral Analysis of Replicated Multiple Time Series with Application to Nocturnal Physiology.

    PubMed

    Krafty, Robert T; Rosen, Ori; Stoffer, David S; Buysse, Daniel J; Hall, Martica H

    2017-01-01

    This article considers the problem of analyzing associations between power spectra of multiple time series and cross-sectional outcomes when data are observed from multiple subjects. The motivating application comes from sleep medicine, where researchers are able to non-invasively record physiological time series signals during sleep. The frequency patterns of these signals, which can be quantified through the power spectrum, contain interpretable information about biological processes. An important problem in sleep research is drawing connections between power spectra of time series signals and clinical characteristics; these connections are key to understanding biological pathways through which sleep affects, and can be treated to improve, health. Such analyses are challenging as they must overcome the complicated structure of a power spectrum from multiple time series as a complex positive-definite matrix-valued function. This article proposes a new approach to such analyses based on a tensor-product spline model of Cholesky components of outcome-dependent power spectra. The approach flexibly models power spectra as nonparametric functions of frequency and outcome while preserving geometric constraints. Formulated in a fully Bayesian framework, a Whittle likelihood based Markov chain Monte Carlo (MCMC) algorithm is developed for automated model fitting and for conducting inference on associations between outcomes and spectral measures. The method is used to analyze data from a study of sleep in older adults and uncovers new insights into how stress and arousal are connected to the amount of time one spends in bed.

  10. Atmospheric Tracer Inverse Modeling Using Markov Chain Monte Carlo (MCMC)

    NASA Astrophysics Data System (ADS)

    Kasibhatla, P.

    2004-12-01

    In recent years, there has been an increasing emphasis on the use of Bayesian statistical estimation techniques to characterize the temporal and spatial variability of atmospheric trace gas sources and sinks. The applications have been varied in terms of the particular species of interest, as well as in terms of the spatial and temporal resolution of the estimated fluxes. However, one common characteristic has been the use of relatively simple statistical models for describing the measurement and chemical transport model error statistics and prior source statistics. For example, multivariate normal probability distribution functions (pdfs) are commonly used to model these quantities and inverse source estimates are derived for fixed values of pdf parameters. While the advantage of this approach is that closed form analytical solutions for the a posteriori pdfs of interest are available, it is worth exploring Bayesian analysis approaches which allow for a more general treatment of error and prior source statistics. Here, we present an application of the Markov Chain Monte Carlo (MCMC) methodology to an atmospheric tracer inversion problem to demonstrate how more general statistical models for errors can be incorporated into the analysis in a relatively straightforward manner. The MCMC approach to Bayesian analysis, which has found wide application in a variety of fields, is a statistical simulation approach that involves computing moments of interest of the a posteriori pdf by efficiently sampling this pdf. The specific inverse problem that we focus on is the annual mean CO2 source/sink estimation problem considered by the TransCom3 project. TransCom3 was a collaborative effort involving various modeling groups and followed a common modeling and analysis protocol. As such, this problem provides a convenient case study to demonstrate the applicability of the MCMC methodology to atmospheric tracer source/sink estimation problems.
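
    A toy sketch of the flexibility described here, under invented settings: a Metropolis sampler for a linear tracer inversion y = H s + noise in which the usual Gaussian prior on the sources is replaced by a non-negative exponential prior. The operator H, flux values, and noise level are all hypothetical stand-ins, not TransCom3 quantities.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy setup: 10 observations linearly related to 3 regional fluxes (sources).
n_obs, n_src = 10, 3
H = rng.uniform(0.0, 1.0, size=(n_obs, n_src))        # stand-in transport operator
s_true = np.array([2.0, 0.5, 1.5])
sigma = 0.2
y = H @ s_true + rng.normal(scale=sigma, size=n_obs)  # synthetic observations

def log_post(s):
    """Gaussian likelihood with a non-negative Exp(1) prior on each source."""
    if np.any(s < 0):
        return -np.inf
    resid = y - H @ s
    return -0.5 * np.sum((resid / sigma) ** 2) - np.sum(s)

s = np.ones(n_src)
lp, chain = log_post(s), []
for it in range(30000):
    prop = s + rng.normal(scale=0.1, size=n_src)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        s, lp = prop, lp_prop
    chain.append(s.copy())

chain = np.array(chain[5000:])
print("posterior mean fluxes:", chain.mean(axis=0), " true:", s_true)
```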

  11. Modeling Nitrogen Dynamics in a Waste Stabilization Pond System Using Flexible Modeling Environment with MCMC.

    PubMed

    Mukhtar, Hussnain; Lin, Yu-Pin; Shipin, Oleg V; Petway, Joy R

    2017-07-12

    This study presents an approach for obtaining realization sets of parameters for nitrogen removal in a pilot-scale waste stabilization pond (WSP) system. The proposed approach was designed for optimal parameterization, local sensitivity analysis, and global uncertainty analysis of a dynamic simulation model for the WSP by using the R software package Flexible Modeling Environment (R-FME) with the Markov chain Monte Carlo (MCMC) method. Additionally, generalized likelihood uncertainty estimation (GLUE) was integrated into the FME to evaluate the major parameters that affect the simulation outputs in the study WSP. Comprehensive modeling analysis was used to simulate and assess nine parameters and the concentrations of ON-N, NH₃-N and NO₃-N. Results indicate that the integrated FME-GLUE-based model, with good Nash-Sutcliffe coefficients (0.53-0.69) and correlation coefficients (0.76-0.83), successfully simulates the concentrations of ON-N, NH₃-N and NO₃-N. Moreover, the Arrhenius constant was the only parameter to which model performance for the ON-N and NH₃-N simulations was sensitive. The ON-N and NO₃-N simulations, however, were also sensitive to the Nitrosomonas growth rate, the denitrification constant, and the maximum growth rate at 20 °C, as determined by global sensitivity analysis.

  12. Incorporating approximation error in surrogate based Bayesian inversion

    NASA Astrophysics Data System (ADS)

    Zhang, J.; Zeng, L.; Li, W.; Wu, L.

    2015-12-01

    There is increasing interest in applying surrogates in inverse Bayesian modeling to reduce repeated evaluations of the original model and thereby save computational cost. However, the approximation error of the surrogate model is usually overlooked, partly because it is difficult to evaluate for many surrogates. Previous studies have shown that the direct combination of surrogates and Bayesian methods (e.g., Markov Chain Monte Carlo, MCMC) may lead to biased estimates when the surrogate cannot emulate a highly nonlinear original system. This problem can be alleviated by implementing MCMC in a two-stage manner, but the computational cost remains high because a relatively large number of original model simulations is still required. In this study, we illustrate the importance of incorporating approximation error in inverse Bayesian modeling. A Gaussian process (GP) is chosen to construct the surrogate because its approximation error is convenient to evaluate. Numerical cases of Bayesian experimental design and of parameter estimation for contaminant source identification are used to illustrate this idea. It is shown that, once the surrogate approximation error is properly incorporated into the Bayesian framework, promising results can be obtained even when the surrogate is used directly, and no further original model simulations are required.
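
    A minimal sketch of the central idea under assumed toy settings: the GP surrogate's predictive standard deviation (here from scikit-learn's GaussianProcessRegressor) is added to the observation-error variance inside the likelihood, so the sampler automatically discounts regions where the surrogate is unreliable. The forward model, prior range, and numbers are invented for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(3)

def model(theta):
    """Stand-in for an expensive, monotonic, nonlinear forward model."""
    return theta ** 3 + theta

# Train a GP surrogate on a handful of "expensive" model runs.
X_train = np.linspace(-2.0, 2.0, 8).reshape(-1, 1)
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-8)
gp.fit(X_train, model(X_train.ravel()))

theta_true, sigma_obs = 0.7, 0.05
y_obs = model(theta_true) + rng.normal(scale=sigma_obs)

def log_post(theta, include_surrogate_error=True):
    if not -2.0 <= theta <= 2.0:                       # uniform prior on [-2, 2]
        return -np.inf
    mu, std = gp.predict(np.array([[theta]]), return_std=True)
    var = sigma_obs ** 2 + (std[0] ** 2 if include_surrogate_error else 0.0)
    return -0.5 * (y_obs - mu[0]) ** 2 / var - 0.5 * np.log(var)

# Random-walk Metropolis driven entirely by the error-aware surrogate.
theta, chain = 0.0, []
lp = log_post(theta)
for it in range(20000):
    prop = theta + rng.normal(scale=0.1)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    chain.append(theta)

print("posterior mean of theta:", np.mean(chain[5000:]), " (true value 0.7)")
```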

  13. Protein structure refinement using a quantum mechanics-based chemical shielding predictor.

    PubMed

    Bratholm, Lars A; Jensen, Jan H

    2017-03-01

    The accurate prediction of protein chemical shifts using a quantum mechanics (QM)-based method has been the subject of intense research for more than 20 years but so far empirical methods for chemical shift prediction have proven more accurate. In this paper we show that a QM-based predictor of a protein backbone and CB chemical shifts (ProCS15, PeerJ, 2016, 3, e1344) is of comparable accuracy to empirical chemical shift predictors after chemical shift-based structural refinement that removes small structural errors. We present a method by which quantum chemistry based predictions of isotropic chemical shielding values (ProCS15) can be used to refine protein structures using Markov Chain Monte Carlo (MCMC) simulations, relating the chemical shielding values to the experimental chemical shifts probabilistically. Two kinds of MCMC structural refinement simulations were performed using force field geometry optimized X-ray structures as starting points: simulated annealing of the starting structure and constant temperature MCMC simulation followed by simulated annealing of a representative ensemble structure. Annealing of the CHARMM structure changes the CA-RMSD by an average of 0.4 Å but lowers the chemical shift RMSD by 1.0 and 0.7 ppm for CA and N. Conformational averaging has a relatively small effect (0.1-0.2 ppm) on the overall agreement with carbon chemical shifts but lowers the error for nitrogen chemical shifts by 0.4 ppm. If an amino acid specific offset is included the ProCS15 predicted chemical shifts have RMSD values relative to experiments that are comparable to popular empirical chemical shift predictors. The annealed representative ensemble structures differ in CA-RMSD relative to the initial structures by an average of 2.0 Å, with >2.0 Å difference for six proteins. In four of the cases, the largest structural differences arise in structurally flexible regions of the protein as determined by NMR, and in the remaining two cases, the large structural change may be due to force field deficiencies. The overall accuracy of the empirical methods is slightly improved by annealing the CHARMM structure with ProCS15, which may suggest that the minor structural changes introduced by ProCS15-based annealing improve the accuracy of the protein structures. Having established that QM-based chemical shift prediction can deliver the same accuracy as empirical shift predictors, we hope this can help increase the accuracy of related approaches such as QM/MM or linear scaling approaches, or help in interpreting protein structural dynamics from QM-derived chemical shifts.

  14. A Markov Chain Monte Carlo Inversion Approach For Inverting InSAR Data With Application To Subsurface CO2 Injection

    NASA Astrophysics Data System (ADS)

    Ramirez, A. L.; Foxall, W.

    2011-12-01

    Surface displacements caused by reservoir pressure perturbations resulting from CO2 injection can often be measured by geodetic methods such as InSAR, tilt and GPS. We have developed a Markov Chain Monte Carlo (MCMC) approach to invert surface displacements measured by InSAR to map the pressure distribution associated with CO2 injection at the In Salah Krechba field, Algeria. The MCMC inversion entails sampling the solution space by proposing a series of trial 3D pressure-plume models. In the case of In Salah, the range of allowable models is constrained by prior information provided by well and geophysical data for the reservoir and possible fluid pathways in the overburden, and injection pressures and volumes. Each trial pressure distribution source is run through a (mathematical) forward model to calculate a set of synthetic surface deformation data. The likelihood that a particular proposal represents the true source is determined from the fit of the calculated data to the InSAR measurements, and those having higher likelihoods are passed to the posterior distribution. This procedure is repeated over typically ~10⁴-10⁵ trials until the posterior distribution converges to a stable solution. The solution to each stochastic inversion is in the form of a Bayesian posterior probability density function (pdf) over the range of the alternative models that are consistent with the measured data and prior information. Therefore, the solution provides not only the highest likelihood model but also a realistic estimate of the solution uncertainty. Our In Salah work considered three flow model alternatives: 1) the first model assumed that the CO2 saturation and fluid pressure changes were confined to the reservoir; 2) the second model allowed the perturbations to occur also in a damage zone inferred in the lower caprock from 3D seismic surveys; and 3) the third model allowed fluid pressure changes anywhere within the reservoir and overburden. Alternative (2) yielded optimal fits to the data in inversions of InSAR data collected in 2007. The results indicate that pressure changes developed near the injection well and then penetrated into the lower caprock along the postulated damage zone. As in many geophysical inverse problems, inversion of surface displacement data for subsurface sources of deformation is inherently uncertain and non-unique. We will also discuss the approach used to characterize solution uncertainty. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

  15. magicaxis: Pretty scientific plotting with minor-tick and log minor-tick support

    NASA Astrophysics Data System (ADS)

    Robotham, Aaron S. G.

    2016-04-01

    The R suite magicaxis makes useful and pretty plots for scientific work and includes functions for base plotting, with particular emphasis on clear axis labelling in circumstances that commonly arise in scientific plotting. It also includes functions for generating images and contours that reflect the 2D quantile levels of the data, designed particularly for the output of MCMC posteriors, where visualizing the location of the 68% and 95% 2D quantiles for covariant parameters is a necessary part of post-MCMC analysis. In addition, it can generate low and high error bars and allows clipping of values, rejection of bad values, and log stretching.

  16. The Full Monte Carlo: A Live Performance with Stars

    NASA Astrophysics Data System (ADS)

    Meng, Xiao-Li

    2014-06-01

    Markov chain Monte Carlo (MCMC) is being applied increasingly often in modern Astrostatistics. It is indeed incredibly powerful, but also very dangerous. It is popular because of its apparent generality (from simple to highly complex problems) and simplicity (the availability of out-of-the-box recipes). It is dangerous because it always produces something, but there is no surefire way to verify, or even diagnose, that the “something” is remotely close to what MCMC theory predicts or what one hopes for. Using very simple models (e.g., conditionally Gaussian), this talk starts with a tutorial on the two most popular MCMC algorithms, namely the Gibbs sampler and the Metropolis-Hastings algorithm, and illustrates their good, bad, and ugly implementations via live demonstration. The talk ends with the story of how a recent advance, the Ancillary-Sufficient Interweaving Strategy (ASIS) (Yu and Meng, 2011, http://www.stat.harvard.edu/Faculty_Content/meng/jcgs.2011-article.pdf), reduces the danger. It was discovered almost by accident during a Ph.D. student’s (Yaming Yu) struggle with fitting a Cox process model for detecting changes in the source intensity of photon counts observed by the Chandra X-ray telescope from a (candidate) neutron/quark star.
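
    In the spirit of the conditionally Gaussian examples mentioned above, here is a minimal Gibbs sampler for a normal model with conjugate priors; both full conditionals are available in closed form, so each update is an exact draw rather than a Metropolis step. The data and prior settings are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
y = rng.normal(loc=2.0, scale=1.5, size=50)          # hypothetical data
n, ybar = len(y), y.mean()

# Priors: mu ~ N(m0, t0sq),  sigma^2 ~ Inverse-Gamma(a0, b0)
m0, t0sq, a0, b0 = 0.0, 100.0, 2.0, 2.0

mu, sig2 = 0.0, 1.0
draws = []
for it in range(10000):
    # Full conditional for mu given sigma^2 is Gaussian (conjugacy).
    prec = 1.0 / t0sq + n / sig2
    mean = (m0 / t0sq + n * ybar / sig2) / prec
    mu = rng.normal(mean, np.sqrt(1.0 / prec))
    # Full conditional for sigma^2 given mu is Inverse-Gamma.
    a_n = a0 + 0.5 * n
    b_n = b0 + 0.5 * np.sum((y - mu) ** 2)
    sig2 = 1.0 / rng.gamma(a_n, 1.0 / b_n)
    draws.append((mu, sig2))

mu_s, sig2_s = np.array(draws[2000:]).T
print("posterior mean of mu   :", mu_s.mean())
print("posterior mean of sigma:", np.sqrt(sig2_s).mean())
```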

  17. Comprehensive cosmographic analysis by Markov chain method

    NASA Astrophysics Data System (ADS)

    Capozziello, S.; Lazkoz, R.; Salzano, V.

    2011-12-01

    We study the possibility of extracting model independent information about the dynamics of the Universe by using cosmography. We intend to explore it systematically, to learn about its limitations and its real possibilities. Here we are sticking to the series expansion approach on which cosmography is based. We apply it to different data sets: Supernovae type Ia (SNeIa), Hubble parameter extracted from differential galaxy ages, gamma ray bursts, and the baryon acoustic oscillations data. We go beyond past results in the literature extending the series expansion up to the fourth order in the scale factor, which implies the analysis of the deceleration q0, the jerk j0, and the snap s0. We use the Markov chain Monte Carlo method (MCMC) to analyze the data statistically. We also try to relate direct results from cosmography to dark energy (DE) dynamical models parametrized by the Chevallier-Polarski-Linder model, extracting clues about the matter content and the dark energy parameters. The main results are: (a) even if relying on a mathematical approximate assumption such as the scale factor series expansion in terms of time, cosmography can be extremely useful in assessing dynamical properties of the Universe; (b) the deceleration parameter clearly confirms the present acceleration phase; (c) the MCMC method can help giving narrower constraints in parameter estimation, in particular for higher order cosmographic parameters (the jerk and the snap), with respect to the literature; and (d) both the estimation of the jerk and the DE parameters reflect the possibility of a deviation from the ΛCDM cosmological model.

  18. A Bayesian connectivity-based approach to constructing probabilistic gene regulatory networks.

    PubMed

    Zhou, Xiaobo; Wang, Xiaodong; Pal, Ranadip; Ivanov, Ivan; Bittner, Michael; Dougherty, Edward R

    2004-11-22

    We have hypothesized that the construction of transcriptional regulatory networks using a method that optimizes connectivity would lead to regulation consistent with biological expectations. A key expectation is that the hypothetical networks should produce a few, very strong attractors, highly similar to the original observations, mimicking biological state stability and determinism. Another central expectation is that, since it is expected that the biological control is distributed and mutually reinforcing, interpretation of the observations should lead to a very small number of connection schemes. We propose a fully Bayesian approach to constructing probabilistic gene regulatory networks (PGRNs) that emphasizes network topology. The method computes the possible parent sets of each gene, the corresponding predictors and the associated probabilities based on a nonlinear perceptron model, using a reversible jump Markov chain Monte Carlo (MCMC) technique, and an MCMC method is employed to search the network configurations to find those with the highest Bayesian scores to construct the PGRN. The Bayesian method has been used to construct a PGRN based on the observed behavior of a set of genes whose expression patterns vary across a set of melanoma samples exhibiting two very different phenotypes with respect to cell motility and invasiveness. Key biological features have been faithfully reflected in the model. Its steady-state distribution contains attractors that are either identical or very similar to the states observed in the data, and many of the attractors are singletons, which mimics the biological propensity to stably occupy a given state. Most interestingly, the connectivity rules for the most optimal generated networks constituting the PGRN are remarkably similar, as would be expected for a network operating on a distributed basis, with strong interactions between the components.

  19. A probabilistic model framework for evaluating year-to-year variation in crop productivity

    NASA Astrophysics Data System (ADS)

    Yokozawa, M.; Iizumi, T.; Tao, F.

    2008-12-01

    Most models describing the relation between crop productivity and weather conditions have so far focused on mean changes in crop yield. For maintaining a stable food supply in the face of abnormal weather as well as climate change, evaluating the year-to-year variation in crop productivity is more essential than evaluating mean changes. We propose a new probabilistic modeling framework based on Bayesian inference and Monte Carlo simulation. As a first example, we introduce a model of paddy rice production in Japan called PRYSBI (Process-based Regional rice Yield Simulator with Bayesian Inference; Iizumi et al., 2008). The model structure is the same as that of SIMRIW, which was developed and is widely used in Japan. The model comprises three sub-models describing phenological development, biomass accumulation and maturing of the rice crop. These processes are formulated to capture the response of the rice plant to weather conditions. The model was originally developed to predict rice growth and yield at the scale of a single paddy plot. We applied it to evaluate large-scale rice production while keeping the same model structure, but treating the parameters as stochastic variables. To allow the model to reproduce actual yields at the larger scale, model parameters were determined from the agricultural statistics of each prefecture of Japan together with weather data averaged over the region. The posterior probability distribution functions (PDFs) of the parameters included in the model were obtained using Bayesian inference, with an MCMC (Markov Chain Monte Carlo) algorithm used to solve the Bayesian theorem numerically. For evaluating year-to-year changes in rice growth and yield within this framework, we iterate simulations with sets of parameter values sampled from the estimated posterior PDF of each parameter and then take the ensemble mean weighted by the posterior PDFs. We will also present a second example for maize productivity in China. The framework proposed here also provides information on uncertainties, possibilities and limitations of future improvements in crop models.

  20. Probabilistic Magnetotelluric Inversion with Adaptive Regularisation Using the No-U-Turns Sampler

    NASA Astrophysics Data System (ADS)

    Conway, Dennis; Simpson, Janelle; Didana, Yohannes; Rugari, Joseph; Heinson, Graham

    2018-04-01

    We present the first inversion of magnetotelluric (MT) data using a Hamiltonian Monte Carlo algorithm. The inversion of MT data is an underdetermined problem which leads to an ensemble of feasible models for a given dataset. A standard approach in MT inversion is to perform a deterministic search for the single solution which is maximally smooth for a given data-fit threshold. An alternative approach is to use Markov Chain Monte Carlo (MCMC) methods, which have been used in MT inversion to explore the entire solution space and produce a suite of likely models. This approach has the advantage of assigning confidence to resistivity models, leading to better geological interpretations. Recent advances in MCMC techniques include the No-U-Turns Sampler (NUTS), an efficient and rapidly converging method which is based on Hamiltonian Monte Carlo. We have implemented a 1D MT inversion which uses the NUTS algorithm. Our model includes a fixed number of layers of variable thickness and resistivity, as well as probabilistic smoothing constraints which allow sharp and smooth transitions. We present the results of a synthetic study and show the accuracy of the technique, as well as the fast convergence, independence of starting models, and sampling efficiency. Finally, we test our technique on MT data collected from a site in Boulia, Queensland, Australia to show its utility in geological interpretation and ability to provide probabilistic estimates of features such as depth to basement.
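
    NUTS adaptively chooses its trajectory length, but it is built on plain Hamiltonian Monte Carlo; the sketch below shows a fixed-length HMC update (leapfrog integration plus an accept/reject step on the joint energy) applied to a toy two-parameter log-posterior, not the authors' magnetotelluric forward model. Step size, trajectory length, and the target density are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(11)

# Toy 2D log-posterior (correlated Gaussian) standing in for layer resistivities.
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
prec = np.linalg.inv(cov)

def log_post(q):
    return -0.5 * q @ prec @ q

def grad_log_post(q):
    return -prec @ q

def hmc_step(q, eps=0.1, n_leapfrog=20):
    """One Hamiltonian Monte Carlo update with a unit mass matrix."""
    p = rng.normal(size=q.shape)                       # resample momentum
    q_new, p_new = q.copy(), p.copy()
    # Leapfrog integration of the Hamiltonian dynamics.
    p_new += 0.5 * eps * grad_log_post(q_new)
    for _ in range(n_leapfrog - 1):
        q_new += eps * p_new
        p_new += eps * grad_log_post(q_new)
    q_new += eps * p_new
    p_new += 0.5 * eps * grad_log_post(q_new)
    # Accept or reject on the joint (position, momentum) energy.
    log_alpha = (log_post(q_new) - 0.5 * p_new @ p_new) - (log_post(q) - 0.5 * p @ p)
    return q_new if np.log(rng.uniform()) < log_alpha else q

q, samples = np.zeros(2), []
for it in range(5000):
    q = hmc_step(q)
    samples.append(q.copy())

samples = np.array(samples[1000:])
print("sample covariance:\n", np.cov(samples.T))
```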

  1. Computational strategies for alternative single-step Bayesian regression models with large numbers of genotyped and non-genotyped animals.

    PubMed

    Fernando, Rohan L; Cheng, Hao; Golden, Bruce L; Garrick, Dorian J

    2016-12-08

    Two types of models have been used for single-step genomic prediction and genome-wide association studies that include phenotypes from both genotyped animals and their non-genotyped relatives. The two types are breeding value models (BVM) that fit breeding values explicitly and marker effects models (MEM) that express the breeding values in terms of the effects of observed or imputed genotypes. MEM can accommodate a wider class of analyses, including variable selection or mixture model analyses. The order of the equations that need to be solved and the inverses required in their construction vary widely, and thus the computational effort required depends upon the size of the pedigree, the number of genotyped animals and the number of loci. We present computational strategies to avoid storing large, dense blocks of the MME that involve imputed genotypes. Furthermore, we present a hybrid model that fits a MEM for animals with observed genotypes and a BVM for those without genotypes. The hybrid model is computationally attractive for pedigree files containing millions of animals with a large proportion of those being genotyped. We demonstrate the practicality on both the original MEM and the hybrid model using real data with 6,179,960 animals in the pedigree with 4,934,101 phenotypes and 31,453 animals genotyped at 40,214 informative loci. To complete a single-trait analysis on a desk-top computer with four graphics cards required about 3 h using the hybrid model to obtain both preconditioned conjugate gradient solutions and 42,000 Markov chain Monte-Carlo (MCMC) samples of breeding values, which allowed making inferences from posterior means, variances and covariances. The MCMC sampling required one quarter of the effort when the hybrid model was used compared to the published MEM. We present a hybrid model that fits a MEM for animals with genotypes and a BVM for those without genotypes. Its practicality and considerable reduction in computing effort was demonstrated. This model can readily be extended to accommodate multiple traits, multiple breeds, maternal effects, and additional random effects such as polygenic residual effects.

  2. Multiple-Event Seismic Location Using the Markov-Chain Monte Carlo Technique

    NASA Astrophysics Data System (ADS)

    Myers, S. C.; Johannesson, G.; Hanley, W.

    2005-12-01

    We develop a new multiple-event location algorithm (MCMCloc) that utilizes the Markov-Chain Monte Carlo (MCMC) method. Unlike most inverse methods, the MCMC approach produces a suite of solutions, each of which is consistent with observations and with prior estimates of data and model uncertainties. Model parameters in MCMCloc consist of event hypocenters and travel-time predictions. Data are arrival time measurements and phase assignments. Posterior estimates of event locations, path corrections, pick errors, and phase assignments are made through analysis of the posterior suite of acceptable solutions. Prior uncertainty estimates include correlations between travel-time predictions, correlations between measurement errors, the probability of misidentifying one phase for another, and the probability of spurious data. Inclusion of prior constraints on location accuracy allows direct utilization of ground-truth locations or well-constrained location parameters (e.g. from InSAR) that aid the accuracy of the solution. Implementation of a correlation structure for travel-time predictions allows MCMCloc to operate over arbitrarily large geographic areas. The transition in behavior between a multiple-event locator for tightly clustered events and a single-event locator for solitary events is controlled by the spatial correlation of travel-time predictions. We test the MCMC locator on a regional data set of Nevada Test Site nuclear explosions. Event locations and origin times are known for these events, allowing us to test the features of MCMCloc using a high-quality ground truth data set. Preliminary tests suggest that MCMCloc provides excellent relative locations, often outperforming traditional multiple-event location algorithms, and excellent absolute locations are attained when constraints from one or more ground truth events are included. When phase assignments are switched, we find that MCMCloc properly corrects the error when predicted arrival times are separated by several seconds. In cases where the predicted arrival times are within the combined uncertainty of prediction and measurement errors, MCMCloc determines the probability of one or the other phase assignment and propagates this uncertainty into all model parameters. We find that MCMCloc is a promising method for simultaneously locating large, geographically distributed data sets. Because we incorporate prior knowledge of many parameters, MCMCloc is ideal for combining trusted data with data of unknown reliability. This work was performed under the auspices of the U.S. Department of Energy by the University of California Lawrence Livermore National Laboratory under contract No. W-7405-Eng-48, Contribution UCRL-ABS-215048

  3. Near-optimal alternative generation using modified hit-and-run sampling for non-linear, non-convex problems

    NASA Astrophysics Data System (ADS)

    Rosenberg, D. E.; Alafifi, A.

    2016-12-01

    Water resources systems analysis often focuses on finding optimal solutions. Yet an optimal solution is optimal only for the modelled issues, and managers often seek near-optimal alternatives that address un-modelled objectives, preferences, limits, uncertainties, and other issues. Early on, Modelling to Generate Alternatives (MGA) formalized the near-optimal region as the region comprising the original problem constraints plus a new constraint that allows performance within a specified tolerance of the optimal objective function value. MGA identified a few maximally different alternatives from the near-optimal region. Subsequent work applied Markov Chain Monte Carlo (MCMC) sampling to generate a larger number of alternatives that span the near-optimal region of linear problems, or selected portions of it for non-linear problems. We extend the MCMC Hit-And-Run method to generate alternatives that span the full extent of the near-optimal region for non-linear, non-convex problems. First, start at a feasible hit point within the near-optimal region, then run a random distance in a random direction to a new hit point. Next, repeat until the desired number of alternatives has been generated. The key step at each iterate is to run a random distance along the line in the specified direction to a new hit point. If linear equality constraints exist, we construct an orthogonal basis and use a null-space transformation to confine hits and runs to a lower-dimensional space. Linear inequality constraints define the convex bounds on the line that runs through the current hit point in the specified direction. We then use slice sampling to identify a new hit point along the line within the bounds defined by the non-linear inequality constraints. This technique is computationally efficient compared to prior near-optimal alternative generation techniques such as MGA, MCMC Metropolis-Hastings, evolutionary, or firefly algorithms, because the search at each iteration is confined to the hit line, the algorithm can move in one step to any point in the near-optimal region, and each iterate generates a new, feasible alternative. We use the method to generate alternatives that span the near-optimal regions of simple and more complicated water management problems and that may be preferred to the optimal solutions. We also discuss extensions to handle non-linear equality constraints. A simplified sketch of the hit-and-run step is given below.
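
    The sketch below illustrates hit-and-run over a toy two-dimensional near-optimal region (a quadratic objective within 10% of its optimum, inside a box). For brevity it uses simple rejection along the chord where the method described above uses slice sampling, and it omits the null-space transformation for equality constraints; the objective, bounds, and tolerance are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

def f(x):
    """Toy objective to be maximized; its optimum is 5.0 at (1, 2)."""
    return 5.0 - (x[0] - 1.0) ** 2 - (x[1] - 2.0) ** 2

f_opt, tolerance = 5.0, 0.10                       # near-optimal: f(x) >= 90% of optimum
lo, hi = np.array([-5.0, -5.0]), np.array([5.0, 5.0])

def in_region(x):
    return np.all(x >= lo) and np.all(x <= hi) and f(x) >= (1 - tolerance) * f_opt

def hit_and_run(x, n_alternatives=1000):
    out = []
    while len(out) < n_alternatives:
        d = rng.normal(size=2)
        d /= np.linalg.norm(d)                     # random direction on the unit sphere
        # Chord of the bounding box through x along +/- d (linear constraints).
        with np.errstate(divide="ignore", invalid="ignore"):
            t = np.concatenate([(lo - x) / d, (hi - x) / d])
        t_lo, t_hi = t[t < 0].max(), t[t > 0].min()
        # Sample along the chord; keep the first point inside the non-linear region
        # (the method described above uses slice sampling here instead of rejection).
        for _ in range(100):
            x_prop = x + rng.uniform(t_lo, t_hi) * d
            if in_region(x_prop):
                x = x_prop
                out.append(x.copy())
                break
    return np.array(out)

alts = hit_and_run(np.array([1.0, 2.0]))
vals = np.array([f(a) for a in alts])
print("generated", len(alts), "alternatives; objective range:",
      round(vals.min(), 3), "to", round(vals.max(), 3))
```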

  4. Genomic prediction using an iterative conditional expectation algorithm for a fast BayesC-like model.

    PubMed

    Dong, Linsong; Wang, Zhiyong

    2018-06-11

    Genomic prediction is feasible for estimating genomic breeding values because of dense genome-wide markers and credible statistical methods, such as Genomic Best Linear Unbiased Prediction (GBLUP) and various Bayesian methods. Compared with GBLUP, Bayesian methods propose more flexible assumptions for the distributions of SNP effects. However, most Bayesian methods are performed based on Markov chain Monte Carlo (MCMC) algorithms, leading to computational efficiency challenges. Hence, some fast Bayesian approaches, such as fast BayesB (fBayesB), were proposed to speed up the calculation. This study proposed another fast Bayesian method termed fast BayesC (fBayesC). The prior distribution of fBayesC assumes that a SNP with probability γ has a non-zero effect which comes from a normal density with a common variance. The simulated data from QTLMAS XII workshop and actual data on large yellow croaker were used to compare the predictive results of fBayesB, fBayesC and (MCMC-based) BayesC. The results showed that when γ was set as a small value, such as 0.01 in the simulated data or 0.001 in the actual data, fBayesB and fBayesC yielded lower prediction accuracies (abilities) than BayesC. In the actual data, fBayesC could yield very similar predictive abilities as BayesC when γ ≥ 0.01. When γ = 0.01, fBayesB could also yield similar results as fBayesC and BayesC. However, fBayesB could not yield an explicit result when γ ≥ 0.1, but a similar situation was not observed for fBayesC. Moreover, the computational speed of fBayesC was significantly faster than that of BayesC, making fBayesC a promising method for genomic prediction.

  5. Modeling Nitrogen Dynamics in a Waste Stabilization Pond System Using Flexible Modeling Environment with MCMC

    PubMed Central

    Mukhtar, Hussnain; Lin, Yu-Pin; Shipin, Oleg V.; Petway, Joy R.

    2017-01-01

    This study presents an approach for obtaining realization sets of parameters for nitrogen removal in a pilot-scale waste stabilization pond (WSP) system. The proposed approach was designed for optimal parameterization, local sensitivity analysis, and global uncertainty analysis of a dynamic simulation model for the WSP by using the R software package Flexible Modeling Environment (R-FME) with the Markov chain Monte Carlo (MCMC) method. Additionally, generalized likelihood uncertainty estimation (GLUE) was integrated into the FME to evaluate the major parameters that affect the simulation outputs in the study WSP. Comprehensive modeling analysis was used to simulate and assess nine parameters and the concentrations of ON-N, NH3-N and NO3-N. Results indicate that the integrated FME-GLUE-based model, with good Nash–Sutcliffe coefficients (0.53–0.69) and correlation coefficients (0.76–0.83), successfully simulates the concentrations of ON-N, NH3-N and NO3-N. Moreover, the Arrhenius constant was the only parameter to which model performance for the ON-N and NH3-N simulations was sensitive. The ON-N and NO3-N simulations, however, were also sensitive to the Nitrosomonas growth rate, the denitrification constant, and the maximum growth rate at 20 °C, as determined by global sensitivity analysis. PMID:28704958

  6. Cenozoic biogeography and evolution in direct-developing frogs of Central America (Leptodactylidae: Eleutherodactylus) as inferred from a phylogenetic analysis of nuclear and mitochondrial genes.

    PubMed

    Crawford, Andrew J; Smith, Eric N

    2005-06-01

    We report the first phylogenetic analysis of DNA sequence data for the Central American component of the genus Eleutherodactylus (Anura: Leptodactylidae: Eleutherodactylinae), one of the most ubiquitous, diverse, and abundant components of the Neotropical amphibian fauna. We obtained DNA sequence data from 55 specimens representing 45 species. Sampling was focused on Central America, but also included Bolivia, Brazil, Jamaica, and the USA. We sequenced 1460 contiguous base pairs (bp) of the mitochondrial genome containing ND2 and five neighboring tRNA genes, plus 1300 bp of the c-myc nuclear gene. The resulting phylogenetic inferences were broadly concordant between data sets and among analytical methods. The subgenus Craugastor is monophyletic and its initial radiation was potentially rapid and adaptive. Within Craugastor, the earliest splits separate three northern Central American species groups, milesi, augusti, and alfredi, from a clade comprising the rest of Craugastor. Within the latter clade, the rhodopis group as formerly recognized comprises three deeply divergent clades that do not form a monophyletic group; we therefore restrict the content of the rhodopis group to one of two northern clades, and use new names for the other northern (mexicanus group) and one southern clade (bransfordii group). The new rhodopis and bransfordii groups together form the sister taxon to a clade comprising the biporcatus, fitzingeri, mexicanus, and rugulosus groups. We used a Bayesian MCMC approach together with geological and biogeographic assumptions to estimate divergence times from the combined DNA sequence data. Our results corroborated three independent dispersal events for the origins of Central American Eleutherodactylus: (1) an ancestor of Craugastor entered northern Central America from South America in the early Paleocene, (2) an ancestor of the subgenus Syrrhophus entered northern Central America from the Caribbean at the end of the Eocene, and (3) a wave of independent dispersal events from South America coincided with formation of the Isthmus of Panama during the Pliocene. We elevate the subgenus Craugastor to the genus rank.

  7. Randomized Trial of a Peer-Led, Telephone-Based Empowerment Intervention for Persons With Chronic Spinal Cord Injury Improves Health Self-Management.

    PubMed

    Houlihan, Bethlyn Vergo; Brody, Miriam; Everhart-Skeels, Sarah; Pernigotti, Diana; Burnett, Sam; Zazula, Judi; Green, Christa; Hasiotis, Stathis; Belliveau, Timothy; Seetharama, Subramani; Rosenblum, David; Jette, Alan

    2017-06-01

    To evaluate the impact of "My Care My Call" (MCMC), a peer-led, telephone-based health self-management intervention in adults with chronic spinal cord injury (SCI). Single-blinded randomized controlled trial. General community. Convenience sample of adults with SCI (N=84; mean time post-SCI, 9.9y; mean age, 46y; 73.8% men; 44% with paraplegia; 58% white). Trained peer health coaches applied the person-centered health self-management intervention with 42 experimental subjects over 6 months on a tapered call schedule. The 42 control subjects received usual care. Both groups received the MCMC Resource Guide. Primary outcome: health self-management as measured by the Patient Activation Measure (PAM). Secondary outcomes: global ratings of service/resource use, health-related quality of life, and quality of primary care. Intervention participants averaged 12 calls over 6 months (averaging 21.8min each), with distinct variation. At 6 months, intervention participants reported a significantly greater change in PAM scores (6mo: estimate, 7.029; 95% confidence interval, .1018-13.956; P=.0468) compared with controls, with a trend toward significance at 4 months. At 6 months, intervention participants reported a significantly greater decrease in social/role activity limitations (estimate, -.443; P=.0389), greater life satisfaction (estimate, 1.0091; P=.0522), greater services/resources awareness (estimate, 1.678; P=.0253), greater overall service use (estimate, 1.069; P=.0240), and a greater number of services used (estimate, 1.542; P=.0077). Subgroups most impacted by MCMC on PAM change scores included the following: high social support, white persons, men, 1 to 6 years postinjury, and tetraplegic. This trial demonstrates that the MCMC peer-led, health self-management intervention achieved a positive impact on self-management to prevent secondary conditions in adults with SCI. These results warrant a larger, multisite trial of its efficacy and cost-effectiveness. Copyright © 2017 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.

  8. Copula-based assessment of the relationship between flood peaks and flood volumes using information on historical floods by Bayesian Monte Carlo Markov Chain simulations

    NASA Astrophysics Data System (ADS)

    Gaál, Ladislav; Szolgay, Ján; Bacigál, Tomáš; Kohnová, Silvia

    2010-05-01

    Copula-based estimation methods for hydro-climatological extremes have been gaining increasing attention from researchers and practitioners over the last couple of years. Unlike traditional estimation methods, which are based on bivariate cumulative distribution functions (CDFs), copulas are a relatively flexible statistical tool that allows dependencies between two or more variables, such as flood peaks and flood volumes, to be modelled without strict assumptions on the marginal distributions. The dependence structure and the reliability of the joint estimates of hydro-climatological extremes, mainly in the right tail of the joint CDF, depend not only on the particular copula adopted but also on the data available for estimating the marginal distributions of the individual variables. Generally, data samples for frequency modelling have limited temporal extent, which is a considerable drawback of frequency analyses in practice. It is therefore advisable to use statistical methods that improve any part of the copula construction process and so yield more reliable design values of hydrological variables. The scarcity of data, mostly in the extreme tail of the joint CDF, can be bypassed, e.g., by using a considerably larger amount of data simulated by rainfall-runoff analysis or by including historical information on the variables under study. The latter approach to data extension is used here to make the quantile estimates of the individual marginals of the copula more reliable. In the presented paper it is proposed to use historical information in the frequency analysis of the marginal distributions within the framework of Bayesian Markov chain Monte Carlo (MCMC) simulations. A Bayesian approach allows for a straightforward combination of different sources of information on floods (e.g. flood data from systematic measurements and historical flood records) in terms of a product of the corresponding likelihood functions, while the MCMC algorithm provides a numerical approach for sampling from the resulting distributions. Bayesian MCMC methods therefore provide an attractive way to estimate the uncertainty in the parameters and quantile metrics of frequency distributions. The applicability of the method is demonstrated in a case study of the hydroelectric power station Orlík on the Vltava River. This site has a key role in the flood prevention of Prague, the capital city of the Czech Republic. The record length of the available flood data is 126 years, from the period 1877-2002, and the flood event observed in 2002, which caused extensive damage and numerous casualties, is treated as a historic flood. To estimate the joint probabilities of flood peaks and volumes, different copulas are fitted and their goodness-of-fit is evaluated by bootstrap simulations. Finally, selected quantiles of flood volumes conditioned on given flood peaks are derived and compared with those obtained by the traditional method used in the practice of water management specialists on the Vltava River.

  9. Protein structure refinement using a quantum mechanics-based chemical shielding predictor† †Electronic supplementary information (ESI) available. See DOI: 10.1039/c6sc04344e Click here for additional data file.

    PubMed Central

    2017-01-01

    The accurate prediction of protein chemical shifts using a quantum mechanics (QM)-based method has been the subject of intense research for more than 20 years but so far empirical methods for chemical shift prediction have proven more accurate. In this paper we show that a QM-based predictor of a protein backbone and CB chemical shifts (ProCS15, PeerJ, 2016, 3, e1344) is of comparable accuracy to empirical chemical shift predictors after chemical shift-based structural refinement that removes small structural errors. We present a method by which quantum chemistry based predictions of isotropic chemical shielding values (ProCS15) can be used to refine protein structures using Markov Chain Monte Carlo (MCMC) simulations, relating the chemical shielding values to the experimental chemical shifts probabilistically. Two kinds of MCMC structural refinement simulations were performed using force field geometry optimized X-ray structures as starting points: simulated annealing of the starting structure and constant temperature MCMC simulation followed by simulated annealing of a representative ensemble structure. Annealing of the CHARMM structure changes the CA-RMSD by an average of 0.4 Å but lowers the chemical shift RMSD by 1.0 and 0.7 ppm for CA and N. Conformational averaging has a relatively small effect (0.1–0.2 ppm) on the overall agreement with carbon chemical shifts but lowers the error for nitrogen chemical shifts by 0.4 ppm. If an amino acid specific offset is included the ProCS15 predicted chemical shifts have RMSD values relative to experiments that are comparable to popular empirical chemical shift predictors. The annealed representative ensemble structures differ in CA-RMSD relative to the initial structures by an average of 2.0 Å, with >2.0 Å difference for six proteins. In four of the cases, the largest structural differences arise in structurally flexible regions of the protein as determined by NMR, and in the remaining two cases, the large structural change may be due to force field deficiencies. The overall accuracy of the empirical methods is slightly improved by annealing the CHARMM structure with ProCS15, which may suggest that the minor structural changes introduced by ProCS15-based annealing improve the accuracy of the protein structures. Having established that QM-based chemical shift prediction can deliver the same accuracy as empirical shift predictors, we hope this can help increase the accuracy of related approaches such as QM/MM or linear scaling approaches, or help in interpreting protein structural dynamics from QM-derived chemical shifts. PMID:28451325

  10. Bayesian linkage and segregation analysis: factoring the problem.

    PubMed

    Matthysse, S

    2000-01-01

    Complex segregation analysis and linkage methods are mathematical techniques for the genetic dissection of complex diseases. They are used to delineate complex modes of familial transmission and to localize putative disease susceptibility loci to specific chromosomal locations. The computational problem of Bayesian linkage and segregation analysis is one of integration in high-dimensional spaces. In this paper, three available techniques for Bayesian linkage and segregation analysis are discussed: Markov Chain Monte Carlo (MCMC), importance sampling, and exact calculation. The contribution of each to the overall integration will be explicitly discussed.

  11. Sworn testimony of the model evidence: Gaussian Mixture Importance (GAME) sampling

    NASA Astrophysics Data System (ADS)

    Volpi, Elena; Schoups, Gerrit; Firmani, Giovanni; Vrugt, Jasper A.

    2017-07-01

    What is the "best" model? The answer to this question lies in part in the eyes of the beholder; nevertheless, a good model must blend rigorous theory with redeeming qualities such as parsimony and quality of fit. Model selection is used to make inferences, via weighted averaging, from a set of K candidate models, Mk, k = 1, …, K, and helps identify which model is most supported by the observed data, Ỹ = (ỹ1, …, ỹn). Here, we introduce a new and robust estimator of the model evidence, p(Ỹ|Mk), which acts as the normalizing constant in the denominator of Bayes' theorem and provides a single quantitative measure of relative support for each hypothesis that integrates model accuracy, uncertainty, and complexity. However, p(Ỹ|Mk) is analytically intractable for most practical modeling problems. Our method, coined GAussian Mixture importancE (GAME) sampling, uses bridge sampling of a mixture distribution fitted to samples of the posterior model parameter distribution derived from MCMC simulation. We benchmark the accuracy and reliability of GAME sampling by application to a diverse set of multivariate target distributions (up to 100 dimensions) with known values of p(Ỹ|Mk) and to hypothesis testing using numerical modeling of the rainfall-runoff transformation of the Leaf River watershed in Mississippi, USA. These case studies demonstrate that GAME sampling provides robust and unbiased estimates of the evidence at a relatively small computational cost, outperforming commonly used estimators. The GAME sampler is implemented in the MATLAB package of DREAM and considerably simplifies scientific inquiry through hypothesis testing and model selection.
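
    A simplified relative of this idea, shown on a conjugate toy problem where the true evidence is known in closed form: fit a Gaussian mixture q(theta) to posterior draws and use it in a reciprocal importance sampling estimate of p(Ỹ|Mk). GAME itself uses bridge sampling, which is more robust, and is implemented in MATLAB/DREAM; everything below (model, numbers, direct posterior draws standing in for MCMC output) is an illustrative assumption.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal
from scipy.special import logsumexp
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(13)

# Conjugate toy problem so the true evidence p(Y|M) is known in closed form.
sigma, tau, n = 1.0, 2.0, 20
y = rng.normal(loc=1.0, scale=sigma, size=n)

# Exact posterior draws for theta (these stand in for an MCMC sample).
var_post = 1.0 / (n / sigma**2 + 1.0 / tau**2)
mean_post = var_post * y.sum() / sigma**2
theta = rng.normal(mean_post, np.sqrt(var_post), size=5000)

# Fit a Gaussian mixture q(theta) to the posterior sample.
gm = GaussianMixture(n_components=2).fit(theta.reshape(-1, 1))
log_q = gm.score_samples(theta.reshape(-1, 1))

# Reciprocal importance sampling: 1/p(Y) = E_post[ q(theta) / (p(Y|theta) p(theta)) ].
log_lik = np.array([norm.logpdf(y, loc=t, scale=sigma).sum() for t in theta])
log_prior = norm.logpdf(theta, loc=0.0, scale=tau)
log_evidence = -(logsumexp(log_q - log_lik - log_prior) - np.log(len(theta)))

true_log_evidence = multivariate_normal.logpdf(
    y, mean=np.zeros(n), cov=sigma**2 * np.eye(n) + tau**2 * np.ones((n, n)))
print("estimated log evidence:", log_evidence)
print("true log evidence     :", true_log_evidence)
```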

  12. Optimal Experimental Design of Borehole Locations for Bayesian Inference of Past Ice Sheet Surface Temperatures

    NASA Astrophysics Data System (ADS)

    Davis, A. D.; Huan, X.; Heimbach, P.; Marzouk, Y.

    2017-12-01

    Borehole data are essential for calibrating ice sheet models. However, field expeditions for acquiring borehole data are often time-consuming, expensive, and dangerous. It is thus essential to plan the best sampling locations, maximizing the value of the data while minimizing costs and risks. We present an uncertainty quantification (UQ) workflow based on a rigorous probabilistic framework to achieve these objectives. First, we employ an optimal experimental design (OED) procedure to compute borehole locations that yield the highest expected information gain. We take into account practical considerations of location accessibility (e.g., proximity to research sites, terrain, and ice velocity may affect the feasibility of drilling) and robustness (e.g., real-time constraints such as weather may force researchers to drill at sub-optimal locations near those originally planned) by incorporating a penalty reflecting accessibility as well as sensitivity to deviations from the optimal locations. Next, we extract vertical temperature profiles from these boreholes and formulate a Bayesian inverse problem to reconstruct past surface temperatures. Using a model of temperature advection/diffusion, the top boundary condition (corresponding to surface temperatures) is calibrated via efficient Markov chain Monte Carlo (MCMC). The overall procedure can then be iterated to choose new optimal borehole locations for the next expeditions. Through this work, we demonstrate powerful UQ methods for designing experiments, calibrating models, making predictions, and assessing sensitivity, all performed in an uncertain environment. We develop a theoretical framework as well as practical software within an intuitive workflow, and illustrate their usefulness for combining data and models for environmental and climate research.
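
    A minimal sketch of how expected information gain can drive such a design comparison: a nested Monte Carlo estimator for a toy linear-Gaussian "borehole" problem in which a hypothetical sensitivity decays with depth. The model, sensitivity function, noise levels, and numbers are assumptions for illustration only, not the authors' advection/diffusion model or penalty terms.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(21)

# Toy problem: unknown past surface temperature theta; a borehole at depth d
# yields y = g(d) * theta + noise, where g(d) is an assumed sensitivity.
def sensitivity(d):
    return np.exp(-d / 500.0)          # deeper boreholes are less informative here

sigma_obs, sigma_prior = 0.3, 2.0

def expected_information_gain(d, n_outer=2000, n_inner=2000):
    """Nested Monte Carlo estimate of EIG for a design (borehole depth d)."""
    g = sensitivity(d)
    theta_outer = rng.normal(0.0, sigma_prior, size=n_outer)
    y = g * theta_outer + rng.normal(0.0, sigma_obs, size=n_outer)
    log_lik = norm.logpdf(y, loc=g * theta_outer, scale=sigma_obs)
    # Marginal likelihood p(y) estimated with an inner prior sample.
    theta_inner = rng.normal(0.0, sigma_prior, size=n_inner)
    log_marg = np.array([
        np.log(np.mean(norm.pdf(yi, loc=g * theta_inner, scale=sigma_obs)))
        for yi in y])
    return np.mean(log_lik - log_marg)

for depth in (100.0, 800.0):
    print(f"depth {depth:6.0f} m: EIG ~ {expected_information_gain(depth):.3f} nats")
```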

  13. How Much Can We Learn from a Single Chromatographic Experiment? A Bayesian Perspective.

    PubMed

    Wiczling, Paweł; Kaliszan, Roman

    2016-01-05

    In this work, we proposed and investigated a Bayesian inference procedure to find desired chromatographic conditions based on known analyte properties (lipophilicity, pKa, and polar surface area) using one preliminary experiment. A previously developed nonlinear mixed-effect model was used to specify the prior information about a new analyte with known physicochemical properties. Further, the prior predictive distribution (no preliminary data) and the posterior predictive distribution (prior plus one experiment) were determined sequentially to search towards the desired separation. The following isocratic high-performance reversed-phase liquid chromatographic conditions were sought: (1) a retention time of a single analyte within the range of 4-6 min and (2) baseline separation of two analytes with retention times within the range of 4-10 min. The empirical posterior Bayesian distribution of parameters was estimated using the "slice sampling" Markov Chain Monte Carlo (MCMC) algorithm implemented in Matlab. Simulations with artificial analytes and experimental data for ketoprofen and papaverine were used to test the proposed methodology. The simulation experiment showed that, for a single analyte and for two randomly selected analytes, there is a 97% and a 74% probability, respectively, of obtaining a successful chromatogram using at most one preliminary experiment. The desired separation for ketoprofen and papaverine was established based on a single experiment. This confirms that the search for a desired separation rarely requires a large number of chromatographic analyses, at least for a simple optimization problem. The proposed Bayesian optimization scheme is a powerful method for finding a desired chromatographic separation based on a small number of preliminary experiments.

  14. Estimation of spatially varying heat transfer coefficient from a flat plate with flush mounted heat sources using Bayesian inference

    NASA Astrophysics Data System (ADS)

    Jakkareddy, Pradeep S.; Balaji, C.

    2016-09-01

    This paper employs the Bayesian-based Metropolis-Hastings Markov chain Monte Carlo algorithm to solve the inverse heat transfer problem of determining the spatially varying heat transfer coefficient from a flat plate with flush-mounted discrete heat sources, using measured temperatures at the bottom of the plate. The Nusselt number is assumed to be of the form Nu = a Re^b (x/l)^c. To input reasonable values of 'a' and 'b' into the inverse problem, limited two-dimensional conjugate convection simulations were first done with Comsol. Guided by these, different values of 'a' and 'b' were input to a computationally less complex problem of conjugate conduction in the flat plate (15 mm thickness), and temperature distributions at the bottom of the plate, which is a more convenient location for measuring temperatures without disturbing the flow, were obtained. Since the goal of this work is to demonstrate the efficacy of the Bayesian approach in accurately retrieving 'a' and 'b', numerically generated temperatures with known values of 'a' and 'b' are treated as 'surrogate' experimental data. The inverse problem is then solved by repeatedly using the forward solutions together with the MH-MCMC approach. To speed up the estimation, the forward model is replaced by an artificial neural network. The mean, maximum a posteriori, and standard deviation of the estimated parameters 'a' and 'b' are reported. The robustness of the proposed method is examined by synthetically adding noise to the temperatures.
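
    A minimal sketch of the random-walk Metropolis-Hastings retrieval described above, in Python: the closed-form `surrogate` function stands in for the paper's trained artificial neural network, and the 'true' parameter values, noise level, priors, and proposal scale are all assumed for illustration.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    x = np.linspace(0.05, 1.0, 20)                     # sensor locations along the plate (arbitrary)

    def surrogate(a, b):
        """Stand-in for the forward model / trained ANN: bottom-wall temperatures given a and b."""
        return 320.0 - a * 40.0 * x ** b

    a_true, b_true, noise = 1.2, 0.6, 0.5
    data = surrogate(a_true, b_true) + rng.normal(0.0, noise, x.size)   # 'surrogate' experimental data

    def log_post(theta):
        a, b = theta
        if not (0.0 < a < 5.0 and 0.0 < b < 2.0):      # uniform priors
            return -np.inf
        return -0.5 * np.sum((data - surrogate(a, b)) ** 2) / noise ** 2

    theta = np.array([1.0, 0.5])
    chain, lp = [], log_post(theta)
    for _ in range(20000):                             # random-walk Metropolis-Hastings
        prop = theta + rng.normal(0.0, 0.02, 2)
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        chain.append(theta)
    chain = np.array(chain[5000:])                     # discard burn-in
    print(chain.mean(axis=0), chain.std(axis=0))       # posterior mean and standard deviation
    ```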

  15. DLRS: gene tree evolution in light of a species tree.

    PubMed

    Sjöstrand, Joel; Sennblad, Bengt; Arvestad, Lars; Lagergren, Jens

    2012-11-15

    PrIME-DLRS (or colloquially: 'Delirious') is a phylogenetic software tool to simultaneously infer and reconcile a gene tree given a species tree. It accounts for duplication and loss events, a relaxed molecular clock and is intended for the study of homologous gene families, for example in a comparative genomics setting involving multiple species. PrIME-DLRS uses a Bayesian MCMC framework, where the input is a known species tree with divergence times and a multiple sequence alignment, and the output is a posterior distribution over gene trees and model parameters. PrIME-DLRS is available for Java SE 6+ under the New BSD License, and JAR files and source code can be downloaded from http://code.google.com/p/jprime/. There is also a slightly older C++ version available as a binary package for Ubuntu, with download instructions at http://prime.sbc.su.se. The C++ source code is available upon request. joel.sjostrand@scilifelab.se or jens.lagergren@scilifelab.se. PrIME-DLRS is based on a sound probabilistic model (Åkerborg et al., 2009) and has been thoroughly validated on synthetic and biological datasets (Supplementary Material online).

  16. Estimating the Term Structure With a Semiparametric Bayesian Hierarchical Model: An Application to Corporate Bonds.

    PubMed

    Cruz-Marcelo, Alejandro; Ensor, Katherine B; Rosner, Gary L

    2011-06-01

    The term structure of interest rates is used to price defaultable bonds and credit derivatives, as well as to infer the quality of bonds for risk management purposes. We introduce a model that jointly estimates term structures by means of a Bayesian hierarchical model with a prior probability model based on Dirichlet process mixtures. The modeling methodology borrows strength across term structures for purposes of estimation. The main advantage of our framework is its ability to produce reliable estimators at the company level even when there are only a few bonds per company. After describing the proposed model, we discuss an empirical application in which the term structure of 197 individual companies is estimated. The sample of 197 consists of 143 companies with only one or two bonds. In-sample and out-of-sample tests are used to quantify the improvement in accuracy that results from approximating the term structure of corporate bonds with estimators by company rather than by credit rating, the latter being a popular choice in the financial literature. A complete description of a Markov chain Monte Carlo (MCMC) scheme for the proposed model is available as Supplementary Material.

  17. Estimating the Term Structure With a Semiparametric Bayesian Hierarchical Model: An Application to Corporate Bonds1

    PubMed Central

    Cruz-Marcelo, Alejandro; Ensor, Katherine B.; Rosner, Gary L.

    2011-01-01

    The term structure of interest rates is used to price defaultable bonds and credit derivatives, as well as to infer the quality of bonds for risk management purposes. We introduce a model that jointly estimates term structures by means of a Bayesian hierarchical model with a prior probability model based on Dirichlet process mixtures. The modeling methodology borrows strength across term structures for purposes of estimation. The main advantage of our framework is its ability to produce reliable estimators at the company level even when there are only a few bonds per company. After describing the proposed model, we discuss an empirical application in which the term structure of 197 individual companies is estimated. The sample of 197 consists of 143 companies with only one or two bonds. In-sample and out-of-sample tests are used to quantify the improvement in accuracy that results from approximating the term structure of corporate bonds with estimators by company rather than by credit rating, the latter being a popular choice in the financial literature. A complete description of a Markov chain Monte Carlo (MCMC) scheme for the proposed model is available as Supplementary Material. PMID:21765566

  18. AzTEC Survey of the Central Molecular Zone: Modeling Dust SEDs and N-PDF with Hierarchical Bayesian Analysis

    NASA Astrophysics Data System (ADS)

    Tang, Yuping; Wang, Daniel; Wilson, Grant; Gutermuth, Robert; Heyer, Mark

    2018-01-01

    We present the AzTEC/LMT survey of dust continuum at 1.1 mm of the central ~200 pc (CMZ) of our Galaxy. A joint SED analysis of all existing dust continuum surveys of the CMZ is performed, from 160 µm to 1.1 mm. Our analysis follows an MCMC sampling strategy incorporating knowledge of the PSFs of the different maps, which provides unprecedented spatial resolution on the distributions of dust temperature, column density, and emissivity index. The dense clumps in the CMZ typically show low dust temperatures (~20 K), with no significant sign of buried star formation, and a weak trend of higher emissivity index toward dense peaks. A new model is proposed, allowing for varying dust temperature inside a cloud and self-shielding of dust emission, which leads to similar conclusions on dust temperature and grain properties. We further apply a hierarchical Bayesian analysis to infer the column density probability distribution function (N-PDF), while simultaneously removing the Galactic foreground and background emission. The N-PDF shows a steep power-law profile with α > 3, indicating that the formation of dense structures is suppressed.

  19. Halo models of HI selected galaxies

    NASA Astrophysics Data System (ADS)

    Paul, Niladri; Choudhury, Tirthankar Roy; Paranjape, Aseem

    2018-06-01

    Modelling the distribution of neutral hydrogen (HI) in dark matter halos is important for studying galaxy evolution in the cosmological context. We use a novel approach to infer the HI-dark matter connection at the massive end (m_HI > 10^9.8 M_⊙) from radio HI emission surveys, using optical properties of low-redshift galaxies as an intermediary. In particular, we use a previously calibrated optical HOD describing the luminosity- and colour-dependent clustering of SDSS galaxies and describe the HI content using a statistical scaling relation between the optical properties and HI mass. This allows us to compute the abundance and clustering properties of HI-selected galaxies and compare with data from the ALFALFA survey. We apply an MCMC-based statistical analysis to constrain the free parameters related to the scaling relation. The resulting best-fit scaling relation identifies massive HI galaxies primarily with optically faint blue centrals, consistent with expectations from galaxy formation models. We compare the HI-stellar mass relation predicted by our model with independent observations from matched HI-optical galaxy samples, finding reasonable agreement. As a further application, we make some preliminary forecasts for future observations of HI and optical galaxies in the expected overlap volume of SKA and Euclid/LSST.

  20. Exploring Hot Exoplanet Atmospheres with JWST/NIRSpec and a Hybrid Version of NEMESIS

    NASA Astrophysics Data System (ADS)

    Badhan, Mahmuda A.; Mandell, Avi; Batalha, Natasha; Irwin, Patrick GJ; Barstow, Joanna; Garland, Ryan; Deming, Drake; Hesman, Brigette E.; Nixon, Conor A.

    2016-01-01

    Understanding the formation environments and evolution scenarios of hot Jupiters demands robust methods for constraining their atmospheric physical properties, as well as transit observations at unprecedented resolutions. Here we have utilized a combination of two different approaches, Optimal Estimation (OE) and Markov Chain Monte Carlo (MCMC), as part of the extensively validated NEMESIS atmospheric retrieval code, to infer pressure-temperature (P-T) profiles and gas volume mixing ratios (VMRs) of H2O, CO2, CH4, and CO from a series of tests conducted on JWST/NIRSpec simulations of the dayside thermal emission spectra (secondary eclipse) of H2-dominated hot-Jupiter candidates. To keep the number of parameters low and thereby retrieve more plausible profile shapes, we have used a parametrized form of the temperature profile based upon the analytic radiative-equilibrium derivation of Guillot et al. 2010. For the purpose of testing and validation, we also show some preliminary work on published datasets from the Hubble Space Telescope (HST) and Spitzer missions. Finally, high-temperature (T > 1000 K) spectroscopic line lists are slowly but continually being improved by the atmospheric retrieval community. Since this carries the potential of impacting hot Jupiter atmospheric models quite significantly, we compare results from different databases.

  1. Accurate Phylogenetic Tree Reconstruction from Quartets: A Heuristic Approach

    PubMed Central

    Reaz, Rezwana; Bayzid, Md. Shamsuzzoha; Rahman, M. Sohel

    2014-01-01

    Supertree methods construct trees on a set of taxa (species) by combining many smaller trees on overlapping subsets of the entire set of taxa. A 'quartet' is an unrooted tree over four taxa, hence quartet-based supertree methods combine many four-taxon unrooted trees into a single and coherent tree over the complete set of taxa. Quartet-based phylogeny reconstruction methods have been receiving considerable attention in recent years. An accurate and efficient quartet-based method might be competitive with the current best phylogenetic tree reconstruction methods (such as maximum likelihood or Bayesian MCMC analyses), without being as computationally intensive. In this paper, we present a novel and highly accurate quartet-based phylogenetic tree reconstruction method. We performed an extensive experimental study to evaluate the accuracy and scalability of our approach on both simulated and biological datasets. PMID:25117474

  2. No Control Genes Required: Bayesian Analysis of qRT-PCR Data

    PubMed Central

    Matz, Mikhail V.; Wright, Rachel M.; Scott, James G.

    2013-01-01

    Background Model-based analysis of data from quantitative reverse-transcription PCR (qRT-PCR) is potentially more powerful and versatile than traditional methods. Yet existing model-based approaches cannot properly deal with the higher sampling variances associated with low-abundant targets, nor do they provide a natural way to incorporate assumptions about the stability of control genes directly into the model-fitting process. Results In our method, raw qPCR data are represented as molecule counts, and described using generalized linear mixed models under Poisson-lognormal error. A Markov Chain Monte Carlo (MCMC) algorithm is used to sample from the joint posterior distribution over all model parameters, thereby estimating the effects of all experimental factors on the expression of every gene. The Poisson-based model allows for the correct specification of the mean-variance relationship of the PCR amplification process, and can also glean information from instances of no amplification (zero counts). Our method is very flexible with respect to control genes: any prior knowledge about the expected degree of their stability can be directly incorporated into the model. Yet the method provides sensible answers without such assumptions, or even in the complete absence of control genes. We also present a natural Bayesian analogue of the “classic” analysis, which uses standard data pre-processing steps (logarithmic transformation and multi-gene normalization) but estimates all gene expression changes jointly within a single model. The new methods are considerably more flexible and powerful than the standard delta-delta Ct analysis based on pairwise t-tests. Conclusions Our methodology expands the applicability of the relative-quantification analysis protocol all the way to the lowest-abundance targets, and provides a novel opportunity to analyze qRT-PCR data without making any assumptions concerning target stability. These procedures have been implemented as the MCMC.qpcr package in R. PMID:23977043

  3. A Bayesian trans-dimensional approach for the fusion of multiple geophysical datasets

    NASA Astrophysics Data System (ADS)

    JafarGandomi, Arash; Binley, Andrew

    2013-09-01

    We propose a Bayesian fusion approach to integrate multiple geophysical datasets with different coverage and sensitivity. The fusion strategy is based on the capability of various geophysical methods to provide enough resolution to identify either subsurface material parameters or subsurface structure, or both. We focus on electrical resistivity as the target material parameter and on electrical resistivity tomography (ERT), electromagnetic induction (EMI), and ground penetrating radar (GPR) as the set of geophysical methods. However, extending the approach to different sets of geophysical parameters and methods is straightforward. The different geophysical datasets are entered into a trans-dimensional Markov chain Monte Carlo (McMC) search-based joint inversion algorithm. The trans-dimensional property of the McMC algorithm allows dynamic parameterisation of the model space, which in turn helps to avoid biasing the post-inversion results towards a particular model. Given that we are attempting to develop an approach that has practical potential, we discretize the subsurface into an array of one-dimensional earth models. Accordingly, the ERT data, which are collected using a two-dimensional acquisition geometry, are recast as a set of equivalent vertical electric soundings. The different data are inverted either individually or jointly to estimate one-dimensional subsurface models at discrete locations. We use Shannon's information measure to quantify the information obtained from the inversion of different combinations of geophysical datasets. Information from multiple methods is brought together by introducing a joint likelihood function and/or constraining the prior information. A Bayesian maximum entropy approach is used for spatial fusion of the spatially dispersed estimated one-dimensional models and for mapping of the target parameter. We illustrate the approach with a synthetic dataset and then apply it to a field dataset. We show that the proposed fusion strategy is successful not only in enhancing the subsurface information but also as a survey design tool to identify the appropriate combination of geophysical methods and to show whether application of an individual method for further investigation of a specific site is beneficial.

  4. A New Approach for Obtaining Cosmological Constraints from Type Ia Supernovae using Approximate Bayesian Computation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jennings, Elise; Wolf, Rachel; Sako, Masao

    2016-11-09

    Cosmological parameter estimation techniques that robustly account for systematic measurement uncertainties will be crucial for the next generation of cosmological surveys. We present a new analysis method, superABC, for obtaining cosmological constraints from Type Ia supernova (SN Ia) light curves using Approximate Bayesian Computation (ABC) without any likelihood assumptions. The ABC method works by using a forward model simulation of the data where systematic uncertainties can be simulated and marginalized over. A key feature of the method presented here is the use of two distinct metrics, the 'Tripp' and 'Light Curve' metrics, which allow us to compare the simulated data to the observed data set. The Tripp metric takes as input the parameters of models fit to each light curve with the SALT-II method, whereas the Light Curve metric uses the measured fluxes directly without model fitting. We apply the superABC sampler to a simulated data set of ~1000 SNe corresponding to the first season of the Dark Energy Survey Supernova Program. Varying Ω_m, w_0, α, β and a magnitude offset parameter, with no systematics we obtain Δ(w_0) = w_0^{true} - w_0^{best fit} = -0.036 ± 0.109 (a ~11% 1σ uncertainty) using the Tripp metric and Δ(w_0) = -0.055 ± 0.068 (a ~7% 1σ uncertainty) using the Light Curve metric. Including 1% calibration uncertainties in four passbands, adding 4 more parameters, we obtain Δ(w_0) = -0.062 ± 0.132 (a ~14% 1σ uncertainty) using the Tripp metric. Overall we find a 17% increase in the uncertainty on w_0 with systematics compared to without. We contrast this with an MCMC approach where systematic effects are approximately included. We find that the MCMC method slightly underestimates the impact of calibration uncertainties for this simulated data set.

  5. The Joker: A Custom Monte Carlo Sampler for Binary-star and Exoplanet Radial Velocity Data

    NASA Astrophysics Data System (ADS)

    Price-Whelan, Adrian M.; Hogg, David W.; Foreman-Mackey, Daniel; Rix, Hans-Walter

    2017-03-01

    Given sparse or low-quality radial velocity measurements of a star, there are often many qualitatively different stellar or exoplanet companion orbit models that are consistent with the data. The consequent multimodality of the likelihood function leads to extremely challenging search, optimization, and Markov chain Monte Carlo (MCMC) posterior sampling over the orbital parameters. Here we create a custom Monte Carlo sampler for sparse or noisy radial velocity measurements of two-body systems that can produce posterior samples for orbital parameters even when the likelihood function is poorly behaved. The six standard orbital parameters for a binary system can be split into four nonlinear parameters (period, eccentricity, argument of pericenter, phase) and two linear parameters (velocity amplitude, barycenter velocity). We capitalize on this by building a sampling method in which we densely sample the prior probability density function (pdf) in the nonlinear parameters and perform rejection sampling using a likelihood function marginalized over the linear parameters. With sparse or uninformative data, the sampling obtained by this rejection sampling is generally multimodal and dense. With informative data, the sampling becomes effectively unimodal but too sparse: in these cases we follow the rejection sampling with standard MCMC. The method produces correct samplings in orbital parameters for data that include as few as three epochs. The Joker can therefore be used to produce proper samplings of multimodal pdfs, which are still informative and can be used in hierarchical (population) modeling. We give some examples that show how the posterior pdf depends sensitively on the number and time coverage of the observations and their uncertainties.
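
    The following toy Python sketch illustrates the sampling strategy described above, reduced to one nonlinear parameter (period) and one linear parameter (amplitude): the likelihood is marginalized analytically over the amplitude under a Gaussian prior, the period prior is sampled densely, and samples are kept by rejection. It is not The Joker's implementation, and the data and prior choices are made up.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    t = np.sort(rng.uniform(0.0, 30.0, 5))                     # sparse observation epochs (days)
    v = 8.0 * np.sin(2 * np.pi * t / 11.3) + rng.normal(0, 1.0, t.size)   # toy radial velocities
    sig, s_K = 1.0, 20.0                                       # noise level and amplitude prior scale

    def log_marginal(period):
        """Likelihood marginalized analytically over the linear amplitude (Gaussian prior)."""
        phi = np.sin(2 * np.pi * t / period)
        C = s_K ** 2 * np.outer(phi, phi) + sig ** 2 * np.eye(t.size)
        sign, logdet = np.linalg.slogdet(C)
        return -0.5 * (v @ np.linalg.solve(C, v) + logdet)

    # densely sample the prior over the nonlinear parameter, then rejection-sample
    periods = np.exp(rng.uniform(np.log(1.0), np.log(100.0), 100000))     # log-uniform prior
    logL = np.array([log_marginal(p) for p in periods])
    keep = np.log(rng.random(periods.size)) < logL - logL.max()
    posterior_periods = periods[keep]                          # typically multimodal for sparse data
    print(posterior_periods.size, np.median(posterior_periods))
    ```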

  6. Stochastic Simulation and Forecast of Hydrologic Time Series Based on Probabilistic Chaos Expansion

    NASA Astrophysics Data System (ADS)

    Li, Z.; Ghaith, M.

    2017-12-01

    Hydrological processes are characterized by many complex features, such as nonlinearity, dynamics, and uncertainty. How to quantify and address such complexities and uncertainties has been a challenging task for water engineers and managers for decades. To support robust uncertainty analysis, an innovative approach for the stochastic simulation and forecast of hydrologic time series is developed in this study. Probabilistic Chaos Expansions (PCEs) are established through probabilistic collocation to tackle uncertainties associated with the parameters of traditional hydrological models. The uncertainties are quantified in model outputs as Hermite polynomials with respect to standard normal random variables. Subsequently, multivariate analysis techniques are used to analyze the complex nonlinear relationships between meteorological inputs (e.g., temperature, precipitation, evapotranspiration, etc.) and the coefficients of the Hermite polynomials. With the established relationships between model inputs and PCE coefficients, forecasts of hydrologic time series can be generated and the uncertainties in the future time series can be further tackled. The proposed approach is demonstrated using a case study in China and is compared to a traditional stochastic simulation technique, the Markov chain Monte Carlo (MCMC) method. Results show that the proposed approach can serve as a reliable proxy for complicated hydrological models. It can provide probabilistic forecasting in a more computationally efficient manner than the traditional MCMC method. This work provides technical support for addressing uncertainties associated with hydrological modeling and for enhancing the reliability of hydrological modeling results. Applications of the developed approach can be extended to many other complicated geophysical and environmental modeling systems to support the associated uncertainty quantification and risk analysis.
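
    A minimal sketch of a one-dimensional polynomial chaos expansion in probabilists' Hermite polynomials, with coefficients obtained by least squares at collocation samples; the `hydro_model` function and the parameter distribution are hypothetical stand-ins for the hydrological model discussed above.

    ```python
    import numpy as np
    from numpy.polynomial import hermite_e as He

    rng = np.random.default_rng(3)

    def hydro_model(k):
        """Hypothetical hydrological model output as a function of an uncertain parameter k."""
        return np.exp(-0.3 * k) + 0.1 * k ** 2

    order = 4
    xi = rng.normal(0.0, 1.0, 200)                     # standard normal germ (collocation samples)
    k = 2.0 + 0.5 * xi                                 # uncertain parameter expressed through xi
    y = hydro_model(k)

    # design matrix of probabilists' Hermite polynomials He_0..He_order evaluated at xi
    Psi = np.column_stack([He.hermeval(xi, np.eye(order + 1)[j]) for j in range(order + 1)])
    coeffs, *_ = np.linalg.lstsq(Psi, y, rcond=None)   # PCE coefficients by least squares

    # the surrogate now gives cheap samples of the output distribution
    xi_new = rng.normal(0.0, 1.0, 100000)
    y_pce = He.hermeval(xi_new, coeffs)
    print(y_pce.mean(), y_pce.std())
    ```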

  7. DNA motif alignment by evolving a population of Markov chains.

    PubMed

    Bi, Chengpeng

    2009-01-30

    Deciphering cis-regulatory elements, or de novo motif-finding in genomes, still remains elusive although much algorithmic effort has been expended. Markov chain Monte Carlo (MCMC) methods such as Gibbs motif samplers have been widely employed to solve the de novo motif-finding problem through sequence local alignment. Nonetheless, the MCMC-based motif samplers still suffer from local maxima, like EM. Therefore, as a prerequisite for finding good local alignments, these motif algorithms are often run independently a multitude of times, but without information exchange between the different chains. A new algorithm design enabling such information exchange would therefore be worthwhile. This paper presents a novel motif-finding algorithm that evolves a population of Markov chains with information exchange (PMC), each of which is initialized as a random alignment and run by the Metropolis-Hastings sampler (MHS); the population is progressively updated through a series of stochastically sampled local alignments. Explicitly, the PMC motif algorithm performs stochastic sampling as specified by a population-based proposal distribution rather than individual ones, and adaptively evolves the population as a whole towards a global maximum. The alignment information exchange is accomplished by taking advantage of the pooled motif site distributions. A distinct method that runs multiple independent Markov chains (IMC) without information exchange, dubbed the IMC motif algorithm, is also devised for comparison with its PMC counterpart. Experimental studies demonstrate that performance can be improved when pooled information is used to run a population of motif samplers. The new PMC algorithm improved convergence and outperformed other popular algorithms tested using simulated and biological motif sequences.

  8. Hierarchical Models for Type Ia Supernova Light Curves in the Optical and Near Infrared

    NASA Astrophysics Data System (ADS)

    Mandel, Kaisey; Narayan, G.; Kirshner, R. P.

    2011-01-01

    I have constructed a comprehensive statistical model for Type Ia supernova optical and near infrared light curves. Since the near infrared light curves are excellent standard candles and are less sensitive to dust extinction and reddening, the combination of near infrared and optical data better constrains the host galaxy extinction and improves the precision of distance predictions to SN Ia. A hierarchical probabilistic model coherently accounts for multiple random and uncertain effects, including photometric error, intrinsic supernova light curve variations and correlations across phase and wavelength, dust extinction and reddening, peculiar velocity dispersion and distances. An improved BayeSN MCMC code is implemented for computing probabilistic inferences for individual supernovae and the SN Ia and host galaxy dust populations. I use this hierarchical model to analyze nearby Type Ia supernovae with optical and near infrared data from the PAIRITEL, CfA3, and CSP samples and the literature. Using cross-validation to test the robustness of the model predictions, I find that the rms Hubble diagram scatter of predicted distance moduli is 0.11 mag for SN with optical and near infrared data versus 0.15 mag for SN with only optical data. Accounting for the dispersion expected from random peculiar velocities, the rms intrinsic prediction error is 0.08-0.10 mag for SN with both optical and near infrared light curves. I discuss results for the inferred intrinsic correlation structures of the optical-NIR SN Ia light curves and the host galaxy dust distribution captured by the hierarchical model. The continued observation and analysis of Type Ia SN in the optical and near infrared is important for improving their utility as precise and accurate cosmological distance indicators.

  9. Bayesian forecasting and uncertainty quantifying of stream flows using Metropolis-Hastings Markov Chain Monte Carlo algorithm

    NASA Astrophysics Data System (ADS)

    Wang, Hongrui; Wang, Cheng; Wang, Ying; Gao, Xiong; Yu, Chen

    2017-06-01

    This paper presents a Bayesian approach using the Metropolis-Hastings Markov Chain Monte Carlo algorithm and applies this method to daily river flow rate forecasting and uncertainty quantification for the Zhujiachuan River, using data collected from Qiaotoubao Gage Station and 13 other gage stations in the Zhujiachuan watershed in China. The proposed method is also compared with conventional maximum likelihood estimation (MLE) for parameter estimation and quantification of the associated uncertainties. While the Bayesian method performs similarly in estimating the mean value of the daily flow rate, it outperforms the conventional MLE method in uncertainty quantification, providing a relatively narrower credible interval than the MLE confidence interval and thus a more precise estimation by using the related information from regional gage stations. The Bayesian MCMC method might therefore be more favorable for uncertainty analysis and risk management.

  10. Characterizing the Trade Space Between Capability and Complexity in Next Generation Cloud and Precipitation Observing Systems Using Markov Chain Monte Carlo Techniques

    NASA Astrophysics Data System (ADS)

    Xu, Z.; Mace, G. G.; Posselt, D. J.

    2017-12-01

    As we begin to contemplate the next generation of atmospheric observing systems, it will be critically important that we are able to make informed decisions regarding the trade space between scientific capability and the need to keep complexity and cost within definable limits. To explore this trade space as it pertains to understanding key cloud and precipitation processes, we are developing a Markov Chain Monte Carlo (MCMC) algorithm suite that allows us to arbitrarily define the specifications of candidate observing systems and then explore how the uncertainties in key retrieved geophysical parameters respond to that observing system. MCMC algorithms produce a more complete posterior solution space and allow for an objective examination of the information contained in measurements. In our initial implementation, MCMC experiments are performed to retrieve vertical profiles of cloud and precipitation properties from a spectrum of active and passive measurements collected by aircraft during the ACE Radiation Definition Experiments (RADEX). Focusing on shallow cumulus clouds observed during the Integrated Precipitation and Hydrology EXperiment (IPHEX), the observing systems considered in this study include W- and Ka-band radar reflectivity, path-integrated attenuation at those frequencies, 31 and 94 GHz brightness temperatures, as well as visible and near-infrared reflectance. By varying the sensitivity and uncertainty of these measurements, we quantify the capacity of various combinations of observations to characterize the physical properties of clouds and precipitation.

  11. Numerical Demons in Monte Carlo Estimation of Bayesian Model Evidence with Application to Soil Respiration Models

    NASA Astrophysics Data System (ADS)

    Elshall, A. S.; Ye, M.; Niu, G. Y.; Barron-Gafford, G.

    2016-12-01

    Bayesian multimodel inference is increasingly being used in hydrology. Estimating Bayesian model evidence (BME) is of central importance in many Bayesian multimodel analyses such as Bayesian model averaging and model selection. BME is the overall probability of the model in reproducing the data, accounting for the trade-off between goodness-of-fit and model complexity. Yet estimating BME is challenging, especially for high-dimensional problems with complex sampling spaces. Estimating BME using Monte Carlo numerical methods is preferred, as these methods yield higher accuracy than semi-analytical solutions (e.g. Laplace approximations, BIC, KIC, etc.). However, numerical methods are prone to numerical demons arising from arithmetic underflow and round-off errors. Although a few studies have alluded to this issue, to our knowledge this is the first study that illustrates these numerical demons. We show that finite-precision arithmetic can impose a threshold on likelihood values and the Metropolis acceptance ratio, which results in trimming parameter regions (when the likelihood function is less than the smallest floating point number that a computer can represent) and in corrupting the empirical measures of the random states of the MCMC sampler (when using the log-likelihood function). We consider two of the most powerful numerical estimators of BME, the path sampling method of thermodynamic integration (TI) and the importance sampling method of steppingstone sampling (SS). We also consider the two most widely used numerical estimators, the prior sampling arithmetic mean (AM) and the posterior sampling harmonic mean (HM). We investigate the vulnerability of these four estimators to the numerical demons. Interestingly, the most biased estimator, namely the HM, turned out to be the least vulnerable. While it is generally assumed that AM is a bias-free estimator that will always approximate the true BME given sufficient computational effort, we show that arithmetic underflow can hamper AM, resulting in severe underestimation of BME. TI turned out to be the most vulnerable, resulting in BME overestimation. Finally, we show how SS can be largely invariant to rounding errors, yielding the most accurate and computationally efficient results. These research results are useful for Monte Carlo simulations to estimate Bayesian model evidence.
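
    The underflow problem described above is easy to reproduce: for realistic data sizes the likelihood of a prior sample lies far below the smallest representable double, so a naive arithmetic-mean estimator collapses, whereas working entirely in log space (log-sum-exp) does not. A toy illustration in Python, with made-up log-likelihood values:

    ```python
    import numpy as np
    from scipy.special import logsumexp

    # log-likelihoods of prior samples for a large dataset are typically very negative
    log_like = np.random.default_rng(4).normal(-1200.0, 30.0, 10000)

    # naive arithmetic-mean estimator: every exp() underflows to zero, so the log is -inf
    naive = np.log(np.mean(np.exp(log_like)))

    # stable version: log of the mean likelihood computed entirely in log space
    stable = logsumexp(log_like) - np.log(log_like.size)

    print(naive, stable)
    ```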

  12. Honest Importance Sampling with Multiple Markov Chains

    PubMed Central

    Tan, Aixin; Doss, Hani; Hobert, James P.

    2017-01-01

    Importance sampling is a classical Monte Carlo technique in which a random sample from one probability density, π1, is used to estimate an expectation with respect to another, π. The importance sampling estimator is strongly consistent and, as long as two simple moment conditions are satisfied, it obeys a central limit theorem (CLT). Moreover, there is a simple consistent estimator for the asymptotic variance in the CLT, which makes for routine computation of standard errors. Importance sampling can also be used in the Markov chain Monte Carlo (MCMC) context. Indeed, if the random sample from π1 is replaced by a Harris ergodic Markov chain with invariant density π1, then the resulting estimator remains strongly consistent. There is a price to be paid however, as the computation of standard errors becomes more complicated. First, the two simple moment conditions that guarantee a CLT in the iid case are not enough in the MCMC context. Second, even when a CLT does hold, the asymptotic variance has a complex form and is difficult to estimate consistently. In this paper, we explain how to use regenerative simulation to overcome these problems. Actually, we consider a more general set up, where we assume that Markov chain samples from several probability densities, π1, …, πk, are available. We construct multiple-chain importance sampling estimators for which we obtain a CLT based on regeneration. We show that if the Markov chains converge to their respective target distributions at a geometric rate, then under moment conditions similar to those required in the iid case, the MCMC-based importance sampling estimator obeys a CLT. Furthermore, because the CLT is based on a regenerative process, there is a simple consistent estimator of the asymptotic variance. We illustrate the method with two applications in Bayesian sensitivity analysis. The first concerns one-way random effects models under different priors. The second involves Bayesian variable selection in linear regression, and for this application, importance sampling based on multiple chains enables an empirical Bayes approach to variable selection. PMID:28701855
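
    A minimal sketch of the estimator discussed above: draws from π1 are produced by a simple Metropolis chain (rather than iid sampling) and reweighted to estimate an expectation under π. The densities here are fully specified Gaussians chosen only for illustration.

    ```python
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(5)

    # pi1 = N(0,1), sampled with a Metropolis chain; pi = N(1,1); estimate E_pi[X^2] (true value 2)
    def metropolis_chain(logpdf, n, step=1.0):
        x, lp = 0.0, logpdf(0.0)
        out = np.empty(n)
        for i in range(n):
            prop = x + step * rng.normal()
            lp_prop = logpdf(prop)
            if np.log(rng.random()) < lp_prop - lp:
                x, lp = prop, lp_prop
            out[i] = x
        return out

    draws = metropolis_chain(norm(0, 1).logpdf, 20000)
    log_w = norm(1, 1).logpdf(draws) - norm(0, 1).logpdf(draws)   # importance weights pi/pi1
    estimate = np.mean(np.exp(log_w) * draws ** 2)                # importance sampling estimator
    print(estimate)   # should be close to E_pi[X^2] = 2
    ```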

  13. Honest Importance Sampling with Multiple Markov Chains.

    PubMed

    Tan, Aixin; Doss, Hani; Hobert, James P

    2015-01-01

    Importance sampling is a classical Monte Carlo technique in which a random sample from one probability density, π1, is used to estimate an expectation with respect to another, π. The importance sampling estimator is strongly consistent and, as long as two simple moment conditions are satisfied, it obeys a central limit theorem (CLT). Moreover, there is a simple consistent estimator for the asymptotic variance in the CLT, which makes for routine computation of standard errors. Importance sampling can also be used in the Markov chain Monte Carlo (MCMC) context. Indeed, if the random sample from π1 is replaced by a Harris ergodic Markov chain with invariant density π1, then the resulting estimator remains strongly consistent. There is a price to be paid however, as the computation of standard errors becomes more complicated. First, the two simple moment conditions that guarantee a CLT in the iid case are not enough in the MCMC context. Second, even when a CLT does hold, the asymptotic variance has a complex form and is difficult to estimate consistently. In this paper, we explain how to use regenerative simulation to overcome these problems. Actually, we consider a more general set up, where we assume that Markov chain samples from several probability densities, π1, …, πk, are available. We construct multiple-chain importance sampling estimators for which we obtain a CLT based on regeneration. We show that if the Markov chains converge to their respective target distributions at a geometric rate, then under moment conditions similar to those required in the iid case, the MCMC-based importance sampling estimator obeys a CLT. Furthermore, because the CLT is based on a regenerative process, there is a simple consistent estimator of the asymptotic variance. We illustrate the method with two applications in Bayesian sensitivity analysis. The first concerns one-way random effects models under different priors. The second involves Bayesian variable selection in linear regression, and for this application, importance sampling based on multiple chains enables an empirical Bayes approach to variable selection.

  14. RadVel: The Radial Velocity Modeling Toolkit

    NASA Astrophysics Data System (ADS)

    Fulton, Benjamin J.; Petigura, Erik A.; Blunt, Sarah; Sinukoff, Evan

    2018-04-01

    RadVel is an open-source Python package for modeling Keplerian orbits in radial velocity (RV) timeseries. RadVel provides a convenient framework to fit RVs using maximum a posteriori optimization and to compute robust confidence intervals by sampling the posterior probability density via Markov Chain Monte Carlo (MCMC). RadVel allows users to float or fix parameters, impose priors, and perform Bayesian model comparison. We have implemented real-time MCMC convergence tests to ensure adequate sampling of the posterior. RadVel can output a number of publication-quality plots and tables. Users may interface with RadVel through a convenient command-line interface or directly from Python. The code is object-oriented and thus naturally extensible. We encourage contributions from the community. Documentation is available at http://radvel.readthedocs.io.

  15. Occupancy in community-level studies

    USGS Publications Warehouse

    MacKenzie, Darryl I.; Nichols, James; Royle, Andy; Pollock, Kenneth H.; Bailey, Larissa L.; Hines, James

    2018-01-01

    Another type of multi-species study is one focused on community-level metrics such as species richness. In this chapter we detail how some of the single-species occupancy models described in earlier chapters have been applied, or extended, for use in such studies, while accounting for imperfect detection. We highlight how Bayesian methods using MCMC are particularly useful in such settings for easily calculating relevant community-level summaries based on presence/absence data. These modeling approaches can be used to assess richness at a single point in time, or to investigate changes in the species pool over time.

  16. MC3: Multi-core Markov-chain Monte Carlo code

    NASA Astrophysics Data System (ADS)

    Cubillos, Patricio; Harrington, Joseph; Lust, Nate; Foster, AJ; Stemm, Madison; Loredo, Tom; Stevenson, Kevin; Campo, Chris; Hardin, Matt; Hardy, Ryan

    2016-10-01

    MC3 (Multi-core Markov-chain Monte Carlo) is a Bayesian statistics tool that can be executed from the shell prompt or interactively through the Python interpreter with single- or multiple-CPU parallel computing. It offers Markov-chain Monte Carlo (MCMC) posterior-distribution sampling for several algorithms, Levenberg-Marquardt least-squares optimization, and uniform non-informative, Jeffreys non-informative, or Gaussian-informative priors. MC3 can share the same value among multiple parameters and fix the value of parameters to constant values, and offers Gelman-Rubin convergence testing and correlated-noise estimation with time-averaging or wavelet-based likelihood estimation methods.
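
    As a side note, the Gelman-Rubin convergence statistic that MC3 offers can be sketched in a few lines for a single parameter; this is a generic Python illustration of the statistic, not MC3's code.

    ```python
    import numpy as np

    def gelman_rubin(chains):
        """Potential scale reduction factor R-hat for an (m_chains, n_samples) array of one parameter."""
        chains = np.asarray(chains, dtype=float)
        m, n = chains.shape
        chain_means = chains.mean(axis=1)
        B = n * chain_means.var(ddof=1)                 # between-chain variance
        W = chains.var(axis=1, ddof=1).mean()           # within-chain variance
        var_plus = (n - 1) / n * W + B / n              # pooled posterior variance estimate
        return np.sqrt(var_plus / W)

    # example: four chains that have mixed well give R-hat close to 1
    rng = np.random.default_rng(6)
    print(gelman_rubin(rng.normal(0.0, 1.0, (4, 5000))))
    ```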

  17. Estimation for coefficient of variation of an extension of the exponential distribution under type-II censoring scheme

    NASA Astrophysics Data System (ADS)

    Bakoban, Rana A.

    2017-08-01

    The coefficient of variation [CV] has several applications in applied statistics. In this paper, we therefore adopt Bayesian and non-Bayesian approaches for the estimation of the CV under type-II censored data from the extension of the exponential distribution [EED]. Point and interval estimates of the CV are obtained for each of the maximum likelihood and parametric bootstrap techniques. A Bayesian approach with the help of an MCMC method is also presented. A real data set is presented and analyzed, and the results are used to assess the theoretical findings.

  18. MCMC multilocus lod scores: application of a new approach.

    PubMed

    George, Andrew W; Wijsman, Ellen M; Thompson, Elizabeth A

    2005-01-01

    On extended pedigrees with extensive missing data, the calculation of multilocus likelihoods for linkage analysis is often beyond the computational bounds of exact methods. Growing interest therefore surrounds the implementation of Monte Carlo estimation methods. In this paper, we demonstrate the speed and accuracy of a new Markov chain Monte Carlo method for the estimation of linkage likelihoods through an analysis of real data from a study of early-onset Alzheimer's disease. For those data sets where comparison with exact analysis is possible, we achieved up to a 100-fold increase in speed. Our approach is implemented in the program lm_bayes within the framework of the freely available MORGAN 2.6 package for Monte Carlo genetic analysis (http://www.stat.washington.edu/thompson/Genepi/MORGAN/Morgan.shtml).

  19. A Bayesian Retrieval of Greenland Ice Sheet Internal Temperature from Ultra-wideband Software-defined Microwave Radiometer (UWBRAD) Measurements

    NASA Astrophysics Data System (ADS)

    Duan, Y.; Durand, M. T.; Jezek, K. C.; Yardim, C.; Bringer, A.; Aksoy, M.; Johnson, J. T.

    2017-12-01

    The ultra-wideband software-defined microwave radiometer (UWBRAD) is designed to provide an ice sheet internal temperature product by measuring low-frequency microwave emission. Twelve channels ranging from 0.5 to 2.0 GHz are covered by the instrument. A Greenland airborne campaign was conducted in September 2016, providing the first demonstration of ultra-wideband radiometer observations of geophysical scenes, including ice sheets. Another flight is planned for September 2017 to acquire measurements over the central ice sheet. A Bayesian framework is designed to retrieve the ice sheet internal temperature from simulated UWBRAD brightness temperature (Tb) measurements over the Greenland flight path with limited prior information on the ground. A 1-D heat-flow model, the Robin model, was used to model the ice sheet internal temperature profile with ground information. Synthetic UWBRAD Tb observations were generated via the partially coherent radiation transfer model, which utilizes the Robin model temperature profile and an exponential fit of ice density from borehole measurements as input, and were corrupted with noise. The effective surface temperature, the geothermal heat flux, the variance of upper-layer ice density, and the variance of fine-scale density variation in the deeper ice sheet were treated as unknown variables within the retrieval framework. Each parameter is defined over its possible range and set to be uniformly distributed. The Markov Chain Monte Carlo (MCMC) approach is applied so that the unknown parameters perform a random walk in the parameter space. We investigate whether the variables can be improved over their priors using the MCMC approach and contribute to the temperature retrieval theoretically. UWBRAD measurements acquired near Camp Century in 2016 were also treated with the MCMC framework to examine its behavior in the presence of scattering effects. The fine-scale density fluctuation is an important parameter: it is the most sensitive yet most poorly known parameter in the estimation framework, and including it greatly improved the retrieval results. The ice sheet vertical temperature profile, especially the 10 m temperature, can be well retrieved via the MCMC process. Future retrieval work will apply the Bayesian approach to UWBRAD airborne measurements.

  20. A Bayesian Uncertainty Framework for Conceptual Snowmelt and Hydrologic Models Applied to the Tenderfoot Creek Experimental Forest

    NASA Astrophysics Data System (ADS)

    Smith, T.; Marshall, L.

    2007-12-01

    In many mountainous regions, the single most important parameter in forecasting the controls on regional water resources is snowpack (Williams et al., 1999). In an effort to bridge the gap between theoretical understanding and functional modeling of snow-driven watersheds, a flexible hydrologic modeling framework is being developed. The aim is to create a suite of models that move from parsimonious structures, concentrated on aggregated watershed response, to those focused on representing finer scale processes and distributed response. This framework will operate as a tool to investigate the link between hydrologic model predictive performance, uncertainty, model complexity, and observable hydrologic processes. Bayesian methods, and particularly Markov chain Monte Carlo (MCMC) techniques, are extremely useful in uncertainty assessment and parameter estimation of hydrologic models. However, these methods have some difficulties in implementation. In a traditional Bayesian setting, it can be difficult to reconcile multiple data types, particularly those offering different spatial and temporal coverage, depending on the model type. These difficulties are also exacerbated by the sensitivity of MCMC algorithms to model initialization and complex parameter interdependencies. As a way of circumventing some of the computational complications, adaptive MCMC algorithms have been developed to take advantage of the information gained from each successive iteration. Two adaptive algorithms are compared in this study: the Adaptive Metropolis (AM) algorithm, developed by Haario et al. (2001), and the Delayed Rejection Adaptive Metropolis (DRAM) algorithm, developed by Haario et al. (2006). While neither algorithm is truly Markovian, it has been proven that each satisfies the desired ergodicity and stationarity properties of Markov chains. Both algorithms were implemented as the uncertainty and parameter estimation framework for a conceptual rainfall-runoff model based on the Probability Distributed Model (PDM), developed by Moore (1985). We implement the modeling framework in the Stringer Creek watershed in the Tenderfoot Creek Experimental Forest (TCEF), Montana. The snowmelt-driven watershed offers the additional challenge of modeling snow accumulation and melt, and current efforts are aimed at developing a temperature- and radiation-index snowmelt model. Auxiliary data available from within TCEF's watersheds are used to support understanding of information value as it relates to predictive performance. Because the model is based on lumped parameters, auxiliary data are hard to incorporate directly. However, these additional data offer benefits through the ability to inform prior distributions of the lumped model parameters. By incorporating data offering different information into the uncertainty assessment process, a cross-validation technique is employed to better ensure that modeled results reflect real process complexity.
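
    A minimal sketch of the Adaptive Metropolis idea compared above (the proposal covariance is adapted from the chain history, following the 2.4^2/d scaling of Haario et al. 2001), applied to a toy two-parameter Gaussian posterior rather than the PDM rainfall-runoff model; the adaptation interval, burn-in, and all constants are illustrative.

    ```python
    import numpy as np

    rng = np.random.default_rng(7)

    def log_post(theta):
        """Toy correlated Gaussian posterior standing in for the hydrologic model's posterior."""
        cov = np.array([[1.0, 0.8], [0.8, 1.0]])
        return -0.5 * theta @ np.linalg.solve(cov, theta)

    d, n_iter, t0, eps = 2, 20000, 1000, 1e-6
    theta = np.zeros(d)
    lp = log_post(theta)
    chain = np.empty((n_iter, d))
    cov_prop = 0.1 * np.eye(d)
    sd = 2.4 ** 2 / d                                   # scaling factor from Haario et al. (2001)

    for i in range(n_iter):
        prop = rng.multivariate_normal(theta, cov_prop)
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        chain[i] = theta
        if i >= t0 and i % 50 == 0:                     # adapt periodically after an initial period
            cov_prop = sd * (np.cov(chain[: i + 1].T) + eps * np.eye(d))

    print(chain[n_iter // 2:].mean(axis=0))             # posterior mean from the second half of the chain
    ```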

  1. Advanced Large Scale Cross Domain Temporal Topic Modeling Algorithms to Infer the Influence of Recent Research on IPCC Assessment Reports

    NASA Astrophysics Data System (ADS)

    Sleeman, J.; Halem, M.; Finin, T.; Cane, M. A.

    2016-12-01

    Approximately every five years dating back to 1989, thousands of climate scientists, research centers, and government labs have volunteered to prepare comprehensive Assessment Reports for the Intergovernmental Panel on Climate Change. These are highly curated reports distributed to the policy makers of roughly 200 nations. There have been five IPCC Assessment Reports to date, the latest leading to the Paris Agreement of Dec. 2015, signed thus far by 172 nations, which aims to limit global greenhouse gas emissions so as to produce no more than 2 °C of atmospheric warming. These reports are a living, evolving big-data collection tracing 30 years of climate science research, observations, and model scenario intercomparisons. They contain more than 200,000 citations over a 30-year period that trace the evolution of the physical basis of climate science; the observed and predicted impacts, risks, and adaptation to increased greenhouse gases; and mitigation approaches, pathways, and policies for climate change. Document-topic and topic-term probability distributions are built from the vocabularies of the respective assessment report chapters and citations. Using Microsoft Bing, we retrieve 150,000 citations referenced across chapters and convert those citations to text. Using a word n-gram model based on a heterogeneous set of climate change terminology, lemmatization, noise filtering, and stopword elimination, we calculate word frequencies for chapters and citations. Temporal document sets are built based on the assessment period. In addition to topic modeling, we employ cross-domain correlation measures. Using the Jensen-Shannon divergence and the Pearson correlation, we build correlation matrices for chapter and citation topics. The shared vocabulary acts as the bridge between domains, resulting in chapter-citation point pairs in space. Pairs are established based on a document-topic probability distribution. Each chapter and citation is associated with a vector of topics, and based on the n most probable topics we establish which chapter-citation pairs are most similar. We will perform posterior inferences based on a Metropolis-Hastings simulated annealing MCMC algorithm to infer, from the evolution of topics from AR1 to AR4, assertions of topics for AR5 and potentially AR6.

  2. Bayesian forecasting and uncertainty quantifying of stream flows using Metropolis–Hastings Markov Chain Monte Carlo algorithm

    DOE PAGES

    Wang, Hongrui; Wang, Cheng; Wang, Ying; ...

    2017-04-05

    This paper presents a Bayesian approach using the Metropolis-Hastings Markov Chain Monte Carlo algorithm and applies this method to daily river flow rate forecasting and uncertainty quantification for the Zhujiachuan River, using data collected from Qiaotoubao Gage Station and 13 other gage stations in the Zhujiachuan watershed in China. The proposed method is also compared with conventional maximum likelihood estimation (MLE) for parameter estimation and quantification of associated uncertainties. While the Bayesian method performs similarly in estimating the mean value of the daily flow rate, it outperforms the conventional MLE method in uncertainty quantification, providing a relatively narrower credible interval than the MLE confidence interval and thus a more precise estimation by using the related information from regional gage stations. As a result, the Bayesian MCMC method might be more favorable for uncertainty analysis and risk management.

  3. A Kolmogorov-Smirnov test for the molecular clock based on Bayesian ensembles of phylogenies

    PubMed Central

    Antoneli, Fernando; Passos, Fernando M.; Lopes, Luciano R.

    2018-01-01

    Divergence date estimates are central to understanding evolutionary processes and depend, in the case of molecular phylogenies, on tests of molecular clocks. Here we propose two non-parametric tests of strict and relaxed molecular clocks built upon a framework that uses the empirical cumulative distribution (ECD) of branch lengths obtained from an ensemble of Bayesian trees and the well-known non-parametric (one-sample and two-sample) Kolmogorov-Smirnov (KS) goodness-of-fit tests. In the strict clock case, the method consists of using the one-sample Kolmogorov-Smirnov (KS) test to directly test whether the phylogeny is clock-like, in other words, whether it follows a Poisson law. The ECD is computed from the discretized branch lengths and the parameter λ of the expected Poisson distribution is calculated as the average branch length over the ensemble of trees. To compensate for the auto-correlation in the ensemble of trees and pseudo-replication, we take advantage of thinning and effective sample size, two features provided by Bayesian inference MCMC samplers. Finally, it is observed that tree topologies with very long or very short branches lead to Poisson mixtures, and in this case we propose the use of the two-sample KS test with samples from two continuous branch-length distributions, one obtained from an ensemble of clock-constrained trees and the other from an ensemble of unconstrained trees. Moreover, in this second form the test can also be applied to relaxed clock models. The use of a statistically equivalent ensemble of phylogenies to obtain the branch-length ECD, instead of one consensus tree, yields a considerable reduction of the effects of small sample size and provides a gain of power. PMID:29300759
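
    A minimal sketch of the one-sample test described above, using SciPy: discretized branch lengths (simulated here) are compared against a Poisson law whose rate is the mean branch length. The discrete-distribution and estimated-parameter caveats of the KS test are ignored for brevity.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(8)

    # stand-in for discretized branch lengths pooled from an ensemble of Bayesian trees
    branch_counts = rng.poisson(3.2, size=2000)

    lam = branch_counts.mean()                          # Poisson rate from the average branch length
    # one-sample KS test of the empirical CDF against the Poisson(lam) CDF
    stat, p_value = stats.kstest(branch_counts, stats.poisson(lam).cdf)
    print(stat, p_value)   # a small p-value would reject the strict-clock (Poisson) hypothesis
    ```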

  4. Improvement on Exoplanet Detection Methods and Analysis via Gaussian Process Fitting Techniques

    NASA Astrophysics Data System (ADS)

    Van Ross, Bryce; Teske, Johanna

    2018-01-01

    Planetary signals in radial velocity (RV) data are often accompanied by signals coming solely from stellar photospheric or chromospheric variation. Such variation can reduce the precision of planet detection and mass measurements, and cause misidentification of planetary signals. Recently, several authors have demonstrated the utility of Gaussian Process (GP) regression for disentangling planetary signals in RV observations (Aigrain et al. 2012; Angus et al. 2017; Czekala et al. 2017; Faria et al. 2016; Gregory 2015; Haywood et al. 2014; Rajpaul et al. 2015; Foreman-Mackey et al. 2017). GP regression models the covariance of multivariate data to make predictions about likely underlying trends in the data, which can be applied to regions where there are no existing observations. GPs have been used to infer stellar rotation periods; to model and disentangle time series spectra; and to determine physical aspects, populations, and detections of exoplanets, among other astrophysical applications. Here, we apply similar analysis techniques to time series of the Ca II H and K activity indicator measured simultaneously with RVs in a small sample of stars from the large Keck/HIRES RV planet search program. Our goal is to characterize the pattern(s) of non-planetary variation so as to know what is and is not a planetary signal. We investigated ten different GP kernels and their respective hyperparameters to determine the optimal combination (e.g., the lowest Bayesian Information Criterion value) for each stellar data set. To assess the hyperparameters' error, we sampled their posterior distributions using Markov chain Monte Carlo (MCMC) analysis on the optimized kernels. Our results demonstrate how GP analysis of stellar activity indicators alone can contribute to exoplanet detection in RV data, and highlight the challenges in applying GP analysis to relatively small, irregularly sampled time series.
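
    A small sketch of the kernel-comparison step described above, using scikit-learn on a simulated activity-indicator time series: each kernel's hyperparameters are optimized, and kernels are ranked by a BIC-style score built from the maximized log marginal likelihood. The kernels, data, and scoring are illustrative and are not the authors' pipeline (which also samples hyperparameter posteriors with MCMC).

    ```python
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, ExpSineSquared, WhiteKernel

    rng = np.random.default_rng(9)
    t = np.sort(rng.uniform(0.0, 60.0, 80))[:, None]            # irregularly sampled epochs (days)
    y = np.sin(2 * np.pi * t[:, 0] / 25.0) + 0.2 * rng.normal(size=t.shape[0])   # toy activity index

    kernels = {
        "squared-exponential": 1.0 * RBF(length_scale=10.0) + WhiteKernel(0.1),
        "periodic": 1.0 * ExpSineSquared(length_scale=1.0, periodicity=20.0) + WhiteKernel(0.1),
    }

    for name, kernel in kernels.items():
        gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(t, y)
        log_ml = gp.log_marginal_likelihood_value_               # maximized log marginal likelihood
        k = gp.kernel_.theta.size                                # number of optimized hyperparameters
        bic = -2.0 * log_ml + k * np.log(t.shape[0])             # lower BIC = preferred kernel
        print(f"{name}: logML={log_ml:.1f}  BIC={bic:.1f}")
    ```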

  5. Inferring the Spatio-temporal Patterns of Dengue Transmission from Surveillance Data in Guangzhou, China

    PubMed Central

    Liu, Jiming; Tan, Qi; Shi, Benyun

    2016-01-01

    Background Dengue is a serious vector-borne disease, and incidence rates have significantly increased during the past few years, particularly in 2014 in Guangzhou. The current situation is more complicated, due to various factors such as climate warming, urbanization, population increase, and human mobility. The purpose of this study is to detect dengue transmission patterns and identify the disease dispersion dynamics in Guangzhou, China. Methodology We conducted surveys in 12 districts of Guangzhou, and collected daily data of Breteau index (BI) and reported cases between September and November 2014 from the public health authority reports. Based on the available data and the Ross-Macdonald theory, we propose a dengue transmission model that systematically integrates entomologic, demographic, and environmental information. In this model, we use (1) BI data and geographic variables to evaluate the spatial heterogeneities of Aedes mosquitoes, (2) a radiation model to simulate the daily mobility of humans, and (3) a Markov chain Monte Carlo (MCMC) method to estimate the model parameters. Results/Conclusions By implementing our proposed model, we can (1) estimate the incidence rates of dengue, and trace the infection time and locations, (2) assess risk factors and evaluate the infection threat in a city, and (3) evaluate the primary diffusion process in different districts. From the results, we can see that dengue infections exhibited a spatial and temporal variation during 2014 in Guangzhou. We find that urbanization, vector activities, and human behavior play significant roles in shaping the dengue outbreak and the patterns of its spread. This study offers useful information on dengue dynamics, which can help policy makers improve control and prevention measures. PMID:27105350
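
    The radiation model for human mobility mentioned above has a closed form (Simini et al. 2012); a minimal sketch of the flux between two districts is shown below, with entirely hypothetical population numbers.

    ```python
    def radiation_flux(T_i, m_i, n_j, s_ij):
        """Expected daily commuters from district i to j under the radiation model (Simini et al. 2012).

        T_i  : total number of trips leaving district i
        m_i  : population of the origin district
        n_j  : population of the destination district
        s_ij : population living closer to i than j is (excluding i and j themselves)
        """
        return T_i * (m_i * n_j) / ((m_i + s_ij) * (m_i + n_j + s_ij))

    # hypothetical numbers for two districts
    print(radiation_flux(T_i=50_000, m_i=1_200_000, n_j=800_000, s_ij=2_500_000))
    ```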

  6. A Bayesian destructive weighted Poisson cure rate model and an application to a cutaneous melanoma data.

    PubMed

    Rodrigues, Josemar; Cancho, Vicente G; de Castro, Mário; Balakrishnan, N

    2012-12-01

    In this article, we propose a new Bayesian flexible cure rate survival model, which generalises the stochastic model of Klebanov et al. [Klebanov LB, Rachev ST and Yakovlev AY. A stochastic-model of radiation carcinogenesis--latent time distributions and their properties. Math Biosci 1993; 113: 51-75], and has much in common with the destructive model formulated by Rodrigues et al. [Rodrigues J, de Castro M, Balakrishnan N and Cancho VG. Destructive weighted Poisson cure rate models. Technical Report, Universidade Federal de São Carlos, São Carlos-SP. Brazil, 2009 (accepted in Lifetime Data Analysis)]. In our approach, the accumulated number of lesions or altered cells follows a compound weighted Poisson distribution. This model is more flexible than the promotion time cure model in terms of dispersion. Moreover, it possesses an interesting and realistic interpretation of the biological mechanism of the occurrence of the event of interest as it includes a destructive process of tumour cells after an initial treatment or the capacity of an individual exposed to irradiation to repair altered cells that results in cancer induction. In other words, what is recorded is only the damaged portion of the original number of altered cells not eliminated by the treatment or repaired by the repair system of an individual. Markov Chain Monte Carlo (MCMC) methods are then used to develop Bayesian inference for the proposed model. Also, some discussions on the model selection and an illustration with a cutaneous melanoma data set analysed by Rodrigues et al. [Rodrigues J, de Castro M, Balakrishnan N and Cancho VG. Destructive weighted Poisson cure rate models. Technical Report, Universidade Federal de São Carlos, São Carlos-SP. Brazil, 2009 (accepted in Lifetime Data Analysis)] are presented.

  7. The evolution of defense mechanisms correlate with the explosive diversification of autodigesting Coprinellus mushrooms (Agaricales, Fungi).

    PubMed

    Nagy, László G; Házi, Judit; Szappanos, Balázs; Kocsubé, Sándor; Bálint, Balázs; Rákhely, Gábor; Vágvölgyi, Csaba; Papp, Tamás

    2012-07-01

    Bursts of diversification are known to have contributed significantly to the extant morphological and species diversity, but evidence for many of the theoretical predictions about adaptive radiations has remained contentious. Despite their tremendous diversity, patterns of evolutionary diversification and the contribution of explosive episodes in fungi are largely unknown. Here, using the genus Coprinellus (Psathyrellaceae, Agaricales) as a model, we report the first explosive fungal radiation and infer that the onset of the radiation correlates with a change from a multilayered to a much simpler defense structure on the fruiting bodies. We hypothesize that this change constitutes a key innovation, probably relaxing constraints on diversification imposed by nutritional investment into the development of protective tissues of fruiting bodies. Fossil calibration suggests that Coprinellus mushrooms radiated during the Miocene, coinciding with the global radiation of large grazing mammals following the expansion of dry open grasslands. In addition to diversification rate-based methods, we test the hard polytomy hypothesis by analyzing the resolvability of internal nodes of the backbone of the putative radiation using Reversible-Jump MCMC. We discuss potential applications and pitfalls of this approach as well as how biologically meaningful polytomies can be distinguished from alignment shortcomings. Our data provide insights into the nature of adaptive radiations in general by revealing a deceleration of morphological diversification through time. The dynamics of morphological diversification was approximated by obtaining the temporal distribution of state changes in discrete traits along the trees and comparing it with the tempo of lineage accumulation. We found that the number of state changes correlates with the number of lineages, even in parts of the tree with short internal branches, and peaks around the onset of the explosive radiation followed by a slowdown, most likely because of the decrease in available niches.

  8. Evolutionary History and Phylodynamics of Influenza A and B Neuraminidase (NA) Genes Inferred from Large-Scale Sequence Analyses

    PubMed Central

    Xu, Jianpeng; Davis, C. Todd; Christman, Mary C.; Rivailler, Pierre; Zhong, Haizhen; Donis, Ruben O.; Lu, Guoqing

    2012-01-01

    Background Influenza neuraminidase (NA) is an important surface glycoprotein and plays a vital role in viral replication and drug development. The NA is found in influenza A and B viruses, with nine subtypes classified in influenza A. The complete knowledge of influenza NA evolutionary history and phylodynamics, although critical for the prevention and control of influenza epidemics and pandemics, remains lacking. Methodology/Principal findings Evolutionary and phylogenetic analyses of influenza NA sequences using Maximum Likelihood and Bayesian MCMC methods demonstrated that the divergence of influenza viruses into types A and B occurred earlier than the divergence of influenza A NA subtypes. Twenty-three lineages were identified within influenza A, two lineages were classified within influenza B, and most lineages were specific to host, subtype or geographical location. Interestingly, evolutionary rates vary not only among lineages but also among branches within lineages. The estimated tMRCAs of influenza lineages suggest that the viruses of different lineages emerge several months or even years before their initial detection. The dN/dS ratios ranged from 0.062 to 0.313 for influenza A lineages, and 0.257 to 0.259 for influenza B lineages. Structural analyses revealed that all positively selected sites are at the surface of the NA protein, with a number of sites found to be important for host antibody and drug binding. Conclusions/Significance The divergence into influenza type A and B from a putative ancestral NA was followed by the divergence of type A into nine NA subtypes, of which 23 lineages subsequently diverged. This study provides a better understanding of influenza NA lineages and their evolutionary dynamics, which may facilitate early detection of newly emerging influenza viruses and thus improve influenza surveillance. PMID:22808012

  9. Mapping malaria incidence distribution that accounts for environmental factors in Maputo Province - Mozambique

    PubMed Central

    2010-01-01

    Background The objective was to study whether an association exists between the incidence of malaria and weather parameters in tropical Maputo Province, Mozambique. Methods A Bayesian hierarchical model for malaria count data aggregated at the district level over a two-year period is formulated. This model made it possible to account for spatial variation between areas. The model was extended to include the environmental covariates temperature and rainfall. The study period was then divided into two climate conditions: rainy and dry seasons. The incidences of malaria between the two seasons were compared. Parameter estimation and inference were carried out using MCMC simulation techniques based on Poisson variation. Model comparisons are made using the DIC. Results For the winter season in 2001, the temperature covariate, with an estimated value of -8.88, shows no association with malaria incidence. In 2002, the estimate for the same covariate was 5.498, indicating a positive association. In both years the rainfall covariate shows no dependence of malaria incidence on rainfall. Malaria transmission is higher in the wet season, with both covariates positively related to malaria (posterior means 1.99 and 2.83 in 2001). For 2002, only temperature is associated with malaria incidence, with an estimated value of 2.23. Conclusions The incidence of malaria in 2001 presents a spatial pattern independent of temperature in the summer season and of rainfall in the winter season. In 2002, temperature determines the spatial pattern of malaria incidence in the region. Temperature influences the model in cases where both covariates are introduced, in both the winter and summer seasons, and its influence extends to the summer model with the temperature covariate only. It is reasonable to state that with the occurrence of high temperatures, malaria incidence escalated in that year. PMID:20302674

  10. A High Performance Bayesian Computing Framework for Spatiotemporal Uncertainty Modeling

    NASA Astrophysics Data System (ADS)

    Cao, G.

    2015-12-01

    All types of spatiotemporal measurements are subject to uncertainty. As spatiotemporal data become increasingly involved in scientific research and decision making, it is important to appropriately model the impact of uncertainty. Quantitatively modeling spatiotemporal uncertainty, however, is a challenging problem, considering the complex dependence and data heterogeneities. State-space models provide a unifying and intuitive framework for dynamic systems modeling. In this paper, we aim to extend the conventional state-space models for uncertainty modeling in space-time contexts while accounting for spatiotemporal effects and data heterogeneities. Gaussian Markov Random Field (GMRF) models, also known as conditional autoregressive models, are arguably the most commonly used methods for modeling spatially dependent data. GMRF models basically assume that a geo-referenced variable primarily depends on its neighborhood (Markov property), and the spatial dependence structure is described via a precision matrix. Recent studies have shown that GMRFs are efficient approximations to the commonly used Gaussian fields (e.g., Kriging), and compared with Gaussian fields, GMRFs enjoy a series of appealing features, such as fast computation and easy accommodation of heterogeneities in spatial data (e.g., point and areal). This paper represents each spatial dataset as a GMRF and integrates them into a state-space form to statistically model the temporal dynamics. Different types of spatial measurements (e.g., categorical, count or continuous) can be accounted for through appropriate link functions. A fast alternative to the MCMC framework, the so-called Integrated Nested Laplace Approximation (INLA), was adopted for model inference. Preliminary case studies will be conducted to showcase the advantages of the described framework. In the first case, we apply the proposed method to modeling the water table elevation of the Ogallala aquifer over the past decades. In the second case, we analyze drought impacts in Texas counties in recent years, where the spatiotemporal dynamics are represented in areal data.
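
    A minimal sketch of the GMRF idea described above: build a sparse precision matrix for a regular grid from a first-order neighbourhood structure and draw one realization from it. The proper-CAR parameterization Q = tau * (D - rho * W) used here is one common choice, not necessarily the one used in the paper, and all names are illustrative:

```python
import numpy as np
import scipy.sparse as sp

def grid_adjacency(nrow, ncol):
    """First-order (rook) neighbourhood adjacency matrix for a regular grid."""
    n = nrow * ncol
    W = sp.lil_matrix((n, n))
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrow and 0 <= cc < ncol:
                    W[i, rr * ncol + cc] = 1.0
    return W.tocsr()

def car_precision(W, tau=1.0, rho=0.95):
    """Proper CAR / GMRF precision matrix Q = tau * (D - rho * W)."""
    D = sp.diags(np.asarray(W.sum(axis=1)).ravel())
    return tau * (D - rho * W)

# Draw one GMRF realization on a 20 x 20 grid via the Cholesky factor of Q.
W = grid_adjacency(20, 20)
Q = car_precision(W).toarray()           # small grid, so dense factorization is fine
L = np.linalg.cholesky(Q)
z = np.random.default_rng(0).standard_normal(Q.shape[0])
x = np.linalg.solve(L.T, z)              # x ~ N(0, Q^{-1})
print(x.reshape(20, 20).std())
```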

  11. Finite element model updating using the shadow hybrid Monte Carlo technique

    NASA Astrophysics Data System (ADS)

    Boulkaibet, I.; Mthembu, L.; Marwala, T.; Friswell, M. I.; Adhikari, S.

    2015-02-01

    Recent research in the field of finite element model (FEM) updating advocates the adoption of Bayesian analysis techniques for dealing with the uncertainties associated with these models. However, Bayesian formulations require the evaluation of the posterior distribution function, which may not be available in analytical form. This is the case in FEM updating. In such cases sampling methods can provide good approximations of the posterior distribution when implemented in the Bayesian context. Markov Chain Monte Carlo (MCMC) algorithms are the most popular sampling tools used to sample probability distributions. However, the efficiency of these algorithms is affected by the complexity of the systems (the size of the parameter space). The Hybrid Monte Carlo (HMC) method offers an important MCMC approach for dealing with higher-dimensional complex problems. The HMC uses the molecular dynamics (MD) steps as the global Monte Carlo (MC) moves to reach areas of high probability, where the gradient of the log-density of the posterior acts as a guide during the search process. However, the acceptance rate of HMC is sensitive to the system size as well as the time step used to evaluate the MD trajectory. To overcome this limitation we propose the use of the Shadow Hybrid Monte Carlo (SHMC) algorithm. The SHMC algorithm is a modified version of HMC designed to improve sampling for large system sizes and time steps. This is done by sampling from a modified Hamiltonian function instead of the normal Hamiltonian function. In this paper, the efficiency and accuracy of the SHMC method are tested on the updating of two real structures, an unsymmetrical H-shaped beam structure and a GARTEUR SM-AG19 structure, and are compared to the application of the HMC algorithm to the same structures.
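
    A minimal sketch of the standard HMC step that SHMC modifies, assuming a differentiable log-posterior; the shadow-Hamiltonian correction itself is omitted, and all names are illustrative:

```python
import numpy as np

def leapfrog(q, p, grad_logpost, eps, n_steps):
    """Leapfrog integration of Hamiltonian dynamics (the MD move in HMC)."""
    q, p = q.copy(), p.copy()
    p += 0.5 * eps * grad_logpost(q)
    for _ in range(n_steps - 1):
        q += eps * p
        p += eps * grad_logpost(q)
    q += eps * p
    p += 0.5 * eps * grad_logpost(q)
    return q, p

def hmc_step(q, logpost, grad_logpost, eps=0.1, n_steps=20, rng=np.random.default_rng()):
    """One Hybrid/Hamiltonian Monte Carlo update with a Metropolis accept step."""
    p0 = rng.standard_normal(q.shape)
    q_new, p_new = leapfrog(q, p0, grad_logpost, eps, n_steps)
    # Hamiltonian = -log posterior (potential energy) + kinetic energy
    h_old = -logpost(q) + 0.5 * p0 @ p0
    h_new = -logpost(q_new) + 0.5 * p_new @ p_new
    return q_new if np.log(rng.uniform()) < h_old - h_new else q

# Example: sample a 2-D standard Gaussian posterior.
logpost = lambda q: -0.5 * q @ q
grad_logpost = lambda q: -q
q = np.zeros(2)
samples = []
for _ in range(2000):
    q = hmc_step(q, logpost, grad_logpost)
    samples.append(q)
print(np.std(samples, axis=0))   # should be close to [1, 1]
```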

  12. [Economic Evaluation of mFOLFOX6-based First-line Regimens for Unresectable Advanced or Recurrent Colorectal Cancer Using Clinical Decision Analysis].

    PubMed

    Shida, Toshihiro; Endo, Yuji; Shiraishi, Tadashi; Yoshioka, Takashi; Suzuki, Kaoru; Kobayashi, Yuka; Ono, Yuki; Ito, Toshinori; Inoue, Tadao

    2018-01-01

    We evaluated four representative chemotherapy regimens for unresectable advanced or recurrent KRAS wild-type colorectal cancer: mFOLFOX6 alone and mFOLFOX6 combined with bevacizumab (Bmab), cetuximab (Cmab), or panitumumab (Pmab). We employed a decision analysis method in combination with clinical and economic evidence. The health outcomes of the regimens were analyzed on the basis of overall and progression-free survival. The data were drawn from the literature on randomized controlled clinical trials of the above-mentioned drugs. The total costs of the regimens were calculated on the basis of direct costs obtained from the medical records of patients diagnosed with unresectable advanced or recurrent colorectal cancer at Yamagata University Hospital and Yamagata Prefecture Central Hospital. Cost effectiveness was analyzed using a Markov chain Monte Carlo (MCMC) method. The study was designed from the viewpoint of public medical care. The MCMC analysis revealed that expected life months and expected cost were 20 months/3,527,119 yen for mFOLFOX6, 27 months/8,270,625 yen for mFOLFOX6+Bmab, 29 months/13,174,6297 yen for mFOLFOX6+Cmab, and 6 months/12,613,445 yen for mFOLFOX6+Pmab. Incremental cost-effectiveness ratios per life month against mFOLFOX6 were 637,592 yen for mFOLFOX6+Bmab, 1,075,162 yen for mFOLFOX6+Cmab, and 587,455 yen for mFOLFOX6+Pmab. Compared to the conventional mFOLFOX6 regimen, molecular-targeted drug regimens provide better health outcomes, but the cost increases accordingly. mFOLFOX6+Pmab is the most cost-effective regimen among those surveyed in this study.

  13. On thinning of chains in MCMC

    USGS Publications Warehouse

    Link, William A.; Eaton, Mitchell J.

    2012-01-01

    We discuss the background and prevalence of thinning, illustrate its consequences, discuss circumstances when it might be regarded as a reasonable option, and recommend against routine thinning of chains unless necessitated by computer memory limitations.
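
    A minimal sketch of the point made above: thinning discards information, which can be seen by comparing the effective sample size (estimated from the autocorrelation) of a full chain with that of its thinned version. The AR(1) chain and the crude ESS estimator below are illustrative stand-ins for real MCMC output:

```python
import numpy as np

def effective_sample_size(x, max_lag=200):
    """Crude ESS estimate: n / (1 + 2 * sum of positive autocorrelations)."""
    x = np.asarray(x, float) - np.mean(x)
    acf = []
    for lag in range(1, max_lag):
        r = np.corrcoef(x[:-lag], x[lag:])[0, 1]
        if r <= 0:
            break
        acf.append(r)
    return len(x) / (1 + 2 * np.sum(acf))

# AR(1) chain mimicking correlated MCMC output.
rng = np.random.default_rng(0)
rho, n = 0.95, 50_000
chain = np.empty(n)
chain[0] = 0.0
for t in range(1, n):
    chain[t] = rho * chain[t - 1] + rng.standard_normal()

thinned = chain[::50]
print("ESS of full chain   :", round(effective_sample_size(chain)))
print("ESS of thinned chain:", round(effective_sample_size(thinned)))
# The thinned chain retains fewer effective samples than the full chain it came
# from, so posterior summaries are (somewhat) less precise; thinning is justified
# mainly by storage limits, not by statistical efficiency.
```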

  14. Comparison of two stochastic techniques for reliable urban runoff prediction by modeling systematic errors

    NASA Astrophysics Data System (ADS)

    Del Giudice, Dario; Löwe, Roland; Madsen, Henrik; Mikkelsen, Peter Steen; Rieckermann, Jörg

    2015-07-01

    In urban rainfall-runoff modeling, commonly applied statistical techniques for uncertainty quantification mostly ignore systematic output errors originating from simplified models and erroneous inputs. Consequently, the resulting predictive uncertainty is often unreliable. Our objective is to present two approaches which use stochastic processes to describe systematic deviations and to discuss their advantages and drawbacks for urban drainage modeling. The two methodologies are an external bias description (EBD) and an internal noise description (IND, also known as stochastic gray-box modeling). They emerge from different fields and have not yet been compared in environmental modeling. To compare the two approaches, we develop a unifying terminology, evaluate them theoretically, and apply them to conceptual rainfall-runoff modeling in the same drainage system. Our results show that both approaches can provide probabilistic predictions of wastewater discharge in a similarly reliable way, both for periods ranging from a few hours up to more than 1 week ahead of time. The EBD produces more accurate predictions on long horizons but relies on computationally heavy MCMC routines for parameter inference. These properties make it more suitable for off-line applications. The IND can help in diagnosing the causes of output errors and is computationally inexpensive. It produces the best results on short forecast horizons, which are typical for online applications.

  15. Cosmological parameter estimation using Particle Swarm Optimization

    NASA Astrophysics Data System (ADS)

    Prasad, J.; Souradeep, T.

    2014-03-01

    Constraining parameters of a theoretical model from observational data is an important exercise in cosmology. There are many theoretically motivated models which demand a greater number of cosmological parameters than the standard model of cosmology, making the problem of parameter estimation challenging. It is common practice to employ the Bayesian formalism for parameter estimation, for which, in general, the likelihood surface is probed. For the standard cosmological model with six parameters, the likelihood surface is quite smooth and does not have local maxima, and sampling-based methods like the Markov Chain Monte Carlo (MCMC) method are quite successful. However, when there are a large number of parameters or the likelihood surface is not smooth, other methods may be more effective. In this paper, we demonstrate the application of another method, inspired by artificial intelligence and called Particle Swarm Optimization (PSO), for estimating cosmological parameters from Cosmic Microwave Background (CMB) data taken from the WMAP satellite.
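
    A minimal sketch of the PSO update rule used for likelihood maximization, applied here to a toy two-parameter Gaussian "likelihood surface" rather than the actual CMB likelihood; all names and parameter values are illustrative:

```python
import numpy as np

def pso_minimize(cost, bounds, n_particles=30, n_iter=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic Particle Swarm Optimization: minimize `cost` within box `bounds`."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds).T
    dim = len(lo)
    x = rng.uniform(lo, hi, size=(n_particles, dim))          # positions
    v = np.zeros_like(x)                                       # velocities
    pbest, pbest_f = x.copy(), np.array([cost(p) for p in x])  # personal bests
    gbest = pbest[np.argmin(pbest_f)].copy()                   # global best
    for _ in range(n_iter):
        r1, r2 = rng.uniform(size=(2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        f = np.array([cost(p) for p in x])
        better = f < pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
        gbest = pbest[np.argmin(pbest_f)].copy()
    return gbest, pbest_f.min()

# Toy "negative log-likelihood": a correlated 2-parameter Gaussian in (Omega_m, h).
def neg_log_like(theta):
    d = theta - np.array([0.3, 0.7])
    C = np.array([[0.01, 0.004], [0.004, 0.01]])
    return 0.5 * d @ np.linalg.solve(C, d)

best, fval = pso_minimize(neg_log_like, bounds=[(0.0, 1.0), (0.4, 1.0)])
print(best, fval)      # best-fit parameters and minimum cost
```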

  16. Real-time characterization of partially observed epidemics using surrogate models.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Safta, Cosmin; Ray, Jaideep; Lefantzi, Sophia

    We present a statistical method, predicated on the use of surrogate models, for the 'real-time' characterization of partially observed epidemics. Observations consist of counts of symptomatic patients, diagnosed with the disease, that may be available in the early epoch of an ongoing outbreak. Characterization, in this context, refers to estimation of epidemiological parameters that can be used to provide short-term forecasts of the ongoing epidemic, as well as to provide gross information on the dynamics of the etiologic agent in the affected population, e.g., the time-dependent infection rate. The characterization problem is formulated as a Bayesian inverse problem, and epidemiological parameters are estimated as distributions using a Markov chain Monte Carlo (MCMC) method, thus quantifying the uncertainty in the estimates. In some cases, the inverse problem can be computationally expensive, primarily due to the epidemic simulator used inside the inversion algorithm. We present a method, based on replacing the epidemiological model with computationally inexpensive surrogates, that can reduce the computational time to minutes, without a significant loss of accuracy. The surrogates are created by projecting the output of an epidemiological model on a set of polynomial chaos bases; thereafter, computations involving the surrogate model reduce to evaluations of a polynomial. We find that the epidemic characterizations obtained with the surrogate models are very close to those obtained with the original model. We also find that the number of projections required to construct a surrogate model is O(10)-O(10^2) less than the number of samples required by the MCMC to construct a stationary posterior distribution; thus, depending upon the epidemiological models in question, it may be possible to omit the offline creation and caching of surrogate models, prior to their use in an inverse problem. The technique is demonstrated on synthetic data as well as observations from the 1918 influenza pandemic collected at Camp Custer, Michigan.
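
    A minimal sketch of the surrogate idea: fit a cheap polynomial to a handful of expensive model runs in the uncertain parameter, then evaluate only the polynomial inside the MCMC likelihood. Here the projection is approximated by least squares onto Legendre polynomials over a uniform prior, and the epidemiological simulator is replaced by a stand-in function; all names and numbers are illustrative:

```python
import numpy as np
from numpy.polynomial import legendre

def expensive_model(theta):
    """Stand-in for the epidemic simulator: cases after 30 days, theta in [-1, 1]."""
    r0 = 1.0 + theta                     # maps theta to a reproduction number in [0, 2]
    return 100.0 * np.exp(3.0 * (r0 - 1.0))

# Build the surrogate from a small design of model runs.
order = 6
theta_design = np.linspace(-1, 1, 2 * (order + 1))
y_design = np.array([expensive_model(t) for t in theta_design])
coeffs = legendre.legfit(theta_design, y_design, deg=order)
surrogate = lambda theta: legendre.legval(theta, coeffs)

# Use the cheap surrogate inside a Metropolis loop (uniform prior on [-1, 1]).
obs, sigma = 180.0, 20.0
log_like = lambda theta: -0.5 * ((surrogate(theta) - obs) / sigma) ** 2
rng = np.random.default_rng(0)
theta, samples = 0.0, []
for _ in range(20_000):
    prop = theta + 0.1 * rng.standard_normal()
    if abs(prop) <= 1 and np.log(rng.uniform()) < log_like(prop) - log_like(theta):
        theta = prop
    samples.append(theta)
print("posterior mean reproduction number:", 1.0 + np.mean(samples[5000:]))
```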

  17. Prospects of detection of the first sources with SKA using matched filters

    NASA Astrophysics Data System (ADS)

    Ghara, Raghunath; Choudhury, T. Roy; Datta, Kanan K.; Mellema, Garrelt; Choudhuri, Samir; Majumdar, Suman; Giri, Sambit K.

    2018-05-01

    The matched filtering technique is an efficient method for detecting H ii bubbles and absorption regions in radio interferometric observations of the redshifted 21-cm signal from the epoch of reionization and the Cosmic Dawn. Here, we present an implementation of this technique for upcoming observations, such as those with SKA1-low, for a blind search of absorption regions during the Cosmic Dawn. The pipeline explores a four-dimensional parameter space on simulated mock visibilities using an MCMC algorithm. The framework is able to efficiently determine the positions and sizes of the absorption/H ii regions in the field of view.

  18. Modeling and estimating the jump risk of exchange rates: Applications to RMB

    NASA Astrophysics Data System (ADS)

    Wang, Yiming; Tong, Hanfei

    2008-11-01

    In this paper we propose a new type of continuous-time stochastic volatility model, SVDJ, for the spot exchange rate of RMB and other foreign currencies. In the model, we assume that the change in the exchange rate can be decomposed into two components. One is the normally distributed, small-scale innovation driven by the diffusion motion; the other is a large drop or rise engendered by a Poisson counting process. Furthermore, we develop an MCMC method to estimate our model. Empirical results indicate the significant presence of jumps in the exchange rate. Jump components explain a large proportion of the exchange rate change.
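
    A minimal sketch of the jump-diffusion decomposition described above: an Euler discretization in which the log exchange rate receives a small Gaussian diffusion innovation every step plus an occasional Poisson-driven jump. The stochastic-volatility component is omitted for brevity, and the parameter values are illustrative, not estimates from the paper:

```python
import numpy as np

def simulate_jump_diffusion(s0, mu, sigma, lam, jump_mu, jump_sigma,
                            n_days, dt=1.0, seed=0):
    """Euler scheme for a log exchange rate with Gaussian diffusion and Poisson jumps."""
    rng = np.random.default_rng(seed)
    log_s = np.empty(n_days + 1)
    log_s[0] = np.log(s0)
    for t in range(n_days):
        diffusion = mu * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        n_jumps = rng.poisson(lam * dt)                      # usually 0, rarely 1 or more
        jumps = rng.normal(jump_mu, jump_sigma, n_jumps).sum()
        log_s[t + 1] = log_s[t] + diffusion + jumps
    return np.exp(log_s)

# One year of daily RMB/USD-like rates (illustrative parameters).
path = simulate_jump_diffusion(s0=6.8, mu=0.0, sigma=0.002,
                               lam=0.02, jump_mu=0.0, jump_sigma=0.02,
                               n_days=250)
print(path[:5], "...", path[-1])
```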

  19. Bayesian hierarchical modelling of continuous non‐negative longitudinal data with a spike at zero: An application to a study of birds visiting gardens in winter

    PubMed Central

    Buckland, Stephen T.; King, Ruth; Toms, Mike P.

    2015-01-01

    The development of methods for dealing with continuous data with a spike at zero has lagged behind those for overdispersed or zero‐inflated count data. We consider longitudinal ecological data corresponding to an annual average of 26 weekly maximum counts of birds, and are hence effectively continuous, bounded below by zero but also with a discrete mass at zero. We develop a Bayesian hierarchical Tweedie regression model that can directly accommodate the excess number of zeros common to this type of data, whilst accounting for both spatial and temporal correlation. Implementation of the model is conducted in a Markov chain Monte Carlo (MCMC) framework, using reversible jump MCMC to explore uncertainty across both parameter and model spaces. This regression modelling framework is very flexible and removes the need to make strong assumptions about mean‐variance relationships a priori. It can also directly account for the spike at zero, whilst being easily applicable to other types of data and other model formulations. Whilst a correlative study such as this cannot prove causation, our results suggest that an increase in an avian predator may have led to an overall decrease in the number of one of its prey species visiting garden feeding stations in the United Kingdom. This may reflect a change in behaviour of house sparrows to avoid feeding stations frequented by sparrowhawks, or a reduction in house sparrow population size as a result of sparrowhawk increase. PMID:25737026

  20. Stochastic static fault slip inversion from geodetic data with non-negativity and bound constraints

    NASA Astrophysics Data System (ADS)

    Nocquet, J.-M.

    2018-07-01

    Although surface displacements observed by geodesy are linear combinations of slip at faults in an elastic medium, determining the spatial distribution of fault slip remains an ill-posed inverse problem. A widely used approach to circumvent the ill-posedness of the inversion is to add regularization constraints in terms of smoothing and/or damping so that the linear system becomes invertible. However, the choice of regularization parameters is often arbitrary, and sometimes leads to significantly different results. Furthermore, the resolution analysis is usually empirical and cannot be made independently of the regularization. The stochastic approach to inverse problems provides a rigorous framework in which the a priori information about the sought parameters is combined with the observations in order to derive posterior probabilities of the unknown parameters. Here, I investigate an approach where the prior probability density function (pdf) is a multivariate Gaussian function, with single truncation to impose positivity of slip, or double truncation to impose positivity and upper bounds on slip for interseismic modelling. I show that the joint posterior pdf is similar to the linear untruncated Gaussian case and can be expressed as a truncated multivariate normal (TMVN) distribution. The TMVN form can then be used to obtain semi-analytical formulae for the single, 2-D or n-D marginal pdf. The semi-analytical formula involves the product of a Gaussian by an integral term that can be evaluated using recent developments in TMVN probability calculations. Posterior mean and covariance can also be efficiently derived. I show that the maximum a posteriori (MAP) model can be obtained using a non-negative least-squares algorithm for the single-truncated case or using the bounded-variable least-squares algorithm for the double-truncated case. I show that the case of independent uniform priors can be approximated using TMVN. The numerical equivalence to Bayesian inversions using Markov chain Monte Carlo (MCMC) sampling is shown for a synthetic example and a real case of interseismic modelling in Central Peru. The TMVN method overcomes several limitations of the Bayesian approach using MCMC sampling. First, the need for computing power is largely reduced. Second, unlike the Bayesian MCMC-based approach, the marginal pdf, mean, variance or covariance are obtained independently of each other. Third, the probability and cumulative density functions can be obtained with any density of points. Finally, determining the MAP is extremely fast.
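
    A minimal sketch of the MAP computation for the single-truncation (positivity) case: with a linear forward operator G, data d with covariance Cd, and a Gaussian prior with mean m0 and covariance Cm truncated at zero, the posterior mode can be found by stacking the data and prior terms into one augmented system and handing it to a non-negative least-squares solver. The quantities below are a tiny synthetic illustration, not the paper's data:

```python
import numpy as np
from scipy.optimize import nnls

def map_slip_nonnegative(G, d, Cd, Cm, m0):
    """MAP slip under a zero-truncated Gaussian prior (positivity constraint).

    Minimizes ||Cd^{-1/2}(G m - d)||^2 + ||Cm^{-1/2}(m - m0)||^2  s.t.  m >= 0
    by stacking both terms into a single non-negative least-squares problem.
    """
    Wd = np.linalg.cholesky(np.linalg.inv(Cd))   # whitening factors: C^{-1} = W W^T
    Wm = np.linalg.cholesky(np.linalg.inv(Cm))
    A = np.vstack([Wd.T @ G, Wm.T])
    b = np.concatenate([Wd.T @ d, Wm.T @ m0])
    m_map, _ = nnls(A, b)
    return m_map

# Tiny synthetic example: 3 fault patches, 5 geodetic observations.
rng = np.random.default_rng(0)
G = rng.uniform(0.1, 1.0, size=(5, 3))           # elastic Green's functions (illustrative)
m_true = np.array([0.0, 0.5, 1.2])               # true slip; one patch not slipping
d = G @ m_true + 0.01 * rng.standard_normal(5)
Cd = 0.01**2 * np.eye(5)
Cm = 1.0 * np.eye(3)
print(map_slip_nonnegative(G, d, Cd, Cm, m0=np.zeros(3)))
```

    The double-truncated (bounded) case would replace nnls with a bounded-variable least-squares solver, as noted in the abstract.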

  1. Bayesian Atmospheric Radiative Transfer (BART): Model, Statistics Driver, and Application to HD 209458b

    NASA Astrophysics Data System (ADS)

    Cubillos, Patricio; Harrington, Joseph; Blecic, Jasmina; Stemm, Madison M.; Lust, Nate B.; Foster, Andrew S.; Rojo, Patricio M.; Loredo, Thomas J.

    2014-11-01

    Multi-wavelength secondary-eclipse and transit depths probe the thermo-chemical properties of exoplanets. In recent years, several research groups have developed retrieval codes to analyze the existing data and study the prospects of future facilities. However, the scientific community has limited access to these packages. Here we premiere the open-source Bayesian Atmospheric Radiative Transfer (BART) code. We discuss the key aspects of the radiative-transfer algorithm and the statistical package. The radiation code includes line databases for all HITRAN molecules, high-temperature H2O, TiO, and VO, and provides a preprocessor for adding additional line databases without recompiling the radiation code. Collision-induced absorption lines are available for H2-H2 and H2-He. The parameterized thermal and molecular abundance profiles can be modified arbitrarily without recompilation. The generated spectra are integrated over arbitrary bandpasses for comparison to data. BART's statistical package, Multi-core Markov-chain Monte Carlo (MC3), is a general-purpose MCMC module. MC3 implements the Differential-Evolution Markov chain Monte Carlo algorithm (ter Braak 2006, 2009). MC3 converges 20-400 times faster than the usual Metropolis-Hastings MCMC algorithm, and in addition uses the Message Passing Interface (MPI) to parallelize the MCMC chains. We apply the BART retrieval code to the HD 209458b data set to estimate the planet's temperature profile and molecular abundances. This work was supported by NASA Planetary Atmospheres grant NNX12AI69G and NASA Astrophysics Data Analysis Program grant NNX13AF38G. JB holds a NASA Earth and Space Science Fellowship.
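
    A minimal sketch of the differential-evolution proposal that distinguishes DE-MC from plain Metropolis-Hastings: each chain's jump is built from the difference of two other randomly chosen chains, so the proposal scale and orientation adapt to the posterior automatically. This is a generic illustration of the algorithm, not the MC3 code itself:

```python
import numpy as np

def demc_step(chains, logpost, gamma=None, eps=1e-4, rng=np.random.default_rng()):
    """One generation of differential-evolution MCMC (ter Braak 2006), in place."""
    n_chains, ndim = chains.shape
    g = gamma if gamma is not None else 2.38 / np.sqrt(2 * ndim)
    for i in range(n_chains):
        # Two other chains define the jump direction and scale for chain i.
        r1, r2 = rng.choice([j for j in range(n_chains) if j != i], 2, replace=False)
        prop = chains[i] + g * (chains[r1] - chains[r2]) + eps * rng.standard_normal(ndim)
        if np.log(rng.uniform()) < logpost(prop) - logpost(chains[i]):
            chains[i] = prop
    return chains

# Example: sample a correlated 2-D Gaussian with 10 parallel chains.
cov = np.array([[1.0, 0.9], [0.9, 1.0]])
icov = np.linalg.inv(cov)
logpost = lambda x: -0.5 * x @ icov @ x
chains = np.random.default_rng(0).standard_normal((10, 2))
history = []
for _ in range(3000):
    chains = demc_step(chains, logpost)
    history.append(chains.copy())
print(np.cov(np.concatenate(history[1000:]).T))   # should approach `cov`
```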

  2. Bayesian Treatment of Uncertainty in Environmental Modeling: Optimization, Sampling and Data Assimilation Using the DREAM Software Package

    NASA Astrophysics Data System (ADS)

    Vrugt, J. A.

    2012-12-01

    In the past decade much progress has been made in the treatment of uncertainty in earth systems modeling. Whereas initial approaches focused mostly on quantification of parameter and predictive uncertainty, recent methods attempt to disentangle the effects of parameter, forcing (input) data, model structural, and calibration data errors. In this talk I will highlight some of our recent work involving theory, concepts and applications of Bayesian parameter and/or state estimation. In particular, new methods for sequential Monte Carlo (SMC) and Markov chain Monte Carlo (MCMC) simulation will be presented, with emphasis on massively parallel distributed computing and quantification of model structural errors. The theoretical and numerical developments will be illustrated using model-data synthesis problems in hydrology, hydrogeology and geophysics.

  3. Bayesian Total-Evidence Dating Reveals the Recent Crown Radiation of Penguins

    PubMed Central

    Heath, Tracy A.; Ksepka, Daniel T.; Stadler, Tanja; Welch, David; Drummond, Alexei J.

    2017-01-01

    The total-evidence approach to divergence time dating uses molecular and morphological data from extant and fossil species to infer phylogenetic relationships, species divergence times, and macroevolutionary parameters in a single coherent framework. Current model-based implementations of this approach lack an appropriate model for the tree describing the diversification and fossilization process and can produce estimates that lead to erroneous conclusions. We address this shortcoming by providing a total-evidence method implemented in a Bayesian framework. This approach uses a mechanistic tree prior to describe the underlying diversification process that generated the tree of extant and fossil taxa. Previous attempts to apply the total-evidence approach have used tree priors that do not account for the possibility that fossil samples may be direct ancestors of other samples, that is, ancestors of fossil or extant species or of clades. The fossilized birth–death (FBD) process explicitly models the diversification, fossilization, and sampling processes and naturally allows for sampled ancestors. This model was recently applied to estimate divergence times based on molecular data and fossil occurrence dates. We incorporate the FBD model and a model of morphological trait evolution into a Bayesian total-evidence approach to dating species phylogenies. We apply this method to extant and fossil penguins and show that the modern penguins radiated much more recently than has been previously estimated, with the basal divergence in the crown clade occurring at ∼12.7 Ma and most splits leading to extant species occurring in the last 2 myr. Our results demonstrate that including stem-fossil diversity can greatly improve the estimates of the divergence times of crown taxa. The method is available in BEAST2 (version 2.4) software www.beast2.org with packages SA (version at least 1.1.4) and morph-models (version at least 1.0.4) installed. [Birth–death process; calibration; divergence times; MCMC; phylogenetics.] PMID:28173531

  4. Genetic network inference as a series of discrimination tasks.

    PubMed

    Kimura, Shuhei; Nakayama, Satoshi; Hatakeyama, Mariko

    2009-04-01

    Genetic network inference methods based on sets of differential equations generally require a great deal of time, as the equations must be solved many times. To reduce the computational cost, researchers have proposed other methods for inferring genetic networks by solving sets of differential equations only a few times, or even without solving them at all. When we try to obtain reasonable network models using these methods, however, we must estimate the time derivatives of the gene expression levels with great precision. In this study, we propose a new method to overcome the drawbacks of inference methods based on sets of differential equations. Our method infers genetic networks by obtaining classifiers capable of predicting the signs of the derivatives of the gene expression levels. For this purpose, we defined a genetic network inference problem as a series of discrimination tasks, then solved the defined series of discrimination tasks with a linear programming machine. Our experimental results demonstrated that the proposed method is capable of correctly inferring genetic networks, and doing so more than 500 times faster than the other inference methods based on sets of differential equations. Next, we applied our method to actual expression data of the bacterial SOS DNA repair system. And finally, we demonstrated that our approach relates to the inference method based on the S-system model. Though our method provides no estimation of the kinetic parameters, it should be useful for researchers interested only in the network structure of a target system. Supplementary data are available at Bioinformatics online.

  5. Sequential Designs Based on Bayesian Uncertainty Quantification in Sparse Representation Surrogate Modeling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, Ray -Bing; Wang, Weichung; Jeff Wu, C. F.

    A numerical method, called OBSM, was recently proposed which employs overcomplete basis functions to achieve sparse representations. While the method can handle non-stationary response without the need of inverting large covariance matrices, it lacks the capability to quantify uncertainty in predictions. We address this issue by proposing a Bayesian approach which first imposes a normal prior on the large space of linear coefficients, then applies the MCMC algorithm to generate posterior samples for predictions. From these samples, Bayesian credible intervals can then be obtained to assess prediction uncertainty. A key application for the proposed method is the efficient construction of sequential designs. Several sequential design procedures with different infill criteria are proposed based on the generated posterior samples. As a result, numerical studies show that the proposed schemes are capable of solving problems of positive point identification, optimization, and surrogate fitting.

  6. Bayesian Group Bridge for Bi-level Variable Selection.

    PubMed

    Mallick, Himel; Yi, Nengjun

    2017-06-01

    A Bayesian bi-level variable selection method (BAGB: Bayesian Analysis of Group Bridge) is developed for regularized regression and classification. This new development is motivated by grouped data, where generic variables can be divided into multiple groups, with variables in the same group being mechanistically related or statistically correlated. As an alternative to frequentist group variable selection methods, BAGB incorporates structural information among predictors through a group-wise shrinkage prior. Posterior computation proceeds via an efficient MCMC algorithm. In addition to the usual ease-of-interpretation of hierarchical linear models, the Bayesian formulation produces valid standard errors, a feature that is notably absent in the frequentist framework. Empirical evidence of the attractiveness of the method is illustrated by extensive Monte Carlo simulations and real data analysis. Finally, several extensions of this new approach are presented, providing a unified framework for bi-level variable selection in general models with flexible penalties.

  7. Sequential Designs Based on Bayesian Uncertainty Quantification in Sparse Representation Surrogate Modeling

    DOE PAGES

    Chen, Ray -Bing; Wang, Weichung; Jeff Wu, C. F.

    2017-04-12

    A numerical method, called OBSM, was recently proposed which employs overcomplete basis functions to achieve sparse representations. While the method can handle non-stationary response without the need of inverting large covariance matrices, it lacks the capability to quantify uncertainty in predictions. We address this issue by proposing a Bayesian approach which first imposes a normal prior on the large space of linear coefficients, then applies the MCMC algorithm to generate posterior samples for predictions. From these samples, Bayesian credible intervals can then be obtained to assess prediction uncertainty. A key application for the proposed method is the efficient construction of sequential designs. Several sequential design procedures with different infill criteria are proposed based on the generated posterior samples. As a result, numerical studies show that the proposed schemes are capable of solving problems of positive point identification, optimization, and surrogate fitting.

  8. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nitao, J J

    The goal of the Event Reconstruction Project is to find the location and strength of atmospheric release points, both stationary and moving. Source inversion relies on observational data as input. The methodology is sufficiently general to allow various forms of data. In this report, the authors will focus primarily on concentration measurements obtained at point monitoring locations at various times. The algorithms being investigated in the Project are MCMC (Markov chain Monte Carlo) and SMC (Sequential Monte Carlo) methods, classical inversion methods, and hybrids of these. They refer the reader to the report by Johannesson et al. (2004) for explanations of these methods. These methods require computing the concentrations at all monitoring locations for a given "proposed" source characteristic (locations and strength history). It is anticipated that the largest portion of the CPU time will be spent performing this computation. MCMC and SMC will require this computation to be done at least tens of thousands of times. Therefore, an efficient means of computing forward model predictions is important to making the inversion practical. In this report they show how Green's functions and reciprocal Green's functions can significantly accelerate forward model computations. First, instead of computing a plume for each possible source strength history, they can compute plumes from unit impulse sources only. By using linear superposition, they can obtain the response for any strength history. This response is given by the forward Green's function. Second, they may use the law of reciprocity. Suppose that they require the concentration at a single monitoring point x_m due to a potential (unit impulse) source located at x_s. Instead of computing a plume with source location x_s, they compute a "reciprocal plume" whose (unit impulse) source is at the monitoring location x_m. The reciprocal plume is computed using a reversed-direction wind field. The wind field and transport coefficients must also be appropriately time-reversed. Reciprocity says that the concentration of the reciprocal plume at x_s is related to the desired concentration at x_m. Since there are many fewer monitoring points than potential source locations, the number of forward model computations is drastically reduced.
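
    A minimal sketch of the superposition step described above: store the unit-impulse (Green's function) response at each monitor once, after which the concentration time series for any proposed source-strength history is a discrete convolution, so the forward model inside MCMC reduces to a few array operations. The impulse response and release history below are illustrative placeholders:

```python
import numpy as np

def concentration_at_monitor(green, strength_history, dt):
    """Concentration time series at one monitor for a given source history.

    green            : (T,) response at the monitor to a unit impulse released at t = 0
    strength_history : (T,) proposed emission rate of the source over time
    Superposition: c(t) = sum_tau green(t - tau) * q(tau) * dt
    """
    return np.convolve(strength_history, green)[:len(green)] * dt

# Illustrative unit-impulse response (advection-diffusion-like pulse) and release history.
t = np.arange(0, 200.0, 1.0)
green = np.exp(-(t - 60.0) ** 2 / (2 * 15.0 ** 2)) / 100.0   # unit-impulse response
q = np.where((t > 10) & (t < 40), 5.0, 0.0)                  # proposed release history
c = concentration_at_monitor(green, q, dt=1.0)
print(c.max())
```

    By reciprocity, the same impulse-response array could be obtained from a single reversed-wind plume launched at the monitor, rather than from one plume per candidate source location.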

  9. Resolution analysis of marine seismic full waveform data by Bayesian inversion

    NASA Astrophysics Data System (ADS)

    Ray, A.; Sekar, A.; Hoversten, G. M.; Albertin, U.

    2015-12-01

    The Bayesian posterior density function (PDF) of earth models that fit full waveform seismic data conveys information on the uncertainty with which the elastic model parameters are resolved. In this work, we apply the trans-dimensional reversible jump Markov chain Monte Carlo (RJ-MCMC) method to the 1D inversion of noisy synthetic full-waveform seismic data in the frequency-wavenumber domain. While seismic full waveform inversion (FWI) is a powerful method for characterizing subsurface elastic parameters, the uncertainty in the inverted models has remained poorly known, if known at all, and is highly dependent on the initial model. The Bayesian method we use is trans-dimensional in that the number of model layers is not fixed, and flexible in that the layer boundaries are free to move around. The resulting parameterization does not require regularization to stabilize the inversion. Depth resolution is traded off against the number of layers, providing an estimate of uncertainty in the elastic parameters (compressional and shear velocities Vp and Vs, as well as density) with depth. We find that in the absence of additional constraints, Bayesian inversion can result in a wide range of posterior PDFs on Vp, Vs and density. These PDFs range from being clustered around the true model to containing little resolution of any features other than those in the near surface, depending on the particular data and target geometry. We present results for a suite of different frequencies and offset ranges, examining the differences in the posterior model densities thus derived. Though these results are for a 1D earth, they are applicable to areas with simple, layered geology and provide valuable insight into the resolving capabilities of FWI, as well as highlighting the challenges in solving a highly non-linear problem. The RJ-MCMC method also presents a tantalizing possibility for extension to 2D and 3D Bayesian inversion of full waveform seismic data in the future, as it objectively tackles the problem of model selection (i.e., the number of layers or cells for parameterization), which could ease the computational burden of evaluating forward models with many parameters.

  10. Causal inference in biology networks with integrated belief propagation.

    PubMed

    Chang, Rui; Karr, Jonathan R; Schadt, Eric E

    2015-01-01

    Inferring causal relationships among molecular and higher order phenotypes is a critical step in elucidating the complexity of living systems. Here we propose a novel method for inferring causality that is no longer constrained by the conditional dependency arguments that limit the ability of statistical causal inference methods to resolve causal relationships within sets of graphical models that are Markov equivalent. Our method utilizes Bayesian belief propagation to infer the responses of perturbation events on molecular traits given a hypothesized graph structure. A distance measure between the inferred response distribution and the observed data is defined to assess the 'fitness' of the hypothesized causal relationships. To test our algorithm, we infer causal relationships within equivalence classes of gene networks in which the possible functional interactions are assumed to be nonlinear, given synthetic microarray and RNA sequencing data. We also apply our method to infer causality in a real metabolic network with a v-structure and a feedback loop. We show that our method can recapitulate the causal structure and recover the feedback loop from steady-state data alone, which conventional methods cannot.

  11. A general method to determine sampling windows for nonlinear mixed effects models with an application to population pharmacokinetic studies.

    PubMed

    Foo, Lee Kien; McGree, James; Duffull, Stephen

    2012-01-01

    Optimal design methods have been proposed to determine the best sampling times when sparse blood sampling is required in clinical pharmacokinetic studies. However, the optimal blood sampling time points may not be feasible in clinical practice. Sampling windows, a time interval for blood sample collection, have been proposed to provide flexibility in blood sampling times while preserving efficient parameter estimation. Because of the complexity of the population pharmacokinetic models, which are generally nonlinear mixed effects models, there is no analytical solution available to determine sampling windows. We propose a method for determination of sampling windows based on MCMC sampling techniques. The proposed method attains a stationary distribution rapidly and provides time-sensitive windows around the optimal design points. The proposed method is applicable to determine sampling windows for any nonlinear mixed effects model although our work focuses on an application to population pharmacokinetic models. Copyright © 2012 John Wiley & Sons, Ltd.

  12. EFFICIENT MODEL-FITTING AND MODEL-COMPARISON FOR HIGH-DIMENSIONAL BAYESIAN GEOSTATISTICAL MODELS. (R826887)

    EPA Science Inventory

    Geostatistical models are appropriate for spatially distributed data measured at irregularly spaced locations. We propose an efficient Markov chain Monte Carlo (MCMC) algorithm for fitting Bayesian geostatistical models with substantial numbers of unknown parameters to sizable...

  13. MULTIVARIATE RECEPTOR MODELING FOR TEMPORALLY CORRELATED DATA BY USING MCMC. (R826238)

    EPA Science Inventory

    The perspectives, information and conclusions conveyed in research project abstracts, progress reports, final reports, journal abstracts and journal publications convey the viewpoints of the principal investigator and may not represent the views and policies of ORD and EPA. Concl...

  14. Micrometeorite Science with LISA Pathfinder

    NASA Astrophysics Data System (ADS)

    Pagane, Nicole; Thorpe, James Ira; Littenberg, Tyson; Littenberg, Tyson; Baker, John; Slutsky, Jacob; Hourihane, Sophie; LISA Pathfinder Team

    2018-01-01

    The primary objective of LISA Pathfinder (LPF) was to demonstrate drag-free control of test masses, along with the technology necessary to maintain their inertial motion, that LISA (Laser Interferometer Space Antenna) would later utilize as a space-based gravitational wave observatory. Due to the precise interferometry used during the mission, LPF could be employed as an accelerometer and used to detect micrometeorite impacts while in orbit about the Sun-Earth Lagrange point L1. To infer micrometeorite impacts, the flight data were processed for event reconstruction to determine the external acceleration of LPF; impact parameters were then estimated with a Markov chain Monte Carlo (MCMC) tool via Bayesian analysis by fitting delta functions in the acceleration domain. As impact candidates were collected, a catalog of event data was curated with the reconstructed parameter estimates, among which were impact sky localizations that were later rotated into more intuitive reference frames. To interpret the results of this dust modeling technique, the reconstructed impacts were compared to current micrometeorite models. In the final reference frame common to the available micrometeorite models, the reconstructed impacts appear to cluster at (±90°, 0°), where prograde impacts in this longitude-latitude frame lie at (-90°, 0°), retrograde impacts at (90°, 0°), and the Sun is centered at the origin. The two available models used for comparison were those of the Jupiter-family comets (JFC) and Halley-type comets (HTC), which cluster primarily around (±90°, 0°) and (0°, ±20°), respectively. This suggests that the JFC population accounts for the majority of the impacts detected by LPF. The models' expected rates given localization and velocity are currently being compared to the reconstructed data to further characterize the micrometeorite populations at L1. We will present our current analysis of this data set and discuss possibilities of extending such an analysis for LISA.

  15. A STUDY OF HIERARCHICAL ORGANIZATION LOCATION MODEL FOR SERVICE INDUSTRY ENTERPRISE

    NASA Astrophysics Data System (ADS)

    Okumura, Makoto; Takada, Naoki; Okubo, Kazuaki

    The service industry has come to participate in many economic activities, but there are few studies that analyze the hierarchical organization location of a service industry enterprise that handles information. We propose a hierarchically organized location model, which endogenously determines the number of hierarchy levels. Furthermore, we propose an MCMC-based statistical method to obtain parameter distributions corresponding to the observed macro employment distribution as well as the salaries paid. Using our model, we analyzed the regional disparities in the number of employees and in wages. We found that the change in regional disparities is caused by internal factors, such as industrial structural change, rather than external factors, such as changes in traffic conditions.

  16. Update on ɛK with lattice QCD inputs

    NASA Astrophysics Data System (ADS)

    Jang, Yong-Chull; Lee, Weonjong; Lee, Sunkyu; Leem, Jaehoon

    2018-03-01

    We report updated results for ɛK, the indirect CP violation parameter in neutral kaons, which is evaluated directly from the standard model with lattice QCD inputs. We use lattice QCD inputs to fix B̂_K, |V_cb|, ξ_0, ξ_2, |V_us|, and m_c(m_c). Since Lattice 2016, the UTfit group has updated the Wolfenstein parameters in the angle-only-fit method, and the HFLAV group has also updated |V_cb|. Our results show that the evaluation of ɛK with exclusive |V_cb| (lattice QCD inputs) has a 4.0σ tension with the experimental value, while that with inclusive |V_cb| (heavy quark expansion based on OPE and QCD sum rules) shows no tension.

  17. Cosmological Constraint on Brans-Dicke Theory

    NASA Astrophysics Data System (ADS)

    Chen, Xuelei; Wu, Fengquan

    We develop the covariant formalism of cosmological perturbation theory for Brans-Dicke gravity, and use it to calculate the cosmic microwave background (CMB) anisotropy and large scale structure (LSS) power spectrum. We introduce a new parameter ζ, related to the Brans-Dicke parameter ω by ζ = ln(1/ω + 1), and use the Markov chain Monte Carlo (MCMC) method to explore the parameter space. Using the latest CMB data published by the WMAP, ACBAR, CBI, and Boomerang teams, and the LSS data from the SDSS survey DR4, we find that the 2σ (95.5%) bound on ζ is about |ζ| < 10^-2, or |ω| > 10^2; the precise limit depends somewhat on the prior used.

  18. Skill (or lack thereof) of data-model fusion techniques to provide an early warning signal for an approaching tipping point.

    PubMed

    Singh, Riddhi; Quinn, Julianne D; Reed, Patrick M; Keller, Klaus

    2018-01-01

    Many coupled human-natural systems have the potential to exhibit a highly nonlinear threshold response to external forcings resulting in fast transitions to undesirable states (such as eutrophication in a lake). Often, there are considerable uncertainties that make identifying the threshold challenging. Thus, rapid learning is critical for guiding management actions to avoid abrupt transitions. Here, we adopt the shallow lake problem as a test case to compare the performance of four common data assimilation schemes to predict an approaching transition. In order to demonstrate the complex interactions between management strategies and the ability of the data assimilation schemes to predict eutrophication, we also analyze our results across two different management strategies governing phosphorus emissions into the shallow lake. The compared data assimilation schemes are: ensemble Kalman filtering (EnKF), particle filtering (PF), pre-calibration (PC), and Markov Chain Monte Carlo (MCMC) estimation. While differing in their core assumptions, each data assimilation scheme is based on Bayes' theorem and updates prior beliefs about a system based on new information. For large computational investments, EnKF, PF and MCMC show similar skill in capturing the observed phosphorus in the lake (measured as expected root mean squared prediction error). EnKF, followed by PF, displays the highest learning rates at low computational cost, thus providing a more reliable signal of an impending transition. MCMC approaches the true probability of eutrophication only after a strong signal of an impending transition emerges from the observations. Overall, we find that learning rates are greatest near regions of abrupt transitions, posing a challenge to early learning and preemptive management of systems with such abrupt transitions.

  19. Skill (or lack thereof) of data-model fusion techniques to provide an early warning signal for an approaching tipping point

    PubMed Central

    Quinn, Julianne D.; Reed, Patrick M.; Keller, Klaus

    2018-01-01

    Many coupled human-natural systems have the potential to exhibit a highly nonlinear threshold response to external forcings resulting in fast transitions to undesirable states (such as eutrophication in a lake). Often, there are considerable uncertainties that make identifying the threshold challenging. Thus, rapid learning is critical for guiding management actions to avoid abrupt transitions. Here, we adopt the shallow lake problem as a test case to compare the performance of four common data assimilation schemes to predict an approaching transition. In order to demonstrate the complex interactions between management strategies and the ability of the data assimilation schemes to predict eutrophication, we also analyze our results across two different management strategies governing phosphorus emissions into the shallow lake. The compared data assimilation schemes are: ensemble Kalman filtering (EnKF), particle filtering (PF), pre-calibration (PC), and Markov Chain Monte Carlo (MCMC) estimation. While differing in their core assumptions, each data assimilation scheme is based on Bayes’ theorem and updates prior beliefs about a system based on new information. For large computational investments, EnKF, PF and MCMC show similar skill in capturing the observed phosphorus in the lake (measured as expected root mean squared prediction error). EnKF, followed by PF, displays the highest learning rates at low computational cost, thus providing a more reliable signal of an impending transition. MCMC approaches the true probability of eutrophication only after a strong signal of an impending transition emerges from the observations. Overall, we find that learning rates are greatest near regions of abrupt transitions, posing a challenge to early learning and preemptive management of systems with such abrupt transitions. PMID:29389938

  20. Bayesian tomography by interacting Markov chains

    NASA Astrophysics Data System (ADS)

    Romary, T.

    2017-12-01

    In seismic tomography, we seek to determine the velocity of the underground from noisy first-arrival travel time observations. In most situations, this is an ill-posed inverse problem that admits several imperfect solutions. Given an a priori distribution over the parameters of the velocity model, the Bayesian formulation allows us to state this problem as a probabilistic one, with a solution in the form of a posterior distribution. The posterior distribution is generally high dimensional and may exhibit multimodality. Moreover, as it is known only up to a constant, the only sensible way to address this problem is to try to generate simulations from the posterior. The natural tools to perform these simulations are Markov chain Monte Carlo (MCMC) methods. Classical implementations of MCMC algorithms generally suffer from slow mixing: the generated states are slow to enter the stationary regime, that is, to fit the observations, and when one mode of the posterior is eventually identified, it may become difficult to visit others. Using a varying temperature parameter that relaxes the constraint on the data may help to enter the stationary regime. Besides, the sequential nature of MCMC makes it ill suited to parallel implementation. Running a large number of chains in parallel may be suboptimal, as the information gathered by each chain is not mutualized. Parallel tempering (PT) can be seen as a first attempt to make parallel chains at different temperatures communicate, but they exchange information only between current states. In this talk, I will show that PT actually belongs to a general class of interacting Markov chain algorithms. I will also show that this class makes it possible to design interacting schemes that can take advantage of the whole history of the chains, by allowing exchanges toward already visited states. The algorithms will be illustrated with toy examples and an application to first-arrival traveltime tomography.
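
    A minimal sketch of the state exchange that parallel tempering adds on top of independent chains: after each within-chain update, neighbouring temperatures propose to swap their current states with the standard Metropolis acceptance ratio. This is a generic PT illustration on a bimodal toy misfit, not the interacting-chain scheme proposed in the talk:

```python
import numpy as np

def pt_sweep(states, betas, loglike, rng):
    """One sweep of within-chain Metropolis updates plus neighbour swaps."""
    # Within-chain random-walk updates at each inverse temperature beta.
    for k, beta in enumerate(betas):
        prop = states[k] + 0.5 * rng.standard_normal(states[k].shape)
        if np.log(rng.uniform()) < beta * (loglike(prop) - loglike(states[k])):
            states[k] = prop
    # Propose swaps between neighbouring temperatures.
    for k in range(len(betas) - 1):
        dbeta = betas[k + 1] - betas[k]
        dll = loglike(states[k]) - loglike(states[k + 1])
        if np.log(rng.uniform()) < dbeta * dll:
            states[k], states[k + 1] = states[k + 1], states[k]
    return states

# Bimodal toy "travel-time misfit": two velocity models fit the data equally well.
loglike = lambda m: np.logaddexp(-0.5 * ((m - 2.0) / 0.2) ** 2,
                                 -0.5 * ((m + 2.0) / 0.2) ** 2).sum()
betas = np.array([0.01, 0.05, 0.2, 1.0])         # cold (target) chain is beta = 1
rng = np.random.default_rng(0)
states = [rng.standard_normal(1) for _ in betas]
cold = []
for _ in range(20_000):
    states = pt_sweep(states, betas, loglike, rng)
    cold.append(states[-1].copy())
frac = np.mean(np.concatenate(cold) > 0)
print("fraction of cold-chain samples in the +2 mode:", frac)   # ~0.5 if both modes are visited
```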

  1. Using informative priors in facies inversion: The case of C-ISR method

    NASA Astrophysics Data System (ADS)

    Valakas, G.; Modis, K.

    2016-08-01

    Inverse problems involving the characterization of hydraulic properties of groundwater flow systems by conditioning on observations of the state variables are mathematically ill-posed because they have multiple solutions and are sensitive to small changes in the data. In the framework of MCMC methods for nonlinear optimization, and under an iterative spatial resampling transition kernel, we present an algorithm for narrowing the prior and thus producing improved proposal realizations. To achieve this goal, we cosimulate the facies distribution conditional on facies observations and normal-score-transformed hydrologic response measurements, assuming a linear coregionalization model. The approach works by creating an importance sampling effect that steers the process toward selected areas of the prior. The effectiveness of our approach is demonstrated by an example application to a synthetic underdetermined inverse problem in aquifer characterization.

  2. Bayesian Peptide Peak Detection for High Resolution TOF Mass Spectrometry.

    PubMed

    Zhang, Jianqiu; Zhou, Xiaobo; Wang, Honghui; Suffredini, Anthony; Zhang, Lin; Huang, Yufei; Wong, Stephen

    2010-11-01

    In this paper, we address the issue of peptide ion peak detection for high resolution time-of-flight (TOF) mass spectrometry (MS) data. A novel Bayesian peptide ion peak detection method is proposed for TOF data with a resolution of 10 000-15 000 full width at half-maximum (FWHM). MS spectra exhibit distinct characteristics at this resolution, which are captured in a novel parametric model. Based on the proposed parametric model, a Bayesian peak detection algorithm based on Markov chain Monte Carlo (MCMC) sampling is developed. The proposed algorithm is tested on both simulated and real datasets. The results show a significant improvement in detection performance over a commonly employed method. The results also agree with experts' visual inspection. Moreover, better detection consistency is achieved across MS datasets from patients with the same pathological condition.

  3. Bayesian Peptide Peak Detection for High Resolution TOF Mass Spectrometry

    PubMed Central

    Zhang, Jianqiu; Zhou, Xiaobo; Wang, Honghui; Suffredini, Anthony; Zhang, Lin; Huang, Yufei; Wong, Stephen

    2011-01-01

    In this paper, we address the issue of peptide ion peak detection for high resolution time-of-flight (TOF) mass spectrometry (MS) data. A novel Bayesian peptide ion peak detection method is proposed for TOF data with a resolution of 10 000–15 000 full width at half-maximum (FWHM). MS spectra exhibit distinct characteristics at this resolution, which are captured in a novel parametric model. Based on the proposed parametric model, a Bayesian peak detection algorithm based on Markov chain Monte Carlo (MCMC) sampling is developed. The proposed algorithm is tested on both simulated and real datasets. The results show a significant improvement in detection performance over a commonly employed method. The results also agree with experts' visual inspection. Moreover, better detection consistency is achieved across MS datasets from patients with the same pathological condition. PMID:21544266

  4. A sampling algorithm for segregation analysis

    PubMed Central

    Tier, Bruce; Henshall, John

    2001-01-01

    Methods for detecting Quantitative Trait Loci (QTL) without markers have generally used iterative peeling algorithms for determining genotype probabilities. These algorithms have considerable shortcomings in complex pedigrees. A Markov chain Monte Carlo (MCMC) method which samples the pedigree of the whole population jointly is described. Simultaneous sampling of the pedigree was achieved by sampling descent graphs using the Metropolis-Hastings algorithm. A descent graph describes the inheritance state of each allele and provides pedigrees guaranteed to be consistent with Mendelian sampling. Sampling descent graphs overcomes most, if not all, of the limitations incurred by iterative peeling algorithms. The algorithm was able to find the QTL in most of the simulated populations. However, when the QTL was not modeled or was not found, its effect was ascribed to the polygenic component. No QTL were detected when they were not simulated. PMID:11742631

  5. Bayesian random-effect model for predicting outcome fraught with heterogeneity--an illustration with episodes of 44 patients with intractable epilepsy.

    PubMed

    Yen, A M-F; Liou, H-H; Lin, H-L; Chen, T H-H

    2006-01-01

    The study aimed to develop a predictive model to deal with data fraught with heterogeneity that cannot be explained by sampling variation or measured covariates. The random-effect Poisson regression model was first proposed to deal with over-dispersion in data fraught with heterogeneity after making allowance for measured covariates. A Bayesian acyclic graphical model in conjunction with the Markov chain Monte Carlo (MCMC) technique was then applied to estimate the parameters of both the relevant covariates and the random effect. A predictive distribution was then generated to compare the predicted with the observed outcomes for the Bayesian model with and without the random effect. Data from repeated measurement of episodes among 44 patients with intractable epilepsy were used as an illustration. Applying Poisson regression to the epilepsy data without taking heterogeneity into account yielded a large value of heterogeneity (heterogeneity factor = 17.90, deviance = 1485, degrees of freedom (df) = 83). After taking the random effect into account, the heterogeneity factor was greatly reduced (heterogeneity factor = 0.52, deviance = 42.5, df = 81). The Pearson χ2 statistics for the comparison between the expected and observed seizure frequencies at two and three months for the models with and without the random effect were 34.27 (p = 1.00) and 1799.90 (p < 0.0001), respectively. The Bayesian acyclic graphical model using the MCMC method was demonstrated to have great potential for disease prediction when data show over-dispersion attributable either to correlation or to subject-to-subject variability.
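
    The heterogeneity factor quoted above is the Poisson deviance divided by its degrees of freedom. A minimal sketch of that diagnostic on hypothetical over-dispersed count data (not the actual epilepsy data) follows.

    ```python
    import numpy as np

    rng = np.random.default_rng(9)

    # Hypothetical seizure-count data: per-patient monthly counts whose rates vary
    # from patient to patient (a random effect a plain Poisson model ignores)
    n_patients, n_months = 44, 3
    patient_log_rate = rng.normal(np.log(8.0), 0.8, n_patients)   # subject-level heterogeneity
    counts = rng.poisson(np.exp(patient_log_rate)[:, None], size=(n_patients, n_months))

    # Intercept-only Poisson fit: the MLE of the common rate is simply the overall mean
    mu = counts.mean()
    y = counts.ravel()

    # Poisson deviance and heterogeneity factor (deviance / degrees of freedom);
    # values far above 1 signal over-dispersion that a random effect should absorb
    with np.errstate(divide="ignore", invalid="ignore"):
        term = np.where(y > 0, y * np.log(y / mu), 0.0)
    deviance = 2.0 * np.sum(term - (y - mu))
    df = y.size - 1
    print("heterogeneity factor:", round(deviance / df, 2))
    ```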

  6. Bayesian hierarchical modelling of continuous non-negative longitudinal data with a spike at zero: An application to a study of birds visiting gardens in winter.

    PubMed

    Swallow, Ben; Buckland, Stephen T; King, Ruth; Toms, Mike P

    2016-03-01

    The development of methods for dealing with continuous data with a spike at zero has lagged behind those for overdispersed or zero-inflated count data. We consider longitudinal ecological data corresponding to an annual average of 26 weekly maximum counts of birds; the data are hence effectively continuous, bounded below by zero, but also have a discrete mass at zero. We develop a Bayesian hierarchical Tweedie regression model that can directly accommodate the excess number of zeros common to this type of data, whilst accounting for both spatial and temporal correlation. Implementation of the model is conducted in a Markov chain Monte Carlo (MCMC) framework, using reversible jump MCMC to explore uncertainty across both parameter and model spaces. This regression modelling framework is very flexible and removes the need to make strong assumptions about mean-variance relationships a priori. It can also directly account for the spike at zero, whilst being easily applicable to other types of data and other model formulations. Whilst a correlative study such as this cannot prove causation, our results suggest that an increase in an avian predator may have led to an overall decrease in the number of one of its prey species visiting garden feeding stations in the United Kingdom. This may reflect a change in behaviour of house sparrows to avoid feeding stations frequented by sparrowhawks, or a reduction in house sparrow population size as a result of the sparrowhawk increase. © 2015 The Author. Biometrical Journal published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  7. Model averaging in linkage analysis.

    PubMed

    Matthysse, Steven

    2006-06-05

    Methods for genetic linkage analysis are traditionally divided into "model-dependent" and "model-independent," but there may be a useful place for an intermediate class, in which a broad range of possible models is considered as a parametric family. It is possible to average over model space with an empirical Bayes prior that weights models according to their goodness of fit to epidemiologic data, such as the frequency of the disease in the population and in first-degree relatives (and correlations with other traits in the pleiotropic case). For averaging over high-dimensional spaces, Markov chain Monte Carlo (MCMC) has great appeal, but it has a near-fatal flaw: it is not possible, in most cases, to provide rigorous sufficient conditions to permit the user safely to conclude that the chain has converged. A way of overcoming the convergence problem, if not of solving it, rests on a simple application of the principle of detailed balance. If the starting point of the chain has the equilibrium distribution, so will every subsequent point. The first point is chosen according to the target distribution by rejection sampling, and subsequent points by an MCMC process that has the target distribution as its equilibrium distribution. Model averaging with an empirical Bayes prior requires rapid estimation of likelihoods at many points in parameter space. Symbolic polynomials are constructed before the random walk over parameter space begins, to make the actual likelihood computations at each step of the random walk very fast. Power analysis in an illustrative case is described. (c) 2006 Wiley-Liss, Inc.
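
    The exact-start construction described here is easy to sketch: draw the first state from the target by rejection sampling, then run any Metropolis chain that leaves the target invariant. The bimodal target and envelope below are hypothetical stand-ins for the likelihood surfaces arising in linkage analysis.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)

    def target_pdf(x):
        # Hypothetical unnormalized bimodal target standing in for a likelihood surface
        return np.exp(-20.0 * (x - 0.3) ** 2) + 0.5 * np.exp(-20.0 * (x - 0.8) ** 2)

    def envelope_pdf(x):
        # N(0.5, 1) density used as the rejection-sampling envelope
        return np.exp(-0.5 * (x - 0.5) ** 2) / np.sqrt(2.0 * np.pi)

    M = 4.0  # chosen so that M * envelope_pdf(x) >= target_pdf(x) everywhere

    def draw_initial_state():
        # Rejection sampling: the accepted draw is exactly target-distributed
        while True:
            x = rng.normal(0.5, 1.0)
            if rng.random() * M * envelope_pdf(x) < target_pdf(x):
                return x

    # Because the first state already follows the target and each Metropolis step
    # leaves the target invariant, every state of the chain is target-distributed.
    x = draw_initial_state()
    chain = [x]
    for _ in range(5000):
        prop = x + rng.normal(0.0, 0.2)           # symmetric random-walk proposal
        if rng.random() < target_pdf(prop) / target_pdf(x):
            x = prop
        chain.append(x)

    print("initial state:", round(chain[0], 3), "chain mean:", round(np.mean(chain), 3))
    ```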

  8. Ensemble stacking mitigates biases in inference of synaptic connectivity.

    PubMed

    Chambers, Brendan; Levy, Maayan; Dechery, Joseph B; MacLean, Jason N

    2018-01-01

    A promising alternative to directly measuring the anatomical connections in a neuronal population is inferring the connections from the activity. We employ simulated spiking neuronal networks to compare and contrast commonly used inference methods that identify likely excitatory synaptic connections using statistical regularities in spike timing. We find that simple adjustments to standard algorithms improve inference accuracy: A signing procedure improves the power of unsigned mutual-information-based approaches and a correction that accounts for differences in mean and variance of background timing relationships, such as those expected to be induced by heterogeneous firing rates, increases the sensitivity of frequency-based methods. We also find that different inference methods reveal distinct subsets of the synaptic network and each method exhibits different biases in the accurate detection of reciprocity and local clustering. To correct for errors and biases specific to single inference algorithms, we combine methods into an ensemble. Ensemble predictions, generated as a linear combination of multiple inference algorithms, are more sensitive than the best individual measures alone, and are more faithful to ground-truth statistics of connectivity, mitigating biases specific to single inference methods. These weightings generalize across simulated datasets, emphasizing the potential for the broad utility of ensemble-based approaches.
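
    A minimal sketch of the stacking idea, learning a linear combination of several connection scores against ground truth and evaluating it on held-out connections, is given below. The simulated scores and the simple least-squares weighting are illustrative assumptions, not the authors' pipeline.

    ```python
    import numpy as np

    rng = np.random.default_rng(3)

    # Hypothetical setup: n candidate connections, a 0/1 ground truth, and the
    # (noisy) scores assigned to each connection by three inference methods.
    n = 500
    truth = rng.random(n) < 0.2
    scores = np.column_stack([
        truth + rng.normal(0.0, s, n)             # each "method" = truth + noise
        for s in (0.6, 0.8, 1.0)
    ])

    # Split into a training fold (to learn stacking weights) and a test fold
    train, test = np.arange(n) < n // 2, np.arange(n) >= n // 2

    # Learn linear stacking weights by least squares against ground truth
    X = np.column_stack([np.ones(train.sum()), scores[train]])
    w, *_ = np.linalg.lstsq(X, truth[train].astype(float), rcond=None)

    # Ensemble prediction on held-out connections
    ensemble_score = np.column_stack([np.ones(test.sum()), scores[test]]) @ w

    def auc(score, label):
        # Rank-based AUC (probability that a true connection outranks a false one)
        order = np.argsort(score)
        ranks = np.empty(len(score)); ranks[order] = np.arange(1, len(score) + 1)
        pos = label.sum()
        return (ranks[label].sum() - pos * (pos + 1) / 2) / (pos * (len(label) - pos))

    print("single-method AUCs:", [round(auc(scores[test, j], truth[test]), 3) for j in range(3)])
    print("stacked-ensemble AUC:", round(auc(ensemble_score, truth[test]), 3))
    ```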

  9. Bayesian Analysis for Exponential Random Graph Models Using the Adaptive Exchange Sampler.

    PubMed

    Jin, Ick Hoon; Yuan, Ying; Liang, Faming

    2013-10-01

    Exponential random graph models have been widely used in social network analysis. However, these models are extremely difficult to handle from a statistical viewpoint, because of the intractable normalizing constant and model degeneracy. In this paper, we consider a fully Bayesian analysis for exponential random graph models using the adaptive exchange sampler, which solves the intractable normalizing constant and model degeneracy issues encountered in Markov chain Monte Carlo (MCMC) simulations. The adaptive exchange sampler can be viewed as an MCMC extension of the exchange algorithm, and it generates auxiliary networks via an importance sampling procedure from an auxiliary Markov chain running in parallel. The convergence of this algorithm is established under mild conditions. The adaptive exchange sampler is illustrated using a few social networks, including the Florentine business network, molecule synthetic network, and dolphins network. The results indicate that the adaptive exchange algorithm can produce more accurate estimates than approximate exchange algorithms, while maintaining the same computational efficiency.

  10. pyblocxs: Bayesian Low-Counts X-ray Spectral Analysis in Sherpa

    NASA Astrophysics Data System (ADS)

    Siemiginowska, A.; Kashyap, V.; Refsdal, B.; van Dyk, D.; Connors, A.; Park, T.

    2011-07-01

    Typical X-ray spectra have low counts and should be modeled using the Poisson distribution. However, the χ2 statistic is often applied instead, with the data assumed to follow a Gaussian distribution. Various weightings of the statistic or binnings of the data are performed to overcome the low-count issues. However, such modifications introduce biases and/or a loss of information. Standard modeling packages such as XSPEC and Sherpa provide the Poisson likelihood and allow computation of rudimentary MCMC chains, but so far do not allow a full Bayesian model to be set up. We have implemented a sophisticated Bayesian MCMC-based algorithm to carry out spectral fitting of low-count sources in the Sherpa environment. The code is a Python extension to Sherpa and allows a predefined Sherpa model to be fit to high-energy X-ray spectral data and other generic data. We present the algorithm and discuss several issues related to the implementation, including flexible definition of priors and allowing for variations in the calibration information.
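
    The contrast between the Poisson likelihood and the Gaussian χ2 approximation can be made concrete with a short sketch; the flat model spectrum and count levels below are hypothetical and unrelated to any Sherpa or pyblocxs interface.

    ```python
    import numpy as np

    rng = np.random.default_rng(4)

    # Hypothetical low-count spectrum: observed counts per energy bin and the
    # counts predicted by some spectral model in the same bins.
    model_counts = np.full(200, 2.5)              # faint source: ~2.5 counts/bin
    observed = rng.poisson(model_counts)

    # Poisson log-likelihood statistic (Cash statistic, up to a model-independent constant)
    cash = 2.0 * np.sum(model_counts - observed * np.log(model_counts))

    # Gaussian chi^2 with the common "data weighting" sigma_i^2 = max(d_i, 1),
    # which is biased for bins with very few counts
    chi2 = np.sum((observed - model_counts) ** 2 / np.maximum(observed, 1))

    print("Cash statistic:", round(cash, 1), " chi^2:", round(chi2, 1))
    ```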

  11. Optimal Bayesian Adaptive Design for Test-Item Calibration.

    PubMed

    van der Linden, Wim J; Ren, Hao

    2015-06-01

    An optimal adaptive design for test-item calibration based on Bayesian optimality criteria is presented. The design adapts the choice of field-test items to the examinees taking an operational adaptive test using both the information in the posterior distributions of their ability parameters and the current posterior distributions of the field-test parameters. Different criteria of optimality based on the two types of posterior distributions are possible. The design can be implemented using an MCMC scheme with alternating stages of sampling from the posterior distributions of the test takers' ability parameters and the parameters of the field-test items while reusing samples from earlier posterior distributions of the other parameters. Results from a simulation study demonstrated the feasibility of the proposed MCMC implementation for operational item calibration. A comparison of performances for different optimality criteria showed faster calibration of substantial numbers of items for the criterion of D-optimality relative to A-optimality, a special case of c-optimality, and random assignment of items to the test takers.

  12. Kepler Uniform Modeling of KOIs: MCMC Notes for Data Release 25

    NASA Technical Reports Server (NTRS)

    Hoffman, Kelsey L.; Rowe, Jason F.

    2017-01-01

    This document describes data products related to the reported planetary parameters and uncertainties for the Kepler Objects of Interest (KOIs) based on a Markov Chain Monte Carlo (MCMC) analysis. Reported parameters, uncertainties and data products can be found at the NASA Exoplanet Archive. The codes used for this data analysis are available on the Github website (Rowe 2016). The relevant paper for details of the calculations is Rowe et al. (2015). The main differences between the model fits discussed here and those in the DR24 catalogue are that the DR25 light curves were used in the analysis, our processing of the MAST light curves took into account different data flags, the number of chains calculated was doubled to 200 000, and the reported parameters are based on a damped least-squares fit, instead of the median value from the Markov chain or the chain with the lowest χ2 as reported in the past.

  13. Assessing Model Fitting of Megamaser Disks with Simulated Observations

    NASA Astrophysics Data System (ADS)

    Han, Jiwon; Braatz, James; Pesce, Dominic

    2018-01-01

    The Megamaser Cosmology Project (MCP) measures the Hubble Constant by determining distances to galaxies with observations of 22 GHz H2O megamasers. The megamasers arise in the circumnuclear accretion disks of active galaxies. In this research, we aim to improve the estimation of systematic errors in MCP measurements. Currently, the MCP fits a disk model to the observed maser data with a Markov Chain Monte Carlo (MCMC) code. The disk model is described by up to 14 global parameters, including up to 6 that describe the disk warping. We first assess the model by generating synthetic datasets in which the locations and dynamics of the maser spots are exactly known, and fitting the model to these data. By doing so, we can also test the effects of unmodeled substructure on the estimated uncertainties. Furthermore, in order to gain a better understanding of the physics behind accretion disk warping, we develop a physics-driven model for the warp and test it with the MCMC approach.

  14. Defense mutualisms enhance plant diversification

    PubMed Central

    Weber, Marjorie G.; Agrawal, Anurag A.

    2014-01-01

    The ability of plants to form mutualistic relationships with animal defenders has long been suspected to influence their evolutionary success, both by decreasing extinction risk and by increasing opportunity for speciation through an expanded realized niche. Nonetheless, the hypothesis that defense mutualisms consistently enhance plant diversification across lineages has not been well tested due to a lack of phenotypic and phylogenetic information. Using a global analysis, we show that the >100 vascular plant families in which species have evolved extrafloral nectaries (EFNs), sugar-secreting organs that recruit arthropod mutualists, have twofold higher diversification rates than families that lack species with EFNs. Zooming in on six distantly related plant clades, trait-dependent diversification models confirmed the tendency for lineages with EFNs to display increased rates of diversification. These results were consistent across methodological approaches. Inference using reversible-jump Markov chain Monte Carlo (MCMC) to model the placement and number of rate shifts revealed that high net diversification rates in EFN clades were driven by an increased number of positive rate shifts following EFN evolution compared with sister clades, suggesting that EFNs may be indirect facilitators of diversification. Our replicated analysis indicates that defense mutualisms put lineages on a path toward increased diversification rates within and between clades, and is concordant with the hypothesis that mutualistic interactions with animals can have an impact on deep macroevolutionary patterns and enhance plant diversity. PMID:25349406

  15. Defense mutualisms enhance plant diversification.

    PubMed

    Weber, Marjorie G; Agrawal, Anurag A

    2014-11-18

    The ability of plants to form mutualistic relationships with animal defenders has long been suspected to influence their evolutionary success, both by decreasing extinction risk and by increasing opportunity for speciation through an expanded realized niche. Nonetheless, the hypothesis that defense mutualisms consistently enhance plant diversification across lineages has not been well tested due to a lack of phenotypic and phylogenetic information. Using a global analysis, we show that the >100 vascular plant families in which species have evolved extrafloral nectaries (EFNs), sugar-secreting organs that recruit arthropod mutualists, have twofold higher diversification rates than families that lack species with EFNs. Zooming in on six distantly related plant clades, trait-dependent diversification models confirmed the tendency for lineages with EFNs to display increased rates of diversification. These results were consistent across methodological approaches. Inference using reversible-jump Markov chain Monte Carlo (MCMC) to model the placement and number of rate shifts revealed that high net diversification rates in EFN clades were driven by an increased number of positive rate shifts following EFN evolution compared with sister clades, suggesting that EFNs may be indirect facilitators of diversification. Our replicated analysis indicates that defense mutualisms put lineages on a path toward increased diversification rates within and between clades, and is concordant with the hypothesis that mutualistic interactions with animals can have an impact on deep macroevolutionary patterns and enhance plant diversity.

  16. An Application of Bayesian Approach in Modeling Risk of Death in an Intensive Care Unit

    PubMed Central

    Wong, Rowena Syn Yin; Ismail, Noor Azina

    2016-01-01

    Background and Objectives There are not many studies that attempt to model intensive care unit (ICU) risk of death in developing countries, especially in South East Asia. The aim of this study was to propose and describe the application of a Bayesian approach in modeling in-ICU deaths in a Malaysian ICU. Methods This was a prospective study in a mixed medical-surgical ICU in a multidisciplinary tertiary referral hospital in Malaysia. Data collection included variables that were defined in the Acute Physiology and Chronic Health Evaluation IV (APACHE IV) model. A Bayesian Markov chain Monte Carlo (MCMC) simulation approach was applied in the development of four multivariate logistic regression predictive models for the ICU, where the main outcome measure was in-ICU mortality risk. The performance of the models was assessed through overall model fit, discrimination and calibration measures. Results from the Bayesian models were also compared against results obtained using the frequentist maximum likelihood method. Results The study involved 1,286 consecutive ICU admissions between January 1, 2009 and June 30, 2010, of which 1,111 met the inclusion criteria. Patients who were admitted to the ICU were generally younger, predominantly male, with a low co-morbidity load and mostly under mechanical ventilation. The overall in-ICU mortality rate was 18.5% and the overall mean Acute Physiology Score (APS) was 68.5. All four models exhibited good discrimination, with area under the receiver operating characteristic curve (AUC) values of approximately 0.8. Calibration was acceptable (Hosmer-Lemeshow p-values > 0.05) for all models except model M3. Model M1 was identified as the model with the best overall performance in this study. Conclusion Four prediction models were proposed, and the best model was chosen based on its overall performance in this study. This study has also demonstrated the promising potential of the Bayesian MCMC approach as an alternative in the analysis and modeling of in-ICU mortality outcomes. PMID:27007413
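
    As an illustration of the kind of Bayesian logistic regression fit by MCMC that the abstract describes, here is a minimal random-walk Metropolis sketch on simulated data; the covariates, priors and tuning constants are assumptions for the example, not the study's APACHE IV model.

    ```python
    import numpy as np

    rng = np.random.default_rng(8)

    # Hypothetical ICU-style data: two standardized covariates and a binary outcome
    n = 800
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    true_beta = np.array([-1.5, 0.8, 0.5])
    y = rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_beta))

    def log_post(beta):
        # Bernoulli log-likelihood plus a weak N(0, 10^2) prior on each coefficient
        eta = X @ beta
        return np.sum(y * eta - np.logaddexp(0.0, eta)) - np.sum(beta ** 2) / (2 * 10.0 ** 2)

    beta = np.zeros(3)
    draws = []
    for _ in range(20000):
        proposal = beta + rng.normal(0.0, 0.05, size=3)   # random-walk Metropolis step
        if np.log(rng.random()) < log_post(proposal) - log_post(beta):
            beta = proposal
        draws.append(beta)

    draws = np.array(draws[5000:])                        # discard burn-in
    print("posterior means:", np.round(draws.mean(axis=0), 2))
    print("true coefficients:", true_beta)
    ```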

  17. Faster Mass Spectrometry-based Protein Inference: Junction Trees are More Efficient than Sampling and Marginalization by Enumeration

    PubMed Central

    Serang, Oliver; Noble, William Stafford

    2012-01-01

    The problem of identifying the proteins in a complex mixture using tandem mass spectrometry can be framed as an inference problem on a graph that connects peptides to proteins. Several existing protein identification methods make use of statistical inference methods for graphical models, including expectation maximization, Markov chain Monte Carlo, and full marginalization coupled with approximation heuristics. We show that, for this problem, the majority of the cost of inference usually comes from a few highly connected subgraphs. Furthermore, we evaluate three different statistical inference methods using a common graphical model, and we demonstrate that junction tree inference substantially improves rates of convergence compared to existing methods. The python code used for this paper is available at http://noble.gs.washington.edu/proj/fido. PMID:22331862

  18. Wisdom of crowds for robust gene network inference

    PubMed Central

    Marbach, Daniel; Costello, James C.; Küffner, Robert; Vega, Nicci; Prill, Robert J.; Camacho, Diogo M.; Allison, Kyle R.; Kellis, Manolis; Collins, James J.; Stolovitzky, Gustavo

    2012-01-01

    Reconstructing gene regulatory networks from high-throughput data is a long-standing problem. Through the DREAM project (Dialogue on Reverse Engineering Assessment and Methods), we performed a comprehensive blind assessment of over thirty network inference methods on Escherichia coli, Staphylococcus aureus, Saccharomyces cerevisiae, and in silico microarray data. We characterize performance, data requirements, and inherent biases of different inference approaches offering guidelines for both algorithm application and development. We observe that no single inference method performs optimally across all datasets. In contrast, integration of predictions from multiple inference methods shows robust and high performance across diverse datasets. Thereby, we construct high-confidence networks for E. coli and S. aureus, each comprising ~1700 transcriptional interactions at an estimated precision of 50%. We experimentally test 53 novel interactions in E. coli, of which 23 were supported (43%). Our results establish community-based methods as a powerful and robust tool for the inference of transcriptional gene regulatory networks. PMID:22796662

  19. Investigation of error sources in regional inverse estimates of greenhouse gas emissions in Canada

    NASA Astrophysics Data System (ADS)

    Chan, E.; Chan, D.; Ishizawa, M.; Vogel, F.; Brioude, J.; Delcloo, A.; Wu, Y.; Jin, B.

    2015-08-01

    Inversion models can use atmospheric concentration measurements to estimate surface fluxes. This study is an evaluation of the errors in a regional flux inversion model for different provinces of Canada: Alberta (AB), Saskatchewan (SK) and Ontario (ON). Using CarbonTracker model results as the target, the synthetic data experiment analyses examined the impacts of the errors from the Bayesian optimisation method, the prior flux distribution and the atmospheric transport model, as well as their interactions. The scaling factors for different sub-regions were estimated by the Markov chain Monte Carlo (MCMC) simulation and cost function minimization (CFM) methods. The CFM method results are sensitive to the relative size of the assumed model-observation mismatch and prior flux error variances. Experiment results show that the estimation error increases with the number of sub-regions using the CFM method. For the region definitions that lead to realistic flux estimates, the numbers of sub-regions for the western region of AB/SK combined and the eastern region of ON are 11 and 4 respectively. The corresponding annual flux estimation errors for the western and eastern regions using the MCMC (CFM) method are -7 and -3 % (0 and 8 %) respectively, when there is only prior flux error. The estimation errors increase to 36 and 94 % (40 and 232 %) when the transport model error acts alone. When prior and transport model errors co-exist in the inversions, the estimation errors become 5 and 85 % (29 and 201 %). This result indicates that the estimation errors are dominated by the transport model error, and that the errors can in fact cancel each other and propagate to the flux estimates non-linearly. In addition, it is possible for the posterior flux estimates to differ more from the target fluxes than the prior does, and the posterior uncertainty estimates can be unrealistically small, failing to cover the target. The systematic evaluation of the different components of the inversion model can help in the understanding of the posterior estimates and percentage errors. Stable and realistic sub-regional and monthly flux estimates can be obtained for the western region of AB/SK, but not for the eastern region of ON. This indicates that a real observation-based inversion for the annual provincial emissions is likely to work for the western region, whereas improvements to the current inversion setup are needed before a real inversion is performed for the eastern region.

  20. Inferring explicit weighted consensus networks to represent alternative evolutionary histories

    PubMed Central

    2013-01-01

    Background The advent of molecular biology techniques and constant increase in availability of genetic material have triggered the development of many phylogenetic tree inference methods. However, several reticulate evolution processes, such as horizontal gene transfer and hybridization, have been shown to blur the species evolutionary history by causing discordance among phylogenies inferred from different genes. Methods To tackle this problem, we hereby describe a new method for inferring and representing alternative (reticulate) evolutionary histories of species as an explicit weighted consensus network which can be constructed from a collection of gene trees with or without prior knowledge of the species phylogeny. Results We provide a way of building a weighted phylogenetic network for each of the following reticulation mechanisms: diploid hybridization, intragenic recombination and complete or partial horizontal gene transfer. We successfully tested our method on some synthetic and real datasets to infer the above-mentioned evolutionary events which may have influenced the evolution of many species. Conclusions Our weighted consensus network inference method allows one to infer, visualize and validate statistically major conflicting signals induced by the mechanisms of reticulate evolution. The results provided by the new method can be used to represent the inferred conflicting signals by means of explicit and easy-to-interpret phylogenetic networks. PMID:24359207

  1. Logistic random effects regression models: a comparison of statistical packages for binary and ordinal outcomes

    PubMed Central

    2011-01-01

    Background Logistic random effects models are a popular tool to analyze multilevel (also called hierarchical) data with a binary or ordinal outcome. Here, we aim to compare different statistical software implementations of these models. Methods We used individual patient data from 8509 patients in 231 centers with moderate and severe Traumatic Brain Injury (TBI) enrolled in eight Randomized Controlled Trials (RCTs) and three observational studies. We fitted logistic random effects regression models with the 5-point Glasgow Outcome Scale (GOS) as outcome, both dichotomized as well as ordinal, with center and/or trial as random effects, and with age, motor score, pupil reactivity or trial as covariates. We then compared the implementations of frequentist and Bayesian methods to estimate the fixed and random effects. Frequentist approaches included R (lme4), Stata (GLLAMM), SAS (GLIMMIX and NLMIXED), MLwiN ([R]IGLS) and MIXOR; Bayesian approaches included WinBUGS, MLwiN (MCMC), the R package MCMCglmm and the SAS experimental procedure MCMC. Three data sets (the full data set and two sub-datasets) were analysed using essentially two logistic random effects models, with either one random effect for the center or two random effects for center and trial. For the ordinal outcome in the full data set, a proportional odds model with a random center effect was also fitted. Results The packages gave similar parameter estimates for both the fixed and random effects and for the binary (and ordinal) models for the main study, and when based on a relatively large number of level-1 (patient level) data compared to the number of level-2 (hospital level) data. However, when based on a relatively sparse data set, i.e. when the numbers of level-1 and level-2 data units were about the same, the frequentist and Bayesian approaches showed somewhat different results. The software implementations differ considerably in flexibility, computation time, and usability. There are also differences in the availability of additional tools for model evaluation, such as diagnostic plots. The experimental SAS (version 9.2) procedure MCMC appeared to be inefficient. Conclusions On relatively large data sets, the different software implementations of logistic random effects regression models produced similar results. Thus, for a large data set there seems to be no explicit preference for either a frequentist or a Bayesian approach (if based on vague priors, and provided there is no preference from a philosophical point of view). The choice of a particular implementation may largely depend on the desired flexibility and the usability of the package. For small data sets the random effects variances are difficult to estimate. In the frequentist approaches the MLE of this variance was often estimated to be zero, with a standard error that was either zero or could not be determined, while for Bayesian methods the estimates could depend on the chosen "non-informative" prior of the variance parameter. The starting value for the variance parameter may also be critical for the convergence of the Markov chain. PMID:21605357

  2. Phylogenic inference using alignment-free methods for applications in microbial community surveys using 16s rRNA gene

    PubMed Central

    2017-01-01

    The diversity of microbiota is best explored by understanding the phylogenetic structure of the microbial communities. Traditionally, sequence alignment has been used for phylogenetic inference. However, alignment-based approaches come with significant challenges and limitations when massive amounts of data are analyzed. In the recent decade, alignment-free approaches have enabled genome-scale phylogenetic inference. Here we evaluate three alignment-free methods (ACS, CVTree, and Kr) for phylogenetic inference with 16S rRNA gene data. We use a taxonomic gold standard to compare the accuracy of alignment-free phylogenetic inference with that of common microbiome-wide phylogenetic inference pipelines based on PyNAST and MUSCLE alignments with FastTree and RAxML. We re-simulate fecal communities from Human Microbiome Project data to evaluate the performance of the methods on datasets with properties of real data. Our comparisons show that alignment-free methods are not inferior to alignment-based methods in giving accurate and robust phylogenetic trees. Moreover, consensus ensembles of alignment-free phylogenies are superior to those built from alignment-based methods in their ability to highlight community differences in low-power settings. In addition, the overall running times of alignment-based and alignment-free phylogenetic inference are comparable. Taken together, our empirical results suggest that alignment-free methods provide a viable approach for microbiome-wide phylogenetic inference. PMID:29136663

  3. Orbital fitting of imaged planetary companions with high eccentricities and unbound orbits. Their application to Fomalhaut b and PZ Telescopii B

    NASA Astrophysics Data System (ADS)

    Beust, H.; Bonnefoy, M.; Maire, A.-L.; Ehrenreich, D.; Lagrange, A.-M.; Chauvin, G.

    2016-03-01

    Context. Regular follow-up of imaged companions to main-sequence stars often allows a projected orbital motion to be detected. Markov chain Monte Carlo (MCMC) has become very popular in recent years for fitting and constraining their orbits. Some of these imaged companions appear to move on very eccentric, possibly unbound orbits. This is, in particular, the case for the exoplanet Fomalhaut b and the brown dwarf companion PZ Tel B, on which we focus here. Aims: For these orbits, standard MCMC codes that assume only bound orbits may be inappropriate. Our goal is to develop a new MCMC implementation that is able to handle both bound and unbound orbits in a continuous manner, and to apply this to the cases of Fomalhaut b and PZ Tel B. Methods: We present here this code, based on the use of universal Keplerian variables and Stumpff functions. We present two versions of this code, the second one using a different set of angular variables that were designed to avoid degeneracies arising when the projected orbital motion is quasi-radial, as is the case for PZ Tel B. We also present additional observations of PZ Tel B. Results: The code is applied to Fomalhaut b and PZ Tel B. We confirm previous results, but we show that on the sole basis of the astrometric data, open orbital solutions are also possible. The eccentricity distribution nevertheless still peaks around ~0.9 in the bound regime. We present a first successful orbital fit of PZ Tel B, which shows in particular that, while both bound and unbound orbital solutions are equally possible, the eccentricity distribution presents a sharp peak very close to e = 1, indicating a quasi-parabolic orbit. Conclusions: It has recently been suggested that the presence of unseen inner companions to imaged ones may lead orbital fitting algorithms to artificially give very high eccentricities. We show that this caveat is unlikely to apply to Fomalhaut b. Concerning PZ Tel B, we derive a possible solution involving an inner ~12 MJup companion that would mimic an e = 1 orbit despite a real eccentricity of around 0.7, but a dynamical analysis reveals that this type of system would not be stable. We thus conclude that our orbital fit is robust. Based on observations collected at the European Organisation for Astronomical Research in the Southern Hemisphere, Chile (Program ID: 085.C-0867(B) and 085.C-0277(B)).

  4. Stochastic static fault slip inversion from geodetic data with non-negativity and bounds constraints

    NASA Astrophysics Data System (ADS)

    Nocquet, J.-M.

    2018-04-01

    Although surface displacements observed by geodesy are linear combinations of slip at faults in an elastic medium, determining the spatial distribution of fault slip remains an ill-posed inverse problem. A widely used approach to circumvent the ill-posedness of the inversion is to add regularization constraints, in terms of smoothing and/or damping, so that the linear system becomes invertible. However, the choice of regularization parameters is often arbitrary, and sometimes leads to significantly different results. Furthermore, the resolution analysis is usually empirical and cannot be made independently of the regularization. The stochastic approach to inverse problems (Tarantola & Valette 1982; Tarantola 2005) provides a rigorous framework where the a priori information about the searched parameters is combined with the observations in order to derive posterior probabilities of the unknown parameters. Here, I investigate an approach where the prior probability density function (pdf) is a multivariate Gaussian function, with single truncation to impose positivity of slip, or double truncation to impose positivity and upper bounds on slip for interseismic modeling. I show that the joint posterior pdf is similar to the linear untruncated Gaussian case and can be expressed as a Truncated Multivariate Normal (TMVN) distribution. The TMVN form can then be used to obtain semi-analytical formulas for the single, two-dimensional or n-dimensional marginal pdfs. The semi-analytical formula involves the product of a Gaussian by an integral term that can be evaluated using recent developments in TMVN probability calculations (e.g. Genz & Bretz 2009). The posterior mean and covariance can also be efficiently derived. I show that the Maximum a Posteriori (MAP) estimate can be obtained using a Non-Negative Least-Squares algorithm (Lawson & Hanson 1974) for the single truncated case, or using the Bounded-Variable Least-Squares algorithm (Stark & Parker 1995) for the double truncated case. I show that the case of independent uniform priors can be approximated using the TMVN. The numerical equivalence to Bayesian inversions using Markov chain Monte Carlo (MCMC) sampling is shown for a synthetic example and for a real case of interseismic modeling in Central Peru. The TMVN method overcomes several limitations of the Bayesian approach using MCMC sampling. First, the need for computing power is greatly reduced. Second, unlike the Bayesian MCMC-based approach, marginal pdfs, means, variances or covariances are obtained independently of one another. Third, the probability and cumulative density functions can be obtained with any density of points. Finally, determining the Maximum a Posteriori (MAP) estimate is extremely fast.
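
    The claim that the MAP of the single-truncated posterior reduces to a non-negative least-squares problem can be illustrated by stacking the whitened data and prior terms into one augmented system and calling an NNLS solver; the Green's functions, noise levels and prior below are hypothetical.

    ```python
    import numpy as np
    from scipy.optimize import nnls

    rng = np.random.default_rng(5)

    # Hypothetical linear forward problem: d = G @ slip + noise, with slip >= 0.
    n_obs, n_patches = 30, 12
    G = rng.normal(size=(n_obs, n_patches))              # stand-in Green's functions
    true_slip = np.maximum(rng.normal(1.0, 1.0, n_patches), 0.0)
    sigma_d, sigma_m = 0.1, 1.0                          # data / prior standard deviations
    m0 = np.zeros(n_patches)                             # prior mean slip
    d = G @ true_slip + rng.normal(0.0, sigma_d, n_obs)

    # MAP of the positivity-truncated Gaussian posterior = non-negative least
    # squares on the data and prior terms stacked into one augmented system.
    A = np.vstack([G / sigma_d, np.eye(n_patches) / sigma_m])
    b = np.concatenate([d / sigma_d, m0 / sigma_m])
    map_slip, residual_norm = nnls(A, b)

    print("true slip:", np.round(true_slip, 2))
    print("MAP slip :", np.round(map_slip, 2))
    ```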

  5. Measurement and Structural Model Class Separation in Mixture CFA: ML/EM versus MCMC

    ERIC Educational Resources Information Center

    Depaoli, Sarah

    2012-01-01

    Parameter recovery was assessed within mixture confirmatory factor analysis across multiple estimator conditions under different simulated levels of mixture class separation. Mixture class separation was defined in the measurement model (through factor loadings) and the structural model (through factor variances). Maximum likelihood (ML) via the…

  6. GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods.

    PubMed

    Schaffter, Thomas; Marbach, Daniel; Floreano, Dario

    2011-08-15

    Over the last decade, numerous methods have been developed for inference of regulatory networks from gene expression data. However, accurate and systematic evaluation of these methods is hampered by the difficulty of constructing adequate benchmarks and the lack of tools for a differentiated analysis of network predictions on such benchmarks. Here, we describe a novel and comprehensive method for in silico benchmark generation and performance profiling of network inference methods, available to the community as open-source software called GeneNetWeaver (GNW). In addition to the generation of detailed dynamical models of gene regulatory networks to be used as benchmarks, GNW provides a network motif analysis that reveals systematic prediction errors, thereby indicating potential ways of improving inference methods. The accuracy of network inference methods is evaluated using standard metrics such as precision-recall and receiver operating characteristic curves. We show how GNW can be used to assess the performance and identify the strengths and weaknesses of six inference methods. Furthermore, we used GNW to provide the international Dialogue for Reverse Engineering Assessments and Methods (DREAM) competition with three network inference challenges (DREAM3, DREAM4 and DREAM5). GNW is available at http://gnw.sourceforge.net along with its Java source code, user manual and supporting data. Supplementary data are available at Bioinformatics online. Contact: dario.floreano@epfl.ch.

  7. Cluster mass inference via random field theory.

    PubMed

    Zhang, Hui; Nichols, Thomas E; Johnson, Timothy D

    2009-01-01

    Cluster extent and voxel intensity are two widely used statistics in neuroimaging inference. Cluster extent is sensitive to spatially extended signals while voxel intensity is better for intense but focal signals. In order to leverage strength from both statistics, several nonparametric permutation methods have been proposed to combine the two methods. Simulation studies have shown that of the different cluster permutation methods, the cluster mass statistic is generally the best. However, to date, there is no parametric cluster mass inference available. In this paper, we propose a cluster mass inference method based on random field theory (RFT). We develop this method for Gaussian images, evaluate it on Gaussian and Gaussianized t-statistic images and investigate its statistical properties via simulation studies and real data. Simulation results show that the method is valid under the null hypothesis and demonstrate that it can be more powerful than the cluster extent inference method. Further, analyses with a single subject and a group fMRI dataset demonstrate better power than traditional cluster size inference, and good accuracy relative to a gold-standard permutation test.

  8. Logistic random effects regression models: a comparison of statistical packages for binary and ordinal outcomes.

    PubMed

    Li, Baoyue; Lingsma, Hester F; Steyerberg, Ewout W; Lesaffre, Emmanuel

    2011-05-23

    Logistic random effects models are a popular tool to analyze multilevel (also called hierarchical) data with a binary or ordinal outcome. Here, we aim to compare different statistical software implementations of these models. We used individual patient data from 8509 patients in 231 centers with moderate and severe Traumatic Brain Injury (TBI) enrolled in eight Randomized Controlled Trials (RCTs) and three observational studies. We fitted logistic random effects regression models with the 5-point Glasgow Outcome Scale (GOS) as outcome, both dichotomized as well as ordinal, with center and/or trial as random effects, and with age, motor score, pupil reactivity or trial as covariates. We then compared the implementations of frequentist and Bayesian methods to estimate the fixed and random effects. Frequentist approaches included R (lme4), Stata (GLLAMM), SAS (GLIMMIX and NLMIXED), MLwiN ([R]IGLS) and MIXOR; Bayesian approaches included WinBUGS, MLwiN (MCMC), the R package MCMCglmm and the SAS experimental procedure MCMC. Three data sets (the full data set and two sub-datasets) were analysed using essentially two logistic random effects models, with either one random effect for the center or two random effects for center and trial. For the ordinal outcome in the full data set, a proportional odds model with a random center effect was also fitted. The packages gave similar parameter estimates for both the fixed and random effects and for the binary (and ordinal) models for the main study, and when based on a relatively large number of level-1 (patient level) data compared to the number of level-2 (hospital level) data. However, when based on a relatively sparse data set, i.e. when the numbers of level-1 and level-2 data units were about the same, the frequentist and Bayesian approaches showed somewhat different results. The software implementations differ considerably in flexibility, computation time, and usability. There are also differences in the availability of additional tools for model evaluation, such as diagnostic plots. The experimental SAS (version 9.2) procedure MCMC appeared to be inefficient. On relatively large data sets, the different software implementations of logistic random effects regression models produced similar results. Thus, for a large data set there seems to be no explicit preference for either a frequentist or a Bayesian approach (if based on vague priors, and provided there is no preference from a philosophical point of view). The choice of a particular implementation may largely depend on the desired flexibility and the usability of the package. For small data sets the random effects variances are difficult to estimate. In the frequentist approaches the MLE of this variance was often estimated to be zero, with a standard error that was either zero or could not be determined, while for Bayesian methods the estimates could depend on the chosen "non-informative" prior of the variance parameter. The starting value for the variance parameter may also be critical for the convergence of the Markov chain.

  9. Study of optical and electronic properties of nickel from reflection electron energy loss spectra

    NASA Astrophysics Data System (ADS)

    Xu, H.; Yang, L. H.; Da, B.; Tóth, J.; Tőkési, K.; Ding, Z. J.

    2017-09-01

    We use the classical Monte Carlo transport model of electrons moving near the surface and inside solids to reproduce measured reflection electron energy-loss spectroscopy (REELS) spectra. By combining the classical transport model with Markov chain Monte Carlo (MCMC) sampling of oscillator parameters, the so-called reverse Monte Carlo (RMC) method was developed and is used here to obtain the optical constants of Ni. A systematic study of the electronic and optical properties of Ni has been performed over an energy loss range of 0-200 eV from REELS spectra measured at primary energies of 1000 eV, 2000 eV and 3000 eV. The reliability of our method was tested by comparing our results with previous data. Moreover, the accuracy of our optical data has been confirmed by applying the oscillator strength sum rule and the perfect-screening sum rule.

  10. Segregation and recombination of a multipartite mitochondrial DNA in populations of the potato cyst nematode Globodera pallida.

    PubMed

    Armstrong, Miles R; Husmeier, Dirk; Phillips, Mark S; Blok, Vivian C

    2007-06-01

    The discovery that the potato cyst nematode Globodera pallida has a multipartite mitochondrial DNA (mtDNA) composed, at least in part, of six small circular mtDNAs (scmtDNAs) raised a number of questions concerning the population-level processes that might act on such a complex genome. Here we report our observations on the distribution of some scmtDNAs among a sample of European and South American G. pallida populations. The occurrence of sequence variants of scmtDNA IV in population P4A from South America, and that particular sequence variants are common to the individuals within a single cyst, is described. Evidence for recombination of sequence variants of scmtDNA IV in P4A is also reported. The mosaic structure of P4A scmtDNA IV sequences was revealed using several detection methods and recombination breakpoints were independently detected by maximum likelihood and Bayesian MCMC methods.

  11. Anchoring quartet-based phylogenetic distances and applications to species tree reconstruction.

    PubMed

    Sayyari, Erfan; Mirarab, Siavash

    2016-11-11

    Inferring species trees from gene trees using the coalescent-based summary methods has been the subject of much attention, yet new scalable and accurate methods are needed. We introduce DISTIQUE, a new statistically consistent summary method for inferring species trees from gene trees under the coalescent model. We generalize our results to arbitrary phylogenetic inference problems; we show that two arbitrarily chosen leaves, called anchors, can be used to estimate relative distances between all other pairs of leaves by inferring relevant quartet trees. This results in a family of distance-based tree inference methods, with running times ranging between quadratic to quartic in the number of leaves. We show in simulated studies that DISTIQUE has comparable accuracy to leading coalescent-based summary methods and reduced running times.

  12. SPOTting model parameters using a ready-made Python package

    NASA Astrophysics Data System (ADS)

    Houska, Tobias; Kraft, Philipp; Breuer, Lutz

    2015-04-01

    The selection and parameterization of reliable process descriptions in ecological modelling is driven by several uncertainties. The procedure is highly dependent on various criteria, such as the algorithm used, the likelihood function selected and the definition of the prior parameter distributions. A wide variety of tools have been developed in the past decades to optimize parameters. Some of these tools are closed source. Because of this, the choice of a specific parameter estimation method is sometimes more dependent on its availability than on its performance. A toolbox with a large set of methods can support users in deciding about the most suitable method. Further, it enables different methods to be tested and compared. We developed SPOT (Statistical Parameter Optimization Tool), an open source Python package containing a comprehensive set of modules to analyze and optimize parameters of (environmental) models. SPOT comes with a selected set of algorithms for parameter optimization and uncertainty analyses (Monte Carlo, MC; Latin Hypercube Sampling, LHS; Maximum Likelihood Estimation, MLE; Markov Chain Monte Carlo, MCMC; Shuffled Complex Evolution, SCE-UA; Differential Evolution Markov Chain, DE-MCZ), together with several likelihood functions (Bias, (log-) Nash-Sutcliffe model efficiency, Correlation Coefficient, Coefficient of Determination, Covariance, (Decomposed-, Relative-, Root-) Mean Squared Error, Mean Absolute Error, Agreement Index) and prior distributions (Binomial, Chi-Square, Dirichlet, Exponential, Laplace, (log-, multivariate-) Normal, Pareto, Poisson, Cauchy, Uniform, Weibull) to sample from. The model-independent structure makes it suitable for analyzing a wide range of applications. We apply all algorithms of the SPOT package in three different case studies. Firstly, we investigate the response of the Rosenbrock function, where the MLE algorithm shows its strengths. Secondly, we study the Griewank function, which has a challenging response surface for optimization methods. Here we see simple algorithms like the MCMC struggling to find the global optimum of the function, while algorithms like SCE-UA and DE-MCZ show their strengths. Thirdly, we apply an uncertainty analysis to a one-dimensional physically based hydrological model built with the Catchment Modelling Framework (CMF). The model is driven by meteorological and groundwater data from a Free Air Carbon Enrichment (FACE) experiment in Linden (Hesse, Germany). Simulation results are evaluated with measured soil moisture data. We search for optimal parameter sets of the van Genuchten-Mualem function and find different, equally optimal solutions with some of the algorithms. The case studies reveal that the implemented SPOT methods work sufficiently well. They further show the benefit of having one platform-independent package at hand that includes a number of parameter search methods, likelihood functions and a priori parameter distributions.
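
    The simplest workflow the abstract describes, drawing parameter sets from prior distributions and evaluating an objective such as the Rosenbrock function, can be sketched without the package itself (the SPOT API is not reproduced here):

    ```python
    import numpy as np

    rng = np.random.default_rng(6)

    def rosenbrock(x, y):
        # Classic two-parameter test response surface with its optimum at (1, 1)
        return (1.0 - x) ** 2 + 100.0 * (y - x ** 2) ** 2

    # Plain Monte Carlo parameter search: draw parameter sets from uniform priors,
    # evaluate the objective, and keep the best draw.
    xs = rng.uniform(-2.0, 2.0, 100000)
    ys = rng.uniform(-1.0, 3.0, 100000)
    obj = rosenbrock(xs, ys)
    best = np.argmin(obj)

    print("best MC draw:", (round(xs[best], 3), round(ys[best], 3)),
          "objective:", round(obj[best], 5))
    ```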

  13. Description of HIV-1 Group M Molecular Epidemiology and Drug Resistance Prevalence in Equatorial Guinea from Migrants in Spain

    PubMed Central

    Yebra, Gonzalo; de Mulder, Miguel; Holguín, África

    2013-01-01

    Background The HIV epidemic is increasing in Equatorial Guinea (GQ), West Central Africa, but few studies have reported its HIV molecular epidemiology. We aimed to describe the HIV-1 group M (HIV-1M) variants and drug-resistance mutations in GQ using sequences sampled in this country and in Spain, a frequent destination of Equatoguinean migrants. Methods We collected 195 HIV-1M pol sequences from Equatoguinean subjects attending Spanish clinics during 1997-2011, and 83 additional sequences sampled in GQ in 1997 and 2008 from GenBank. All (n = 278) were re-classified using phylogeny and tested for drug-resistance mutations. To evaluate the origin of CRF02_AG in GQ, we analyzed 2,562 CRF02_AG sequences and applied Bayesian MCMC inference (BEAST program). Results Most Equatoguinean patients recruited in Spain were women (61.1%) or heterosexuals (87.7%). In the 278 sequences, the variants found were CRF02_AG (47.8%), A (13.7%), B (7.2%), C (5.8%), G (5.4%) and others (20.1%). We found 6 CRF02_AG clusters emerged from 1983.9 to 2002.5 with origin in GQ (5.5 sequences/cluster). Transmitted drug-resistance (TDR) rate among naïve patients attended in Spain (n = 144) was 4.7%: 3.4% for PI (all with M46IL), 1.8% for NRTI (all with M184V) and 0.9% for NNRTI (Y188L). Among pre-treated patients, 9/31 (29%) presented any resistance, mainly affecting NNRTI (27.8%). Conclusions We report a low (<5%) TDR rate among naïve, with PI as the most affected class. Pre-treated patients also showed a low drug-resistance prevalence (29%) maybe related to the insufficient treatment coverage in GQ. CRF02_AG was the prevalent HIV-1M variant and entered GQ through independent introductions at least since the early 1980s. PMID:23717585

  14. Estimating safety effects of pavement management factors utilizing Bayesian random effect models.

    PubMed

    Jiang, Ximiao; Huang, Baoshan; Zaretzki, Russell L; Richards, Stephen; Yan, Xuedong

    2013-01-01

    Previous studies of pavement management factors that relate to the occurrence of traffic-related crashes are rare. Traditional research has mostly employed summary statistics of bidirectional pavement quality measurements in extended longitudinal road segments over a long time period, which may cause a loss of important information and result in biased parameter estimates. The research presented in this article focuses on crash risk of roadways with overall fair to good pavement quality. Real-time and location-specific data were employed to estimate the effects of pavement management factors on the occurrence of crashes. This research is based on the crash data and corresponding pavement quality data for the Tennessee state route highways from 2004 to 2009. The potential temporal and spatial correlations among observations caused by unobserved factors were considered. Overall 6 models were built accounting for no correlation, temporal correlation only, and both the temporal and spatial correlations. These models included Poisson, negative binomial (NB), one random effect Poisson and negative binomial (OREP, ORENB), and two random effect Poisson and negative binomial (TREP, TRENB) models. The Bayesian method was employed to construct these models. The inference is based on the posterior distribution from the Markov chain Monte Carlo (MCMC) simulation. These models were compared using the deviance information criterion. Analysis of the posterior distribution of parameter coefficients indicates that the pavement management factors indexed by Present Serviceability Index (PSI) and Pavement Distress Index (PDI) had significant impacts on the occurrence of crashes, whereas the variable rutting depth was not significant. Among other factors, lane width, median width, type of terrain, and posted speed limit were significant in affecting crash frequency. The findings of this study indicate that a reduction in pavement roughness would reduce the likelihood of traffic-related crashes. Hence, maintaining a low level of pavement roughness is strongly suggested. In addition, the results suggested that the temporal correlation among observations was significant and that the ORENB model outperformed all other models.

  15. Benchmarking Relatedness Inference Methods with Genome-Wide Data from Thousands of Relatives.

    PubMed

    Ramstetter, Monica D; Dyer, Thomas D; Lehman, Donna M; Curran, Joanne E; Duggirala, Ravindranath; Blangero, John; Mezey, Jason G; Williams, Amy L

    2017-09-01

    Inferring relatedness from genomic data is an essential component of genetic association studies, population genetics, forensics, and genealogy. While numerous methods exist for inferring relatedness, thorough evaluation of these approaches in real data has been lacking. Here, we report an assessment of 12 state-of-the-art pairwise relatedness inference methods using a data set with 2485 individuals contained in several large pedigrees that span up to six generations. We find that all methods have high accuracy (92-99%) when detecting first- and second-degree relationships, but their accuracy dwindles to <43% for seventh-degree relationships. However, most identical by descent (IBD) segment-based methods inferred seventh-degree relatives correct to within one relatedness degree for >76% of relative pairs. Overall, the most accurate methods are Estimation of Recent Shared Ancestry (ERSA) and approaches that compute total IBD sharing using the output from GERMLINE and Refined IBD to infer relatedness. Combining information from the most accurate methods provides little accuracy improvement, indicating that novel approaches, such as new methods that leverage relatedness signals from multiple samples, are needed to achieve a sizeable jump in performance. Copyright © 2017 Ramstetter et al.

  16. BagReg: Protein inference through machine learning.

    PubMed

    Zhao, Can; Liu, Dao; Teng, Ben; He, Zengyou

    2015-08-01

    Protein inference from the identified peptides is of primary importance in shotgun proteomics. The target of protein inference is to identify whether each candidate protein is truly present in the sample. To date, many computational methods have been proposed to solve this problem. However, there is still no method that can fully utilize the information hidden in the input data. In this article, we propose a learning-based method named BagReg for protein inference. The method first extracts five features from the input data, and then chooses each feature in turn as the class feature to separately build models that predict the presence probabilities of proteins. Finally, the weak results from the five prediction models are aggregated to obtain the final result. We test our method on six publicly available data sets. The experimental results show that our method is superior to the state-of-the-art protein inference algorithms. Copyright © 2015 Elsevier Ltd. All rights reserved.
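
    The abstract describes the core loop only at a high level. The following is a minimal, hedged sketch of that idea in Python, not the authors' implementation: each of five hypothetical protein-level features is used in turn as the prediction target, a regressor fitted on the remaining features supplies a weak presence score, and the five scores are averaged. The synthetic feature matrix and the choice of GradientBoostingRegressor are illustrative assumptions.

      # BagReg-style ensemble sketch: each feature serves once as the "class" feature,
      # a regressor trained on the other features produces a weak score, and the
      # weak scores are averaged into a single per-protein presence score.
      import numpy as np
      from sklearn.ensemble import GradientBoostingRegressor

      def bagreg_scores(X):
          """X: (n_proteins, 5) array of protein-level features (placeholders here)."""
          n, p = X.shape
          scores = np.zeros((n, p))
          for j in range(p):
              y = X[:, j]                              # current "class" feature
              X_rest = np.delete(X, j, axis=1)         # remaining features as predictors
              model = GradientBoostingRegressor().fit(X_rest, y)
              pred = model.predict(X_rest)
              rng_ = pred.max() - pred.min()           # rescale each weak prediction to [0, 1]
              scores[:, j] = (pred - pred.min()) / rng_ if rng_ > 0 else 0.5
          return scores.mean(axis=1)                   # simple average of the weak models

      # Example with synthetic data: 100 candidate proteins, 5 features each
      rng = np.random.default_rng(0)
      X = rng.gamma(shape=2.0, scale=1.0, size=(100, 5))
      print(bagreg_scores(X)[:5])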

  17. Posterior uncertainty of GEOS-5 L-band radiative transfer model parameters and brightness temperatures after calibration with SMOS observations

    NASA Astrophysics Data System (ADS)

    De Lannoy, G. J.; Reichle, R. H.; Vrugt, J. A.

    2012-12-01

    Simulated L-band (1.4 GHz) brightness temperatures are very sensitive to the values of the parameters in the radiative transfer model (RTM). We assess the optimum RTM parameter values and their (posterior) uncertainty in the Goddard Earth Observing System (GEOS-5) land surface model using observations of multi-angular brightness temperature over North America from the Soil Moisture Ocean Salinity (SMOS) mission. Two different parameter estimation methods are being compared: (i) a particle swarm optimization (PSO) approach, and (ii) an MCMC simulation procedure using the differential evolution adaptive Metropolis (DREAM) algorithm. Our results demonstrate that both methods provide similar "optimal" parameter values. Yet, DREAM exhibits better convergence properties, resulting in a reduced spread of the posterior ensemble. The posterior parameter distributions derived with both methods are used for predictive uncertainty estimation of brightness temperature. This presentation will highlight our model-data synthesis framework and summarize our initial findings.

  18. Application of Bayesian Inversion for Multilayer Reservoir Mapping while Drilling Measurements

    NASA Astrophysics Data System (ADS)

    Wang, J.; Chen, H.; Wang, X.

    2017-12-01

    Real-time geosteering technology plays a key role in horizontal well development, keeping wellbore trajectories within target zones to maximize reservoir contact. The new generation of logging-while-drilling (LWD) resistivity tools has longer spacing and deeper depth of investigation, but this brings a new challenge to the inversion of logging data: the formation model can no longer be restricted to a few layers, such as the typical three-layer model. If inappropriate starting models are adopted, deterministic and gradient-based methods may mislead geophysicists in the interpretation of subsurface structure. For this purpose, to take advantage of the richness of the measurements and the deep depth of investigation across multiple formation boundaries, a trans-dimensional Markov chain Monte Carlo (MCMC) inversion algorithm has been developed that combines phase and attenuation measurements at various frequencies and spacings. Unlike conventional gradient-based inversion approaches, the MCMC algorithm does not introduce bias from prior information, nor does it require any subjective choice of regularization parameter. A synthetic three-layer model example demonstrates how the algorithm can be used to image the subsurface using the LWD data. When the tool is far from the top boundary, the inversion clearly resolves the boundary position; that is where the boundary histogram shows a large peak. But the measurements cannot resolve the bottom boundary; the large spread between quantiles reflects the uncertainty associated with the bed resolution. As the tool moves closer to the top boundary, the middle and bottom layers are resolved, the retained models become more similar, and the uncertainty associated with these two beds decreases. From the spread observed between models, we can evaluate the actual depth of investigation, uncertainty, and sensitivity, which is more useful than just a single best model.

  19. Potential uncertainty reduction in model-averaged benchmark dose estimates informed by an additional dose study.

    PubMed

    Shao, Kan; Small, Mitchell J

    2011-10-01

    A methodology is presented for assessing the information value of an additional dosage experiment in existing bioassay studies. The analysis demonstrates the potential reduction in the uncertainty of toxicity metrics derived from expanded studies, providing insights for future studies. Bayesian methods are used to fit alternative dose-response models using Markov chain Monte Carlo (MCMC) simulation for parameter estimation and Bayesian model averaging (BMA) is used to compare and combine the alternative models. BMA predictions for benchmark dose (BMD) are developed, with uncertainty in these predictions used to derive the lower bound BMDL. The MCMC and BMA results provide a basis for a subsequent Monte Carlo analysis that backcasts the dosage where an additional test group would have been most beneficial in reducing the uncertainty in the BMD prediction, along with the magnitude of the expected uncertainty reduction. Uncertainty reductions are measured in terms of reduced interval widths of predicted BMD values and increases in BMDL values that occur as a result of this reduced uncertainty. The methodology is illustrated using two existing data sets for TCDD carcinogenicity, fitted with two alternative dose-response models (logistic and quantal-linear). The example shows that an additional dose at a relatively high value would have been most effective for reducing the uncertainty in BMA BMD estimates, with predicted reductions in the widths of uncertainty intervals of approximately 30%, and expected increases in BMDL values of 5-10%. The results demonstrate that dose selection for studies that subsequently inform dose-response models can benefit from consideration of how these models will be fit, combined, and interpreted. © 2011 Society for Risk Analysis.
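
    To make the model-averaging step concrete, here is a small, hedged sketch of how BMA predictions for the BMD can be assembled once per-model MCMC output is available: posterior draws of the BMD from each dose-response model are pooled in proportion to the posterior model weights, and the BMDL is read off as a lower percentile of the pooled sample. All numbers below are synthetic placeholders, not values from the study.

      # Bayesian model averaging of benchmark-dose posteriors (toy example).
      import numpy as np

      rng = np.random.default_rng(1)
      bmd_logistic = rng.lognormal(mean=1.0, sigma=0.3, size=5000)   # BMD draws, model 1
      bmd_quantal  = rng.lognormal(mean=0.8, sigma=0.4, size=5000)   # BMD draws, model 2
      weights = np.array([0.6, 0.4])                                 # posterior model probabilities

      # resample each model's draws in proportion to its weight, then pool
      n_total = 10000
      counts = rng.multinomial(n_total, weights)
      pooled = np.concatenate([
          rng.choice(bmd_logistic, size=counts[0], replace=True),
          rng.choice(bmd_quantal,  size=counts[1], replace=True),
      ])

      bmd_ma  = np.median(pooled)            # model-averaged central BMD estimate
      bmdl_ma = np.percentile(pooled, 5)     # lower bound (5th percentile) -> BMDL
      print(f"BMA BMD ~ {bmd_ma:.2f}, BMDL ~ {bmdl_ma:.2f}")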

  20. Accounting for model error in Bayesian solutions to hydrogeophysical inverse problems using a local basis approach

    NASA Astrophysics Data System (ADS)

    Köpke, Corinna; Irving, James; Elsheikh, Ahmed H.

    2018-06-01

    Bayesian solutions to geophysical and hydrological inverse problems are dependent upon a forward model linking subsurface physical properties to measured data, which is typically assumed to be perfectly known in the inversion procedure. However, to make the stochastic solution of the inverse problem computationally tractable using methods such as Markov-chain-Monte-Carlo (MCMC), fast approximations of the forward model are commonly employed. This gives rise to model error, which has the potential to significantly bias posterior statistics if not properly accounted for. Here, we present a new methodology for dealing with the model error arising from the use of approximate forward solvers in Bayesian solutions to hydrogeophysical inverse problems. Our approach is geared towards the common case where this error cannot be (i) effectively characterized through some parametric statistical distribution; or (ii) estimated by interpolating between a small number of computed model-error realizations. To this end, we focus on identification and removal of the model-error component of the residual during MCMC using a projection-based approach, whereby the orthogonal basis employed for the projection is derived in each iteration from the K-nearest-neighboring entries in a model-error dictionary. The latter is constructed during the inversion and grows at a specified rate as the iterations proceed. We demonstrate the performance of our technique on the inversion of synthetic crosshole ground-penetrating radar travel-time data considering three different subsurface parameterizations of varying complexity. Synthetic data are generated using the eikonal equation, whereas a straight-ray forward model is assumed for their inversion. In each case, our developed approach enables us to remove posterior bias and obtain a more realistic characterization of uncertainty.
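
    The projection step can be sketched in a few lines of Python (a hedged illustration of the general idea, not the authors' code): the K dictionary entries nearest to the current model are selected, an orthonormal basis for their error realizations is built with an SVD, and the component of the residual lying in that subspace is removed before the likelihood is evaluated. All array names are assumptions for the example.

      # Remove the model-error component of a residual using a basis built from
      # the K nearest entries of a model-error dictionary.
      import numpy as np

      def corrected_residual(residual, theta, dict_params, dict_errors, k=10):
          """residual: (n_data,) raw residual d_obs - g_approx(theta)
          theta: (n_par,) current model parameters
          dict_params: (n_dict, n_par) parameter sets stored in the dictionary
          dict_errors: (n_dict, n_data) corresponding model-error realizations."""
          # K nearest dictionary entries in parameter space (Euclidean distance)
          dist = np.linalg.norm(dict_params - theta, axis=1)
          idx = np.argsort(dist)[:k]
          # orthonormal basis spanning the neighbouring model-error realizations
          U, _, _ = np.linalg.svd(dict_errors[idx].T, full_matrices=False)
          # subtract the projection of the residual onto that basis
          return residual - U @ (U.T @ residual)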

  1. Impact of Federal drug law enforcement on the supply of heroin in Australia.

    PubMed

    Smithson, Michael; McFadden, Michael; Mwesigye, Sue-Ellen

    2005-08-01

    To conduct an empirical investigation of the efficacy of law enforcement in reducing heroin supply in Australia. Specifically, this paper addresses the question of whether heroin purity levels in the Australian Capital Territory (ACT) could be predicted by heroin seizures at the national level by the Australian Federal Police (AFP) in the preceding year. We considered two forms of evidence. First, a Bayesian Markov Chain Monte Carlo (MCMC) change-point model was used to discover (a) if there was a substantial increase in heroin seizures by the AFP, (b) when the increase began and (c) whether it occurred after increased funding to the Australian Federal Police for the purpose of drug law enforcement. Second, standard time-series methods were used to ascertain whether fluctuations in heroin seizure weights or the frequency of large-scale seizures after the aforementioned changes in seizure levels predicted fluctuations in heroin purity levels in the ACT after autocorrelation had been removed from the purity series. A Bayesian MCMC change-point model supported the hypothesis that heroin seizures rapidly increased about a year before the estimated decline in heroin purity and after the increased funding of the AFP. The autoregression models suggested that 10-20% of the variance in the residuals of the heroin purity series was predicted by appropriately lagged residuals of the seizure-number and log-weight series, after autocorrelation had been removed. The overall results are consistent with the hypothesis that large-scale heroin seizures by the AFP reduce street-level heroin supply a year or so later, although the short-term dynamics suggest an 'opponent' response to residual fluctuations in seizures. To our knowledge, this is the first time a connection has been identified between large-scale heroin seizures and street-level supply.
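
    For readers unfamiliar with Bayesian change-point models, the following is a generic, hedged sketch of the simplest version of the idea: a single change point in a Poisson-rate count series with conjugate Gamma priors, sampled by Gibbs. The model actually used in the paper may differ in its details; the data below are synthetic.

      # Gibbs sampler for one change point in a Poisson count series.
      import numpy as np

      def changepoint_gibbs(y, n_iter=5000, a=2.0, b=1.0, seed=0):
          rng = np.random.default_rng(seed)
          n = len(y)
          tau = n // 2
          draws = np.empty((n_iter, 3))
          for it in range(n_iter):
              # conditional posteriors of the two Poisson rates are Gamma
              lam1 = rng.gamma(a + y[:tau].sum(), 1.0 / (b + tau))
              lam2 = rng.gamma(a + y[tau:].sum(), 1.0 / (b + n - tau))
              # conditional posterior of the change point (discrete, sampled directly)
              logp = np.array([y[:t].sum() * np.log(lam1) - t * lam1 +
                               y[t:].sum() * np.log(lam2) - (n - t) * lam2
                               for t in range(1, n)])
              p = np.exp(logp - logp.max())
              tau = rng.choice(np.arange(1, n), p=p / p.sum())
              draws[it] = (lam1, lam2, tau)
          return draws

      # synthetic series with a jump in rate at t = 60
      rng = np.random.default_rng(42)
      y = np.concatenate([rng.poisson(3.0, 60), rng.poisson(9.0, 40)])
      print(changepoint_gibbs(y)[2500:].mean(axis=0))   # posterior means after burn-in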

  2. Fast-NPS-A Markov Chain Monte Carlo-based analysis tool to obtain structural information from single-molecule FRET measurements

    NASA Astrophysics Data System (ADS)

    Eilert, Tobias; Beckers, Maximilian; Drechsler, Florian; Michaelis, Jens

    2017-10-01

    The analysis tool and software package Fast-NPS can be used to analyse smFRET data to obtain quantitative structural information about macromolecules in their natural environment. In the algorithm a Bayesian model gives rise to a multivariate probability distribution describing the uncertainty of the structure determination. Since Fast-NPS aims to be an easy-to-use general-purpose analysis tool for a large variety of smFRET networks, we established an MCMC based sampling engine that approximates the target distribution and requires no parameter specification by the user at all. For an efficient local exploration we automatically adapt the multivariate proposal kernel according to the shape of the target distribution. In order to handle multimodality, the sampler is equipped with a parallel tempering scheme that is fully adaptive with respect to temperature spacing and number of chains. Since the molecular surrounding of a dye molecule affects its spatial mobility and thus the smFRET efficiency, we introduce dye models which can be selected for every dye molecule individually. These models allow the user to represent the smFRET network in great detail leading to an increased localisation precision. Finally, a tool to validate the chosen model combination is provided. Programme Files doi:http://dx.doi.org/10.17632/7ztzj63r68.1 Licencing provisions: Apache-2.0 Programming language: GUI in MATLAB (The MathWorks) and the core sampling engine in C++ Nature of problem: Sampling of highly diverse multivariate probability distributions in order to solve for macromolecular structures from smFRET data. Solution method: MCMC algorithm with fully adaptive proposal kernel and parallel tempering scheme.
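
    The adaptive-kernel idea mentioned above can be illustrated with a generic, hedged sketch of Haario-style adaptive Metropolis, in which the Gaussian proposal covariance is periodically re-estimated from the chain history so that it tracks the shape of the target. This is offered only as an illustration of proposal adaptation, not as the Fast-NPS sampling engine itself.

      # Adaptive Metropolis: the proposal covariance follows the sampled target shape.
      import numpy as np

      def adaptive_metropolis(log_target, x0, n_iter=20000, adapt_every=200, seed=0):
          rng = np.random.default_rng(seed)
          d = len(x0)
          cov = np.eye(d) * 0.1
          chain = np.empty((n_iter, d))
          x, lp = np.asarray(x0, float), log_target(x0)
          for i in range(n_iter):
              prop = rng.multivariate_normal(x, cov)
              lp_prop = log_target(prop)
              if np.log(rng.random()) < lp_prop - lp:        # Metropolis accept/reject
                  x, lp = prop, lp_prop
              chain[i] = x
              if i >= 1000 and i % adapt_every == 0:          # adapt after a warm-up
                  cov = (2.38**2 / d) * np.cov(chain[:i].T) + 1e-6 * np.eye(d)
          return chain

      # toy target: correlated 2-D Gaussian
      Sigma = np.array([[1.0, 0.9], [0.9, 1.0]])
      Sinv = np.linalg.inv(Sigma)
      log_target = lambda x: -0.5 * x @ Sinv @ x
      samples = adaptive_metropolis(log_target, np.zeros(2))
      print(np.cov(samples[5000:].T))     # should approach Sigma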

  3. WMAP7 constraints on oscillations in the primordial power spectrum

    NASA Astrophysics Data System (ADS)

    Meerburg, P. Daniel; Wijers, Ralph A. M. J.; van der Schaar, Jan Pieter

    2012-03-01

    We use the 7-year Wilkinson Microwave Anisotropy Probe (WMAP7) data to place constraints on oscillations supplementing an almost scale-invariant primordial power spectrum. Such oscillations are predicted by a variety of models, some of which amount to assuming that there is some non-trivial choice of the vacuum state at the onset of inflation. In this paper, we will explore data-driven constraints on two distinct models of initial state modifications. In both models, the frequency, phase and amplitude are degrees of freedom of the theory for which the theoretical bounds are rather weak: both the amplitude and frequency have allowed values ranging over several orders of magnitude. This requires many computationally expensive evaluations of the model cosmic microwave background (CMB) spectra and their goodness of fit, even in a Markov chain Monte Carlo (MCMC), normally the most efficient fitting method for such a problem. To search more efficiently, we first run a densely-spaced grid, with only three varying parameters: the frequency, the amplitude and the baryon density. We obtain the optimal frequency and run an MCMC at the best-fitting frequency, randomly varying all other relevant parameters. To reduce the computational time of each power spectrum computation, we adjust both comoving momentum integration and spline interpolation (in l) as a function of frequency and amplitude of the primordial power spectrum. Applying this to the WMAP7 data allows us to improve existing constraints on the presence of oscillations. We confirm earlier findings that certain frequencies can improve the fitting over a model without oscillations. For those frequencies we compute the posterior probability, allowing us to put some constraints on the primordial parameter space of both models.

  4. Development of engine activity cycles for the prime movers of unconventional natural gas well development.

    PubMed

    Johnson, Derek; Heltzel, Robert; Nix, Andrew; Barrow, Rebekah

    2017-03-01

    With the advent of unconventional natural gas resources, new research focuses on the efficiency and emissions of the prime movers powering these fleets. These prime movers also play important roles in emissions inventories for this sector. Industry seeks to reduce operating costs by decreasing the fuel demands of these high-horsepower engines, but conducting in-field or full-scale research on new technologies is cost prohibitive. As such, this research completed extensive in-use data collection efforts for the engines powering over-the-road trucks, drilling engines, and hydraulic stimulation pump engines. These engine activity data were processed in order to make representative test cycles using a Markov chain Monte Carlo (MCMC) simulation method. Such cycles can be applied under controlled environments on scaled engines for future research. In addition to MCMC, genetic algorithms were used to improve the overall performance values for the test cycles, and smoothing was applied to ensure regression criteria were met during implementation on a test engine and dynamometer. The variations in cycle and in-use statistics are presented along with comparisons to conventional test cycles used for emissions compliance. Development of representative engine dynamometer test cycles from in-use activity data is crucial in understanding fuel efficiency and emissions for engine operating modes that are different from cycles mandated by the Code of Federal Regulations. Representative cycles were created for the prime movers of unconventional well development: over-the-road (OTR) trucks and drilling and hydraulic fracturing engines. The representative cycles are implemented on scaled engines to reduce fuel consumption during research and development of new technologies in controlled laboratory environments.
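
    As a rough, hedged illustration of the cycle-generation step (the paper's genetic-algorithm refinement and smoothing are not reproduced), one can discretize an in-use engine trace into operating states, estimate a first-order transition matrix, and sample a synthetic candidate cycle from it. The load trace, bin count, and cycle length below are arbitrary assumptions.

      # Build a Markov transition matrix from an in-use trace and sample a candidate cycle.
      import numpy as np

      def transition_matrix(states, n_states):
          P = np.zeros((n_states, n_states))
          for a, b in zip(states[:-1], states[1:]):
              P[a, b] += 1
          P += 1e-9                                   # avoid all-zero rows
          return P / P.sum(axis=1, keepdims=True)

      def sample_cycle(P, start, length, rng):
          cycle = [start]
          for _ in range(length - 1):
              cycle.append(rng.choice(len(P), p=P[cycle[-1]]))
          return np.array(cycle)

      rng = np.random.default_rng(0)
      load = rng.uniform(0, 100, size=3600)                          # placeholder in-use engine load trace (%)
      states = np.digitize(load, bins=np.linspace(0, 100, 11)) - 1   # 10 load bins
      states = np.clip(states, 0, 9)
      P = transition_matrix(states, 10)
      test_cycle = sample_cycle(P, states[0], 600, rng)              # 600-second candidate cycle
      print(P.round(2))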

  5. Training-Image Based Geostatistical Inversion Using a Spatial Generative Adversarial Neural Network

    NASA Astrophysics Data System (ADS)

    Laloy, Eric; Hérault, Romain; Jacques, Diederik; Linde, Niklas

    2018-01-01

    Probabilistic inversion within a multiple-point statistics framework is often computationally prohibitive for high-dimensional problems. To partly address this, we introduce and evaluate a new training-image based inversion approach for complex geologic media. Our approach relies on a deep neural network of the generative adversarial network (GAN) type. After training using a training image (TI), our proposed spatial GAN (SGAN) can quickly generate 2-D and 3-D unconditional realizations. A key characteristic of our SGAN is that it defines a (very) low-dimensional parameterization, thereby allowing for efficient probabilistic inversion using state-of-the-art Markov chain Monte Carlo (MCMC) methods. In addition, available direct conditioning data can be incorporated within the inversion. Several 2-D and 3-D categorical TIs are first used to analyze the performance of our SGAN for unconditional geostatistical simulation. Training our deep network can take several hours. After training, realizations containing a few million pixels/voxels can be produced in a matter of seconds. This makes it especially useful for simulating many thousands of realizations (e.g., for MCMC inversion) as the relative cost of the training per realization diminishes with the considered number of realizations. Synthetic inversion case studies involving 2-D steady state flow and 3-D transient hydraulic tomography with and without direct conditioning data are used to illustrate the effectiveness of our proposed SGAN-based inversion. For the 2-D case, the inversion rapidly explores the posterior model distribution. For the 3-D case, the inversion recovers model realizations that fit the data close to the target level and visually resemble the true model well.

  6. Parameter-expanded data augmentation for Bayesian analysis of capture-recapture models

    USGS Publications Warehouse

    Royle, J. Andrew; Dorazio, Robert M.

    2012-01-01

    Data augmentation (DA) is a flexible tool for analyzing closed and open population models of capture-recapture data, especially models which include sources of heterogeneity among individuals. The essential concept underlying DA, as we use the term, is based on adding "observations" to create a dataset composed of a known number of individuals. This new (augmented) dataset, which includes the unknown number of individuals N in the population, is then analyzed using a new model that includes a reformulation of the parameter N in the conventional model of the observed (unaugmented) data. In the context of capture-recapture models, we add a set of "all zero" encounter histories which are not, in practice, observable. The model of the augmented dataset is a zero-inflated version of either a binomial or a multinomial base model. Thus, our use of DA provides a general approach for analyzing both closed and open population models of all types. In doing so, this approach provides a unified framework for the analysis of a huge range of models that are treated as unrelated "black boxes" and named procedures in the classical literature. As a practical matter, analysis of the augmented dataset by MCMC is greatly simplified compared to other methods that require specialized algorithms. For example, complex capture-recapture models of an augmented dataset can be fitted with popular MCMC software packages (WinBUGS or JAGS) by providing a concise statement of the model's assumptions that usually involves only a few lines of pseudocode. In this paper, we review the basic technical concepts of data augmentation, and we provide examples of analyses of closed-population models (M0, Mh, distance sampling, and spatial capture-recapture models) and open-population models (Jolly-Seber) with individual effects.
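
    The paper illustrates DA with BUGS/JAGS pseudocode; as a language-neutral complement, here is a hedged Python sketch of the same augmentation idea for the simplest closed-population model (M0): the observed histories are padded with all-zero histories up to a fixed size M, and a Gibbs sampler updates the latent inclusion indicators, the inclusion probability psi, and the detection probability p, with N recovered as the number of included individuals. The priors and the synthetic data are illustrative assumptions.

      # Data-augmentation Gibbs sampler for closed-population model M0.
      import numpy as np

      def m0_data_augmentation(y_obs, T, M=500, n_iter=5000, seed=0):
          """y_obs: detection counts (out of T occasions) for the n observed individuals."""
          rng = np.random.default_rng(seed)
          n = len(y_obs)
          y = np.concatenate([y_obs, np.zeros(M - n)])       # augmented data set
          z = (y > 0).astype(float)
          psi, p = 0.5, 0.5
          N_draws = np.empty(n_iter)
          for it in range(n_iter):
              # latent inclusion state for the all-zero histories
              pz = psi * (1 - p) ** T / (psi * (1 - p) ** T + 1 - psi)
              z[n:] = rng.random(M - n) < pz
              z[:n] = 1.0                                     # observed animals are certainly present
              # conjugate Beta updates for psi and p
              psi = rng.beta(1 + z.sum(), 1 + M - z.sum())
              p = rng.beta(1 + (y * z).sum(), 1 + (z * (T - y)).sum())
              N_draws[it] = z.sum()
          return N_draws

      # synthetic example: true N = 200, detection probability 0.3, T = 5 occasions
      rng = np.random.default_rng(1)
      y_all = rng.binomial(5, 0.3, size=200)
      print(m0_data_augmentation(y_all[y_all > 0], T=5)[1000:].mean())   # posterior mean of N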

  7. Data Analysis Techniques for Physical Scientists

    NASA Astrophysics Data System (ADS)

    Pruneau, Claude A.

    2017-10-01

    Preface; How to read this book; 1. The scientific method; Part I. Foundation in Probability and Statistics: 2. Probability; 3. Probability models; 4. Classical inference I: estimators; 5. Classical inference II: optimization; 6. Classical inference III: confidence intervals and statistical tests; 7. Bayesian inference; Part II. Measurement Techniques: 8. Basic measurements; 9. Event reconstruction; 10. Correlation functions; 11. The multiple facets of correlation functions; 12. Data correction methods; Part III. Simulation Techniques: 13. Monte Carlo methods; 14. Collision and detector modeling; List of references; Index.

  8. A COMBINED SPECTROSCOPIC AND PHOTOMETRIC STELLAR ACTIVITY STUDY OF EPSILON ERIDANI

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Giguere, Matthew J.; Fischer, Debra A.; Zhang, Cyril X. Y.

    2016-06-20

    We present simultaneous ground-based radial velocity (RV) measurements and space-based photometric measurements of the young and active K dwarf Epsilon Eridani. These measurements provide a data set for exploring methods of identifying and ultimately distinguishing stellar photospheric velocities from Keplerian motion. We compare three methods we have used in exploring this data set: Dalmatian, an MCMC spot modeling code that fits photometric and RV measurements simultaneously; the FF′ method, which uses photometric measurements to predict the stellar activity signal in simultaneous RV measurements; and Hα analysis. We show that our Hα measurements are strongly correlated with the Microvariability and Oscillations of STars telescope (MOST) photometry, which led to a promising new method based solely on the spectroscopic observations. This new method, which we refer to as the HH′ method, uses Hα measurements as input into the FF′ model. While the Dalmatian spot modeling analysis and the FF′ method with MOST space-based photometry are currently more robust, the HH′ method only makes use of one of the thousands of stellar lines in the visible spectrum. By leveraging additional spectral activity indicators, we believe the HH′ method may prove quite useful in disentangling stellar signals.

  9. Efficient inference of population size histories and locus-specific mutation rates from large-sample genomic variation data.

    PubMed

    Bhaskar, Anand; Wang, Y X Rachel; Song, Yun S

    2015-02-01

    With the recent increase in study sample sizes in human genetics, there has been growing interest in inferring historical population demography from genomic variation data. Here, we present an efficient inference method that can scale up to very large samples, with tens or hundreds of thousands of individuals. Specifically, by utilizing analytic results on the expected frequency spectrum under the coalescent and by leveraging the technique of automatic differentiation, which allows us to compute gradients exactly, we develop a very efficient algorithm to infer piecewise-exponential models of the historical effective population size from the distribution of sample allele frequencies. Our method is orders of magnitude faster than previous demographic inference methods based on the frequency spectrum. In addition to inferring demography, our method can also accurately estimate locus-specific mutation rates. We perform extensive validation of our method on simulated data and show that it can accurately infer multiple recent epochs of rapid exponential growth, a signal that is difficult to pick up with small sample sizes. Lastly, we use our method to analyze data from recent sequencing studies, including a large-sample exome-sequencing data set of tens of thousands of individuals assayed at a few hundred genic regions. © 2015 Bhaskar et al.; Published by Cold Spring Harbor Laboratory Press.

  10. Statistical inference for extended or shortened phase II studies based on Simon's two-stage designs.

    PubMed

    Zhao, Junjun; Yu, Menggang; Feng, Xi-Ping

    2015-06-07

    Simon's two-stage designs are popular choices for conducting phase II clinical trials, especially in the oncology trials to reduce the number of patients placed on ineffective experimental therapies. Recently Koyama and Chen (2008) discussed how to conduct proper inference for such studies because they found that inference procedures used with Simon's designs almost always ignore the actual sampling plan used. In particular, they proposed an inference method for studies when the actual second stage sample sizes differ from planned ones. We consider an alternative inference method based on likelihood ratio. In particular, we order permissible sample paths under Simon's two-stage designs using their corresponding conditional likelihood. In this way, we can calculate p-values using the common definition: the probability of obtaining a test statistic value at least as extreme as that observed under the null hypothesis. In addition to providing inference for a couple of scenarios where Koyama and Chen's method can be difficult to apply, the resulting estimate based on our method appears to have certain advantage in terms of inference properties in many numerical simulations. It generally led to smaller biases and narrower confidence intervals while maintaining similar coverages. We also illustrated the two methods in a real data setting. Inference procedures used with Simon's designs almost always ignore the actual sampling plan. Reported P-values, point estimates and confidence intervals for the response rate are not usually adjusted for the design's adaptiveness. Proper statistical inference procedures should be used.

  11. xspec_emcee: XSPEC-friendly interface for the emcee package

    NASA Astrophysics Data System (ADS)

    Sanders, Jeremy

    2018-05-01

    XSPEC_EMCEE is an XSPEC-friendly interface for emcee (ascl:1303.002). It carries out MCMC analyses of X-ray spectra in the X-ray spectral fitting program XSPEC (ascl:9910.005). It can run multiple xspec processes simultaneously, speeding up the analysis, and can switch to parameterizing norm parameters in log space.
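
    For orientation, the basic emcee pattern that such an interface drives looks roughly like the hedged sketch below (plain emcee usage with a toy Poisson likelihood, not the xspec_emcee wrapper itself); the single normalisation is sampled in log space, mirroring the log-parameterization option mentioned above.

      # Minimal emcee run: ensemble sampler over one log-space norm parameter.
      import numpy as np
      import emcee

      rng = np.random.default_rng(0)
      counts = rng.poisson(50.0, size=100)          # stand-in for a spectrum

      def log_prob(theta):
          log_norm = theta[0]                       # sample the norm in log space
          if not -5.0 < log_norm < 10.0:            # flat prior bounds
              return -np.inf
          mu = np.exp(log_norm)
          return np.sum(counts * np.log(mu) - mu)   # Poisson log-likelihood (up to a constant)

      nwalkers, ndim = 32, 1
      p0 = np.log(50.0) + 0.01 * rng.normal(size=(nwalkers, ndim))
      sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
      sampler.run_mcmc(p0, 2000)
      norm_samples = np.exp(sampler.get_chain(discard=500, flat=True)[:, 0])
      print(norm_samples.mean(), norm_samples.std())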

  12. Teaching Markov Chain Monte Carlo: Revealing the Basic Ideas behind the Algorithm

    ERIC Educational Resources Information Center

    Stewart, Wayne; Stewart, Sepideh

    2014-01-01

    For many scientists, researchers and students Markov chain Monte Carlo (MCMC) simulation is an important and necessary tool to perform Bayesian analyses. The simulation is often presented as a mathematical algorithm and then translated into an appropriate computer program. However, this can result in overlooking the fundamental and deeper…

  13. Evaluation of the Multiple Careers Magnet and Assessment Centers at William B. Carrell, 1978-79.

    ERIC Educational Resources Information Center

    Maples, Wayne; And Others

    The report evaluates Texas' Multiple Careers Magnet Center (MCMC), a part time program to provide special education secondary students with career training. It is explained that students enter one of six career education clusters: furniture repair and upholstery, general construction trades, building and grounds maintenance, laundry and dry…

  14. Explicitly integrating parameter, input, and structure uncertainties into Bayesian Neural Networks for probabilistic hydrologic forecasting

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Xuesong; Liang, Faming; Yu, Beibei

    2011-11-09

    Estimating uncertainty of hydrologic forecasting is valuable to water resources and other relevant decision making processes. Recently, Bayesian Neural Networks (BNNs) have proven to be powerful tools for quantifying uncertainty of streamflow forecasting. In this study, we propose a Markov Chain Monte Carlo (MCMC) framework to incorporate the uncertainties associated with input, model structure, and parameter into BNNs. This framework allows the structure of the neural networks to change by removing or adding connections between neurons and enables scaling of input data by using rainfall multipliers. The results show that the new BNNs outperform the BNNs that only consider uncertainties associated with parameter and model structure. Critical evaluation of the posterior distributions of neural network weights, number of effective connections, rainfall multipliers, and hyper-parameters shows that the assumptions held in our BNNs are not well supported. Further understanding of characteristics of different uncertainty sources and including output error into the MCMC framework are expected to enhance the application of neural networks for uncertainty analysis of hydrologic forecasting.

  15. State and parameter estimation of two land surface models using the ensemble Kalman filter and the particle filter

    NASA Astrophysics Data System (ADS)

    Zhang, Hongjuan; Hendricks Franssen, Harrie-Jan; Han, Xujun; Vrugt, Jasper A.; Vereecken, Harry

    2017-09-01

    Land surface models (LSMs) use a large cohort of parameters and state variables to simulate the water and energy balance at the soil-atmosphere interface. Many of these model parameters cannot be measured directly in the field, and require calibration against measured fluxes of carbon dioxide, sensible and/or latent heat, and/or observations of the thermal and/or moisture state of the soil. Here, we evaluate the usefulness and applicability of four different data assimilation methods for joint parameter and state estimation of the Variable Infiltration Capacity Model (VIC-3L) and the Community Land Model (CLM) using a 5-month calibration (assimilation) period (March-July 2012) of areal-averaged SPADE soil moisture measurements at 5, 20, and 50 cm depths in the Rollesbroich experimental test site in the Eifel mountain range in western Germany. We used the EnKF with either state augmentation or dual estimation, and the residual resampling PF with either a simple, statistically deficient parameter resampling method or a more sophisticated, MCMC-based one. The performance of the calibrated LSMs was investigated using SPADE water content measurements of a 5-month evaluation period (August-December 2012). As expected, all DA methods enhance the ability of the VIC and CLM models to describe spatiotemporal patterns of moisture storage within the vadose zone of the Rollesbroich site, particularly if the maximum baseflow velocity (VIC) or fractions of sand, clay, and organic matter of each layer (CLM) are estimated jointly with the model states of each soil layer. The differences between the soil moisture simulations of VIC-3L and CLM are much larger than the discrepancies among the four data assimilation methods. The EnKF with state augmentation or dual estimation yields the best performance of VIC-3L and CLM during the calibration and evaluation period, yet results are in close agreement with the PF using MCMC resampling. Overall, CLM demonstrated the best performance for the Rollesbroich site. The large systematic underestimation of water storage at 50 cm depth by VIC-3L during the first few months of the evaluation period questions, in part, the validity of its fixed water table depth at the bottom of the modeled soil domain.

  16. Irregular Variability In Kepler Photometry

    NASA Astrophysics Data System (ADS)

    Schlecker, Martin

    2016-12-01

    The transit method is the most successful tool for exoplanet discovery to date. With more than half of all known exoplanets discovered by Kepler using this method, the mission also revealed a number of objects with dimming events that defy the common explanations, the most prominent being KIC 8462852, aka "Tabby's star". I embarked on a search for objects with such irregular transit signatures in the data of K2, the two-wheeled successor mission of Kepler. My method is a combination of automated pre-selection of targets showing downward flux excursions and visual light curve inspection of the selected subset, comprising about 1.5% of the initial sample. In addition, I developed a tool to constrain the effective temperature of a planet-hosting star from photometry alone. This software finds broad application in any science case where a photometric spectral type estimate is necessary. I used existing transit models and Bayesian inference to perform a Markov Chain Monte Carlo (MCMC) analysis of a planetary candidate I discovered. This putative gas giant is in a 1.32-day circular orbit with an exceptionally tight orbital radius of a ≈ 0.012 AU. My analysis revealed a scaled planetary radius of R_p/R_* = 0.0927 ± 0.0026 and an edge-on orientation with an inclination of i = 89.8 (+3.0/-3.4). EPIC 217393088.01 is one of the closest-orbiting exoplanets ever detected and the first giant planet with such a small orbital radius. An additional major finding of my search is EPIC 220262993, which exhibits aperiodic, asymmetric dips in flux with rapid dimming rates and up to ~25% depth, lasting for 2-4 days. In previous works based on optical and mid-infrared photometry, this object was inconsistently classified as a possible quasar or a white dwarf. We conducted follow-up observations both photometrically with GROND on the MPI/ESO 2.2 m telescope in La Silla (Chile) and spectroscopically with FIRE on the Magellan/Baade 6.5 m telescope. With additional spectroscopy using ESI on the Keck II 10 m telescope we were able to distinguish between these cases: EPIC 220262993 is a quasar with redshift z = 1.42. This is the only known exemplar showing deep dips in flux on such a short time scale.

  17. Wireless Interconnects for Intra-chip & Inter-chip Transmission

    NASA Astrophysics Data System (ADS)

    Narde, Rounak Singh

    With the emergence of the Internet of Things and the information revolution, the demand for high performance computing systems is increasing. The copper interconnects inside computing chips have evolved into a sophisticated network of interconnects known as a Network on Chip (NoC), comprising routers, switches and repeaters, much like computer networks. When a network on chip is implemented on a large scale, as in Multicore Multichip (MCMC) systems for High Performance Computing (HPC), the length of interconnects increases, and so do problems such as power dissipation, interconnect delays, clock synchronization and electrical noise. In this thesis, wireless interconnects are chosen as a substitute for wired copper interconnects. Wireless interconnects offer easy integration with CMOS fabrication and chip packaging. Using wireless interconnects operating in the unlicensed mm-wave band (57-64 GHz), data rates on the order of Gbps can be achieved. This thesis presents a study of transmission between zigzag antennas serving as wireless interconnects for Multicore Multichip (MCMC) systems and 3D ICs. For MCMC systems, a four-chip, 16-core model is analyzed with only four wireless interconnects in three configurations with different antenna orientations and locations. Return loss and transmission coefficients are simulated in ANSYS HFSS. Moreover, wireless interconnects are designed, fabricated and tested on a 6'' silicon wafer with a resistivity of 55 Ω-cm using a basic standard CMOS process. The wireless interconnects are designed to work at 30 GHz using ANSYS HFSS. The fabricated antennas resonate around 20 GHz with a return loss of less than -10 dB. The transmission coefficient between an antenna pair within a 20 mm x 20 mm silicon die is found to vary between -45 dB and -55 dB. Furthermore, the wireless interconnect approach is extended to 3D ICs, with the wireless interconnects again implemented as zigzag antennas. This thesis extends the work by analyzing wireless interconnects in 3D ICs with different configurations of antenna orientations and coolants. The return loss and transmission coefficients are simulated using ANSYS HFSS.

  18. A prior-based integrative framework for functional transcriptional regulatory network inference

    PubMed Central

    Siahpirani, Alireza F.

    2017-01-01

    Transcriptional regulatory networks specify regulatory proteins controlling the context-specific expression levels of genes. Inference of genome-wide regulatory networks is central to understanding gene regulation, but remains an open challenge. Expression-based network inference is among the most popular methods to infer regulatory networks; however, networks inferred by such methods have low overlap with experimentally derived (e.g. ChIP-chip and transcription factor (TF) knockouts) networks. Currently we have a limited understanding of this discrepancy. To address this gap, we first develop a regulatory network inference algorithm, based on probabilistic graphical models, to integrate expression with auxiliary datasets supporting a regulatory edge. Second, we comprehensively analyze our and other state-of-the-art methods on different expression perturbation datasets. Networks inferred by integrating sequence-specific motifs with expression have substantially greater agreement with experimentally derived networks, while remaining more predictive of expression than motif-based networks. Our analysis suggests natural genetic variation as the most informative perturbation for network inference and identifies core TFs whose targets are predictable from expression. Multiple reasons make the identification of targets of other TFs difficult, including network architecture and insufficient variation of TF mRNA level. Finally, we demonstrate the utility of our inference algorithm to infer stress-specific regulatory networks and for regulator prioritization. PMID:27794550

  19. On the use of Godhavn H-component as an indicator of the interplanetary sector polarity

    NASA Technical Reports Server (NTRS)

    Svalgaard, L.

    1974-01-01

    An objective method of inferring the polarity of the interplanetary magnetic field using the H-component at Godhavn is presented. The objectively inferred polarities are compared with a subjective index inferred earlier. It is concluded that no significant difference exists between the two methods. The inferred polarities derived from Godhavn H are biased by the (slp)_q signature in the sense that, during summer, prolonged intervals of geomagnetic calm will result in inferred Away polarity regardless of the actual sector polarity. This bias does not significantly alter the large-scale structure of the inferred sector pattern.

  20. A Hierarchical Poisson Log-Normal Model for Network Inference from RNA Sequencing Data

    PubMed Central

    Gallopin, Mélina; Rau, Andrea; Jaffrézic, Florence

    2013-01-01

    Gene network inference from transcriptomic data is an important methodological challenge and a key aspect of systems biology. Although several methods have been proposed to infer networks from microarray data, there is a need for inference methods able to model RNA-seq data, which are count-based and highly variable. In this work we propose a hierarchical Poisson log-normal model with a Lasso penalty to infer gene networks from RNA-seq data; this model has the advantage of directly modelling discrete data and accounting for inter-sample variance larger than the sample mean. Using real microRNA-seq data from breast cancer tumors and simulations, we compare this method to a regularized Gaussian graphical model on log-transformed data, and a Poisson log-linear graphical model with a Lasso penalty on power-transformed data. For data simulated with large inter-sample dispersion, the proposed model performs better than the other methods in terms of sensitivity, specificity and area under the ROC curve. These results show the necessity of methods specifically designed for gene network inference from RNA-seq data. PMID:24147011
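
    As a point of reference for the comparison described above, a hedged sketch of the simplest competing approach is given below: a regularized Gaussian graphical model fitted to log-transformed counts with scikit-learn's graphical lasso, declaring an edge wherever the estimated precision matrix has a non-zero off-diagonal entry. The proposed hierarchical Poisson log-normal model itself is not reproduced here, and the count matrix is a synthetic placeholder.

      # Graphical lasso baseline on log-transformed count data.
      import numpy as np
      from sklearn.covariance import GraphicalLasso

      rng = np.random.default_rng(0)
      counts = rng.poisson(20.0, size=(60, 15))           # placeholder RNA-seq counts (samples x genes)

      X = np.log1p(counts)                                # log transform of the counts
      X = (X - X.mean(axis=0)) / X.std(axis=0)            # standardize genes

      model = GraphicalLasso(alpha=0.2).fit(X)
      precision = model.precision_
      edges = (np.abs(precision) > 1e-8) & ~np.eye(15, dtype=bool)
      print("inferred edges:", int(edges.sum() / 2))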

  1. Statistical Inference and Reverse Engineering of Gene Regulatory Networks from Observational Expression Data

    PubMed Central

    Emmert-Streib, Frank; Glazko, Galina V.; Altay, Gökmen; de Matos Simoes, Ricardo

    2012-01-01

    In this paper, we present a systematic and conceptual overview of methods for inferring gene regulatory networks from observational gene expression data. Further, we discuss two classic approaches to infer causal structures and compare them with contemporary methods by providing a conceptual categorization thereof. We complement the above by surveying global and local evaluation measures for assessing the performance of inference algorithms. PMID:22408642

  2. OncoNEM: inferring tumor evolution from single-cell sequencing data.

    PubMed

    Ross, Edith M; Markowetz, Florian

    2016-04-15

    Single-cell sequencing promises a high-resolution view of genetic heterogeneity and clonal evolution in cancer. However, methods to infer tumor evolution from single-cell sequencing data lag behind methods developed for bulk-sequencing data. Here, we present OncoNEM, a probabilistic method for inferring intra-tumor evolutionary lineage trees from somatic single nucleotide variants of single cells. OncoNEM identifies homogeneous cellular subpopulations and infers their genotypes as well as a tree describing their evolutionary relationships. In simulation studies, we assess OncoNEM's robustness and benchmark its performance against competing methods. Finally, we show its applicability in case studies of muscle-invasive bladder cancer and essential thrombocythemia.

  3. Inference of Vohradský's Models of Genetic Networks by Solving Two-Dimensional Function Optimization Problems

    PubMed Central

    Kimura, Shuhei; Sato, Masanao; Okada-Hatakeyama, Mariko

    2013-01-01

    The inference of a genetic network is a problem in which mutual interactions among genes are inferred from time-series of gene expression levels. While a number of models have been proposed to describe genetic networks, this study focuses on a mathematical model proposed by Vohradský. Because of its advantageous features, several researchers have proposed the inference methods based on Vohradský's model. When trying to analyze large-scale networks consisting of dozens of genes, however, these methods must solve high-dimensional non-linear function optimization problems. In order to resolve the difficulty of estimating the parameters of the Vohradský's model, this study proposes a new method that defines the problem as several two-dimensional function optimization problems. Through numerical experiments on artificial genetic network inference problems, we showed that, although the computation time of the proposed method is not the shortest, the method has the ability to estimate parameters of Vohradský's models more effectively with sufficiently short computation times. This study then applied the proposed method to an actual inference problem of the bacterial SOS DNA repair system, and succeeded in finding several reasonable regulations. PMID:24386175

  4. A Bayesian Alternative for Multi-objective Ecohydrological Model Specification

    NASA Astrophysics Data System (ADS)

    Tang, Y.; Marshall, L. A.; Sharma, A.; Ajami, H.

    2015-12-01

    Process-based ecohydrological models combine the study of hydrological, physical, biogeochemical and ecological processes of the catchments, and are usually more complex and more heavily parameterized than conceptual hydrological models. Thus, appropriate calibration objectives and model uncertainty analysis are essential for ecohydrological modeling. In recent years, Bayesian inference has become one of the most popular tools for quantifying the uncertainties in hydrological modeling with the development of Markov Chain Monte Carlo (MCMC) techniques. Our study aims to develop appropriate prior distributions and likelihood functions that minimize the model uncertainties and bias within a Bayesian ecohydrological framework. In our study, a formal Bayesian approach is implemented in an ecohydrological model which combines a hydrological model (HyMOD) and a dynamic vegetation model (DVM). Simulations based on a single-objective likelihood (Streamflow or LAI) and on multi-objective likelihoods (Streamflow and LAI) with different weights are compared. Uniform, weakly informative and strongly informative prior distributions are used in different simulations. The Kullback-Leibler divergence (KLD) is used to measure the (dis)similarity between different priors and the corresponding posterior distributions to examine parameter sensitivity. Results show that different prior distributions can strongly influence posterior distributions for parameters, especially when the available data are limited or parameters are insensitive to the available data. We demonstrate differences in optimized parameters and uncertainty limits in different cases based on multi-objective likelihoods vs. single objective likelihoods. We also demonstrate the importance of appropriately defining the weights of objectives in multi-objective calibration according to different data types.
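
    To make the sensitivity diagnostic concrete, here is a small, hedged sketch (not the authors' code) of estimating the KLD between prior and posterior samples for a single parameter using histogram density estimates on a common grid; a small value indicates that the data barely update the prior, i.e. the parameter is insensitive to the available observations.

      # Histogram-based KL divergence between posterior and prior samples.
      import numpy as np

      def kld_from_samples(posterior, prior, n_bins=50):
          lo = min(posterior.min(), prior.min())
          hi = max(posterior.max(), prior.max())
          bins = np.linspace(lo, hi, n_bins + 1)
          p, _ = np.histogram(posterior, bins=bins, density=True)
          q, _ = np.histogram(prior, bins=bins, density=True)
          width = bins[1] - bins[0]
          mask = (p > 0) & (q > 0)                 # avoid log(0); crude but adequate for a sketch
          return np.sum(p[mask] * np.log(p[mask] / q[mask])) * width

      rng = np.random.default_rng(0)
      prior = rng.uniform(0.0, 1.0, 20000)          # uniform prior draws for a parameter
      posterior = rng.beta(8.0, 3.0, 20000)         # pretend MCMC output for the same parameter
      print(kld_from_samples(posterior, prior))     # clearly > 0: the data are informative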

  5. Atmospheric Properties Of T Dwarfs Inferred From Model Fits At Low Spectral Resolution

    NASA Astrophysics Data System (ADS)

    Giorla Godfrey, Paige A.; Rice, Emily L.; Filippazzo, Joseph C.; Douglas, Stephanie E.

    2016-09-01

    Brown dwarf spectral types (M, L, T, Y) correlate with spectral morphology, and generally appear to correspond with decreasing mass and effective temperature (Teff). Model fits to observed spectra suggest, however, that spectral subclasses do not share this monotonic temperature correlation, indicating that secondary parameters (gravity, metallicity, dust) significantly influence spectral morphology. We seek to disentangle the fundamental parameters that underlie the spectral type sequence of the coolest fully populated spectral class of brown dwarfs using atmosphere models. We investigate the relationship between spectral type and best fit model parameters for a sample of over 150 T dwarfs with low resolution (R ~ 75-100) near-infrared (~0.8-2.5 micron) SpeX Prism spectra. We use synthetic spectra from four model grids (Saumon & Marley 2008, Morley+ 2012, Saumon+ 2012, BT Settl 2013) and a Markov-Chain Monte Carlo (MCMC) analysis to determine robust best fit parameters and their uncertainties. We compare the consistency of each model grid by performing our analysis on the full spectrum and also on individual wavelength bands (Y, J, H, K). We find more consistent results between the J band and full spectrum fits, and our best fit spectral type-Teff results agree with the polynomial relationships of Stephens+ 2009 and Filippazzo+ 2015 using bolometric luminosities. Our analysis consists of the most extensive low resolution T dwarf model comparison to date, and lays the foundation for interpretation of cool brown dwarf and exoplanet spectra.

  6. Bayesian inference for multivariate meta-analysis Box-Cox transformation models for individual patient data with applications to evaluation of cholesterol lowering drugs

    PubMed Central

    Kim, Sungduk; Chen, Ming-Hui; Ibrahim, Joseph G.; Shah, Arvind K.; Lin, Jianxin

    2013-01-01

    In this paper, we propose a class of Box-Cox transformation regression models with multidimensional random effects for analyzing multivariate responses for individual patient data (IPD) in meta-analysis. Our modeling formulation uses a multivariate normal response meta-analysis model with multivariate random effects, in which each response is allowed to have its own Box-Cox transformation. Prior distributions are specified for the Box-Cox transformation parameters as well as the regression coefficients in this complex model, and the Deviance Information Criterion (DIC) is used to select the best transformation model. Since the model is quite complex, a novel Markov chain Monte Carlo (MCMC) sampling scheme is developed to sample from the joint posterior of the parameters. This model is motivated by a very rich dataset comprising 26 clinical trials involving cholesterol lowering drugs where the goal is to jointly model the three-dimensional response consisting of Low Density Lipoprotein Cholesterol (LDL-C), High Density Lipoprotein Cholesterol (HDL-C), and Triglycerides (TG). Since the joint distribution of (LDL-C, HDL-C, TG) is not multivariate normal and in fact quite skewed, a Box-Cox transformation is needed to achieve normality. In the clinical literature, these three variables are usually analyzed univariately; however, a multivariate approach would be more appropriate since these variables are correlated with each other. A detailed analysis of these data is carried out using the proposed methodology. PMID:23580436

  7. Baking a mass-spectrometry data PIE with McMC and simulated annealing: predicting protein post-translational modifications from integrated top-down and bottom-up data.

    PubMed

    Jefferys, Stuart R; Giddings, Morgan C

    2011-03-15

    Post-translational modifications are vital to the function of proteins, but are hard to study, especially since several modified isoforms of a protein may be present simultaneously. Mass spectrometers are a great tool for investigating modified proteins, but the data they provide is often incomplete, ambiguous and difficult to interpret. Combining data from multiple experimental techniques-especially bottom-up and top-down mass spectrometry-provides complementary information. When integrated with background knowledge this allows a human expert to interpret what modifications are present and where on a protein they are located. However, the process is arduous and for high-throughput applications needs to be automated. This article explores a data integration methodology based on Markov chain Monte Carlo and simulated annealing. Our software, the Protein Inference Engine (the PIE) applies these algorithms using a modular approach, allowing multiple types of data to be considered simultaneously and for new data types to be added as needed. Even for complicated data representing multiple modifications and several isoforms, the PIE generates accurate modification predictions, including location. When applied to experimental data collected on the L7/L12 ribosomal protein the PIE was able to make predictions consistent with manual interpretation for several different L7/L12 isoforms using a combination of bottom-up data with experimentally identified intact masses. Software, demo projects and source can be downloaded from http://pie.giddingslab.org/
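
    To give a flavour of the simulated-annealing component, the hedged sketch below searches over a toy on/off assignment of candidate modifications so that the summed modification masses explain an observed intact-mass shift; uphill moves are accepted with a temperature-dependent probability that decays over time. The masses, sites, and scoring function are placeholders, not the PIE's actual scoring model.

      # Simulated annealing over a binary modification assignment (toy scoring).
      import numpy as np

      rng = np.random.default_rng(0)
      site_masses = np.array([79.97, 42.01, 14.02, 79.97, 42.01])   # candidate modification masses (Da)
      observed_shift = 121.98                                       # observed intact-mass shift to explain

      def cost(assign):
          return abs(assign @ site_masses - observed_shift)         # misfit of the chosen modification set

      assign = rng.integers(0, 2, size=len(site_masses))            # random initial on/off assignment
      best, best_cost = assign.copy(), cost(assign)
      T = 50.0
      for step in range(5000):
          cand = assign.copy()
          i = rng.integers(len(cand))
          cand[i] ^= 1                                              # flip one modification on/off
          delta = cost(cand) - cost(assign)
          if delta < 0 or rng.random() < np.exp(-delta / T):        # Metropolis-style acceptance
              assign = cand
              if cost(assign) < best_cost:
                  best, best_cost = assign.copy(), cost(assign)
          T *= 0.999                                                # geometric cooling schedule
      print(best, best_cost)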

  8. Merging parallel tempering with sequential geostatistical resampling for improved posterior exploration of high-dimensional subsurface categorical fields

    NASA Astrophysics Data System (ADS)

    Laloy, Eric; Linde, Niklas; Jacques, Diederik; Mariethoz, Grégoire

    2016-04-01

    The sequential geostatistical resampling (SGR) algorithm is a Markov chain Monte Carlo (MCMC) scheme for sampling from possibly non-Gaussian, complex spatially-distributed prior models such as geologic facies or categorical fields. In this work, we highlight the limits of standard SGR for posterior inference of high-dimensional categorical fields with realistically complex likelihood landscapes and benchmark a parallel tempering implementation (PT-SGR). Our proposed PT-SGR approach is demonstrated using synthetic (error-corrupted) data from steady-state flow and transport experiments in categorical 7575- and 10,000-dimensional 2D conductivity fields. In both case studies, every SGR trial gets trapped in a local optimum while PT-SGR maintains a higher diversity in the sampled model states. The advantage of PT-SGR is most apparent in an inverse transport problem where the posterior distribution is made bimodal by construction. PT-SGR then converges towards the appropriate data misfit much faster than SGR and partly recovers the two modes. In contrast, for the same computational resources SGR does not fit the data to the appropriate error level and hardly produces a locally optimal solution that looks visually similar to one of the two reference modes. Although PT-SGR clearly surpasses SGR in performance, our results also indicate that using a small number (16-24) of temperatures (and thus parallel cores) may not permit complete sampling of the posterior distribution by PT-SGR within a reasonable computational time (less than 1-2 weeks).
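
    The tempering mechanism itself is generic and can be sketched, under obvious simplifying assumptions, on a one-dimensional bimodal toy target (this is not the SGR proposal mechanism): several Metropolis chains run at increasing temperatures, and neighbouring chains periodically attempt state swaps, so that mode jumps made by the hot chains propagate down to the cold chain used for inference.

      # Parallel tempering on a bimodal 1-D target.
      import numpy as np

      def log_target(x):                                  # deliberately bimodal toy example
          return np.logaddexp(-0.5 * ((x + 3) / 0.5) ** 2, -0.5 * ((x - 3) / 0.5) ** 2)

      rng = np.random.default_rng(0)
      temps = np.array([1.0, 2.0, 4.0, 8.0])              # temperature ladder (cold chain first)
      x = np.zeros(len(temps))
      cold_samples = []
      for it in range(50000):
          for k, T in enumerate(temps):                   # within-chain Metropolis updates
              prop = x[k] + rng.normal(scale=1.0)
              if np.log(rng.random()) < (log_target(prop) - log_target(x[k])) / T:
                  x[k] = prop
          if it % 10 == 0:                                # attempt a swap between a random neighbouring pair
              k = rng.integers(len(temps) - 1)
              log_r = (1 / temps[k] - 1 / temps[k + 1]) * (log_target(x[k + 1]) - log_target(x[k]))
              if np.log(rng.random()) < log_r:
                  x[k], x[k + 1] = x[k + 1], x[k]
          cold_samples.append(x[0])
      cold_samples = np.array(cold_samples)
      print((cold_samples > 0).mean())                    # close to 0.5 if both modes are visited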

  9. Bayesian analysis of zero inflated spatiotemporal HIV/TB child mortality data through the INLA and SPDE approaches: Applied to data observed between 1992 and 2010 in rural North East South Africa

    NASA Astrophysics Data System (ADS)

    Musenge, Eustasius; Chirwa, Tobias Freeman; Kahn, Kathleen; Vounatsou, Penelope

    2013-06-01

    Longitudinal mortality data with few deaths usually have problems of zero-inflation. This paper presents and applies two Bayesian models which cater for zero-inflation, spatial and temporal random effects. To reduce the computational burden experienced when a large number of geo-locations are treated as a Gaussian field (GF) we transformed the field to a Gaussian Markov Random Fields (GMRF) by triangulation. We then modelled the spatial random effects using the Stochastic Partial Differential Equations (SPDEs). Inference was done using a computationally efficient alternative to Markov chain Monte Carlo (MCMC) called Integrated Nested Laplace Approximation (INLA) suited for GMRF. The models were applied to data from 71,057 children aged 0 to under 10 years from rural north-east South Africa living in 15,703 households over the years 1992-2010. We found protective effects on HIV/TB mortality due to greater birth weight, older age and more antenatal clinic visits during pregnancy (adjusted RR (95% CI)): 0.73(0.53;0.99), 0.18(0.14;0.22) and 0.96(0.94;0.97) respectively. Therefore childhood HIV/TB mortality could be reduced if mothers are better catered for during pregnancy as this can reduce mother-to-child transmissions and contribute to improved birth weights. The INLA and SPDE approaches are computationally good alternatives in modelling large multilevel spatiotemporal GMRF data structures.

  10. Exact posterior computation in non-conjugate Gaussian location-scale parameters models

    NASA Astrophysics Data System (ADS)

    Andrade, J. A. A.; Rathie, P. N.

    2017-12-01

    In Bayesian analysis, the class of conjugate models allows exact posterior distributions to be obtained; however, this class is quite restrictive in the sense that it involves only a few distributions. In fact, most practical applications involve non-conjugate models, so approximate methods, such as MCMC algorithms, are required. Although these methods can deal with quite complex structures, some practical problems can make their application very time demanding: for example, when heavy-tailed distributions are used, convergence may be difficult and the Metropolis-Hastings algorithm can become very slow, in addition to the extra work inevitably required to choose efficient candidate-generating distributions. In this work, we draw attention to special functions as tools for Bayesian computation and propose an alternative method for obtaining the posterior distribution in Gaussian non-conjugate models in exact form. We use complex integration methods based on the H-function to obtain the posterior distribution and some of its posterior quantities in an explicitly computable form. Two examples are provided to illustrate the theory.

  11. Bayesian additive decision trees of biomarker by treatment interactions for predictive biomarker detection and subgroup identification.

    PubMed

    Zhao, Yang; Zheng, Wei; Zhuo, Daisy Y; Lu, Yuefeng; Ma, Xiwen; Liu, Hengchang; Zeng, Zhen; Laird, Glen

    2017-10-11

    Personalized medicine, or tailored therapy, has been an active and important topic in recent medical research. Many methods have been proposed in the literature for predictive biomarker detection and subgroup identification. In this article, we propose a novel decision tree-based approach applicable in randomized clinical trials. We model the prognostic effects of the biomarkers using additive regression trees and the biomarker-by-treatment effect using a single regression tree. A Bayesian approach is used to periodically revise the split variables and the split rules of the decision trees, which provides a better overall fit. A Gibbs sampler is implemented in the MCMC procedure, which updates the prognostic trees and the interaction tree separately. We use the posterior distribution of the interaction tree to construct the predictive scores of the biomarkers and to identify the subgroup where the treatment is superior to the control. Numerical simulations show that our proposed method performs well under various settings compared to existing methods. We also demonstrate an application of our method in a real clinical trial.

  12. Selection of species and sampling areas: The importance of inference

    Treesearch

    Paul Stephen Corn

    2009-01-01

    Inductive inference, the process of drawing general conclusions from specific observations, is fundamental to the scientific method. Platt (1964) termed conclusions obtained through rigorous application of the scientific method "strong inference" and noted the following basic steps: generating alternative hypotheses; devising experiments, the...

  13. The Joker: A custom Monte Carlo sampler for binary-star and exoplanet radial velocity data

    NASA Astrophysics Data System (ADS)

    Price-Whelan, Adrian M.; Hogg, David W.; Foreman-Mackey, Daniel; Rix, Hans-Walter

    2017-01-01

    Given sparse or low-quality radial-velocity measurements of a star, there are often many qualitatively different stellar or exoplanet companion orbit models that are consistent with the data. The consequent multimodality of the likelihood function leads to extremely challenging search, optimization, and MCMC posterior sampling over the orbital parameters. The Joker is a custom-built Monte Carlo sampler that can produce a posterior sampling for orbital parameters given sparse or noisy radial-velocity measurements, even when the likelihood function is poorly behaved. The method produces correct samplings in orbital parameters for data that include as few as three epochs. The Joker can therefore be used to produce proper samplings of multimodal pdfs, which are still highly informative and can be used in hierarchical (population) modeling.
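
    A hedged sketch of the rejection-sampling strategy that such a sampler relies on is given below; it uses a toy one-parameter "likelihood" in place of a Keplerian radial-velocity model and is not The Joker's actual implementation. Many samples are drawn from the prior and each is kept with probability proportional to its likelihood, so multimodality poses no mixing problem because no chain ever has to travel between modes.

      # Prior-based rejection sampling on a toy bimodal "period" likelihood.
      import numpy as np

      rng = np.random.default_rng(2)

      def log_like(period):
          # Two narrow, aliased peaks standing in for a multimodal orbit likelihood.
          return np.logaddexp(-0.5 * ((period - 12.0) / 0.3) ** 2,
                              -0.5 * ((period - 24.0) / 0.3) ** 2)

      n_prior = 200_000
      prior_draws = np.exp(rng.uniform(np.log(1.0), np.log(100.0), size=n_prior))  # log-uniform prior
      ll = log_like(prior_draws)
      accept = np.log(rng.uniform(size=n_prior)) < ll - ll.max()   # keep with prob L / L_max
      posterior_samples = prior_draws[accept]

      frac_short = np.mean(posterior_samples < 18.0)
      print(f"{posterior_samples.size} samples kept;",
            f"{frac_short:.0%} in the 12-day mode, {1 - frac_short:.0%} in the 24-day mode")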

  14. Low Frequency Flats for Imaging Cameras on the Hubble Space Telescope

    NASA Astrophysics Data System (ADS)

    Kossakowski, Diana; Avila, Roberto J.; Borncamp, David; Grogin, Norman A.

    2017-01-01

    We created a revamped Low Frequency Flat (L-Flat) algorithm for the Hubble Space Telescope (HST) and all of its imaging cameras. The current program that makes these calibration files does not compile on modern computer systems and it requires translation to Python. We took the opportunity to explore various methods that reduce the scatter of photometric observations using chi-squared optimizers along with Markov Chain Monte Carlo (MCMC). We created simulations to validate the algorithms and then worked with the UV photometry of the globular cluster NGC6681 to update the calibration files for the Advanced Camera for Surveys (ACS) and Solar Blind Channel (SBC). The new software was made for general usage and therefore can be applied to any of the current imaging cameras on HST.

  15. A new approach for handling longitudinal count data with zero-inflation and overdispersion: poisson geometric process model.

    PubMed

    Wan, Wai-Yin; Chan, Jennifer S K

    2009-08-01

    For time series of count data, correlated measurements, clustering and excessive zeros can occur simultaneously in biomedical applications. Ignoring such effects might lead to misleading treatment outcomes. A generalized mixture Poisson geometric process (GMPGP) model and a zero-altered mixture Poisson geometric process (ZMPGP) model are developed from the geometric process model, which was originally developed for modelling positive continuous data and was extended to handle count data. These models are motivated by evaluating the trend development of new tumour counts for bladder cancer patients as well as by identifying useful covariates which affect the count level. The models are implemented using a Bayesian method with Markov chain Monte Carlo (MCMC) algorithms and are assessed using the deviance information criterion (DIC).

  16. Building an ACT-R Reader for Eye-Tracking Corpus Data.

    PubMed

    Dotlačil, Jakub

    2018-01-01

    Cognitive architectures have often been applied to data from individual experiments. In this paper, I develop an ACT-R reader that can model a much larger set of data: eye-tracking corpus data. It is shown that the resulting model has a good fit to the data for the considered low-level processes. Unlike previous related works (most prominently, Engelmann, Vasishth, Engbert & Kliegl), the model achieves the fit by estimating free parameters of ACT-R using Bayesian estimation and Markov chain Monte Carlo (MCMC) techniques, rather than by relying on a mix of manual selection and default values. The method used in the paper is generalizable beyond this particular model and data set and could be used on other ACT-R models. Copyright © 2017 Cognitive Science Society, Inc.

  17. An algebra-based method for inferring gene regulatory networks.

    PubMed

    Vera-Licona, Paola; Jarrah, Abdul; Garcia-Puente, Luis David; McGee, John; Laubenbacher, Reinhard

    2014-03-26

    The inference of gene regulatory networks (GRNs) from experimental observations is at the heart of systems biology. This includes the inference of both the network topology and its dynamics. While there are many algorithms available to infer the network topology from experimental data, less emphasis has been placed on methods that infer network dynamics. Furthermore, since the network inference problem is typically underdetermined, it is essential to have the option of incorporating prior knowledge about the network into the inference process, along with an effective description of the search space of dynamic models. Finally, it is also important to have an understanding of how a given inference method is affected by experimental and other noise in the data used. This paper contains a novel inference algorithm using the algebraic framework of Boolean polynomial dynamical systems (BPDS), meeting all these requirements. The algorithm takes as input time series data, including those from network perturbations, such as knock-out mutant strains and RNAi experiments. It allows for the incorporation of prior biological knowledge while being robust to significant levels of noise in the data used for inference. It uses an evolutionary algorithm for local optimization with an encoding of the mathematical models as BPDS. The BPDS framework allows an effective representation of the search space for algebraic dynamic models that improves computational performance. The algorithm is validated with both simulated and experimental microarray expression profile data. Robustness to noise is tested using a published mathematical model of the segment polarity gene network in Drosophila melanogaster. Benchmarking of the algorithm is done by comparison with a spectrum of state-of-the-art network inference methods on data from the synthetic IRMA network to demonstrate that our method has good precision and recall for the network reconstruction task, while also predicting several of the dynamic patterns present in the network. Boolean polynomial dynamical systems provide a powerful modeling framework for the reverse engineering of gene regulatory networks that enables a rich mathematical structure on the model search space. A C++ implementation of the method, distributed under the LGPL license, is available, together with the source code, at http://www.paola-vera-licona.net/Software/EARevEng/REACT.html.
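
    The polynomial encoding mentioned above can be made concrete with a tiny, hypothetical example: over GF(2), AND is multiplication, XOR is addition mod 2, and OR(a, b) = a + b + ab, so any Boolean network becomes a polynomial dynamical system whose state-transition graph can be enumerated. The three-node network below is invented for illustration (the paper's own implementation is in C++), and Python is used here only for brevity.

      # A toy Boolean polynomial dynamical system over GF(2).
      from itertools import product

      def step(state):
          x1, x2, x3 = state
          f1 = (x2 * x3) % 2                  # x1' = x2 AND x3
          f2 = (x1 + x3 + x1 * x3) % 2        # x2' = x1 OR x3
          f3 = x1 % 2                         # x3' copies x1
          return (f1, f2, f3)

      # Enumerate the full state-transition graph and mark any fixed points.
      for s in product((0, 1), repeat=3):
          nxt = step(s)
          marker = "  <- fixed point" if nxt == s else ""
          print(s, "->", nxt, marker)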

  18. Meta-learning framework applied in bioinformatics inference system design.

    PubMed

    Arredondo, Tomás; Ormazábal, Wladimir

    2015-01-01

    This paper describes a meta-learner inference system development framework which is applied and tested in the implementation of bioinformatic inference systems. These inference systems are used for the systematic classification of the best candidates for inclusion in bacterial metabolic pathway maps. This meta-learner-based approach utilises a workflow in which the user provides feedback on final classification decisions, which are stored in conjunction with analysed genetic sequences for periodic inference system training. The inference systems were trained and tested with three different data sets related to the bacterial degradation of aromatic compounds. The analysis of the meta-learner-based framework involved contrasting several different optimisation methods with various different parameters. The obtained inference systems were also contrasted with other standard classification methods, and accurate prediction capabilities were observed.

  19. A sub-space greedy search method for efficient Bayesian Network inference.

    PubMed

    Zhang, Qing; Cao, Yong; Li, Yong; Zhu, Yanming; Sun, Samuel S M; Guo, Dianjing

    2011-09-01

    Bayesian networks (BNs) have been successfully used to infer the regulatory relationships of genes from microarray datasets. However, one major limitation of the BN approach is its computational cost, because the calculation time grows more than exponentially with the dimension of the dataset. In this paper, we propose a sub-space greedy search method for efficient Bayesian network inference. In particular, this method limits the greedy search space by only selecting gene pairs with higher partial correlation coefficients. Using both synthetic and real data, we demonstrate that the proposed method achieved results comparable with the standard greedy search method yet saved ∼50% of the computational time. We believe that the sub-space search method can be widely used for efficient BN inference in systems biology. Copyright © 2011 Elsevier Ltd. All rights reserved.
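
    The screening idea, restricting the greedy structure search to gene pairs with large partial correlations, can be sketched in a few lines. In the hypothetical Python example below, partial correlations are read off the inverse covariance (precision) matrix and the top-scoring pairs are kept as candidate edges; the data, the planted dependency and the cutoff are invented.

      # Partial-correlation screening of candidate gene pairs for a greedy BN search.
      import numpy as np

      rng = np.random.default_rng(3)
      n_samples, n_genes = 200, 10
      X = rng.normal(size=(n_samples, n_genes))
      X[:, 1] += 0.8 * X[:, 0]                    # plant one direct dependency: gene0 -> gene1

      prec = np.linalg.inv(np.cov(X, rowvar=False))       # precision matrix
      d = np.sqrt(np.diag(prec))
      partial_corr = -prec / np.outer(d, d)               # rho_ij = -P_ij / sqrt(P_ii P_jj)
      np.fill_diagonal(partial_corr, 1.0)

      # Keep only the strongest pairs as candidate edges for the greedy search.
      i, j = np.triu_indices(n_genes, k=1)
      scores = np.abs(partial_corr[i, j])
      top = np.argsort(scores)[::-1][:5]
      for a, b, s in zip(i[top], j[top], scores[top]):
          print(f"candidate edge ({a}, {b}): |partial corr| = {s:.2f}")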

  20. MCMC Sampling for a Multilevel Model with Nonindependent Residuals within and between Cluster Units

    ERIC Educational Resources Information Center

    Browne, William; Goldstein, Harvey

    2010-01-01

    In this article, we discuss the effect of removing the independence assumptions between the residuals in two-level random effect models. We first consider removing the independence between the Level 2 residuals and instead assume that the vector of all residuals at the cluster level follows a general multivariate normal distribution. We…

  1. A Markov Chain Monte Carlo Approach to Confirmatory Item Factor Analysis

    ERIC Educational Resources Information Center

    Edwards, Michael C.

    2010-01-01

    Item factor analysis has a rich tradition in both the structural equation modeling and item response theory frameworks. The goal of this paper is to demonstrate a novel combination of various Markov chain Monte Carlo (MCMC) estimation routines to estimate parameters of a wide variety of confirmatory item factor analysis models. Further, I show…

  2. Markov Chain Monte Carlo Estimation of Item Parameters for the Generalized Graded Unfolding Model

    ERIC Educational Resources Information Center

    de la Torre, Jimmy; Stark, Stephen; Chernyshenko, Oleksandr S.

    2006-01-01

    The authors present a Markov Chain Monte Carlo (MCMC) parameter estimation procedure for the generalized graded unfolding model (GGUM) and compare it to the marginal maximum likelihood (MML) approach implemented in the GGUM2000 computer program, using simulated and real personality data. In the simulation study, test length, number of response…

  3. Brain Imaging, Forward Inference, and Theories of Reasoning

    PubMed Central

    Heit, Evan

    2015-01-01

    This review focuses on the issue of how neuroimaging studies address theoretical accounts of reasoning, through the lens of the method of forward inference (Henson, 2005, 2006). After theories of deductive and inductive reasoning are briefly presented, the method of forward inference for distinguishing between psychological theories based on brain imaging evidence is critically reviewed. Brain imaging studies of reasoning, comparing deductive and inductive arguments, comparing meaningful versus non-meaningful material, investigating hemispheric localization, and comparing conditional and relational arguments, are assessed in light of the method of forward inference. Finally, conclusions are drawn with regard to future research opportunities. PMID:25620926

  4. Brain imaging, forward inference, and theories of reasoning.

    PubMed

    Heit, Evan

    2014-01-01

    This review focuses on the issue of how neuroimaging studies address theoretical accounts of reasoning, through the lens of the method of forward inference (Henson, 2005, 2006). After theories of deductive and inductive reasoning are briefly presented, the method of forward inference for distinguishing between psychological theories based on brain imaging evidence is critically reviewed. Brain imaging studies of reasoning, comparing deductive and inductive arguments, comparing meaningful versus non-meaningful material, investigating hemispheric localization, and comparing conditional and relational arguments, are assessed in light of the method of forward inference. Finally, conclusions are drawn with regard to future research opportunities.

  5. CompareSVM: supervised, Support Vector Machine (SVM) inference of gene regularity networks.

    PubMed

    Gillani, Zeeshan; Akash, Muhammad Sajid Hamid; Rahaman, M D Matiur; Chen, Ming

    2014-11-30

    Prediction of gene regulatory networks (GRNs) from expression data is a challenging task. Many methods have been developed to address this challenge, ranging from supervised to unsupervised methods. The most promising methods are based on support vector machines (SVMs). There is a need for a comprehensive analysis of the prediction accuracy of the supervised SVM method using different kernels under different biological experimental conditions and network sizes. We developed a tool (CompareSVM) based on SVM to compare different kernel methods for inference of GRNs. Using CompareSVM, we investigated and evaluated different SVM kernel methods on simulated microarray datasets of different sizes in detail. The results obtained from CompareSVM showed that the accuracy of an inference method depends upon the nature of the experimental condition and the size of the network. For networks with fewer than 200 nodes, and on average over all network sizes, the SVM Gaussian kernel outperforms all the other inference methods on knockout, knockdown, and multifactorial datasets. For networks with a large number of nodes (~500), the choice of inference method depends upon the nature of the experimental condition. CompareSVM is available at http://bis.zju.edu.cn/CompareSVM/.

  6. Inferring the temperature dependence of population parameters: the effects of experimental design and inference algorithm

    PubMed Central

    Palamara, Gian Marco; Childs, Dylan Z; Clements, Christopher F; Petchey, Owen L; Plebani, Marco; Smith, Matthew J

    2014-01-01

    Understanding and quantifying the temperature dependence of population parameters, such as intrinsic growth rate and carrying capacity, is critical for predicting the ecological responses to environmental change. Many studies provide empirical estimates of such temperature dependencies, but a thorough investigation of the methods used to infer them has not been performed yet. We created artificial population time series using a stochastic logistic model parameterized with the Arrhenius equation, so that activation energy drives the temperature dependence of population parameters. We simulated different experimental designs and used different inference methods, varying the likelihood functions and other aspects of the parameter estimation methods. Finally, we applied the best performing inference methods to real data for the species Paramecium caudatum. The relative error of the estimates of activation energy varied between 5% and 30%. The fraction of habitat sampled played the most important role in determining the relative error; sampling at least 1% of the habitat kept it below 50%. We found that methods that simultaneously use all time series data (direct methods) and methods that estimate population parameters separately for each temperature (indirect methods) are complementary. Indirect methods provide a clearer insight into the shape of the functional form describing the temperature dependence of population parameters; direct methods enable a more accurate estimation of the parameters of such functional forms. Using both methods, we found that growth rate and carrying capacity of Paramecium caudatum scale with temperature according to different activation energies. Our study shows how careful choice of experimental design and inference methods can increase the accuracy of the inferred relationships between temperature and population parameters. The comparison of estimation methods provided here can increase the accuracy of model predictions, with important implications in understanding and predicting the effects of temperature on the dynamics of populations. PMID:25558365
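
    A minimal sketch of the simulation setup described above is given below: logistic population growth whose intrinsic rate scales with temperature through the Arrhenius equation. All parameter values, the simple multiplicative observation noise and the deterministic update step are invented for illustration and are not the study's.

      # Logistic growth with an Arrhenius-scaled intrinsic growth rate.
      import numpy as np

      K_BOLTZ = 8.617e-5          # Boltzmann constant, eV / K

      def arrhenius(rate_ref, E, temp, temp_ref=293.15):
          """Scale a reference rate to temperature `temp` (K) with activation energy E (eV)."""
          return rate_ref * np.exp(-E / K_BOLTZ * (1.0 / temp - 1.0 / temp_ref))

      def simulate_logistic(temp, E=0.65, r_ref=0.5, K=1000.0, n0=10.0,
                            n_steps=100, dt=0.1, obs_noise=0.05, seed=0):
          rng = np.random.default_rng(seed)
          r = arrhenius(r_ref, E, temp)
          n, series = n0, []
          for _ in range(n_steps):
              n += dt * r * n * (1.0 - n / K)              # deterministic logistic step
              series.append(n * np.exp(obs_noise * rng.normal()))   # multiplicative sampling noise
          return np.array(series)

      for T in (288.15, 293.15, 298.15):                   # 15, 20, 25 degrees C
          ts = simulate_logistic(T)
          print(f"T = {T - 273.15:.0f} C: abundance after 100 steps ~ {ts[-1]:.0f}")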

  7. Exploring nonlinear feature space dimension reduction and data representation in breast CADx with Laplacian eigenmaps and t-SNE.

    PubMed

    Jamieson, Andrew R; Giger, Maryellen L; Drukker, Karen; Li, Hui; Yuan, Yading; Bhooshan, Neha

    2010-01-01

    In this preliminary study, recently developed unsupervised nonlinear dimension reduction (DR) and data representation techniques were applied to computer-extracted breast lesion feature spaces across three separate imaging modalities: ultrasound (U.S.) with 1126 cases, dynamic contrast-enhanced magnetic resonance imaging with 356 cases, and full-field digital mammography with 245 cases. Two methods for nonlinear DR were explored: Laplacian eigenmaps [M. Belkin and P. Niyogi, "Laplacian eigenmaps for dimensionality reduction and data representation," Neural Comput. 15, 1373-1396 (2003)] and t-distributed stochastic neighbor embedding (t-SNE) [L. van der Maaten and G. Hinton, "Visualizing data using t-SNE," J. Mach. Learn. Res. 9, 2579-2605 (2008)]. These methods attempt to map originally high-dimensional feature spaces to more human-interpretable lower-dimensional spaces while preserving both local and global information. The properties of these methods as applied to breast computer-aided diagnosis (CADx) were evaluated in the context of malignancy classification performance as well as in the visual inspection of the sparseness within the two-dimensional and three-dimensional mappings. Classification performance was estimated by using the reduced-dimension mapped feature output as input into both linear and nonlinear classifiers: a Markov chain Monte Carlo based Bayesian artificial neural network (MCMC-BANN) and linear discriminant analysis. The new techniques were compared to previously developed breast CADx methodologies, including automatic relevance determination (ARD) and linear stepwise (LSW) feature selection, as well as a linear DR method based on principal component analysis. Using ROC analysis and 0.632+ bootstrap validation, 95% empirical confidence intervals were computed for each classifier's AUC performance. In the large U.S. data set, sample high-performance results include AUC_0.632+ = 0.88 with 95% empirical bootstrap interval [0.787; 0.895] for 13 ARD-selected features and AUC_0.632+ = 0.87 with interval [0.817; 0.906] for four LSW-selected features, compared to a 4D t-SNE mapping (from the original 81D feature space) giving AUC_0.632+ = 0.90 with interval [0.847; 0.919], all using the MCMC-BANN. Preliminary results appear to indicate capability for the new methods to match or exceed the classification performance of current advanced breast lesion CADx algorithms. While not appropriate as a complete replacement of feature selection in CADx problems, DR techniques offer a complementary approach, which can aid elucidation of additional properties associated with the data. Specifically, the new techniques were shown to possess the added benefit of delivering sparse lower-dimensional representations for visual interpretation, revealing intricate data structure of the feature space.
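
    The DR-then-classify pipeline can be sketched with off-the-shelf tools. The hedged example below assumes scikit-learn, uses synthetic 81-dimensional data rather than lesion features, and substitutes logistic regression for the MCMC-BANN; it embeds the features with t-SNE and compares cross-validated AUC on the original and embedded spaces. As in exploratory use, the embedding is computed on all samples, since t-SNE has no out-of-sample transform.

      # Nonlinear DR with t-SNE followed by a simple classifier, on synthetic data.
      from sklearn.datasets import make_classification
      from sklearn.manifold import TSNE
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score

      X, y = make_classification(n_samples=500, n_features=81, n_informative=10,
                                 random_state=0)

      X_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

      auc_full = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                                 cv=5, scoring="roc_auc").mean()
      auc_tsne = cross_val_score(LogisticRegression(max_iter=1000), X_2d, y,
                                 cv=5, scoring="roc_auc").mean()
      print(f"AUC on original 81-D features: {auc_full:.2f}")
      print(f"AUC on 2-D t-SNE embedding  : {auc_tsne:.2f}")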

  8. A New Statistical Model for Eruption Forecasting at Open Conduit Volcanoes: an Application to Mt Etna and Kilauea Volcanoes

    NASA Astrophysics Data System (ADS)

    Passarelli, Luigi; Sanso, Bruno; Sandri, Laura; Marzocchi, Warner

    2010-05-01

    One of the main goals in volcanology is to forecast volcanic eruptions. A trenchant forecast should be made before the onset of a volcanic eruption, using the data available at that time, with the aim of mitigating the volcanic risk associated with the event. In other words, models implemented for forecasting purposes have to be able to provide "forward" forecasts and should avoid a merely "retrospective" fitting of the data available. In this perspective, the main idea of the present model is to forecast the next volcanic eruption after the end of the last one, using only the data available at that time. We focus our attention on volcanoes with an open conduit regime and high eruption frequency. We assume a generalization of the classical time-predictable model to describe the eruptive behavior of open conduit volcanoes, and we use a Bayesian hierarchical model to make probabilistic forecasts. We apply the model to Kilauea volcano eruptive data and Mt. Etna volcano flank eruption data. The aims of this model are: 1) to test whether or not the Kilauea and Mt Etna volcanoes follow a time-predictable behavior; 2) to discuss the volcanological implications of the inferred time-predictable model parameters; 3) to compare the forecast capabilities of this model with other models present in the literature. The results obtained using the MCMC sampling algorithm show that both volcanoes follow a time-predictable behavior. The numerical values of the inferred time-predictable model parameters suggest that the amount of erupted volume could change the dynamics of the magma chamber refilling process during the repose period. The probability gain of this model compared with other models already present in the literature is appreciably greater than zero. This means that our model produces better forecasts than previous models and could be used in a probabilistic volcanic hazard assessment scheme. In this perspective, the probabilities of Mt Etna flank eruptions given by our model are published on an internet website and are updated after any change in the eruptive activity.

  9. A review of causal inference for biomedical informatics

    PubMed Central

    Kleinberg, Samantha; Hripcsak, George

    2011-01-01

    Causality is an important concept throughout the health sciences and is particularly vital for informatics work such as finding adverse drug events or risk factors for disease using electronic health records. While philosophers and scientists working for centuries on formalizing what makes something a cause have not reached a consensus, new methods for inference show that we can make progress in this area in many practical cases. This article reviews core concepts in understanding and identifying causality and then reviews current computational methods for inference and explanation, focusing on inference from large-scale observational data. While the problem is not fully solved, we show that graphical models and Granger causality provide useful frameworks for inference and that a more recent approach based on temporal logic addresses some of the limitations of these methods. PMID:21782035

  10. Integrative approach for inference of gene regulatory networks using lasso-based random featuring and application to psychiatric disorders.

    PubMed

    Kim, Dongchul; Kang, Mingon; Biswas, Ashis; Liu, Chunyu; Gao, Jean

    2016-08-10

    Inferring gene regulatory networks is one of the most interesting research areas in systems biology. Many inference methods have been developed using a variety of computational models and approaches. However, there are two issues to solve. First, depending on the structural or computational model of an inference method, the results tend to be inconsistent due to the innately different advantages and limitations of the methods. Therefore, the combination of dissimilar approaches is needed as an alternative way to overcome the limitations of standalone methods through complementary integration. Second, sparse linear regression penalized by a regularization parameter (lasso) and bootstrapping-based sparse linear regression have been suggested as state-of-the-art methods for network inference, but they are not effective for small-sample-size data, and a true regulator can be missed if the target gene is strongly affected by an indirect regulator with high correlation or by another true regulator. We present two novel network inference methods based on the integration of three different criteria: (i) z-score, to measure the variation of gene expression from knockout data; (ii) mutual information, for the dependency between two genes; and (iii) linear regression-based feature selection. Based on these criteria, we propose a lasso-based random feature selection algorithm (LARF) to achieve better performance, overcoming the limitations of bootstrapping mentioned above. In this work, there are three main contributions. First, our z-score-based method to measure gene expression variations from knockout data is more effective than similar criteria in related works. Second, we confirmed that true regulator selection can be effectively improved by LARF. Lastly, we verified that an integrative approach can clearly outperform a single method when two different methods are effectively joined. In the experiments, our methods were validated by outperforming state-of-the-art methods on DREAM challenge data, and LARF was then applied to the inference of gene regulatory networks associated with psychiatric disorders.
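
    The lasso criterion on its own can be sketched briefly: regress each target gene on all other genes with an L1 penalty and treat nonzero coefficients as candidate regulators. The Python example below (assuming scikit-learn; the data and planted regulators are invented) shows only this plain lasso step, not the full LARF algorithm with its z-score and mutual-information criteria or the random feature selection.

      # Lasso-based candidate-regulator scoring for a toy expression matrix.
      import numpy as np
      from sklearn.linear_model import LassoCV

      rng = np.random.default_rng(4)
      n_samples, n_genes = 100, 20
      X = rng.normal(size=(n_samples, n_genes))
      X[:, 5] = 0.9 * X[:, 2] - 0.7 * X[:, 7] + 0.1 * rng.normal(size=n_samples)  # planted regulators

      edges = []
      for target in range(n_genes):
          predictors = np.delete(np.arange(n_genes), target)
          model = LassoCV(cv=5).fit(X[:, predictors], X[:, target])
          for p, coef in zip(predictors, model.coef_):
              if abs(coef) > 1e-3:
                  edges.append((p, target, coef))

      for source, target, weight in sorted(edges, key=lambda e: -abs(e[2]))[:5]:
          print(f"gene {source} -> gene {target}: coefficient {weight:+.2f}")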

  11. Bayesian Inference for Functional Dynamics Exploring in fMRI Data.

    PubMed

    Guo, Xuan; Liu, Bing; Chen, Le; Chen, Guantao; Pan, Yi; Zhang, Jing

    2016-01-01

    This paper aims to review state-of-the-art Bayesian-inference-based methods applied to functional magnetic resonance imaging (fMRI) data. In particular, we focus on one specific long-standing challenge in the computational modeling of fMRI datasets: how to effectively explore typical functional interactions from fMRI time series and the corresponding boundaries of temporal segments. Bayesian inference is a method of statistical inference which has been shown to be a powerful tool for encoding dependence relationships among variables under uncertainty. Here we provide an introduction to a group of Bayesian-inference-based methods for fMRI data analysis, which were designed to detect magnitude or functional connectivity change points and to infer their functional interaction patterns based on the corresponding temporal boundaries. We also provide a comparison of three popular Bayesian models, that is, the Bayesian Magnitude Change Point Model (BMCPM), the Bayesian Connectivity Change Point Model (BCCPM), and the Dynamic Bayesian Variable Partition Model (DBVPM), and give a summary of their applications. We envision that more sophisticated Bayesian inference models will emerge and play increasingly important roles in modeling brain functions in the years to come.
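
    A toy version of magnitude change-point detection conveys the basic idea: place a uniform prior over the change-point location in a single series and score each candidate location by how well a mean-shift model explains the two segments. The sketch below uses plug-in segment means (a profile-likelihood shortcut rather than integrating the means out) and invented data; it is not the BMCPM, BCCPM or DBVPM models themselves.

      # Posterior over a single change-point location for a Gaussian mean-shift model.
      import numpy as np
      from scipy.stats import norm

      rng = np.random.default_rng(5)
      y = np.concatenate([rng.normal(0.0, 1.0, 80), rng.normal(1.5, 1.0, 70)])  # shift at t = 80
      n = y.size

      log_post = np.full(n, -np.inf)
      for tau in range(5, n - 5):                       # candidate change points
          mu1, mu2 = y[:tau].mean(), y[tau:].mean()     # plug-in segment means
          log_post[tau] = (norm.logpdf(y[:tau], mu1, 1.0).sum()
                           + norm.logpdf(y[tau:], mu2, 1.0).sum())

      post = np.exp(log_post - log_post.max())
      post /= post.sum()                                # uniform prior, so normalize directly
      print("most probable change point:", post.argmax())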

  12. Semi-Supervised Multi-View Learning for Gene Network Reconstruction

    PubMed Central

    Ceci, Michelangelo; Pio, Gianvito; Kuzmanovski, Vladimir; Džeroski, Sašo

    2015-01-01

    The task of gene regulatory network reconstruction from high-throughput data has been receiving increasing attention in recent years. As a consequence, many inference methods for solving this task have been proposed in the literature. It has recently been observed, however, that no single inference method performs optimally across all datasets. It has also been shown that the integration of predictions from multiple inference methods is more robust and shows high performance across diverse datasets. Inspired by this research, in this paper, we propose a machine learning solution which learns to combine predictions from multiple inference methods. While this approach adds additional complexity to the inference process, we expect it would also carry substantial benefits. These would come from the automatic adaptation to patterns in the outputs of individual inference methods, so that it is possible to identify regulatory interactions more reliably when these patterns occur. This article demonstrates the benefits (in terms of accuracy of the reconstructed networks) of the proposed method, which exploits an iterative, semi-supervised ensemble-based algorithm. The algorithm learns to combine the interactions predicted by many different inference methods in the multi-view learning setting. The empirical evaluation of the proposed algorithm on a prokaryotic model organism (E. coli) and on a eukaryotic model organism (S. cerevisiae) clearly shows improved performance over state-of-the-art methods. The results indicate that gene regulatory network reconstruction for the real datasets is more difficult for S. cerevisiae than for E. coli. The software, all the datasets used in the experiments and all the results are available for download at the following link: http://figshare.com/articles/Semi_supervised_Multi_View_Learning_for_Gene_Network_Reconstruction/1604827. PMID:26641091

  13. Snow Water Equivalent Retrieval By Markov Chain Monte Carlo Based on Memls and Hut Snow Emission Model

    NASA Astrophysics Data System (ADS)

    Pan, J.; Durand, M. T.; Vanderjagt, B. J.

    2014-12-01

    The Markov chain Monte Carlo (MCMC) method has been shown to be successful for snow water equivalent retrieval based on synthetic point-scale passive microwave brightness temperature (TB) observations. This method needs only general prior information about the distribution of snow parameters, and can estimate layered snow properties, including the thickness, temperature, density and snow grain size (or exponential correlation length) of each layer. In this study, the multi-layer HUT (Helsinki University of Technology) model and MEMLS (Microwave Emission Model of Layered Snowpacks) will be used as observation models to assimilate the observed TB into snow parameter prediction. Previous studies have shown that the multi-layer HUT model tends to underestimate TB at 37 GHz for deep snow, while MEMLS does not show sensitivity of model bias to snow depth. Therefore, results using the HUT model and MEMLS will be compared to see how the observation model influences the retrieval of snow parameters. The radiometric measurements at 10.65, 18.7, 36.5 and 90 GHz at Sodankyla, Finland will be used as MCMC input, and the statistics of all snow property measurements will be used to calculate the prior information. 43 dry snowpits with complete measurements of all snow parameters will be used for validation. The entire dataset is from the NorSREx (Nordic Snow Radar Experiment) experiments carried out by Juha Lemmetyinen, Anna Kontu and Jouni Pulliainen at FMI in the 2009-2011 winters, and continued for two more winters from 2011 to the spring of 2013. Besides the snow thickness and snow density that are directly related to snow water equivalent, other parameters will also be compared with observations. For thin snow, previous studies showed that the influence of the underlying soil is considerable, especially when the soil is half frozen, with part unfrozen liquid water and part ice. Therefore, this study will also try to employ a simple frozen soil permittivity model to improve the performance of the retrieval. The behavior of the Markov chain in the soil parameters will be studied.
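
    The retrieval idea, MCMC over snow parameters conditioned on brightness-temperature observations through a forward emission model, can be sketched with a deliberately toy forward model in place of HUT or MEMLS. In the hypothetical Python example below, the two TB channels depend only on snow water equivalent (depth times density), so depth and density are not separately identifiable but their product is, which is the quantity reported; the priors, noise level and coefficients are invented.

      # Metropolis-Hastings retrieval of snow water equivalent with a toy forward model.
      import numpy as np

      rng = np.random.default_rng(6)

      def forward_tb(depth_m, density):
          # Toy emission model: TB decreases with snow water equivalent. Not HUT/MEMLS.
          swe = depth_m * density                      # kg/m^2
          return np.array([260.0 - 0.15 * swe, 255.0 - 0.30 * swe])   # two channels, K

      true_theta = np.array([0.8, 300.0])              # depth 0.8 m, density 300 kg/m^3
      tb_obs = forward_tb(*true_theta) + rng.normal(0.0, 2.0, size=2)

      def log_post(theta):
          depth, density = theta
          if not (0.0 < depth < 3.0 and 50.0 < density < 550.0):       # flat priors
              return -np.inf
          resid = tb_obs - forward_tb(depth, density)
          return -0.5 * np.sum((resid / 2.0) ** 2)                     # Gaussian TB noise, sigma = 2 K

      theta, chain = np.array([1.0, 250.0]), []
      for _ in range(20000):
          prop = theta + rng.normal(0.0, [0.05, 10.0])                 # random-walk proposal
          if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
              theta = prop
          chain.append(theta.copy())

      chain = np.array(chain[5000:])
      swe = chain[:, 0] * chain[:, 1]
      print(f"posterior SWE: {swe.mean():.0f} +/- {swe.std():.0f} kg/m^2 (true {0.8 * 300:.0f})")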

  14. Seismic imaging of Q structures by a trans-dimensional coda-wave analysis

    NASA Astrophysics Data System (ADS)

    Takahashi, Tsutomu

    2017-04-01

    Wave scattering and intrinsic attenuation are important processes for describing the incoherent and complex wave trains of high-frequency seismic waves (>1 Hz). The multiple lapse time window analysis (MLTWA) has been used to estimate scattering and intrinsic Q values by assuming constant Q in a study area (e.g., Hoshiba 1993). This study generalizes MLTWA to estimate lateral variations of Q values under the Bayesian framework in a variable-dimension space. The study area is partitioned into small areas by means of Voronoi tessellation, with scattering and intrinsic Q constant in each small area. We define a misfit function for spatiotemporal variations of wave energy as in the original MLTWA, and maximize the posterior probability by changing not only the Q values but also the number and spatial layout of the Voronoi cells. This maximization is conducted by means of the reversible-jump Markov chain Monte Carlo (rjMCMC) method (Green 1995), since the number of unknown parameters (i.e., the dimension of the posterior probability) is variable. After convergence to the maximum posterior, we estimate Q structures from ensemble averages of MCMC samples around the maximum posterior probability. Synthetic tests showed stable reconstructions of input structures with reasonable error distributions. We applied this method to seismic waveform data recorded by ocean-bottom seismometers in the outer-rise area off Tohoku, and estimated Q values at 4-8 Hz, 8-16 Hz and 16-32 Hz. Intrinsic Q is nearly constant in all frequency bands, while scattering Q shows two distinct strong-scattering regions, at the petit-spot area and the high-seismicity area. This strong scattering is probably related to magma inclusions and fractured structure, respectively. The difference between these two areas becomes clearer at high frequencies, which means that scale dependence of the inhomogeneities, or smaller-scale inhomogeneity, is important for discussing medium properties and the origins of structural variations. While the generalized MLTWA is based on classical waveform modeling in a constant-Q medium, this method can be a fundamental basis for Q-structure imaging in the crust.

  15. Topology, divergence dates, and macroevolutionary inferences vary between different tip-dating approaches applied to fossil theropods (Dinosauria).

    PubMed

    Bapst, D W; Wright, A M; Matzke, N J; Lloyd, G T

    2016-07-01

    Dated phylogenies of fossil taxa allow palaeobiologists to estimate the timing of major divergences and placement of extinct lineages, and to test macroevolutionary hypotheses. Recently developed Bayesian 'tip-dating' methods simultaneously infer and date the branching relationships among fossil taxa, and infer putative ancestral relationships. Using a previously published dataset for extinct theropod dinosaurs, we contrast the dated relationships inferred by several tip-dating approaches and evaluate potential downstream effects on phylogenetic comparative methods. We also compare tip-dating analyses to maximum-parsimony trees time-scaled via alternative a posteriori approaches including via the probabilistic cal3 method. Among tip-dating analyses, we find opposing but strongly supported relationships, despite similarity in inferred ancestors. Overall, tip-dating methods infer divergence dates often millions (or tens of millions) of years older than the earliest stratigraphic appearance of that clade. Model-comparison analyses of the pattern of body-size evolution found that the support for evolutionary mode can vary across and between tree samples from cal3 and tip-dating approaches. These differences suggest that model and software choice in dating analyses can have a substantial impact on the dated phylogenies obtained and broader evolutionary inferences. © 2016 The Author(s).

  16. Connectivity inference from neural recording data: Challenges, mathematical bases and research directions.

    PubMed

    Magrans de Abril, Ildefons; Yoshimoto, Junichiro; Doya, Kenji

    2018-06-01

    This article presents a review of computational methods for connectivity inference from neural activity data derived from multi-electrode recordings or fluorescence imaging. We first identify biophysical and technical challenges in connectivity inference along the data processing pipeline. We then review connectivity inference methods based on two major mathematical foundations, namely, descriptive model-free approaches and generative model-based approaches. We investigate representative studies in both categories and clarify which challenges have been addressed by which method. We further identify critical open issues and possible research directions. Copyright © 2018 The Author(s). Published by Elsevier Ltd. All rights reserved.

  17. ExoSOFT: Exoplanet Simple Orbit Fitting Toolbox

    NASA Astrophysics Data System (ADS)

    Mede, Kyle; Brandt, Timothy D.

    2017-08-01

    ExoSOFT provides orbital analysis of exoplanets and binary star systems. It fits any combination of astrometric and radial velocity data, and offers four parameter space exploration techniques, including MCMC. It is packaged with an automated set of post-processing and plotting routines to summarize results, and is suitable for performing orbital analysis during surveys with new radial velocity and direct imaging instruments.

  18. Just Another Gibbs Sampler (JAGS): Flexible Software for MCMC Implementation

    ERIC Educational Resources Information Center

    Depaoli, Sarah; Clifton, James P.; Cobb, Patrice R.

    2016-01-01

    A review of the software Just Another Gibbs Sampler (JAGS) is provided. We cover aspects related to history and development and the elements a user needs to know to get started with the program, including (a) definition of the data, (b) definition of the model, (c) compilation of the model, and (d) initialization of the model. An example using a…

  19. Development and Tuning of a 3D Stochastic Inversion Methodology to the European Arctic

    DTIC Science & Technology

    2010-09-01

    …from previous studies covering the region, in particular from Breivik et al. (2002). Our MCMC algorithm shown in Figure 3 has two major components…

  20. Specific PCR Identification between Peucedanum praeruptorum and Angelica decursiva and Identification between Them and Adulterant Using DNA Barcode.

    PubMed

    Han, Bang-Xing; Yuan, Yuan; Huang, Lu-Qi; Zhao, Qun; Tan, Ling-Ling; Song, Xiang-Wen; He, Xiao-Mei; Xu, Tao; Liu, Feng; Wang, Jian

    2017-01-01

    The traditional Chinese medicines (TCM) Qianhu and Zihuaqianhu are the dried roots of Peucedanum praeruptorum and Angelica decursiva, respectively. Since the plant sources of Qianhu and Zihuaqianhu are complex, the chemical compositions of P. praeruptorum and A. decursiva are significantly different, and many adulterants exist because of differences in traditional understanding and medication habits. Therefore, rapid and accurate identification methods are required. The aim was to study the feasibility of using DNA barcoding to distinguish between the traditional Chinese medicines Qianhu (Peucedanum praeruptorum), Zihuaqianhu (Angelica decursiva), and common adulterants, based on internal transcribed spacer (ITS) sequences, as well as specific PCR identification between P. praeruptorum and A. decursiva. The ITS sequences of P. praeruptorum, A. decursiva, and adulterants were studied, and a phylogenetic tree was constructed. Based on the ITS barcode, the specific PCR primer pairs QH-CP19s/QH-CP19a and ZHQH-CP3s/ZHQH-CP3a were designed for P. praeruptorum and A. decursiva, respectively. The amplification conditions were optimized, and specific PCR products were obtained. The results showed that the phylogenetic trees constructed using the BI and MP methods were consistent, and that the P. praeruptorum and A. decursiva sequence haplotypes each formed their own monophyly. The experimental results also showed that in the PCR products the target bands appeared in the genuine drug and not in the adulterant, which suggests the high specificity of the two primer pairs. The ITS sequence was an ideal DNA barcode to identify P. praeruptorum, A. decursiva, and adulterants, and specific PCR is a quick and effective method to distinguish between P. praeruptorum and A. decursiva. Abbreviations used: TCM: traditional Chinese medicine, P.: Peucedanum, A.: Angelica, ITS: internal transcribed spacer, PCR: polymerase chain reaction, NCBI: National Center for Biotechnology Information, NI: number of individuals, HN: haplotype number, GAN: GenBank accession numbers, L.: Ligusticum, O.: Ostericum, P.: Pimpinella, BI: Bayesian inference, MP: maximum parsimony, AIC: Akaike Information Criterion, MCMC: Markov chain Monte Carlo, TBR: tree bisection-reconnection, LPP: length of PCR product, PRP: PCR reaction procedure, SNP: single nucleotide polymorphism, PP: posterior probability, BS: bootstrap.
